Virginia Teixeira
Cisco Employee

AI as a New Traffic Type: Implications for the Access, WAN and DCI

Artificial Intelligence (AI) is transforming the digital landscape, and Service Providers (SPs) play a key role in enabling this shift. This post explores how AI traffic differs from traditional patterns and what capabilities are needed to prepare networks for AI-driven demands. 

What Makes AI Traffic Different?

AI applications generate unique traffic patterns that differ from traditional workloads, placing new demands on transport networks. While much of the focus around AI infrastructure centers on the data center (DC), AI’s impact extends well beyond it. Effective AI operation depends on robust, high-performance connectivity across the entire network. Although the full impact of AI traffic is still evolving—shaped by algorithmic advances, hardware improvements, and market adoption—one thing is clear: AI is not just more traffic, it’s different traffic, requiring careful planning and adaptation.

AI Training vs. AI Inference in the Network

A primary distinction to make is between AI training and AI inference, as each possesses fundamentally different objectives and network demands.

AI Training:

While AI training primarily occurs within data center clusters, its data-driven nature and evolving techniques significantly impact the transport network connecting data sources to processing, including edge, DC-to-DC and hybrid cloud environments.

Pre-Training Traffic Demands: AI training's impact begins before the training itself, as massive datasets must reach the training cluster. These AI-related data transfers—data ingestion, replication, and continuous data collection for data freshness—demand high bandwidth (100Gbps+ connectivity for TB-to-PB-scale datasets, depending on latency tolerance) and are often batched, resulting in significant traffic spikes that can exceed average utilization (Peak-to-Average Ratio, PAR) by 5x to 10x. Frequent data updates drive uplink-heavy flows, shifting UL/DL ratio towards symmetry, or even UL dominance, especially at the edge of the network.
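To make these numbers concrete, here is a minimal back-of-the-envelope sketch (in Python) of how long a batched dataset transfer occupies a link. The 80% link efficiency and the specific dataset sizes are illustrative assumptions, not figures from this post:

    # Rough transfer-time estimate for batched AI dataset ingestion.
    # Assumptions (illustrative only): 80% usable link efficiency and
    # example dataset sizes; real transfers add protocol overhead.
    def transfer_hours(dataset_bytes: float, link_gbps: float, efficiency: float = 0.8) -> float:
        """Hours needed to move a dataset over a link at a given efficiency."""
        usable_bps = link_gbps * 1e9 * efficiency
        return dataset_bytes * 8 / usable_bps / 3600

    TB, PB = 1e12, 1e15
    print(f"100 TB over 100 Gbps: {transfer_hours(100 * TB, 100):.1f} h")  # ~2.8 h
    print(f"  1 PB over 100 Gbps: {transfer_hours(1 * PB, 100):.1f} h")    # ~27.8 h
    print(f"  1 PB over 400 Gbps: {transfer_hours(1 * PB, 400):.1f} h")    # ~6.9 h

A batch that must land inside a fixed ingestion window is exactly what produces the 5x-to-10x peak-to-average ratios noted above.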

The Challenge of Multi-DC Site Training: Traditional AI training primarily relies on intra-DC communication, with the transport network handling checkpoints and backups. However, the increasing limitations of individual DCs—scaling constraints, data locality requirements, and the sheer size of modern models—are driving the need for distributed training across geographically dispersed sites. This introduces significant challenges for the DCI network, requiring ultra-low latency (as low as 200µsec) and dedicated high bandwidth (400Gbps+) to ensure timely model synchronization. 
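A quick sanity check shows why a 200µsec budget effectively dictates site placement. Assuming the common rule of thumb of roughly 5µsec of propagation delay per kilometer of fiber (an approximation, not a figure from this post):

    # Distance limit implied by a latency budget, assuming ~5 us of
    # propagation delay per km of fiber (rule-of-thumb approximation).
    FIBER_US_PER_KM = 5.0

    def max_fiber_km(budget_us: float, round_trip: bool = True) -> float:
        """Longest fiber path if the entire budget went to propagation."""
        one_way_us = budget_us / 2 if round_trip else budget_us
        return one_way_us / FIBER_US_PER_KM

    # A 200 us round-trip budget caps the fiber path at ~20 km, before any
    # queuing, serialization, or transponder delay is even counted.
    print(f"{max_fiber_km(200):.0f} km")

In practice the usable radius is smaller still once equipment delay is subtracted, which is why tightly synchronized multi-site training tends to stay metro-scale rather than continent-spanning.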

Data Security and Sovereignty Drive Network Demands: Growing concerns about data protection, security, sovereignty, and the need to prevent biased training are driving demand for reliable and secure data movement. This places stringent security and resiliency requirements on the transport network, including quantum-safe readiness, robust redundancy mechanisms, and traffic path control. Additionally, federated training, a decentralized collaborative approach where multiple entities train a model using their private, locally stored data and iteratively share updates, is gaining traction as a means of preserving data confidentiality and integrity. It drives periodic data exchanges that scale with model size, number of participants, and update frequency. Swarm training extends this collaboration to the edge, generating uplink-heavy flows that significantly impact the UL/DL bandwidth ratio and spread the networking effects of AI training to the network's edge.
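As a rough illustration of how federated-training traffic scales with those three factors, consider this sketch (the update size, participant count, and round frequency are hypothetical values chosen for illustration):

    # Federated training traffic: each round, every participant uploads a
    # model update and downloads the new aggregate (values are hypothetical).
    def fl_traffic_tb_per_day(update_gb: float, participants: int, rounds_per_day: int) -> float:
        per_round_gb = update_gb * participants * 2  # upload + download
        return per_round_gb * rounds_per_day / 1000

    # A 10 GB update across 50 sites, 4 rounds per day -> ~4 TB/day,
    # much of it uplink-heavy and concentrated at exchange times.
    print(f"{fl_traffic_tb_per_day(10, 50, 4):.0f} TB/day")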

 

AI Inference:

AI Inference also imposes bandwidth, latency, resiliency, and security requirements on the transport network, but these stem from needs that differ from training and translate into different network requirements. From retrieval-based models to AI agents, inference brings challenges that cascade from core to access networks.

Bandwidth Growth: AI inference generates unique, dynamic, and largely non-cacheable traffic tailored to each prompt. This is driving increasingly unpredictable bandwidth demands as multi-modal AI and overall AI adoption accelerate.

Shifting Uplink/Downlink Dynamics: Advanced Retrieval-Augmented Generation (RAG) scenarios can reverse traditional UL/DL ratios, with uplink traffic exceeding downlink by up to 10x in some test environments. This is due to the large context windows (up to 2M tokens) sent to the model, while the model’s output may be a short response, command, or decision. The placement of the RAG database relative to the AI app and model determines which transport network segments are most affected. Furthermore, collaborative AI agents, multimodal AI assistants, and edge inference workflows are driving a broader shift towards higher uplink traffic, leading to UL/DL symmetry or even UL dominance at the edge and requiring a re-evaluation of access network design and bandwidth allocation strategies.
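The payload asymmetry is easy to see with token arithmetic. Assuming roughly 4 bytes per token of English text (a common approximation; the prompt and response sizes below are illustrative):

    # Payload asymmetry of a large-context RAG request (illustrative sizes;
    # ~4 bytes/token is an approximation for English text).
    BYTES_PER_TOKEN = 4

    def payload_mb(tokens: int) -> float:
        return tokens * BYTES_PER_TOKEN / 1e6

    up = payload_mb(2_000_000)  # 2M-token context sent upstream: ~8 MB
    down = payload_mb(500)      # short answer returned: ~0.002 MB
    print(f"uplink {up:.1f} MB vs downlink {down:.3f} MB ({up / down:.0f}:1)")

Aggregate link ratios are far lower than this per-request payload ratio, consistent with the "up to 10x" figure observed on mixed traffic, but the direction of the shift is the same.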

Growing Role of Latency in AI Networks: Conversational AI assistants require low latency for optimal user experience (a 30% engagement drop occurs for every 300ms of perceived latency), and real-time machine-based applications demand even stricter latency. While model inference traditionally dominated overall response time, advances in inference acceleration are reducing model-side latency—making transport latency a more critical factor. This shift demands network optimization strategies such as edge inference and latency-aware routing. The challenge is amplified in agentic AI architectures, where agents complete tasks through multiple sequential reasoning and interaction steps—often involving external APIs or other agents. Instead of a single request-response pattern, these workflows generate 1:N interactions, multiplying the effect of latency on end-user experience. As a result, even networks with low round-trip times must now minimize cumulative delay. Addressing this requires more granular latency visibility, adaptive path selection, and architectural support for distributed inference closer to the user. 
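A minimal sketch of why 1:N agentic workflows amplify transport latency (the step counts and RTT values are assumptions chosen for illustration):

    # Cumulative transport delay in a sequential agentic workflow:
    # every reasoning/tool step pays the network round trip again
    # (step counts and RTTs are illustrative assumptions).
    def cumulative_delay_ms(steps: int, rtt_ms: float) -> float:
        return steps * rtt_ms

    print(cumulative_delay_ms(1, 20))   # single request-response: 20 ms
    print(cumulative_delay_ms(15, 20))  # 15-step agent chain: 300 ms,
    # the same threshold the engagement-drop figure above is tied to.

The multiplier, not the per-hop RTT, is what pushes even well-provisioned networks past user-experience thresholds.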

Edge Inference Improves Resiliency: Distributing inference to the edge and leveraging local hosting addresses more than latency and bandwidth challenges. It also enhances resiliency, reduces the blast radius, and supports data sovereignty. To facilitate this distributed model, the transport network must enable flexible and dynamic placement of edge inference resources.

AI Connectivity: Three Key Scenarios

Considering these changes in traffic patterns, the segments of the network they impact, and the emergence of new business models, there are three AI connectivity scenarios that represent current trends.

Scenario 1: Centralized Training at Scale

  • Focus: Large-scale model development and foundation model training
  • Characteristics: Centralized compute, experimentation, lower risk, faster time-to-market and the use of large public datasets
  • Network Demands:
    • High-throughput data center interconnect (DCI)
    • Quantum-safe connectivity between DCs and cloud environments
    • Efficient support for large, batched data transfers (TBs to PBs)

Scenario 2: Production Inference and Data Sovereignty

  • Focus: Customized model deployment using private or proprietary datasets
  • Characteristics: Shift from public cloud to private distributed DCs for cost, IP protection, and compliance. Power and space constraints also lead to a move away from major cities.
  • Network Demands:
    • Reliable, on-demand DCI across distributed sites
    • Support for hybrid environments and dynamic workload placement
    • Enforced data residency and secure traffic segmentation

Inference becomes production-grade, and connectivity must be agile, sovereign, and cost-optimized.

Scenario 3: Pervasive AI and Edge Inference

  • Focus: Decentralized AI execution, including agentic AI, swarm learning, and physical AI applications
  • Characteristics: AI runs close to users and devices, with real-time collaboration between distributed nodes
  • Network Demands:
    • Any-to-any connectivity at scale
    • Low-latency transport from edge to core
    • Resilient, bandwidth-efficient access and aggregation layers
    • Support for increasing uplink traffic and symmetric UL/DL ratios

This phase pushes AI to the edge, requiring the network to deliver performance, reliability, and real-time coordination across all domains.

Key Network Capabilities for AI-Driven Traffic

Supporting AI is not about replacing the network—it’s about evolving it. Existing investments in simplification, programmability, automation, visibility, and security, as detailed in Digital Infrastructure for a Digital Society, can be extended to meet AI-specific demands. However, AI workloads introduce distinct performance characteristics that require a more intentional application of these capabilities:

  • Performance: AI implementations need high-performing networks that surpass the demands of typical video or internet traffic. The network needs to deliver predictable behavior and support a shift towards symmetric access and higher peak-to-average ratios. Effective network latency assessment needs to capture the multi-step nature of agentic AI (1:N RTTs).
  • Deep Visibility: Understanding traffic flows and profiles, as well as real-time network behavior and capacity, is essential for designing effective Quality of Service (QoS) policies and making informed traffic engineering choices. Deep visibility is also key to securing traffic, controlling and automating path policies, and providing assurance.
  • Security: The quality of AI results relies on the quality of the data it uses and produces, so ensuring data protection in transit is crucial. This requires securing networking nodes (using a trustworthy stack/Cisco SDL) and links (implementing MACsec) to establish end-to-end trusted paths that keep traffic contained within specified network borders.
  • Resiliency: Uninterrupted access to AI applications is critical for user adoption, business relevance, and return on AI investments. Network resiliency that mitigates the impact of failures, errors, congestion, and security attacks is foundational for digital resilience. This requires a multi-dimensional approach encompassing resilient network components, a simplified architecture that minimizes single points of failure and reduces failure scenarios, and real-time visibility for proactive failure detection and performance management.

Conclusion: Turning AI Readiness into Differentiation

AI is not just another application—it’s a new type of traffic that will fundamentally transform how transport networks are architected and operated. To stay ahead, Service Providers must deeply understand these unique traffic behaviors, anticipate their impact across all layers of the network, and invest in foundational capabilities—such as real-time visibility, built-in security, and end-to-end resiliency on a simplified, automated network. Those who modernize their infrastructure with these capabilities won't just support AI—they'll unlock new value propositions, differentiate their offerings, and position themselves as strategic enablers in the AI-driven digital economy.
