My customer has bought a layer 2 WAN service from a service provider and already purchased 3750 switches to act as CE devices. They now want me to specify a design of how to implement what they have bought. They have about 50 sites but due to QoS limitations of the service provider they need to use 1 physical interface to route voice traffic towards the WAN at each site and a different interface for data traffic. They wish to have a dynamic routing environment so in my opinion they have two basic options for routing protocols that could send the voice and data out of different interfaces:-
1) Run EIGRP and use distribute lists to advertise voice subnets out of 1 link and data subnets out of the other.
2) Run 2 different routing protocols, e.g. OSPF for data and EIGRP for voice. I don't think there should be any need to redistribute between them.
I think that either of these solutions is likely to give me scalability issues due to the number of neighbors required (i.e. approximately 100 per switch) but I haven't managed to find any numbers for what should be achievable. Can anyone give me any indication of the number of EIGRP neighbors I could reasonably support? I have seen some people mention that best practice is 20, but I have also seen references to a Networkers presentation where they discussed a live environment with 800 neighbors on a 7200.
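For reference, the "approximately 100 per switch" figure comes from simple arithmetic. The sketch below assumes a flat design in which every switch peers with every other switch on both the voice and the data VLAN. The function names are only illustrative.

```python
# Rough neighbor arithmetic for a flat layer 2 WAN.
# Assumption: every switch forms an adjacency with every other switch
# on each VLAN (one voice VLAN, one data VLAN).

def neighbors_per_switch(sites: int, vlans: int) -> int:
    """Adjacencies held by one switch: one per remote site, per VLAN."""
    return (sites - 1) * vlans


def total_adjacencies(sites: int, vlans: int) -> int:
    """Distinct adjacencies across the whole WAN (each pair counted once)."""
    return vlans * sites * (sites - 1) // 2


if __name__ == "__main__":
    print(neighbors_per_switch(50, 2))  # 98
    print(total_adjacencies(50, 2))     # 2450
```

This only counts adjacencies. It says nothing about what a 3750 can actually sustain, which is the open question.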
I have also been considering ways to improve scalability. I have identified the following:-
1) Most sites can run as stubs because there is only a single switch per site.
2) I am not sure yet but I am hoping that the LAN addresses within the sites can be summarised towards the WAN so instability in the LAN will not cause instability in the WAN.
3) I think I could possibly reduce the number of neighbors required on some devices by configuring static neighbors and configuring switches at key sites as hubs for the smaller sites. I think this would probably require me to turn off split horizon. I think I could also use the 'no ip next-hop-self eigrp' command to stop traffic having to route via the hub site. I am nervous of turning off split horizon and using this command in an environment with multiple hubs. Please could you give me any guidance?
4) I could implement multiple VLANs within the WAN and implement a hierarchical design over the top of the layer 2 WAN. I am sure I could get a solution like this to work and it is the one I will most likely fall back on if I don't get an answer to this query, but the exact solution would also depend on how many neighbors I could configure on the 3750s. The main problem with this solution, however, is that traffic will route via the hub sites rather than going directly to the destination site. This will cause an overhead on the WAN connections at the key sites, which is exactly where I don't want it.
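Whether point 2 (summarising the site LANs towards the WAN) is possible depends on the addressing plan, which I don't have yet. As a minimal sketch, Python's standard `ipaddress` module can check whether a set of site subnets collapses into one summary. The 10.x subnets below are invented purely for illustration.

```python
import ipaddress

# Hypothetical LAN subnets at one site (the real addressing is not known).
site_lans = [
    ipaddress.ip_network("10.1.0.0/24"),
    ipaddress.ip_network("10.1.1.0/24"),
    ipaddress.ip_network("10.1.2.0/24"),
    ipaddress.ip_network("10.1.3.0/24"),
]


def summarise(subnets):
    """Collapse contiguous subnets into the fewest covering prefixes."""
    return list(ipaddress.collapse_addresses(subnets))


if __name__ == "__main__":
    # Four contiguous /24s collapse into a single /22.
    print(summarise(site_lans))
```

If the result is a single prefix, the site can advertise one summary towards the WAN. If several prefixes come back, the addressing is not contiguous and the site would need more than one summary.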
Failing all of the above, I could push back on the customer's requirement to have dynamic routing and suggest static routing over half or all of the network. I obviously don't want to do this if there is a feasible solution.
All suggestions appreciated.
I am no expert so don't take my suggestion to heart. It is merely an implementation of something that I have seen in the past.
Presuming that the customer is using Metro-E as the transport, in most cases the services are defined by an access VLAN on the CE ports facing the customer device, i.e. DATA VLAN 100 and VOICE VLAN 101. If the CE (3750) is trunking with the SP Metro-E core switch, then you would define your own C-Tag VLANs using dot1q tunneling and the SP encapsulates both VLANs using an S-Tag. Based on your details, the former seems applicable.
The simplest solution is to run two separate instances of EIGRP, each with its own AS number, e.g. AS 100 for DATA and AS 101 for VOICE, and advertise only the relevant networks (both WAN and LAN) in each instance. Traffic from a spoke will not always route via the hub (only default traffic following a 0.0.0.0/0 default route will) and will otherwise be switched in the Metro-E cloud via MAC address learning on the respective access ports of the 3750. There is no need to advertise both networks under each process, because most likely both VLANs will be riding the same physical transport, i.e. if the circuit fails, both voice and data will be unreachable. I hope this helps.
The presentation is .1Q on each of the 2 physical interfaces. There is then a small provider-owned (non-Cisco) L2 switch at the site that marks the QoS and switches both types of traffic over a common uplink to a non-Cisco L2 core. The way it was explained to me is that the network is dedicated to the customer so there is no need to tunnel in the core. We just need to specify which VLANs to define in the core and which access ports they should go to.
Previously the service provider has provided a managed service but the old L3 switches are now being replaced by new ones that I need to define the configuration of. The service provider is being less than helpful in providing information about what is already there, despite the fact that we will have to integrate with their setup during the migration phase and they will continue to provide the layer 2 after the migration. I am not 100% sure but based on their comments I think they are currently doing static routing over both the Data and Voice VLANs (although they won't currently tell me about their IP addressing so I can't even integrate with that at the moment). The customer is keen to do dynamic routing so they don't need to make changes across the network when changes are made within an individual site.
Running 2 EIGRP autonomous systems is similar to what I was considering with the second option of running a different routing protocol on each link. This would still involve each switch having 100 neighbors. My question is still 'would this be scalable for a 3750'? In reality, I think that if I was using 2 instances/protocols I would prefer to use OSPF rather than EIGRP. I would still need to know whether the 3750 would scale this high. What is your opinion of EIGRP vs OSPF in this environment?
I was trying to weigh up the pros and cons of 1) treating the WAN as a broadcast network, with every switch having 2 neighbors with every other switch (in which case traffic will be forwarded directly to the destination), i.e. the default behaviour if I enable EIGRP everywhere and also what you are talking about; or 2) treating the WAN as an NBMA network to reduce the number of neighbors defined, while also looking into techniques such as 'no ip next-hop-self eigrp' to get the traffic to go straight to the destination. I am not sure if either will scale to the required number of neighbors but the second option should scale better. The downside of the second option is that it is more complicated and I am not sure if it will cause any routing loops if more than 1 hub is defined.
Brian's suggestions are wise.
If you can use different VLAN tags, you can easily separate the sites into small subsets and solve the scalability issues.
To be noted:
From a scalability point of view, it is one thing to have N EIGRP (or OSPF) neighbors spread over multiple L3 interfaces (even logical interfaces such as VLAN-based SVIs or router subinterfaces) and another thing to have N EIGRP neighbors on a single interface in a single subnet.
L2 services look attractive because it seems you can simply have a virtual switch in the middle connecting all your sites. However, when routing protocol activity is taken into account they are less plug and play than management might believe.
If you can use VLANs, you can divide the sites into small groups, like:
the first 15 sites use VLAN 101 for voice routing and VLAN 201 for data routing
the second 15 sites use VLAN 102 for voice routing and VLAN 202 for data routing
the last 20 sites use VLAN 103 for voice routing and VLAN 203 for data routing
Two sites should act as hubs, terminating all VLANs on them.
This approach saves resources on routers and provides scalability.
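To make the effect of the grouping concrete, here is a rough count of neighbors per device. It is a sketch only. It assumes the two hubs are in addition to the 15/15/20 sites listed, that each group has one voice and one data VLAN, and that the hubs also peer with each other on every VLAN. None of this is confirmed for this network.

```python
# Neighbor counts for the grouped design (all inputs are assumptions).

def spoke_neighbors(group_size: int, hubs: int, vlans_per_group: int = 2) -> int:
    """One spoke sees the other spokes in its group plus the hubs, per VLAN."""
    return (group_size - 1 + hubs) * vlans_per_group


def hub_neighbors_per_vlan(group_sizes, hubs: int):
    """Per-VLAN neighbor count on a hub: all spokes in the group plus other hubs."""
    return [size + hubs - 1 for size in group_sizes]


def hub_neighbors(group_sizes, hubs: int, vlans_per_group: int = 2) -> int:
    """Total neighbors on one hub across every group VLAN it terminates."""
    return sum(hub_neighbors_per_vlan(group_sizes, hubs)) * vlans_per_group


if __name__ == "__main__":
    groups = [15, 15, 20]
    print(spoke_neighbors(15, hubs=2))             # 32
    print(spoke_neighbors(20, hubs=2))             # 42
    print(hub_neighbors_per_vlan(groups, hubs=2))  # [16, 16, 21]
    print(hub_neighbors(groups, hubs=2))           # 106
```

Under these assumptions a hub still ends up with about 100 neighbors in total, but they are spread over six L3 interfaces with at most 21 on any one subnet, instead of 49 on each of two subnets in the flat design.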
One could even go further by using a different pair of VLANs for each site, especially if direct communication between spokes is not desired.
So you should ask them whether you can use multiple VLANs as in the example above; you can then designate two hub sites and two VLANs per site (I would use one VLAN pair for each 15-site subset).
This is the key point.
The provider's service is probably an MPLS service like VPLS; their signalling plane is totally separate.
Hope to help
Thanks for your responses. I understand that I can split the network into groups to avoid scalability issues related to the number of neighbors. There are, however, negative sides to this as well. All of the sites in the network are connected by circuits to the core. If I group the network then I will have to route traffic between groups via one or more of the key sites. This will adversely affect the circuits to these sites compared to if I could configure it as a flat network. The higher the number of groups the more traffic will have to be routed between groups so there are benefits in having as few groups as possible.
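As a rough way to put numbers on this trade-off, the sketch below counts what fraction of site pairs end up in different groups and so would have to route via a hub. It assumes traffic is spread evenly across all site pairs and it ignores the hubs themselves. Both are simplifications that will not match the real traffic matrix.

```python
from math import comb

# Fraction of site-to-site pairs that must transit a hub when sites are grouped.
# Assumptions: uniform any-to-any traffic, hub sites themselves not counted.

def inter_group_fraction(group_sizes) -> float:
    """Share of all site pairs whose two sites sit in different groups."""
    all_pairs = comb(sum(group_sizes), 2)
    same_group_pairs = sum(comb(size, 2) for size in group_sizes)
    return (all_pairs - same_group_pairs) / all_pairs


if __name__ == "__main__":
    print(inter_group_fraction([50]))          # flat network
    print(inter_group_fraction([25, 25]))      # two groups
    print(inter_group_fraction([15, 15, 20]))  # three groups as suggested
```

With 50 sites in one group nothing transits a hub. With two groups of 25 roughly half of the pairs do, and with the suggested 15/15/20 split roughly two thirds do.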
I expected that I probably couldn't keep the network to a single subnet, hence why I raised the question. Is the 15 devices per subnet a best practice recommendation? Is it specific to the 3750? Could I go higher than that? I would probably want to connect each site to 2 hub sites in order to provide resilience and separate voice and data on both so could I do 15 devices on each of 4 subnets? The hub sites will also need to be interconnected or meshed so these sites will have additional neighbors. Are there any published limits or best practices? If I have to explain to the customer why they can't keep the network flat then it would be useful to have some references to show them.
If I were to use OSPF rather than EIGRP or EIGRP on data and OSPF on voice then what would the limits or recommendations be for a 3750? If I have to route between groups then it will be more difficult to route voice and data over different circuits at the hub sites using a single routing protocol without having to make changes in the hub sites every time a change is made at a remote site (e.g. distribute lists or offset lists). One of the reasons for using a dynamic routing protocol is because they only want to make changes at the site that has a change. If I am routing between groups then it seems to make more sense to me to use 2 different routing protocols for voice and data. Do you agree? Are there any downsides to this (such as the CPU being less efficient when reconverging 2 protocols rather than 1)?
The provider tells me it is a switched core rather than VPLS but I assume the scalability limits are the same. One of the benefits of using different groups is that I can use different STP instances for different VLANs in the core so I may be able to avoid a single link blocking all traffic.