Nexus N5k/N2k Replacement suggestions -> Nexus 9300

BastiiGee
Level 1

Hello all,

In the near future I will be tasked with renewing the network hardware in our data center. So far we have two N5Ks in use, each with four N2Ks as fabric extenders. The two N5Ks are configured as vPC peers. Virtual machines and servers are connected via vPC to two different fabric extenders each, in order to be failsafe and to be able to perform ISSU updates without interruption.

 

We plan to replace the N5Ks with slightly "larger" Nexus 9300 models and the N2Ks with slightly "smaller" Nexus 9300 models acting as fabric extenders.

 

Now there are two possibilities from my point of view:

 

1. Rebuild the current design (FEX mode)

 

Eight N9K 9300s run in FEX mode. The two larger 9300s form a vPC domain, and each of the two gets four fabric extenders assigned.
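For reference, a minimal sketch of the parent-side config for this option could look like the following. All interface, FEX, and port-channel numbers are made up, and note that FEX parent support on the 9300 depends on the exact model and NX-OS release:

feature fex
feature vpc
feature lacp

! Parent ports down to one fabric extender (numbers are examples)
interface ethernet1/47-48
  switchport mode fex-fabric
  fex associate 101
  channel-group 101

interface port-channel101
  switchport mode fex-fabric
  fex associate 101

! Host port on that FEX; the server's second NIC lands on a FEX behind the
! other parent, and both parents use the same vPC number for the host vPC
interface ethernet101/1/1
  switchport mode trunk
  channel-group 10 mode active

interface port-channel10
  switchport mode trunk
  vpc 10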

 

2. New topology: collapsed core (all standalone, with back-to-back vPC)

 

The two slightly larger 9300 models form the core layer (N9K_1 + N9K_2 = vPC domain 1).

The access layer consists of 9300 pairs (N9K_3/4, N9K_5/6, N9K_7/8, N9K_9/10 = vPC domains 2, 3, 4, 5). These four pairs are then connected to the core layer with vPC.

So every access pair has one uplink to N9K_1 and one uplink to N9K_2.
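As a rough sketch of what the back-to-back vPC between the core pair and one access pair could look like (not a full config; all addresses, domain IDs, and port-channel/vPC numbers are made up, and member interfaces are omitted):

! --- Core side, N9K_1 and N9K_2 (vPC domain 1) ---
feature vpc
feature lacp

vpc domain 1
  peer-keepalive destination 192.168.0.2 source 192.168.0.1

interface port-channel1
  switchport mode trunk
  vpc peer-link

! Downlink towards access pair N9K_3/N9K_4
interface port-channel23
  switchport mode trunk
  vpc 23

! --- Access side, N9K_3 and N9K_4 (vPC domain 2) ---
vpc domain 2
  peer-keepalive destination 192.168.0.4 source 192.168.0.3

interface port-channel1
  switchport mode trunk
  vpc peer-link

! Uplink: one member link to N9K_1, one to N9K_2, bundled as a single vPC
interface port-channel100
  switchport mode trunk
  vpc 100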

 

Please also see the attachments for both options.

 

I realize that data center best practices are heading in a different direction (VXLAN or ACI).
However, I think the traditional Layer 2 design meets our needs and requirements.

 

So maybe you could tell me which of the two options above you think makes more sense.
Or how would you carry out the replacement? Maybe there are other possibilities that I have not considered. 

Also, maybe someone has official Cisco links and guides on these topics. Unfortunately, I can find almost nothing on this.

Thanks a lot,
basti

5 Replies

f00z
Level 1

For this type of design with a small number of devices, it would be easier to go with the Catalyst 9k line, i.e. a 9400 chassis or stack, a 9600 chassis, or stacked 9300s, and plug everything into it. To go Nexus you could go to the 6k, but to be honest I've had so many weird quirks with FEX+vPC (I had a few 5k deployments) that I don't want to use it again. FEX+vPC has been really buggy for me when constantly changing configs in an MSP environment. I suppose it's fine if it's set up once and the configs don't change much.

For the Nexus 9300, back-to-back vPC is OK as well: 2 N9Ks in vPC, then connect 2 more N9Ks in vPC to those, but I wouldn't do more than 3 switches in a row. Example: 2 N9K 'core', vPC to 2 N9K aggregation, vPC to 2 N9K access. I try to avoid even doing 3 in a row like that if possible, but it's not avoidable in these scenarios. A better method would be to use a Catalyst 9k stack or chassis. It depends on feature requirements and budget, neither of which you specified in the question. If you don't want to include it here, send me a PM.

 

 

Hi and thanks for your answer. 
In our current deployment we don’t have any buggy behavior with FEXs.

Why wouldn't you do more than two in a row? My expectation is that a collapsed core and 4 access pairs won't cause any trouble.

I also thought about the Cat9k series, but I think the high availability and, above all, the non-disruptive updates are only possible with Nexus.

Yes, two Nexus 9ks in vPC, with access switches plugged into them via vPC, works fine. I have set this up before. There's no reason to go outside a standard L2 setup unless there's some requirement for it. Everyone is pushing EVPN now, and any new Nexus 9k you buy today will support EVPN down the road, so it has future-proofing. Same with the Cat9kX (the X versions).

Your N9K standalone design is exactly what I'd do, and have been doing in small cases like this, and it works perfectly. Just make sure each pair has a separate vPC domain ID, since the vPC system MAC is generated from it and things won't sync up properly unless every vPC domain ID is different for each pair of switches. I wouldn't use FEX with N9K.
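To illustrate that point (domain IDs and peer-keepalive addresses below are just examples), each pair keeps its own domain number, and the derived system MAC can be checked per pair:

! On the core pair N9K_1/N9K_2
vpc domain 1
  peer-keepalive destination 192.168.0.2 source 192.168.0.1

! On the access pair N9K_3/N9K_4 -- a different domain ID, so a different
! auto-generated vPC system MAC
vpc domain 2
  peer-keepalive destination 192.168.0.4 source 192.168.0.3

! Verify the system MAC derived from the domain ID on each pair
show vpc role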

Been doing networking for 30 years, and I've had some weird experiences with more than 3 switches in a row in a 'daisy chain'. I tested it out, daisy-chained 16 switches together, and really got some strange results. Ethernet isn't smart. This is partly why the EVPN push (Layer 2 only at the access layer and Layer 3 everywhere else) gives some intelligence to the network, plus the obvious reasons (VLAN scale, services, etc.).

 

 

So also on the collapsed core/aggregation you wouldn't go Layer 3 with first-hop redundancy? Just plain Layer 2 connections between the access layer and the core?

 

Well, it completely depends on your application. The safest and best method would be to have a Layer 3 port for every server, with an IP address on each port, no Layer 2 at all, and do routing. But this doesn't work for some designs, i.e. where you have a VM that has to move from one port to another, or where servers use VRRP/keepalived/CARP or some other redundancy method. The best way around that would be to run BGP to every server and make every server a router that advertises the /32 for the VM that moves around, but again, if it's not supported by the software, then it's too much work to make that operate correctly.
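Just to make that idea concrete, a minimal sketch of the switch side of such a per-server BGP session might look like this; the addresses and AS numbers are made up, and it assumes the server runs its own routing daemon (e.g. FRR) announcing the VM's /32:

feature bgp

! Routed port towards one server (addressing is an example)
interface ethernet1/10
  no switchport
  ip address 10.10.10.1/30
  no shutdown

router bgp 65000
  address-family ipv4 unicast
  neighbor 10.10.10.2
    remote-as 65101
    address-family ipv4 unicast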

So a second design would be to have an appropriately sized prefix on each access switch, like a /24; servers on that switch share the /24, and VMs can move around to any port on that switch, but not off of that switch to another switch unless they change IP. Redundancy is provided by 2x access switches in vPC with active/active HSRP.

So the most simplistic design is a converged core where the 2 core devices do the routing, hold the VLAN SVI/IRB interfaces, and run an FHRP, HSRP in Cisco's case, together with vPC. So active/active HSRP with vPC. Then all the access is Layer 2, and a VM can move anywhere, on any port, on any switch. The only Layer 3 is on the 2 core devices.
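A minimal sketch of one such core SVI (VLAN, addresses, group number, and priority are made up; N9K_2 would mirror this with its own SVI address, and the same pattern would apply per access pair in the /24-per-switch design above):

feature interface-vlan
feature hsrp

vlan 10
  name servers

! SVI on N9K_1; with vPC both peers forward, so HSRP behaves active/active
interface vlan10
  no shutdown
  ip address 10.0.10.2/24
  hsrp 10
    ip 10.0.10.1
    priority 110
    preempt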

So the bottom line is, it depends on the application. Either way works. I personally like BGP to every server, but that requires some type of cloud orchestration software to update BGP when a VM or container moves and wants to take its IP with it. The converged core, with only the 2 cores doing Layer 3 via their IRB interfaces plus HSRP, also works great; the only real limitation there is the scale of those 2 devices.

This is a pretty standard design I did many years ago with 2x 6509 for the core and everything else Layer 2. The 6509s had a lot of scale for adjacency entries and routes. It seems this method of doing things is going out of style; new switches do not have the same scale as even the 6509 does (i.e. most switches have a 48k-64k maximum adjacency [egress rewrite DB, meaning ARP entries, ND entries, MPLS, etc.]). A Nexus switch has fairly low scale for this, because it's designed to be a distributed platform (EVPN with adjacency entries on the rack/leaf switches, so it's distributed) but with high route scale, so it can hold the route table for everything. The Catalyst 9600X, though, has a lot more adjacency and MAC scale, plus tunable values for these things, because it's designed to be a converged-core device for campus, which is fine to use in a data center as well.

But like I said, for a really small setup none of this matters: you probably will never hit the 48k adjacency limit and probably won't need more than 4000 VLANs either, so there's no point in building a really complex setup if it won't get any larger. The maximum scale is reached by going full L3 everywhere with BGP to the servers (then the adjacency entries live on each rack switch's ports) or by using EVPN. But if you can fit your infrastructure into smaller-scale devices, there's no reason to spend extra money and development time trying to use another method. You can PM me if more info is needed or for specific design requirements; otherwise I could go on forever with pages.

 
