That's very interesting - actually, yesterday we put a VM PG on vSwitch0 too, without vmk obviously, and failover worked without a problem.
This problem definitely seems to be isolated to MGMT traffic, and perhaps only when a vmk is attached (which, for MGMT, is likely to be every time).
Do you have any additional info about what you have seen?
We tried some of your suggestions and set up a SPAN to watch the ARPs, etc.
What we found when we executed a ping from another subnet to the ESXi host:
1) When the A and B side were both up, an ARP request came from ESXi to the N5k on the A side (vmnic0).
2) The reply came from the N5k down the B side (vmnic1).
3) The ARP info was simply ignored, as if it were an unsolicited reply.
vSwitch0 is configured as active(vmnic0, vmnic1), load-balance(iphash), notify(false), failback(true).
I'm proceeding with a box swap now - an HP server with SFP - and will try the same VPC with this.
We have used all of the failover policies.... portid, iphash, mac, explicit.. all the same behaviour.
I will have to go back to our network team... a hard sell though, as this N5k has been serving our UCS blade infrastructure (which runs our main vSphere environment) with VPC for a good 5 years now, without problems.
Can you recommend any debugging guides for this sort of problem? I do have access to the N5Ks with a lower-priv role (san admin).
Could it also be a problem on the upstream N7K vpc? On the N5k itself, the vpc never enters a failed state - it's always up/success/success
Thanks for your reply.
You make good points, especially about understanding the path of the packets. I just wanted to add a couple of points which I maybe did not make clear:
1) This is a standalone C-series. It is not managed by UCSM.
2) There are no VMs running - we simply want to get a stable mgmt network out of it. We cannot even get this far.
3) This is reproducible on another C220 M4S - the firmware versions are 4.1(2d).
4) We currently have a large UCS Blade infrastructure (with Gen2 FIs, 6296) running on the same N5k pair, with VPC, and no problems with the main vSphere environment. (failover works, etc..).
The only difference here is that we are using these C-series "direct" to the N5k. I am wondering whether there is some sort of limitation when using VIC interfaces direct to an N5k (i.e., it's not VM-FEX - it's just "vanilla" VIC cards carved into multiple interfaces).
We are facing what seems to be a very weird problem. We have a TAC case and a separate VMware case, but up until this point there has been no idea as to what is causing it.
Our system will only come online when one side of the VPC is shut down. This brings the link up. At this point, bringing the downed side of the VPC back up again does keep the box externally reachable, but we are no longer able to ping the default gateway from the box itself. Also, external packets into the box show a ~60% success rate, as if we are occasionally going down a "bad" path, or as if it was some sort of layer-2-based asymmetric routing.
vSphere does not fail over correctly. All links are shown as up, the cables have been tested, as have the SFPs.
The MAC is seen in both N5k switches - the links are never suspended, compatibility checks all pass. The config looks like it should "just work".
This is a new install, and has never been functional - the vSphere host itself is empty, and not even in vCenter yet.
2x C220 M4S w/ VIC1225 - 2 uplinks plugged directly into N5k (A and B N5k) with SFPs - cisco-branded.
Standalone mode - no UCSM
Firmware 3.0(1c) - latest Cisco "starred" release
vSwitch0 [ eth0 ( uplink0 ( vmnic0 ) ), eth1 ( uplink1 ( vmnic1 ) ) ] == Management - vmk0 - 192.168.10.10
vSwitch1 [ eth2 ( uplink0 ( vmnic2 ) ), eth3 ( uplink1 ( vmnic3 ) ) ] == vMotion - vmk1 - x.x.x.x
vSwitch2 [ eth4 ( uplink0 ( vmnic4 ) ), eth5 ( uplink1 ( vmnic5 ) ) ] == NFS - vmk2 - x.x.x.x
vSwitch3 [ eth6 ( uplink0 ( vmnic6 ) ), eth7 ( uplink1 ( vmnic7 ) ) ] == VM Portgroup
The VIC defaults are used. We don't do anything esoteric. Each VIC is in TRUNK mode, with the untagging being done at the vSwitch level.
N5k (both A and B switches)
interface port-channel200
  description ** esxi-test-1 **
  switchport mode trunk
  switchport trunk allowed vlan 111-115,500,476
  vpc 200
As you can see, the port channel/vpc config is bog-standard.
[root@esxi-test-1:~] esxcfg-vswitch -l
Switch Name      Num Ports   Used Ports  Configured Ports  MTU     Uplinks
vSwitch0         7802        6           128               1500    vmnic0,vmnic1

  PortGroup Name        VLAN ID  Used Ports  Uplinks
  Management Network    111      1           vmnic0,vmnic1

[root@esxi-test-1:~] esxcli network vswitch standard policy failover get -v vSwitch0
   Load Balancing: iphash
   Network Failure Detection: link
   Notify Switches: false
   Failback: false
   Active Adapters: vmnic0, vmnic1
   Standby Adapters:
   Unused Adapters:
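When comparing the teaming policy across several test hosts or vSwitches, it can help to turn the failover policy output above into something scriptable. The following is only a rough Python sketch: it assumes the command prints one "Key: value" pair per line, as in the output shown here, and the names parse_failover_policy, adapters and SAMPLE are made up for illustration. They are not part of any VMware tooling.

```python
# Sketch only: parse the "Key: value" text printed by
#   esxcli network vswitch standard policy failover get -v vSwitch0
# Assumption: one pair per line, as in the output quoted in this post.

SAMPLE = """\
[root@esxi-test-1:~] esxcli network vswitch standard policy failover get -v vSwitch0
   Load Balancing: iphash
   Network Failure Detection: link
   Notify Switches: false
   Failback: false
   Active Adapters: vmnic0, vmnic1
   Standby Adapters:
   Unused Adapters:
"""


def parse_failover_policy(text: str) -> dict[str, str]:
    """Turn 'Key: value' lines into a dict, skipping the shell prompt line."""
    policy: dict[str, str] = {}
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("["):
            continue  # blank line, or the "[root@host:~] ..." prompt line
        key, sep, value = stripped.partition(":")
        if sep:
            policy[key.strip()] = value.strip()
    return policy


def adapters(policy: dict[str, str], field: str) -> list[str]:
    """Split a comma-separated adapter field into a list (empty if unset)."""
    return [name.strip() for name in policy.get(field, "").split(",") if name.strip()]


if __name__ == "__main__":
    policy = parse_failover_policy(SAMPLE)
    print("load balancing:", policy.get("Load Balancing"))
    print("active uplinks:", adapters(policy, "Active Adapters"))
    print("standby uplinks:", adapters(policy, "Standby Adapters"))
```

Saving the output of each host to a file and running it through the same parser makes it easy to diff the policies side by side.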
Our problem is focused on management access - we haven't even gotten the box into vCenter yet (well, not stable anyway) - so for the scope of this problem, we focus on vSwitch0.
We are using vpc port channels and not LACP. We have tried to use every failover policy / active/active/standby combination going... we've tried it all.
We can reproduce this problem on another C220 M4S with VIC1225, and even on vSphere 5.5. We therefore think it's either a bug in the VIC firmware, or a problem with our approach / understanding of the architecture.
Any advice appreciated
Hello everyone,

Please find attached a basic script to extract the kickstart, system and UCSM firmware from the UCS infra bundle (and others). I am not sure as to the exact legality of this, so admin, if you object to this material, feel free to remove it. This is hardly ground-breaking stuff though, and no encryption is used. Maybe there's already a well-known way to extract this - let me know if so!

Background

The background here is that Cisco bundles the relevant firmware objects in a large blob. This is not helpful when you need to boot off an alternate kickstart (for example, from tftp during a failed FI upgrade) and/or if your system image is corrupt and you want to copy scp: bootflash:, etc. During such failed upgrades, we've had to rely on TAC providing them to us, which is not entirely uncomfortable, but does take a bit of time. If you're downloading firmware for a Nexus device, you conveniently have access to the individual kickstart/system from the outset.

Cisco's blob format

Cisco's ".bin" files begin with a small header, which describes a few things about the bin package, such as the size of the bundled package, the type of hardware platform it's for, etc. Here is some typical output from the 'show' operation of a certain UCS system command, which is available when accessing the system via the debug plugin... however, I won't mention any names. (Incidentally, this command and a helper wrapper script are what perform the exact thing my script does... but of course they do it better, and provide more functionality.)
**********************************************
HEADER CONTENTS
**********************************************
Header version: 1.0
Len: 800 byte
Image length:488933830 byte
Magic number: 21326
Platform type: 7
Verification type: 1
Software family: 2
Image type: 11
Debug attribute: 2
Hardware type: 0
Compression type: 2
Run time location: 1
Packaged by: 0
Memsize: 256
Timestamp: 1482316264
Version string: 3.1(2e)B
Interim version string: 3.1(2e)B
Image full name: ucs-k9-bundle-b-series.3.1.2e.B.bin
Features:
Build ID: S0
**********************************************
Cisco NX-OS(tm) ucs, Software (ucs-k9-bundle-b-series), Version 3.1(2e)B, RELEASE SOFTWARE
Copyright (c) 2002-2013 by Cisco Systems, Inc.
-------------------------------

So, Cisco bin files begin with this header, and straight after consist of (usually) an inline tarred-gzip archive. Depending on the bin file, there may be one or more archives, as well as a NetBoot Linux image (in the case of kickstart) which is loopback-mountable, and cpio archives (in the case of the IOM/fex/chassis image).

So far I've only implemented basic tar/gzip extraction of the first archive, which is what we actually need. The rest can be done via xxd, searching for the magic numbers of certain archives and dd'ing the image out... and is left as an exercise for the curious. The script can also be applied to most sub-bin files which arise from the extraction of the main bin file (infra bundle, then system, then plugins, etc.), as almost all contain the same Cisco header + tgz format.

Extraction

Trivial use of the script:

./extractbins.sh: [file.bin] [directory_to_extract_to]

Actual use:

./extractbins.sh ucs-6300-k9-bundle-infra.3.1.2e.A.bin extracted
cisco image extractor 1.1 - dsw(c),2017
[.] cisco image of 756 bytes found in ucs-6300-k9-bundle-infra.3.1.2e.A.bin
[.] seeking past header length 756....
[.] wrote header-less image ucs-6300-k9-bundle-infra.3.1.2e.A.bin.nohdr
[.] gzip found; decompressing..
[.] tar found; untarring..
./
./isan/
./isan/etc/
./isan/etc/climib/
./isan/etc/imghdr.bin
./isan/plugin_img/
./isan/plugin_img/ucs-6300-k9-system.5.0.3.N2.3.12e.bin
./isan/plugin_img/ucs-manager-k9.3.1.2e.bin
./isan/plugin_img/ucs-2300-6300.3.1.2e.bin
./isan/plugin_img/ucs-6300-k9-kickstart.5.0.3.N2.3.12e.bin
./isan/plugin_img/ucs-2200-6300.3.1.2e.bin
[.] cleaning up..
[.] done

Much can be improved and added, but the basic functionality is there. Bugs: almost certainly. Please heed the disclaimer in the script.

Cheers
dan
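For anyone who wants to try the "search for the magic number" route mentioned above without the attached script, here is a rough Python sketch of the idea. It is not the attached extractbins.sh and it does not parse the Cisco header at all: it simply scans for the first gzip signature (bytes 1f 8b, followed by 08 for deflate), inflates that one gzip member and untars the result. The function names are made up for illustration, and the whole file is read into memory, so this is only meant to show the technique.

```python
# Sketch only: locate the first gzip stream inside a .bin blob and untar it.
# Assumption: the first archive is a gzip-compressed tar, as described above.
import io
import tarfile
import zlib
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method byte


def find_gzip_offset(blob: bytes, start: int = 0) -> int:
    """Return the offset of the first plausible gzip stream, or -1 if none."""
    pos = blob.find(GZIP_MAGIC, start)
    while pos != -1:
        probe = zlib.decompressobj(wbits=31)  # wbits=31 means gzip framing
        try:
            # A stray 1f 8b 08 inside the header will usually fail to inflate.
            if probe.decompress(blob[pos:pos + 4096]):
                return pos
        except zlib.error:
            pass
        pos = blob.find(GZIP_MAGIC, pos + 1)
    return -1


def extract_first_archive(bin_path: str, dest_dir: str) -> list[str]:
    """Skip whatever precedes the first gzip stream, then untar into dest_dir."""
    blob = Path(bin_path).read_bytes()  # whole image in memory: fine for a sketch
    offset = find_gzip_offset(blob)
    if offset < 0:
        raise ValueError(f"no gzip stream found in {bin_path}")
    # decompressobj stops at the end of the first gzip member, so any
    # archives or images that follow it in the blob are ignored here.
    tar_bytes = zlib.decompressobj(wbits=31).decompress(blob[offset:])
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tar:
        names = tar.getnames()
        # Only extract images you trust; on recent Python versions, passing
        # filter="data" to extractall is the safer choice.
        tar.extractall(dest_dir)
    return names
```

Calling extract_first_archive("ucs-6300-k9-bundle-infra.3.1.2e.A.bin", "extracted") would be the rough equivalent of the run shown above. The same function could then be pointed at the sub-bin files that come out, since they reportedly share the header + tgz layout.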
Hi Rafael, Thanks for your reply. Are you able to cite that in official Cisco docs by any chance? I'd like to take this to our account manager and see if we can't find out whether or not there is a roadmap of platform support. It's curious to say the least that the n5k support would be suddenly "dropped" (don't want to jump the gun here, though). Cheers dan
Just wanted to add the following:

"One Platform Kit (OnePK)
Support has been added for One Platform Kit (onePK) Turbo API. OnePK is a cross-platform API and software development kit that enables you to develop applications that interact directly with Cisco networking devices. onePK provides you access to networking services by using a set of controlled APIs that share the same programming model and style. For more information, see the following URL:"
(Cisco Nexus 5500 Series Release Notes, Cisco NX-OS Release 7.x - Cisco)

If this is really so, why is the official Compatibility Matrix not updated to reflect this fact?

Thanks!
dan
Hi all,

The Nexus 5000 product page suggests that onePK is available for the platform. However, I can find no evidence of this in the Compat Matrix. Is there currently a rough ETA for when support is expected?

Many thanks
dan

http://www.cisco.com/c/en/us/products/switches/nexus-5000-series-switches/index.html