Solved: TFTP connection timed out copying image to 4900M

Terence_O · ‎06-12-2020

Running into an issue when trying to download an updated IOS image file to bootflash: from a TFTP server.

Pings to server from 4900M switch are good

Pings to 4900M switch from server are good

copy startup-config to tftp is successful.

copy tftp bootflash: times out

Example output:

SWITCH#copy tftp: bootflash:
Address or name of remote host []? 10.20.30.12
Source filename []? cat4500e-entservices-mz.152-4.E9.bin
Destination filename [cat4500e-entservices-mz.152-4.E9.bin]?
Accessing tftp://10.20.30.12/cat4500e-entservices-mz.152-4.E9.bin...
ifs_check_file 359 CPU_i86 0
ifs_check_file 361 cpu 183
ifs_check_file 362 cpu_family -1
Loading cat4500e-entservices-mz.152-4.E9.bin from 10.20.30.12 (via Vlan100): !O !OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
OOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO... [timed out]
%Error reading tftp://10.20.30.12/cat4500e-entservices-mz.152-4.E9.bin (Connection timed out)

TFTP server address: 10.20.30.12 (virtual server)

4900 switch address: 10.20.30.2

SVI on the 4900:

interface Vlan100
 mtu 9198
 ip address 10.20.30.2 255.255.255.0
 ip access-group 110 out
 no ip redirects
 no ip unreachables
 standby 110 ip 10.20.30.1
 standby 110 timers msec 200 msec 600
 standby 110 priority 120
 standby 110 preempt delay reload 300
end

ACL 110 is just a tracking ACL and does not filter traffic.

SWITCH#sh ip access-l 110
Extended IP access list 110
    10 permit ip any any

The 4900M is scheduled for replacement in the near future, but upgrades are needed to address CVEs.

Richard Burts · ‎06-12-2020

Thanks for the additional information. Here are some comments about what I notice, what I am thinking, and what I might suggest:

- The switch and the server are in the same subnet (both in VLAN 100), so we do not have to be concerned with any Layer 3 routing issues.

- Connectivity between the switch and the server is good for things that are fairly quick, such as ping or even copying a config file.

- But connectivity seems problematic for things that are big, like copying an image file.

- Typically ping is short and quick. But I wonder whether an extended ping of several thousand packets would show some pattern of drops? And I wonder whether the behavior would change if you ran 2 or 3 of those extended pings at the same time?

- Would I be correct in assuming that there might be more than one Layer 2 path from the switch to the server?

- Could you find the Layer 2 path between the devices? Using the MAC address of the server, do show mac address-table on each switch between the 4900 and the server to find the path from the switch to the server. Then, using the MAC address of the switch, do show mac address-table again to find the path from the server to the switch. Is the path going and coming the same? I wonder whether the path might change while the image file is being copied?

- I wonder if there is any possibility of spanning tree instability that might provoke the problem?

- I wonder whether the logs on the switches between the server and the 4900 contain any messages that might relate to this behavior?

- I remember working with a customer doing code upgrades to a bunch of routers and switches. These were not in the same subnet, and in fact many of them were over a WAN connection that was pretty heavily used. Many times I started an image copy to a remote device, watched it run for a good amount of time, and then saw it time out. I discovered that if I was careful about when I did the image copy (looking for times when traffic on the WAN was lower) I could get the image copies to work better. But then I found a better solution. Instead of using TFTP (which runs over UDP, an unreliable transport), I used something with a reliable transport (such as FTP, SCP, or HTTP), and then I could run the image copies at any time of day and they worked! So ultimately my suggestion is that you might want to use a different protocol to copy the image files.
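
The checks and the protocol change suggested above might look roughly like this on the 4900M. This is only a sketch: the MAC address aaaa.bbbb.cccc and the FTP credentials ftpuser/ftppass are placeholders, and it assumes an FTP service has been enabled on 10.20.30.12 and that the image sits in that user's FTP root. SCP would be an alternative only if the running image includes the SSH/crypto feature set, which this thread does not establish.

```
! Extended ping: several thousand full-size packets to the server
SWITCH#ping 10.20.30.12 repeat 5000 size 1500

! Follow the Layer 2 path hop by hop (placeholder MAC address)
SWITCH#show mac address-table address aaaa.bbbb.cccc

! Look for recent topology changes in VLAN 100 and related log messages
SWITCH#show spanning-tree vlan 100 detail
SWITCH#show logging

! Retry the copy over TCP with FTP (hypothetical credentials)
SWITCH#configure terminal
SWITCH(config)#ip ftp username ftpuser
SWITCH(config)#ip ftp password ftppass
SWITCH(config)#end
SWITCH#copy ftp://10.20.30.12/cat4500e-entservices-mz.152-4.E9.bin bootflash:
```

Running the show mac address-table lookup on each switch in the path, once with the server's MAC and once with the 4900's MAC, shows whether the forward and return paths match.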

HTH

Rick


Richard Burts · ‎06-12-2020

The output you provided does give us a clue about the issue. As the copy starts you get !O and then a long run of O characters. The ! indicates a successful transfer; each O indicates a packet received out of order. We do not know anything about the topology here, but it looks to me like there are some issues on the connection from the 4900 to the server.
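
As a reading aid, the progress characters in the posted output can be annotated as below. The legend reflects the commonly documented IOS meanings rather than anything captured from this switch, so treat it as a rough guide.

```
!   a group of packets (typically ten) received successfully
O   a packet received out of order
.   a timeout while waiting for the next packet

Posted output:  !O !OOOOOOOO ... OOOO... [timed out]
Reading:        two successful groups, then almost nothing but
                out-of-order packets, then three timeouts and abort
```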

HTH

Rick

Terence_O · ‎06-12-2020

The architecture looks like this, top down: 4900s as the core of a small datacenter, 9Ks as distribution, and Dell switches at the access layer:

4900sw1 4900sw2

9300sw1 9300sw2

access sw1 access sw2 .....access swN

Dell/EMC Virtual environment

Richard Burts · ‎06-12-2020