I now have a workaround for this issue. I simply replaced the NAT pool in the "ip nat inside" command with the outside interface name. Specifically:
OLD non-working commands:
ip nat pool BDI11_NAT_POOL 220.127.116.11 18.104.22.168 netmask 255.255.255.0
ip nat inside source list BDI11_NAT_LIST pool BDI11_NAT_POOL overload
NEW working command:
ip nat inside source list BDI11_NAT_LIST interface BDI11 overload
That is cleaner anyway, since the BDI 11 interface address was the only entry in the old pool.
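For context, the surrounding configuration for interface overload looks roughly like this. This is a hypothetical sketch, not my actual running config: the ACL contents and addressing here are assumed, and only the final overload command is the real change described above.

```
! Sketch only - ACL contents and addressing are placeholders
interface BDI11
 ip nat outside
!
ip access-list extended BDI11_NAT_LIST
 permit ip 192.168.120.0 0.0.0.255 any
!
! PAT on the BDI11 interface address replaces the single-address pool
ip nat inside source list BDI11_NAT_LIST interface BDI11 overload
```

With overload on the interface, the NAT global address tracks whatever address is assigned to BDI11, so there is no pool to keep in sync if the interface is readdressed.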
Apparently, NAT was working for all outbound traffic (ICMP, UDP, and TCP SYNs from the internal client). But for two-way TCP traffic, when the outside server's SYN/ACK reply reached the router's outside interface (BDI 11), the router failed to recognize that reply as part of an active NAT session originated by an inside client. Since the router itself didn't initiate the three-way handshake, it sent a RST in response to the server's SYN/ACK.
Thanks again to Georg, and to anyone else who took the time to look this over and chew on it.
Thanks for taking a look. ZONE_INTERNAL is VLAN3, where the web client resides. A separate route map governs NAT for VLAN3 traffic to all other WAN destinations (i.e. another NAT outside interface).
The rest of the config is sensitive, and relatively lengthy. So I'm not at liberty to post the whole thing, and it would take a long time to sanitize it enough to make it releasable. In any case, the rest of the router's NAT and ZBFW configuration, including bridge domain 11, has been in use for some time without issues. The only thing new is that we assigned a new IP address to the existing BDI 11, added a second NAT route map, added a new firewall zone (BDI11), added zone pairs between BDI11 & INT, and added ACLs for the route map and zone pairs.
NAT support was only recently added to IOS-XE for BDIs. So it is tempting to conclude that this is a bug, especially given the fact that pings work and the firewall isn't logging drops. But I have learned not to be too quick to assume that IOS is at fault for seemingly strange behavior.
I'm looking for ideas on where my TCP handshakes might be going in this lab topology:
[Web client] --> [3650 switch & VLAN11] --> [4451 NAT router/dot1q 11 bridge domain/BDI 11] --> [Web server]
The 4451-x is running IOS XE Version 16.06.05 (IP Base).
The web client connects to a VLAN11 access port on the 3650 switch. The switch is trunked to the router via PortChannel1. The Po1.11 subinterface is assigned to Bridge Domain 11 at the router, as is Gig0/1/0 (a NIM-2GE-CU-SFP). And the web server is connected to Gig0/1/0. (My apologies for not attaching a nice diagram. I'm swamped, and this problem is not helping.)
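In lieu of a diagram, the bridge-domain plumbing on the 4451 looks roughly like the following. This is a sketch from memory; the service-instance numbers, rewrite behavior, and BDI address are placeholders rather than the exact lab values:

```
! Sketch of the bridge-domain setup (values are placeholders)
interface Port-channel1
 service instance 11 ethernet
  encapsulation dot1q 11
  rewrite ingress tag pop 1 symmetric
  bridge-domain 11
!
interface GigabitEthernet0/1/0
 service instance 11 ethernet
  encapsulation untagged
  bridge-domain 11
!
interface BDI11
 ip address 10.0.0.32 255.255.255.0   ! placeholder address
 ip nat outside
```

Both service instances land in bridge domain 11, and BDI 11 is the routed (and NAT outside) interface for that domain.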
Pings & tracert from web client to server look good:
C:\Documents and Settings\Mark.OBS>ping 22.214.171.124
Pinging 126.96.36.199 with 32 bytes of data:
Reply from 188.8.131.52: bytes=32 time=4ms TTL=127
Reply from 184.108.40.206: bytes=32 time<1ms TTL=127
Reply from 220.127.116.11: bytes=32 time<1ms TTL=127
Reply from 18.104.22.168: bytes=32 time=1ms TTL=127
Ping statistics for 22.214.171.124:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 0ms, Maximum = 4ms, Average = 1ms
C:\Documents and Settings\Mark.OBS>tracert 126.96.36.199
Tracing route to 188.8.131.52 over a maximum of 30 hops
  1    <1 ms    <1 ms    <1 ms  192.168.120.2
  2    <1 ms    <1 ms    <1 ms  184.108.40.206
C:\Documents and Settings\Mark.OBS>
Wireshark sessions on client and server show TCP SYN packet reaching server, and server sending SYN/ACK back towards client (via BDI 11's MAC and NAT global IP). Embedded packet capture on router also shows router receiving SYN/ACK from server at BDI address, but the router immediately (same timestamp) responds with a TCP RST. Server SYN/ACK replies never make it back to the client. UDP from client to server also works.
Here is a dump from the NAT translation table on the router, showing both the ping and TCP translations, as expected:
router01#sh ip nat trans
Pro  Inside global         Inside local           Outside local         Outside global
---  10.0.0.35             172.18.0.17            ---                   ---
tcp  10.0.0.32:50001       192.168.120.11:50001   ---                   ---
udp  10.0.0.32:1581        192.168.120.11:1581    ---                   ---
tcp  10.0.0.32:1581        192.168.120.11:1581    ---                   ---
tcp  10.0.0.32:80          192.168.120.4:80       ---                   ---
udp  10.0.0.32:53          192.168.120.4:53       ---                   ---
tcp  220.127.116.11:515    192.168.120.12:8       18.104.22.168         22.214.171.124
tcp  126.96.36.199:1385    192.168.120.12:1385    188.8.131.52:80       184.108.40.206:80
tcp  220.127.116.11:1387   192.168.120.12:1387    18.104.22.168:80      22.214.171.124:80
icmp 126.96.36.199:1280    192.168.120.12:1280    188.8.131.52:1280     184.108.40.206:1280
tcp  220.127.116.11:1386   192.168.120.12:1386    18.104.22.168:80      22.214.171.124:80
Total number of translations: 11
Relevant router configuration extracts are attached, along with a PCAP file showing successful ICMP and failed TCP as captured at the physical router interface by EPC (i.e. where server packets ingress on the way to the NAT outside BDI).
The zone-based firewall is not logging any denies, and I have opened up the relevant ACLs to minimize that risk.
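For anyone retracing this, these are the kinds of commands I used to rule out the firewall and confirm the NAT state (the zone-pair name here is a guess; substitute your own):

```
! Verify the inspect policy is building sessions for the client's TCP flows
show policy-map type inspect zone-pair INT-TO-BDI11 sessions
! Check for firewall drop messages in the log
show logging | include %FW
! Confirm the TCP translation exists while the handshake is attempted
show ip nat translations
```

In my case the translations were present and no drops were logged, which is what pointed suspicion at NAT's handling of the returning SYN/ACK rather than at the ZBFW policy.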
Does anyone have a guess at what might be going wrong here?
Michel: Thanks for the response. Actually, I understand what kind of routing workarounds could allow NTP to function in spite of this "best practice." But I am mystified as to why a Cisco "NTP best practice" paper (http://www.cisco.com/en/US/tech/tk869/tk769/technologies_white_paper09186a0080117070.shtml) and various security policies would call for setting a loopback address as the NTP source when that practice will often cause more problems than it solves.

The stability of a loopback address is nice when that address is used to uniquely identify the platform for a routing protocol or syslog. A loopback-based source address can also simplify ACL management, since that address won't change if an interface or link failure forces the router to send traffic from a different interface. But I keep seeing security configuration guides/policies that call for also using a loopback address as the source for two-way protocols, such as FTP and NTP. That just doesn't make sense to me when you balance the routing implications against the limited security benefits (stable device identification, simplified ACL maintenance, and obfuscation of device addresses).

I was hoping to learn that some obscure command might allow me to control which NTP exchanges use the loopback-based source address. For example, the loopback source address would work fine on outgoing NTP broadcasts (and probably in replies from NTP servers). But I would prefer that NTP client requests use a source address based on the exit interface. That way replies can be routed back to the client without cluttering up routing tables with routes to loopback addresses.

So far, it looks like I'll need to chalk this up to poor coordination between the network security and network administration communities.

Thanks again, Mark
The Cisco NTP Best Practices White Paper and DISA STIGs recommend setting the NTP source address to a loopback interface (e.g. "ntp source loopback0"). But this only seems to work if the requesting (NTP client) router is the default gateway for the NTP server.

Specifically, the NTP server will attempt to reply to the requesting router's loopback-based source address (taken from the NTP request packet). Since that address will always be non-local from the perspective of the NTP server, the NTP server will encapsulate the reply in a Layer 2 frame addressed to its default gateway. If the gateway was the source of the original NTP request, that should work. But in most other situations that gateway won't know how to reach a loopback-based address, and will discard the reply.

I have verified this in tests with routers running both 12.4 and 15.1 releases (and NTP debugging enabled). When the NTP source is a loopback address, NTP replies never reach the requesting router. With the default NTP source address (i.e. based on the exit interface) everything works fine.

Obviously, you could employ workarounds, such as static routes or injecting loopback addresses into your routing protocols. But that seems uglier than leaving NTP source addresses at their defaults. Why is this "best practice" so commonly advocated without mention of some significant caveats regarding routing? Am I missing something?

Thanks, Mark
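To make the scenario concrete, here is a minimal sketch of the recommended configuration and the static-route workaround. All addresses are examples, not from any real device:

```
! On the NTP client router, per the white paper / STIG guidance:
interface Loopback0
 ip address 10.255.0.1 255.255.255.255
!
ntp source Loopback0
ntp server 192.0.2.10
!
! Workaround, configured on the NTP server's default gateway (a
! different box), so replies to the loopback address can be routed
! back toward the client router:
ip route 10.255.0.1 255.255.255.255 192.0.2.1
! (192.0.2.1 = example next hop toward the client router)
```

Without that extra route (or the loopback being advertised by a routing protocol), the gateway has nowhere to send the server's reply to 10.255.0.1 and drops it, which is exactly the failure I observed in testing.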
Hello. Bottom line is I'd like to speed up multicast convergence after topology changes in the network pictured below. Here is an overview of the situation:

- Two VLANs/subnets, with a pair of 3560 multilayer switches configured as IP (and multicast) routers between VLANs.
- HSRP used to provide fault tolerance for the gateway role on both VLANs.
- Rapid-PVST spanning tree protocol on all VLANs; global portfast on all access ports; all VLANs permitted on all trunks.
- Redundant trunks, as depicted in the diagram.
- PIM sparse-dense mode enabled on the VLAN2 and VLAN3 SVIs of the Distrib and Access switches.
- Multicast servers in VLAN2 and clients in VLAN3 can (and do) serve as sources for multicast streams.
- If members of a multicast group fail to see a heartbeat from a server for more than about 5 seconds, the streams drop for all members.
- Currently, IP unicast traffic continues with only minimal (~1 second) interruption when any single switch or trunk fails.
- With the current setup, multicast streams drop for roughly 10 seconds when Distrib2, or the trunk between Distrib2 and Access2, fails. No other failure scenario seems to have this effect.
- Note that Access2 is, by default, the PIM DR (apparently because it has the highest IP address of the switches running PIM).

Again, I'd like to speed up multicast re-convergence in all single switch/link failure scenarios, but I'm unclear at this point just how I might do that. Options I'm currently considering include:

- Shorten the IGMP query interval?
- Shorten the ip pim query-interval on both VLAN SVIs?
- Force the PIM DR role to one of the Distrib switches?

I'd appreciate any thoughts you multicast experts might have. I don't have much time and lab access is scarce. Thanks, Mark
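For reference, the options I listed would translate into something like this on a Distrib switch's SVIs. The timer values are illustrative guesses, not tested recommendations:

```
! Candidate tuning on Distrib1 (values are illustrative only)
interface Vlan2
 ip pim dr-priority 10       ! highest priority wins the DR election; default is 1
 ip pim query-interval 10    ! PIM hello interval; default is 30 seconds
 ip igmp query-interval 30   ! IGMP general query interval; default is 60 seconds
!
interface Vlan3
 ip pim dr-priority 10
 ip pim query-interval 10
 ip igmp query-interval 30
```

The dr-priority lines would move the DR role off Access2 onto Distrib1 (assuming all PIM neighbors support the DR-priority option), while the shortened hello/query intervals would let neighbor and membership state time out faster after a failure, at the cost of some extra control-plane traffic.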