We're trying to set up NetFlow-based billing for our hosting environment. We're dual-homed to two different ISPs using BGP peering with each (ASN 18817). The drawing below shows the logical setup: The routers are each Cat6506E with Sup32 and PFC3B/MSFC2A running 12.2(33)SXJ. The direct link between them is layer 3 (routed ports) for iBGP peering. The dark line is a layer-2 LAN made up of multiple switches (not shown), each connected to both 6500s with spanning-tree for loop protection. These switches also provide a layer-2 path between the routers, but the configured BGP peering is set to the layer-3 port IP on the direct link. There is at least one firewall connected directly to a 6500. The firewalls are all ASAs connected to any one of the not-shown mid-span switches. Each router's interface to the layer-2 LAN is a VLAN SVI using GLBP to provide a redundant gateway. There are multiple subnets on this VLAN, configured as secondary IP addresses on the VLAN interface (and in the GLBP gateway configs).
The question is: how do we collect stats from all traffic to/from the internet without collecting any local traffic (firewall to firewall) and without getting duplicate flows sent to our collector?
I was hoping to just put "ip flow ingress" and "ip flow egress" on the ISP-facing L3 ports, but that's not actually collecting outgoing flows (it appears egress NetFlow may not be supported on the 6500). I added "ip flow ingress" to the VLAN interface, but that didn't seem to fix it, so I also added the global command "ip flow ingress layer2-switched vlan 50". However, after a small change to the layer-2 LAN, some (but not all) of our clients saw a near doubling of bandwidth (according to NetFlow). We compared the NetFlow stats to the server port usage stats for one client, and it looks like we were under-reporting before the change and appear to be over-reporting now. I'm thinking the overage is because we're counting MLS traffic flowing between subnets on the layer-2 LAN, but I can't verify that.
With GLBP moving the default gateway and MSTP blocking at least one uplink from a mid-span switch to a 6500, traffic flows can be difficult to predict, especially when you don't know exactly when a flow gets accounted. For instance, if a packet comes into one 6500 because that's the only layer-2 path to the gateway BUT the gateway is currently on the other 6500, does the first 6500 count that flow, does the second one count it, or do both?
Fixed my own problem. I removed the line "ip flow ingress layer2-switched vlan 50" and left "ip flow ingress" on the internet-facing port and the VLAN SVI. I did not add any NetFlow commands on the routed link between the routers, on the theory that any traffic crossing that path was already counted on one of the other two interfaces.
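For anyone landing here later, the working setup looks roughly like the sketch below on each 6500. The interface names and descriptions are placeholders I've made up for illustration, and it assumes the SVI is VLAN 50 (the same VLAN named in the layer2-switched command). The NDE export settings are left out because they didn't change.

```
! Sketch only: interface names and descriptions are hypothetical,
! and the SVI is assumed to be VLAN 50.
!
! Global: remove the layer-2 switched accounting that was added earlier
no ip flow ingress layer2-switched vlan 50
!
! ISP-facing routed port: counts traffic arriving from the internet
interface GigabitEthernet1/1
 description Uplink to ISP (placeholder name)
 ip flow ingress
!
! SVI for the layer-2 LAN: counts traffic routed from the LAN
interface Vlan50
 ip flow ingress
!
! Routed link to the other 6500 (iBGP): deliberately no NetFlow here,
! since that traffic is already counted on one of the interfaces above
interface GigabitEthernet1/2
 description L3 link to other 6500 (placeholder name)
```

The idea is that every internet-bound or internet-sourced packet enters a 6500 exactly once through either the ISP port or the SVI, so ingress-only accounting on those two interfaces should see each flow once.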
After letting this run for a week, our flow totals from NDE and from an fprobe (sniffing a SPAN port) are within 0.01% of each other.
What's weird, though, is that we tested with the layer2-switched command prior to deployment and had very accurate results. The only change to the LAN was the removal of a non-involved router, AND we upgraded the IOS on the 6500s to SXJ. Whether the code change or the router removal caused the over-counting of flows is a question for the ages.