
C4900 Multicast redundancy issue (RPF/wrong config)

ss1
Level 1

Hello,

 

I am currently using a Cisco 4900 as a PIM multicast router. The usage concept and configs are explained below.

 

main server interface (server is 10.1.2.1): 

interface Vlan666 
ip address 10.1.2.2 255.255.255.252 
no ip redirects 
no ip unreachables 
no ip proxy-arp 
ip pim sparse-mode 
counter 
end

backup server interface (server is 10.10.1.1):

interface Vlan661
 ip address 10.10.1.2 255.255.255.252
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip pim sparse-mode
 counter
end

multicast group addresses: 239.100.10.0/24

other configs:

access-list 69 permit 239.100.10.0 0.0.0.255
ip pim rp-address 192.168.10.74 69 override
interface Loopback4
 ip address 192.168.10.74 255.255.255.252
 no ip redirects
 no ip unreachables
 no ip proxy-arp
 ip pim sparse-mode
end

The multicast source address is 192.168.10.66. What do we have now? Two VLAN interfaces receiving (192.168.10.66, 239.100.10.0/24) as 1:1 duplicated content. Since the source address 192.168.10.66 is not directly connected, I installed Quagga on the servers to announce 192.168.10.66/32 via BGP, so the route is always in the routing table via either the main or the backup server:

#show ip bgp summary 
(....)
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
10.1.2.1        4 65500 1405688 1646537       86    0    0 2w2d            1
10.10.1.1       4 65500 2230641 2612837       86    0    0 14w6d           1

And now the result is:

#show ip bgp 192.168.10.66
BGP routing table entry for 192.168.10.66/32, version 81
Paths: (4 available, best #1, table Default-IP-Routing-Table)
  Not advertised to any peer
  65500
    10.1.2.1 from 10.1.2.1 (192.168.80.226)
      Origin IGP, metric 0, localpref 120, valid, external, best
  65500, (received-only)
    10.1.2.1 from 10.1.2.1 (192.168.80.226)
      Origin IGP, metric 0, localpref 100, valid, external
  65500
    10.10.1.1 from 10.10.1.1 (10.20.30.40)
      Origin IGP, metric 0, localpref 80, valid, external
  65500, (received-only)
    10.10.1.1 from 10.10.1.1 (10.20.30.40)
      Origin IGP, metric 0, localpref 100, valid, external
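
The local preferences shown above (120 for the main server, 80 for the backup, versus the received-only default of 100) are applied on the Cisco side. A minimal sketch of inbound route-maps that would produce this result (the route-map names are hypothetical, the AS numbers are taken from the outputs above):

route-map FROM-MAIN permit 10
 set local-preference 120
!
route-map FROM-BACKUP permit 10
 set local-preference 80
!
router bgp 65505
 neighbor 10.1.2.1 remote-as 65500
 neighbor 10.1.2.1 route-map FROM-MAIN in
 neighbor 10.10.1.1 remote-as 65500
 neighbor 10.10.1.1 route-map FROM-BACKUP in

With this in place, the path via the main server is always best while it is up, and the backup path takes over when the main BGP session goes down.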


#show ip mroute 239.100.10.1  
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report, 
       Z - Multicast Tunnel, z - MDT-data group sender, 
       Y - Joined MDT-data group, y - Sending to MDT-data group
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
 Timers: Uptime/Expires
 Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 239.100.10.1), 1y14w/00:03:15, RP 192.168.10.74, flags: SJC
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Vlan411, Forward/Sparse, 17:58:37/00:02:19, H
    Vlan3142, Forward/Sparse, 4d14h/00:02:20, H
    Vlan3143, Forward/Sparse, 4d22h/00:02:19, H

(192.168.10.66, 239.100.10.1), 1y14w/00:03:19, flags: T
  Incoming interface: Vlan666, RPF nbr 10.1.2.1
  Outgoing interface list:
    Vlan411, Forward/Sparse, 17:58:37/00:02:19, H
    Vlan3142, Forward/Sparse, 4d14h/00:02:20, H
    Vlan3143, Forward/Sparse, 4d22h/00:02:19, H

I am not sure whether this is a proper approach to building multicast group redundancy, but the weird thing is that it worked on my router running cat4500e-entservicesk9-mz.150-2.SG.bin and stopped working once I upgraded to cat4500e-entservicesk9-mz.152-2.E6.bin.

 

I have noticed that cat4500e-entservicesk9-mz.152-2.E6.bin introduces a different forwarding architecture, with new commands/features such as show ip mrib and show ip mfib. I only managed to route multicast traffic when the source address is directly connected to the incoming interface. It seems the new IOS ignores my multicast traffic whenever the source is not directly connected to the interface on which it arrives.

 

I worked on this in our lab today and noticed the following: 239.100.10.2 works because its source address 10.3.2.1 is directly connected on the C4900's VLAN interface (10.3.2.2/30).

#show ip mfib 239.100.10.2
Entry Flags:    C - Directly Connected, S - Signal, IA - Inherit A flag,
                ET - Data Rate Exceeds Threshold, K - Keepalive
                DDE - Data Driven Event, HW - Hardware Installed
                ME - MoFRR ECMP entry, MNE - MoFRR Non-ECMP entry, MP - MFIB 
                MoFRR Primary, RP - MRIB MoFRR Primary, P - MoFRR Primary
                MS  - MoFRR  Entry in Sync, MC - MoFRR entry in MoFRR Client.
I/O Item Flags: IC - Internal Copy, NP - Not platform switched,
                NS - Negate Signalling, SP - Signal Present,
                A - Accept, F - Forward, RA - MRIB Accept, RF - MRIB Forward,
                MA - MFIB Accept, A2 - Accept backup,
                RA2 - MRIB Accept backup, MA2 - MFIB Accept backup

Forwarding Counts: Pkt Count/Pkts per second/Avg Pkt Size/Kbits per second
Other counts:      Total/RPF failed/Other drops
I/O Item Counts:   FS Pkt Count/PS Pkt Count
Default
 (*,239.100.10.2) Flags: C HW
   SW Forwarding: 0/0/0/0, Other: 3455/3455/0
   Tunnel8 Flags: A
 (10.3.2.1,239.100.10.2) Flags: HW
   SW Forwarding: 0/0/0/0, Other: 10/0/10
   HW Forwarding:   5317270/641/1344/6734, Other: NA/NA/NA
   Vlan2666 Flags: A

However, groups sourced from 192.168.10.66 never work, regardless of what I tried: announcing the source via BGP as a /32 or a /30, a static mroute, a static route, etc. I always get "No matching routes in MRIB route-DB", and the flows are not recognized in show ip mroute 239.100.10.X (X = any group sourced from 192.168.10.66).
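
For reference, the static mroute variant I tried looked roughly like this (a sketch of the lab setup; in classic IOS an ip mroute entry for the source should win the RPF lookup over the unicast table):

ip mroute 192.168.10.66 255.255.255.255 10.3.2.1

Even with the RPF interface resolving correctly (as show ip rpf below confirms), the source never appeared in the MRIB.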

 

#show ip mrib route 10.3.2.1
IP Multicast Routing Information Base
Entry flags: L - Domain-Local Source, E - External Source to the Domain,
    C - Directly-Connected Check, S - Signal, IA - Inherit Accept, D - Drop
    ET - Data Rate Exceeds Threshold,K - Keepalive,DDE - Data Driven Event
    ME - MoFRR ECMP Flow based, MNE - MoFRR Non-ECMP Flow based,
    MP - Primary MoFRR Non-ECMP Flow based entry
Interface flags: F - Forward, A - Accept, IC - Internal Copy,
    NS - Negate Signal, DP - Don't Preserve, SP - Signal Present,
    II - Internal Interest, ID - Internal Disinterest, LI - Local Interest,
    LD - Local Disinterest, MD - mCAC Denied, MI - mLDP Interest
    A2 - MoFRR ECMP Backup Accept

(10.3.2.1,239.100.10.2) RPF nbr: 0.0.0.0 Flags:
  Vlan2666 Flags: A

#show ip mrib route 192.168.10.66
No matching routes in MRIB route-DB

#show ip route 192.168.10.66
Routing entry for 192.168.10.66/32
  Known via "bgp 65505", distance 20, metric 0
  Tag 65500, type external
  Last update from 10.3.2.1 00:44:53 ago
  Routing Descriptor Blocks:
  * 10.3.2.1, from 10.3.2.1, 00:44:53 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 65500
      MPLS label: none

#show ip rpf 192.168.10.66
RPF information for ? (192.168.10.66)
  RPF interface: Vlan2666
  RPF neighbor: ? (10.3.2.1)
  RPF route/mask: 192.168.10.66/32
  RPF type: unicast (bgp 65505)
  Doing distance-preferred lookups across tables
  Multicast Multipath enabled.
  RPF topology: ipv4 multicast base, originated from ipv4 unicast base

I would appreciate any kind of help with the above issue. Maybe there is a better approach to building multicast redundancy from two servers?

 

Thank you, 

Stefan

2 Replies

Rytis Urnezius
Level 1

Hello,

I have exactly the same problem with a C4500-X, tested with IOS XE Software 3.6.7, 3.8.5, 3.9.0, and 3.10.0.

It looks like the RPF check drops the packets:

debug ip mfib pak 239.1.2.3
Oct 26 10:37:07.221: MFIBv4(0x0): Receive (10.10.10.253,239.1.2.3) from Vlan1024 (PS): hlen 5 prot 17 len 1344 ttl 59 frag 0x4000
Oct 26 10:37:07.221: MFIBv4(0x0): Pkt (10.10.10.253,239.1.2.3) from Vlan1024 (PS) Acceptance check failed - dropping

I also can't find a solution.

Well, I managed to get that concept running successfully, although not in the best possible way...

I configured 'ip pim dense-mode' instead of 'ip pim sparse-mode' on all incoming interfaces, and I also removed the static RP definition; now it works.
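
Concretely, the change amounted to something like this (a sketch, applied to both server-facing interfaces):

interface Vlan666
 ip pim dense-mode
!
interface Vlan661
 ip pim dense-mode
!
no ip pim rp-address 192.168.10.74 69 override

In dense mode there is no RP and no shared tree, so the (S,G) state is built directly from the arriving traffic, which sidesteps the RPF/MRIB problem described above, at the cost of flood-and-prune behavior.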

 

Although it works, it is not without other issues. I once had a neighbor PIM router announce (due to misconfiguration) that it was the RP for 224.0.0.0/4, and all my multicast groups started flowing through that neighbor (okay, I had forgotten to disable the Auto-RP listener; my mistake). But I am not sure I am protected from similar risks in the future, e.g. if a customer mistakenly sends the same multicast addresses back to me as incoming traffic on their interface. A second issue: I had a customer with ip pim static-join 239.x.y.z on their ip pim sparse-mode interface so they would be constantly flooded with the groups they wanted, and then none of the other customers ever received the same groups through their interfaces.

 

To stress again: I am not sure the way I am doing this is a best-practice approach to multicast redundancy. I still suspect I am doing something conceptually wrong and that there is a better way to accomplish the goal. To sum it up:

 

- source 192.168.10.66/32 streaming multicast addresses 239.100.10.0/24
- 2 different VLANs + 2 different VLAN IFs for the incoming traffic from the servers
- BGP on the Cisco and Quagga on the servers announce a next hop for 192.168.10.66 (only one route is best at any time; the main server is given a higher local preference than the backup).
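
On the server side, the Quagga configuration is roughly this (a hypothetical sketch for the main server, assuming 192.168.10.66/32 is configured on its loopback so bgpd can announce it; the router ID matches the earlier show ip bgp output):

! /etc/quagga/bgpd.conf (sketch)
router bgp 65500
 bgp router-id 192.168.80.226
 network 192.168.10.66/32
 neighbor 10.1.2.2 remote-as 65505

The backup server runs the same configuration against its own /30 peer address (10.10.1.2), and the Cisco's inbound local-preference settings decide which announcement wins.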

 

A C4900 running the old IOS 12.2 has been working this way for two years, though not as well as it could: I constantly get InputIf failures, processed by the CPU, on the interface of my failover server. I think these occur when a customer sends an IGMP report for a multicast group: the actual multicast content is dropped in hardware, but not while IGMP snooping has to do its job. I get approximately 800 InputIf failures every 5 seconds, which keeps my CPU at a constant 60%; that is roughly the total number of IGMP reports from my customer interfaces. For this reason I would appreciate it if anybody out there knows a more professional way to accomplish what I have been trying to do for ages.

 

Thank you for the great help.

 
