Strange multicast problem when using Altiris Deployment Server - Page 4

cbeswick · 05-11-2010

Hi,

We use the PXE boot function on our desktop PCs and laptops. Multicast TFTP is enabled in the BIOS to fetch a boot file, and only recently we have started seeing an "Open TFTP Timeout" during the boot process.

I have checked the IGMP / IP PIM configuration across our network and everything looks fine. However, when I check the IGMP group that the client is trying to join, I see address 224.1.1.2. Yet when I look at the IGMP groups on the port to which the Altiris Deployment Server connects, I only see group 225.1.2.3.

A colleague has shown me the configuration for the MTFTP server on the deployment server, and the group address is set to 224.1.1.0.

Shouldn't the group address on the Altiris Deployment Server also be 224.1.1.2? Or is the 225.1.2.3 address within the 224.1.1.0 IGMP group, and the problem is in fact something to do with the network?
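
As a rough illustration of the comparison above: IGMP membership is tracked per exact group address, so the three addresses mentioned are three separate groups rather than members of one another. The sketch below checks that and also derives the Ethernet multicast MAC each group maps to, which can help when matching groups against switch tables. The helper names `same_group` and `multicast_mac` are made up for this example, and how the Altiris MTFTP service allocates addresses from its configured value is not covered here.

```python
import ipaddress

# All IPv4 multicast groups live in 224.0.0.0/4.
MULTICAST_RANGE = ipaddress.ip_network("224.0.0.0/4")


def same_group(a: str, b: str) -> bool:
    """Two hosts are in the same group only if the addresses match exactly."""
    return ipaddress.IPv4Address(a) == ipaddress.IPv4Address(b)


def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet MAC.

    The low 23 bits of the group address are placed under the
    01:00:5e prefix.
    """
    addr = ipaddress.IPv4Address(group)
    if addr not in MULTICAST_RANGE:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF
    mac = (0x01005E << 24) | low23
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))


if __name__ == "__main__":
    # Addresses taken from the post: client join, server port, MTFTP config.
    for group in ("224.1.1.2", "225.1.2.3", "224.1.1.0"):
        print(group, multicast_mac(group))
    print("client and server in same group:", same_group("224.1.1.2", "225.1.2.3"))
```

Since the client and the server port show different groups, the client would not receive traffic sent to the server's group regardless of how PIM is configured.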

Any help would be appreciated.

Chris

Giuseppe Larosa · 06-24-2010

Hello Chris,

(172.16.192.49, 225.1.1.6), 00:23:56/00:02:50, flags: JT
  Incoming interface: Vlan224, RPF nbr 172.16.224.250, RPF-MFD
  Outgoing interface list:
    Vlan68, Forward/Sparse, 00:01:01/00:02:27, H
    Vlan84, Forward/Sparse, 00:04:18/00:01:09, H
    Vlan6, Forward/Sparse, 00:23:56/00:02:49, H

Very good news. So you moved to sparse-mode PIM with the Auto-RP listener, and that solved it?

Hope to help

Giuseppe

cbeswick · 06-24-2010

Hi Giuseppe,

Yes. I also enabled a few other commands, including:

3) Enable the "no ip pim dm-fallback" command
4) Enable the "mls ip multicast non-rpf aging fast" command
5) Enable the "mls ip multicast non-rpf aging global" command
6) Enable the "mls ip multicast consistency-check" command
7) Enable the "mls ip multicast consistency-check type scan-mroute" command

On the 2 core switches I disabled multicast on the directly connected interfaces because, according to the best practice guides, it should only be enabled on the first-hop routers.

Strangely enough, the new config didn't work straight away. I had to clear the mroutes on all switches before the routers serving the distribution layer could join the source tree.

I have also enabled 2 mapping agents and 2 candidate RPs on the core, with a 3 second failover on the interval. I still have a few tweaks to do, such as optimising the DR to follow HSRP, and some filters to protect the RPs, but I am almost there.

Thanks again for your help.

Mohamed Sobair · 06-26-2010

Hi Chris,

I am very glad that you made it through and that your problem is resolved. I don't know why, but I was very suspicious about the DR: judging from the strange output you had previously, it could be the only reason left after we checked all the necessary config.

Good luck ...

HTH

Mohamed

cbeswick · 07-01-2010

Giuseppe / Mohamed,

After all that, it appears the problem hasn't gone away after all. I have however done some more debugging and found the following output on the RP:

001410: .Jul 1 10:15:41: MRT(0): Reset the z-flag for (172.16.192.49, 225.1.1.6)
001411: .Jul 1 10:15:41: MRT(0): Create (172.16.192.49,225.1.1.6), RPF Vlan228/172.16.228.251
001412: .Jul 1 10:15:41: MRT(0): WAVL Insert interface: Vlan224 in (* ,225.1.1.6) Successful
001413: .Jul 1 10:15:41: MRT(0): set min mtu for (172.16.255.4, 225.1.1.6) 0->1500
001414: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (*, 225.1.1.6), Forward state - MAC not built
001415: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (*, 225.1.1.6), Forward state - MAC not built
001416: .Jul 1 10:15:41: MRT(0): WAVL Insert interface: Vlan224 in (172.16.192.49,225.1.1.6) Successful
001417: .Jul 1 10:15:41: MRT(0): set min mtu for (172.16.192.49, 225.1.1.6) 18010->1500
001418: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (172.16.192.49, 225.1.1.6), Forward state - MAC not built
001419: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (172.16.192.49, 225.1.1.6), Forward state - MAC not built

Strangely enough, if I clear the mroute cache for group 225.1.1.6, the receivers start joining the group and everything works. The problem starts when the source tree times out and gets pruned. When this happens, the shared tree entry (*,G) remains on the distribution switch, but then I get the above messages on the RP, which is also a mapping agent. The output looks like it is failing to create a forward state because of something to do with the MTU?
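
For anyone wanting to track how often this happens, a small script along these lines could tally the "MAC not built" olist additions per mroute entry from a saved debug capture. It is only a sketch: the function name `olist_mac_not_built` is invented for this example, the pattern is based solely on the lines pasted above, and whether "MAC not built" actually indicates a fault is not established in this thread.

```python
import re
from collections import Counter

# Matches an mroute entry such as "(*, 225.1.1.6)" or
# "(172.16.192.49, 225.1.1.6)", tolerating spaces around the comma.
SG_PATTERN = re.compile(
    r"\((\*|\d+\.\d+\.\d+\.\d+)\s*,\s*(\d+\.\d+\.\d+\.\d+)\)"
)


def olist_mac_not_built(lines):
    """Count olist additions flagged 'MAC not built' per (source, group)."""
    counts = Counter()
    for line in lines:
        if "Add " in line and "MAC not built" in line:
            match = SG_PATTERN.search(line)
            if match:
                counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Lines copied from the debug output in the post.
    sample = [
        "001414: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (*, 225.1.1.6), Forward state - MAC not built",
        "001415: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (*, 225.1.1.6), Forward state - MAC not built",
        "001416: .Jul 1 10:15:41: MRT(0): WAVL Insert interface: Vlan224 in (172.16.192.49,225.1.1.6) Successful",
        "001418: .Jul 1 10:15:41: MRT(0): Add Vlan224/225.1.1.6 to the olist of (172.16.192.49, 225.1.1.6), Forward state - MAC not built",
    ]
    for (source, group), count in olist_mac_not_built(sample).items():
        print(f"({source}, {group}): {count}")
```

Running it against captures taken before and after clearing the mroute cache would show whether the messages line up with the periods when receivers fail to join.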

I think the reason I believed the fault had cleared is that I cleared all the mroute caches once I applied the new config. The only problem was that the fault returned the very next day when people tried logging onto the network.

If I clear the mroute cache on the switches, the system works, and continues to work until the source tree times out.