
Strange multicast problem when using Altiris Deployment Server

cbeswick
Level 1

Hi,

We use the PXE boot function on our desktop PCs and laptops. Multicast TFTP is enabled in the BIOS to fetch a boot file, and only recently have we started seeing an "Open TFTP Timeout" during the boot process.

I have checked over the IGMP / IP PIM configuration across our network and everything looks fine. However when I check the IGMP group that the client is trying to join I see address 224.1.1.2. Yet when I look at the IGMP group on the port to which the Altiris Deployment server connects I only see group 225.1.2.3.

A colleague has shown me the configuration for the MTFTP server on the deployment server and the group address is set at 224.1.1.0.

Shouldn't the group address on the Altiris deployment server also be 224.1.1.2? Or is the 225.1.2.3 address within the 224.1.1.0 IGMP group, and the problem is in fact something to do with the network?
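
On the addressing question itself: an IGMP group is a single class D address, not a subnet, so 225.1.2.3 cannot sit "inside" 224.1.1.0. A minimal Python sketch (standard library only; the function name is made up for illustration) shows that the three addresses mentioned here are three separate groups, each mapping to its own Ethernet multicast MAC. Whether the MTFTP server derives the client group 224.1.1.2 from its configured 224.1.1.0 is an Altiris setting that this sketch does not model.

```python
import ipaddress

def multicast_mac(group: str) -> str:
    """Map an IPv4 multicast group to its Ethernet multicast MAC address."""
    addr = ipaddress.IPv4Address(group)
    if not addr.is_multicast:
        raise ValueError(f"{group} is not an IPv4 multicast address")
    low23 = int(addr) & 0x7FFFFF      # only the low 23 bits of the group are used
    mac = (0x01005E << 24) | low23    # fixed 01:00:5e prefix, next bit is 0
    return ":".join(f"{(mac >> shift) & 0xFF:02x}" for shift in range(40, -8, -8))

# The three addresses mentioned in this post.
groups = ["224.1.1.0", "224.1.1.2", "225.1.2.3"]
macs = {g: multicast_mac(g) for g in groups}
for group, mac in macs.items():
    print(group, "->", mac)
```

Each group gets a different MAC, so a switch port that only shows 225.1.2.3 in its IGMP table is not a member of 224.1.1.2.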

Any help would be appreciated.

Chris

48 Replies

Hi Giuseppe,

There are two distribution switches that provide layer 3 to the receivers / clients of the multicast stream. The switch I have been debugging is the one containing the active HSRP gateway for Vlan 68.

When I look at the mroute information for 224.1.1.3 on the standby switch I see the following:

ci_t1_c075_dist_sw1#sh ip mroute 224.1.1.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
Timers: Uptime/Expires
Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 224.1.1.3), 00:25:08/00:01:38, RP 172.16.255.4, flags: SP
  Incoming interface: Vlan211, RPF nbr 172.16.211.250, RPF-MFD
  Outgoing interface list: Null
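
For anyone reading along, the flags on that entry can be decoded with the legend printed in the same output. A small Python sketch (the dictionary is copied from the legend above; the function name is my own) that spells out a flag string such as SP:

```python
from typing import List

# Flag legend copied from the "sh ip mroute" header shown above.
MROUTE_FLAGS = {
    "D": "Dense", "S": "Sparse", "B": "Bidir Group", "s": "SSM Group",
    "C": "Connected", "L": "Local", "P": "Pruned", "R": "RP-bit set",
    "F": "Register flag", "T": "SPT-bit set", "J": "Join SPT",
    "M": "MSDP created entry", "X": "Proxy Join Timer Running",
    "A": "Candidate for MSDP Advertisement", "U": "URD",
    "I": "Received Source Specific Host Report", "Z": "Multicast Tunnel",
    "z": "MDT-data group sender", "Y": "Joined MDT-data group",
    "y": "Sending to MDT-data group", "V": "RD & Vector", "v": "Vector",
}

def decode_flags(flags: str) -> List[str]:
    """Expand each flag letter into its description from the legend."""
    return [MROUTE_FLAGS.get(ch, f"unknown flag {ch!r}") for ch in flags]

# Flags of the (*, 224.1.1.3) entry on the standby switch.
print(decode_flags("SP"))
```

So the standby switch holds a sparse-mode entry that is pruned, which matches the Null outgoing interface list.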

I just debugged PIM on the standby switch and saw the following:

003387: .May 14 07:01:27: PIM(0): Building Periodic (*,G) Join / (S,G,RP-bit) Prune message for 224.1.1.3

003388: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan68 from 172.16.255.4
003389: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan68 from 172.16.255.4
003390: .May 14 07:02:19:      for group 224.1.1.3
003391: .May 14 07:02:19: PIM(0): Update RP expiration timer (270 sec) for 224.1.1.3
003392: .May 14 07:02:19: PIM(0): Not RPF interface, group 224.1.1.3
003393: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan82 from 172.16.255.4
003394: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan82 from 172.16.255.4
003395: .May 14 07:02:19:      for group 224.1.1.3
003396: .May 14 07:02:19: PIM(0):   Duplicate RP-reachable from 172.16.255.4 for 224.1.1.3
003397: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan8 from 172.16.255.4
003398: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan8 from 172.16.255.4
003399: .May 14 07:02:19:      for group 224.1.1.3
003400: .May 14 07:02:19: PIM(0):   Duplicate RP-reachable from 172.16.255.4 for 224.1.1.3
003401: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan5 from 172.16.255.4
003402: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan5 from 172.16.255.4
003403: .May 14 07:02:19:      for group 224.1.1.3
003404: .May 14 07:02:19: PIM(0):   Duplicate RP-reachable from 172.16.255.4 for 224.1.1.3
003405: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan7 from 172.16.255.4
ci_t1_c075_dist_sw1#
003406: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan7 from 172.16.255.4
003407: .May 14 07:02:19:      for group 224.1.1.3
003408: .May 14 07:02:19: PIM(0):   Duplicate RP-reachable from 172.16.255.4 for 224.1.1.3
ci_t1_c075_dist_sw1#
003409: .May 14 07:02:27: PIM(0): Building Periodic (*,G) Join / (S,G,RP-bit) Prune message for 224.1.1.3

What does "Duplicate RP-reachable" mean? Could this be a problem?

Hello Chris,

I cannot see your last post.

Any news?

Also, what platform and what IOS image is running on the last switch (the one affected)?

Is IGMP snooping enabled on Vlan 68?

I'm thinking of possible bugs (I know it was working before, but sometimes they are triggered by network changes).

Hope to help

Giuseppe

Hi Giuseppe,

I have reposted my last message (it now appears in full above) so you can read it. We have the same IOS / platform across the backbone: Sup720-3B on version 12.2(33)SXH6.

Hello Chris,

I've discovered that my problem seeing your post was related to the specific web browser I use.

It is correct that the standby switch should not have interface Vlan68 in its outgoing interface list if it is not the PIM DR on the segment (you can check this with sh ip pim neighbor) and it does not provide a better path to the source of the multicast traffic.

About the other messages:

003402: .May 14 07:02:19: PIM(0): Received RP-Reachable on Vlan5 from 172.16.255.4
003403: .May 14 07:02:19:      for group 224.1.1.3

003404: .May 14 07:02:19: PIM(0):   Duplicate RP-reachable from 172.16.255.4 for 224.1.1.3

My guess is that 172.16.255.4 is the loopback address of the companion switch, and that the two devices share multiple LAN segments (different VLANs / IP subnets).

For a reason that we haven't understood up to now, the companion switch (the one I called the "last switch" in some of my previous posts) is stuck on the PIM sparse-mode shared tree and is not able to join the source-specific tree (note the Partial-SC in its sh ip mroute output).

Messages like the ones above say that something strange is actually happening there: the standby switch receives a PIM message from the companion switch about the group involved (224.1.1.3) on other client VLANs.

I think this is not the root cause but another symptom that the affected switch is not behaving correctly.

I would agree with clearing the mroute table (clear ip mroute) on it as a start, and if this is not enough you could think about a reload.

Another possible action plan: make the standby switch the PIM DR on the segment by using the ip pim dr-priority command in interface mode and see if this solves the problem. This could give you time to handle the misbehaving switch as described in the previous sentence.
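
To see why raising the DR priority moves the DR role, here is a minimal Python sketch of the PIM DR election as I understand it: the highest DR priority wins, the highest interface IP address breaks ties, and priority is ignored if any router on the segment does not advertise it. The 192.0.2.x addresses, the router names and the priority value 10 are made up for illustration; the real Vlan68 addresses are not given in this thread.

```python
import ipaddress
from typing import List, NamedTuple, Optional

class PimRouter(NamedTuple):
    name: str
    address: str                 # interface address on the shared segment
    dr_priority: Optional[int]   # None = DR priority not advertised in hellos

def elect_dr(routers: List[PimRouter]) -> PimRouter:
    """Pick the DR: highest priority first, then highest IP address."""
    # Priority only counts if every router on the segment advertises it.
    use_priority = all(r.dr_priority is not None for r in routers)

    def rank(r: PimRouter):
        priority = r.dr_priority if use_priority else 0
        return (priority, int(ipaddress.IPv4Address(r.address)))

    return max(routers, key=rank)

# Hypothetical addresses; 1 is the default DR priority on IOS.
active = PimRouter("active-hsrp", "192.0.2.252", 1)
standby = PimRouter("standby-hsrp", "192.0.2.251", 1)
print(elect_dr([active, standby]).name)

# Same segment after "ip pim dr-priority 10" on the standby switch's interface.
standby_raised = standby._replace(dr_priority=10)
print(elect_dr([active, standby_raised]).name)
```

With equal priorities the higher address wins; once the standby switch advertises a higher priority it takes over as DR regardless of its address.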

Hope to help

Giuseppe

Hi Giuseppe,

We have a change raised this evening to clear the multicast routes. I will let you know how we get on.

Hi Giuseppe,

We cleared the ip mroute cache and the problem still remains. I will schedule a switch reload and see if this helps.

Hello Chris,

Thanks for your update. So the issue is still there.

I would also consider making the companion switch the PIM DR on the segment for Vlan 68 (the client VLAN, if I remember correctly),

using the interface configuration command ip pim dr-priority (check with the CLI help, I'm not sure of the spelling).

Hope to help

Giuseppe

Hi Giuseppe,

I forgot to mention that I have already tried swapping the DR to the other distribution switch. The problem remains with the same output.

Well, we have finally managed to reboot the switch to no avail. The problem still remains. I am wondering if this is a bug in the code as the problem did start to manifest shortly after we completed a complete backbone upgrade to 12.2(33)SXH6

Hello Chris,

Nice to hear from you, even if this is not good news.

>> Well, we have finally managed to reboot the switch to no avail. The problem still remains. I am wondering if this is a bug in the code as the problem did start to manifest shortly after we completed a complete backbone upgrade to 12.2(33)SXH6

Unfortunately this is something that cannot be excluded!

What IOS image was running before?

Consider also moving to 12.2(33)SXI2a or later, which could fix this.

Hope to help

Giuseppe

Mohamed Sobair
Level 7

Hi Chris,

could you please send a small network diagram pointing out the following:

1- The RP for the Group 224.1.1.2

2- Hosts joining the group 224.1.1.2.

3- The multicast source.

Thanks,

Mohamed

Hi Mohamed,

Thanks for your response. I have attached a modified diagram of our backbone architecture. We have what I think is a textbook configuration for multicast routing using Auto-RP and PIM sparse-dense mode. The two core switches are Auto-RP candidates, with one of the cores acting as the RP mapping agent.

All backbone routed links are configured in sparse-dense mode. The multicast group was changed to 227.1.1.3 by our server guys just in case the group it uses by default (224.1.1.2) was causing issues. The Altiris PXE boot server that advertises this group sits in the server farm; the clients (receivers) all sit on the switch block access layers.

I am beginning to think that this is something to do with the way in which the PXE boot service operates, because we have CCTV streaming across the network fine using a proper shared multicast tree, i.e. I can see clients on the access distribution layers joining the shared tree for a source which is on another switch block.

Giuseppe - we were on software release 12.2(18)SXF8 prior to our upgrade.

Mohamed Sobair
Level 7

Hello Chris,

Two issues could have resulted in this problem:

1- You mentioned that the RP is Core 1, yet when you issue (sh ip mroute 224.1.1.2) the RPF interface points to Core 2. Why?

Answer: Although you have configured 2 RPs for the same groups for redundancy purposes, be aware that only ONE RP is going to be elected for the hosts joining the group. The PIM join message will be sent from the DR to the RP with the highest IP address for that group.

So here I would first explicitly make sure the primary RP has the highest IP address; secondly, I would check the RPF.

Please post (sh ip mroute 227.x.x.x) from the active HSRP gateway for the receivers in that group and let us see if it points to the correct RP.

Also post (sh ip mroute 227.x.x.x) from the RP itself (Core 1) and let us see if there is a multicast source known to the RP.

2- I would make sure that the DR for the multicast groups (224.x.x.x) and (227.x.x.x) is the active HSRP router. In your previous (sh ip mroute) output from the active HSRP router, the result is not the desired one: the outgoing interface list is Null, and it shouldn't be. That means it is not the PIM DR for that group. Make the active HSRP router the DR by manually setting the highest IP address on the interface OR by changing its DR priority to a higher value than the current DR's.
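
A quick illustration of the "highest IP address" rule from point 1. With Auto-RP, as far as I know, it is the mapping agent that advertises the candidate RP with the highest IP address for a given group range. The Python sketch below only shows that comparison; 172.16.255.4 is the RP seen in this thread, while 172.16.255.3 is a made-up stand-in for the other core's candidate RP address, which has not been posted.

```python
import ipaddress
from typing import Iterable

def select_rp(candidate_rps: Iterable[str]) -> str:
    """Return the candidate RP with the numerically highest IPv4 address."""
    # Compare as integers: plain string comparison would rank ".9" above ".10".
    return max(candidate_rps, key=lambda ip: int(ipaddress.IPv4Address(ip)))

# 172.16.255.4 appears in this thread; 172.16.255.3 is hypothetical.
candidates = ["172.16.255.3", "172.16.255.4"]
print(select_rp(candidates))
```

If the intended primary RP does not hold the highest address among the candidates, the other candidate will be the one advertised for the group.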

Please come back with the output of those commands and let us know the result,

HTH

Mohamed

Hi Mohamed,

The RP is actually Core 2 (the core on the right in the diagram) - my mistake.

Output from the Active HSRP gateway for Vlan 68 - this is the Vlan I am focusing on at the moment.

ci_t1_3a1_dist_sw1#sh ip mroute 227.1.1.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
Timers: Uptime/Expires
Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 227.1.1.3), 1d01h/00:02:55, RP 172.16.255.4, flags: SJC
  Incoming interface: Vlan224, RPF nbr 172.16.224.250, Partial-SC
  Outgoing interface list:
    Vlan8, Forward/Sparse-Dense, 00:00:27/00:02:32, H
    Vlan74, Forward/Sparse-Dense, 00:01:27/00:01:32, H
    Vlan73, Forward/Sparse-Dense, 00:01:57/00:01:02, H
    Vlan5, Forward/Sparse-Dense, 00:02:14/00:00:45, H
    Vlan92, Forward/Sparse-Dense, 12:29:12/00:02:27, H
    Vlan6, Forward/Sparse-Dense, 1d01h/00:02:31, H
    Vlan68, Forward/Sparse-Dense, 00:00:04/00:02:55, H

Below is the output from the RP, which is Core 2 on ip 172.16.255.4:

ci_t2_65_c200_core2#sh ip mroute 227.1.1.3
IP Multicast Routing Table
Flags: D - Dense, S - Sparse, B - Bidir Group, s - SSM Group, C - Connected,
       L - Local, P - Pruned, R - RP-bit set, F - Register flag,
       T - SPT-bit set, J - Join SPT, M - MSDP created entry,
       X - Proxy Join Timer Running, A - Candidate for MSDP Advertisement,
       U - URD, I - Received Source Specific Host Report,
       Z - Multicast Tunnel, z - MDT-data group sender,
       Y - Joined MDT-data group, y - Sending to MDT-data group
       V - RD & Vector, v - Vector
Outgoing interface flags: H - Hardware switched, A - Assert winner
Timers: Uptime/Expires
Interface state: Interface, Next-Hop or VCD, State/Mode

(*, 227.1.1.3), 1d01h/00:02:31, RP 172.16.255.4, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Vlan222, Forward/Sparse-Dense, 00:00:58/00:02:31
    Vlan224, Forward/Sparse-Dense, 1d01h/00:02:30

(172.16.192.49, 227.1.1.3), 1d01h/00:01:48, flags:
  Incoming interface: Vlan228, RPF nbr 172.16.228.251
  Outgoing interface list:
    Vlan222, Forward/Sparse-Dense, 00:00:58/00:02:31
    Vlan224, Forward/Sparse-Dense, 1d01h/00:02:30

As you can see, the multicast source, 172.16.192.49, can be seen on the RP; however, the distribution switch just won't join the source tree: it lists the (*,G) entry, but not the (S,G).
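
The comparison above can be scripted. Below is a rough Python sketch (the function names are my own, and the sample text is trimmed from the two outputs in this post) that parses sh ip mroute text and reports whether any (S,G) entry exists for a group. It only understands the layout shown here; other IOS versions may format things differently.

```python
import re

ENTRY_RE = re.compile(r"^\((?P<source>[^,]+), (?P<group>[^)]+)\)")

def parse_mroute(text):
    """Return one dict per mroute entry: source, group, outgoing interfaces."""
    entries = []
    in_oil = False
    for raw in text.splitlines():
        line = raw.strip()
        match = ENTRY_RE.match(line)
        if match:
            entries.append({"source": match["source"],
                            "group": match["group"], "oil": []})
            in_oil = False
        elif entries and line.startswith("Outgoing interface list:"):
            in_oil = not line.endswith("Null")   # "Null" means an empty list
        elif entries and in_oil and line:
            entries[-1]["oil"].append(line.split(",")[0])
    return entries

def has_source_entry(entries, group):
    """True if at least one (S,G) entry (source other than *) exists."""
    return any(e["group"] == group and e["source"] != "*" for e in entries)

DIST_SW = """(*, 227.1.1.3), 1d01h/00:02:55, RP 172.16.255.4, flags: SJC
  Incoming interface: Vlan224, RPF nbr 172.16.224.250, Partial-SC
  Outgoing interface list:
    Vlan8, Forward/Sparse-Dense, 00:00:27/00:02:32, H
    Vlan68, Forward/Sparse-Dense, 00:00:04/00:02:55, H
"""

RP = """(*, 227.1.1.3), 1d01h/00:02:31, RP 172.16.255.4, flags: S
  Incoming interface: Null, RPF nbr 0.0.0.0
  Outgoing interface list:
    Vlan222, Forward/Sparse-Dense, 00:00:58/00:02:31
    Vlan224, Forward/Sparse-Dense, 1d01h/00:02:30

(172.16.192.49, 227.1.1.3), 1d01h/00:01:48, flags:
  Incoming interface: Vlan228, RPF nbr 172.16.228.251
  Outgoing interface list:
    Vlan222, Forward/Sparse-Dense, 00:00:58/00:02:31
    Vlan224, Forward/Sparse-Dense, 1d01h/00:02:30
"""

for name, text in (("distribution switch", DIST_SW), ("RP", RP)):
    print(name, "has (S,G):", has_source_entry(parse_mroute(text), "227.1.1.3"))
```

Running the same check against each hop between the RP and the receivers would show where the (S,G) state stops appearing.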

Mohamed Sobair
Level 7


Hi Chris,

The output from the RP looks fine; however, the output from the distribution switch does not.

1- You should see the groups (224.0.1.39) and (224.0.1.40) in the output of sh ip mroute on the distribution switch. You have only shown a shared tree for 227.x.x.x. Is there any shared or source-based tree for the groups (224.0.1.39) and (224.0.1.40) respectively?


You will need to make sure the mapping agent is configured correctly on Core 2, the RP.

2- Do you have reachability to the source of the multicast stream from the Distribution switch?
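
For point 1, the check can be reduced to a few lines. Auto-RP uses two well-known groups: 224.0.1.39 for candidate RP announcements and 224.0.1.40 for the mapping agent's discovery messages. The Python sketch below (the function name and the sample group lists are made up for illustration) reports which of the two are absent from a list of groups taken from the mroute table.

```python
AUTO_RP_ANNOUNCE = "224.0.1.39"    # candidate RPs announce themselves here
AUTO_RP_DISCOVERY = "224.0.1.40"   # mapping agent sends group-to-RP mappings here

def missing_auto_rp_groups(mroute_groups):
    """Return the Auto-RP groups that do not appear in the given group list."""
    required = {AUTO_RP_ANNOUNCE, AUTO_RP_DISCOVERY}
    return sorted(required - set(mroute_groups))

# Hypothetical group lists, as they might be collected from "sh ip mroute".
print(missing_auto_rp_groups(["227.1.1.3"]))
print(missing_auto_rp_groups(["224.0.1.39", "224.0.1.40", "227.1.1.3"]))
```

An empty result only says that the entries exist; it does not prove that the mappings being received are the right ones.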

Please confirm,

Mohamed