
LAN Crashes

Hi to everyone,

I would like to share a problem that I'm currently facing in our showroom network. The architecture is really simple: no segmentation and a single VLAN:

An FTTH Alcatel-Lucent router acts as the modem.

A Cisco 887VA router is connected to the Alcatel.

A managed HP 10/100 switch is connected to Cisco port Fa3.

A TP-Link 10/100/1000 switch is connected to a 10/100 port on the HP switch.

A D-Link 10/100 switch is connected to the TP-Link switch.

3 TP-Link access points cover the whole house.

The HP switch has 5 devices.

The TP-Link switch has almost 20 devices.

The D-Link switch has 7 devices.

Finally, there are the users' devices (phones, iPads, ...).

The main problem is that after a certain period of time the network stops responding: no Internet and no communication inside the LAN. Each time, we see the LEDs on all the switches blinking at the same time. I tried to get information from the Cisco router to find out what is causing this, and the output of "sh processes cpu sorted" shows "IP NAT Ager" at 99%/0%. Sometimes "IGMP Snooping RE" and "IGMPSN" consume 35% to 40% of the router's CPU. So I disabled IGMP snooping, but I don't know whether that is healthy for the network or not.

I'm trying to figure out the best way to find out what is causing this: is it a worm? A specific protocol? IP fragmentation? ...

For information, this network is used for home entertainment solutions (lighting control, media control, ...), and these devices need to be in the same network segment to be able to communicate with the users' applications. Creating VLANs seems tricky to me: if I have to forward broadcasts between VLANs, is that any different from keeping everything in the same segment, and is it a good or bad way of doing things?

I have some networking skills but little experience in network analysis and in troubleshooting network crashes. I'm really confused and I need your recommendations and expertise, bearing in mind that most of the equipment needs broadcast to communicate with the end users.

This is my cisco router config:

Router>en
Password:
Router#sh run
Building configuration...

Current configuration : 3382 bytes
!
! Last configuration change at 11:50:07 UTC Fri Feb 5 2016
version 15.3
no service pad
service tcp-keepalives-in
service tcp-keepalives-out
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname Router
!
boot-start-marker
boot-end-marker
!
aqm-register-fnf
!
no logging console
no logging monitor
!
no aaa new-model
!
!
no ip source-route
ip options drop
!
!
!
!


!
ip dhcp excluded-address 192.168.1.1 192.168.1.50
ip dhcp excluded-address 192.168.2.1 192.168.2.50
ip dhcp excluded-address 192.168.3.1 192.168.3.50
ip dhcp excluded-address 192.168.1.254
!
ip dhcp pool USERS
 import all
 network 192.168.1.0 255.255.255.0
 default-router 192.168.1.1
 dns-server 8.8.8.8 8.8.4.4
!
ip dhcp pool SAVANT
 import all
 network 192.168.2.0 255.255.255.0
 dns-server 8.8.8.8 8.8.4.4
 default-router 192.168.2.1
!
ip dhcp pool SONOS
 import all
 network 192.168.3.0 255.255.255.0
 default-router 192.168.3.1
 dns-server 8.8.8.8 8.8.4.4
!
!
!
no ip domain lookup
ip cef
no ip igmp snooping
no ipv6 cef
!
!
!
!
!
multilink bundle-name authenticated
!
!
!
!
!
!
!
license udi pid C887VA-K9 sn FCZ191471DA
!
!
!
!
!
!
!
controller VDSL 0
!
!
!
!
!
!
!
!
!
!
!
interface ATM0
 no ip address
 shutdown
 no atm ilmi-keepalive
!
interface Ethernet0
 no ip address
 shutdown
!
interface FastEthernet0
 no ip address
!
interface FastEthernet1
 switchport access vlan 999
 no ip address
!
interface FastEthernet2
 switchport access vlan 3
 no ip address
!
interface FastEthernet3
 no ip address
 storm-control broadcast level 40.00
 storm-control multicast level 40.00
!
interface Vlan1
 ip address 192.168.1.1 255.255.255.0
 ip flow ingress
 ip nat inside
 ip virtual-reassembly in
 ip policy route-map CLEAR_DF
!
interface Vlan2
 ip address 192.168.2.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly in
!
interface Vlan3
 ip address 192.168.3.1 255.255.255.0
 ip nat inside
 ip virtual-reassembly in
!
interface Vlan999
 no ip address
 ip virtual-reassembly in
 pppoe enable group global
 pppoe-client dial-pool-number 1
!
interface Dialer1
 ip address negotiated
 ip mtu 1492
 ip nat outside
 ip virtual-reassembly in max-fragments 64 max-reassemblies 1024
 encapsulation ppp
 dialer pool 1
 ppp authentication chap pap callin
 ppp chap hostname xxx
 ppp chap password 7 xxx
 ppp pap sent-username xxx password 7 xxx
!
ip forward-protocol nd
no ip http server
no ip http secure-server
!
!
ip nat translation tcp-timeout 300
ip nat translation udp-timeout 600
ip nat translation max-entries 200
ip nat inside source list FTTH_WAN interface Dialer1 overload
ip route 0.0.0.0 0.0.0.0 Dialer1
!
ip access-list extended FTTH_WAN
 permit ip host 0.0.0.0 any
 permit ip 192.168.1.0 0.0.0.255 any
!
!
route-map CLEAR_DF permit 10
 match ip address TCP
 set ip df 0
!
!
control-plane
!
!
!
mgcp behavior rsip-range tgcp-only
mgcp behavior comedia-role none
mgcp behavior comedia-check-media-src disable
mgcp behavior comedia-sdp-force disable
!
mgcp profile default
!
!
!
!
!
line con 0
 password 7 xxxx
 login
 no modem enable
line aux 0
line vty 0 4
 password 7 xxx
 login
 transport input all
!
scheduler allocate 20000 1000
!
end


Mark Malone
VIP Alumni

Hey, a couple of things I don't get: you say 1 VLAN but I see multiple configured, and is there a specific reason you're binding the PPPoE to VLAN 999?

Tighten up the NAT ACL; matching on host 0.0.0.0 like that is not going to help the CPU.

Use this if only VLAN 1 is being NATted; if there are more, specify them exactly. Don't use 0.0.0.0.

ip access-list extended FTTH_WAN
 permit ip 192.168.1.0 0.0.0.255 any

I have only ever seen this used when a GRE tunnel is in place. I don't see the requirement on a standard PPPoE setup, so I would remove it:

route-map CLEAR_DF permit 10
 match ip address TCP
 set ip df 0

This doesn't look right. The bit in bold should be put on the interface attached directly to the Alcatel, with ip nat outside on that interface rather than on the dialer, and don't use a VLAN interface:

interface Vlan999
 no ip address
 ip virtual-reassembly in
 pppoe enable group global
 pppoe-client dial-pool-number 1

Use

interface FastEthernet1
 ip nat outside
 pppoe enable group global
 pppoe-client dial-pool-number 1

On your dialer you're missing this, which is needed to avoid fragmentation:

ip tcp adjust-mss 1452
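
For reference, the 1452 value follows from the PPPoE overhead. Below is a minimal Python sketch of the arithmetic, assuming a 1500-byte Ethernet MTU and IPv4 and TCP headers without options. The constant and function names are illustrative only and do not come from the thread.

```python
# All sizes are in bytes. Assumes IPv4 and TCP headers carry no options.
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8   # 6-byte PPPoE header + 2-byte PPP protocol ID
IPV4_HEADER = 20
TCP_HEADER = 20


def pppoe_ip_mtu(ethernet_mtu=ETHERNET_MTU):
    """IP MTU left on the dialer once the PPPoE/PPP overhead is removed."""
    return ethernet_mtu - PPPOE_OVERHEAD


def tcp_mss(ip_mtu):
    """Largest TCP payload that fits in one unfragmented IP packet."""
    return ip_mtu - IPV4_HEADER - TCP_HEADER


print(pppoe_ip_mtu())           # 1492, matches "ip mtu 1492" on Dialer1
print(tcp_mss(pppoe_ip_mtu()))  # 1452, matches "ip tcp adjust-mss 1452"
```

Clamping the MSS to this value lets TCP sessions negotiate segments that fit inside the 1492-byte dialer MTU, so they do not need to be fragmented.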

Remove interface Vlan2 and Vlan3 if you're not using them.

***********************************************************

Now, if that does not fix the CPU issues, we need to see why. CPU issues must be caught in real time to help identify the cause. This is an EEM script I use which will collect data and write it to flash for you.

Run this and leave it until the CPU hits 80%. It will collect the output below; then retrieve the text file from flash and we can take a look.

event manager applet High_CPU
 event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 get-type exact entry-op ge entry-val "80" exit-time 10 poll-interval 5
 action 0.1 syslog msg "CPU Utilization is high"
 action 0.2 cli command "enable"
 action 0.4 cli command "show log | append flash:CPU_Profile.txt"
 action 0.5 cli command "show process cpu sorted | append flash:CPU_Profile.txt"
 action 0.6 cli command "show interfaces | append flash:CPU_Profile.txt"
 action 0.7 cli command "show ip cef switching stat | append flash:CPU_Profile.txt"
 action 0.8 cli command "show ip traffic | append flash:CPU_Profile.txt"
 action 0.9 cli command "show int switching | append flash:CPU_Profile.txt"
 action 1.0 cli command "no event manager applet High_CPU"
 action 1.1 cli command "end"

Once the output has been written to flash, run no event manager applet High_CPU; you don't want it to keep running.

IGMP maps multicast streams from host to router. Have you specifically set IGMP groups and multicast for video traffic on all switches, or just on the Cisco?

Let me know how you get on with that. This doc has some tips and explains PPPoE setup too:

http://www.cisco.com/c/en/us/td/docs/routers/access/800/software/configuration/guide/SCG800Guide/SCG800_Guide_BookMap_chapter_01010.html#con_1052857

Hi Mark Malone,

I really appreciate your detailed reply.

Yes, I had three VLANs. I removed them physically but not from the Cisco router config. I will remove them.

Thanks for the EEM script; it will help us a lot with diagnostics.

Regarding your question: IGMP maps multicast streams host to router have you specifically set igmp groups and multicast for video traffic on all switches or just cisco ?

Answer: Just the Cisco router.

I will run the EEM script and post the output here the next time my LAN crashes.

Stay tuned.

Thanks brothers.

Hey, yes, when you can, send on the EEM capture and an updated copy of the config, with the multicast configuration if you have it.

Just thinking: when the issue occurs, have you first tried rebooting only the HP or the TP-Link, in case one of them has seized up? If rebooting just one of them resolves the issue, that switch may be the cause, and you avoid rebooting the whole network.

Hi,

Attached is the whole file captured during high CPU utilization. This was during the weekend.

I forgot to mention that, in an attempt to prevent this crash, we had removed a segment from the network to make it stable. This last weekend the client wanted that portion of the network, so we reconnected it, and it may have been the main cause. This is just for info.

I don't understand your request regarding the multicast configuration. Could you be more specific, please?

I couldn't configure Fa0 as the WAN interface and remove the dialer pool from VLAN 999 because I wasn't on site at the time. I'll do it later.

This is the Cisco config:


Building configuration...

Current configuration : 3501 bytes
!
! Last configuration change at 09:55:28 UTC Mon Feb 8 2016
version 15.3
no service pad
service tcp-keepalives-in
service tcp-keepalives-out
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname Router
!
boot-start-marker
boot-end-marker
!
aqm-register-fnf
!
no logging console
no logging monitor
!
no aaa new-model
!
!
no ip source-route
ip options drop
!
!
!
!


!
ip dhcp excluded-address 192.168.1.1 192.168.1.50
ip dhcp excluded-address 192.168.2.1 192.168.2.50
ip dhcp excluded-address 192.168.3.1 192.168.3.50
ip dhcp excluded-address 192.168.1.254
!
ip dhcp pool USERS
 import all
 network 192.168.1.0 255.255.255.0
 default-router 192.168.1.1
 dns-server 212.217.1.1 8.8.8.8
!
ip dhcp pool SAVANT
 import all
 network 192.168.2.0 255.255.255.0
 dns-server 8.8.8.8 8.8.4.4
 default-router 192.168.2.1
!
ip dhcp pool SONOS
 import all
 network 192.168.3.0 255.255.255.0
 default-router 192.168.3.1
 dns-server 8.8.8.8 8.8.4.4
!
!
!
no ip domain lookup
ip cef
no ip igmp snooping
no ipv6 cef
!
!
!
!
!
multilink bundle-name authenticated
!
!
!
!
!
!
!
license udi pid C887VA-K9 sn FCZ191471DA
!
!
!
!
!
!
!
controller VDSL 0
!
!
!
!
!
!
!
!
!
!
!
interface ATM0
 no ip address
 shutdown
 no atm ilmi-keepalive
!
interface Ethernet0
 no ip address
 shutdown
!
interface FastEthernet0
 no ip address
!
interface FastEthernet1
 switchport access vlan 999
 no ip address
!
interface FastEthernet2
 no ip address
!
interface FastEthernet3
 no ip address
 storm-control broadcast level 40.00
 storm-control multicast level 40.00
!
interface Vlan1
 ip address 192.168.1.1 255.255.255.0
 ip flow ingress
 ip nat inside
 ip virtual-reassembly in
!
interface Vlan2
 no ip address
!
interface Vlan3
 no ip address
!
interface Vlan999
 no ip address
 ip virtual-reassembly in
 pppoe enable group global
 pppoe-client dial-pool-number 1
!
interface Dialer1
 ip address negotiated
 ip mtu 1492
 ip nat outside
 ip virtual-reassembly in max-fragments 64 max-reassemblies 1024
 encapsulation ppp
 ip tcp adjust-mss 1452
 dialer pool 1
 ppp authentication chap pap callin
 ppp chap hostname xxxxxx
 ppp chap password 7 xxxxxx
 ppp pap sent-username xxxxx password 7 xxxxxx
!
no ip forward-protocol nd
no ip http server
no ip http secure-server
!
!
ip nat translation tcp-timeout 300
ip nat translation udp-timeout 600
ip nat translation max-entries 200
ip nat inside source list FTTH_WAN interface Dialer1 overload
ip nat inside source static tcp 192.168.1.110 5001 interface Dialer1 5001
ip nat inside source static udp 192.168.1.110 5001 interface Dialer1 5001
ip nat inside source static udp 192.168.1.110 85 interface Dialer1 85
ip nat inside source static tcp 192.168.1.110 85 interface Dialer1 85
ip nat inside source static tcp 192.168.1.110 5006 interface Dialer1 5006
ip nat inside source static udp 192.168.1.110 5006 interface Dialer1 5006
ip route 0.0.0.0 0.0.0.0 Dialer1
!
ip access-list extended FTTH_WAN
 permit ip 192.168.1.0 0.0.0.255 any
!
!
!
control-plane
!
!
!
mgcp behavior rsip-range tgcp-only
mgcp behavior comedia-role none
mgcp behavior comedia-check-media-src disable
mgcp behavior comedia-sdp-force disable
!
mgcp profile default
!
!
!
!
!
line con 0
 password 7 xxxxx
 login
 no modem enable
line aux 0
line vty 0 4
 password 7 xxxxx
 login
 transport input all
!
scheduler allocate 20000 1000
!
end

Thanks a lot for your help.

Hi

So the first thing I see is a huge volume of interrupt traffic. This occurs either when CEF is not enabled at layer 3 and the router is processing everything in the CPU, or when the router is overutilized and there's too much traffic going through it to handle. Bear in mind throughput is very low on these routers; they're small-business devices and the maximum number of users is only supposed to be 20. Looking at what you describe above, that's way more than recommended.

http://anticisco.ru/pubs/ISR_G2_Perfomance.pdf

http://www.cisco.com/c/en/us/products/collateral/routers/800-series-routers/data_sheet_c78-613481.html

Then there's another issue: I see flooding IGMP traffic, but it looks to be STP related maybe. Can you run this command on the router? It will show whether you have constant STP changes in your network, which cause halts to traffic and make it look like the network has seized while the changes take place, or whether it's actually flooded IGMP traffic.

sh spanning-tree | i ieee|from|is exec|occur

How are the HP and D-Link switches configured for STP? The following command should stop the flooding, and it should be run on all participating IGMP ports or multicast device ports; I'm not sure how you run it on HP or D-Link, though: no ip igmp snooping tcn flood

Remove the EEM applet as it's causing this alert; we got what was needed:

%PARSER-3-URLOPENFAIL: cannot open file for redirection 'File in use in an incompatible mode'

Post the output of the STP command when you can; it will tell us more about what the router sees the LAN doing at layer 2.


Hi,

I'm posting the same message again because the previous one was not well organized.

This is the STP information on the Cisco router:

Router#sh spanning-tree | i ieee|from|is exec|occur

VLAN1 is executing the ieee compatible Spanning Tree protocol Number of topology changes 2 last change occurred 3d01h ago

VLAN999 is executing the ieee compatible Spanning Tree protocol Number of topology changes 1 last change occurred 4d01h ago from FastEthernet1

I removed the EEM.

When I try to execute the command no ip igmp snooping tcn flood, I get an error near tcn. There is no tcn option. I tried no ip igmp snooping ? and it shows just vlan. What do you suggest?

As for the HP and D-Link switches: the D-Link and TP-Link are unmanaged. The HP is managed, but I kept its default config. There is just one active port mirrored for diagnostics.

Don't hesitate to ask for further info.

Thanks.

That looks OK. STP is stable at layer 2, which is what you want; if it were constantly recalculating, it would freeze the network.
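
As an illustration of how that output can be read, here is a small Python sketch that pulls the topology-change count and the age of the last change out of the summary quoted earlier in the thread. The helper names are hypothetical, and the age parser only handles the `NdNNh` form that appears in this thread.

```python
import re

# Output as posted in the thread; line breaks are not significant here.
STP_OUTPUT = """
VLAN1 is executing the ieee compatible Spanning Tree protocol
Number of topology changes 2 last change occurred 3d01h ago
VLAN999 is executing the ieee compatible Spanning Tree protocol
Number of topology changes 1 last change occurred 4d01h ago
from FastEthernet1
"""

SUMMARY = re.compile(
    r"(VLAN\d+) is executing.*?"
    r"Number of topology changes (\d+) last change occurred (\S+) ago",
    re.DOTALL,
)


def age_in_hours(age):
    """Convert an age such as '3d01h' to hours; return None for other forms."""
    match = re.fullmatch(r"(\d+)d(\d+)h", age)
    if match is None:
        return None
    return int(match.group(1)) * 24 + int(match.group(2))


def parse_stp_summary(text):
    """Return one dict per VLAN with its change count and last-change age."""
    return [
        {"vlan": vlan, "changes": int(changes), "last_change_hours": age_in_hours(age)}
        for vlan, changes, age in SUMMARY.findall(text)
    ]


for entry in parse_stp_summary(STP_OUTPUT):
    print(entry)
```

A handful of changes, with the most recent one several days old, is what a stable topology looks like. A network that keeps reconverging would instead show a high count and a last change only seconds or minutes ago.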

The IGMP should not cause the network freeze either, unless it's STP related and the whole network goes into a spin. Since it's all on one VLAN it should be OK, with no PIM required. I would be more concerned about the high volume of interrupt traffic at times, which is basically overutilization of the device. When this happens nothing gets processed by the CPU because it's hammered and running at max, and that would look like a network freeze as nothing gets passed through the router.

You say there are 32 devices connected. How many users are on the network at the same time, including the APs?

Where did you try the IGMP command? It goes under the interfaces where the multicast traffic would be sent out from. If it was on the HP or D-Link it won't work; it's a Cisco command.

If you check the logs today, do you still see the flood messages for IGMP?

Hi Mark Malone,

There are no IGMP messages in the log.

I was able to set no ip igmp snooping tcn flood on Fa3.

Regarding the devices, there are exactly 30 devices connected to the network, plus 5 users who use 2 main apps (Sonos and Savant systems). The apps send SSDP to discover devices.

I also noticed that some IGMPSN logs appear when the flood occurs.

Currently the Cisco router is at 89%/86% and no process takes more than 3%. It's a weird problem.

I don't know if this is recommended or not, but I disabled STP on the HP switch. Should I disable it on the Cisco router as well? Also, can storm-control broadcast level 40.00 affect the network? What do you recommend? Thanks.

Hi,

The CPU is high this morning; the log file is attached. I'm becoming more and more confused.

Regards,

Hi

Looking at the new logs, the router is being overutilized: there is basically too much traffic for the router to handle, so it cannot CEF-switch it and is sending traffic to the CPU, which causes the CPU to spike really high and freezes the router.

81% of the CPU is interrupt traffic, and CEF is showing it punting current traffic due to being overloaded.


Router#sh process cpu sorted | exclude 0.00%
CPU utilization for five seconds: 87%/81%; one minute: 87%; five minutes: 89%

Doc explaining interrupts
http://www.cisco.com/c/en/us/support/docs/routers/7500-series-routers/41120-highcpu-interrupts.html
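
To make the two numbers in that header line concrete: in `87%/81%`, the first figure is total CPU utilization and the second is the share spent at interrupt level, so the difference is what the scheduled processes are using. A minimal Python sketch follows; the function name is illustrative only.

```python
import re


def split_cpu(header_line):
    """Split the five-second figures of a 'show processes cpu' header line."""
    match = re.search(r"five seconds: (\d+)%/(\d+)%", header_line)
    if match is None:
        raise ValueError("not a 'show processes cpu' header line")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


line = "CPU utilization for five seconds: 87%/81%; one minute: 87%; five minutes: 89%"
print(split_cpu(line))  # {'total': 87, 'interrupt': 81, 'process': 6}
```

Applied to the other figures in this thread, 89%/86% leaves 3% for processes, which agrees with the observation that no single process took more than 3%. The 99%/0% seen in the first post is the opposite case: all of the load was at process level.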

Turning STP off is not good. It's a layer 2 loop prevention mechanism and should always be on; it prevents issues like loop storms.


You can try setting IGMP up, but I don't think that's your cause, and because the device is oversubscribed it could actually cause more problems as it's another enabled feature.


CEF can't handle the amount of traffic coming in, so it's sending it to the CPU to process, which causes interrupt traffic:


       Reason                          Drop       Punt  Punt2Host
RP LES Packet destined for us             0      27699        149
RP LES Incomplete adjacency               0          0        192
RP LES TTL expired                        0          0 1810017459
RP LES Bad IP packet length               1          0          0
RP LES Features                       19653      14471      16656
RP LES Unclassified reason            36749          0          0
RP LES Neighbor resolution req            5          6          0
RP LES Fragmentation no pak               0          0          4
RP LES Total                          56408      42176 1810034460

All    Total                          56408      42176 1810034460
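
A quick way to see which rows drive the totals in that table is to compute each reason's share of the Punt2Host column. The following Python sketch does that on the rows exactly as posted; the parser and its names are illustrative only and are not a Cisco tool.

```python
CEF_PUNT_TABLE = """\
RP LES Packet destined for us             0      27699        149
RP LES Incomplete adjacency               0          0        192
RP LES TTL expired                        0          0 1810017459
RP LES Bad IP packet length               1          0          0
RP LES Features                       19653      14471      16656
RP LES Unclassified reason            36749          0          0
RP LES Neighbor resolution req            5          6          0
RP LES Fragmentation no pak               0          0          4
RP LES Total                          56408      42176 1810034460
"""


def parse_punt_table(text):
    """Map each reason to its Drop / Punt / Punt2Host counters."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6 or fields[:2] != ["RP", "LES"]:
            continue
        drop, punt, punt2host = (int(value) for value in fields[-3:])
        reason = " ".join(fields[2:-3])  # text between "RP LES" and the counters
        rows[reason] = {"drop": drop, "punt": punt, "punt2host": punt2host}
    return rows


rows = parse_punt_table(CEF_PUNT_TABLE)
total = rows.pop("Total")

# List the reasons from largest to smallest Punt2Host counter.
for reason, counters in sorted(
    rows.items(), key=lambda item: item[1]["punt2host"], reverse=True
):
    share = counters["punt2host"] / total["punt2host"]
    print(f"{reason:<26} {counters['punt2host']:>12} {share:8.3%}")
```

Run against the table above, the "TTL expired" row accounts for more than 99.99% of the Punt2Host total, which may be worth keeping in mind when reading the totals.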


These symptoms are a case of oversubscribing the router, which will cause freezes and make the network halt when the router just can't process any more.

Storm-control is recommended; it prevents specific ports becoming unusable when there's a broadcast storm at layer 2.


My opinion, looking at everything that was captured, is that you need a better router that can process more throughput and more traffic.

These routers are small-business devices; they're not really built to handle large amounts of data or that many connected devices. The recommended maximum is 20 for that platform.

"Also I notice that some IGMPSN logs appear when the flood occur." This is what I would expect, as long as it's not always there flooding the logs.

It's always worth a shot going to a stable image for your platform, just in case it's doing this because the software's gone awry. What version are you on now in flash?

Hi Mark,

We'll replace all the switches with HP ones. As for the router, we'll migrate to another one that can support more throughput.

What type of Cisco router do you recommend for this network architecture?

My version is 15.3(3)M5.

Is there any way to limit the network traffic? I want to do this just to keep the network a little more stable until we change the switches.

I really appreciate your help, guys. Your ideas make me feel more confident about this.

Thanks.

Hey

Your switches are probably gig; if they are, they're OK, I'd say. The router is more likely the bottleneck and what's struggling to push traffic through: not enough throughput, and only FastEthernet ports, which are a bottleneck for gig switches behind it.

So the amount of bandwidth you have should determine what you need in terms of a router. It doesn't have to be an exact measurement, but you want to make sure you have enough. Do you have any idea what amount of traffic you're sending through the router? An 890 series would be twice what you have now and has gig ports, but you may want to move slightly higher depending on the volume of traffic coming through. The guide from the earlier post is good and shows what throughput to expect. Go up to something like a 1900 and you definitely should not have issues, as it's 500 Mb, but it's going to be about cost.

Yes, you can use QoS to restrict your video/multicast traffic. I'm not sure about your D-Link switches, but the Cisco/HP equipment can definitely support that. It's a matter of restricting the correct protocol or value; for video, say, you would restrict AF41 (I think, off the top of my head) as an example. You can match a subnet against the policy so anyone coming from VLAN 1 is only pushing so much video traffic and it does not kill the network.

---http://anticisco.ru/pubs/ISR_G2_Perfomance.pdf

Looks like you're on the safe harbour 15.3.3M5 version. That's already the best software available for your platform; we have it running here too on our remote backup sites, with the least amount of bugs and issues.

http://www.cisco.com/c/en/us/td/docs/ios/12_2/qos/configuration/guide/fqos_c/qcfpoli.html