02-05-2016 04:41 AM - edited 03-08-2019 04:29 AM
Hi to everyone,
I would like to share a problem I'm currently facing on our showroom network. The architecture is really simple, with no segmentation and a single VLAN:
An FTTH Alcatel-Lucent router acts as the modem.
A Cisco 887VA router is connected to the Alcatel-Lucent.
A managed HP 10/100 switch is connected to Cisco port FA3.
A TP-Link 10/100/1000 switch is connected to a 10/100 port on the HP switch.
A D-Link 10/100 switch is connected to the TP-Link switch.
3 TP-Link access points cover the whole house.
The HP switch has 5 devices.
The TP-Link switch has almost 20 devices.
The D-Link switch has 7 devices.
Finally, there are the users' devices (phones, iPads ...).
The main problem is that after a certain period of time the network stops responding: no internet and no communication inside the LAN. Each time, we saw the LEDs on all the switches blinking at the same time. I tried to get information from the Cisco router to find out what is causing this, and the output of "sh processes cpu sorted" shows "IP NAT AGER" at 99%/0%. Sometimes "IGMP Snooping RE" and "IGMPSN" consume 35% to 40% of the router's CPU. So I disabled IGMP snooping, and I don't know whether that is healthy for the network or not.
I'm trying to figure out the best way to find what is causing this: is it a worm? A specific protocol? IP fragmentation? ...
Just for information, this network is used for home entertainment solutions (lighting control, media control ...), and these devices need to be in the same network segment to communicate with the users' applications. Creating VLANs seems tricky to me: if I have to forward broadcasts to the other VLANs, is that any different from keeping everything in the same segment, and is it a good or bad way of doing things?
I have some networking skills but not much experience in network analysis and troubleshooting network crashes. I'm really confused and I need your recommendations and expertise, bearing in mind that most of the equipment needs broadcast to communicate with the end users.
This is my cisco router config:
Router>en
Password:
Router#sh run
Building configuration...
Current configuration : 3382 bytes
!
! Last configuration change at 11:50:07 UTC Fri Feb 5 2016
version 15.3
no service pad
service tcp-keepalives-in
service tcp-keepalives-out
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname Router
!
boot-start-marker
boot-end-marker
!
aqm-register-fnf
!
no logging console
no logging monitor
!
no aaa new-model
!
!
no ip source-route
ip options drop
!
!
!
!
!
ip dhcp excluded-address 192.168.1.1 192.168.1.50
ip dhcp excluded-address 192.168.2.1 192.168.2.50
ip dhcp excluded-address 192.168.3.1 192.168.3.50
ip dhcp excluded-address 192.168.1.254
!
ip dhcp pool USERS
import all
network 192.168.1.0 255.255.255.0
default-router 192.168.1.1
dns-server 8.8.8.8 8.8.4.4
!
ip dhcp pool SAVANT
import all
network 192.168.2.0 255.255.255.0
dns-server 8.8.8.8 8.8.4.4
default-router 192.168.2.1
!
ip dhcp pool SONOS
import all
network 192.168.3.0 255.255.255.0
default-router 192.168.3.1
dns-server 8.8.8.8 8.8.4.4
!
!
!
no ip domain lookup
ip cef
no ip igmp snooping
no ipv6 cef
!
!
!
!
!
multilink bundle-name authenticated
!
!
!
!
!
!
!
license udi pid C887VA-K9 sn FCZ191471DA
!
!
!
!
!
!
!
controller VDSL 0
!
!
!
!
!
!
!
!
!
!
!
interface ATM0
no ip address
shutdown
no atm ilmi-keepalive
!
interface Ethernet0
no ip address
shutdown
!
interface FastEthernet0
no ip address
!
interface FastEthernet1
switchport access vlan 999
no ip address
!
interface FastEthernet2
switchport access vlan 3
no ip address
!
interface FastEthernet3
no ip address
storm-control broadcast level 40.00
storm-control multicast level 40.00
!
interface Vlan1
ip address 192.168.1.1 255.255.255.0
ip flow ingress
ip nat inside
ip virtual-reassembly in
ip policy route-map CLEAR_DF
!
interface Vlan2
ip address 192.168.2.1 255.255.255.0
ip nat inside
ip virtual-reassembly in
!
interface Vlan3
ip address 192.168.3.1 255.255.255.0
ip nat inside
ip virtual-reassembly in
!
interface Vlan999
no ip address
ip virtual-reassembly in
pppoe enable group global
pppoe-client dial-pool-number 1
!
interface Dialer1
ip address negotiated
ip mtu 1492
ip nat outside
ip virtual-reassembly in max-fragments 64 max-reassemblies 1024
encapsulation ppp
dialer pool 1
ppp authentication chap pap callin
ppp chap hostname xxx
ppp chap password 7 xxx
ppp pap sent-username xxx password 7 xxx
!
ip forward-protocol nd
no ip http server
no ip http secure-server
!
!
ip nat translation tcp-timeout 300
ip nat translation udp-timeout 600
ip nat translation max-entries 200
ip nat inside source list FTTH_WAN interface Dialer1 overload
ip route 0.0.0.0 0.0.0.0 Dialer1
!
ip access-list extended FTTH_WAN
permit ip host 0.0.0.0 any
permit ip 192.168.1.0 0.0.0.255 any
!
!
route-map CLEAR_DF permit 10
match ip address TCP
set ip df 0
!
!
control-plane
!
!
!
mgcp behavior rsip-range tgcp-only
mgcp behavior comedia-role none
mgcp behavior comedia-check-media-src disable
mgcp behavior comedia-sdp-force disable
!
mgcp profile default
!
!
!
!
!
line con 0
password 7 xxxx
login
no modem enable
line aux 0
line vty 0 4
password 7 xxx
login
transport input all
!
scheduler allocate 20000 1000
!
end
02-05-2016 07:59 AM
Hey, a couple of things I don't get: you say 1 VLAN but I see multiple configured, and is there a specific reason you're binding the PPPoE to VLAN 999?
Tighten up the NAT ACL; using 0.0.0.0 like that is not going to help the CPU.
Use this if only VLAN 1 is being NATted; if there are more, specify them exactly, and don't use 0.0.0.0:
ip access-list extended FTTH_WAN
permit ip 192.168.1.0 0.0.0.255 any
I have only ever seen this used when a GRE tunnel is in place. I don't see the requirement on a standard PPPoE setup, so I would remove it:
route-map CLEAR_DF permit 10
match ip address TCP
set ip df 0
This doesn't look right. The pppoe lines below should be put on the interface attached directly to the Alcatel, with ip nat outside on that interface rather than on the dialer, and don't use a VLAN interface:
interface Vlan999
no ip address
ip virtual-reassembly in
pppoe enable group global
pppoe-client dial-pool-number 1
Use:
interface FastEthernet1
ip nat outside
pppoe enable group global
pppoe-client dial-pool-number 1
On your dialer you're missing this, to avoid fragmentation:
ip tcp adjust-mss 1452
Remove int vlan 2 and 3 if you're not using them.
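For reference, the 1452 value follows from the dialer's ip mtu 1492. A minimal sketch of the arithmetic, assuming plain IPv4 and TCP headers with no options (the function name is made up for illustration):

```python
# Sketch: derive the TCP MSS clamp from the PPPoE dialer MTU.
# Assumes 20-byte IPv4 and 20-byte TCP headers (no options).
IPV4_HEADER_BYTES = 20
TCP_HEADER_BYTES = 20
PPPOE_OVERHEAD_BYTES = 8  # 6-byte PPPoE header + 2-byte PPP protocol field


def mss_for_mtu(ip_mtu: int) -> int:
    """Largest TCP payload that fits in one unfragmented packet of ip_mtu bytes."""
    return ip_mtu - IPV4_HEADER_BYTES - TCP_HEADER_BYTES


# A 1500-byte Ethernet payload minus the PPPoE/PPP overhead gives the dialer MTU.
pppoe_mtu = 1500 - PPPOE_OVERHEAD_BYTES
print(pppoe_mtu, mss_for_mtu(pppoe_mtu))  # 1492 1452
```

If the provider uses a different MTU, the same subtraction gives the matching MSS value.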
***********************************************************
Now, if that does not fix the CPU issues, we need to see why. CPU issues must be caught in real time to help identify the cause. This is an EEM script I use that will collect data and write it to flash for you.
Run this and leave it until the CPU hits 80%; it will collect the output below. Then retrieve the text file from flash and we can take a look.
event manager applet High_CPU
event snmp oid 1.3.6.1.4.1.9.9.109.1.1.1.1.4.1 get-type exact entry-op ge entry-val "80" exit-time 10 poll-interval 5
action 0.1 syslog msg "CPU Utilization is high"
action 0.2 cli command "enable"
action 0.4 cli command "show log | append flash:CPU_Profile.txt"
action 0.5 cli command "show process cpu sorted | append flash:CPU_Profile.txt"
action 0.6 cli command "show interfaces | append flash:CPU_Profile.txt"
action 0.7 cli command "show ip cef switching stat | append flash:CPU_Profile.txt"
action 0.8 cli command "show ip traffic | append flash:CPU_Profile.txt"
action 0.9 cli command "show int switching | append flash:CPU_Profile.txt"
action 1.0 cli command "no event manager applet High_CPU"
action 1.1 cli command "end"
Once the output has been written to flash, do no event manager applet High_CPU; you don't want it to keep running.
IGMP maps multicast streams from host to router. Have you specifically set IGMP groups and multicast for video traffic on all switches, or just on the Cisco?
Let me know how you get on with that. This doc shows some tips and explains the PPPoE setup too:
http://www.cisco.com/c/en/us/td/docs/routers/access/800/software/configuration/guide/SCG800Guide/SCG800_Guide_BookMap_chapter_01010.html#con_1052857
02-06-2016 11:26 AM
Hi Mark Malone,
I really appreciate your detailed reply.
Yes, I had three VLANs. I removed them physically but not from the Cisco router config; I will remove them there too.
Thanks for the EEM script, it will help us a lot with diagnostics.
Regarding your question: IGMP maps multicast streams host to router have you specifically set igmp groups and multicast for video traffic on all switches or just cisco ?
Answer: Just Cisco router.
I will run the EEM script and post the output here when my LAN crashes.
Stay tuned.
Thanks brothers.
02-08-2016 12:35 AM
Hey, yes, when you can, send on the EEM capture and an updated copy of the config, with the multicast configuration if you have it.
Just thinking: when the issue occurs, have you first tried rebooting only the HP or the TP-Link, in case one of them has seized up? If rebooting just one of them resolves the issue, that device may be the cause, and you avoid rebooting the whole network.
02-08-2016 02:21 AM
Hi,
Attached is the whole file captured during high CPU utilization. This was during the weekend.
I forgot to mention that, in an attempt to prevent this crash, we had removed a segment from the network to make it stable. This last weekend the client wanted that portion of the network back, so we reconnected it, and it may have been the main cause. This is just for info.
Actually, I don't understand your request regarding the multicast configuration. Could you be more specific, please?
I couldn't configure fa0 as the WAN interface and remove the dialer pool from VLAN 999 because I wasn't on site at the time. I'll do it later on.
This is the cisco config:
Building configuration...
Current configuration : 3501 bytes
!
! Last configuration change at 09:55:28 UTC Mon Feb 8 2016
version 15.3
no service pad
service tcp-keepalives-in
service tcp-keepalives-out
service timestamps debug datetime msec
service timestamps log datetime msec
service password-encryption
!
hostname Router
!
boot-start-marker
boot-end-marker
!
aqm-register-fnf
!
no logging console
no logging monitor
!
no aaa new-model
!
!
no ip source-route
ip options drop
!
!
!
!
!
ip dhcp excluded-address 192.168.1.1 192.168.1.50
ip dhcp excluded-address 192.168.2.1 192.168.2.50
ip dhcp excluded-address 192.168.3.1 192.168.3.50
ip dhcp excluded-address 192.168.1.254
!
ip dhcp pool USERS
import all
network 192.168.1.0 255.255.255.0
default-router 192.168.1.1
dns-server 212.217.1.1 8.8.8.8
!
ip dhcp pool SAVANT
import all
network 192.168.2.0 255.255.255.0
dns-server 8.8.8.8 8.8.4.4
default-router 192.168.2.1
!
ip dhcp pool SONOS
import all
network 192.168.3.0 255.255.255.0
default-router 192.168.3.1
dns-server 8.8.8.8 8.8.4.4
!
!
!
no ip domain lookup
ip cef
no ip igmp snooping
no ipv6 cef
!
!
!
!
!
multilink bundle-name authenticated
!
!
!
!
!
!
!
license udi pid C887VA-K9 sn FCZ191471DA
!
!
!
!
!
!
!
controller VDSL 0
!
!
!
!
!
!
!
!
!
!
!
interface ATM0
no ip address
shutdown
no atm ilmi-keepalive
!
interface Ethernet0
no ip address
shutdown
!
interface FastEthernet0
no ip address
!
interface FastEthernet1
switchport access vlan 999
no ip address
!
interface FastEthernet2
no ip address
!
interface FastEthernet3
no ip address
storm-control broadcast level 40.00
storm-control multicast level 40.00
!
interface Vlan1
ip address 192.168.1.1 255.255.255.0
ip flow ingress
ip nat inside
ip virtual-reassembly in
!
interface Vlan2
no ip address
!
interface Vlan3
no ip address
!
interface Vlan999
no ip address
ip virtual-reassembly in
pppoe enable group global
pppoe-client dial-pool-number 1
!
interface Dialer1
ip address negotiated
ip mtu 1492
ip nat outside
ip virtual-reassembly in max-fragments 64 max-reassemblies 1024
encapsulation ppp
ip tcp adjust-mss 1452
dialer pool 1
ppp authentication chap pap callin
ppp chap hostname xxxxxx
ppp chap password 7 xxxxxx
ppp pap sent-username xxxxx password 7 xxxxxx
!
no ip forward-protocol nd
no ip http server
no ip http secure-server
!
!
ip nat translation tcp-timeout 300
ip nat translation udp-timeout 600
ip nat translation max-entries 200
ip nat inside source list FTTH_WAN interface Dialer1 overload
ip nat inside source static tcp 192.168.1.110 5001 interface Dialer1 5001
ip nat inside source static udp 192.168.1.110 5001 interface Dialer1 5001
ip nat inside source static udp 192.168.1.110 85 interface Dialer1 85
ip nat inside source static tcp 192.168.1.110 85 interface Dialer1 85
ip nat inside source static tcp 192.168.1.110 5006 interface Dialer1 5006
ip nat inside source static udp 192.168.1.110 5006 interface Dialer1 5006
ip route 0.0.0.0 0.0.0.0 Dialer1
!
ip access-list extended FTTH_WAN
permit ip 192.168.1.0 0.0.0.255 any
!
!
!
control-plane
!
!
!
mgcp behavior rsip-range tgcp-only
mgcp behavior comedia-role none
mgcp behavior comedia-check-media-src disable
mgcp behavior comedia-sdp-force disable
!
mgcp profile default
!
!
!
!
!
line con 0
password 7 xxxxx
login
no modem enable
line aux 0
line vty 0 4
password 7 xxxxx
login
transport input all
!
scheduler allocate 20000 1000
!
end
Thanks a lot for your help.
02-08-2016 03:26 AM
Hi
So the first thing I see is a huge volume of interrupt traffic. This occurs either when CEF is not enabled at layer 3 and the router is processing everything in the CPU, or when the router is overutilized and there is too much traffic going through it to handle. Bear in mind throughput is very low on these routers; they are small-business devices and the max is only supposed to be 20 users. Looking at what you describe above, that's way more than recommended.
http://anticisco.ru/pubs/ISR_G2_Perfomance.pdf
http://www.cisco.com/c/en/us/products/collateral/routers/800-series-routers/data_sheet_c78-613481.html
Then there's another issue: I see flooding IGMP traffic, but it looks to be STP related, maybe. Can you run this command on the router? It will show whether you are having constant STP changes in your network, which would cause halts to traffic and make it look like the network has seized while the changes take place, or whether it's actual flooded IGMP traffic.
sh spanning-tree | i ieee|from|is exec|occur
How are the HP and D-Link switches configured for STP? This command should stop it, and should be run on all participating IGMP ports or multicast device ports; not sure how you run it on HP or D-Link though: no ip igmp snooping tcn flood
Remove the EEM as it's causing this alert; we got what was needed:
%PARSER-3-URLOPENFAIL: cannot open file for redirection 'File in use in an incompatible mode'
Post the STP command output when you can and it will tell us more about what the router sees the LAN doing at layer 2.
02-08-2016 05:39 AM
02-08-2016 06:50 AM
Hi,
I'm posting the same message again because the previous one was not well organized.
This is the STP information on the Cisco router:
Router#sh spanning-tree | i ieee|from|is exec|occur
VLAN1 is executing the ieee compatible Spanning Tree protocol Number of topology changes 2 last change occurred 3d01h ago
VLAN999 is executing the ieee compatible Spanning Tree protocol Number of topology changes 1 last change occurred 4d01h ago from FastEthernet1
I removed the EEM.
When I try to execute the command no ip igmp snooping tcn flood, I get an error near tcn. There is no tcn option. I tried no ip igmp snooping ? and it shows just vlan. What do you suggest?
As for the HP and D-Link switches: the D-Link and TP-Link are unmanaged. The HP is managed, but I left its config at the defaults. There is just one active port, mirrored for diagnostics.
Don't hesitate to ask for further info.
Thanks.
02-08-2016 07:36 AM
That looks OK. STP is stable at layer 2, and that's what you want; if it were constantly recalculating, it would freeze the network.
The IGMP should not cause the network freeze either, unless it's STP related and the whole network goes into a spin. The fact that it's all on one VLAN should be OK, with no PIM required. I would be more concerned about the high volume of interrupt traffic at times, which is basically overutilization of the device. When this happens nothing gets processed in the CPU because it's hammered and running at max, and that would look like a network freeze as nothing gets passed through the router.
You say there are 32 devices connected. How many users are on the network at the same time, including the APs?
Where did you try the IGMP command? It goes under the interfaces where the multicast traffic would be sent out from. If it was on the HP or D-Link it won't work; it's a Cisco command.
02-08-2016 07:37 AM
If you check the logs today, do you still see the flood messages for IGMP?
02-09-2016 01:38 AM
Hi Mark Malone,
No IGMP messages in the log.
I was able to set no ip igmp snooping tcn flood on FA3.
Regarding the devices, there are exactly 30 devices connected to the network, plus 5 users who use 2 main apps (SONOS + Savant systems). These send SSDP to discover devices.
Also, I noticed that some IGMPSN logs appear when the flood occurs.
Currently the Cisco router is at 89%/86% and no process takes more than 3%. It's a weird problem.
I don't know if this is recommended or not, but I disabled STP on the HP switch. Should I disable it on the Cisco router as well? Also, can storm-control broadcast level 40.00 affect the network? What do you recommend? Thanks.
02-09-2016 01:49 AM
02-09-2016 02:14 AM
Hi
Looking at the new logs, the router is being overutilized: basically too much traffic for it to handle, so it cannot CEF-switch the traffic and is sending it to the CPU, which causes the CPU to spike really high and freezes the router.
81% is interrupt traffic sent to the CPU, and CEF is showing it punting current traffic because it is overloaded:
Router#sh process cpu sorted | exclude 0.00%
CPU utilization for five seconds: 87%/81%; one minute: 87%; five minutes: 89%
Doc explaining interrupts
http://www.cisco.com/c/en/us/support/docs/routers/7500-series-routers/41120-highcpu-interrupts.html
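On reading that line: in "87%/81%" the first figure is total CPU and the second is the part spent at interrupt level, so only the difference is available to processes. That is also why no single process shows more than a few percent. A small sketch that splits the figures, assuming the line format shown above (the helper name is hypothetical):

```python
import re

# Matches the five-second figures, e.g. "five seconds: 87%/81%".
CPU_LINE = re.compile(r"five seconds:\s*(\d+)%/(\d+)%")


def split_cpu(line: str) -> dict:
    """Split the five-second CPU figures into total, interrupt and process level."""
    match = CPU_LINE.search(line)
    if match is None:
        raise ValueError("no five-second CPU figures found")
    total, interrupt = int(match.group(1)), int(match.group(2))
    return {"total": total, "interrupt": interrupt, "process": total - interrupt}


sample = "CPU utilization for five seconds: 87%/81%; one minute: 87%; five minutes: 89%"
print(split_cpu(sample))  # {'total': 87, 'interrupt': 81, 'process': 6}
```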
Turning STP off is not good. It's a layer 2 loop prevention mechanism and should always be on; it prevents issues like loop storms.
You can try setting IGMP up, but I don't think that's your cause, and because the device is oversubscribed it could actually cause more problems, as it's another enabled feature.
CEF can't handle the amount of traffic coming in, so it's sending it to the CPU to process, which causes the interrupt traffic:
Reason                              Drop    Punt   Punt2Host
RP LES Packet destined for us          0   27699         149
RP LES Incomplete adjacency            0       0         192
RP LES TTL expired                     0       0  1810017459
RP LES Bad IP packet length            1       0           0
RP LES Features                    19653   14471       16656
RP LES Unclassified reason         36749       0           0
RP LES Neighbor resolution req         5       6           0
RP LES Fragmentation no pak            0       0           4
RP LES Total                       56408   42176  1810034460
All    Total                       56408   42176  1810034460
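To see which reason accounts for most of the Punt2Host count, the table can be summarized with a short script. This is only a sketch over the counters pasted above; they are presumably cumulative, and the thread does not establish what is generating those packets.

```python
# Rows copied from the table above: reason, Drop, Punt, Punt2Host.
PUNT_TABLE = """\
RP LES Packet destined for us 0 27699 149
RP LES Incomplete adjacency 0 0 192
RP LES TTL expired 0 0 1810017459
RP LES Bad IP packet length 1 0 0
RP LES Features 19653 14471 16656
RP LES Unclassified reason 36749 0 0
RP LES Neighbor resolution req 5 6 0
RP LES Fragmentation no pak 0 0 4
"""


def parse_rows(text: str) -> dict:
    """Map each reason to a (drop, punt, punt2host) tuple of integers."""
    rows = {}
    for line in text.strip().splitlines():
        # The last three whitespace-separated fields are the counters.
        reason, drop, punt, punt2host = line.rsplit(None, 3)
        rows[reason] = (int(drop), int(punt), int(punt2host))
    return rows


rows = parse_rows(PUNT_TABLE)
total_punt2host = sum(counts[2] for counts in rows.values())
top_reason = max(rows, key=lambda reason: rows[reason][2])
share = rows[top_reason][2] / total_punt2host
print(top_reason, total_punt2host, f"{share:.1%}")
```

The computed total matches the "RP LES Total" row, which is a quick check that the rows were copied correctly.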
These symptoms are a case of oversubscribing the router, which will cause freezes and make the network halt when the router just can't process any more.
Storm-control is recommended; it prevents specific ports becoming unusable when there's a broadcast storm at layer 2.
My opinion, looking at everything that was captured, is that you need a better router that can handle more throughput and more traffic.
These routers are small-business devices; they're not really built to handle large amounts of data or that many devices. The recommended max is 20 for that platform.
"Also I notice that some IGMPSN logs appear when the flood occur." This is what I would expect, as long as it's not always there flooding the logs.
It's always worth a shot going to a stable image for your platform, just in case it's doing this because the software has gone awry. What version are you on now in flash?
02-09-2016 04:05 AM
Hi Mark,
We'll replace all the switches with HP. As for the router, we'll migrate to another one that can support more throughput.
What type of Cisco router do you recommend for this network architecture?
My version is 15.3(3)M5.
Is there any way to limit the network traffic? I want to do this just to keep the network a bit more stable until we change the switches.
I really appreciate your help, guys; your ideas make me feel more confident about this.
Thanks.
02-09-2016 05:27 AM
Hey
Your switches are probably gig; if they are, they're OK, I'd say. The router is more likely the bottleneck and what's struggling to push traffic through: not enough throughput, and only FastEthernet ports, which would be a bottleneck to gig switches behind it.
So your amount of bandwidth should determine what you need in terms of a router. It doesn't have to be an exact measurement, but you want to make sure you have enough. Do you have any idea how much traffic you're sending through the router? An 890 series would be twice what you have now and has gig ports, but you may want to move slightly higher depending on the volume of traffic coming through. The guide from the earlier post is good and shows what throughput to expect. Go up to something like a 1900 and you definitely should not have issues, as it's 500mb, but it's going to be about cost.
Yes, you can use QoS to restrict your video/multicast traffic. Not sure about your D-Link switches, but the Cisco/HP equipment can definitely support that. It's a matter of restricting the correct protocol or value; for video, say, you would restrict AF41 (off the top of my head, as an example). You can match a subnet against the policy so anyone coming from VLAN 1 can only push so much video traffic and it does not kill the network.
http://anticisco.ru/pubs/ISR_G2_Perfomance.pdf
Looks like you're on the safe harbour version, 15.3(3)M5. That's already the best software available for your platform; we have it running here too on our remote backup sites, with the least amount of bugs and issues.
http://www.cisco.com/c/en/us/td/docs/ios/12_2/qos/configuration/guide/fqos_c/qcfpoli.html