Sup2T XL, 2 full BGP tables and FIB exception

wavecomas · ‎10-07-2019

Hi there,

I'm facing huge packet loss with only longer distances affected.
Within our transit providers' directly connected ASes everything works fine.
Those are all fewer than 10 routing hops away.
All longer distances are losing more than 50%.

We have a 6504-E platform with a Sup2T XL and a WS-X6904-40G linecard with WS-F6K-DFC4-EXL.

Latest recommended IOS: s2t54-adventerprisek9-mz.SPA.155-1.SY3.bin
2 connected upstreams are sending us the full BGP table. iBGP and some smaller peers total a few hundred prefixes.

Is that because they are connected to the same linecard, WS-X6904-40G?
Do I need to connect one peer to the supervisor? Or could this be a bug?

I can see a FIB exception, but the TCAM is not full. There are 847185 routes, while the platform allows up to 1017k.
Also there are no logs whatsoever, in particular no FIB_EXCEPTION_THRESHOLD exceeded message.

adala-gw#show mls cef exception status
Current IPv4 FIB exception state = TRUE
Current IPv6 FIB exception state = TRUE
Current MPLS FIB exception state = FALSE
Current EoM/VPLS FIB TCAM exception state = FALSE

adala-gw#show mls cef max

Fib-size: 1024k (1048576), shared-size: 1016k (1040384), shared-usage: 892k(914084)

Protocol Max-routes Use-shared-region Dedicated
-------- ---------- ----------------- ---------
IPV4 1017k Yes 1k
IPV4-MCAST 1017k Yes 1k
IPV6 1017k Yes 1k
IPV6-MCAST 1017k Yes 1k
MPLS 1017k Yes 1k
EoMPLS 1017k Yes 1k
VPLS-IPV4-MCAST 1017k Yes 1k
VPLS-IPV6-MCAST 1017k Yes 1k

adala-gw#show mls cef summary

Total routes: 847185
IPv4 unicast routes: 778222
IPv4 non-vrf routes: 778222
IPv4 vrf routes: 0
IPv4 multicast routes: 3
IPv6 unicast routes: 68957
IPv6 global routes: 68956
IPv6 non-vrf routes: 68956
IPv6 vrf routes: 0
IPv6 link-local routes: 1
IPv6 multicast routes: 1
mpls routes: 1
mpls-vpn routes: 0
eompls-l2 routes: 1
eom-ipv4-mcast routes: 0
eom-ipv6-mcast routes: 0
adala-gw#

adala-gw#sh mod
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 5 Supervisor Engine 2T 10GE w/ CTS (Acti VS-SUP2T-10G SAL1834Z38M
3 20 DCEF2T 4 port 40GE / 16 port 10GE WS-X6904-40G SAL1828WCJZ

Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 6c41.6a0c.7a7f to 6c41.6a0c.7a86 2.1 12.2(50r)SYS 15.5(1)SY3 Ok
3 f8c2.884b.3f60 to f8c2.884b.3f73 1.1 12.2(50r)SYL 15.5(1)SY3 Ok

Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Policy Feature Card 4 VS-F6K-PFC4XL SAL1832Y503 2.1 Ok
1 CPU Daughterboard VS-F6K-MSFC5 SAL1833YV6M 3.0 Ok
3 Distributed Forwarding Card WS-F6K-DFC4-EXL SAL1832XYPV 1.2 Ok

Mod Online Diag Status
---- -------------------
1 Pass
3 Pass
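
As a rough sanity check on the numbers above, here is a minimal Python sketch that works out how full the shared FIB region is, using only the Fib-size line from "show mls cef max". The function names are hypothetical and the script is not part of any Cisco tooling; it only does arithmetic on the pasted text.

```python
import re

# Line copied verbatim from the "show mls cef max" output in this post.
FIB_LINE = ("Fib-size: 1024k (1048576), shared-size: 1016k (1040384), "
            "shared-usage: 892k(914084)")


def parse_fib_line(line):
    """Return the exact entry counts (the numbers in parentheses) by field name."""
    pairs = re.findall(r"([\w-]+):\s*\d+k\s*\((\d+)\)", line)
    return {name.lower(): int(count) for name, count in pairs}


def shared_usage_percent(fields):
    """Percentage of the shared FIB region that is currently in use."""
    return 100.0 * fields["shared-usage"] / fields["shared-size"]


def shared_free_entries(fields):
    """Number of entries still free in the shared FIB region."""
    return fields["shared-size"] - fields["shared-usage"]


if __name__ == "__main__":
    fields = parse_fib_line(FIB_LINE)
    print(f"shared region used: {shared_usage_percent(fields):.1f}%")
    print(f"shared entries free: {shared_free_entries(fields)}")
```

With the line above this gives roughly 88% used and 126300 entries free, so by this counter alone the shared region is not exhausted at the moment the output was taken.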

Leo Laohoo · ‎10-07-2019

@wavecomas wrote:

I'm facing huge packet loss with only longer distances affected.

Define "long distances"? Are we talking about links on fibre?

Can you post the complete output of the command "sh interface <PORTS>"?

wavecomas · ‎10-09-2019

We are talking about longer routes. Everything is fine within the network of our transit partner HE.NET and to networks directly connected to HE.NET. All remote networks are seeing loss. It's not an HE.NET issue. We have other routers now running with a default route only, because I no longer accept full tables on them. But they had the same issue until they were rebooted.

Here is an example: an mtr to the Cogent website. The loss starts at our router.

1. xxxxxxxxx 60.0% 10 0.3 0.3 0.2 0.3 0.0
2. 10ge1-17.core1..he.net 0.0% 10 0.7 0.7 0.7 0.8 0.0
3. tln-b3-link.telia.net 0.0% 10 0.4 0.6 0.4 1.4 0.3
4. s-bb4-link.telia.net 0.0% 10 6.6 6.3 6.2 6.6 0.1
5. kbn-bb4-link.telia.net 30.0% 10 12.8 13.4 12.8 15.8 1.1
6. kbn-b3-link.telia.net 40.0% 10 13.1 13.3 13.1 14.3 0.5
7. hu0-5-0-0.rcr21.cph01.atlas. 40.0% 10 13.6 13.7 13.6 13.8 0.1
8. be2303.ccr41.ham01.atlas.cog 40.0% 10 23.1 23.0 22.9 23.1 0.1
9. be2797.ccr41.fra03.atlas.cog 50.0% 10 30.2 30.2 30.1 30.4 0.1
10. be2156.rcr21.b023657-1.fra03 60.0% 10 29.3 29.3 29.3 29.3 0.0
11. cogentco.com 60.0% 10 26.3 26.3 26.2 26.3 0.1

Google, on the other hand, is just fine.

[root@Centos6 ~]# mtr -r google.com

1. xxxxxxxxxs 0.0% 10 0.3 0.4 0.3 1.2 0.3
2. 10ge1-17.core1.he.net 0.0% 10 10.5 7.1 0.4 16.6 7.1
3. 100ge9-2.core1.sto1.he.net 0.0% 10 5.9 5.8 5.8 5.9 0.0
4. as15169-10g-sk1.sthix.net 0.0% 10 6.0 17.1 6.0 105.3 31.1
5. 108.170.253.161 0.0% 10 7.8 7.5 7.2 7.8 0.2
6. 209.85.242.83 0.0% 10 7.0 6.9 6.6 7.2 0.2
7. arn09s19-in-f14.1e100.net 0.0% 10 6.1 6.0 5.9 6.2 0.1

There are no errors whatsoever on the interfaces.

Roderick Groesbeek · ‎12-04-2019

If you have hit the exception state

~~

Current IPv4 FIB exception state = TRUE <---

~~

Then roughly 70% or more of the packets for some prefixes will get dropped. It probably depends on your traffic usage.

So all those prefixes will have almost 'no internet', but some older/other prefixes will still flow well.

However, the only way to get out of that exception state is a reload.

(You should probably fix the underlying issue before the reload, e.g. with route summarization, a default route, or changing the FIB allocation: platform "hardware cef maximum-routes ip ...")
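
Since no log message was seen in this case, one option is to check the exception state yourself from captured CLI output. Below is a minimal Python sketch that scans the text of "show mls cef exception status" and lists the protocols reported as TRUE. The sample text is copied from the first post; the function name is hypothetical, and how you collect the output from the router (SSH, a script, and so on) is left out.

```python
import re

# Output copied from "show mls cef exception status" earlier in this thread.
SAMPLE = """\
Current IPv4 FIB exception state = TRUE
Current IPv6 FIB exception state = TRUE
Current MPLS FIB exception state = FALSE
Current EoM/VPLS FIB TCAM exception state = FALSE
"""

# Matches lines such as "Current IPv4 FIB exception state = TRUE".
STATE_RE = re.compile(
    r"\s*Current (\S+) FIB .*exception state = (TRUE|FALSE)\s*$"
)


def protocols_in_exception(output):
    """Return the protocol names whose FIB exception state is TRUE."""
    found = []
    for line in output.splitlines():
        match = STATE_RE.match(line)
        if match and match.group(2) == "TRUE":
            found.append(match.group(1))
    return found


if __name__ == "__main__":
    hit = protocols_in_exception(SAMPLE)
    if hit:
        print("FIB exception state is TRUE for:", ", ".join(hit))
    else:
        print("No FIB exception reported.")
```

Run against the sample, this reports IPv4 and IPv6, which matches the output in the first post.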