on 02-12-2014 04:38 AM
This document is a guide to troubleshooting common issues on satellites (ASR9000v/ASR901) connected to ASR9000 and ASR9900 series routers.
The Discovery protocol is the lowest level protocol in satellite bringup, and provides the bootstrap mechanism by which a Satellite and Host begin communicating. The purpose of the Discovery protocol is for the Host to become aware of a reachable Satellite device, and exchange sufficient information to set up (bootstrap) a full Control session. The Host device initiates discovery. Satellite devices are factory-shipped to listen for incoming Discovery probe packets.
The protocol operates at Layer 2. It is a prerequisite that the devices be provided with L2 connectivity, either directly, or via intermediate nodes that forward the Discovery protocol packets.
RP/0/RSP1/CPU0:ASR9K-PE-Agg-1#sh nv satellite status satellite 300
State: Discovery halted; Conflict: no Identification received yet
Type: asr9000v
IPv4 address: 12.23.54.230
Configured satellite fabric links:
Bundle-Ether300
State: Discovery halted; Conflict: no Identification received yet
Port range: GigabitEthernet0/0/0-39
Discovered satellite fabric links:
TenGigE0/1/0/19: Discovery halted; no Identification received yet
TenGigE0/1/0/9: Discovery halted; no Identification received yet
A state of “Probing for Satellites; Conflict: no Identification received yet” means that the Host is sending DPM probes to the satellite and waiting for a response. Either the satellite is not responding to the discovery probes from the host, or the host is dropping the discovery replies from the satellite.
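When several fabric links are involved, it can help to pick out the links that have not completed discovery from a saved copy of the status output. The following is only a minimal sketch: it assumes the "Discovered satellite fabric links" section has the `<interface>: <state>; <conflict>` layout shown above, and that a healthy link reports "Satellite Ready" as in the sample output later in this document. The function name `links_not_ready` is hypothetical and is not part of any Cisco tooling.

```python
import re

# Matches lines such as "TenGigE0/1/0/19: Discovery halted; no Identification ..."
# and tolerates a stray space after the first number ("TenGigE0 /1/0/19").
LINK_RE = re.compile(r"^([A-Za-z-]+\d+\s?(?:/\d+)+):\s*([^;]+)")

def links_not_ready(status_text):
    """Return (interface, state) pairs for discovered links not in 'Satellite Ready'."""
    results = []
    in_section = False
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Discovered satellite fabric links"):
            in_section = True
            continue
        if not in_section:
            continue
        match = LINK_RE.match(line)
        if match is None:
            continue
        name = match.group(1).replace(" ", "")
        state = match.group(2).strip()
        if state != "Satellite Ready":
            results.append((name, state))
    return results

SAMPLE = """\
Discovered satellite fabric links:
TenGigE0/1/0/19: Discovery halted; no Identification received yet
TenGigE0/1/0/9: Discovery halted; no Identification received yet
"""

for name, state in links_not_ready(SAMPLE):
    print(name, "->", state)
```

Any link reported by this sketch is a candidate for the NP counter and UIDB checks that follow.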
Check the NP counters of the ICL port and look for any drops.
If the IN_SATELLITE_DISCOVERY_DISCARD NP counter is increasing:
show uidb data location 0/0/CPU0 tenGigE 0/0/1/0 ingress
Satellite IC interface 0x1
sh ethernet infra internal ea trunks tenGigE 0/14/0/3 location 0/14/CPU0 --> main interfaces
sh ethernet infra internal ea subs tenGigE 0/14/0/3.1 location 0/14/CPU0 --> for an ICL that is a sub-interface (in case of L2 fabric)
is_in_icl_mode:
sh ethernet infra internal ether-ma trunks tenGigE 0/14/0/3 location 0/14/CPU0 (use subs for sub-interfaces)
is_in_icl_mode: 1
Note: In case of a bundle ICL, the Satellite IC bit is set for the ICL bundle interface but not for the ICL bundle members.
Please collect the below traces and show tech for the above scenarios for further debugging:
Log in to the BCM shell with the command below:
LC:Satellite#test bcm shell (for 9000v)
show c xe0, where xe0-xe3 correspond to TenGigE 1/45 to TenGigE 1/48
LC:Satellite#test platform bcm shell (for 901)
show c ge2 or show c ge3, where ge2 is Gig 0/10 and ge3 is Gig 0/11
BCM.0> show c xe0
RUC.xe0 : 87 +30 5/s
RDBGC0.xe0 : 2 +1 -------------> RDBGC0 counter- this counter represents drops
RDBGC2.xe0 : 15 +4
If the satellite is dropping the discovery packets, please collect the below outputs.
show satellite discovery - to check the in/out discovery counters
show satellite nv-info
show satellite iclid_info active
show satellite iclid_info dump
sh monitor event-trace satellite discovery all - for trace logs
show satellite swsb <icl port> [ ICL port ranges from 45-48]
4. From BCM shell:
4.1 fp show entry 1
4.2 fp show entry <entryid> [ The entry id can be obtained from the 3.2 command; check for “Probe FP id” ]
4.3 show c [ execute 3-4 times at interval of 2 secs]
4.4 vlan show 4094
If the issue is reproducible, enable the debugs below and recreate the issue:
debug sdac discovery all
* first start with the below SDAC debugs
debug sdac discovery discovery info
debug sdac discovery discovery error
debug sdac discovery discovery notification
debug sdac discovery discovery state-machine
* If there are no errors or clues from the above debugs, then add the following PD debugs:
debug satellite discovery info
debug satellite discovery error
* To take a look at the raw packet contents, use the following debugs
PI - debug sdac discovery packet
901 PD - debug satellite control misc
The Control Protocol encompasses the remaining steps of Satellite Bringup. It makes use of the connectivity set up by the discovery protocol to provide a reliable and extensible mechanism for the Host device and each satellite device to exchange the required configuration, state, etc.
Check if image transfer was successful
Image transfer typically takes about 3-5 minutes for the 9000v and 4-6 minutes for the 901. The following logs appear on the console once the transfer starts:
RP/0/RSP0/CPU0:dor1101eiuc202#install nv satellite 9002 transfer progress
Mon Dec 2 16:32:08.652 CET
Install Op 6: transfer: 9002
1 configured satellite has been specified for transfer.
1 satellite has successfully initiated transfer.
Press Ctrl+C at any time to stop displaying the progress bar.
| Working...
RP/0/RSP0/CPU0:Dec 2 16:33:05.676 : icpe_satmgr[1164]: %PKT_INFRA-ICPE_GCO-6-TRANSFER_DONE : Image transfer completed on Satellite 9002 Completed.
In the log above, the image transfer completed in about 1 minute. Because this is well below the typical transfer time, it indicates that the image transfer itself failed.
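The timing check above can be sketched in a few lines of Python. This is only an illustration: the thresholds are the lower bounds of the typical transfer times quoted in this document (3 minutes for the 9000v, 4 minutes for the 901), both timestamps are assumed to fall on the same day, and the helper names and the choice of "asr9000v" for the example are assumptions made for this sketch.

```python
from datetime import datetime

# Lower bounds of the typical transfer times quoted in this document (seconds).
TYPICAL_MIN_SECONDS = {"asr9000v": 3 * 60, "asr901": 4 * 60}

def transfer_seconds(start, end, fmt="%H:%M:%S.%f"):
    """Elapsed seconds between two same-day log timestamps."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()

def looks_too_fast(start, end, sat_type):
    """True when the transfer finished faster than the typical minimum."""
    return transfer_seconds(start, end) < TYPICAL_MIN_SECONDS[sat_type]

# Timestamps taken from the console log above.
elapsed = transfer_seconds("16:32:08.652", "16:33:05.676")
print(f"elapsed: {elapsed:.3f} s")
print("suspiciously fast:", looks_too_fast("16:32:08.652", "16:33:05.676", "asr9000v"))
```

A transfer flagged this way should be verified on the satellite itself before activation is attempted.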
Activation Process
Once the activation is started, the satellite goes for a reload and comes up with the newly transferred image.
Note: For Host Upgrade from pre 511 image to 511 image
If image transfer to the 9000v or 901 satellite fails, please check the below:
control-plane
management-plane
inband
Bundle-Ether102
allow TFTP
hostname Host1
tftp vrf default ipv4 server homedir disk0:
If the TFTP transfer requests from the satellite arrive on the default VRF (through manual IP configuration) and the TFTP home directory configured on the host is disk0:, the image transfer request will fail because tftp_fs will try to read from the disk0:/ path. See DDTS CSCuj02716 for more details.
Remove the tftp homedir configuration and retry the transfer.
Show cinetd services
RP/0/RSP0/CPU0:R3#show cinetd services
Vrf Name Family Service Proto Port ACL max_cnt curr_cnt wait Program Client Option
default v4 telnet tcp 23 10 0 nowait telnet sysdb
default v4 tftp udp 69 unlimited 0 wait tftpd icpe-cpm /pkg/fpd-nv/ <<< ICPE started the service and shows as the client. The output on your host should also show a tftp service started by icpe-cpm.
If the tftp service is not running for icpe-cpm, collect "show tech satellite" and console logs after enabling the following debugs on the host for further debugging:
debug cinetd
debug tftp server
debug nv satellite internal cpm tftp verbose
Workaround: process restart icpe_cpm
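The check on the "show cinetd services" output described above can be automated against a saved copy of the output. The sketch below is a rough illustration only: it assumes whitespace-separated columns in the order shown in the sample (Vrf, Family, Service, ...), and the function name `tftp_started_by_icpe` is hypothetical.

```python
def tftp_started_by_icpe(cinetd_output, client="icpe-cpm"):
    """Return True if a tftp service row lists the given client."""
    for line in cinetd_output.splitlines():
        fields = line.split()
        # The third column is the service name in the sample output.
        if len(fields) >= 3 and fields[2] == "tftp" and client in fields:
            return True
    return False

SAMPLE = """\
Vrf Name Family Service Proto Port ACL max_cnt curr_cnt wait Program Client Option
default v4 telnet tcp 23 10 0 nowait telnet sysdb
default v4 tftp udp 69 unlimited 0 wait tftpd icpe-cpm /pkg/fpd-nv/
"""

print("tftp served for icpe-cpm:", tftp_started_by_icpe(SAMPLE))
```

If the function returns False for your output, proceed with the debugs and the workaround listed above.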
Image Transfer is successful but Activation says “not latest version”.
9000v
| XR Release | FCS or SMU | Image Type | Image Version | Notes |
| 4.2.1 | FCS | IOS / Kernel | 202.0 (151-3.SVA) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 4.2.3 | FCS | IOS / Kernel | 210 (151-3.SVB) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCuc59715 | IOS / Kernel | 211 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCty86900 | IOS / Kernel | 212 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCul09549 | IOS / Kernel | 213 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 4.3.0 | FCS | IOS / Kernel | 252 (151-3.SVC) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 4.3.1 | FCS | IOS / Kernel | 276 (151-3.SVD) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCuj97259 | IOS / Kernel | 277 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCui77863 | IOS / Kernel | 278 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| | CSCuj97259 | IOS / Kernel | 279 | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 4.3.2 | | IOS / Kernel | 285 (151-3.SVF) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 4.3.4 | | IOS / Kernel | 287 (151-3.SVFa) | May say 285 available, this is wrong |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 5.1.0 | | IOS / Kernel | 292 (151-3.SVE) | |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
| 5.1.1 | | IOS / Kernel | 322.6 (151-3.SVG) | |
| | | ROMMON | 126 | In order to use the 5.1.1 features a satellite must run this version |
| | | FPGA | 1.13 | |
| 5.1.2 | | IOS / Kernel | 327 (151-3.SVG2) | |
| | | ROMMON | 127 | |
| | | FPGA | 1.13 | |
| 5.2.0 | | IOS / Kernel | 353 (151-3.SVH) | |
| | | ROMMON | 127 | |
| | | FPGA | 1.13 | |
901
| XR Release | FCS or SMU | Image Type | Image Version | Notes |
| 4.3.0 | FCS | IOS / Kernel | 1212.1 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 4.3.1 | FCS | IOS / Kernel | 1304.23 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 4.3.2 | FCS | IOS / Kernel | 1308.18 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 4.3.4 | FCS | IOS / Kernel | 1312.06 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 5.1.0 | FCS | IOS / Kernel | 1308.18 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 5.1.1 | FCS | IOS / Kernel | 1401.13 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 5.1.2 | | IOS / Kernel | 1310.03? | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
| 5.2.0 | FCS | IOS / Kernel | 1406.12 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
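When activation reports "not latest version", the version tables above can be used as a lookup. The sketch below encodes only the base (non-SMU) 9000v IOS / Kernel versions listed in the 9000v table and compares them with the "IOS:" line of "show nv satellite status". The dictionary and function names are assumptions made for this illustration, and a host with one of the listed SMUs installed would need the corresponding SMU value instead.

```python
import re

# Base (non-SMU) 9000v IOS / Kernel versions from the table above.
EXPECTED_9000V_IOS = {
    "4.2.1": 202.0, "4.2.3": 210.0, "4.3.0": 252.0, "4.3.1": 276.0,
    "4.3.2": 285.0, "4.3.4": 287.0, "5.1.0": 292.0, "5.1.1": 322.6,
    "5.1.2": 327.0, "5.2.0": 353.0,
}

def ios_matches_release(status_line, xr_release):
    """Compare the running IOS version in a status line with the table value."""
    match = re.search(r"IOS:\s*([\d.]+)", status_line)
    if match is None:
        raise ValueError("no 'IOS:' version found in line")
    return float(match.group(1)) == EXPECTED_9000V_IOS[xr_release]

# Status line format as printed by "show nv satellite status".
line = "IOS: 287.0 (Available: 285.0)"
print("matches 4.3.4:", ios_matches_release(line, "4.3.4"))
print("matches 4.3.2:", ios_matches_release(line, "4.3.2"))
```

For 4.3.4 this reflects the note in the table: the satellite runs 287 even though the host may advertise 285 as available.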
Image Transfer Behavior (5.1.3/5.2.2 and pre-5.1.3 releases)
Pre 5.1.3 release:
Observation:
There is no retry mechanism and satellite returns success to the Host. "show nv satellite status" indicates "Transfer complete".
Recovery:
Correct the misconfig and re-transfer.
To verify if image is really transferred:
The "show nv satellite status" shows that some newer/older version is still available to be transferred.
The user can also telnet to the satellite and do a show version.
Release 5.1.3 and 5.2.2
9000v:
Observation:
The satellite is stuck in the "Transferring New Image" state. If you used the progress option, your telnet session can hang; press Ctrl+C to exit.
Recovery:
For image upgrade sessions that remain in "Transferring New Image" state for more than five minutes, reattempt image transfer by restarting process icpe_satmgr, and executing the right transfer commands. Before that, ensure configuration is correct.
901:
Same behaviour as Pre 5.1.3
LC:Satellite#sh satellite crosslink tenGigabitEthernet 1/45
Interconnect Link: TenGigabitEthernet1/45 xos_if: 5
icl_id: 1 host_id: 1 cp_vlan: 0
Link State: Up SDAC State: Discovered Crosslink State: Up
---------------------------------------------------------------
Access Port ICLID Link State ForcedDown ? TxDisabled ?
---------------------------------------------------------------
Gi1/1 1 Up No No
Gi1/2 1 Up No No
Gi1/3 1 Up No No
Gi1/4 1 Up No No
If the crosslink map is missing or not as expected, turn on “debug sdac control all feature-channel 3” (or “feature-channel 4” for a bundle ICL) and send the data for further analysis.
If interfaces are showing up on the satellite but down on the host, collect “show tech satellite” from host and “debug sdac control all feature-channel 1” from the satellite.
Log in to the BCM shell:
test bcm shell for 9000v and test platform bcm shell for 901.
show c xe0, where xe0-xe3 correspond to ten 1/45 to ten 1/48
BCM.0> show c xe0
RUC.xe0 : 87 +30 5/s
RDBGC0.xe0 : 2 +1 -------------> RDBGC0 counter: packets are dropped; check whether this rate is the same as the ping rate
RDBGC2.xe0 : 15 +4
ITPOK.xe0 : 91 +29 6/s
ITUC.xe0 : 85 +28 5/s
ITMCA.xe0 : 4 +1
ITPKT.xe0 : 91 +29 6/s
IT64.xe0 : 2 +1
IT127.xe0 : 40 +3
IT255.xe0 : 17 +4
IT1518.xe0 : 26 +21 4/s----> packets sent out towards the host
ITBYT.xe0 : 48,626 +30,872 7,043/s
IR64.xe0 : 21 +6
IR127.xe0 : 41 +4
IR255.xe0 : 16 +4
IR1518.xe0 : 26 +21 4/s ---> this counter means packets of size 1400 are received on the BCM shell
Check if the “echo reply” is going out of Satellite ICL ( port gig 300/0/0/0 would mean ge0 on bcm shell and so on):
BCM.0> show c ge0
RUC.ge0 : 767 +767 4/s
RDBGC1.ge0 : 17 +17
GR127.ge0 : 50 +50
GR1518.ge0 : 734 +734 4/s --- >>> packets received back from remote router.
GRPKT.ge0 : 784 +784 5/s
GRBYT.ge0 : 1,046,372 +1,046,372 6,858/s
GRMCA.ge0 : 17 +17
GRUC.ge0 : 767 +767 4/s
GRPOK.ge0 : 784 +784 5/s
GT127.ge0 : 50 +50
GT1518.ge0 : 734 +734 4/s -----------> packets go out of access port.
GTPKT.ge0 : 784 +784 4/s
GTMCA.ge0 : 17 +17
GTBYT.ge0 : 1,046,372 +1,046,372 6,765/s
GTUC.ge0 : 767 +767 4/s
GTPOK.ge0 : 784 +784 4/s
PERQ_PKT(0).ge0 : 784 +784 4/s
PERQ_BYTE(0).ge0 : 1,046,372 +1,046,372 6,765/s
BCM.0>
Trace the packets end to end along the path and determine where they are being dropped. If packets are being dropped in the BCM shell (the RDBGC0 counter usually indicates drops) and the drop rate on this counter matches the overall drop rate, contact the satellite team.
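Comparing counters by eye across several "show c" runs is error-prone, so a small parser for saved output can help. The following is a minimal sketch that assumes each counter line has the `NAME.port : total +delta [rate/s]` layout shown in the samples above, with the "+" column read as the change since the previous display. The helper names are hypothetical and not part of the BCM shell.

```python
import re

# Matches e.g. "RDBGC0.xe0 : 2 +1" or "ITBYT.xe0 : 48,626 +30,872 7,043/s".
COUNTER_RE = re.compile(r"^(\S+)\.(\S+)\s*:\s*([\d,]+)\s+\+([\d,]+)")

def parse_counters(text):
    """Map (counter, port) -> (total, delta) for every counter line found."""
    counters = {}
    for raw in text.splitlines():
        match = COUNTER_RE.match(raw.strip())
        if match:
            name, port, total, delta = match.groups()
            counters[(name, port)] = (
                int(total.replace(",", "")),
                int(delta.replace(",", "")),
            )
    return counters

def drop_delta(counters, port, counter="RDBGC0"):
    """Return the delta of the drop counter for a port, or 0 if absent."""
    return counters.get((counter, port), (0, 0))[1]

SAMPLE = """\
BCM.0> show c xe0
RUC.xe0 : 87 +30 5/s
RDBGC0.xe0 : 2 +1
RDBGC2.xe0 : 15 +4
ITBYT.xe0 : 48,626 +30,872 7,043/s
"""

parsed = parse_counters(SAMPLE)
print("RDBGC0 delta on xe0:", drop_delta(parsed, "xe0"))
```

A non-zero RDBGC0 delta that tracks the overall loss rate points at the satellite as the drop location.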
If drops on the egress NP of the 9K are incrementing under “EGR_SATELLITE_NOT_LOCAL_DROP”, either the NP local bit is not set correctly on the ICL or the packets are not taking the correct ICL bundle member as per the load-balancing algorithm.
Packets are load-balanced on the ICL bundle as per the following formula:
(Satellite ether port number + 2) modulo (number of members in bundle)
For example, if the satellite ether port is 100/0/0/0 and there are 3 members in the bundle, the traffic will take the ICL member (0 + 2) modulo 3 = 2, i.e. the member with LON=2.
The LON value can be obtained from:
sh bundle load-balancing bund-ether <> loc <>
Member Information:
Port: LON ULID BW
--- ---- --
Te0/0/0/23 0 0 1
Te0/3/0/1 1 1 1
Te0/3/0/3 2 2 1
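The load-balancing formula above is easy to evaluate by hand for one port, but a small helper is convenient when checking many access ports. This sketch assumes the "satellite ether port number" is the last element of the interface name (as in the 100/0/0/0 example, where it is 0); the function names and the LON-to-member dictionary are illustrative only.

```python
def sat_port_number(ifname):
    """Extract the port number from a name such as '100/0/0/32'."""
    return int(ifname.rsplit("/", 1)[1])

def icl_member_lon(port_number, bundle_members):
    """Apply (port + 2) modulo (number of bundle members)."""
    if bundle_members <= 0:
        raise ValueError("bundle must have at least one member")
    return (port_number + 2) % bundle_members

# LON -> member mapping from the "sh bundle load-balancing" sample above.
LON_TO_MEMBER = {0: "Te0/0/0/23", 1: "Te0/3/0/1", 2: "Te0/3/0/3"}

lon = icl_member_lon(sat_port_number("100/0/0/0"), len(LON_TO_MEMBER))
print("LON:", lon, "member:", LON_TO_MEMBER[lon])
```

The member returned here is the one whose NP local bit programming should be checked for that access port.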
Capture the output of the CLIs below to check the NP local bit programming on the ICL bundle members:
RP/0/RSP0/CPU0:vkg1#sh controllers pm inter gigabitEthernet 100/0/0/32 location 0/3/CPU0
Thu Nov 8 01:55:39.482 UTC
Ifname(1):
GigabitEthernet100_0_0_32, ifh: 0x1a60 :
iftype 0x59
egress_uidb_index 0x30
ingress_uidb_index 0x30
port_num 0x0
subslot_num 0x0
phy_port_num 0x108
channel_id 0xa
channel_map 0xc
lag_id 0x2
virtual_port_id 0x108
Take the virtual_port_id value (0x0108) and swap its two bytes to get the lookup key 0801.
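The key derivation above can be sketched as a 16-bit byte swap, which is the interpretation that matches the 0x0108 to 0801 example and the "KEY SIZE:2" shown in the NP output. Treat this as an assumption about the key layout rather than a documented rule; the function name `egr_port_key` is hypothetical.

```python
def egr_port_key(virtual_port_id):
    """Swap the two bytes of a 16-bit virtual_port_id and format as 4 hex digits."""
    if not 0 <= virtual_port_id <= 0xFFFF:
        raise ValueError("virtual_port_id must fit in 16 bits")
    swapped = ((virtual_port_id & 0xFF) << 8) | (virtual_port_id >> 8)
    return f"{swapped:04x}"

# virtual_port_id taken from the "sh controllers pm" output above.
print(egr_port_key(0x108))
```

The resulting string is the value to look for in the KEY field of the egress port structure.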
show controllers np struct Egr-Port detail all-entries np2 location 0/3/CPU0
Entry 24: >>
KEY:0801 KEY SIZE:2
MASK:ffff MASK SIZE:2
RESULT:112800200400a2000100000001405539575f710b000000000008012200000000 RESULT SIZE:32
<< the digit immediately after 112 (8 here) should be set to 8; if not programmed it is set to 0
RP/0/RSP0/CPU0:vkg1#show controllers np struct egr-port se ke 0801 np2 location 0/3/CPU0
Node: 0/3/CPU0:
----------------------------------------------------------------
Entry 1: >>
KEY:0801 KEY SIZE:2
MASK:ffff MASK SIZE:2
RESULT:112800200400a2000100000001405539575f710b000000000008012200000000 RESULT SIZE:32
Collect the below traces and contact the satellite support team.
sh nv satellite trace internal hardware all location <ICL_Linecard_Location>
If data packets are dropped on the ingress satellite interface with NP drops counted as PARSE_DROP_IN_UIDB_TCAM_MISS, the ICL mode may not be set on the ICL physical port. Check the UIDB on the ICL link:
show uidb data location 0/0/CPU0 tenGigE 0/0/1/0 ingress
Satellite IC interface 0x1
Collect the below traces and contact the satellite support team.
- sh nv satellite trace internal hardware all location <ICL_Linecard_Location>
show ethernet hardware interface <> loc <> -> Gives the tcam address
show prm server tcam entries l2-LT <tcam addr> 1 np0 location <> -> dumps uidb entry
When these drops are seen, please collect sh nv satellite trace internal hardware all location <ICL_Linecard_Location>
Satellite interface counters not updated on host
If satellite interface counters are not updated on host, check the output of sh nv sat protocol control to see if all channels are open. Collect the output of show tech satellite and contact the satellite support team.
RP/0/RSP0/CPU0:vkg3#sh im database interface gigabitEthernet 100/0/0/30 detail | i Redundancy
GDP - Global Data Plane, RED - Redundancy, UL - UL
Redundancy State: ACTIVE
1. Check if the optics used are supported: https://supportforums.cisco.com/document/12345786/asr9000-nv-satellite-optics-support-matrix has the list of optics supported by satellite
2. Check that the satellite is running with the latest software version.
3. Check the speed/autonegotiation settings on both connected devices.
The speed and autonegotiation configuration on access ports must match the speed/autonegotiation configuration on the connected peer device.
ASR9000/XR: Using Satellite(Tech Note)
ASR9000 Satellite QoS offload FAQ
ASR9000 nv Satellite Optics Support Matrix
Satellite 51x configuration guide
Satellite 43x configuration guide
Santosh Sharma, CCIE #40773
S/W Engineer, ASR9000
Abhisek Jhunjhunwala,
S/W Engineer, ASR9000
Sam Milstead
Customer Support Engineer, XR TAC
Add version mapping for 5.1.2, 5.2.0 and 5.2.2
Hi Santosh,
What is the image version I should see on satellite for the IOS-XR 4.3.4?
Thanks in advance,
Renato Reis
Hi Renato,
For the 9000v
| 4.3.4 | | IOS / Kernel | 287 (151-3.SVFa) | May say 285 available, this is wrong |
| | | ROMMON | 125 | |
| | | FPGA | 1.13 | |
Note: 285 (151-3.SVF) is 4.3.2
For the 901
| 4.3.4 | FCS | IOS / Kernel | 1312.06 | |
| | | ROMMON | 2.1 | |
| | | FPGA | n/a | |
Thanks,
Sam
Hi Sam,
Thanks for the information. So based on your post, once the image of the satellite matches with the ASR, I should ignore the fact it says "not latest version", right?
RP/0/RSP0/CPU0:router#show nv satellite status satellite 100
Fri Jul 4 09:34:19.113 GMT
Satellite 100
-------------
State: Connected (Stable)
Type: asr9000v
Description: SPOVMTSAT01
MAC address: 8478.ac01.6b40
IPv4 address: 10.0.0.1
Received Serial Number: CAT1703U11S
Remote version: Compatible (not latest version)
ROMMON: 125.0 (Latest)
FPGA: 1.13 (Latest)
IOS: 287.0 (Available: 285.0)
Configured satellite fabric links:
TenGigE0/0/0/0
--------------
State: Satellite Ready
IPv4 address: 10.0.0.254
Port range: GigabitEthernet0/0/40-43
Bundle-Ether100
---------------
State: Satellite Ready
IPv4 address: 10.0.0.254
Port range: GigabitEthernet0/0/0-39
Discovered satellite fabric links:
TenGigE0/0/0/1: Satellite Ready; No conflict
TenGigE0/1/0/0: Satellite Ready; No conflict
TenGigE0/1/0/1: Satellite Ready; No conflict
RP/0/RSP0/CPU0:router#
LC:Satellite#show version
Cisco IOS Software, ASR9000v Software (asr9000v-GOLDEN_KNL-M), Version 15.1(3)SVF4a, RELEASE SOFTWARE (fc3)
Technical Support: http://www.cisco.com/techsupport
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Tue 12-Nov-13 12:27 by prod_rel_team
ROM: System Bootstrap, Version 125.0(20120413:034627) [karagraw-ROMMON_125.bootcode 104], DEVELOPMENT SOFTWARE
LC:Satellite uptime is 2 hours, 19 minutes
System returned to ROM by reload at 10:12:06 UTC Fri Jul 4 2014
Running default software
Last reload type: Normal Reload
cisco ASR 9000v (MPC8500) processor (revision ) with 1036144K bytes of memory.
Processor board ID CAT1703U11, with hardware revision
CARDTYPE = REV >= P2B PIZZA BOX 1.5GHZ CPU
FPGA REV = Zola 00001.0x000D
CPLD REV = 0x0B
Last reset from CMP SReset.
7K bytes of non-volatile configuration memory.
8192K bytes of processor board Hidden Config flash (Read/Write)
61440K bytes of processor board RAM Disk (Read/Write)
20480K bytes of processor board System flash (Read/Write)
Configuration register is 0x101 (will be 0x1 at next reload)
LC:Satellite#
Thanks
Renato Reis
Correct, that was a build error when we published 4.3.4
Everything looks good.
Sam
Hi Renato,
I have added the details for other versions as well to this doc.
Thanks,
Santosh
Thank you Sam !!!