Commit history ceased to exist

Orkhan Gasimov
Level 1

Hi all,

After our network engineer upgraded IOS XR to version 5.2.4, commit history ceased to exist.

sh conf hist displays this:

... (normal commit history)
392   commit     id 1000000923                  Fri Apr 24 15:49:05 2015
Unknown record type '434c4900'!
Unknown record type '4a287e00'!
Unknown record type '434c4900'!
Unknown record type '0'!
... (many similar strings follow)
394 commit id 1000000924 Fri Apr 24 16:57:27 2015
395 backup Periodic ASCII backup Fri Apr 24 17:01:27 2015
... (normal commit history)
480 commit id 1000000988 Thu May 14 15:25:20 2015
481 shutdown sync for potential shutdown Thu May 14 16:08:06 2015
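For anyone comparing their own output, here is a minimal Python sketch that summarizes pasted `sh conf hist` text: it counts each 'Unknown record type' value and lists sequence numbers that are skipped between consecutive numbered rows. The function name `summarize_history` and the regular expressions are illustrative assumptions based only on the lines shown above; they are not part of IOS XR.

```python
import re

# Numbered history rows, e.g. "394 commit id 1000000924 Fri Apr 24 16:57:27 2015".
ROW = re.compile(r"^\s*(\d+)\s+(\w+)\s+(.*)$")
# Error rows, e.g. "Unknown record type '434c4900'!".
UNKNOWN = re.compile(r"Unknown record type '([0-9a-fA-F]+)'")


def summarize_history(text):
    """Return (last_seq, missing_seqs, unknown_type_counts) for pasted output."""
    seqs = []
    unknown = {}
    for line in text.splitlines():
        m = UNKNOWN.search(line)
        if m:
            key = m.group(1).lower()
            unknown[key] = unknown.get(key, 0) + 1
            continue
        m = ROW.match(line)
        if m:
            seqs.append(int(m.group(1)))
    # Any number skipped between two consecutive rows counts as missing.
    missing = []
    for prev, cur in zip(seqs, seqs[1:]):
        missing.extend(range(prev + 1, cur))
    last = seqs[-1] if seqs else None
    return last, missing, unknown


sample = """\
392   commit     id 1000000923                  Fri Apr 24 15:49:05 2015
Unknown record type '434c4900'!
Unknown record type '4a287e00'!
Unknown record type '434c4900'!
Unknown record type '0'!
394 commit id 1000000924 Fri Apr 24 16:57:27 2015
395 backup Periodic ASCII backup Fri Apr 24 17:01:27 2015
"""

last, missing, unknown = summarize_history(sample)
print("last event:", last)
print("missing sequence numbers:", missing)
print("unknown record types:", unknown)
```

Feed it the full, unabridged output: rows left out by hand (the "..." lines above) would otherwise be reported as missing sequence numbers.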

That network engineer left, and I don't know what IOS XR version was before 5.2.4.

Event #481 is the last in the list, although I made many changes ever since.

Could anyone please explain why this might be happening, and what the consequences are?

E.g., if I reboot the router, will all changes after event #481 be lost?

Thank you very much!

13 Replies

smilstea
Cisco Employee

Hi Orkhan,

Can you open a TAC SR with the info you pasted here plus the following:

show configuration history

show install active summary

show platform

show log

show tech cfgmgr

show rdsfs trace boot location all

show rdsfs trace iofunc location all

show rdsfs trace error location all

show rdsfs trace internal location all

As for what will happen if you reload, I am not entirely sure. I recommend, as I always do, keeping a backup of the configuration just in case (copy running-config <destination>).

Thanks,

Sam

Hi Sam,

Many thanks for these precious tips.

Unfortunately, I can't open a TAC SR, because this usually means being asked questions about licenses, SmartNet etc., and I don't have info about all these.

However, I did copy all the show outputs that you recommended and put them in an archive in this public cloud.

One of the most interesting and simplest to understand (for me) was this output:

RP/1/RSP0/CPU0:ironman0#show tech cfgmgr
Thu Dec 3 21:32:18.001 Baku
++ Show tech start time: 2015-Dec-03.213219.Baku ++
mkdir: //net/node1_RSP0_CPU0/harddisk:/showtech: No space left on device
Failed to create directory //net/node1_RSP0_CPU0/harddisk:/showtech
++ Show tech end time: 2015-Dec-03.213222.Baku ++

Then I did this:

RP/1/RSP0/CPU0:ironman0#dir harddisk:
Thu Dec 3 22:11:09.244 Baku

Directory of harddisk:

11498 -r-- 734208 Thu Jan 1 04:07:07 1970 .bitmap
17 -r-- 24576 Thu Jan 1 04:07:07 1970 .inodes
18 -rw- 0 Thu Jan 1 04:07:07 1970 .boot
19 -rw- 0 Thu Jan 1 04:07:07 1970 .altboot
11503 drwx 4096 Thu Feb 13 02:43:05 2014 LOST.DIR
11504 -r-- 73728 Thu Jan 1 04:07:07 1970 .longfilenames
11505 drwx 12288 Tue Nov 24 18:45:04 2015 dumper
11506 -rw- 2418 Thu Feb 13 02:43:12 2014 repart.quick
11507 drwx 4096 Thu Feb 13 02:43:44 2014 sld
11508 drwx 4096 Thu Feb 13 02:43:44 2014 shutdown
11509 -rw- 2480 Mon Oct 19 04:40:29 2015 uptime_hist
11563 -rw- 32 Thu Feb 13 02:44:47 2014 uptime_static_data
11511 drwx 4096 Thu Feb 13 02:44:10 2014 clihistory
11512 drwx 4096 Thu Feb 13 02:44:46 2014 idiags
11513 -rw- 61440 Wed Nov 25 09:45:08 2015 uptime_cont
11514 -rw- 28 Thu Feb 13 02:44:49 2014 env_static_data
11515 -rw- 24 Wed Feb 12 11:42:51 2014 env_hist
11516 -rw- 5056 Mon Oct 19 04:40:32 2015 env_cont
11517 -rw- 92 Fri Dec 26 09:39:37 2014 temp_static_data
11518 -rw- 444 Fri Sep 4 12:16:54 2015 temp_hist
11519 -rw- 70144 Thu Dec 3 21:53:03 2015 temp_cont
11520 -rw- 548 Thu Feb 13 02:48:07 2014 volt_static_data
11521 -rw- 1188 Wed Aug 19 03:20:42 2015 volt_hist
11522 -rw- 153088 Thu Dec 3 21:49:00 2015 volt_cont
11523 drwx 8192 Thu May 21 06:13:50 2015 insttr
11524 drwx 4096 Thu Feb 13 02:58:19 2014 ipodwdm_log
11525 drwx 4096 Mon Oct 19 04:53:06 2015 np
11526 drwx 4096 Tue Apr 8 22:04:18 2014 nvgen_traces
11527 drwx 4096 Thu Sep 4 23:35:37 2014 showtech-temp
11528 drwx 4096 Sun Dec 21 03:28:27 2014 install
11529 -rw- 24 Wed Feb 12 11:42:55 2014 errmsg_hist
11530 -rw- 87552 Tue Nov 24 18:45:41 2015 errmsg_cont
11531 -rw- 24 Wed Feb 12 11:42:59 2014 diag_hist
11532 -rw- 24 Wed Feb 12 11:43:01 2014 diag_cont
11533 drwx 4096 Mon Jul 21 03:12:19 2014 dhcp
11534 drwx 4096 Fri May 22 08:24:30 2015 sysmgr_debug
11535 drw- 4096 Mon Jul 6 17:40:05 2015 xr524
11607 -rw- 0 Thu May 21 06:43:24 2015 zl30160_ts_info.log
11613 -rw- 147989 Fri May 22 08:05:38 2015 goldeninfo.hog.node1_RSP0_CPU0
11538 ---- 2404 Fri May 22 08:05:24 2015 goldeninfo.first
11539 --wx 75 Mon Oct 19 03:07:33 2015 chkfs_repair.log

3007315968 bytes total (0 bytes free)
RP/1/RSP0/CPU0:ironman0#
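When deciding where to look first, it can help to sort the directories in a listing like the one above by modification time. The following is a rough Python sketch that assumes the column layout shown (index, permissions, size, timestamp, name); `parse_dir` and `recent_directories` are made-up helper names, and the size printed for a directory presumably covers only the directory entry itself, not its contents.

```python
import re
from datetime import datetime

# One listing row: index, 4-char permissions, size, timestamp, name.
ENTRY = re.compile(
    r"^(\d+)\s+([-drwx]{4})\s+(\d+)\s+"
    r"(\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+(\S+)$"
)


def parse_dir(text):
    """Parse pasted 'dir harddisk:' rows into a list of dicts."""
    entries = []
    for line in text.splitlines():
        m = ENTRY.match(line.strip())
        if not m:
            continue  # skip prompts, headers and the summary line
        stamp = " ".join(m.group(4).split())  # collapse repeated spaces
        entries.append({
            "name": m.group(5),
            "is_dir": m.group(2).startswith("d"),
            "size": int(m.group(3)),
            "modified": datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"),
        })
    return entries


def recent_directories(entries, limit=5):
    """Return the most recently modified directories, newest first."""
    dirs = [e for e in entries if e["is_dir"]]
    return sorted(dirs, key=lambda e: e["modified"], reverse=True)[:limit]


sample = """\
11503 drwx 4096 Thu Feb 13 02:43:05 2014 LOST.DIR
11505 drwx 12288 Tue Nov 24 18:45:04 2015 dumper
11519 -rw- 70144 Thu Dec 3 21:53:03 2015 temp_cont
11534 drwx 4096 Fri May 22 08:24:30 2015 sysmgr_debug
11535 drw- 4096 Mon Jul 6 17:40:05 2015 xr524
"""

entries = parse_dir(sample)
recent = [e["name"] for e in recent_directories(entries)]
print(recent)
```

The sample holds only a few rows copied from the listing; the whole output can be pasted in the same way.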

As the disk is full, no wonder there's a problem.

However, I'm not sure which of these files I can safely delete.

Can you please give a piece of advice?

Thank you!

Hi Orkhan,

I would look at the 'dumper' directory in harddisk:

The files in this directory are the results of core dumps taken during system crashes. Since you don't have Cisco TAC support, it seems you don't need these files :)

I would also check the other directories with the most recent modification dates: look inside them and decide whether to delete files from them or not.

For example, look into

sysmgr_debug, xr524

It seems to me that 'xr524' is a directory created by your previous network admin with the 5.2.4 SW packages inside (or maybe the full tar bundle, which is more than 2 GB), and that 'sysmgr_debug' perhaps holds the results of some debugging of the sysmgr process by the previous network administrator.

An SFTP client (FileZilla, for example) may make it more comfortable to hunt for large and unnecessary files.

Also please check 'showtech-temp' directory.

Also, look into the config of the box and find out whether archiving is configured for logs. If yes, find those files and decide whether to delete them or not.

Also, check for inactive packages with 'sho install inactive' (.pie, SMU). If any exist and you don't plan to roll back to that SW version, delete them ('install remove inactive' from admin mode). Ideally, the output of 'dir disk0:' shouldn't show files with a SW version other than your current one, provided you are 100% sure that you have no plans to roll back to another SW version that exists on the box.

Also, to find out what happened during installation activities, check the output of 'sho install log'; perhaps you will find what the previous SW version was. But if your previous network admin used Turbo boot to install 5.2.4, there will be no logs about previous installations.

Regards,

Olev

Hi Olev,

I'm grateful for your advice.

Deleting the 'dumper' directory freed up a lot of space.

Unluckily, the problem with commit history did not disappear: event #481 is still the last in the output of 'sh conf hist'. New commits are not shown.

'sh log' has a lot of these lines:

RP/1/RSP0/CPU0:Dec  4 10:23:13.655 : cfgmgr_show_history[65895]: Invalid message type(0x434c4900) possible endian mismatch file config/cfgmgr/common/src/cfgmgr_config_history.c:379

Can it give a clue as to what is wrong with our ASR9001?

Thank you.

Hi Orkhan.

I'm sorry if there was a misunderstanding, but I meant 'delete the files within the dumper directory', not 'delete the dumper directory itself'.

So, now that you have enough free space (how much, I wonder?), maybe you could try again to get 'show tech cfgmgr' along with the other commands that Sam advised?

Can you run 'cfs check' and show the output?

'fsck disk0: ' - run this to repair possible file system corruption.

If nothing changes after that:

As we can see from 'sho logs' and 'sho conf hist', log messages like

RP/1/RSP0/CPU0:Dec  3 21:23:42.318 : cfgmgr_show_history[65884]: Invalid message type(0x434c4900) possible endian mismatch file config/cfgmgr/common/src/cfgmgr_config_history.c:379

are correlated with

Unknown record type '434c4900'! 
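Since the log message mentions a possible endian mismatch, here is a small Python sketch that shows what the reported record types look like as raw bytes in either byte order. It assumes the record type is a 32-bit value, which the thread does not confirm, and `record_type_bytes` is a hypothetical helper name. The output is only an observation about the bytes, not a diagnosis.

```python
def record_type_bytes(hex_text):
    """Return a 32-bit value as (big_endian_bytes, little_endian_bytes)."""
    value = int(hex_text, 16)
    # Assumption: record types are 4 bytes wide, as '434c4900' suggests.
    return value.to_bytes(4, "big"), value.to_bytes(4, "little")


# Values copied from the 'Unknown record type' lines in this thread.
for code in ("434c4900", "4a287e00", "0"):
    big, little = record_type_bytes(code)
    print(f"{code:>8}  big-endian={big!r}  little-endian={little!r}")
```

Read big-endian, 0x434c4900 is the ASCII text "CLI" followed by a zero byte. That might mean text was read at an offset where a record header was expected, but nothing in this thread confirms it.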

Further, in 'sho conf hist' we see:

426   commit     id 1000000948                  Sun May 10 05:00:45 2015

... many rows with 'Unknown record type' ...

427   backup     Periodic ASCII backup          Sun May 10 05:14:07 2015

So within 15 minutes we have many rows with 'Unknown record type'.

Can you find the file 'config/cfgmgr/common/src/cfgmgr_config_history' and show it? It seems that you need to delete some entries (those pointing to 'Unknown record type') in that file and save it back to the box.

If that is impossible or too risky, I would try to delete the config history entries from now back to 426 from the CLI. But you should understand that you will consequently have no config entries from 426 until now.

No worries, I did not delete the directory itself, just the files inside it.

As to free space: 3007315968 bytes total (2321936896 bytes free). It's really a lot.
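To put that number in perspective, here is a minimal Python sketch that extracts the totals from the `dir` summary line and computes the percentage free before and after the cleanup. The helper name `free_space` is an assumption for illustration; the two input strings are the summary lines quoted in this thread.

```python
import re

# Summary line printed at the end of 'dir harddisk:'.
SUMMARY = re.compile(r"(\d+) bytes total \((\d+) bytes free\)")


def free_space(line):
    """Return (total_bytes, free_bytes, percent_free) from a summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("no 'bytes total (... bytes free)' summary found")
    total, free = int(m.group(1)), int(m.group(2))
    return total, free, 100.0 * free / total


before = free_space("3007315968 bytes total (0 bytes free)")
after = free_space("3007315968 bytes total (2321936896 bytes free)")
print(f"before: {before[2]:.1f}% free, after: {after[2]:.1f}% free")
```

With these figures, roughly three quarters of harddisk: is free after emptying 'dumper'.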

show tech cfgmgr worked this time and produced a lot of output, unfortunately it will not tell me as much as it would to a TAC engineer.

The next command didn't produce anything abnormal:

RP/1/RSP0/CPU0:ironman0#cfs check
Fri Dec 4 15:18:34.888 Baku
Creating any missing directories in Configuration File system...OK
Initializing Configuration Version Manager...OK
Syncing commit database with running configuration...OK
RP/1/RSP0/CPU0:ironman0#

fsck disk0: also went well, but the result about the commit history is the same: no entries after event #481.

As to the file config/cfgmgr/common/src/cfgmgr_config_history , the find utility did not reveal it anywhere, in either EXEC or ADMIN mode.

I also tried the command clear configuration commits oldest 100 , but sh conf history still shows exactly the same records, from beginning to end. Maybe clear configuration commits oldest 100 is not what I thought it was for (I wanted 'to delete from CLI  config history entries from now till 426', as you said).

Result: I have no idea what else I can do next...

>>>show tech cfgmgr worked this time and produced a lot of output, unfortunately it will not tell me as much as it would to a TAC engineer.

I don't think we have many choices, but we can at least try. Let's look at it.

Thanks a lot for your willingness to help.

Sam (also in this conversation) said he would try to help with a TAC SR, so I think I'll bother him first. If that doesn't work out and you are still willing to investigate the problem, then I'll bother you, OK?

Thanks again.

Orkhan,

About 'then I'll bother you': no problem, you're welcome.


This looks to be CSCus42510 which is still being worked. Unfortunately not enough information was collected when the issue was seen once before so we have not been able to find a fix yet.

Please try to open a TAC SR; if you have entitlement issues, please email me and I'll work to get that resolved. If this can't be done, then we can try to work the issue over email without a TAC SR. Unfortunately, the support forums aren't a good tool for working a bug.

Thanks,

Sam

Hi Sam,

I've spent so much time working with Cisco devices, learning Cisco technologies and encouraging everyone else around me to learn them, that when I hear about bugs in the Cisco IOS, I feel very sad, like it hurts me personally...

However, your willingness to help with TAC SR is really impressive.

The thing is, I tried to open a TAC SR, but the system does not let me proceed without info about service contracts, and currently I have no info about them.

So the only way is over email, but I don't know your address.

Please direct me. Thank you.

smilstea@cisco.com

Orkhan Gasimov
Level 1

To all those who are experiencing the same problem, here's the end of the story:

"Unfortunately the logs have wrapped since Oct 19th so we cannot comment on how the corruption happened. As well there is no command to fix the corruption. However, all we are losing is the history of events which is used for troubleshooting config issues, rollback capability will not be impacted.

In parallel, we are working to get some fixes put in for config manager which should help to prevent this type of error in the future."

This was the answer from the Cisco side. Although the problem turned out to be uncorrectable, I would like to thank everyone involved in investigating the issue.