FMC 7.6.0: MonetDB crashes

Bernd Nies
Level 1

Hi,

I'm running FMC 7.6.0-113 in a test environment without a license, hence no TAC support. The database integrity check fails and the FMC dashboards show no graphs. I already tried the following:

 

admin@cisco-fmc-test:~$ sudo DBCheck.pl
admin@cisco-fmc-test:~$ sudo pmtool restartbyid monetdb

 

The logfile shows that database 'eventdb' repeatedly crashes.

 

root@cisco-fmc-test:~# tail -f /var/log/monetdb/merovingian.log
2024-12-18 10:53:12 MSG eventdb[22167]: # MonetDB 5 server v11.45.31 (Sep2022-SP6)
2024-12-18 10:53:12 MSG eventdb[22167]: # Serving database 'eventdb', using 8 threads
2024-12-18 10:53:12 MSG eventdb[22167]: # Compiled for x86_64-pc-linux-gnu/64bit with 128bit integers
2024-12-18 10:53:12 MSG eventdb[22167]: # Found 31.363 GiB available main-memory of which we use 6.492 GiB
2024-12-18 10:53:12 MSG eventdb[22167]: # Virtual memory usage limited to 118.323 GiB
2024-12-18 10:53:12 MSG eventdb[22167]: # Copyright (c) 1993 - July 2008 CWI.
2024-12-18 10:53:12 MSG eventdb[22167]: # Copyright (c) August 2008 - 2023 MonetDB B.V., all rights reserved
2024-12-18 10:53:12 MSG eventdb[22167]: # Visit https://www.monetdb.org/ for further information
2024-12-18 10:53:12 MSG eventdb[22167]: # MonetDB/SQL module loaded
2024-12-18 10:53:12 MSG eventdb[22167]: # Listening for UNIX domain connection requests on mapi:monetdb:///Volume/lib/monetdb/dbfarm/eventdb/.mapi.sock
2024-12-18 10:53:13 MSG merovingian[29721]: database 'eventdb' has crashed after start on 2024-12-18 10:53:11, attempting restart, up min/avg/max: 32m/2w/11w, crash average: 1.00 1.00 1.00 (6990-10=6980)
2024-12-18 10:53:13 MSG eventdb[22193]: arguments: /usr/bin/mserver5 --dbpath=/Volume/lib/monetdb/dbfarm/eventdb --set=merovingian_uri=mapi:monetdb://cisco-fmc-test:5193/eventdb --set=mapi_listenaddr=none --set=mapi_usock=/Volume/lib/monetdb/dbfarm/eventdb/.mapi.sock --set=monet_vault_key=/Volume/lib/monetdb/dbfarm/eventdb/.vaultkey --set=gdk_nr_threads=8 --set=tablet_threads=8 --set=max_clients=64 --set=sql_optimizer=default_pipe --set=gdk_mem_maxsize=6970776576 --set=gdk_vm_maxsize=127048247296

 

Maybe it's related to a recent increase of the CPU count from 4 to 8, or maybe it's unrelated. Any idea how to fix it?
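As a quick sanity check on the CPU question, the `arguments:` line in the log above can be split into its `--set=name=value` pairs. The following is only a sketch: the string is an excerpt of that line with the path options left out, and `parse_set_options` and `to_gib` are hypothetical helper names, not part of any Cisco or MonetDB tool.

```python
import shlex

# Excerpt of the "arguments:" line from merovingian.log (path options omitted).
ARGS = (
    "/usr/bin/mserver5 --set=gdk_nr_threads=8 --set=tablet_threads=8 "
    "--set=max_clients=64 --set=gdk_mem_maxsize=6970776576 "
    "--set=gdk_vm_maxsize=127048247296"
)


def parse_set_options(argline):
    """Return {name: value} for every --set=name=value token (hypothetical helper)."""
    options = {}
    for token in shlex.split(argline):
        if token.startswith("--set="):
            name, _, value = token[len("--set="):].partition("=")
            options[name] = value
    return options


def to_gib(num_bytes):
    """Convert a byte count (str or int) to GiB."""
    return int(num_bytes) / 2**30


opts = parse_set_options(ARGS)
print("gdk_nr_threads:", opts["gdk_nr_threads"])
print("gdk_mem_maxsize: %.3f GiB" % to_gib(opts["gdk_mem_maxsize"]))
print("gdk_vm_maxsize:  %.3f GiB" % to_gib(opts["gdk_vm_maxsize"]))
```

The converted values agree with the startup banner (6.492 GiB and 118.323 GiB), and the thread count already reflects the 8 CPUs, so the server at least picked up the new sizing. That alone does not say whether the CPU change has anything to do with the crash.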

We're at the beginning of the transition from ASA to FTD, and I'm afraid this is only the tip of the iceberg of future problems.

Regards,

Bernd

 


8 Replies

What type of lab environment?  What hypervisor?  What size FMC?  Do you have the proper amount of resources assigned?  How did you download the FMC software without a support contract?

Bernd Nies
Level 1

Upgraded from Cisco_Secure_FW_Mgmt_Center_Virtual_VMware-7.2.8-25.tar.gz to 7.6.0 with the sizing the OVF template came with (4 CPUs, 32 GB RAM, 250 GB disks). It runs on ESXi 6.7. We have a support contract for a ton of other Cisco stuff, just no Smart License for FMC/FTD. It's a test environment where I'm preparing the actual migration of ASA rules and setup to FTD. MonetDB was running before, even after the update to 7.6.0. I know 7.6.0 is too new and not recommended, but I wanted to look ahead. The production environment will be 7.4.2.

https://blogs.vmware.com/vsphere/2020/06/announcing-extension-of-vsphere-6-7-general-support-period.html

I don't think this is your issue but I would highly recommend going with the recommended specs, not minimums.  Also without a support contract you are technically not eligible to download upgrades either.  

If it was running successfully after the upgrade to 7.6.0, then at least it's a newer problem.  One thing that can happen with MonetDB is that it develops a problem that doesn't show up until a fresh start or restart, and that first restart is often the one during an upgrade.  That makes things confusing.

The reason it is crashing will be in the merovingian.log, /var/log/monetdb/merovingian.log.  If it has been failing for a long time, the original problem may have rolled out of the log files, but check the logs there to see if one of them indicates the original problem.
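A rough way to dig the original problem out of a long log is to keep only the `ERR` lines and the "has crashed" notices and read them oldest first. The sketch below assumes the line layout seen in this thread (`timestamp LEVEL source[pid]: text`); the function names are made up for illustration, and the sample lines are copied from the log excerpts posted here.

```python
import re

# Layout assumed from the excerpts in this thread:
#   2024-12-12 09:32:37 ERR eventdb[8458]: #client1: ...
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<source>[\w-]+)\[(?P<pid>\d+)\]: (?P<text>.*)$"
)


def interesting_lines(lines):
    """Yield (timestamp, kind, text) for ERR lines and crash notices."""
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue
        if m["level"] == "ERR":
            yield m["ts"], "error", m["text"]
        elif "has crashed" in m["text"]:
            yield m["ts"], "crash", m["text"]


def scan_log(path="/var/log/monetdb/merovingian.log"):
    """Scan one log file; rotated logs would need to be passed in one by one."""
    with open(path, errors="replace") as fh:
        return list(interesting_lines(fh))


SAMPLE = [
    "2024-12-12 09:32:34 MSG eventdb[8458]: # Listening for UNIX domain connection requests on mapi:monetdb:///Volume/lib/monetdb/dbfarm/eventdb/.mapi.sock",
    "2024-12-12 09:32:37 ERR eventdb[8458]: #client1: createExceptionInternal: ERROR: SQLException:sql.count:42S22!Column missing event_schema.ssl_certificatestats_1733994000_0",
    "2024-12-12 09:33:02 ERR eventdb[8458]: #client12: createExceptionInternal: ERROR: SQLException:sql.rel_check_tables:3F000!ALTER MERGE TABLE: to be added table doesn't match MERGE TABLE definition",
    "2024-12-12 09:35:15 MSG merovingian[8405]: database 'eventdb' has crashed after start on 2024-12-12 09:32:30, attempting restart, up min/avg/max: 32m/2w/11w, crash average: 1.00 0.10 0.03 (10-9=1)",
]

results = list(interesting_lines(SAMPLE))
for ts, kind, text in results:
    print(ts, kind, text[:80])
```

On the appliance itself, `scan_log()` would be run as root against the live file. The first `error` entries before the earliest `crash` entry are usually the ones worth reading.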

There is a script that can repair some known problems, but it's meant for people who already know the various operating (and failure) modes of MonetDB.  If you want to try it for diagnostics, you can run /usr/local/sf/etc/monetdb_repair/monetdbrepair.sh and see what it finds.  It can perform repairs in multiple ways, with varying degrees of risk/damage, but it will lead you down the recommended path in most cases.

Bernd Nies
Level 1

That's the error from the first occurrence of MonetDB crashing.

2024-12-12 09:32:34 MSG eventdb[8458]: # Listening for UNIX domain connection requests on mapi:monetdb:///Volume/lib/monetdb/dbfarm/eventdb/.mapi.sock
2024-12-12 09:32:37 ERR eventdb[8458]: #client1: createExceptionInternal: ERROR: SQLException:sql.count:42S22!Column missing event_schema.ssl_certificatestats_1733994000_0
2024-12-12 09:33:02 ERR eventdb[8458]: #client12: createExceptionInternal: ERROR: SQLException:sql.rel_check_tables:3F000!ALTER MERGE TABLE: to be added table doesn't match MERGE TABLE definition
2024-12-12 09:35:15 MSG merovingian[8405]: database 'eventdb' has crashed after start on 2024-12-12 09:32:30, attempting restart, up min/avg/max: 32m/2w/11w, crash average: 1.00 0.10 0.03 (10-9=1)
2024-12-12 09:35:15 MSG eventdb[11619]: arguments: /usr/bin/mserver5 --dbpath=/Volume/lib/monetdb/dbfarm/eventdb --set=merovingian_uri=mapi:monetdb://cisco-fmc-test:5193/eventdb --set=mapi_listenaddr=none --set=mapi_usock=/Volume/lib/monetdb/dbfarm/eventdb/.mapi.sock --set=monet_vault_key=/Volume/lib/monetdb/dbfarm/eventdb/.vaultkey --set=gdk_nr_threads=8 --set=tablet_threads=8 --set=max_clients=64 --set=sql_optimizer=default_pipe --set=gdk_mem_maxsize=6970776576 --set=gdk_vm_maxsize=127048247296

 

 

It seems the database schema was altered during the update to 7.6.0, and the problem went unnoticed because I hadn't rebooted the VM since then.
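One detail that may help with dating the damage: the numeric part of the table name in the first error, `ssl_certificatestats_1733994000_0`, looks like a Unix epoch timestamp. That is an assumption about the partition naming, not something documented in this thread. Under that assumption, a small sketch (with a hypothetical helper name) decodes it:

```python
from datetime import datetime, timezone


def partition_start(table_name):
    """Read the second-to-last '_' field as Unix epoch seconds (assumed naming)."""
    epoch = int(table_name.rsplit("_", 2)[1])
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


start = partition_start("event_schema.ssl_certificatestats_1733994000_0")
print(start.isoformat())  # 2024-12-12T09:00:00+00:00
```

If the reading is right, the broken table belongs to the hour starting 2024-12-12 09:00 UTC, which is the same day as the first `Column missing` error in the log. Whether the log timestamps are in UTC or local time is not clear from the excerpt.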

 

 

admin@cisco-fmc-test:~$ sudo /usr/local/sf/etc/monetdb_repair/monetdbrepair.sh
··········8<··········
DIAG Started: 2024-12-18 14:54:48
DIAG: MonetDB daemon last started on 2024-12-18 11:08:45
DIAG: MonetDB engine last started on [could not determine time]
DIAG: FMC Version: 7.6.0
DIAG: MonetDB Database Server v11.45.31 (Sep2022-SP6)
DIAG: WAL Size: 12K
DIAG: mserver5 pid: 14987
DIAG: monetdb is running via pmtool
DIAG: connection established with monetdb
DIAG: running internal schema check
monetdbd: an internal error has occurred 'could not receive initial byte: Connection reset by peer', refer to the logs for details, please try again later
Something went wrong during connection to monetdb!
2024-12-18 14:54:49,665 - cisco.monetdb_util.extract_eventdb_schema - ERROR - Failed extract_monetdb_schema with Exception monetdbd: an internal error has occurred 'could not receive initial byte: Connection reset by peer', refer to the logs for details, please try again later
Failed extract_monetdb_schema with Exception monetdbd: an internal error has occurred 'could not receive initial byte: Connection reset by peer', refer to the logs for details, please try again later
2024-12-18 14:54:49,665 - cisco.monetdb_util.extract_eventdb_schema - ERROR - Failed extract schema from MonetDB with monetdbd: an internal error has occurred 'could not receive initial byte: Connection reset by peer', refer to the logs for details, please try again later
Failed extract schema from MonetDB with monetdbd: an internal error has occurred 'could not receive initial byte: Connection reset by peer', refer to the logs for details, please try again later
DIAG: event schema does not match definition
monetdbd: internal error while starting mserver 'unable to create pipe: Too many open files', please refer to the logs
WARNING: Failed to connect to monetdb
DIAG: failed to examine app and tag tables
DIAG: eventdb schema errors detected with extra tables
DIAG: monetdbcheck: 6.13 -- /usr/local/sf/etc/monetdbcheck.py
DIAG: monetdbcheck: reports potential issues
-----------------------------------------------------------
-- monetdbcheck output: -----------------------------------
-----------------------------------------------------------
SQL id 1385295, BATid 1080 type: hge, rows: 5, width: 16, filename: 20/2070, tailsize: 80: no sql name known
SQL id 1385292, BATid 1600 type: str, rows: 5, width: 1, filename: 31/3100, tailsize: 5, heapsize: 8219: no sql name known
SQL id 1385294, BATid 2991 type: str, rows: 5, width: 1, filename: 56/5657, tailsize: 5, heapsize: 8230: no sql name known
SQL id 1385291, BATid 3011 type: lng, rows: 5, width: 8, filename: 57/5703, tailsize: 40: no sql name known
SQL id 1385293, BATid 3968 type: lng, rows: 5, width: 8, filename: 76/7600, tailsize: 40: no sql name known
event_schema.ssl_certificatestats_1733994000_0, SQL id 1385296, BATid 7396 type: msk, rows: 5, width: 1, filename: 01/63/16344, tailsize: 4: table without columns; cannot be repaired
-----------------------------------------------------------
--  monetdbcheck details: ---------------------------------
-----------------------------------------------------------
WARNING: table with missing columns
  --    CDETS: (probably) CSCwk33516 or CSCwj23777
FATAL: non-repairable errors detected
-----------------------------------------------------------
DIAG Finished: 2024-12-18 14:58:23
··········8<··········
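To separate the entries that the tool flags as unrepairable from the orphaned ones, the `monetdbcheck` lines above can be parsed with a regular expression. This is a sketch based only on the six lines shown in this post; the real output format may vary, and `parse_entry` and `unrepairable` are hypothetical names.

```python
import re

# Pattern inferred from the output pasted above, e.g.
#   [schema.table, ]SQL id N, BATid N type: T, rows: ..., tailsize: N: verdict
ENTRY_RE = re.compile(
    r"^(?:(?P<table>\S+), )?SQL id (?P<sql_id>\d+), BATid (?P<bat_id>\d+) "
    r"type: (?P<type>\w+), .*: (?P<verdict>[^:]+)$"
)


def parse_entry(line):
    """Return a dict for one monetdbcheck line, or None if it does not match."""
    m = ENTRY_RE.match(line.strip())
    return m.groupdict() if m else None


def unrepairable(lines):
    """Return parsed entries whose verdict says they cannot be repaired."""
    entries = (parse_entry(line) for line in lines)
    return [e for e in entries if e and "cannot be repaired" in e["verdict"]]


SAMPLE = [
    "SQL id 1385295, BATid 1080 type: hge, rows: 5, width: 16, filename: 20/2070, tailsize: 80: no sql name known",
    "event_schema.ssl_certificatestats_1733994000_0, SQL id 1385296, BATid 7396 type: msk, rows: 5, width: 1, filename: 01/63/16344, tailsize: 4: table without columns; cannot be repaired",
]

for entry in unrepairable(SAMPLE):
    print(entry["table"], "->", entry["verdict"])
```

In this case only `event_schema.ssl_certificatestats_1733994000_0` is marked as unrepairable; the other entries have no SQL name and appear to be the columns that table lost.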

 

https://bst.cisco.com/quickview/bug/CSCwj23777 

So, as usual: never ever try software newer than the recommended version.

Yeah, this looks like something that would've perhaps been damaged while still running 7.2.8.  You should be able to just delete those tables and the DB will come up.  The monetdbrepair.sh script should help do that, I think.  Otherwise you can manually remove the named tables and monetdbrepair.sh should then fix any associated references and be clean.

Good luck!

Bernd Nies
Level 1

Thanks. Repairing MonetDB worked using that script.

root@cisco-fmc-test:~# /usr/local/sf/etc/monetdb_repair/monetdbrepair.sh
 ------------------------------------------------------------------------
 ----------------------------- MonetDB Repair ---------------------------
 ------------------------------------------------------------------------
 1. diagnose current state of monetdb <--- Start HERE
 2. collect debugging information
 3. Hide advanced options
 4. Exit.
      --------- advanced options: ---------
 5. repair monetdb (restarts MonetDB, which can disrupt the FMC)
 6. recover monetdb (rebuild while attempting to save customer data)
 7. rebuild monetdb (this process will delete all data in MonetDB)
 8. clear monetdb WAL
 9. drop bad database partition tables
 ------------------------------------------------------------------------
Select (1..9) [1]:
7

        --- Rebuild monetdb ---
  This action will delete all customer data stored in monetdb (events, etc)
Has a REPAIR already been attempted? (Yes/No):
Yes
Really proceed with REBUILD? (Yes/No):
Yes

---------------------------------------------------------------------
REBUILD Started: 2024-12-19 04:51:12
REBUILDING (1/13): Confirm 'mysqld' is running
REBUILDING (2/13): monetdb already enabled
REBUILDING (3/13): Disable SFDataCorrelator
.....
REBUILDING (4/13): Stop eventDB - Success!
REBUILDING (5/13): flag rebuild in progress - Success!
REBUILDING (6/13): Destroy eventDB - Success!
REBUILDING (7/13): Remove eventDB directory - Success!
REBUILDING (8/13): Rebuild eventDB - Success!
REBUILDING (9/13): Recreate app tables - Success!
REBUILDING (10/13): Synchronize AppTags - Success!
REBUILDING (11/13): Synchronize AppIdInfo - Success!
REBUILDING (12/13): Enable SFDataCorrelator
...
REBUILDING (13/13): Connection test eventDB

 ----------------------------------------------------------
 -- Trying to establish a connection to monetdb.  This can take some time
    after a restart (depending on system load, WAL file processing, etc).
 --   -- waiting (max 300s) [ cancel waiting with control-c ]



 MonetDB connection has been established.
 ----------------------------------------------------------

REBUILD Finished: 2024-12-19 04:51:58
DIAG Started: 2024-12-19 04:51:58
DIAG: connection established with monetdb
DIAG: running internal schema check
Writing Current Schema to /tmp/tmpqfyya95u
DIAG: monetdbcheck: no issues reported
DIAG Finished: 2024-12-19 04:51:59

-----------------------------------------------------------
-- Recommendations ----------------------------------------
-----------------------------------------------------------

  MonetDB is running and connections are possible;
  no monetdb errors detected.

- nothing needs to be done.
------------------------------------------------------

 ------------------------------------------------------------------------
 ----------------------------- MonetDB Repair ---------------------------
 ------------------------------------------------------------------------
 1. diagnose current state of monetdb <--- Start HERE
 2. collect debugging information
 3. Hide advanced options
 4. Exit.
      --------- advanced options: ---------
 5. repair monetdb (restarts MonetDB, which can disrupt the FMC)
 6. recover monetdb (rebuild while attempting to save customer data)
 7. rebuild monetdb (this process will delete all data in MonetDB)
 8. clear monetdb WAL
 9. drop bad database partition tables
 ------------------------------------------------------------------------
Select (1..9) [1]:
4

Glad you're up and running again.  I'm sorry the `repair` and `recover` options didn't work.  I'm told a newer version of the repair tools will know better how to deal with the original problem without having to do something as drastic. I believe the issue is resolved as of 7.6.0 and upcoming MRs, so I hope you won't see it again.
