Solved: Re: ESA queue out rate is only 20 messages in every 100 seconds

Crovax · ‎03-01-2018

Hello.

I have an ESA appliance at the perimeter edge.

Current AsyncOS Version: 11.0.1-027

The output rate of the workqueue is very low:

esa.domain.local> workqueue rate 10

Time      Pending    In   Out
10:21:42     1434     0     0
10:21:52     1454    20     0
10:22:02     1470    16     0
10:22:12     1489    19     0
10:22:22     1511    22     0
10:22:32     1533    22     0
10:22:42     1551    18     0
10:22:52     1569    18     0
10:23:02     1587    18     0
10:23:12     1573     6    20
10:23:22     1573     0     0
10:23:33     1575     2     0
10:23:43     1577     2     0
10:23:53     1578     1     0
10:24:03     1579     1     0
10:24:13     1579     0     0
10:24:23     1579     0     0
10:24:33     1579     0     0
10:24:43     1579     0     0
10:24:53     1560     1    20
10:25:03     1560     0     0
10:25:13     1561     1     0
10:25:23     1562     1     0
10:25:33     1565     3     0
10:25:43     1566     1     0
10:25:53     1566     0     0
10:26:03     1566     0     0
10:26:13     1567     1     0
10:26:23     1567     0     0
10:26:33     1547     0    20
10:26:43     1549     2     0
10:26:54     1550     1     0
10:27:04     1550     0     0
10:27:14     1552     2     0
10:27:24     1552     0     0
10:27:34     1554     2     0
10:27:44     1555     1     0
10:27:54     1557     2     0
10:28:04     1558     1     0
10:28:14     1541     3    20
10:28:24     1541     0     0
10:28:34     1541     0     0
10:28:44     1542     1     0
10:28:54     1542     0     0
10:29:04     1543     1     0

How can I remedy this problem?

Thank you.

Libin Varghese · ‎03-02-2018

I would recommend opening a TAC case to check what part of the workqueue processing is taking time.

Since there are multiple scanning engines within the queue, TAC can review this further using remote access to the appliance.

The CLI command "status detail" should also show basic information on CPU usage.

You can also check the mail_logs and the output of the "displayalerts" command to see whether the appliance is generating any alerts.

Regards,

Libin Varghese

Mathew Huynh · ‎03-02-2018

Hello Crovax,

I definitely agree with the details Libin has shared for an in-depth analysis of this.

But in the event you're not able to reach out to TAC at the moment, one way to see what -may- be occupying your workqueue is to look at message tracking or mail_logs (whichever is easier) and find which process messages seem to be held up on for 100 seconds.

This is rough speculation on my part, but it seems you may have a security engine with a timeout set to 100 seconds, and all queue threads are held up on it until the timeout is reached; the queue then flushes the lot out before getting stuck again.

Another useful command, if you have not run it already, is "workqueue status", to verify whether the queue is paused on anything.

Finally, "displayalerts".
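
The roughly 100-second cadence can be checked directly from the "workqueue rate 10" output posted at the start of the thread. Here is a minimal Python sketch, assuming the output has been copied into a text string; the helper names parse_rows and burst_gaps are only illustrative and are not part of AsyncOS. The rows in SAMPLE are taken from the original post.

```python
from datetime import datetime

# A few rows copied from the "workqueue rate 10" output in the original post.
SAMPLE = """\
Time      Pending    In   Out
10:23:02     1587    18     0
10:23:12     1573     6    20
10:24:43     1579     0     0
10:24:53     1560     1    20
10:26:23     1567     0     0
10:26:33     1547     0    20
10:28:04     1558     1     0
10:28:14     1541     3    20
"""


def parse_rows(text):
    """Return (time, pending, in, out) tuples, skipping the header and blank lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four fields and a numeric Pending column.
        if len(parts) != 4 or not parts[1].isdigit():
            continue
        stamp = datetime.strptime(parts[0], "%H:%M:%S")
        rows.append((stamp, int(parts[1]), int(parts[2]), int(parts[3])))
    return rows


def burst_gaps(rows):
    """Seconds between consecutive samples whose Out column is non-zero."""
    bursts = [stamp for stamp, _pending, _in, out in rows if out > 0]
    return [int((b - a).total_seconds()) for a, b in zip(bursts, bursts[1:])]


rows = parse_rows(SAMPLE)
gaps = burst_gaps(rows)
print(gaps)  # [101, 100, 101]
```

Gaps that stay this close to 100 seconds are consistent with something timing out on a fixed interval rather than with a generally slow engine, although the sketch cannot say which component is responsible.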

Contacting TAC will most definitely give you a more definitive response.

Let us know how it goes.

Cheers,

Mathew

Crovax · ‎03-05-2018

Thank you.

I've searched, but cannot find any setting that matches 100 seconds.

Will try to contact TAC.

HStrohmaier · ‎05-15-2018

Did you solve your problem?

Crovax · ‎05-15-2018

Hello.

Thank you for the reminder.

Yes, the problem was solved.

Quote from support:

Case summary :

------------------------------------------------------

We confirmed that the delay was introduced due to error of SLBL as timing out “Warning: MID 4959181 unable to lookup SLBL for recipient because the DB server is unavailable.” , where the SLBL got corrupted and was exiting and restarting multiple times due to improper reboot for the appliance.

So we have repaired the engine and restarted the services from the backend and now mail flow and workqueue is getting empty quickly.
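
For anyone who wants to check their own appliance for the same symptom, here is a minimal Python sketch that scans exported mail_logs text for the warning quoted in the case summary. It assumes the logs are available as plain text lines and that the warning wording matches the quote; the function name find_slbl_failures and the second sample MID are hypothetical. It only detects the warning; the repair itself was done by TAC.

```python
import re

# Message text taken from the TAC case summary; the MID differs per message.
# Any prefix on the log line (timestamp, level) is ignored by search().
SLBL_WARNING = re.compile(
    r"Warning: MID (\d+) unable to lookup SLBL for recipient "
    r"because the DB server is unavailable"
)


def find_slbl_failures(lines):
    """Return the MIDs of messages whose SLBL lookup failed."""
    mids = []
    for line in lines:
        match = SLBL_WARNING.search(line)
        if match:
            mids.append(int(match.group(1)))
    return mids


sample_lines = [
    # Line quoted in the case summary.
    "Warning: MID 4959181 unable to lookup SLBL for recipient "
    "because the DB server is unavailable.",
    # Hypothetical second occurrence with an invented MID, for illustration only.
    "Warning: MID 4959182 unable to lookup SLBL for recipient "
    "because the DB server is unavailable.",
    # Placeholder for any unrelated log line.
    "unrelated log line",
]

failed_mids = find_slbl_failures(sample_lines)
print(failed_mids)  # [4959181, 4959182]
```

A long list of affected MIDs alongside a slow workqueue would be a reasonable detail to include when opening the TAC case.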

HStrohmaier · ‎05-15-2018

We are not used to repairing the SDB. So, please, could you give us a short introduction on how to repair the SDB?

Thank you so much :)

Crovax · ‎05-15-2018

I'm sorry, I don't know the details. The repair was done by a TAC engineer.

Mathew Huynh · ‎05-15-2018

Hello All,

The safelist/blocklist DB is only repairable by TAC using remote tunnel access - there were likely some corrupted entries causing the DB to fail to initialize, and with remote tunnel access TAC should be able to scope the problem down further and correct it.

Thanks for letting us know how it went :).

Regards,
Mathew