Solved: SSL read timeout

smp · ‎01-10-2017

I am having some trouble with my API calls timing out. I am using the Get Devices API and I am trying to get 1000 devices at a time with a URL like this: /webacs/api/v2/data/Devices.json?.full=true&.firstResult=0&.maxResults=1000 . I have ~5600 devices in Prime and I have set the timeout on my HTTP object to 900 seconds.

The timeout does not happen 100% of the time, but it happens often enough to interfere with a daily process. According to the API Health, the average response time is 160304 ms (~2.7 mins). Is there some way to troubleshoot why this call sometimes takes over 900s to come back?

I'm running 3.1.4 Update 1 with Device Pack 6.

Spencer Zier · ‎01-11-2017

We've noticed that the performance of the Devices resource degrades fairly quickly, especially on instances that have been running for a long time or that have low alarm thresholds (or a lot of rules that create alarms). We've addressed this in API v3 (which will be released with PI 3.2) by removing alarm counts from the Devices API. Something else could be causing the issue on your appliance, but if it's specific to that API resource, it's very likely the alarm counts that are causing the problem.

For an immediate workaround, try adding the ".nocount=true" parameter to your query string. This instructs the API to skip the query it runs to get the count, first, and last attributes, so those attributes will be missing from the response, but we generally see API response times cut in half with this parameter. Please let me know if that addresses the issue.
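As a rough illustration of the query string described above, here is a minimal Python sketch that appends .nocount=true to the Devices request from the original question. The helper name devices_url is hypothetical and is not part of the Prime Infrastructure API; only the path and the query parameters come from this thread.

```python
from urllib.parse import urlencode

# Path taken from the original question; the host is left out on purpose.
DEVICES_PATH = "/webacs/api/v2/data/Devices.json"


def devices_url(first_result, max_results, nocount=True):
    """Build the relative URL for one page of the Devices resource.

    devices_url is a hypothetical helper used only for illustration.
    """
    params = {
        ".full": "true",
        ".firstResult": first_result,
        ".maxResults": max_results,
    }
    if nocount:
        # Asks the API to skip the query behind the count/first/last attributes.
        params[".nocount"] = "true"
    return DEVICES_PATH + "?" + urlencode(params)
```

Calling devices_url(0, 1000) gives the same URL as in the question with .nocount=true appended.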

https://developer.cisco.com/site/prime-infrastructure/

smp · ‎01-12-2017

As I understand the documentation, adding the .nocount parameter will remove the first, last, and count attributes which I need to page through all my devices. How can I return all of the devices when the URL contains the .nocount parameter?

Spencer Zier · ‎01-12-2017

You're right that those attributes will be removed from the response. However, you can still request pages via the .firstResult and .maxResults query parameters. You'd send .firstResult=0&.maxResults=1000 in the first request, .firstResult=1000&.maxResults=1000 in the second, and so on. To tell whether you've reached the end of the data set, count the entities in each response: if the count equals .maxResults, keep going; if it's less, you've reached the last page.
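The paging logic described here could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: fetch_all and fetch_page are hypothetical names, and fetch_page stands in for whatever code issues the HTTP request with the given .firstResult and .maxResults values and returns the list of entities from the response.

```python
from typing import Callable, List


def fetch_all(fetch_page: Callable[[int, int], List[dict]],
              page_size: int = 1000) -> List[dict]:
    """Collect every entity without relying on count/first/last.

    fetch_page(first_result, max_results) is a hypothetical callable that
    performs one request and returns the entities found in that response.
    """
    results: List[dict] = []
    first_result = 0
    while True:
        page = fetch_page(first_result, page_size)
        results.extend(page)
        # A short (or empty) page means the last page has been reached.
        if len(page) < page_size:
            break
        first_result += page_size
    return results
```

One consequence of this approach: when the total number of devices is an exact multiple of the page size, the loop makes one extra request that returns an empty page before it stops.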

I know it's not a perfect solution, but hopefully the response times will get back within acceptable ranges. I'd also suggest you talk to your Cisco account team and ask them to get you into the PI 3.2 beta; that way you can check whether the v3/Devices response times are indeed better for you, and I'm sure we'd get a lot of constructive feedback from you.

smp · ‎01-13-2017

That workaround seems to be working nicely. Wasn't too difficult to implement either, just had to fiddle with the counter variables a little bit. Thanks a lot for the response - that was really helpful, as I'm not sure I would have come up with it myself.

And thanks for the beta offer, but we run pretty lean here so don't have the idle cycles to dig into a project like that. I can wait for the 3.2 release now that I have a workaround.

Thanks again!

smp · ‎01-19-2017

Well, I spoke too soon. It has worked only once in the past week. This is a big problem for me, as I have no other way to maintain my inventory, which is over 5,600 devices.

I am getting this exception with the 500 error:

"exception": "org.springframework.orm.hibernate3.HibernateJdbcException: JDBC exception on Hibernate data access: SQLException for SQL [select this_.ID as ID2183_0_, this_.DeviceId as DeviceId2183_0_, this_.Reachability as Reachabi3_2183_0_, this_.ManagementStatus as Manageme4_2183_0_, this_.DeviceName as DeviceName2183_0_, this_.IpAddress as IpAddress2183_0_, this_.DeviceType as DeviceType2183_0_, this_.CollectionDetail as Collecti8_2183_0_, this_.CollectionTime as Collecti9_2183_0_, this_.SoftwareType as Softwar10_2183_0_, this_.SoftwareVersion as Softwar11_2183_0_, this_.CreationTime as Creatio12_2183_0_, this_.Location as Location2183_0_, this_.productFamily as product14_2183_0_, this_.criticalAlarms as critica15_2183_0_, this_.majorAlarms as majorAl16_2183_0_, this_.minorAlarms as minorAl17_2183_0_, this_.warningAlarms as warning18_2183_0_, this_.clearedAlarms as cleared19_2183_0_, this_.informationAlarms as informa20_2183_0_, this_.AuthEntityId as AuthEnt21_2183_0_, this_.AuthEntityClass as AuthEnt22_2183_0_ from Devices this_ order by this_.ID asc]; SQL state [72000]; error code [8103]; could not execute query; nested exception is org.hibernate.QueryTimeoutException: could not execute query",

I also found the full stack trace in the xmpNbiFw.log file. Is this something you've seen before, or do I need to open a support case?

Spencer Zier · ‎01-19-2017

Have you tried using the InventoryDetails resource as an alternative? If that doesn't work, please contact me via my email address (available on my communities.cisco.com profile) and we can set up a Webex meeting to diagnose this issue.

smp · ‎01-19-2017

No, I haven't, but I just constructed a request with Postman and it appears the data structure has the fields I need. It will take a fair number of code changes though, so I'll need some time to work on it.

My /webacs/api/v2/data/InventoryDetails.json?.full=true&.nocount=true Postman call returned in 527114ms, or close to 9 minutes. Does that seem normal for ~5600 devices?

Spencer Zier · ‎01-19-2017

No, that's high as well. It could be that there is something in particular wrong with your appliance. Have you noticed any performance problems or slowness elsewhere? Are any jobs in the system running for long periods of time (>15 minutes)?

smp · ‎01-19-2017

> Have you noticed any performance problems or slowness elsewhere? Are any jobs in the system running for long periods of time (>15 minutes)?

***chuckling to myself***

Um, yeah, I have been working with a DE for several months now on my deployment. But your response does confirm that the API responsiveness is probably just another symptom of the issues I'm working on with TAC. No need to drag you into that nightmare... If I get around to updating my code to use InventoryDetails before the other issues are resolved, I'll post my results.

Thanks for the feedback.