onep_dpss_inject_raw_packet performance

When a onep application invokes the onep_dpss_inject_raw_packet function to inject a newly created packet, what communication takes place between the node hosting the onep application and the router / network element? More specifically, how many messages / packets are sent between:
- dpss process and the router (datapath transport gre)
- onep application (library) and the router (transport tls)

Best regards
Viktor

15 REPLIES
Hall of Fame Cisco Employee

For every packet generated, there will be one packet sent over GRE from the dpss_mp to the NE.  This is not done over TLS.

As for the Thrift RPC channel, it will depend on what else you have to do.  In my packet generator case, I make a call to onep_element_get_interface_by_name() each time I generate a packet (I use OUTPUT as my inject point).  This yields six Thrift packets for one DPSS packet, ten for two, and 14 for three.  So from a transmit standpoint, my app sends two Thrift packets per DPSS packet.
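
The three counts quoted above (six, ten, and 14 Thrift packets for one, two, and three DPSS packets) fit a linear pattern of four Thrift packets per inject plus a constant two. The Python sketch below only checks that arithmetic. Reading the constant as a fixed per-run overhead is an assumption, not something stated in the post.

```python
# Thrift packet counts reported in the post:
# {DPSS packets injected: Thrift packets observed}
observed = {1: 6, 2: 10, 3: 14}

def thrift_packets(injects, per_inject=4, fixed=2):
    """Linear fit to the reported counts: fixed + per_inject * injects."""
    return fixed + per_inject * injects

# The fit reproduces every reported data point.
for n, seen in observed.items():
    assert thrift_packets(n) == seen

# Extrapolation only; no measurement at this rate appears in the post.
print(thrift_packets(1000))
```

With two of the four packets per inject sent by the application, this matches the "two Thrift packets per DPSS packet" transmit figure.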

Hi Joseph,

The single packet sent over GRE from the dpss_mp to the NE for every packet generated is of course as expected. However, we do not expect a call to onep_dpss_inject_raw() to generate any RPC calls.

Why is there an overhead of four Thrift packets per inject? I assume that is two Thrift packets in each direction? Could you try doing nothing else apart from calling onep_dpss_inject_raw, to see whether you can avoid the RPC calls altogether? For example, call onep_element_get_interface_by_name() only once, and then inject lots of packets with onep_dpss_inject_raw in a loop.

Best regards

Viktor

Hi Joseph,

As mentioned, we do not expect a call to onep_dpss_inject_raw() to generate any RPC calls.

For our onep application (built against onepk_1.1.0.99) it seems that each onep_dpss_inject_raw() call results in some TLS traffic (port 15002). The observation is that this traffic increases proportionally with the inject rate. Before reaching 1000 injects per second, the c2951 router reaches more than 50% CPU load.

Based on this observation, if each onep_dpss_inject_raw call triggers some RPC communication and the router handles the TLS traffic in software, that might be the reason for the high load on the router. So we wanted to hear if there is any chance that the onep_dpss_inject_raw() function implicitly generates this additional traffic. If not in general, might this be the case for some routers, such as the c2951?

When we turn on debugging, via the following command on the router "debug onep application session onepapp" we see that each packet injected results in a number of thrift messages, as below:

Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_write_entire_buffer__]: 319[4160567360]: Write cthrift buffer to socket 11: bytes 53
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_write_entire_buffer__]: 392[4160567360]: cthrift wrote 53 bytes, 0 remaining to send, took 0 usec
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_recv_main__]: 2592[4160567360]: Client Done tos 1
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_recv_main__]: 2604[4160567360]: Received 89(bytes) equal to or grater than parsed 89(bytes)
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [onep_dpss_pak_loop_internal]: 967[3925817088]: Found traffic reg using local_id 1

In other words, it appears that each onep_dpss_inject_raw call generates an RPC call.
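
To check whether this traffic really scales with the inject count, the debug output can be tallied instead of read by eye. The Python sketch below counts the cthrift write and receive lines and sums their byte counts. The two sample lines are copied verbatim from the output above. The regular expressions are assumptions based only on those lines and may need adjusting for other onePK versions.

```python
import re

# Two of the debug lines quoted above, verbatim.
LOG = """\
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_write_entire_buffer__]: 319[4160567360]: Write cthrift buffer to socket 11: bytes 53
Wed Feb 26 10:35:27 2014:[ONEP][UnknownClass][DEBUG]: onepapp-eg-0: [cthrift_recv_main__]: 2604[4160567360]: Received 89(bytes) equal to or grater than parsed 89(bytes)
"""

# Patterns inferred from the sample lines only (an assumption).
WRITE_RE = re.compile(r"Write cthrift buffer to socket \d+: bytes (\d+)")
RECV_RE = re.compile(r"Received (\d+)\(bytes\)")

def summarize(log_text):
    """Return (writes, bytes_written, receives, bytes_received)."""
    written = [int(m.group(1)) for m in WRITE_RE.finditer(log_text)]
    received = [int(m.group(1)) for m in RECV_RE.finditer(log_text)]
    return len(written), sum(written), len(received), sum(received)

print(summarize(LOG))
```

Running it over a capture taken during a known number of injects gives the writes-per-inject ratio directly.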

Are these RPC calls triggered by the call to the onep_dpss_inject_raw() function or does this indicate a problem in our application?

Best regards
Viktor

Hall of Fame Cisco Employee

Yes.  That was my point.  Sorry I was a bit confusing.  Each call to onep_dpss_inject_raw() triggers the app to generate two RPC packets (with ACKs, that's four total packets).  One of the RPC calls made is to determine the interface ID of the specific interface (when using OUTPUT).

Hi again Joseph,

How can the RPC calls be avoided then? In other words, is it possible to use the onep_dpss_inject_raw() without any RPC calls taking place (per inject)?

Currently, when using a c2951 router, we achieve a maximum of only around 4 Mbps before the router reaches 50 - 60 percent CPU load.

We suspect the high CPU load on the router to be caused by the TLS encrypt/decrypt being handled in software. Is that the case?

Best regards
Viktor

Hall of Fame Cisco Employee

There will be one call made to get the specified interface's parent if the OUTPUT injection point is used.  PREROUTING does not make this call.  From the code that's the only call I can see being made, so perhaps the two packets equal that one call.  In terms of CPU usage, if you're seeing high CPU, can you provide the output of "show proc cpu sort"?

Hi again Joseph,

We inject layer 2 frames, and hence use ONEP_TARGET_LOCATION_HARDWARE_DEFINED_OUTPUT. It is not clear why an RPC call is necessary for the inject_raw. Wouldn't it be possible to include the required interface information with the packet itself, sent via GRE from the dpss_mp to the router?

Below is the output taken from a test run with a c2951 router when injecting only 1 Mbps of traffic. As can be seen, the ONEP Application load is around 40% CPU usage. The tests were short, hence the difference for 5Sec, 1Min and 5Min.

Do you see similar results?

By the way, we have requested support for plain TCP transport (which was removed), partly for tests and comparisons in cases like this.


onepk1#show processes cpu sorted
CPU utilization for five seconds: 53%/6%; one minute: 49%; five minutes: 24%
PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
   3      428252     1764780        242 40.79% 37.61% 16.69%   0 ONEP Application
174      202020     2951564         68  5.67%  5.20%  2.31%   0 IP Input        
  15      324504      347586        933  0.63%  0.12%  0.06%   0 Environmental mo
195          96    44172441          0  0.15%  0.13%  0.13%   0 Ethernet Msec Ti
  89        2248      347942          6  0.15%  0.15%  0.15%   0 Per-Second Jobs 
108        2328     1391678          1  0.15%  0.18%  0.18%   0 Netclock Backgro
373       34780      937090         37  0.07%  0.02%  0.02%   0 XOS async sync X
238          16      347586          0  0.07%  0.00%  0.00%   0 RUDPV1 Main Proc
   9           0           2          0  0.00%  0.00%  0.00%   0 Timers          
  10           0         565          0  0.00%  0.00%  0.00%   0 WATCH_AFS       
  11           0           1          0  0.00%  0.00%  0.00%   0 License Client N
   8           0           1          0  0.00%  0.00%  0.00%   0 DiscardQ Backgro
  13      281584        6083      46290  0.00%  0.05%  0.05%   0 Licensing Auto U
  14           8        4213          1  0.00%  0.00%  0.00%   0 CEF OneP DPSS ha
   7          12        5800          2  0.00%  0.00%  0.00%   0 Pool Manager    
<cut>
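
For comparing runs, the per-process rows of this output can be parsed rather than compared by eye. The Python sketch below extracts the rows and picks the busiest process over the five-second window. The two sample rows are copied from the output above with trailing padding trimmed. The regular expression is an assumption based on this column layout only.

```python
import re

# Header and two rows copied from the output above (trailing padding trimmed).
SAMPLE = """\
PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
   3      428252     1764780        242 40.79% 37.61% 16.69%   0 ONEP Application
 174      202020     2951564         68  5.67%  5.20%  2.31%   0 IP Input
"""

# Four integer columns, three percentages, TTY, then a process name
# that may contain spaces.
ROW_RE = re.compile(
    r"^\s*(?P<pid>\d+)\s+(?P<runtime_ms>\d+)\s+(?P<invoked>\d+)\s+(?P<usecs>\d+)\s+"
    r"(?P<five_sec>[\d.]+)%\s+(?P<one_min>[\d.]+)%\s+(?P<five_min>[\d.]+)%\s+"
    r"(?P<tty>\d+)\s+(?P<process>.+?)\s*$"
)

def parse_rows(text):
    """Return one dict per process row; header and other lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW_RE.match(line)
        if m is None:
            continue
        row = m.groupdict()
        for key in ("five_sec", "one_min", "five_min"):
            row[key] = float(row[key])
        rows.append(row)
    return rows

def busiest(text):
    """Row with the highest five-second CPU percentage."""
    return max(parse_rows(text), key=lambda r: r["five_sec"])

top = busiest(SAMPLE)
print(top["process"], top["five_sec"])
```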


Best regards
Viktor

Hall of Fame Cisco Employee

I confirmed with development that as of 1.1.0 and 1.2.0, one RPC call will be made to get the parent interface (as I stated).  This only happens in the OUTPUT case.  Post-1.2.0 there will be a new API where you can get the interface ahead of time and pass that to onePK, thus caching the value so each injection will not trigger another RPC.
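
The effect of that caching can be illustrated with a small simulation. This Python sketch does not use the onePK API. FakeElement, lookup_parent, InterfaceCache, and inject are hypothetical stand-ins invented here, and the only thing modelled is the number of round trips to the network element.

```python
class FakeElement:
    """Hypothetical stand-in for the network element; counts RPC round trips."""

    def __init__(self):
        self.rpc_calls = 0

    def lookup_parent(self, name):
        # Each call models one RPC to the network element.
        self.rpc_calls += 1
        return "parent-of-" + name


class InterfaceCache:
    """Resolve each interface name once, then reuse the stored result."""

    def __init__(self, lookup):
        self._lookup = lookup
        self._cache = {}

    def __call__(self, name):
        if name not in self._cache:
            self._cache[name] = self._lookup(name)
        return self._cache[name]


def inject(packet, interface_name, resolve):
    """Hypothetical inject: resolve the parent interface, then 'send'."""
    return resolve(interface_name), packet


# Current behaviour as described: one lookup RPC per inject.
uncached = FakeElement()
for _ in range(1000):
    inject(b"frame", "GigabitEthernet0/0", uncached.lookup_parent)

# Behaviour with the value cached ahead of time: one lookup in total.
cached = FakeElement()
resolve = InterfaceCache(cached.lookup_parent)
for _ in range(1000):
    inject(b"frame", "GigabitEthernet0/0", resolve)

print(uncached.rpc_calls, cached.rpc_calls)
```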

In the meantime, can you send the output of "show ver", "show region", and "show stack 3" when the CPU is this high?

Below is the information that you requested, including several "show stacks 3" samples, as I am not sure how this information is "sampled". What can you see from this information, Joseph? Does this have anything to do with TLS, or is it something else?

(PS! I could not see any monospaced text options suitable for this kind of text output)


onepk1#show processes cpu sorted 
CPU utilization for five seconds: 62%/8%; one minute: 25%; five minutes: 11%
PID Runtime(ms)     Invoked      uSecs   5Sec   1Min   5Min TTY Process
   3       26172      108051        242 47.03% 18.14%  5.75%   0 ONEP Application
174     1242456    24212180         51  6.63%  2.84%  2.26%   0 IP Input        
305         708        1087        651  0.15%  0.17%  0.04% 646 Virtual Exec    
  89        2940      498094          5  0.15%  0.15%  0.15%   0 Per-Second Jobs 
108        3192     1992198          1  0.15%  0.19%  0.18%   0 Netclock Backgro
293         548     7781322          0  0.07%  0.02%  0.00%   0 MMA DB TIMER    
170         500    15542588          0  0.07%  0.02%  0.00%   0 IPAM Manager    
373       78920     2136445         36  0.07%  0.02%  0.01%   0 XOS async sync X
187          64     1945524          0  0.07%  0.00%  0.00%   0 SSS Feature Time
195         148    63227658          0  0.07%  0.10%  0.08%   0 Ethernet Msec Ti
125       21836      996052         21  0.07%  0.04%  0.05%   0 BPSM stat Proces
  11           0           1          0  0.00%  0.00%  0.00%   0 License Client N
  10           0         678          0  0.00%  0.00%  0.00%   0 WATCH_AFS       
<cut>


onepk1#show region
Region Manager:

      Start         End     Size(b)  Class  Media  Name
0x00000000  0x1FFFFFFF   536870912  Local  R/W    main
0x00000000  0x1D5FFFFF   492830720  Local  R/W    main:main_partial
0x01000000  0x03FFFFFF    50331648  Local  R/W    main_partial:heap
0x040001A0  0x0ACE743B   114193052  IText  R/O    main_partial:text
0x0C000000  0x12425FFF   105013248  IData  R/W    main_partial:data
0x10000000  0x11406FFF    21000192  Local  R/W    data:heap
0x12426000  0x134E90B7    17576120  IBss   R/W    main_partial:bss
0x134E90B8  0x1D5FFFFF   168914760  Local  R/W    main_partial:heap
0x1D600000  0x1FFFFFFF    44040192  Iomem  R/W    main:iomem


Free Region Manager:

      Start         End     Size(b)  Class  Media  Name


onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F67F8
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F67F8
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#show stacks 3
Process 3:  ONEP Application
  Stack segment 0x25808EC - 0x25837CC
  FP: 0x25836E0, RA: 0x73F6770
  FP: 0x2583718, RA: 0x77A5944
  FP: 0x2583798, RA: 0x69245B4
  FP: 0x25837B8, RA: 0x7784B40
  FP: 0x25837C0, RA: 0x4DDEA50
  FP: 0x0, RA: 0x4DC5D04
onepk1#
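
Since the eight samples above are nearly identical, a tally of the topmost return address (RA) per sample summarises them. The Python sketch below does that. TEMPLATE is a shortened reconstruction of one sample, and TOP_RAS lists the first RA of each sample in the order shown above. Mapping an address to a function still requires decoding against the image's symbols, which this sketch does not attempt.

```python
import re
from collections import Counter

# First RA of each of the eight samples above, in order.
TOP_RAS = ["0x73F67F8", "0x73F6770", "0x73F6770", "0x73F67F8",
           "0x73F6770", "0x73F6770", "0x73F6770", "0x73F6770"]

# Shortened form of one sample; only the first two frames are kept.
TEMPLATE = (
    "onepk1#show stacks 3\n"
    "Process 3:  ONEP Application\n"
    "  Stack segment 0x25808EC - 0x25837CC\n"
    "  FP: 0x25836E0, RA: {ra}\n"
    "  FP: 0x2583718, RA: 0x77A5944\n"
)
TEXT = "".join(TEMPLATE.format(ra=ra) for ra in TOP_RAS)

RA_RE = re.compile(r"RA: (0x[0-9A-Fa-f]+)")

def top_frame_ras(text):
    """Return the first RA that follows each 'Process' header line."""
    ras = []
    want_top = False
    for line in text.splitlines():
        if line.startswith("Process "):
            want_top = True
        elif want_top:
            m = RA_RE.search(line)
            if m:
                ras.append(m.group(1))
                want_top = False
    return ras

tally = Counter(top_frame_ras(TEXT))
print(tally.most_common())
```

Six of the eight samples share one top address and the other two share a second, nearby one, while the deeper frames are the same in every sample.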


onepk1#show processes cpu history      

onepk1   10:28:26 AM Thursday Feb 27 2014 UTC

                                                                 
                                                                 
                                                                 
      6666666666666666666666666666644444                         
      333322222222222222233333222223333311111111111111122222111112
  100                                                            
   90                                                            
   80                                                            
   70                                                            
   60 *****************************                              
   50 *****************************                              
   40 **********************************                         
   30 **********************************                         
   20 **********************************                         
   10 **********************************                         
     0....5....1....1....2....2....3....3....4....4....5....5....6
               0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)

            

onepk1#show version
Cisco IOS Software, C2951 Software (C2951-UNIVERSALK9-M), Experimental Version 15.4(20131213:031344) [surf-onep_ca_pi23_throttle-nightly 147]
Copyright (c) 1986-2013 by Cisco Systems, Inc.
Compiled Fri 13-Dec-13 20:41 by surf

ROM: System Bootstrap, Version 15.0(1r)M13, RELEASE SOFTWARE (fc1)

onepk1 uptime is 5 days, 19 hours, 7 minutes
System returned to ROM by reload at 16:04:27 UTC Fri Feb 21 2014
System image file is "flash0:/c2951-universalk9-mz.SSA"
Last reload type: Normal Reload
Last reload reason: Reload Command

This product contains cryptographic features and is subject to United
States and local country laws governing import, export, transfer and
use. Delivery of Cisco cryptographic products does not imply
third-party authority to import, export, distribute or use encryption.
Importers, exporters, distributors and users are responsible for
compliance with U.S. and local country laws. By using this product you
agree to comply with applicable laws and regulations. If you are unable
to comply with U.S. and local laws, return this product immediately.

A summary of U.S. laws governing Cisco cryptographic products may be found at:
http://www.cisco.com/wwl/export/crypto/tool/stqrg.html

If you require further assistance please contact us by sending email to
export@cisco.com.

Cisco CISCO2951/K9 (revision 1.0) with 481280K/43008K bytes of memory.
Processor board ID FCZ1546206Y
4 Gigabit Ethernet interfaces
1 Serial interface
1 terminal line
2 Channelized E1/PRI ports
DRAM configuration is 72 bits wide with parity enabled.
255K bytes of non-volatile configuration memory.
255744K bytes of ATA System CompactFlash 0 (Read/Write)


License Info:

License UDI:

-------------------------------------------------
Device#      PID            SN
-------------------------------------------------
*1        CISCO2951/K9          FCZ1546206Y    

Technology Package License Information for Module:'c2951'

------------------------------------------------------------------------
Technology    Technology-package                  Technology-package
              Current              Type           Next reboot 
------------------------------------------------------------------------
ipbase        ipbasek9             Permanent      ipbasek9
security      None                 None           None
uc            None                 None           None
data          datak9               RightToUse     datak9
NtwkEss       None                 None           None
CollabPro     None                 None           None

Configuration register is 0x2102
                                                
  

Hall of Fame Cisco Employee

Thanks, Viktor.  I decoded this, and it doesn't look like TLS.  However, it could be due to the overall injection overhead.  I heard Einar sent you a debugging shared library to help narrow this down.  Did you receive that, and are you able to run your tests?

Yes Joseph, we received a debugging shared library from Einar which avoids the RPC request/reply packets over the TLS connection for each and every onep_dpss_inject_raw_packet call. We have only done some initial tests, and performance seems to be significantly improved (10x).

Best regards

Viktor

Hall of Fame Cisco Employee

Thanks, Viktor.  Einar and I caught up over email.  I'm glad you're seeing better performance.

Thanks a lot for the help Joseph.

Could you please confirm that the next version of onepk to be made available (version 1.2, as far as I understand) will have this issue fixed?

Best regards

Viktor

Hall of Fame Cisco Employee

We're trying to get it in, yes.  We had a discussion about it yesterday.
