DMVPN scalability

Chris Ingram
Level 1

I have three hub routers whose DMVPN scalability capabilities I want to compare (the 3825 versus the 3945 and 3845).  I'm having trouble finding enough information on Cisco's website to help me.  I know it must be there somewhere and I'm just not looking in the right place, but I've read and read about DMVPN designs and I'm not finding anything.  This is turning into a time killer.  Could someone please help me determine what the DMVPN limitations of these three routers are?

Thanks,

Chris


6 Replies

Marcin Latosiewicz
Cisco Employee

Chris,

We rarely test anything lower than the 7200 as a hub, but I can give you the theoretical numbers I found internally.

I strongly suggest you get in touch with your SEs or account team for more precise information; the figures below are just estimates.

Note that the major scalability factor is the ability to sustain multiple routing adjacencies.

BGP should scale best.

3825 - even up to 200 peers

3845 - up to 300-400, depending on config/amount of load

3945 - 500-750 (without getting into high CPU, but can be stretched much further)
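
For illustration only, a minimal hub-side BGP sketch of what that looks like; the AS number, tunnel subnet, and peer-group name are assumptions, and the dynamic-neighbor option depends on your IOS release:

router bgp 65000
 ! One peer-group so the hub builds updates once for all spokes
 neighbor SPOKES peer-group
 neighbor SPOKES remote-as 65000
 ! Either list each spoke's tunnel address explicitly...
 neighbor 10.0.0.2 peer-group SPOKES
 neighbor 10.0.0.3 peer-group SPOKES
 ! ...or, on releases with BGP dynamic neighbors, accept the whole tunnel subnet
 bgp listen range 10.0.0.0/24 peer-group SPOKES
 bgp listen limit 300

The peer estimates above are per hub, so size the tunnel subnet and the listen limit accordingly.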

Regarding throughput, it is even harder to give a good estimate, especially since we most likely would not be able to match your actual traffic without testing, and it will depend on the HW config.

Marcin

Marcin,

Thank you for your reply.  That helped me make an immediate decision on the question at hand.

While I was looking for an answer I found some documentation that mentioned scalability but didn't go into great detail.  It left me with more questions than answers.

example:

Hub-and-spoke

All VPN traffic must go via hub

Hub bandwidth and CPU utilization limit VPN

Number of tunnels = O(n)

http://www.cisco.com/en/US/prod/collateral/vpndevc/ps6525/ps9370/ps6658/prod_presentation0900aecd80313ca9.pdf

I don't understand what the variables 'O' and 'n' are in the equation.

This next document has a lot of detail and shows an example with 1000 branches.

http://www.cisco.com/en/US/customer/docs/solutions/Enterprise/WAN_and_MAN/DMVPN_3.html#wp70006

Now, I'm curious about the packets per second in our configuration.  I don't know what the limit is, or how to calculate it.

Our primary head-end router has an ES module with two EtherChanneled FastEthernet ports as the tunnel internet source.

  5 minute input rate 757000 bits/sec, 536 packets/sec
  5 minute output rate 1383000 bits/sec, 624 packets/sec

The gig interface between the ES and router handles all traffic (encrypted and non-encrypted).

  5 minute input rate 2198000 bits/sec, 1235 packets/sec
  5 minute output rate 2354000 bits/sec, 1247 packets/sec

I may be a little confused about where exactly I'm supposed to be watching the pps.

The CPU doesn't look too bad, at all, to me.

LIT1-DC-ECT-3845-1#show proc cpu history

LIT1-DC-ECT-3845-1   07:40:41 PM Wednesday Jun 1 2011 UTC

                                                               
    111111111111111111111111111111111111111111111111111111111111
    444442222211111111111111155555444441111111111111112222222222
100                                                            
90                                                            
80                                                            
70                                                            
60                                                            
50                                                            
40                                                            
30                                                            
20                          *****                             
10 ************************************************************
   0....5....1....1....2....2....3....3....4....4....5....5....6
             0    5    0    5    0    5    0    5    0    5    0
               CPU% per second (last 60 seconds)

                                                               
    111111112111111111222211112131111111211111111111111111111211
    657567672667698878110087880747867669699654878677887898878065
100                                                            
90                                                            
80                                                            
70                                                            
60                                                            
50                                                            
40                                                            
30                             *       *                      
20 **************##*############******####** **#********##*****
10 ############################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6
             0    5    0    5    0    5    0    5    0    5    0
               CPU% per minute (last 60 minutes)
              * = maximum CPU%   # = average CPU%

                                                                           
    333333332222232222323345485343472222222222222223233333332222222222222232
    465342507777707778080550751713138898988988888890910100009887787869688909
100                                                                        
90                          *                                             
80                          *                                             
70                          *     *                                       
60                          *     *                                       
50                       *****    *                                       
40  **   *              ******** **                                       
30 ***********************#************************************************
20 **********************#####*##******************************************
10 ########################################################################
   0....5....1....1....2....2....3....3....4....4....5....5....6....6....7..
             0    5    0    5    0    5    0    5    0    5    0    5    0 
                   CPU% per hour (last 72 hours)
                  * = maximum CPU%   # = average CPU%


LIT1-DC-ECT-3845-1#

My interpretation of the documentation is that the CPU shouldn't average more than 65%.  Is that correct?

Chris,

If I remember correctly from my university days, O is a function describing complexity/growth and n is usually a quantity.

So, for example, the complexity of a hub-and-spoke design is linear ( O(n) ): the setup/complexity grows by a certain (usually the same) amount every time you add a spoke.

I even found a wiki article: http://en.wikipedia.org/wiki/Big_O_notation
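
To make the notation concrete (a rough sketch, not a sizing rule):

$T_{\text{hub-and-spoke}}(n) = n \in O(n), \qquad T_{\text{full mesh}}(n) = \frac{n(n-1)}{2} \in O(n^2)$

So with, say, 150 spokes that is 150 tunnels terminating on the hub, versus roughly 11,175 potential spoke-to-spoke tunnels if every pair built one.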

Regarding throughput values, different tests are done, but I'm pretty sure not every HW combination can be tested.

The idea is that at some point we will either run out of hold-queue on the interfaces (it is recommended to set the hold-queue to twice the number of spokes on the tunnel interface) or the internal buffers will start overflowing.
The PPS capabilities of DMVPN will also depend on the type of crypto engine you're using and, of course, the packet size.
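
As a minimal sketch of that hold-queue recommendation (interface name and spoke count are assumptions; with around 150 spokes, twice that is 300):

interface Tunnel0
 ! ~150 spokes -> hold-queue of at least 2 x 150 = 300 packets, in and out
 hold-queue 300 in
 hold-queue 300 out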

There are multiple factors to take into account and a few good reasons why you will not achieve the actual advertised PPS with real internet traffic (I'm not even talking about IMIX testing).

I know this will most likely prompt even more questions; that's why I thought involving your SE would be the best idea ;-)

Marcin

Ok.  Now, as far as bandwidth concerns go, I took the example out of the second link and  tried to apply it to our DMVPN configuration. http://www.cisco.com/en/US/customer/docs/solutions/Enterprise/WAN_and_MAN/DMVPN_3.html#wp70006

Here's the example:

This example is from a case study of 1000 branch offices.

typical case—1000 x (384kbps + 1.5 Mbps) x 33 percent utilization = 628 Mbps

worst case—1000 x (384kbps + 1.5 Mbps) x 100 percent utilization = 1.9 Gbps
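
Written generically (my reading of that document, not an official formula):

$\text{aggregate hub bandwidth} \approx \sum_i N_i \times (\text{upstream}_i + \text{downstream}_i) \times \text{utilization}$

where $N_i$ is the number of branches in speed tier $i$, with 33 percent utilization for the typical case and 100 percent for the worst case.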

We currently have 149 tunnels with speeds that vary.

So here's what I came up with:

105 tunnels = 6768 Kbps
29 tunnels = 3512 Kbps
15 tunnels = 1920 Kbps


105 * (768 + 6000) * .33 = 234511.2 Kbps
29 * (512 + 3000) * .33 = 33609.84 Kbps
15 * (384 + 1536) * .33 = 9504 Kbps

So we need a total of 277625.04 Kbps (277.6 Mbps).

Does this seem right?  If it is, then this could be part of our problem because the port channel for our internet source only equals 200 Mbps.

Chris,

I didn't check your numbers exactly, but the calculation approach seems correct.

Looking at what's calculated, I see that they are taking into account inbound and outbound traffic (i.e. total throughput).

I assume that the port channel you have is 2x100 Mbit/s in full duplex (or a similar config), which should give you 200 Mbit/s of throughput for ingress and egress.

This sort of scenario would also require a QoS configuration, so even if you're close to your BW limit you should normally not run into a big degradation of service (at least for the services that matter), even at higher rates.
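
For reference, a minimal per-tunnel QoS sketch of the kind this implies (policy names and the 6 Mbps shaper are assumptions, and feature support depends on the IOS release):

policy-map SPOKE-CHILD
 class class-default
  fair-queue
policy-map SHAPE-6M
 class class-default
  ! Shape toward the spoke's downlink speed (6000 Kbps in the tier above)
  shape average 6000000
  service-policy SPOKE-CHILD
!
! Hub tunnel: bind the shaper to spokes that signal this NHRP group
interface Tunnel0
 ip nhrp map group SPOKE-6M service-policy output SHAPE-6M
!
! Spoke tunnel: advertise the group name to the hub
interface Tunnel0
 ip nhrp group SPOKE-6M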

I guess you would also want some sort of redundancy, which COULD allow you to load-balance (or load-share) traffic between multiple ISPs/hubs.

Marcin

Marcin,

Thanks for your reply.

We do have a dual-hub DMVPN topology for redundancy (for the particular DMVPN network I was referring to).  We're in the process of migrating both hubs to new DCs and connecting them to Nexus 7Ks with fiber.

I initially was asking about the scalability of a 2nd DMVPN network we are building; I didn't think we would run into a scalability problem with the first before we changed the design.  The "problem" I referred to in a later post was some buffer failures I noticed last week while troubleshooting the 'per-tunnel-qos' that I implemented back in February.  The QoS for all 150 tunnels was somehow causing the physical interface (on each hub) to drop packets fairly rapidly, thereby causing our end users to experience degradation in network performance.  As soon as I removed the QoS, the drops stopped.  But that is another subject.
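
For context, these are the sorts of counters where those drops and buffer failures show up (the interface name here is a placeholder):

show interfaces GigabitEthernet0/1 | include drops|rate
show policy-map multipoint Tunnel0
show buffers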

Thanks again for the helpful information.

Chris