OSPF, iBGP, EBGP across datacenters

cjsattler · ‎12-05-2011

I have two datacenters with two leased fibers running between them. Currently one datacenter is just a slave off the main site, with layer 3 switches connected by OSPF over routed interfaces. The fibers themselves hook up to stacked 3750Gs, one stack on each side, with OSPF running between them. The OSPF cost determines which fiber is used as primary. The default route is currently learned via OSPF from the core router at site A.

Site A has two providers with full tables over eBGP, and site B will have two providers with full routes over eBGP. I would like to share Internet routes via iBGP between both routers, and I would also like to keep the fibers plugged into the stacked 3750s for redundancy rather than plugging them into the backbone routers.

The problem I am having is that when a packet comes in from the Internet destined for a provider on site B, router A sends it to the switches at site A and it gets into a routing loop: the destination isn't on the layer 3 switch (which only runs OSPF with internal network routes), so the switch sends it back to router A.

I know I can just plug the fibers into the backbone routers to fix this, but I really want them on the stacked 3750s, since I am more likely to have to take down a single router than both switches, which have a port-channel to the access layer.

I was also thinking of running an L2TPv3 pseudowire between both core routers to give them layer 2 adjacency, but I'd rather not have the added overhead and complexity.

Here are the questions I have, and I'd love to hear people's recommendations:

A) Is there any way to do this other than a pseudowire or plugging the fibers into the 6500s?

B) When I do get this working, should both sites originate the default route via OSPF, or would it be better to set one preference higher?

Talha Ansari · ‎12-06-2011

Hi,

I assume your gateway router is doing NAT, so a packet reaching your L3 switches should contain a local IP address in the destination field of its header. If your internal OSPF routing is correct, there should be no reason for the packet to loop, since destinations at site B should be advertised into OSPF from site B and destinations at site A should be advertised by the site A OSPF process.

Can you explain exactly what is happening, with an example and the configuration of your devices?

Regards

Talha

cjsattler · ‎12-06-2011

Hi!,

The network is not running NAT. We have a /19, a /20 and a /23 being announced from our 6500s to our upstreams with eBGP. The problem with the loop is that OSPF doesn't carry the Internet routes learned via eBGP. So when someone on the inside wants to reach a site on the Internet for which Router B has the better path, the traffic goes from the access layer switches to the 3750Gs and follows their default gateway to Router A. Router A says the destination should be reached via Router B and sends the packet back down to the 3750 switches it connects to. Since the 3750s don't have the external route (holding all the Internet routes would kill their memory), they send it back up their default gateway to Router A, which sends it back again.

This is what it showed when I did a sho ip route on a destination I couldn't get to:

Router A#sho ip route 168.215.5.209

Routing entry for 168.215.0.0/19

Known via "bgp XXXX", distance 200, metric 0

Tag 4323, type internal

Redistributing via ospf XXXX

Last update from ROUTERB 00:04:52 ago

Routing Descriptor Blocks:

* ROUTERB, from loopback0, 00:04:52 ago

Route metric is 0, traffic share count is 1

AS Hops 1

Route tag 4323

MPLS label: none

MPLS Flags: NSF

and if I did a traceroute it bounced back and forth.

Here are my BGP and OSPF configs:

router bgp XXXX

no synchronization

bgp router-id X.X.X.X

no bgp fast-external-fallover

bgp log-neighbor-changes

bgp graceful-restart restart-time 120

bgp graceful-restart stalepath-time 360

bgp graceful-restart

bgp maxas-limit 50

bgp dampening

network X.X.X.X mask 255.255.240.0

network X.X.X.X mask 255.255.224.0

network X.X.X.X mask 255.255.254.0

neighbor X.X.X.X remote-as INTERNAL

neighbor X.X.X.X update-source Loopback0

neighbor PROVIDER1 remote-as XXXX

neighbor PROVIDER1 description PROVIDER1

neighbor PROVIDER1 version 4

neighbor PROVIDER1 send-community

neighbor PROVIDER1 prefix-list INTERNALFILTER out

neighbor PROVIDER1 maximum-prefix 500000 90

neighbor PROVIDER2 remote-as XXXXX

neighbor PROVIDER2 description PROVIDER2

neighbor PROVIDER2 password 7

neighbor PROVIDER2 version 4

neighbor PROVIDER2 send-community

neighbor PROVIDER2 prefix-list XXXXX out

neighbor PROVIDER2 maximum-prefix 500000 90

no auto-summary

router ospf XXXXX

router-id XXXX

log-adjacency-changes

no auto-cost

max-lsa 8000

area 0 authentication

area 1 authentication

area 10 authentication

redistribute static subnets

redistribute bgp XXXX subnets route-map BGP_OSPF_REDIST (for failover for some customers but limits everything except a few subnets)

network XXXX XXXXX area 0

default-information originate

cjsattler · ‎12-06-2011

So it sounds like my only real solutions are to plug the fibers directly into the 6500s or to create a tunnel. I just wanted to make sure I'm not missing anything, as I'm fairly new to iBGP.

I'm leaning towards just plugging them into the 6500s, since the links are already running at about 500 Mbps and tunneling will add overhead.

Kishore Chennupati · ‎12-06-2011

Hi ,

If I understand this correctly from the output you pasted, Router B is advertising the /19 to Router A via iBGP, so Router B should forward the packet to wherever it learned that /19 from and not resort to the default route. Can you post the output of sh ip bgp 168.215.5.209 on Router B, and sh ip route 168.215.5.209 as well?

Regards,

Kishore

cjsattler · ‎12-06-2011

Both routers are advertising the /19 and connect to the switches via connected interfaces (the switches only run OSPF with the internal /19). Here is the output. However, I am not currently running iBGP between them because of the loop, and am only accepting incoming traffic from router B; there is no default-originate on router B anywhere.

A>sho ip route 168.215.5.209

Routing entry for 168.215.0.0/19

Known via "bgp XXXX", distance 20, metric 0

Tag 22773, type external

Redistributing via ospf XXXX

Last update from X.X.18.193 1d17h ago

Routing Descriptor Blocks:

* X.X.X.193, from X.X.18.193, 1d17h ago

Route metric is 0, traffic share count is 1

AS Hops 2

Route tag 22773

MPLS label: none

MPLS Flags: NSF

RouterB>sho ip route 168.215.5.209

Routing entry for 168.215.0.0/19

Known via "bgp XXXX", distance 20, metric 0

Tag 4323, type external

Redistributing via ospf XXXX

Last update from X.X.132.149 1d17h ago

Routing Descriptor Blocks:

* X.X.132.149, from X.X.132.149, 1d17h ago

Route metric is 0, traffic share count is 1

AS Hops 1

Route tag 4323

MPLS label: none

MPLS Flags: NSF

I did think of one way to get redundancy across everything without resorting to layer 2 spanning tree or anything like it. If I plug one fiber directly into the core routers, then turn the other fiber into a trunk on the switches and carve out a VLAN for the 6500s to communicate over, that should allow me to survive a single router failure and talk across datacenters without the complexity and overhead of tunneling.

Any opinions on that?
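
The trunk-and-VLAN idea could be sketched roughly as follows on the 3750 stacks. This is only an illustration under assumptions: VLAN 900, its name, and the interface numbers are placeholders that do not come from the thread, and the fiber port is assumed to be converted from a routed port to a switchport.

```
! 3750 stack, same idea at each site.
! VLAN 900 and the interface numbers are placeholders, not from the thread.
vlan 900
 name CORE-INTERCONNECT
!
interface GigabitEthernet1/0/25
 description Inter-site fiber, converted to a trunk
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 900
!
interface GigabitEthernet1/0/1
 description Link to the local 6500
 switchport trunk encapsulation dot1q
 switchport mode trunk
 switchport trunk allowed vlan 900
```

Each 6500 would then take an address in VLAN 900 and peer iBGP across it. The existing switch-to-switch OSPF adjacency on that fiber would presumably have to move to an SVI in a second VLAN added to the same trunk.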

Kishore Chennupati · ‎12-07-2011

cjsattler,

Here are the lab results, just for you. I just used different IP addressing.

+++++ With GRE

R1#sh ip bgp

BGP table version is 2, local router ID is 22.22.22.22

Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,

r RIB-failure, S Stale

Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path

*>i33.33.0.0/16 1.1.1.2 0 100 0 i

R1#traceroute 33.33.33.33

Type escape sequence to abort.

Tracing the route to 33.33.33.33

1 1.1.1.2 72 msec * 60 msec <<<< Trace successful

R1#sh ip route 1.1.1.2

Routing entry for 1.1.1.0/24

Known via "connected", distance 0, metric 0 (connected, via interface)

Routing Descriptor Blocks:

* directly connected, via Tunnel0

Route metric is 0, traffic share count is 1

+++++ Without GRE

R1#sh ip bgp

BGP table version is 4, local router ID is 22.22.22.22

Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,

r RIB-failure, S Stale

Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path

*>i33.33.0.0/16 2.2.2.2 0 100 0 i

R1#traceroute 33.33.33.33

Type escape sequence to abort.

Tracing the route to 33.33.33.33

1 192.168.1.3 40 msec 36 msec 24 msec

2 192.168.1.1 32 msec 16 msec 24 msec

3

*Dec 7 20:16:59.151: ICMP: time exceeded rcvd from 192.168.1.3

*Dec 7 20:16:59.191: ICMP: time exceeded rcvd from 192.168.1.3

*Dec 7 20:16:59.215: ICMP: time exceeded rcvd from 192.168.1.3

*Dec 7 20:16:59.235: ICMP: bogus redirect from 192.168.1.3 - for 33.33.33.33 use gw 192.168.1.1

*Dec 7 20:16:59.235: gateway address is one of our addresses

So, just use a GRE tunnel between the two routers and run iBGP between them.
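
A minimal sketch of what such a tunnel might look like on Router A, reusing the 1.1.1.0/24 tunnel subnet from the lab. The address 10.255.0.2 for Router B's loopback is a placeholder, not something from the thread, and it is assumed that both loopbacks are already advertised in OSPF through the 3750s so the tunnel endpoints resolve without the tunnel itself.

```
! Router A (Router B would mirror this with the addresses swapped).
! 10.255.0.2 is a placeholder for Router B's Loopback0 address.
interface Tunnel0
 description GRE to Router B across the 3750 stacks
 ip address 1.1.1.1 255.255.255.0
 ! GRE over IPv4 adds 24 bytes, so this assumes a 1500-byte path MTU
 ip mtu 1476
 tunnel source Loopback0
 tunnel destination 10.255.0.2
```

GRE is the default tunnel mode on IOS, so no explicit tunnel mode line is shown. The 3750s then only ever see packets addressed between the two loopbacks, which they already know via OSPF.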

HTH

Regards,

Kishore

Marwan ALshawi · ‎12-06-2011

You can enable MPLS on the switches, making them act like a core that runs only the OSPF IGP, and run MP-BGP between the routers at the two sites for full route exchange.

For the default route, which option is better depends on your requirements, but since you have full Internet routing, the default route really only serves to get traffic from the IGP to the BGP routers.

Hope this helps

cjsattler · ‎12-06-2011

That would be an interesting solution, but the 3750s can't do MPLS (the Metro Ethernet models can, but I don't have those).

Thanks

Marwan ALshawi · ‎12-06-2011

You can run a GRE tunnel between the routers with MPLS enabled over the tunnel, using the IGP for tunnel reachability.

On top of that you run MP-BGP

http://www.cisco.com/en/US/docs/ios/mpls/configuration/guide/mp_vpn_gre.html


maayre · ‎12-06-2011

Well if you use a GRE tunnel you will no longer need MPLS to tunnel as you're doing that with GRE

I would vote for MPLS, but as you said it's not supported, so tunnel it with "ipip" or "ip gre".

With the defaults, I would just make sure that the devices at each site use their local exit point (if that is desirable). The two defaults can easily coexist if the cost between sites is high.
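
As a hedged illustration of the two coexisting defaults, both routers could originate the default into OSPF as an external type 1 route, so that the internal OSPF cost to the originating router is added to the advertised metric. The metric value of 10 is an arbitrary placeholder.

```
! On both Router A and Router B (metric 10 is a placeholder value)
router ospf XXXXX
 default-information originate metric 10 metric-type 1
```

With this, devices at each site should prefer their local router's default as long as the OSPF cost across the inter-site fibers is higher than the cost to the local router. Note that without the always keyword, a router only originates the default while it has a default route in its own routing table.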

Marwan ALshawi · ‎12-06-2011

If you use tunneling without MPLS/MP-BGP, the full routing table has to be redistributed into the IGP, and this is not recommended or best practice!

The concept is the same as the PE-to-PE tunneling in the link above.


maayre · ‎12-06-2011

Not sure if I've missed something, but my response is:

- No redistribution required; iBGP is advertising routes with a next hop that is routed over the tunnel.

- MP-BGP: I'm not sure why you would need new address families in this setup. MP-BGP by definition means you are adding some other address family besides IPv4, which isn't required here.

Kishore Chennupati · ‎12-07-2011

Marwan,

You suggested a very good idea with GRE, but using MP-BGP is overkill. All that needs to be done is to hide the destination from the internal network, which GRE can do and which should be enough. I don't believe you need any additional BGP SAFIs here. Also, as Matt suggested, no redistribution is required between BGP and the IGP.

The problem the poster is having is that the switches (IGP) don't know the destination prefix, e.g. 168.215.0.0/19, and hence when Router A tries to route via the 3750s, the switches send the packets back to Router A because of the default route.

Talha,

I see what you are saying. However, I would say it's suboptimal routing to go out to the Internet and then come back from Router A to Router B. The network has a backdoor fibre link, and it would make sense to route those destinations internally via iBGP.

cjsattler,

I would definitely recommend using GRE. I have tested this for you, just in case you want some assurance. I will paste the results in the following post.

HTH

Kishore

Marwan ALshawi · ‎12-07-2011

Agreed, MP-BGP is not required unless routes for multiple VRFs are needed.

Again, with GRE the BGP session will form over the tunnel. Make sure to use the tunnel as the BGP session source, and make sure the tunnel IPs are reachable via the IGP, not BGP, to avoid recursive lookups.
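
A rough sketch of the iBGP side on Router A, assuming a Tunnel0 interface already exists and that Router B's tunnel address is 1.1.1.2 as in the lab output earlier in the thread. The next-hop-self line is an assumption added here so that routes learned from the providers are advertised with the tunnel address as next hop; it was not mentioned in the thread.

```
! Router A; 1.1.1.2 is assumed to be Router B's Tunnel0 address.
router bgp XXXX
 neighbor 1.1.1.2 remote-as XXXX
 neighbor 1.1.1.2 update-source Tunnel0
 ! Assumption: rewrite the next hop so it resolves via the tunnel
 neighbor 1.1.1.2 next-hop-self
```

Router B would mirror this toward 1.1.1.1. To avoid recursive routing, the loopbacks used as tunnel source and destination should stay reachable through OSPF on the 3750s and should not be learned over the tunnel itself.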

Hope this helps