Jawad Al Akrabawi, Cisco Employee

With AI and VXLAN fabrics growing across the industry, security demands are increasing as well, especially when deploying K8s clusters. In certain scenarios, particularly in private K8s clouds, you may want to restrict client access to specific workloads and pods. Cilium, built on eBPF networking in the Linux kernel, can certainly provide superior capabilities here.

From the VXLAN networking perspective, however, it can also be desirable to isolate clients into separate VRFs/tenants, where a tenant can only see the IP addresses belonging to that tenant, in isolation from every other tenant.

Isovalent Enterprise for Cilium provides true end-to-end VRF isolation between the K8s clusters and the client VRFs using SRv6 L3VPN. This is the true definition of tenancy on a K8s cluster: a conflicting Service IP range does not actually conflict with another tenant’s as long as the two are separated into different VRFs/tenants. This capability is truly end to end, from the K8s side all the way through the Nexus VXLAN fabric to any border leaf switches.

In this post, aside from the SRv6 demonstration, we will also demonstrate a simple implementation of BGP communities between Cilium and a Nexus VXLAN EVPN fabric that provides a form of multi-tenancy. Keep in mind that this is not true tenancy, as opposed to deploying an SRv6 L3VPN configuration.

Using BGP Attributes to Steer Routes into VRFs (Not True End-to-End Multi-Tenancy)

While there are many methods to steer BGP routes into separate VRFs, we will use BGP communities here. Consider two pods in the deployment: Pod1 belongs to Tenant1 only, and the IP routing table of Tenant1 should contain only Service IPs or CIDRs belonging to this pod or to any other pod in Tenant1. More specifically, Tenant1 should see 20.0.10.1/32 in its routing table but should never see Pod2’s Service IP of 30.0.10.1. We will simply use BGP communities advertised from Cilium to achieve this: Cilium will attach community 64512:301 to any advertisement for pods belonging to Tenant1, while Pod2 will be advertised with a different community, 64512:302.

The Cisco Nexus EVPN fabric can have multiple VRFs deployed. One of them is a generic VRF facing the K8s nodes (Tenant-K8s), and this is the VRF where all BGP sessions are established. Note that in this example I used a loopback in the VRF called “Tenant-K8s”; loopbacks may be needed when deploying vPCs to keep BGP sessions up in case of link failures. In this example, however, I have only one worker node with a single link to a single leaf.

JawadAlAkrabawi_0-1732534064615.png

 

 

Checking on the BGP sessions:

From the Leaf Switch side:

Leaf01# show bgp vrf Tenant-K8s  ipv4 unicast summary
BGP summary information for VRF Tenant-K8s, address family IPv4 Unicast
BGP router identifier 192.168.1.1, local AS number 65000
BGP table version is 325, IPv4 Unicast config peers 1, capable peers 1
6 network entries and 6 paths using 1800 bytes of memory
BGP attribute entries [5/1840], BGP AS path entries [1/6]
BGP community entries [2/88], BGP clusterlist entries [0/0]
4 received paths for inbound soft reconfiguration
0 identical, 4 modified, 0 filtered received paths using 64 bytes

Neighbor        V    AS    MsgRcvd    MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
192.168.16.35   4 64512      13305      13197      325    0    0    1d22h 4


From the worker node:

 

jawad@ubuntu1:~$ cilium bgp peers
Node            Local AS   Peer AS   Peer Address   Session State   Uptime      Family         Received   Advertised
ubuntu2.local   64512      65000     192.168.1.1    established     45h49m25s   ipv4/unicast   2          6
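
The peering configuration itself is not shown in this post. As a reference, a minimal sketch using Cilium’s BGPv2 resources (CiliumBGPClusterConfig and CiliumBGPPeerConfig) that would produce a session like the one above could look like this; the ASNs and peer address come from the outputs above, the resource names are placeholders, and the manifests actually used in this lab may differ:

apiVersion: cilium.io/v2alpha1
kind: CiliumBGPClusterConfig
metadata:
  name: bgp-cluster-config            # placeholder name
spec:
  bgpInstances:
    - name: "instance-64512"
      localASN: 64512                 # Cilium-side AS from the output above
      peers:
        - name: "leaf01"
          peerAddress: "192.168.1.1"  # peer address from the output above
          peerASN: 65000
          peerConfigRef:
            name: "leaf01-peer-config"
---
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeerConfig
metadata:
  name: leaf01-peer-config
spec:
  families:
    - afi: ipv4
      safi: unicast
      advertisements:
        matchLabels:
          advertise: bgp              # selects the CiliumBGPAdvertisement shown later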

 

The worker node runs two pods, each exposed by a Service with a different LoadBalancer IP.

jawad@ubuntu1:~$ kubectl get pods
NAME     READY   STATUS    RESTARTS   AGE
nginx    1/1     Running   0          20h
nginx2   1/1     Running   0          20h

jawad@ubuntu1:~$ kubectl describe ippools/ip-pool-pod1 | grep Cidr
    Cidr:    20.0.10.0/24
jawad@ubuntu1:~$ kubectl describe ippools/ip-pool-pod2 | grep Cidr
    Cidr:    30.0.10.0/24

jawad@ubuntu1:~$ kubectl get service
NAME             TYPE           CLUSTER-IP       EXTERNAL-IP   PORT(S)        AGE
kubernetes       ClusterIP      10.96.0.1        <none>        443/TCP        4d17h
nginx-app        NodePort       10.100.246.185   <none>        80:31705/TCP   4d17h
nginx-service    LoadBalancer   10.110.206.139   20.0.10.1     80:32216/TCP   41h
nginx-service2   LoadBalancer   10.109.203.74    30.0.10.1     80:32489/TCP   40h
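
The LoadBalancer IPs 20.0.10.1 and 30.0.10.1 are allocated from the two pools shown above. The pool manifests are not included in this post; a minimal sketch of the first pool, assuming Cilium’s CiliumLoadBalancerIPPool resource (recent releases use spec.blocks, older ones spec.cidrs) and a serviceSelector that limits the pool to Tenant1’s service, could look like this:

apiVersion: cilium.io/v2alpha1
kind: CiliumLoadBalancerIPPool
metadata:
  name: ip-pool-pod1
spec:
  blocks:
    - cidr: "20.0.10.0/24"       # CIDR shown in the describe output above
  serviceSelector:               # assumption: restrict this pool to Tenant1's service
    matchLabels:
      servicebgp: proxy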

 

Cilium will advertise each pod’s Service IP with a different community. We assign labels to each pod’s Service (which can represent a tenant), and Cilium matches those labels and attaches the corresponding community. In the CiliumBGPAdvertisement configuration below, Pod1 will be advertised with community 64512:301 and Pod2 with community 64512:302.

 

jawad@ubuntu1:~/ciliumconfigs$ cat CiliumBGPAdvertisement.yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPAdvertisement
metadata:
  name: bgp-advertisements
  labels:
    advertise: bgp
spec:
  advertisements:
    - advertisementType: "Service"
      service:
        addresses:
          - ClusterIP
          - ExternalIP
          - LoadBalancerIP
      selector:
        matchExpressions:
          - { key: servicebgp, operator: In, values: [ proxy ] }
      attributes:
        communities:
          standard: [ "64512:301" ]
    - advertisementType: "Service"
      service:
        addresses:
          - ClusterIP
          - ExternalIP
          - LoadBalancerIP
      selector:
        matchExpressions:
          - { key: servicebgp, operator: In, values: [ proxy2 ] }
      attributes:
        communities:
          standard: [ "64512:302" ]
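
For the selectors above to match anything, the Services themselves must carry the corresponding labels (servicebgp=proxy for Tenant1, servicebgp=proxy2 for Tenant2). The actual Service manifests are not shown in this post; a minimal sketch of Tenant1’s service, with an assumed app: nginx pod selector, would look like this:

apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  labels:
    servicebgp: proxy          # matched by the first advertisement above (64512:301)
spec:
  type: LoadBalancer
  selector:
    app: nginx                 # assumption: label on the nginx pod
  ports:
    - port: 80
      targetPort: 80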

 

Let’s have a look at the routes Cilium is advertising. We can clearly see the two routes of interest: 20.0.10.1/32, which belongs to Tenant1, and 30.0.10.1/32, which belongs to a different tenant per our requirements. We are also advertising the ClusterIPs for demonstration purposes.

jawad@ubuntu1:~$ cilium bgp routes
(Defaulting to `available ipv4 unicast` routes, please see help for more options)

Node            VRouter   Prefix              NextHop   Age         Attrs
ubuntu2.local   64512     10.0.1.0/24         0.0.0.0   59h54m37s   [{Origin: i} {Nexthop: 0.0.0.0}]
                64512     10.109.203.74/32    0.0.0.0   40h29m5s    [{Origin: i} {Nexthop: 0.0.0.0}]
                64512     10.110.206.139/32   0.0.0.0   40h41m2s    [{Origin: i} {Nexthop: 0.0.0.0}]
                64512     20.0.10.1/32        0.0.0.0   20h2m1s     [{Origin: i} {Nexthop: 0.0.0.0}]
                64512     30.0.10.1/32        0.0.0.0   20h2m59s    [{Origin: i} {Nexthop: 0.0.0.0}]

 

Now let’s have a look at what the Nexus reports in the EVPN routing table of the VRF facing the K8s cluster:

 

Leaf01# show bgp l2vpn evpn vrf Tenant-K8s
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 3435, Local Router ID is 10.70.0.11
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.70.0.11:6    (L3VNI 100301)
*>l[5]:[0]:[0]:[24]:[192.168.1.0]/224
                      10.77.0.111              0        100      32768 ?
*>l[5]:[0]:[0]:[24]:[192.168.16.0]/224
                      10.77.0.111              0        100      32768 ?
*>l[5]:[0]:[0]:[32]:[10.109.203.74]/224
                      10.77.0.111                                    0 64512 i
*>l[5]:[0]:[0]:[32]:[10.110.206.139]/224
                      10.77.0.111                                    0 64512 i
*>l[5]:[0]:[0]:[32]:[20.0.10.1]/224
                      10.77.0.111                                    0 64512 i
*>l[5]:[0]:[0]:[32]:[30.0.10.1]/224
                      10.77.0.111                                    0 64512 i

 

Now let’s look at the community on each of the two routes of interest:

 

Leaf01# show bgp l2vpn evpn 20.0.10.1 vrf Tenant-K8s  | grep Community
      Community: 64512:301
Leaf01# show bgp l2vpn evpn 30.0.10.1 vrf Tenant-K8s  | grep Community
      Community: 64512:302

 

Now that the communities are advertised correctly, we can act on them. We have another VRF called “Tenant1-Pods”; the goal is to import into it only the EVPN routes that carry community 64512:301. Routes carrying 64512:302 should not be imported into this VRF, since that community represents a different tenant.

 


Leaf01# sh run | sec "vrf context Tenant-K8s"
vrf context Tenant-K8s
  vni 100301
  ip pim ssm range 232.0.0.0/8
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
    export map K8-Export-Others
    import map CommunityImport evpn

Leaf01# sh run | sec "vrf context Tenant1-Pods"

vrf context Tenant1-Pods
  vni 100302
  ip pim ssm range 232.0.0.0/8
  rd auto
  address-family ipv4 unicast
    route-target both auto
    route-target both auto evpn
    import vrf advertise-vpn

Leaf01# sh run | sec "route-map K8-Export-Others"
route-map K8-Export-Others permit 10
  match community Tenant1-Pods-BGP-Community
  set extcommunity rt 65000:100302

Leaf01# sh run | sec "route-map CommunityImport"
route-map CommunityImport permit 10
  match extcommunity Tenant1-Pods-RT
  set community 64512:301
route-map CommunityImport permit 20
  match tag 12346
  set community 64512:301
route-map CommunityImport deny 30

Leaf01# sh ip community-list
Standard Community List Tenant1-Pods-BGP-Community
    10 permit 64512:301
Leaf01# sh ip extcommunity-list
Standard Extended Community List Tenant1-Pods-RT
   10 permit RT:65000:100302

 

Let’s have a look at the VRF belonging to Tenant1 (Tenant1-Pods). The export map on Tenant-K8s stamps routes carrying community 64512:301 with route-target 65000:100302, which the Tenant1-Pods VRF (L3VNI 100302) imports automatically through its auto route-targets, so only 20.0.10.1/32 should appear there, and not the 30.0.10.1/32 route.

 

Leaf01# show bgp l2vpn evpn vrf Tenant1-Pods
BGP routing table information for VRF default, address family L2VPN EVPN
BGP table version is 3435, Local Router ID is 10.70.0.11
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2

   Network            Next Hop            Metric     LocPrf     Weight Path
Route Distinguisher: 10.70.0.11:5    (L3VNI 100302)
*>l[5]:[0]:[0]:[24]:[10.71.1.0]/224
                      10.77.0.111              0        100      32768 ?
*>l[5]:[0]:[0]:[32]:[10.110.206.139]/224
                      10.77.0.111                                    0 64512 i
*>l[5]:[0]:[0]:[32]:[20.0.10.1]/224
                      10.77.0.111                                    0 64512 i
*>l[5]:[0]:[0]:[32]:[172.16.16.16]/224
                      10.77.0.111              0        100      32768 ?

 

It is obvious from the above output that we imported the routes belonging to a specific tenant into the correct routing table; we used the BGP communities received from Cilium to filter the routes and place them into the correct VRF/tenant. Although this simple implementation provides a form of “multi-tenant” capability, it is important to note that it is not complete end-to-end tenancy unless you run real L3VPN services on the worker nodes, for example by deploying Isovalent Enterprise for Cilium, as we will see shortly.

 

Using SRv6 L3VPN with Native, True End-to-End VRFs/Tenants

Let’s consider a scenario where Isovalent Enterprise for Cilium is running on a K8s node. We might have several VRFs configured directly on the nodes; let’s say Tenant1 is represented by the VRF “blue”. There is seamless integration between EVPN and SRv6 L3VPN, but we will focus here on the L3VPN part only.

 

JawadAlAkrabawi_0-1732515456089.png

 

Let’s first see how the BGP sessions are established:

 

root@ubuntuk8:/home/jawad/srv6# cilium bgp peers
Node             Local AS   Peer AS   Peer Address        Session State   Uptime     Family          Received   Advertised
ubuntuk8.local   65000      65000     2016:40:40:40::40   established     2h39m59s   ipv6/unicast    0          2
        ipv4/mpls_vpn   1          1

 

We can see from the above that Cilium establishes both an IPv6 unicast and a VPNv4 BGP session with the Cisco switch/router. In this setup, the IPv6 session provides reachability to the exchanged locator prefixes/SIDs for SRv6, while the VPNv4 session carries the VRF/tenant routes, including any advertised SIDs.

I am using a Cisco XRv virtual router in this lab, but it could equally be a Nexus 9000 switch. Let’s see the IPv6 session from the router’s side:

 

RP/0/RP0/CPU0:XRV#show bgp ipv6 unicast summary

Process    RcvTblVer     bRIB/RIB     LabelVer    ImportVer    SendTblVer   StandbyVer
Speaker           10            10            10            10            10             0

Neighbor        Spk    AS MsgRcvd MsgSent       TblVer  InQ OutQ  Up/Down  St/PfxRcd
2016:23:23:23:20c:29ff:fe73:137d
                  0 65000     355     353           10    0    0 02:53:04          2

 

Similarly, we can check the VPNv4 session between Isovalent Enterprise for Cilium and the XRv router.

 

RP/0/RP0/CPU0:XRV#show bgp vpnv4 unicast summary
Neighbor        Spk    AS MsgRcvd MsgSent       TblVer  InQ OutQ  Up/Down  St/PfxRcd
2016:23:23:23:20c:29ff:fe73:137d
                  0 65000     356     354           10    0    0 02:53:43          1

 

Let’s see which locator/SID prefixes are being received from Cilium.

 

RP/0/RP0/CPU0:XRV#sh route ipv6
B 2001:23:23:23::/64
[200/0] via 2016:23:23:23:20c:29ff:fe73:137d, 00:13:10
C 2016:23:23:23::/64 is directly connected,
2d10h, GigabitEthernet0/0/0/0
L 2016:23:23:23::40/128 is directly connected,
2d10h, GigabitEthernet0/0/0/0
L 2016:40:40:40::40/128 is directly connected,
2d10h, Loopback0
L cafe:cafe:100::/48, SRv6 Endpoint uN (shift)
[0/0] via ::, 2d10h
L cafe:cafe:100::/64, SRv6 Endpoint uN (PSP/USD)
[0/0] via ::, 2d10h
L cafe:cafe:100:e000::/64, SRv6 Endpoint uDT4
[0/0] via ::ffff:0.0.0.0 (nexthop in vrf blue), 00:21:04
L cafe:cafe:100:e001::/64, SRv6 Endpoint uDT4
[0/0] via ::ffff:0.0.0.0 (nexthop in vrf default), 00:21:04
L cafe:cafe:100:e002::/64, SRv6 Endpoint uDT6
[0/0] via ::, 00:21:04
B cafe:cafe:2be::/48
[200/0] via 2016:23:23:23:20c:29ff:fe73:137d, 00:13:10

 

The prefix received above (cafe:cafe:2be::/48) should correspond to the locator prefix configured on the Cilium side, so let’s check.

 

root@ubuntuk8:/home/jawad/srv6# kubectl get sidmanager -o yaml | grep  "prefix: cafe"
        prefix: cafe:cafe:2be::/48

 

Pod1 belongs to the blue VRF, and we expect Cilium to advertise it only within that VRF. Before that, let’s check the pod’s IP address.

 

root@ubuntuk8:/home/jawad# kubectl get pods -n blue
NAME        READY   STATUS    RESTARTS      AGE
pod1-blue   1/1     Running   2 (41m ago)   167m

root@ubuntuk8:/home/jawad# kubectl describe -n blue pod pod1-blue | grep vrf
Labels:           vrf=blue

root@ubuntuk8:/home/jawad# kubectl exec -it -n blue pod1-blue -- sh
/ # ifconfig | grep 10.23
          inet addr:10.23.0.98  Bcast:0.0.0.0  Mask:255.255.255.255
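
For reference, the pod manifest itself is not shown in this post. A minimal sketch, assuming the vrf=blue label seen in the describe output is what maps the pod into the blue VRF/tenant (the container name and image are assumptions), would be:

apiVersion: v1
kind: Pod
metadata:
  name: pod1-blue
  namespace: blue
  labels:
    vrf: blue            # label shown in the describe output above
spec:
  containers:
    - name: app          # assumption: container name/image are not shown in the post
      image: nginx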

 

So now we know that this pod is running with the IPv4 address 10.23.0.98 in the blue VRF. The XRv router should have this pod CIDR subnet inside the blue VRF, and it should be received with the locator prefix/SID belonging to the Cilium node.

 

RP/0/RP0/CPU0:XRV#show bgp vpnv4 unicast received-sids
Status codes: s suppressed, d damped, h history, * valid, > best
              i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Received Sid
Route Distinguisher: 10:10 (default for vrf blue)
Route Distinguisher Version: 23
*>i10.23.0.0/24 2016:23:23:23:20c:29ff:fe73:137d
cafe:cafe:2be:8b71::
*> 172.19.19.19/32 0.0.0.0 NO SRv6 Sid

 

Indeed, the next hop and the SID belong to this K8s node. For testing, I created a dummy loopback in the blue VRF on the router and advertised it in BGP under the blue VRF.

 

RP/0/RP0/CPU0:XRV#sh run int lo10
interface Loopback10
 vrf blue
 ipv4 address 172.19.19.19 255.255.255.255

RP/0/RP0/CPU0:XRV#show bgp vpnv4 unicast vrf blue
Status codes: s suppressed, d damped, h history, * valid, > best
              i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network            Next Hop            Metric LocPrf Weight Path
Route Distinguisher: 10:10 (default for vrf blue)
Route Distinguisher Version: 10
*>i10.23.0.0/24       2016:23:23:23:20c:29ff:fe73:137d
                                               0    100      0 ?
*> 172.19.19.19/32    0.0.0.0                  0         32768 ?
Processed 2 prefixes, 2 paths

 

Great, so both routes exist in the correct VRF: one advertised by Cilium into the blue VRF, and the other a dummy loopback on the router. Let’s make sure they exist only in the blue VRF and not, for example, in the underlay (global routing table).

 

RP/0/RP0/CPU0:XRV#sh route ipv4
C    192.168.16.0/24 is directly connected, 1d00h, GigabitEthernet0/0/0/0
L    192.168.16.40/32 is directly connected, 1d00h, GigabitEthernet0/0/0/0
L    192.168.40.40/32 is directly connected, 1d00h, Loopback0

 

You can clearly see that the global routing table of the XRv router knows nothing about those routes; it truly resembles the underlay of a fabric.

Let’s test connectivity from the blue pod itself:

 

root@ubuntuk8:/home/jawad# kubectl exec -it pod1-blue -n blue -- sh
/ # ping 172.19.19.19
PING 172.19.19.19 (172.19.19.19): 56 data bytes
64 bytes from 172.19.19.19: seq=0 ttl=253 time=2.747 ms
64 bytes from 172.19.19.19: seq=1 ttl=253 time=2.272 ms
64 bytes from 172.19.19.19: seq=2 ttl=253 time=2.390 ms
64 bytes from 172.19.19.19: seq=3 ttl=253 time=2.184 ms
64 bytes from 172.19.19.19: seq=4 ttl=253 time=2.255 ms
^C
--- 172.19.19.19 ping statistics ---
5 packets transmitted, 5 packets received, 0% packet loss
round-trip min/avg/max = 2.184/2.369/2.747 ms

 

Drilling into packet captures, we can clearly see that the ICMP echo request was sent to a SID locator.

ICMP Packet sent:

JawadAlAkrabawi_1-1732515921244.png

 

ICMP Packet reply:

JawadAlAkrabawi_2-1732516038278.png

 

Note that the ICMP request was destined to cafe:cafe:100:e000::. You’ve probably already guessed that this SID was advertised by the XRv router inside the blue VRF, so let’s check.

 

RP/0/RP0/CPU0:XRV#show bgp vpnv4 unicast local-sids
Status codes: s suppressed, d damped, h history, * valid, > best
              i - internal, r RIB-failure, S stale, N Nexthop-discard
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Local Sid Alloc mode Locator
Route Distinguisher: 10:10 (default for vrf blue)
Route Distinguisher Version: 23
*>i10.23.0.0/24 NO SRv6 Sid - -
*> 172.19.19.19/32 cafe:cafe:100:e000:: per-vrf XRV

Processed 2 prefixes, 2 paths

 

The prefix 172.19.19.19/32 has a local SID of cafe:cafe:100:e000::; this is what the XRv router holds as a local SID, while 10.23.0.0/24 carried a received SID from Cilium’s side, as shown previously.

Isovalent Enterprise for Cilium supports SRv6 L3VPN, literally acting as a PE router that exchanges routes using Segment Routing over IPv6 (SRv6). This feature lets us create virtual private networks that provide true end-to-end segmentation and multi-tenancy, delivering secure and isolated connectivity between Kubernetes clusters, data centers, and even public clouds.

Since we now have a proper, fully populated L3VPN/VRF routing table, we can configure seamless integration with our EVPN fabric:

https://www.cisco.com/c/en/us/td/docs/dcn/nx-os/nexus9000/105x/configuration/vxlan/cisco-nexus-9000-series-nx-os-vxlan-configuration-guide-release-105x/m_configuring_seamless_integration_of_evpn_with_l3vpn_mpls_sr.html

 

Special thanks to the Cisco teams who created the internal Cilium dCloud demo, which included several examples of Isovalent Cilium configuration!

 

 

 

 
