One of the biggest misconceptions I had before moving into the service provider space was that all layer 2 operations had been replaced with layer 3. I quickly found out that even if you are routing everything, there are still a lot of layer 2 overlays in place: carrier Ethernet, or even spanning tree (STP). Running these technologies (hopefully not STP) is especially important for rural broadband providers, electric co-ops, and telcos to enable things such as IP conservation and subscriber management on a broadband network gateway.
The side effect of layer 2 transport technologies being so heavily used by last mile internet service providers is having to understand VLAN tagging operations. There are occasions where one side of the circuit will be double tagged and the other side single tagged, or everything will be a mix of double tagged, single tagged, and untagged. Let’s take a quick look at some simple tag operations on IP Infusion’s OcNOS SP 6.0.
IGP and MPLS setup
We set up a simple topology running IS-IS and LDP to create VPWS circuits. We are going to build two circuits in this example.
We are going to attach the circuits to the same physical ports and utilize tags to place the traffic into the correct VPWS circuit. Mikrotik-1 is double tagged and Mikrotik-2 is single tagged.
If you’ve built pseudowires on OcNOS before you might be familiar with the service-template model. However, when building VPWS circuits that share the same physical interface, this isn’t an option. You have to make a subinterface that is a switchport, which also changes the method for tag operations.
The switchport subinterfaces are then assigned as an access interface matching specific tags, in this case outer tag 3 and inner tags 400-499. On IPI-2 we are expecting single tagged frames instead of double tagged. This is where the rewrite pop comes in: it removes the outer tag on ingress to the PW and pushes it back on egress towards Mikrotik-1.
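To make the rewrite behavior concrete, here is a small conceptual sketch (plain Python, not OcNOS syntax) of what popping on ingress and pushing on egress do to the tag stack:

```python
# Conceptual sketch only (not OcNOS syntax) of the tag stack operations on
# the double-tagged attachment circuit: outer tag 3, inner tags 400-499.

def pop_outer(tags):
    """Ingress to the pseudowire: remove the outer tag (rewrite pop)."""
    if not tags or tags[0] != 3:
        raise ValueError("expected outer tag 3 on this attachment circuit")
    return tags[1:]

def push_outer(tags):
    """Egress towards Mikrotik-1: push the outer tag back on."""
    return [3] + tags

frame = [3, 450]              # double tagged frame from Mikrotik-1
in_pw = pop_outer(frame)      # [450]: single tagged inside the pseudowire
restored = push_outer(in_pw)  # [3, 450]: original stack restored on egress
```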
Since the AC towards Mikrotik-2 is only expecting single tagged packets we just do a simple match on encapsulation.
Let’s do some verification. First we’ll verify that the circuits are up.
ipi-2.lab.jan1.us.ipa.net#show ldp targeted-peers
IP Address Interface
100.127.0.1 xe48
ipi-2.lab.jan1.us.ipa.net#show ldp mpls-l2-circuit
Transport Client VC VC Local Remote Destination
VC ID Binding State Type VC Label VC Label Address
123 xe11.400 UP Ethernet VLAN 25602 26240 100.127.0.1
1234 xe11.4 UP Ethernet VLAN 25603 26241 100.127.0.1
ipi-2.lab.jan1.us.ipa.net#show ldp mpls-l2-circuit detail
PW ID: 123, VC state is up
Access IF: xe11.400,up,AC state is up
Session IF: xe48, state is up
Destination: 100.127.0.1, Peer LDP Ident: 100.127.0.1
Local vctype: vlan, remote vctype :vlan
Local groupid: 0, remote groupid: 0
Local label: 25602, remote label: 26240
Local MTU: 1500, Remote MTU: 1500
Local Control Word: disabled Remote Control Word: Not-Applicable Current use: disabled
Local Flow Label Direction: Disabled, Static: Disabled
Remote Flow Label Direction: Both , Static: Disabled
Local PW Status Capability : disabled
Remote PW Status Capability : disabled
Current PW Status TLV : disabled
PW ID: 1234, VC state is up
Access IF: xe11.4,up,AC state is up
Session IF: xe48, state is up
Destination: 100.127.0.1, Peer LDP Ident: 100.127.0.1
Local vctype: vlan, remote vctype :vlan
Local groupid: 0, remote groupid: 0
Local label: 25603, remote label: 26241
Local MTU: 1500, Remote MTU: 1500
Local Control Word: disabled Remote Control Word: Not-Applicable Current use: disabled
Local Flow Label Direction: Disabled, Static: Disabled
Remote Flow Label Direction: Disabled, Static: Disabled
Local PW Status Capability : disabled
Remote PW Status Capability : disabled
Current PW Status TLV : disabled
We have successfully manipulated the tags and passed traffic end to end. We started with a double tagged packet, popped off the outer tag, placed it into a PW, spat out a single tagged packet, and had end-to-end reachability.
Conclusion
There are various methods and configurations to manipulate traffic. This is one of the more common examples of tag manipulation that occurs in broadband aggregation. I will explore tag manipulation with service-templates and VPLS in a later post.
IP Infusion just released OcNOS version 6.0, and the release notes, as well as the press release, show a focus on EVPN with an MPLS data plane. Don’t forget that EVPN and VXLAN aren’t mutually exclusive; EVPN runs on, and was originally designed for, an MPLS data plane. I recently discussed this on a podcast, EVPN doesn’t need VxLAN, if you want to know more on that topic.
Let’s take a look at basic EVPN-VPWS and EVPN-VPLS deployments. Since we’re looking at an MPLS data plane we will utilize ISIS-SR for MPLS, as it is increasingly replacing LDP and RSVP-TE for label distribution.
IGP and Label Distribution
First let’s look at the IGP setup and label distribution as everything else will be built on top of this.
ipi-1.lab.jan1.us.ipa.net#show run int lo
interface lo
ip address 127.0.0.1/8
ip address 100.127.0.1/32 secondary
ipv6 address ::1/128
ipv6 address 2001:db8::1/128
prefix-sid index 101
ip router isis UNDERLAY
ipv6 router isis UNDERLAY
!
We have to set an index to create the node-sid for this device. In this case we use 101.
ipi-1.lab.jan1.us.ipa.net#show run segment-routing
segment-routing
mpls sr-prefer
global block 16000 23999
Since our segment routing global block (SRGB) starts at 16000, the node-sid becomes 16101: the sid is defined as the index plus the start of the SRGB. Additionally, we run mpls sr-prefer, which prefers SR labels over LDP or RSVP-TE labels.
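The SID arithmetic can be sketched as follows; the bounds check is an illustrative assumption, not verified device behavior:

```python
# Node-SID derivation: the SID is the SRGB base plus the configured
# prefix-sid index, as long as the result stays inside the block.

SRGB_START, SRGB_END = 16000, 23999

def node_sid(index):
    sid = SRGB_START + index
    if not SRGB_START <= sid <= SRGB_END:
        raise ValueError("index falls outside the configured SRGB")
    return sid

# index 101 on ipi-1 yields 16101, the label seen later in the ILM table
sid_ipi1 = node_sid(101)
```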
ipi-1.lab.jan1.us.ipa.net#show run isis
router isis UNDERLAY
is-type level-1-2
metric-style wide
mpls traffic-eng router-id 100.127.0.1
mpls traffic-eng level-1
mpls traffic-eng level-2
capability cspf
dynamic-hostname
fast-reroute ti-lfa level-1 proto ipv4
fast-reroute ti-lfa level-2 proto ipv4
net 49.0015.1001.2700.0001.00
segment-routing mpls
!
Finally, we have to enable ISIS for segment routing.
ipi-1.lab.jan1.us.ipa.net#show clns neighbors
Total number of L1 adjacencies: 1
Total number of L2 adjacencies: 1
Total number of adjacencies: 2
Tag UNDERLAY: VRF : default
System Id Interface SNPA State Holdtime Type Protocol
ipi-2.lab.jan1.us.ipa.net xe48 3c2c.99c0.00aa Up 26 L1L2 IS-IS
ipi-1.lab.jan1.us.ipa.net#show mpls ilm-table
Codes: > - installed ILM, * - selected ILM, p - stale ILM
K - CLI ILM, T - MPLS-TP, s - Stitched ILM
S - SNMP, L - LDP, R - RSVP, C - CRLDP
B - BGP , K - CLI , V - LDP_VC, I - IGP_SHORTCUT
O - OSPF/OSPF6 SR, i - ISIS SR, k - SR CLI
P - SR Policy, U - unknown
Code FEC/VRF/L2CKT ILM-ID In-Label Out-Label In-Intf Out-Intf/VRF Nexthop
LSP-Type
i> 100.127.0.1/32 4 16101 Nolabel N/A N/A 127.0.0.1
LSP_DEFAULT
B> evpn:1 3 17 Nolabel N/A N/A 127.0.0.1
LSP_DEFAULT
B> evpn:100 1 16 Nolabel N/A N/A 127.0.0.1
LSP_DEFAULT
B> evpn:1 2 640 Nolabel N/A N/A 127.0.0.1
LSP_DEFAULT
P> 100.127.0.2/32 7 20 3 N/A xe48 100.126.0.2
LSP_DEFAULT
i> 100.126.0.2/32 5 26240 3 N/A xe48 100.126.0.2
LSP_DEFAULT
i> 100.127.0.2/32 6 16102 3 N/A xe48 100.126.0.2
LSP_DEFAULT
Now we can see that we have a CLNS/IS-IS neighbor with ipi-2 as well as learned labels. We can see both devices’ node-sids in the label table on ipi-1.
ipi-1.lab.jan1.us.ipa.net#show bgp l2vpn evpn summary
BGP router identifier 100.127.0.1, local AS number 65000
BGP table version is 32
1 BGP AS-PATH entries
0 BGP community entries
Neighbor V AS MsgRcv MsgSen TblVer InQ OutQ Up/Down State/PfxRcd AD MACIP MCAST ESI PREFIX-ROUTE
100.127.0.2 4 65000 22856 22856 32 0 0 6d18h34m 2 1 0 1 0 0
EVPN-VPWS
Next we can start building services on top. First we’ll build an EVPN-VPWS service.
ipi-1.lab.jan1.us.ipa.net:
!
evpn mpls enable
!
evpn mpls vtep-ip-global 100.127.0.1
!
mac vrf BLUE
rd 100.127.0.1:1
route-target both evpn-auto-rt
!
evpn mpls id 100 xconnect target-mpls-id 2
host-reachability-protocol evpn-bgp BLUE
!
interface xe46.10 switchport
encapsulation dot1q 10
access-if-evpn
map vpn-id 100
!
EVPN MPLS has to be enabled. *IMPORTANT* This requires a reboot. Next the global VTEP IP needs to be set. These are global settings for the environment.
For the creation of the service we’ll start by making a mac vrf to generate the information needed to create an EVPN type-2 route (mac-ip).
Since this is VPWS it is configured as a cross-connect (xconnect) and a target is defined: the remote PE’s vpn-id, in this case 2.
Finally it is assigned to a switchport. It has to be a switchport with a type of access-if-evpn. This maps back to the EVPN mac-vrf via the xconnect. Anything arriving on xe46.10 with a dot1q tag of 10 is placed into this tunnel.
ipi-1.lab.jan1.us.ipa.net#show evpn mpls xconnect
EVPN Xconnect Info
========================
AC-AC: Local-Cross-connect
AC-NW: Cross-connect to Network
AC-UP: Access-port is up
AC-DN: Access-port is down
NW-UP: Network is up
NW-DN: Network is down
NW-SET: Network and AC both are up
Local Remote Connection-Details
================================ ============ ===================================================================================
VPN-ID EVI-Name MTU VPN-ID Source Destination PE-IP MTU Type NW-Status
================================ ============ ===================================================================================
100 ---- 1500 2 xe46.10 --- Single Homed Port --- 100.127.0.2 1500 AC-NW NW-SET
Total number of entries are 1
ipi-1.lab.jan1.us.ipa.net#show evpn mpls xconnect tunnel
EVPN-MPLS Network tunnel Entries
Source Destination Status Up/Down Update local-evpn-id remote-evpn-id
========================================================================================================
100.127.0.1 100.127.0.2 Installed 01:31:06 01:31:06 100 2
Total number of entries are 1
The tunnels are up, installed, and ready for forwarding. We can see the CE macs as mac-ip routes in evpn.
ipi-1.lab.jan1.us.ipa.net#show bgp l2vpn evpn vrf BLUE
BGP table version is 1, local router ID is 100.127.0.1
Status codes: s suppressed, d damped, h history, a add-path, * valid, > best, i - internal,
l - labeled, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
[EVPN route type]:[ESI]:[VNID]:[relevent route informantion]
1 - Ethernet Auto-discovery Route
2 - MAC/IP Route
3 - Inclusive Multicast Route
4 - Ethernet Segment Route
5 - Prefix Route
Network Next Hop Metric LocPrf Weight Path Peer Encap
* i [1]:[0]:[2]:[16] 100.127.0.2 0 100 0 i 100.127.0.2 MPLS
*> [1]:[0]:[100]:[16]
100.127.0.1 0 100 32768 i ---------- MPLS
Total number of prefixes 2
The mac addresses are sent via an EVPN type-2 route between PEs.
ipi-1.lab.jan1.us.ipa.net#show evpn mpls mac-table
=========================================================================================================================================
EVPN MPLS MAC Entries
=========================================================================================================================================
VNID Interface VlanId In-VlanId Mac-Addr VTEP-Ip/ESI Type Status MAC move AccessPortDesc
_________________________________________________________________________________________________________________________________________
Total number of entries are : 0
Since this is VPWS there are no MACs learned on the device.
[email protected]# run ping 172.16.0.2
PING 172.16.0.2 (172.16.0.2): 56 data bytes
64 bytes from 172.16.0.2: icmp_seq=0 ttl=64 time=21.531 ms
64 bytes from 172.16.0.2: icmp_seq=1 ttl=64 time=22.124 ms
Success! The CEs can reach each other over the EVPN-VPWS circuit.
EVPN-VPLS
Now we’ll build an EVPN-VPLS service. The BGP setup is the same, so we’ll focus solely on the differences, the first being the vpn-id creation.
mac vrf ORANGE
rd 100.127.0.1:2
route-target both evpn-auto-rt
!
evpn mpls id 1
host-reachability-protocol evpn-bgp ORANGE
!
There is no end point defined as an xconnect; all that is necessary is to bind the mac vrf to the EVPN vpn-id.
Again, a switchport defined as an access-if-evpn is necessary. This is then mapped to the vpn-id for the VPLS service. In this case anything coming in with a dot1q tag of 100 will be placed into vpn-id 1.
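The access interface configuration isn’t shown above; based on the pattern from the VPWS example, it would most likely look something like this (a sketch, not verified output from the lab):

```
interface xe46.100 switchport
 encapsulation dot1q 100
 access-if-evpn
  map vpn-id 1
!
```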
ipi-1.lab.jan1.us.ipa.net#show evpn mpls mac-table
=========================================================================================================================================
EVPN MPLS MAC Entries
=========================================================================================================================================
VNID Interface VlanId In-VlanId Mac-Addr VTEP-Ip/ESI Type Status MAC move AccessPortDesc
_________________________________________________________________________________________________________________________________________
1 xe46.100 ---- ---- 84c1.c132.5031 100.127.0.1 Dynamic Local ------- 0 -------
1 ---- ---- ---- 84c1.c132.5032 100.127.0.2 Dynamic Remote ------- 0 -------
Total number of entries are : 2
Since this is a VPLS service MACs are learned both locally and remotely. The remote MAC is the MAC of the remote CE. This was learned via EVPN and from the VTEP 100.127.0.2.
[email protected]# run ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
64 bytes from 192.168.0.2: icmp_seq=0 ttl=64 time=21.894 ms
64 bytes from 192.168.0.2: icmp_seq=1 ttl=64 time=22.159 ms
Success! We have reachability across the service.
Conclusion
IP Infusion is continuing to build out their EVPN/MPLS and segment routing capabilities. It is exciting to see these feature sets continue to mature as traditional LDP/VPLS deployments move to EVPN/MPLS. If you need assistance with the transition from LDP to segment routing or VPLS to EVPN, reach out to IP Architechs.
At IP Architechs we perform a lot of network migrations, and it is no secret that network migrations and maintenance windows can be some of the most nerve-racking things for engineers, managers, and business leaders for a variety of reasons.
For the engineers the uncertainty might be caused by fear of failure, not being able to predict the outcome due to complexity, rushed preparation to meet a deadline, or a litany of other reasons.
For managers and business leaders it might be more along the lines of: what happens if this goes wrong, how will this affect my bottom line, are there going to be thousands of trouble tickets come 8 or 9am when everyone hits the office, and so on.
The Preparation
We’re going to look at this from the perspective of the engineer throughout. The prep work is probably one of the most important pieces of success. This is where you do many things including but not limited to:
building and testing the configuration to be implemented
making a rollback plan: this might be something as simple as moving a cable and shutting an interface, or a multistep/multi-device plan
knowing the situation surrounding the window
Let’s explore understanding the situation surrounding the window some more. I’ll use some real examples here to help.
We were getting ready to change the internet edge deployment at an enterprise. We did all the prep and rollback planning. However, we were given a few constraints on downtime by the business. Additionally, all of the product teams had to join the call for verification due to the impact of the, relatively small, routing change. The next opportunity was going to be a few months out due to change freezes and the coordination of resources necessary.
So what did we learn by engaging outside of the technical realm?
We had tight timeframes which placed an increased emphasis on planning
We needed to have plans for things that could go wrong and resolution paths based on downtime constraints
Although it was a low impact routing change, it was a high impact business change
We needed to have clearly defined decision points on what would be cause for a rollback
The Execution
All the prep is done and it’s time to execute the change. We put in the first couple lines of the script and everything is going well. We get to the point where we need to clean up the old configuration. Then every engineer’s nightmare happens: everything starts to go down.
Okay, what do we do now? We know, based on the situation, that we don’t have a lot of time to work through the problem. We need to stay calm and start working through the decision trees made during the planning process.
Some quick troubleshooting revealed that removing the no-longer-used virtual routing and forwarding (VRF) instance shut down the ports that were now in the global table. We put the VRF back, still unused, and everything began to work as expected again.
Next the debate began: should we get TAC on the line to assist? There were still a few items to knock out in the change window to avoid a complete rollback. A majority of people wanted to “chase the rabbit” of what caused the VRF deletion to bring down the interfaces. However, this would not be a good use of our time. If we got TAC on the line and began to go down that rabbit hole there is no telling where it would have gone or how long it would have taken. The facts were: leaving the unused VRF in place, although annoying as extra config, didn’t affect performance as far as we could tell, and we needed to get through the rest of the migration.
After a short debate we all agreed, based on the circumstances of the migration, coordination efforts, business drivers, and still needing to get some more work done, that we would continue down the migration path. We also took the necessary logs for an initial case with TAC and opened a ticket in the morning. Would we get the same level of info/troubleshooting on that problem? No, but we were able to complete the migration and follow up on the weird behavior at a safer time.
Conclusion
Sometimes, based on different circumstances, the right decision would be to get TAC on the line and work through the issue. The owners might decide everything can stay down until it’s working as planned, or anywhere in between. Often, things like physical access or travel will allow for longer downtime/troubleshooting.
It is important to know the situation around the migration, why it’s happening, and who’s involved, and to keep those in mind during the migration so you can make informed decisions with the owner that make everyone successful.
If you need help planning your migrations reach out to us.
If you read part 2 of this series and came away thinking this is great, but:
How do I connect to the internet?
Does this break down once I need to have connections?
What else do I have to do to manage state?
We’ll set out to answer these questions and show how it works. There are some dependencies, such as your provider supporting customer BGP TE communities, as laid out in part 3.
This seems to be the elusive grail of enterprise networking that everyone wants but is unsure of where to start. Hopefully, a few of those questions have been answered throughout this series, but be sure to understand what you’re getting into and that your team can support it before and after you leave.
The overall topology
We’ve got data center 1 (DC1) and data center 2 (DC2). They each have a connection to an internal router in ASN 60500. A lot of networks I come across have dedicated routers coming out of the DC to terminate internet connections and support full tables. These routers usually only pass a default internally. I don’t have the full tables in this lab, but instead copy the topology and pass a default into the dc1 and dc2 borders.
We’ll be looking at DC1 to keep the number of variables and options down. We set the community on the default route received from customer-1-rtr2 to utilize later on the advertisements to the FW. This is important for state management.
dc1-border-leaf-1# show ip bgp vrf INTERNET
BGP routing table information for VRF INTERNET, address family IPv4 Unicast
BGP table version is 232, Local Router ID is 10.150.0.0
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
dc1-border-leaf-1# show ip bgp vrf INTERNET 0.0.0.0/0
BGP routing table information for VRF INTERNET, address family IPv4 Unicast
BGP routing table entry for 0.0.0.0/0, version 223
Paths: (2 available, best #2)
Flags: (0x80c001a) (high32 0x000020) on xmit-list, is in urib, is best urib route, is in HW, exported
vpn: version 431, (0x00000000100002) on xmit-list
Path type: external, path is valid, not best reason: AS Path, no labeled nexthop
Imported from 100.127.1.1:5:[5]:[0]:[0]:[0]:[0.0.0.0]/224
AS-Path: 65200 60500 65030 , path sourced external to AS
100.127.1.1 (metric 0) from 100.127.1.255 (100.127.1.1)
Origin IGP, MED not set, localpref 100, weight 0
Received label 3003002
Community: 65200:3002
Extcommunity: RT:65100:3003002 ENCAP:8 Router MAC:5004.0000.1b08
Advertised path-id 1, VPN AF advertised path-id 1
Path type: external, path is valid, is best path, no labeled nexthop, in rib
AS-Path: 60500 65020 , path sourced external to AS
100.120.0.2 (metric 0) from 100.120.0.2 (100.127.0.1)
Origin IGP, MED not set, localpref 100, weight 0
Community: 65100:3002
Extcommunity: RT:65100:3003002
VRF advertise information:
Path-id 1 not advertised to any peer
VPN AF advertise information:
Path-id 1 not advertised to any peer
dc1-border-leaf-1# show run | section bgp
<<SNIP>>
vrf INTERNET
address-family ipv4 unicast
redistribute direct route-map RM-CON-INTERNET
neighbor 100.120.0.2
remote-as 60500
address-family ipv4 unicast
as-override
send-community
route-map INET-IN in
dc1-border-leaf-1# show run rpm
<<SNIP>>
route-map INET-IN permit 10
set community 65100:3002
dc1-border-leaf-1# show ip bgp neighbors 100.120.0.2 advertised-routes vrf INTERNET
Peer 100.120.0.2 routes for address family IPv4 Unicast:
BGP table version is 232, Local Router ID is 10.150.0.0
Status: s-suppressed, x-deleted, S-stale, d-dampened, h-history, *-valid, >-best
Path type: i-internal, e-external, c-confed, l-local, a-aggregate, r-redist, I-injected
Origin codes: i - IGP, e - EGP, ? - incomplete, | - multipath, & - backup, 2 - best2
Network Next Hop Metric LocPrf Weight Path
*>i10.0.0.0/32 100.127.0.2 100 0 65110 65110 ?
*>i10.0.0.1/32 100.127.0.2 120 0 65110 65110 65200 ?
*>i10.100.0.0/32 100.127.0.2 100 0 65110 65110 ?
*>r10.150.0.0/32 0.0.0.0 0 100 32768 ?
*>e10.151.0.0/32 100.127.1.1 0 0 65200 ?
*>e100.127.0.2/32 100.127.1.1 0 0 65200 60500 i
*>i192.168.1.0/24 100.127.0.2 100 0 65110 65110 ?
*>i192.168.2.0/24 100.127.0.2 120 0 65110 65110 65200 ?
*>i192.168.10.0/24 100.127.0.2 100 0 65110 65110 ?
*>i192.168.20.0/24 100.127.0.2 120 0 65110 65110 65200 ?
So, we’ve got our default route in and advertise all our internal subnets 192.168.xx.0/24 towards the edge. When xx starts with 1 it’s from DC1 and when it starts with 2 it’s from DC2.
We utilize the provider communities referenced in part 3 to set dc1 to prefer ISP-2 and dc2 to prefer ISP-3. Pay close attention to the local preference on ISP2 in the output below.
CUSTOMER-1-RTR-2#show run
<<SNIP>>
router bgp 60500
bgp router-id 100.127.0.1
bgp log-neighbor-changes
neighbor 100.125.0.1 remote-as 65020
neighbor 100.125.0.1 send-community
neighbor 100.125.0.1 route-map FROM-INET in
neighbor 100.125.0.1 route-map TO-INET out
ip prefix-list DC1-PRIMARY seq 5 permit 192.168.1.0/24
ip prefix-list DC1-PRIMARY seq 10 permit 192.168.10.0/24
!
ip prefix-list DC2-PRIMARY seq 5 permit 192.168.2.0/24
ip prefix-list DC2-PRIMARY seq 10 permit 192.168.20.0/24
!
ip prefix-list DEFAULT seq 5 permit 0.0.0.0/0
!
ip prefix-list LOOPBACK seq 5 permit 100.127.0.1/32
!
route-map TO-INET permit 10
match ip address prefix-list DC1-PRIMARY
set community 65020:120
!
route-map TO-INET permit 20
match ip address prefix-list DC2-PRIMARY
set community 65020:80
!
ISP-2-RTR-1#show ip bgp
BGP table version is 400, local router ID is 100.127.2.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
Network Next Hop Metric LocPrf Weight Path
0.0.0.0 0.0.0.0 0 i
*> 100.127.2.1/32 0.0.0.0 0 32768 i
* 100.127.3.1/32 100.122.0.2 0 65010 65030 i
*> 100.121.0.2 0 0 65030 i
* 192.0.2.0 100.121.0.2 0 65030 65010 i
*> 100.122.0.2 0 0 65010 i
*> 192.168.1.0 100.125.0.2 120 0 60500 65100 65110 65110 ?
* 192.168.2.0 100.125.0.2 80 0 60500 65100 65110 65110 65200 ?
*> 192.168.10.0 100.125.0.2 120 0 60500 65100 65110 65110 ?
* 192.168.20.0 100.125.0.2 80 0 60500 65100 65110 65110 65200 ?
* 100.122.0.2 0 65010 65030 60500 65200 65210 65210 ?
*> 100.121.0.2 0 65030 60500 65200 65210 65210 ?
* 198.51.100.0 100.122.0.2 0 65010 65030 65040 i
*> 100.121.0.2 0 65030 65040 i
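The behavior in that table can be sketched as follows; the community-to-preference mapping models the ISP-side policy implied by the LocPrf column, and the route dictionaries are assumptions for illustration:

```python
# Illustration of the provider-side policy implied by the LocPrf column:
# ISP-2 maps the customer communities to local preference (65020:120 -> 120,
# 65020:80 -> 80), and BGP prefers the highest local preference before
# AS-path length.

PREF_COMMUNITIES = {"65020:120": 120, "65020:80": 80}

def local_pref(communities, default=100):
    """Return the local preference the ISP would set for these communities."""
    for community in communities:
        if community in PREF_COMMUNITIES:
            return PREF_COMMUNITIES[community]
    return default

def best_path(paths):
    """Highest local preference wins; shorter AS path breaks ties."""
    return max(paths, key=lambda p: (local_pref(p["communities"]), -len(p["as_path"])))

# Two hypothetical copies of 192.168.1.0/24 as seen at ISP-2: the directly
# received path tagged 65020:120 beats a longer, untagged path via 65010.
paths = [
    {"via": "100.125.0.2", "as_path": [60500, 65100, 65110, 65110],
     "communities": ["65020:120"]},
    {"via": "100.122.0.2", "as_path": [65010, 65030, 60500, 65100, 65110, 65110],
     "communities": []},
]
```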
There is nothing fancy to see here; generally speaking, this just works, provided the prefixes were set up to utilize their primary DC for internet connections by taking advantage of customer BGP TE communities. If this is not done there WILL be a state problem. Let’s examine the path vrf BLUE takes; this will be used throughout as our reference.
vrf-BLUE-1#show ip int bri
Interface IP-Address OK? Method Status Protocol
GigabitEthernet0/0 192.168.1.2 YES manual up up
vrf-BLUE-1#ping 192.0.2.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.0.2.1, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 8/9/11 ms
vrf-BLUE-1#traceroute 192.0.2.1
Type escape sequence to abort.
Tracing the route to 192.0.2.1
VRF info: (vrf in name/id, vrf out name/id)
1 192.168.1.1 4 msec 1 msec 1 msec
2 172.16.0.1 2 msec 3 msec 2 msec
3 172.16.0.10 2 msec 3 msec 2 msec
4 10.150.0.0 7 msec 7 msec 6 msec
5 100.120.0.2 10 msec 12 msec 11 msec
6 100.125.0.1 8 msec 9 msec 13 msec
7 100.122.0.2 9 msec * 10 msec
FW failure
Next we’ll see what happens when the firewall in dc1 fails due to either expected or unexpected reasons.
Upon the failure all of the routes will be relearned and advertised through dc2. This is explained in detail in part 2 of this series so I will not go into details here. We will look at the final path and failure times though. Remember this lab is not running any optimizations to speed up convergence throughout the system.
The UU and . in the ping output are the point when I shut down the internet peering between dc1-leaf-1 and fortinet-1. This forced a routing change and sent the traffic over to fortinet-2 following the path seen above. You can also see the 3 additional hops due to traversing fortinet-2 instead of fortinet-1.
The return path from the internet coming through customer-1-rtr-2 is due to the provider communities used earlier, which ensure 192.168.1.0/24-bound traffic returns through this DC to avoid a state problem during normal operations.
I’m sure with the right tooling this could be resolved but it would take an automated action or so much complexity it isn’t worth maintaining. The increased latency is probably worth the operational simplicity.
Internet failure
This failure is a little more straightforward, as the outbound and return paths are symmetric not only from a FW policy perspective but also from an overall perspective. We make use of the communities set on the internet advertisements to handle this failure.
Without marking the default route with an attribute to act on, we wouldn’t be able to differentiate on the fortinets whether the upstream internet was down, which would introduce that state problem. To solve this we only send the default route from the DC that the fortinet is in.
dc1-leaf-1# show run bgp
<<SNIP>>
router bgp 65100
<<SNIP>>
vrf INTERNET
address-family ipv4 unicast
redistribute direct route-map RM-CON-INTERNET
neighbor 172.16.0.9
remote-as 65110
address-family ipv4 unicast
send-community
route-map INET-FROM-FW in
route-map INET-TO-FW out
dc1-leaf-1# show run rpm
!Command: show running-config rpm
!Running configuration last done at: Sun Jul 24 13:16:59 2022
!Time: Sun Jul 24 13:23:46 2022
version 9.3(3) Bios:version
ip prefix-list DEFAULT seq 10 permit 0.0.0.0/0
ip community-list standard DC1-BLUE-CL seq 10 permit 65100:3000
ip community-list standard DC1-INET seq 10 permit 65100:3002
ip community-list standard DC1-ORANGE-CL seq 10 permit 65100:3001
ip community-list standard DC2-BLUE-CL seq 10 permit 65200:3000
ip community-list standard DC2-INET seq 10 permit 65200:3002
ip community-list standard DC2-ORANGE-CL seq 10 permit 65200:3001
route-map BLUE-TO-FW-IN permit 5
match ip address prefix-list DEFAULT
route-map BLUE-TO-FW-IN permit 10
match community DC1-ORANGE-CL
route-map BLUE-TO-FW-IN permit 20
match community DC2-ORANGE-CL
set local-preference 120
route-map BLUE-TO-FW-OUT permit 10
match community DC1-BLUE-CL DC2-BLUE-CL
route-map INET-FROM-FW permit 10
match community DC2-ORANGE-CL DC2-BLUE-CL
set local-preference 120
route-map INET-FROM-FW permit 20
match community DC1-ORANGE-CL DC1-BLUE-CL
route-map INET-TO-FW permit 10
match community DC1-INET
route-map ORANGE-TO-FW-IN permit 5
match ip address prefix-list DEFAULT
route-map ORANGE-TO-FW-IN permit 10
match community DC1-BLUE-CL
route-map ORANGE-TO-FW-IN permit 20
match community DC2-BLUE-CL DC2-ORANGE-CL
set local-preference 80
route-map ORANGE-TO-FW-OUT permit 10
match community DC1-ORANGE-CL DC2-ORANGE-CL
route-map RM-CON-BLUE permit 10
match tag 3000
set community 65100:3000
route-map RM-CON-INTERNET permit 10
match tag 3002
set community 65100:3002
route-map RM-CON-ORANGE permit 10
match tag 3001
set community 65100:3001
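The gating done by route-map INET-TO-FW above can be sketched as follows; the route-map is simplified to a boolean match, and the route dictionaries are illustrative (the community strings come from the config):

```python
# Sketch of the gating implied by route-map INET-TO-FW: only routes carrying
# the local DC's internet community (65100:3002, community-list DC1-INET)
# are advertised to the firewall.

DC1_INET = "65100:3002"

def inet_to_fw(route):
    """Permit the route toward the FW only if it matches DC1-INET."""
    return DC1_INET in route["communities"]

dc1_default = {"prefix": "0.0.0.0/0", "communities": ["65100:3002"]}
dc2_default = {"prefix": "0.0.0.0/0", "communities": ["65200:3002"]}

# Only the dc1-sourced default survives the outbound policy, so fortinet-1
# loses its default route when dc1's upstream internet goes away.
advertised = [r for r in (dc1_default, dc2_default) if inet_to_fw(r)]
```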
The additional route-map for inbound routes, INET-FROM-FW, is also there to help maintain state. If we did not force this action, then under normal operations traffic inbound from isp-2 to dc2 would go back to fortinet-2, which causes a problem during a failure scenario. If there is interest I will add some more failure scenarios showing what happens when this isn’t in place.
For this test I will bring down the connection between customer-1-rtr-2 and isp-2 to simulate the outage. This forces the withdrawal of the routes customer-1 learned directly from isp-2 across the entire system, forcing all traffic via dc2.
Again you can see the path change and the additional hops.
Conclusion
It’s possible to have active/active data centers and manage state in the DC firewalls by combining techniques to achieve the goals. However, it takes quite a bit of upfront work to get the policy correct to maintain state. It’s important to understand the trade-offs when going from a traditional active/standby to an active/active setup.
Reach out to us at IP Architechs if you want to know more or have data center design questions. Post comments for more failure scenarios or deep dives you’d like to see.
If you’ve ever been asked to prioritize one internet connection over another for any of a variety of reasons (cost, latency, SLA, etc.), this is for you.
Often I hear the same tactics to solve this problem:
AS-PATH prepending
conditional advertisements
scripting
some other manual process
However, most carriers offer customer BGP TE communities that you can use to influence traffic within their AS, with one notable exception: Hurricane Electric. If you’re not sure what a BGP community is, take a quick look at this post on them first.
Let’s explore how to utilize these, where to find them, and how they might give more deterministic path selection than the options laid out above.
BGP Topology
Default behavior with no modification
First to get familiar with the topology and show reachability we’ll leave all settings as “defaults” with no modifications.
ISP-1-RTR-1#traceroute 203.0.113.1 source 192.0.2.1 Type escape sequence to abort. Tracing the route to 203.0.113.1 VRF info: (vrf in name/id, vrf out name/id) 1 100.123.0.1 1 msec 1 msec 1 msec 2 100.124.0.2 1 msec 0 msec 0 msec 3 100.126.0.10 2 msec * 1 msec ISP-1-RTR-1#ping 203.0.113.1 source 192.0.2.1 Type escape sequence to abort. Sending 5, 100-byte ICMP Echos to 203.0.113.1, timeout is 2 seconds: Packet sent with a source address of 192.0.2.1 !!!!! Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/3 ms
ISP-4-RTR-1#traceroute 203.0.113.1 source 198.51.100.1
Type escape sequence to abort.
Tracing the route to 203.0.113.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.120.0.1 1 msec 0 msec 1 msec
  2 100.124.0.2 2 msec 1 msec 1 msec
  3 100.126.0.10 2 msec * 1 msec
ISP-4-RTR-1#ping 203.0.113.1 source 198.51.100.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 203.0.113.1, timeout is 2 seconds:
Packet sent with a source address of 198.51.100.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
CUSTOMER-1-RTR-1#traceroute 192.0.2.1 source 203.0.113.1
Type escape sequence to abort.
Tracing the route to 192.0.2.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.126.0.1 1 msec 1 msec 1 msec
  2 100.125.0.1 1 msec 1 msec 0 msec
  3 100.122.0.2 1 msec * 1 msec
CUSTOMER-1-RTR-1#ping 192.0.2.1 source 203.0.113.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.0.2.1, timeout is 2 seconds:
Packet sent with a source address of 203.0.113.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
CUSTOMER-1-RTR-1#traceroute 198.51.100.1 source 203.0.113.1
Type escape sequence to abort.
Tracing the route to 198.51.100.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.126.0.9 1 msec 1 msec 1 msec
  2 100.124.0.1 1 msec 2 msec 3 msec
  3 100.120.0.2 2 msec * 2 msec
CUSTOMER-1-RTR-1#ping 198.51.100.1 source 203.0.113.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 198.51.100.1, timeout is 2 seconds:
Packet sent with a source address of 203.0.113.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/1/2 ms
We’re setting the source because only the public prefixes are advertised into BGP. The private CG-NAT prefixes seen in the traceroute are the transit links responding along the path.
You’ll also notice that the return path (the upload direction) utilizes a different path to 192.0.2.1. We’ll come back to this further down.
Path with AS-PATH prepending
Let’s look at what almost always comes as the first recommendation: AS-PATH prepending. In our use case we’ll take this approach and prepend 5 times on CUSTOMER-1-RTR-3.
CUSTOMER-1-RTR-3#show run | sec route-map
 neighbor 100.124.0.1 route-map PREPEND out
route-map PREPEND permit 10
 set as-path prepend 65000 65000 65000 65000 65000
This results in ISP-3-RTR-1 receiving the prefix with 65000 in the AS-PATH 6 times. As all of the other route attributes are default, the BGP best path algorithm makes it down to comparing AS-PATH, where shorter is better. We’ll be using Cisco’s best path algorithm as the reference:
highest weight
highest local-preference
locally originated
shortest AS-PATH
prefer path with lowest origin type
prefer path with lowest MED
prefer eBGP over iBGP
prefer path with lowest IGP metric to the next-hop
determine if multipath needs installation
oldest route
lowest router-id
minimum cluster list length
prefer lowest neighbor address
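To see how this ordering plays out in the prepending scenario, here is a simplified Python sketch of the comparison. This is an illustration of the algorithm above, not vendor code: only the first few steps are modeled, and the path names and values are made up to mirror this lab.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Path:
    name: str
    weight: int = 0
    local_pref: int = 100        # Cisco default LP
    as_path: List[int] = field(default_factory=list)
    origin: int = 0              # 0 = IGP, 1 = EGP, 2 = incomplete (lower wins)
    med: int = 0                 # lower wins

def best_path(paths):
    # Compare in the documented order; a path wins at the first step where it
    # is strictly better. Later tie-breakers (eBGP vs iBGP, IGP metric to the
    # next-hop, router-id, etc.) are omitted for brevity.
    return max(paths, key=lambda p: (p.weight, p.local_pref,
                                     -len(p.as_path), -p.origin, -p.med))

# The prepending scenario: five prepends make the direct path lose on
# AS-PATH length...
direct = Path("via customer link", as_path=[65000] * 6)
peer   = Path("via ISP-2",         as_path=[65200, 65000])
print(best_path([direct, peer]).name)        # via ISP-2

# ...but once LP is 120 on the customer path, LP decides first and
# AS-PATH length is never consulted.
direct.local_pref = 120
print(best_path([direct, peer]).name)        # via customer link
```

The second print foreshadows the local-preference behavior covered later in this post: because LP sits above AS-PATH in the list, prepending alone cannot overcome a provider's LP policy.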
This means that the path via ISP-2-RTR-1 is now better for 203.0.113.0/24 as you can see in the output below.
ISP-3-RTR-1#show ip bgp
BGP table version is 8, local router ID is 100.124.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
<<clipped>>
 *    203.0.113.0      100.123.0.2                            0 65100 65200 65000 i
 *>                    100.121.0.1                            0 65200 65000 i
 *                     100.124.0.2                            0 65000 65000 65000 65000 65000 65000 i
Now, running a traceroute from ISP-4, it appears that everything has been achieved.
ISP-4-RTR-1#traceroute 203.0.113.1 source 198.51.100.1
Type escape sequence to abort.
Tracing the route to 203.0.113.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.120.0.1 1 msec 1 msec 1 msec
  2 100.121.0.1 1 msec 1 msec 1 msec
  3 100.125.0.2 1 msec 1 msec 2 msec
  4 100.126.0.2 1 msec * 2 msec
ISP-4-RTR-1#
However, our outbound traffic hasn’t changed.
CUSTOMER-1-RTR-1#traceroute 198.51.100.1 source 203.0.113.1
Type escape sequence to abort.
Tracing the route to 198.51.100.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.126.0.9 2 msec 0 msec 1 msec
  2 100.124.0.1 1 msec 1 msec 1 msec
  3 100.120.0.2 2 msec * 1 msec
Most of the time I see people modify the IGP metric to the next hop to change this behavior. Take notice that this is pretty far down the best path selection process. So let’s raise the cost on the link from CUSTOMER-1-RTR-1 to CUSTOMER-1-RTR-3.
CUSTOMER-1-RTR-1#traceroute 198.51.100.1 source 203.0.113.1
Type escape sequence to abort.
Tracing the route to 198.51.100.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.126.0.1 1 msec 0 msec 0 msec
  2 100.126.0.18 1 msec 0 msec 0 msec
  3 100.124.0.1 1 msec 1 msec 2 msec
  4 100.120.0.2 2 msec * 2 msec
CUSTOMER-1-RTR-1#show run int g0/1
Building configuration...

Current configuration : 155 bytes
!
interface GigabitEthernet0/1
 ip address 100.126.0.10 255.255.255.248
 ip ospf 1 area 0
 ip ospf cost 100
Perfect, now we ingress and egress via the same router.
Something a lot of providers do is raise the local-preference on routes received from customers. This makes those routes preferred within their AS over the paths received from transit and peers. As you saw, local-preference sits near the top of the BGP best path selection process.
ISP-3 is one of those providers. They set LP to 120 on routes received from customers and leave it at the default of 100 for peers (ISP-2 and ISP-1). What happens now?
ISP-3-RTR-1#show ip bgp
BGP table version is 10, local router ID is 100.124.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
<<clipped>>
 *    203.0.113.0      100.123.0.2                            0 65100 65200 65000 i
 *                     100.121.0.1                            0 65200 65000 i
 *>                    100.124.0.2                120         0 65000 65000 65000 65000 65000 65000 i
The best path is now the path with all of our AS-PATH prepends! This is because LP is evaluated earlier in the BGP best path selection, so the router never needs AS-PATH length to determine the best path. The LP was 120 on one path and 100 (the default) on the other, so it selected higher as better. Now we’re back to where we started with the question of how to influence return traffic to our AS.
ISP-4-RTR-1#traceroute 203.0.113.1 source 198.51.100.1
Type escape sequence to abort.
Tracing the route to 203.0.113.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.120.0.1 1 msec 0 msec 0 msec
  2 100.124.0.2 1 msec 1 msec 1 msec
  3 100.126.0.10 2 msec * 1 msec
Customer BGP TE communities
Typically, the providers that do something similar to the above also offer their customers TE communities. You can send them a community to influence how they treat your traffic.
You may have to ask your provider for these values, or they might be published publicly. A large listing can be found here, but verify before use; it is not an exhaustive list of all providers and I can’t speak to how up to date it is.
ISP-3 supports these: if you send 65300:80, they’ll set the local-preference on routes received with this community to 80.
ip bgp-community new-format
ip community-list standard SET-LP-80 permit 65300:80
!
route-map FROM-CUSTOMER permit 10
 match community SET-LP-80
 set local-preference 80
!
route-map TO-INET permit 10
 set community 65300:80
!
The result is that ISP-3 now offloads all traffic destined to the customer through ISP-2 or ISP-1, because the local-preference for the same route received from those peers is higher.
ISP-3-RTR-1#show ip bgp
BGP table version is 12, local router ID is 100.124.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *    100.127.2.1/32   100.124.0.2                 80         0 65000 65200 i
 *                     100.123.0.2                            0 65100 65200 i
 *>                    100.121.0.1              0             0 65200 i
 *>   100.127.3.1/32   0.0.0.0                  0         32768 i
 *    192.0.2.0        100.121.0.1                            0 65200 65100 i
 *>                    100.123.0.2              0             0 65100 i
 *>   198.51.100.0     100.120.0.2              0             0 65400 i
 *    203.0.113.0      100.124.0.2                 80         0 65000 i
 *                     100.123.0.2                            0 65100 65200 65000 i
 *>                    100.121.0.1                            0 65200 65000 i
ISP-4-RTR-1#traceroute 203.0.113.1 source 198.51.100.1
Type escape sequence to abort.
Tracing the route to 203.0.113.1
VRF info: (vrf in name/id, vrf out name/id)
  1 100.120.0.1 0 msec 1 msec 1 msec
  2 100.121.0.1 1 msec 1 msec 1 msec
  3 100.125.0.2 1 msec 1 msec 1 msec
  4 100.126.0.2 1 msec * 2 msec
Everything is back to how we expect.
However, to avoid having to make one-off changes to IGP metrics, let’s utilize local-preference on received routes as well. We’ll set the IGP metric back to default and move up the BGP best path selection algorithm by using LP. We will raise the LP to 120 on the routes from ISP-2.
CUSTOMER-1-RTR-2#show run | sec route-map
 neighbor 100.125.0.1 route-map FROM-INET in
route-map FROM-INET permit 10
 set local-preference 120
CUSTOMER-1-RTR-1#show ip bgp
BGP table version is 16, local router ID is 100.127.0.0
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

     Network          Next Hop            Metric LocPrf Weight Path
 *>i  100.127.2.1/32   100.127.0.1              0    120      0 65200 i
 *>i  100.127.3.1/32   100.127.0.1              0    120      0 65200 65300 i
 *>i  192.0.2.0        100.127.0.1              0    120      0 65200 65100 i
 *>i  198.51.100.0     100.127.0.1              0    120      0 65200 65300 65400 i
 *>   203.0.113.0      0.0.0.0                  0         32768 i
Now the best path is always in and out CUSTOMER-1-RTR-2, as desired, as long as the peering to ISP-2 is up.
If you’re trying to influence traffic or need help implementing a customer BGP TE community scheme reach out to us at iparchitechs.
Recently, we recorded a webinar to explain a design concept frequently used by iparchitechs.com to build and migrate WISP, FISP and Telco networks – separation of network functions. It centers around simplification of roles within an ISP network. It also explores the use of lower-cost commodity network equipment to maximize the service area for a given ISP footprint while meeting key requirements like scale, redundancy and capacity.
Topics that were covered include:
What are network functions?
Design examples for WISP/FISP and Telco
Equipment and budget considerations
Here is an example of solving design/operational issues with network function separation:
During the first Networking Field Day: Service Provider, one of the big topics was EVPN versus VPLS. Arista has put a lot of work into their EVPN implementation and this has given them a ton of success in the data center. However, a large portion of the provider space, especially last mile providers, relies heavily on VPLS. This naturally led to discussion about Arista VPLS support.
I’m pleased to see that there is now basic support in EOS as of EOS 4.27.2F and more on the roadmap. Hopefully, we’ll also see the off-ramp from VPLS to EVPN, RFC 8560, which was a hot button topic throughout the week.
The release notes for EOS 4.27.2F call out basic VPLS support, so I took a look. Reviewing the new 4.27.2F manual I found support for LDP PWs per RFC 4447, which is virtual private wire service (VPWS). This also appeared to be in EOS 4.26 but not earlier. Thanks to Arista for providing more documentation on their support for RFC 4762 – LDP-signaled VPLS.
In the meantime, let’s review how this works:
mpls ip
!
mpls ldp
   router-id 100.127.0.3
   transport-address interface Loopback0
   no shutdown
   !
   pseudowires
      pseudowire TEST-PW
         neighbor 100.127.0.1
         pseudowire-id 1
         mtu 1500
!
patch panel
   patch TEST
   !
   patch TEST-PW-PATCH
      connector 1 pseudowire ldp TEST-PW
      connector 2 interface Ethernet3
!
You have to define the endpoint for the LDP signaling in the LDP configuration. The configuration requires an endpoint (neighbor), a pseudowire-id, and an MTU. Without all three of these the PW won’t establish.
Then tie the port you want to use to the PW with a patch panel connector. In this case we tied Ethernet3 to PW TEST-PW.
Everything that comes in on Ethernet3 will be pushed into the PW and on to the endpoint. Let’s verify that the signaling mechanism works:
arista-11#show patch panel detail
PW Fault Legend:
ET-IN - Ethernet receive fault
ET-OUT - Ethernet transmit fault
TUN-IN - Tunnel receive fault
TUN-OUT - Tunnel transmit fault
NF - Pseudowire not forwarding (other reason)
Patch: TEST, Status: Down, Last change: 0:26:17 ago
Patch: TEST-PW-PATCH, Status: Up, Last change: 16:35:05 ago
Connector 1: LDP neighbor 100.127.0.1 PW ID 1
Status: Up
Local MPLS label: 116384, Group ID: 0x0
MTU: 1500, 802.1Q VLAN request sent: -
Flow label capability: none
Supported VCCV CV types: LSP ping
Supported VCCV CC types: Router alert label
Neighbor MPLS label: 116384, Group ID: 0x0
MTU: 1500, 802.1Q VLAN request received: -
Flow label capability: none
Supported VCCV CV types: LSP ping
Supported VCCV CC types: Router alert label
PW type: 5 (raw), Control word: N
Flow label used: no
Tunnel type: LDP, Tunnel index: 1
Connector 2: Ethernet3
Status: Up
CE-1#ping 172.16.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.0.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 14/18/22 ms
Now we have a functional layer 2 link between the distant CEs.
An important note if you want to put this into production: you have to use service routing protocols model multi-agent, which requires a reboot of your devices.
There are also some restrictions on VLAN translation/passing, which I will explore in a future post. Now let’s check out the basic configuration.
After reviewing the documents for LDP signaled VPLS we built the topology above. All 3 PEs are in the same mesh so the 3 CE routers are all layer 2 adjacent.
I probably made every mistake you could as I started building this, but the CLI is pretty helpful in telling you what is wrong.
arista-11#show vpls
VPLS: TEST-VPLS
VLAN: 10, 802.1Q tag: -
MAC withdrawal trigger for local interface going down: Y
Pseudowire group: MESH, split-horizon
MAC withdrawal trigger on pseudowire failure: N
MAC withdrawal propagation: locally triggered
LDP neighbor 100.127.0.1 PW ID 1 PW name ARISTA-10
Status: No remote, Interface: Pseudowire3.0
LDP neighbor 100.127.0.4 PW ID 1 PW name ARISTA-13
Status: CLI incomplete
I originally missed specifying the MTU on the CLI so it told me my configuration was incomplete. I thought this was pretty neat as it prevented me from going down a bunch of different paths to determine why my original build was broken.
arista-13#show vpls
VPLS: TEST-VPLS
VLAN: 10, 802.1Q tag: -
MAC withdrawal trigger for local interface going down: Y
Pseudowire group: MESH, split-horizon
MAC withdrawal trigger on pseudowire failure: N
MAC withdrawal propagation: locally triggered
LDP neighbor 100.127.0.1 PW ID 1 PW name ARISTA-10
Status: Up, Interface: Pseudowire1.0
LDP neighbor 100.127.0.3 PW ID 1 PW name ARISTA-11
Status: Up, Interface: Pseudowire2.0
CE-3#ping 172.16.0.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.0.2, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 11/14/17 ms
CE-3#ping 172.16.0.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.0.1, timeout is 2 seconds:
.!!!!
Success rate is 80 percent (4/5), round-trip min/avg/max = 11/12/14 ms
CE-3#
If you need help with your deployment reach out to us at IP architechs.
It’s been a while since we started work on one of our newest projects. We have been trying to solve a problem in app location. It all came from the notion that Little Caesars knows where my pizza is, so why can’t the network resolve where the app is? We also thought it would be a novel use of Anycast because the app can be anywhere.
So, what problems specifically have we solved using this design? Intent based gateways are a signaling mechanism that allows the apps to be delivered along with the pizza. As we can see, app Buffalo Wings can reach both the intent based gateway and Fried Pickles using TI-LFA, which strips the fat bits before they reach the gateway. Our unique caching solution using Tupperware, which are stacked in K8s, allows for the apps to be delivered in a bursty, nexthop-specific, competitive manner. This has proven to keep the apps warm within the physical layer.
In our example, the Delivery Center Interconnect, we are doing an east to west Multi Pizza Layered Service that can drop the apps with full BTU into any of the regions. The apps are unaware of the topping layer and rely on layer 2 media skipping protocol to travel between regions. As you can see, pineapple pizza is restricted to the west security zone because its traffic is otherwise discarded. Flows between Brooklyn and deep dish can be maintained because the overlay is based on ZeroTabasco.
A full mesh is achieved with the IGulP maintained by IS-IS. In this example a choice of which app is being forced so FFR can redirect service when the apps fail to reach the intent based gateway.
A few weeks ago, we recorded a webinar on deploying IPv6 for WISPs and FISPs. As IPv6 adoption continues to climb, developing an IPv6 strategy for design, deployment and system integration is an important step before subscribers begin asking for IPv6.
Some of the topics that were covered include:
IPv6 basics – addressing, subnetting, types
IPv6 design and deployment
IPv6 systems and operations
Here is an example of getting started with IPv6 deployment at the border of the ASN
This post has been a while in the making and follows up on an article about BGP communities that can be found here. Then we followed it up with some more discussion about FW design and placement, or lack thereof, on this podcast, which inspired me to finish up “part 2”.
Anyone who has ever had to run active/active data centers has come across the problem of how to manage state. You can:
Ignore it and prepare yourself for a late night at the worst time
Take everyone’s word that systems will never have to talk to a system in a different security zone in the remote DC
Utilize communities and BGP policy to manage state, which we’ll focus on here
One of the biggest reasons we see for stretching a virtual routing and forwarding (VRF) instance is to move DC-to-DC flows of the same security zone below the FWs. This reduces the load on the firewalls and makes for easier rule management. However, it does introduce a state problem.
We’ll be using the smallest EVPN-multisite deployment you’ve ever seen with Nexus 9000v and Fortinet FWs.
Inter vrf intra data center
The first flow we’ll look at is transitioning vrfs in the same data center. In this example and all work going forward, vrf Blue is allowed to initiate to vrf Orange. However, vrf Orange cannot initiate communication to vrf Blue.
Assuming your firewall rules are correct this “just works” and is no different than running your standard deployment.
vrf-BLUE-1#ping 192.168.10.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.10.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 4/5/8 ms
Initial request
dc1-leaf-1# show ip route 192.168.10.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.10.0/24, ubest/mbest: 1/0
*via 172.16.0.1, [20/0], 17:29:08, bgp-65100, external, tag 65110
Fortinet-1 routing table
Return traffic
dc1-leaf-1# show ip route 192.168.1.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.0/24, ubest/mbest: 1/0
*via 172.16.0.5, [20/0], 17:30:21, bgp-65100, external, tag 65110
Inter DC intra vrf flow
Here is the flow that normally starts this conversation. There is a desire to move same-security-zone flows and/or large traffic flows (replication) between DCs below the FWs. This can reduce load on the FWs and make rulesets easier to manage, since you don’t have to write a lot of exceptions for inbound flows on your untrusted interface.
vrf-BLUE-1#ping 192.168.2.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.2.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/18/23 ms
Initial request
dc1-leaf-1# show ip route 192.168.2.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.2.0/24, ubest/mbest: 1/0
*via 100.127.0.255%default, [200/1], 19:24:36, bgp-65100, internal, tag 65200, segid: 3003000 tunnelid: 0x647f00ff encap: VXLAN
Since we utilized EVPN-Multisite to extend the vrfs between DCs (to be covered in a later blog) the first stop is the border gateway. This is abstracted on the flow diagram but can be seen on the original BGP layout.
dc1-border-leaf-1# show ip route 192.168.2.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.2.0/24, ubest/mbest: 1/0
*via 100.127.1.255%default, [20/1], 19:30:27, bgp-65100, external, tag 65200, segid: 3003000 tunnelid: 0x647f01ff encap: VXLAN
dc2-border-leaf-1# show ip route 192.168.2.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.2.0/24, ubest/mbest: 1/0
*via 100.127.1.2%default, [200/0], 19:30:59, bgp-65200, internal, tag 65200, segid: 3003000 tunnelid: 0x647f0102 encap: VXLAN
dc2-leaf-1# show ip route 192.168.2.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.2.0/24, ubest/mbest: 1/0, attached
*via 192.168.2.1, Vlan2000, [0/0], 20:00:30, direct, tag 3000
This traffic never reaches the FW on the way there and the same behavior happens on the return path. I’m not going to show every hop on the way as it’s identical but in reverse.
Inter vrf inter DC flow
Here is the flow that causes a problem. When you change vrfs and change DCs without any other considerations, there is an asymmetric path, which introduces a state problem. After defining and analyzing the problem here, we’ll walk through a solution.
vrf-BLUE-1#ping 192.168.20.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.2, timeout is 2 seconds:
.....
Success rate is 0 percent (0/5)
Initial request
dc1-leaf-1# show ip route 192.168.20.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0
*via 172.16.0.1, [20/0], 17:50:50, bgp-65100, external, tag 65110
Fortinet-1 routing table
A vrf change has occurred and we’re now in vrf Orange after starting in vrf Blue.
dc1-leaf-1# show ip route 192.168.20.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0
*via 100.127.0.255%default, [200/1], 18:54:34, bgp-65100, internal, tag 65200, segid: 3003001 tunnelid: 0x647f00ff encap: VXLAN
We’re going to skip the border gateways as nothing exciting happens there.
dc2-leaf-1# show ip route 192.168.20.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0, attached
*via 192.168.20.1, Vlan2001, [0/0], 18:58:30, direct, tag 3001
Now we hit the connected route on dc2-leaf-1 as we expected. Remember that we initiated state on fortinet-1.
Return traffic
Okay, now that we made it to vrf-ORANGE-2, what happens to the return traffic?
dc2-leaf-1# show ip route 192.168.1.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.0/24, ubest/mbest: 1/0
*via 172.16.1.5, [20/0], 17:47:05, bgp-65200, external, tag 65210
Fortinet-2 routing table
The first thing the return traffic does is try to switch vrfs back to vrf BLUE. However, fortinet-2 doesn’t have state for this flow. Since vrf-ORANGE can’t initiate communication with vrf-BLUE and there is no state in fortinet-2, the traffic is dropped on the default rule.
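The failure mode can be made concrete with a toy Python model of two firewalls with independent session tables. This is purely illustrative under the zone policy described above (BLUE may initiate to ORANGE, nothing else); real FortiGate session handling tracks much more than a source/destination pair.

```python
# Toy model of stateful firewalls with independent session tables.
# Assumed policy from the article: BLUE may initiate to ORANGE;
# anything else must match an existing session or is dropped.
class Firewall:
    def __init__(self, name: str):
        self.name = name
        self.sessions = set()   # (src, dst) tuples recorded on permitted initiations

    def forward(self, src: str, dst: str, src_zone: str, dst_zone: str) -> bool:
        if src_zone == "BLUE" and dst_zone == "ORANGE":
            self.sessions.add((src, dst))       # permitted initiation, record state
            return True
        if (dst, src) in self.sessions:         # reply matching a known session
            return True
        return False                            # dropped on the default rule

fw1 = Firewall("fortinet-1")
fw2 = Firewall("fortinet-2")

# Initial request crosses zones through fortinet-1, so state lives only there.
print(fw1.forward("192.168.1.2", "192.168.20.2", "BLUE", "ORANGE"))   # True

# Asymmetric return lands on fortinet-2: no matching session, dropped.
print(fw2.forward("192.168.20.2", "192.168.1.2", "ORANGE", "BLUE"))   # False

# Symmetric return through fortinet-1 matches the session and is permitted.
print(fw1.forward("192.168.20.2", "192.168.1.2", "ORANGE", "BLUE"))   # True
```

The fix below is exactly about forcing that third case: making the return traffic land on the firewall that holds the session.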
The Solution
The first thing we’re going to do is set a community on generation of the type-5 route. This is done by matching a tag of the $L3VNI-VLAN-ID and setting a community of $ASN:$L3VNI-VLAN-ID.
vlan 2000
name BLUE-DATA
vn-segment 2002000
vlan 2001
name ORANGE-DATA
vn-segment 2002001
vlan 3000
name VRF-BLUE
vn-segment 3003000
vlan 3001
name VRF-ORANGE
vn-segment 3003001
route-map RM-CON-BLUE permit 10
match tag 3000
set community 65100:3000
route-map RM-CON-ORANGE permit 10
match tag 3001
set community 65100:3001
vrf context BLUE
vni 3003000
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
vrf context ORANGE
vni 3003001
rd auto
address-family ipv4 unicast
route-target both auto
route-target both auto evpn
interface Vlan2000
no shutdown
vrf member BLUE
ip address 192.168.1.1/24 tag 3000
fabric forwarding mode anycast-gateway
interface Vlan2001
no shutdown
vrf member ORANGE
ip address 192.168.10.1/24 tag 3001
fabric forwarding mode anycast-gateway
interface Vlan3000
no shutdown
vrf member BLUE
ip forward
interface Vlan3001
no shutdown
vrf member ORANGE
ip forward
By setting the logic correctly we can force the traffic to always utilize the FW from the datacenter it originated from.
dc1-leaf-1# show run rpm
!Command: show running-config rpm
!Running configuration last done at: Sun Mar 20 15:08:38 2022
!Time: Sun Mar 20 15:43:29 2022
version 9.3(3) Bios:version
ip community-list standard DC1-BLUE-CL seq 10 permit 65100:3000
ip community-list standard DC1-ORANGE-CL seq 10 permit 65100:3001
ip community-list standard DC2-BLUE-CL seq 10 permit 65200:3000
ip community-list standard DC2-ORANGE-CL seq 10 permit 65200:3001
route-map BLUE-TO-FW-IN permit 10
match community DC1-ORANGE-CL
route-map BLUE-TO-FW-IN permit 20
match community DC2-ORANGE-CL
set local-preference 120
route-map BLUE-TO-FW-OUT permit 10
match community DC1-BLUE-CL DC2-BLUE-CL
route-map ORANGE-TO-FW-IN permit 10
match community DC1-BLUE-CL
route-map ORANGE-TO-FW-IN permit 20
match community DC2-BLUE-CL DC2-ORANGE-CL
set local-preference 80
route-map ORANGE-TO-FW-OUT permit 10
match community DC1-ORANGE-CL DC2-ORANGE-CL
route-map RM-CON-BLUE permit 10
match tag 3000
set community 65100:3000
route-map RM-CON-ORANGE permit 10
match tag 3001
set community 65100:3001
dc1-leaf-1# show run bgp
!Command: show running-config bgp
!Running configuration last done at: Sun Mar 20 15:08:38 2022
!Time: Sun Mar 20 15:44:05 2022
version 9.3(3) Bios:version
feature bgp
router bgp 65100
neighbor 100.127.0.0
remote-as 65100
update-source loopback0
address-family l2vpn evpn
send-community extended
vrf BLUE
address-family ipv4 unicast
advertise l2vpn evpn
redistribute direct route-map RM-CON-BLUE
neighbor 172.16.0.1
remote-as 65110
address-family ipv4 unicast
send-community
route-map BLUE-TO-FW-IN in
route-map BLUE-TO-FW-OUT out
vrf ORANGE
address-family ipv4 unicast
redistribute direct route-map RM-CON-ORANGE
neighbor 172.16.0.5
remote-as 65110
address-family ipv4 unicast
send-community
route-map ORANGE-TO-FW-IN in
route-map ORANGE-TO-FW-OUT out
dc2-leaf-1# show run rpm
!Command: show running-config rpm
!Running configuration last done at: Sun Mar 20 15:13:30 2022
!Time: Sun Mar 20 15:45:25 2022
version 9.3(3) Bios:version
ip community-list standard DC1-BLUE-CL seq 10 permit 65100:3000
ip community-list standard DC1-ORANGE-CL seq 10 permit 65100:3001
ip community-list standard DC2-BLUE-CL seq 10 permit 65200:3000
ip community-list standard DC2-ORANGE-CL seq 10 permit 65200:3001
route-map BLUE-TO-FW-IN permit 10
match community DC2-ORANGE-CL
route-map BLUE-TO-FW-IN permit 20
match community DC1-ORANGE-CL
set local-preference 120
route-map BLUE-TO-FW-OUT permit 10
match community DC1-BLUE-CL DC2-BLUE-CL
route-map ORANGE-TO-FW-IN permit 10
match community DC2-BLUE-CL
route-map ORANGE-TO-FW-IN permit 20
match community DC1-BLUE-CL DC1-ORANGE-CL
set local-preference 80
route-map ORANGE-TO-FW-OUT permit 10
match community DC1-ORANGE-CL DC2-ORANGE-CL
route-map RM-CON-BLUE permit 10
match tag 3000
set community 65200:3000
route-map RM-CON-ORANGE permit 10
match tag 3001
set community 65200:3001
dc2-leaf-1# show run bgp
!Command: show running-config bgp
!Running configuration last done at: Sun Mar 20 15:13:30 2022
!Time: Sun Mar 20 15:45:40 2022
version 9.3(3) Bios:version
feature bgp
router bgp 65200
neighbor 100.127.1.0
remote-as 65200
update-source loopback0
address-family l2vpn evpn
send-community
send-community extended
vrf BLUE
address-family ipv4 unicast
redistribute direct route-map RM-CON-BLUE
neighbor 172.16.1.1
remote-as 65210
address-family ipv4 unicast
send-community
route-map BLUE-TO-FW-IN in
route-map BLUE-TO-FW-OUT out
vrf ORANGE
address-family ipv4 unicast
redistribute direct route-map RM-CON-ORANGE
neighbor 172.16.1.5
remote-as 65210
address-family ipv4 unicast
send-community
route-map ORANGE-TO-FW-IN in
route-map ORANGE-TO-FW-OUT out
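As a sanity check on the policy intent, here is a small Python sketch of how dc2-leaf-1’s ORANGE-TO-FW-IN route-map decides the return path for DC1’s BLUE prefix (192.168.1.0/24, carrying community 65100:3000 from RM-CON-BLUE on dc1-leaf-1). This is a simplified model of the route-maps above, not NX-OS behavior; the candidate descriptions are illustrative.

```python
# Simplified model of dc2-leaf-1's inbound policy for vrf ORANGE.
# Higher local-preference wins among otherwise-equal paths.

def orange_to_fw_in_lp(community: str) -> int:
    """ORANGE-TO-FW-IN on dc2-leaf-1: demote DC1-origin routes heard
    from the local firewall so the EVPN path back to DC1 stays best."""
    if community == "65200:3000":                    # permit 10: DC2-BLUE-CL, default LP
        return 100
    if community in ("65100:3000", "65100:3001"):    # permit 20: DC1 communities
        return 80
    return 100

candidates = [
    # (path description, resulting local-preference)
    ("via fortinet-2 (local FW, no state)", orange_to_fw_in_lp("65100:3000")),
    ("via EVPN to DC1, then fortinet-1", 100),       # iBGP path keeps the default LP
]
print(max(candidates, key=lambda c: c[1])[0])        # via EVPN to DC1, then fortinet-1
```

Because the copy learned from fortinet-2 is demoted to LP 80, the EVPN path across the DCI wins and the return traffic is delivered to fortinet-1, where the session state lives.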
Here is the result of this implementation:
vrf-BLUE-1#ping 192.168.20.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.20.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 16/18/22 ms
Let’s look at the routing tables now.
dc1-leaf-1# show ip route 192.168.20.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0
*via 172.16.0.1, [20/0], 00:40:23, bgp-65100, external, tag 65110
fortinet-1 routing table
We’ve changed vrfs to vrf ORANGE now.
dc1-leaf-1# show ip route 192.168.20.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0
*via 100.127.0.255%default, [200/1], 20:54:07, bgp-65100, internal, tag 65200, segid: 3003001 tunnelid: 0x647f00ff encap: VXLAN
Again, we’ll skip over the border gateways.
dc2-leaf-1# show ip route 192.168.20.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.20.0/24, ubest/mbest: 1/0, attached
*via 192.168.20.1, Vlan2001, [0/0], 20:57:54, direct, tag 3001
Return traffic
Now the return traffic will go back to fortinet-1 where we have the original state instead of fortinet-2.
dc2-leaf-1# show ip route 192.168.1.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.0/24, ubest/mbest: 1/0
*via 100.127.1.255%default, [200/2000], 00:43:36, bgp-65200, internal, tag 65100, segid: 3003001 tunnelid: 0x647f01ff encap: VXLAN
Skipping over the border gateways, we land back at dc1-leaf-1.
dc1-leaf-1# show ip route 192.168.1.0/24 vrf ORANGE
IP Route Table for VRF "ORANGE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.0/24, ubest/mbest: 1/0
*via 172.16.0.5, [20/0], 19:54:44, bgp-65100, external, tag 65110
And we arrive back at fortinet-1, where we have a valid session.
Then we switch vrfs back to vrf BLUE and hit the connected route.
dc1-leaf-1# show ip route 192.168.1.0/24 vrf BLUE
IP Route Table for VRF "BLUE"
'*' denotes best ucast next-hop
'**' denotes best mcast next-hop
'[x/y]' denotes [preference/metric]
'%<string>' in via output denotes VRF <string>
192.168.1.0/24, ubest/mbest: 1/0, attached
*via 192.168.1.1, Vlan2000, [0/0], 22:19:11, direct, tag 3000
Conclusion
That was a lot of work to meet the goal of utilizing both data centers, allowing vrf-to-vrf communication below the firewalls, and not breaking state.
However, it is manageable. It also gives a few other benefits such as:
being able to take an entire DC’s firewall stack offline without losing connectivity
less load on FWs
less FW rule complexity
But with this comes increased routing complexity. So as always there are tradeoffs! Make sure you analyze them against your business needs before proceeding.
If you’d like to know more or need help with that contact us at IP Architechs.