Monday, 20 August 2012

BGP Diverse-Path for faster convergence


The BGP implementation in Junos is event-driven, while the one in IOS is timer-based: a scan process has to walk through the BGP table and select the best path to install into the RIB. The bgp scan-time command controls this interval, which defaults to 60 seconds.
In a large-scale BGP scenario, where route reflectors are usually involved, this means that in the worst case convergence can take up to 120 seconds: the route reflector has to converge and send its BGP update before the client has a consistent BGP table and can compute the new best path and update the RIB.
This is because route reflectors distribute only the best path to their clients.

In a layer-3 MPLS VPN scenario this problem is solved by using different Route Distinguishers, which create non-comparable entries on the route reflectors and allow both routes to be reflected to all clients. This moves the best-path selection process to the clients, eliminating the intermediate convergence step on the route reflectors.
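
A toy Python sketch of why distinct Route Distinguishers help, assuming a route reflector that keeps exactly one path per table key. The RD values, prefix and PE names are made up for illustration, and "first seen" stands in for the real best-path decision:

```python
# Toy route reflector: it keeps and reflects a single path per table key.
# RD values, prefix and PE names are hypothetical.

def reflected_paths(adverts):
    """Return what the RR reflects: one next hop per (RD, prefix) key."""
    table = {}
    for rd, prefix, next_hop in adverts:
        # The first advertisement seen stands in for the best-path winner.
        table.setdefault((rd, prefix), next_hop)
    return table

same_rd = [("1:100", "10.0.0.0/24", "PE1"), ("1:100", "10.0.0.0/24", "PE2")]
unique_rd = [("1:101", "10.0.0.0/24", "PE1"), ("1:102", "10.0.0.0/24", "PE2")]

print(len(reflected_paths(same_rd)))    # comparable entries: only one survives
print(len(reflected_paths(unique_rd)))  # distinct keys: both are reflected
```

With a shared RD the second exit point is hidden behind the reflector; with one RD per PE both exits reach every client.
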
But how can the problem be solved in the global routing table?
Different approaches have been proposed, and a wonderful discussion can be found here:

http://blog.ine.com/2010/11/22/understanding-bgp-convergence/

I already use the Add-Path ( http://tools.ietf.org/html/draft-ietf-idr-add-paths-07 ) extension, which permits multiple next hops for the same prefix. This allows load balancing in addition to fast convergence thanks to direct next-hop tracking, but the approach requires support for this new BGP capability and usually MPLS encapsulation on the backbone, to prevent IP lookups and possible routing loops on transit nodes.
BGP Diverse-Path ( http://tools.ietf.org/html/draft-ietf-grow-diverse-bgp-path-dist-08 ) is not a new capability: it relies on knowledge of the topology and uses the existing attributes of a typical BGP RR cluster. One cluster member is selected as a "shadow" route reflector and, instead of reflecting the best path ( which is reflected by the other route reflectors in the cluster ), it announces the backup path to its clients. It is also important to note that, like all other routers in the backbone, it still installs the best path into its own RIB for traffic forwarding.

Now every backbone router has at least two iBGP peering sessions to the RR cluster: the first to the regular route reflector and the other to the shadow RR.
The BGP table on the RR clients now contains both the best and the backup path, allowing a local calculation of the best path. This removes the need to wait for the route reflector to converge, halving the total convergence time.
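
The effect on a client can be sketched with a small Python model. This is a simplification under stated assumptions, not the IOS algorithm: the ranking uses only local-pref, IGP metric and next hop, the IGP metrics are hypothetical, and only the two next hops ( FD00::2 and FD00::4 ) come from the lab below:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int = 100
    igp_metric: int = 0  # hypothetical values, not taken from the lab

def rank(paths):
    """Simplified ordering: local-pref, then IGP metric, then next hop."""
    return sorted(paths, key=lambda p: (-p.local_pref, p.igp_metric, p.next_hop))

def regular_rr(paths):
    """A regular RR reflects only its best path."""
    return rank(paths)[:1]

def shadow_rr(paths):
    """The shadow RR reflects the second-best (backup) path instead."""
    return rank(paths)[1:2]

def client_best(received, reachable):
    """Client picks the best received path whose next hop is still reachable."""
    usable = [p for p in received if p.next_hop in reachable]
    return rank(usable)[0] if usable else None

paths = [Path("FD00::2", igp_metric=10), Path("FD00::4", igp_metric=20)]
with_shadow = regular_rr(paths) + shadow_rr(paths)

print(client_best(with_shadow, {"FD00::2", "FD00::4"}))  # primary exit
print(client_best(with_shadow, {"FD00::4"}))             # local failover
print(client_best(regular_rr(paths), {"FD00::4"}))       # None: must wait for the RR
```

Without the shadow RR the client has nothing to fall back on when the primary next hop disappears, so it has to wait for the reflector to converge and send an update.
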

This behavior is not new: in the past it was achieved with IGP metric manipulation on the shadow route reflector ( because in these cases the tie-breaker in the best-path selection process is the IGP metric ), but now some IOS images support building this architecture in a simple manner.
The last step to speed up convergence is to eliminate the scan time and tie reconvergence to next-hop availability. This can be done with the next-hop tracking feature, which tracks the IGP for next-hop reachability and triggers an immediate reconvergence. In recent IOS versions this function is enabled by default.
Take care: having such different convergence times ( from a few ms to 120 sec ) in different parts of the backbone can lead to traffic loops and a high sensitivity to flapping links. Building a fast-converging, high-capacity backbone requires a careful analysis of all components ( and the possible involvement of MPLS, LFA and TE ), not just enabling some fancy feature.
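
The gap between scanner-driven and next-hop-tracking-driven reaction can be illustrated with a rough Python sketch. It assumes the scanner runs at exact multiples of the scan time and that the only delay with next-hop tracking is the configured trigger delay; IGP detection and propagation times are ignored, and the function names are illustrative:

```python
import math

def scanner_reaction(t_fail_s, scan_time_s=60.0):
    """Seconds from a failure until the next periodic scan notices it."""
    next_scan = (math.floor(t_fail_s / scan_time_s) + 1) * scan_time_s
    return next_scan - t_fail_s

def nht_reaction(trigger_delay_s=1.0):
    """Seconds from the IGP event until BGP re-runs the best-path selection."""
    return trigger_delay_s

print(scanner_reaction(5.0))  # failure 5 s after a scan
print(nht_reaction())         # 1 s trigger delay, as used on R3 in the lab
```
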

Testing Lab

This is the complete lab scenario to test this capability:

In the lab only IPv6 addresses from the ULA ( Unique Local Address ) range are used. There is a single level-2 IS-IS area, and all internal point-to-point links are left to the automatic link-local addresses. Loopbacks are numbered as /128 IPv6 addresses and an aggregate prefix is generated at the peering point.
The complete addressing and IGP configuration of R2 looks like this:

!
interface Loopback0
no ip address
ipv6 address FD00::2/128
!
interface FastEthernet0/0
no ip address
!
interface FastEthernet0/0.201
description ---- to R1 ----
encapsulation dot1Q 102
ipv6 enable
ipv6 router isis
isis network point-to-point
!
interface FastEthernet0/0.203
description ---- to R3 ----
encapsulation dot1Q 203
ipv6 enable
ipv6 router isis
isis network point-to-point
!
interface FastEthernet0/0.205
description ---- to R5 - ASN2 ----
encapsulation dot1Q 205
ipv6 address FD00:25::2/64
!
router isis
net 49.0000.0000.0002.00
is-type level-2-only
metric-style wide
no hello padding
passive-interface Loopback0
!


R1 is chosen as the shadow route reflector.
To configure BGP Diverse-Path on the shadow router, four steps are required:


1) Disable the IGP metric tie-break in the best-path selection ( optional and topology dependent )
bgp bestpath igp-metric ignore
2) Allow the identification of the backup path
bgp additional-paths select backup
3) Permit the backup path announcement
bgp additional-paths send
4) Select the route-reflector clients enabled for the update ( the Clients peer-group )
neighbor Clients advertise diverse-path backup


The complete BGP configuration of the shadow RR ( R1 ):


!
router bgp 1
bgp router-id 100.0.0.1
bgp cluster-id 1
bgp log-neighbor-changes
no bgp default ipv4-unicast
neighbor Clients peer-group
neighbor Clients remote-as 1
neighbor Clients update-source Loopback0
neighbor FD00::2 peer-group Clients
neighbor FD00::3 peer-group Clients
neighbor FD00::4 peer-group Clients
!
address-family ipv4
exit-address-family
!
address-family ipv6
bgp additional-paths select backup
bgp additional-paths send
bgp bestpath igp-metric ignore
neighbor Clients route-reflector-client
neighbor Clients advertise diverse-path backup
neighbor FD00::2 activate
neighbor FD00::3 activate
neighbor FD00::4 activate
exit-address-family
!


Check the BGP status:


R1#sh bgp all summary
...
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
FD00::2 4 1 20 24 5 0 0 00:14:47 1
FD00::3 4 1 22 28 5 0 0 00:18:56 0
FD00::4 4 1 24 27 5 0 0 00:18:55 2


The BGP table identifies the best path for "FD00:5::/64" through R2 ( installed into the RIB ) and the possible backup path through R4:


R1#sh bgp ipv6 unicast
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i FD00::/64 FD00::4 0 100 0 i
*>i FD00:5::/64 FD00::2 0 100 0 2 i
*bi FD00::4 0 100 0 2 i


This backup path is now sent to R3:

R1#sh bgp ipv6 unicast neighbors FD00::3 advertised-routes
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
*>i FD00::/64 FD00::4 0 100 0 i
*biaFD00:5::/64 FD00::4 0 100 0 2 i

Total number of prefixes 2


On R3 the next-hop trigger is enabled with a delay of 1 sec for the IPv6 address family:


router bgp 1
bgp router-id 100.0.0.3
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor FD00::1 remote-as 1
neighbor FD00::1 update-source Loopback0
neighbor FD00::2 remote-as 1
neighbor FD00::2 update-source Loopback0
!
address-family ipv6
bgp nexthop trigger enable
bgp nexthop trigger delay 1
neighbor FD00::1 activate
neighbor FD00::2 activate
exit-address-family
!


On R3 two exit points for FD00:5::/64 are now present: the best path still selects R2 as the primary, but the backup path is already present in the BGP table:

R3#sh bgp ipv6 unicast
BGP table version is 8, local router ID is 100.0.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete

Network Next Hop Metric LocPrf Weight Path
* iFD00::/64 FD00::4 0 100 0 i
*>i FD00::4 0 100 0 i
*>iFD00:5::/64 FD00::2 0 100 0 2 i
* i FD00::4 0 100 0 2 i


A traceroute confirms that the complete path is correct:

R3#traceroute ipv6 fd00:5::5

Type escape sequence to abort.
Tracing the route to FD00:5::5

1 FD00::2 12 msec 8 msec 8 msec
2 FD00:5::5 24 msec 84 msec 84 msec


As a simple test, during a continuous ping to R5 from R3, the R2 loopback was forced down, triggering the backup path selection without any packet loss.


R3#ping fd00:5::5 repeat 1000

Type escape sequence to abort.
Sending 10000, 100-byte ICMP Echos to FD00:5::5, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

*Mar 1 01:00:05.675: %BGP-3-NOTIFICATION: received from neighbor FD00::2 4/0 (hold time expired) 0 bytes
*Mar 1 01:00:05.675: %BGP-5-ADJCHANGE: neighbor FD00::2 Down BGP Notification received
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!



Conclusion

With this solution the BGP convergence time is now comparable to that of the IGP, even with route reflectors.
BGP Add-Path is obviously the more powerful option, but it requires the specific capability on most of the BGP speakers and is therefore recommended for new deployments, while Diverse-Path can help improve the overall convergence time without requiring any new capability on legacy devices. MPLS is not strictly required for either solution, but take my advice and always adopt it.

Feature availability:

This feature is primarily available in IOS XR and was recently implemented in IOS 15.2(3)T and 15.2(4)S.

The complete lab configurations are here