The BGP implementation in Junos is event-driven, while in IOS it is timer-based: a scan process has to walk through the BGP table and select the best path to install into the RIB. The bgp scan-time command controls this interval, with a default value of 60 sec.
In a large-scale BGP scenario, where route reflectors are usually involved, this means that in the worst case the convergence time can be up to 120 sec, because the route reflector has to converge and send its BGP update before the client can have a consistent BGP table, compute the new best path and update the RIB.
This is because route reflectors distribute only the best path to their clients.
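The worst-case figure is simple arithmetic. The sketch below assumes, purely for illustration, that each timer-driven BGP hop in the chain (the route reflector first, then the client) may wait up to one full scan interval before reacting. The function name and the two-hop chain are my own assumptions, not something taken from IOS.

```python
SCAN_TIME_SEC = 60  # IOS default scan interval quoted above


def worst_case_convergence(scan_time_sec, bgp_hops):
    """Upper bound when every hop waits a full scan interval before reacting."""
    return scan_time_sec * bgp_hops


# Route reflector first, then the client: two timer-driven steps in sequence.
print(worst_case_convergence(SCAN_TIME_SEC, 2))  # 120
```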
In a layer-3 MPLS VPN scenario this problem is solved by using different Route Distinguishers, which create non-comparable entries on the route reflectors and allow both routes to be reflected to all clients. This moves the best-path selection process to the clients, eliminating the intermediate convergence step on the route reflectors.
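A small sketch may help to see why distinct Route Distinguishers matter. It models a route reflector that advertises one best path per table key. The prefixes, RD values, PE names and the heavily reduced comparison (highest local preference, then lowest next hop) are all hypothetical; the real BGP decision process has many more steps.

```python
from collections import defaultdict


def reflect_best_only(paths):
    """Return what a best-path-only reflector advertises: one next hop per key.

    `paths` is a list of (key, next_hop, local_pref) tuples.
    """
    by_key = defaultdict(list)
    for key, next_hop, local_pref in paths:
        by_key[key].append((next_hop, local_pref))
    return {
        key: min(candidates, key=lambda c: (-c[1], c[0]))[0]
        for key, candidates in by_key.items()
    }


# Global table: both exits share the same key, so only one is reflected.
global_table = [("10.0.0.0/24", "PE1", 100), ("10.0.0.0/24", "PE2", 100)]

# VPN table: a different RD per PE turns the two entries into distinct keys.
vpn_table = [
    (("65000:1", "10.0.0.0/24"), "PE1", 100),
    (("65000:2", "10.0.0.0/24"), "PE2", 100),
]

print(reflect_best_only(global_table))  # one entry: the backup exit is hidden
print(reflect_best_only(vpn_table))     # two entries: both exits reach the clients
```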
But how can the problem be solved in the global routing table?
Different approaches have been proposed, and a wonderful discussion can be found here:
http://blog.ine.com/2010/11/22/understanding-bgp-convergence/
I already use the Add-Path ( http://tools.ietf.org/html/draft-ietf-idr-add-paths-07 ) extension, which permits multiple next hops for the same prefix. This allows load balancing in addition to fast convergence thanks to direct next-hop tracking, but this approach requires support for this new BGP capability and usually MPLS encapsulation on the backbone to prevent IP lookups and possible routing loops on transit nodes.
BGP Diverse-Path ( http://tools.ietf.org/html/draft-ietf-grow-diverse-bgp-path-dist-08 ) is not a new capability: it relies on knowledge of the topology and uses existing attributes of a typical RR BGP cluster. One cluster member is selected as a "shadow" route reflector and, instead of reflecting the best path ( which is reflected by the other route reflectors in the cluster ), it announces the backup path to its clients. It is also important to note that, like all other routers in the backbone, it still installs the best path into its own RIB for traffic forwarding.
Now all backbone routers have at least two iBGP peering sessions to the RR cluster: the first to the regular route reflector and the other to the shadow RR.
The BGP table on the RR clients now contains both the best and the backup path, allowing a local calculation of the best path. This step removes the need to wait for the route reflector to converge, halving the total convergence time.
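To make the benefit concrete, here is a minimal sketch of what a client can do locally once it holds both paths. The function is hypothetical and reduces best-path selection to "first path whose next hop is still reachable"; the addresses are the ones used in the lab later in this post.

```python
def client_best_path(received, reachable_next_hops):
    """Pick the first received path whose next hop is still reachable.

    `received` is ordered by preference (best first); returns None when no
    usable path is left.
    """
    for next_hop in received:
        if next_hop in reachable_next_hops:
            return next_hop
    return None


# Paths for FD00:5::/64 as seen on a client.
from_regular_rr = ["FD00::2"]  # best path, reflected by the regular RR
from_shadow_rr = ["FD00::4"]   # backup path, reflected by the shadow RR

only_regular = from_regular_rr
with_shadow = from_regular_rr + from_shadow_rr

# The primary exit fails; only FD00::4 stays reachable in the IGP.
reachable = {"FD00::4"}

print(client_best_path(only_regular, reachable))  # None: must wait for the RR
print(client_best_path(with_shadow, reachable))   # FD00::4: local switchover
```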
This behavior is not new: in the past it was obtained with IGP metric manipulation on the shadow route reflector ( because in these cases the tie-breaker for the best-path selection process is the IGP metric ), but now some IOS images support building this architecture in a simple manner.
The last step to speed up the convergence process is to eliminate the scan time and tie reconvergence to next-hop availability. This can be done with the next-hop tracking feature, which tracks the IGP for next-hop reachability and triggers an immediate reconvergence. In recent IOS versions this function is enabled by default.
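The difference between the two models can be put into numbers with a rough sketch. It assumes that the scanner ticks at exact multiples of the scan interval and that next-hop tracking reacts after a fixed trigger delay; both functions are hypothetical simplifications, and the 1 second default mirrors the delay used in the lab below.

```python
import math


def scanner_reaction_delay(failure_time, scan_time=60):
    """Seconds between a failure and the next periodic scan tick."""
    next_tick = math.ceil(failure_time / scan_time) * scan_time
    return next_tick - failure_time


def nht_reaction_delay(trigger_delay=1):
    """With next-hop tracking the IGP event starts the walk after a fixed delay."""
    return trigger_delay


# A failure one second after a scan tick waits almost the whole interval.
print(scanner_reaction_delay(61))  # 59
print(nht_reaction_delay())        # 1
```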
Take care: having such different convergence times ( from a few ms to 120 sec ) on different parts of the backbone can lead to traffic loops and a high sensitivity to flapping links. Building a fast-converging, high-capacity backbone requires a careful analysis of all components ( and the possible involvement of MPLS, LFA and TE ), not just enabling some fancy feature.
Testing Lab
This is the complete lab scenario to test this capability:
The complete addressing and IGP configuration of R2 looks like:
!
interface Loopback0
 no ip address
 ipv6 address FD00::2/128
!
interface FastEthernet0/0
 no ip address
!
interface FastEthernet0/0.201
 description ---- to R1 ----
 encapsulation dot1Q 102
 ipv6 enable
 ipv6 router isis
 isis network point-to-point
!
interface FastEthernet0/0.203
 description ---- to R3 ----
 encapsulation dot1Q 203
 ipv6 enable
 ipv6 router isis
 isis network point-to-point
!
interface FastEthernet0/0.205
 description ---- to R5 - ASN2 ----
 encapsulation dot1Q 205
 ipv6 address FD00:25::2/64
!
router isis
 net 49.0000.0000.0002.00
 is-type level-2-only
 metric-style wide
 no hello padding
 passive-interface Loopback0
!
R1 is chosen as the shadow route reflector.
To configure BGP Diverse Path on the shadow router, 4 steps are required:
1) Disable the IGP metric tie-break in the best-path selection ( optional and topology dependent )
bgp bestpath igp-metric ignore
2) Allow the identification of the backup path
bgp additional-paths select backup
3) Permit the backup path announcement
bgp additional-paths send
4) Select the route-reflector clients enabled for the update ( the Clients peer-group )
neighbor Clients advertise diverse-path backup
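When the same four steps have to be repeated on more than one box, a quick check of the saved configuration can save some time. The sketch below is only a plain substring match over the config text; the function name and the sample snippet are hypothetical, and it does not verify that the commands sit under the right address family.

```python
REQUIRED = [
    "bgp bestpath igp-metric ignore",      # step 1 (optional, topology dependent)
    "bgp additional-paths select backup",  # step 2
    "bgp additional-paths send",           # step 3
    "advertise diverse-path backup",       # step 4 (per neighbor or peer-group)
]


def missing_diverse_path_commands(config_text):
    """Return the required commands that do not appear in the config text."""
    # Collapse repeated whitespace so indentation does not affect matching.
    lines = [" ".join(line.split()) for line in config_text.splitlines()]
    return [cmd for cmd in REQUIRED if not any(cmd in line for line in lines)]


sample = """
address-family ipv6
 bgp additional-paths select backup
 bgp additional-paths send
 neighbor Clients route-reflector-client
exit-address-family
"""

print(missing_diverse_path_commands(sample))
```

With this sample, steps 1 and 4 are reported as missing. Since step 1 is optional, it is up to the reader to decide whether its absence is a problem in a given topology.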
The complete BGP configuration of the shadow RR ( R1 ):
!
router bgp 1
 bgp router-id 100.0.0.1
 bgp cluster-id 1
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor Clients peer-group
 neighbor Clients remote-as 1
 neighbor Clients update-source Loopback0
 neighbor FD00::2 peer-group Clients
 neighbor FD00::3 peer-group Clients
 neighbor FD00::4 peer-group Clients
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
  bgp additional-paths select backup
  bgp additional-paths send
  bgp bestpath igp-metric ignore
  neighbor Clients route-reflector-client
  neighbor Clients advertise diverse-path backup
  neighbor FD00::2 activate
  neighbor FD00::3 activate
  neighbor FD00::4 activate
 exit-address-family
!
Check the BGP status:
R1#sh bgp all summary
...
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
FD00::2         4     1      20      24        5    0    0 00:14:47        1
FD00::3         4     1      22      28        5    0    0 00:18:56        0
FD00::4         4     1      24      27        5    0    0 00:18:55        2
The BGP table identifies the best path for "FD00:5::/64" through R2 ( and installs it into the RIB ) and the possible backup path through R4:
R1#sh bgp ipv6 unicast
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *>i FD00::/64        FD00::4                  0    100      0 i
 *>i FD00:5::/64      FD00::2                  0    100      0 2 i
 *bi                  FD00::4                  0    100      0 2 i
This backup path is now sent to R3:
R1#sh bgp ipv6 unicast neighbors FD00::3 advertised-routes
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *>i FD00::/64        FD00::4                  0    100      0 i
 *bia FD00:5::/64     FD00::4                  0    100      0 2 i
Total number of prefixes 2
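The status column packs several flags into a few characters, which is easy to misread when the output wraps. As a small reading aid, the sketch below decodes a flag string using the legend printed by the router itself; the helper name is hypothetical.

```python
# Legend copied from the "Status codes" header of the output above.
STATUS_CODES = {
    "s": "suppressed", "d": "damped", "h": "history", "*": "valid",
    ">": "best", "i": "internal", "r": "RIB-failure", "S": "Stale",
    "m": "multipath", "b": "backup-path", "f": "RT-Filter",
    "x": "best-external", "a": "additional-path", "c": "RIB-compressed",
}


def decode_status(flags):
    """Expand a status string such as '*bia' into readable names."""
    return [STATUS_CODES[ch] for ch in flags if ch != " "]


print(decode_status("*>i"))   # ['valid', 'best', 'internal']
print(decode_status("*bia"))  # ['valid', 'backup-path', 'internal', 'additional-path']
```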
On R3 the next-hop trigger is enabled with a delay of 1 sec for the IPv6 address family:
router bgp 1
 bgp router-id 100.0.0.3
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor FD00::1 remote-as 1
 neighbor FD00::1 update-source Loopback0
 neighbor FD00::2 remote-as 1
 neighbor FD00::2 update-source Loopback0
 !
 address-family ipv6
  bgp nexthop trigger enable
  bgp nexthop trigger delay 1
  neighbor FD00::1 activate
  neighbor FD00::2 activate
 exit-address-family
!
On R3 two exit points for FD00:5::/64 are now present. The best path still selects R2 as the primary, but the backup path is already present in the BGP table:
R3#sh bgp ipv6 unicast
BGP table version is 8, local router ID is 100.0.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
     Network          Next Hop            Metric LocPrf Weight Path
 * i FD00::/64        FD00::4                  0    100      0 i
 *>i                  FD00::4                  0    100      0 i
 *>i FD00:5::/64      FD00::2                  0    100      0 2 i
 * i                  FD00::4                  0    100      0 2 i
A traceroute confirms that the complete path is correct:
R3#traceroute ipv6 fd00:5::5
Type escape sequence to abort.
Tracing the route to FD00:5::5
  1 FD00::2 12 msec 8 msec 8 msec
  2 FD00:5::5 24 msec 84 msec 84 msec
As a simple test, during a continuous ping to R5 from R3, the R2 loopback was forced down, triggering the backup path selection without any packet loss.
R3#ping fd00:5::5 repeat 1000
Type escape sequence to abort.
Sending 10000, 100-byte ICMP Echos to FD00:5::5, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
*Mar 1 01:00:05.675: %BGP-3-NOTIFICATION: received from neighbor FD00::2 4/0 (hold time expired) 0 bytes
*Mar 1 01:00:05.675: %BGP-5-ADJCHANGE: neighbor FD00::2 Down BGP Notification received
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Conclusion
With this solution the BGP convergence time is now comparable to that of the IGP, even with route reflectors.
BGP Add-Path is obviously the more powerful option, but it requires the specific capability on most of the BGP speakers and is therefore recommended for new deployments, while Diverse-Path can help improve the global convergence time without requiring any new capability on legacy devices. MPLS is not strictly required for either solution, but take my advice and always adopt it.
Feature availability:
This feature is primarily available in IOS XR and was recently implemented in IOS 15.2(3)T and 15.2(4)S.
The complete lab configuration is here