The BGP implementation in Junos is event-driven, while in IOS it is timer-based: a scan process periodically walks through the BGP table and selects the best path to install into the RIB. The bgp scan-time command controls this interval, with a default value of 60 seconds.
In a large-scale BGP deployment, where route reflectors are usually involved, this means that in the worst case the convergence time can reach 120 seconds. The route reflector must first converge and send its BGP update. Only then can the client build a consistent BGP table and compute the new best path to update its RIB. This happens because route reflectors advertise only the best path to their clients.
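A quick check of the 120-second figure: model the worst case as a chain of sequential scan stages, each of which may wait a full scan interval. This is a simplified sketch that ignores propagation delay, MRAI timers and CPU time. The function name is hypothetical.

```python
def worst_case_convergence(scan_time_s: float, sequential_scans: int) -> float:
    """Worst-case delay before an RR client's RIB reflects a change.

    Simplified model: each BGP speaker in the chain runs its scanner
    independently, so every stage may have to wait a full scan interval.
    Propagation, MRAI and processing time are ignored.
    """
    return scan_time_s * sequential_scans


DEFAULT_SCAN_TIME = 60  # IOS default bgp scan-time, in seconds

# Route reflector scan followed by client scan: two sequential stages.
print(worst_case_convergence(DEFAULT_SCAN_TIME, 2))  # 120
# If the client already holds the alternative path, only its own scan remains.
print(worst_case_convergence(DEFAULT_SCAN_TIME, 1))  # 60
```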
In a Layer 3 MPLS VPN scenario, this problem is solved with different Route Distinguishers. They create non-comparable entries on the route reflectors, so both routes are reflected to all clients. Best-path selection then moves to the clients, which removes the intermediate convergence step on the route reflectors.
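A toy model shows the effect of the Route Distinguisher. An RR keeps one best path per key it compares. Keying by (RD, prefix) instead of by prefix alone leaves two independent entries to reflect. The RD values, PE names and prefix below are invented, and best-path selection is reduced to "first path wins" because only the number of entries matters here.

```python
from collections import defaultdict

# Invented example: the same prefix learned from two PEs with different RDs.
paths = [
    {"rd": "65000:1", "prefix": "10.0.0.0/24", "next_hop": "PE1"},
    {"rd": "65000:2", "prefix": "10.0.0.0/24", "next_hop": "PE2"},
]


def reflected(paths, key_fields):
    """Next hops an RR would reflect: one (first-seen) path per comparison key."""
    table = defaultdict(list)
    for p in paths:
        table[tuple(p[f] for f in key_fields)].append(p)
    return sorted(entries[0]["next_hop"] for entries in table.values())


print(reflected(paths, ["prefix"]))        # ['PE1']  global table: one entry
print(reflected(paths, ["rd", "prefix"]))  # ['PE1', 'PE2']  VPN: RD keeps both
```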
But how can the problem be solved in the global routing table?
Different approaches have been proposed, and an excellent discussion can be found here:
http://blog.ine.com/2010/11/22/understanding-bgp-convergence/
I have already used the Add-Path extension (http://tools.ietf.org/html/draft-ietf-idr-add-paths-07), which permits multiple next hops for the same prefix. Besides fast convergence through direct next-hop tracking, this also allows load balancing. However, the approach requires support for the new BGP capability. It usually also needs MPLS encapsulation in the backbone, to avoid IP lookups and possible routing loops on transit nodes.
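On the wire, Add-Path works by prepending a Path Identifier to each NLRI, so two paths for the same prefix no longer overwrite each other. The sketch below follows the layout in the Add-Path draft, later published as RFC 7911: a 4-octet Path Identifier, a 1-octet prefix length in bits, then only the prefix octets needed. The function name and path IDs are illustrative, and the surrounding MP_REACH_NLRI attribute is not encoded.

```python
import ipaddress
import struct


def encode_addpath_nlri(path_id: int, prefix: str) -> bytes:
    """Encode a single NLRI entry carrying an Add-Path Path Identifier."""
    net = ipaddress.ip_network(prefix)
    nbytes = (net.prefixlen + 7) // 8  # only the octets that hold prefix bits
    header = struct.pack("!IB", path_id, net.prefixlen)  # path ID + length
    return header + net.network_address.packed[:nbytes]


# Two paths for the same prefix, distinguished only by the Path Identifier.
primary = encode_addpath_nlri(1, "fd00:5::/64")
backup = encode_addpath_nlri(2, "fd00:5::/64")
print(primary.hex())  # 0000000140fd00000500000000
print(backup.hex())   # 0000000240fd00000500000000
```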
BGP Diverse-Path (http://tools.ietf.org/html/draft-ietf-grow-diverse-bgp-path-dist-08) is not a new protocol capability. Instead, it relies on knowledge of the topology and uses the existing attributes of a typical RR BGP cluster.
One cluster member is selected as a "shadow" route reflector. Instead of reflecting the best path (which the other route reflectors in the cluster already reflect), it announces the backup path to its clients. Like all other routers in the backbone, it still installs the best path into its own RIB for traffic forwarding.
Every backbone router now has at least two iBGP sessions to the RR cluster: one to the regular route reflector and one to the shadow RR.
The BGP table on each RR client now contains both the best and the backup path, so the best path can be computed locally. This removes the route-reflector convergence step and halves the worst-case convergence time.
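The division of labour between the regular RR and the shadow RR can be modelled in a few lines. This is a simplified sketch, not the IOS implementation. The decision process is reduced to LOCAL_PREF, AS_PATH length and, finally, the lowest next-hop address as a stand-in for the remaining tie-breakers. The class and function names are hypothetical, and the two next hops mirror the lab (R2 = FD00::2, R4 = FD00::4).

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Path:
    next_hop: str
    local_pref: int = 100
    as_path_len: int = 1


def rank(paths: Iterable[Path]) -> list:
    """Reduced best-path order: higher LOCAL_PREF, shorter AS_PATH, then the
    lowest next-hop address standing in for the remaining tie-breakers."""
    return sorted(
        paths,
        key=lambda p: (-p.local_pref, p.as_path_len, ipaddress.ip_address(p.next_hop)),
    )


def rr_advertisements(paths: Iterable[Path]):
    """Regular RR reflects the best path; the shadow RR reflects the second best."""
    ranked = rank(paths)
    best = ranked[0]
    backup = ranked[1] if len(ranked) > 1 else None
    return best, backup


def client_best(received, failed_next_hops=frozenset()) -> Optional[Path]:
    """Local decision on the client, using only the paths it already holds."""
    usable = [p for p in received if p is not None and p.next_hop not in failed_next_hops]
    ranked = rank(usable)
    return ranked[0] if ranked else None


best, backup = rr_advertisements([Path("FD00::2"), Path("FD00::4")])
received = [best, backup]                           # one path from each RR
print(client_best(received).next_hop)               # FD00::2
print(client_best(received, {"FD00::2"}).next_hop)  # FD00::4, no RR round-trip
```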
This behavior is not new. In the past it was achieved by manipulating the IGP metric on the shadow route reflector, because in these cases the IGP metric is the tie-breaker in the best-path selection process. Some IOS images now support building this architecture in a simple way.
The last step to speed up convergence is to remove the dependency on the scan time and trigger reconvergence on next-hop availability. This is done with the next-hop tracking feature, which tracks next-hop reachability through the IGP and triggers an immediate reconvergence. In recent IOS versions this function is enabled by default.
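The gain from next-hop tracking can be expressed as the time between a failure and the moment BGP reacts. The sketch below assumes a scanner that fires at fixed multiples of the scan interval. It models next-hop tracking as reacting after a fixed trigger delay (1 second, as configured on R3 later in the lab). Both functions are hypothetical and ignore the IGP's own convergence time.

```python
import math


def scanner_reaction(failure_time_s: float, scan_time_s: float = 60.0) -> float:
    """Seconds until a periodic BGP scanner notices a next-hop failure.

    Assumes scans fire at t = 0, scan_time_s, 2 * scan_time_s, ...
    """
    next_scan = math.ceil(failure_time_s / scan_time_s) * scan_time_s
    return next_scan - failure_time_s


def nht_reaction(trigger_delay_s: float = 1.0) -> float:
    """Event-driven next-hop tracking: the IGP change itself starts the
    countdown, so the reaction time is just the configured trigger delay."""
    return trigger_delay_s


for failure_at in (1.0, 30.0, 59.0):
    print(failure_at, scanner_reaction(failure_at), nht_reaction(1.0))
```

The scanner's reaction depends on where in the cycle the failure lands and approaches the full 60 seconds in the worst case. The event-driven reaction stays constant.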
Keep in mind that very different convergence times in different parts of the backbone (from a few milliseconds to 120 seconds) can cause traffic loops and strong sensitivity to flapping links. Building a fast-converging, high-capacity backbone requires careful analysis of all components, possibly including MPLS, LFA and TE. Simply enabling some fancy feature is not enough.
Testing Lab
This is the complete lab scenario to test this capability:
The lab uses only IPv6 addresses from the ULA (Unique Local Address) range. There is a single level-2 IS-IS area, and all internal point-to-point links rely on the automatic link-local addresses. Loopbacks are numbered with /128 IPv6 addresses, and an aggregate prefix is generated at the peering point.
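The addressing plan can be sanity-checked with Python's ipaddress module. ULAs live in fc00::/7 (RFC 4193) and link-local addresses in fe80::/10. The loopback helper is hypothetical and simply mirrors the lab numbering FD00::<n>/128 for R1 to R4. The link-local address checked at the end is an arbitrary example.

```python
import ipaddress

ULA = ipaddress.ip_network("fc00::/7")  # RFC 4193 Unique Local Addresses


def loopback(router_number: int) -> ipaddress.IPv6Interface:
    """Hypothetical helper mirroring the lab convention FD00::<n>/128."""
    return ipaddress.IPv6Interface(f"fd00::{router_number}/128")


for n in range(1, 5):
    lo = loopback(n)
    assert lo.ip in ULA and lo.network.prefixlen == 128
    print(f"R{n} Loopback0 {lo}")

# The R2-R5 peering subnet and the prefix announced by R5 are ULAs too.
for prefix in ("fd00:25::/64", "fd00:5::/64"):
    assert ipaddress.ip_network(prefix).subnet_of(ULA)

# Internal links carry only automatic link-local addresses (fe80::/10).
assert ipaddress.ip_address("fe80::1").is_link_local
```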
The complete addressing and IGP configuration of R2 looks like this:
!
interface Loopback0
 no ip address
 ipv6 address FD00::2/128
!
interface FastEthernet0/0
 no ip address
!
interface FastEthernet0/0.201
 description ---- to R1 ----
 encapsulation dot1Q 102
 ipv6 enable
 ipv6 router isis
 isis network point-to-point
!
interface FastEthernet0/0.203
 description ---- to R3 ----
 encapsulation dot1Q 203
 ipv6 enable
 ipv6 router isis
 isis network point-to-point
!
interface FastEthernet0/0.205
 description ---- to R5 - ASN2 ----
 encapsulation dot1Q 205
 ipv6 address FD00:25::2/64
!
router isis
 net 49.0000.0000.0002.00
 is-type level-2-only
 metric-style wide
 no hello padding
 passive-interface Loopback0
!
R1 is chosen as the shadow route reflector.
Four steps are required to configure BGP Diverse Path on the shadow RR:
1) Disable the IGP metric tie-break in best-path selection (optional and topology dependent):
   bgp bestpath igp-metric ignore
2) Allow the identification of the backup path:
   bgp additional-paths select backup
3) Permit the backup path announcement:
   bgp additional-paths send
4) Select the route-reflector clients enabled for the update (the Clients peer group):
   neighbor Clients advertise diverse-path backup
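Here is a much-reduced model of why step 1 can matter. When the IGP metric is considered, each RR prefers the exit closest to itself. A shadow RR placed elsewhere in the topology may therefore rank the paths differently from the regular RR, and its "backup" may end up being the same path the regular RR already reflects. Ignoring the IGP metric lets both RRs fall through to the same later tie-breaker. This is only a sketch under stated assumptions: the real decision process has many more steps, and the IGP costs below are invented.

```python
import ipaddress


def order_exits(next_hops, igp_cost, ignore_igp_metric):
    """Rank otherwise-equal exits as seen from one route reflector.

    igp_cost maps next hop -> IGP cost from this RR (invented values).
    With the IGP step ignored, the lowest next-hop address stands in for
    the later tie-breakers of the real decision process.
    """
    def key(nh):
        addr = ipaddress.ip_address(nh)
        return (addr,) if ignore_igp_metric else (igp_cost[nh], addr)

    return sorted(next_hops, key=key)


exits = ["FD00::2", "FD00::4"]
regular_rr_cost = {"FD00::2": 10, "FD00::4": 30}  # invented: regular RR near R2
shadow_rr_cost = {"FD00::2": 30, "FD00::4": 10}   # invented: shadow RR near R4

best = order_exits(exits, regular_rr_cost, ignore_igp_metric=False)[0]
backup_with_igp = order_exits(exits, shadow_rr_cost, ignore_igp_metric=False)[1]
backup_ignoring_igp = order_exits(exits, shadow_rr_cost, ignore_igp_metric=True)[1]

print(best, backup_with_igp)      # FD00::2 FD00::2 -> both RRs send the same path
print(best, backup_ignoring_igp)  # FD00::2 FD00::4 -> clients get a real alternative
```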
The complete BGP configuration of the shadow RR (R1):
!
router bgp 1
 bgp router-id 100.0.0.1
 bgp cluster-id 1
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 neighbor Clients peer-group
 neighbor Clients remote-as 1
 neighbor Clients update-source Loopback0
 neighbor FD00::2 peer-group Clients
 neighbor FD00::3 peer-group Clients
 neighbor FD00::4 peer-group Clients
 !
 address-family ipv4
 exit-address-family
 !
 address-family ipv6
  bgp additional-paths select backup
  bgp additional-paths send
  bgp bestpath igp-metric ignore
  neighbor Clients route-reflector-client
  neighbor Clients advertise diverse-path backup
  neighbor FD00::2 activate
  neighbor FD00::3 activate
  neighbor FD00::4 activate
 exit-address-family
!
Check the BGP status:
R1#sh bgp all summary
...
Neighbor        V    AS MsgRcvd MsgSent   TblVer  InQ OutQ Up/Down  State/PfxRcd
FD00::2         4     1      20      24        5    0    0 00:14:47        1
FD00::3         4     1      22      28        5    0    0 00:18:56        0
FD00::4         4     1      24      27        5    0    0 00:18:55        2
The BGP table identifies the best path for FD00:5::/64 through R2 (and installs it into the RIB), and the possible backup path through R4:
R1#sh bgp ipv6 unicast
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *>i FD00::/64        FD00::4                  0    100      0 i
 *>i FD00:5::/64      FD00::2                  0    100      0 2 i
 *bi                  FD00::4                  0    100      0 2 i
This backup path is now sent to R3:
R1#sh bgp ipv6 unicast neighbors FD00::3 advertised-routes
BGP table version is 5, local router ID is 100.0.0.1
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale, m multipath, b backup-path, f RT-Filter,
              x best-external, a additional-path, c RIB-compressed,
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found
     Network          Next Hop            Metric LocPrf Weight Path
 *>i FD00::/64        FD00::4                  0    100      0 i
 *bia FD00:5::/64     FD00::4                  0    100      0 2 i
Total number of prefixes 2
On R3, the next-hop trigger is enabled with a delay of 1 second for the IPv6 address family:
router bgp 1
 bgp router-id 100.0.0.3
 no bgp default ipv4-unicast
 bgp log-neighbor-changes
 neighbor FD00::1 remote-as 1
 neighbor FD00::1 update-source Loopback0
 neighbor FD00::2 remote-as 1
 neighbor FD00::2 update-source Loopback0
 !
 address-family ipv6
  bgp nexthop trigger enable
  bgp nexthop trigger delay 1
  neighbor FD00::1 activate
  neighbor FD00::2 activate
 exit-address-family
!
On R3, two exit points for FD00:5::/64 are now present. Best-path selection still picks R2 as the primary, but the backup path is already in the BGP table:
R3#sh bgp ipv6 unicast
BGP table version is 8, local router ID is 100.0.0.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
              r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
     Network          Next Hop            Metric LocPrf Weight Path
 * i FD00::/64        FD00::4                  0    100      0 i
 *>i                  FD00::4                  0    100      0 i
 *>i FD00:5::/64      FD00::2                  0    100      0 2 i
 * i                  FD00::4                  0    100      0 2 i
A traceroute confirms that the complete path is correct:
R3#traceroute ipv6 fd00:5::5
Type escape sequence to abort.
Tracing the route to FD00:5::5
  1 FD00::2 12 msec 8 msec 8 msec
  2 FD00:5::5 24 msec 84 msec 84 msec
As a simple test, the R2 loopback was forced down during a continuous ping from R3 to R5. This triggered the backup path selection without any packet loss.
R3#ping fd00:5::5 repeat 1000
Type escape sequence to abort.
Sending 10000, 100-byte ICMP Echos to FD00:5::5, timeout is 2 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
*Mar  1 01:00:05.675: %BGP-3-NOTIFICATION: received from neighbor FD00::2 4/0 (hold time expired) 0 bytes
*Mar  1 01:00:05.675: %BGP-5-ADJCHANGE: neighbor FD00::2 Down BGP Notification received
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Conclusion
With this solution, BGP convergence time becomes comparable to that of the IGP, even with route reflectors.
BGP Add-Path is clearly the more powerful option, but it requires the new capability on most BGP speakers, so it is recommended for new deployments. Diverse-Path, by contrast, can improve overall convergence time without requiring any new capability on legacy devices. Neither solution strictly requires MPLS, but take my advice and always adopt it.
Feature availability:
This feature is primarily available in IOS XR and was recently implemented in IOS 15.2(3)T and 15.2(4)S.
The complete lab configurations are available here.