The L3VPN protocol implements RFC 4364 BGP/MPLS VPNs using MPLS backbone.
It works similarly to pipe. It connects IP table (one per VRF) with (global)
VPN table. Routes passed from VPN table to IP table are stripped of RD and
filtered by import targets, routes passed in the other direction are extended
with RD, MPLS labels and export targets in extended communities. A separate
MPLS channel is used to announce MPLS routes for the labels.
The new labeling policy MPLS_POLICY_VRF assigns one label to all routes
(from the same FEC map associated with one VRF), while replaces their
next hops with a lookup to a VRF table. This is useful for L3VPN
protocol.
The MPLS subsystem manages MPLS labels and handles their allocation to
MPLS-aware routing protocols. These labels are then attached to IP or VPN
routes representing label switched paths -- LSPs.
There was already a preliminary MPLS support consisting of MPLS label
net_addr, MPLS routing tables with static MPLS routes, remote labels in
next hops, and kernel protocol support.
This patch adds the MPLS domain as a basic structure representing local
label space with dynamic label allocator and configurable label ranges.
To represent LSPs, allocated local labels can be attached as route
attributes to IP or VPN routes with local labels as attributes.
There are several steps for handling LSP routes in routing protocols --
deciding to which forwarding equivalence class (FEC) the LSP route
belongs, allocating labels for new FECs, announcing MPLS routes for new
FECs, attaching labels to LSP routes. The FEC map structure implements
basic code for managing FECs in routing protocols, therefore existing
protocols can be made MPLS-aware by adding FEC map and delegating
most work related to local label management to it.
In general, private_id is sparse and protocols may want to map some
internal values directly into it. For example, L3VPN needs to
map VPN route discriminators to private_id.
OTOH, u32 is enough for global_id, as these identifiers are dense.
Add a new protocol offering route aggregation.
User can specify list of route attributes in the configuration file and
run route aggregation on the export side of the pipe protocol. Routes are
sorted and for every group of equivalent routes new route is created and
exported to the routing table. It is also possible to specify filter
which will run for every route before aggregation.
Furthermore, it will be possible to set attributes of new routes
according to attributes of the aggregated routes.
This is a work in progress.
Original work by Igor Putovny, subsequent cleanups and finalization by
Maria Matejka.
This is a backport cherry-pick of commits
165156beebcce974e8ea
from the v3.0 branch as we need symbol hashes directly inside their
scopes for more general usage than before.
Most syntactic constructs in BIRD configuration (e.g. protocol options)
are defined as keywords, which are distinct from symbols (user-defined
names for protocols, variables, ...). That may cause backwards
compatibility issue when a new feature is added, as it may collide with
existing user names.
We can allow keywords to be shadowed by symbols in almost all cases to
avoid this issue.
This replaces the previous mechanism, where shadowable symbols have to be
explictly added to kw_syms.
Nonterminal bytestring allows to provide expressions to be evaluated in
places where BYTETEXT is used now: passwords, radv custom option.
Based on the patch from Alexander Zubkov <green@qrator.net>, thanks!
- Rename BYTESTRING lexem to BYTETEXT, not to collide with 'bytestring' type name
- Add bytestring type with id T_BYTESTRING (0x2c)
- Add from_hex() filter function to create bytestring from hex string
- Add filter test cases for bytestring type
Minor changes by committer.
Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
Now we use rt_notify() and channels for both feed and notifications,
in both import tables (pre-policy) and regular tables (post-policy).
Remove direct walk in bmp_route_monitor_snapshot().
- Manage BMP state through bmp_peer, bmp_stream, bmp_table structures
- Use channels and rt_notify() hook for route announcements
- Add support for post-policy monitoring
- Send End-of-RIB even when there is no routes
- Remove rte_update_in_notify() hook from import tables
- Update import tables to support channels
- Add bmp_hack (no feed / no flush) flag to channels
Basic fib_get() / fib_find() test for random prefixes, FIB_WALK() test,
and benchmark for fib_find(). Also generalize and reuse some code from
trie tests.
For whatever reason, parser allocated a symbol for every parsed keyword
in each scope. That wasted time and memory. The effect is worsened with
recent changes allowing local scopes, so keywords often promote soft
scopes (with no symbols) to real scopes.
Do not allocate a symbol for a keyword. Take care of keywords that could
be promoted to symbols (kw_sym) and do it explicitly.
Initial implementation of a basic subset of the BMP (BGP Monitoring
Protocol, RFC 7854) from Akamai team. Submitted for further review
and improvement.
The feature of showing all prefixes inside the given one has been added
in v2.0.9 but not well documented. Fixing it by this update.
Text in doc and commit message added by commiter.
There ware missing dependencies for proto-build.c generation, which
sometimes lead to failed builds, and ignores changes in the set of
built protocols. Fix that, and also improve formatting of proto-build.c
During backporting attribute changes from 3.0-branch, some internal
attributes (RIP iface and Babel seqno) leaked to 'show route all' output.
Allow protocols to hide specific attributes with GA_HIDDEN value.
Thanks to Nigel Kukard for the bugreport.
There were some confusion about validity and usage of pflags, which
caused incorrect usage after some flags from (now removed) protocol-
specific area were moved to pflags.
We state that pflags:
- Are secondary data used by protocol-specific hooks
- Can be changed on an existing route (in contrast to copy-on-write
for primary data)
- Are irrelevant for propagation (not propagated when changed)
- Are specific to a routing table (not propagated by pipe)
The patch did these fixes:
- Do not compare pflags in rte_same(), as they may keep cached values
like BGP_REF_STALE, causing spurious propagation.
- Initialize pflags to zero in rte_get_temp(), avoid initialization in
protocol code, fixing at least two forgotten initializations (krt
and one case in babel).
- Improve documentation about pflags
When there is a continuos stream of CLI commands, cli_get_command()
always returns 1 (there is a new command). Anyway, the socket receive
buffer was reset only when there was no command at all, leading to a
strange behavior: after a while, the CLI receive buffer came to its end,
then read() was called with zero size buffer, it returned 0 which was
interpreted as EOF.
The patch fixes that by resetting the buffer position after each command
and moving remaining data at the beginning of buffer.
Thanks to Maria Matejka for examining the bug and for the original bugfix.
When filtered routes (enabled by 'import keep filtered' option) are
updated, they trigger announcements by rte_announce(). For regular
channels (e.g. type RA_OPTIMAL or RA_ANY) such announcement is just
ignored, but in case of RA_ACCEPTED (BGP peer with 'secondary' option)
it just reannounces the old (and still valid) best route.
The patch ensures that such no-change is ignored even for these channels.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
It is useful to distinguish whehter channel config returned from
channel_config_get() was allocated new, or existing from template.
Caller may want to initialize new ones.
In principle, the channel list is a list of parent struct proto and can
contain general structures of type struct channel, That is useful e.g.
for adding MPLS channels to BGP.
In some specific configurations, it was possible to send BIRD into an
infinite loop of recursive next hop resolution. This was caused by route
priority inversion.
To prevent priority inversions affecting other next hops, we simply
refuse to resolve any next hop if the best route for the matching prefix
is recursive or any other route with the same preference is recursive.
Next hop resolution doesn't change route priority, therefore it is
perfectly OK to resolve BGP next hops e.g. by an OSPF route, yet if the
same (or covering) prefix is also announced by iBGP, by retraction of
the OSPF route we would get a possible priority inversion.
For loops allow to iterate over elements in compound data like BGP paths
or community lists. The syntax is:
for [ <type> ] <variable> in <expr> do <command-body>
When f_line is done, we have to pop the stack frame. The old code just
removed nominal number of args/vars. Change it to use stored ventry value
modified by number of returned values. This allows to allocate variables
on a stack frame during execution of f_lines instead of just at start.
But we need to know the number of returned values for a f_line. It is 1
for term, 0 for cmd. Store that to f_line during linearization.
Passing protocol to preexport was in fact a historical relic from the
old times when channels weren't a thing. Refactoring that to match
current extensibility needs.
Use timer (configurable as 'gc period') to schedule routing table
GC/pruning to ensure that prune is done on time but not too often.
Randomize GC timers to avoid concentration of GC events from different
tables in one loop cycle.
Fix a bug that caused minimum inter-GC interval be 5 us instead of 5 s.
Make default 'gc period' adaptive based on number of routing tables,
from 10 s for small setups to 600 s for large ones.
In marge multi-table RS setup, the patch improved time of flushing
a downed peer from 20-30 min to <2 min and removed 40s latencies.
The prefix hash table in BGP used the same hash function as the rtable.
When a batch of routes are exported during feed/flush to the BGP, they
all have similar hash values, so they are all crowded in a few slots in
the BGP prefix table (which is much smaller - around the size of the
batch - and uses higher bits from hash values), making it much slower due
to excessive collisions. Use a different hash function to avoid this.
Also, increase the batch size to fill 4k BGP packets and increase minimum
BGP bucket and prefix hash sizes to avoid back and forth resizing during
flushes.
This leads to order of magnitude faster flushes (on my test data).
The interface pointer was improperly converted to u32 and back. Fixing
this by explicitly allocating an adata structure for it. It's not so
memory efficient, we'll optimize this later.