The Kernel protocol, even with the option 'learn' enabled, ignores
direct routes created by the OS kernel (on Linux these are routes
with rtm_protocol == RTPROT_KERNEL).
Implement optional behavior where both OS kernel and third-party routes
are learned, it can be enabled by 'learn all' option.
Minor changes by committer.
This variant of logging avoids calling write() for every log line,
allowing for waitless logging. This makes heavy logging less heavy
and more useful for race condition debugging.
Methods can now be called as x.m(y), as long as x can have its type
inferred in config time. If used as a command, it modifies the object,
if used as a value, it keeps the original object intact.
Also functions add(x,y), delete(x,y), filter(x,y) and prepend(x,y) now
spit a warning and are considered deprecated.
It's also possible to call a method on a constant, see filter/test.conf
for examples like bgp_path = +empty+.prepend(1).
Inside instruction definitions (filter/f-inst.c), a METHOD_CONSTRUCTOR()
call is added, which registers the instruction as a method for the type
of its first argument. Each type has its own method symbol table and
filter parser switches between them based on the inferred type of the
object calling the method.
Also FI_CLIST_(ADD|DELETE|FILTER) instructions have been split to allow
for this method dispatch. With type inference, it's now possible.
This adds support to the Babel protocol for the RTT extension specified
in draft-ietf-babel-rtt-extension. While this extension is not yet at the
RFC stage, it is one of the more useful extensions to Babel[0], so it
seems worth having in Bird as well.
The extension adds timestamps to Hello and IHU TLVs and uses these to
compute an RTT to each neighbour. An extra per-neighbour cost is then
computed from the RTT based on a minimum and maximum interval and cost
value specified in the configuration. The primary use case for this is
improving routing in a geographically distributed tunnel-based overlay
network.
The implementation follows the babeld implementation when picking
constants and default configuration values. It also uses the same RTT
smoothing algorithm as babeld, and follows it in adding a new 'tunnel'
interface type which enables RTT by default.
[0] https://alioth-lists.debian.net/pipermail/babel-users/2022-April/003932.html
The feature of showing all prefixes inside the given one has been added
in v2.0.9 but not well documented. Fixing it by this update.
Text in doc and commit message added by commiter.
The patch implements an IPv4 via IPv6 extension (RFC 9229) to the Babel
routing protocol (RFC 8966) that allows annoncing routes to an IPv4
prefix with an IPv6 next hop, which makes it possible for IPv4 traffic
to flow through interfaces that have not been assigned an IPv4 address.
The implementation is compatible with the current Babeld version.
Thanks to Toke Høiland-Jørgensen for early review on this work.
Minor changes from committer.
On large configurations, too many threads would spawn with one thread
per loop. Therefore, threads may now run multiple loops at once. The
thread count is configurable and may be changed during run. All threads
are spawned on startup.
This change helps with memory bloating. BIRD filters need large
temporary memory blocks to store their stack and also memory management
keeps its hot page storage per-thread.
Known bugs:
* Thread autobalancing is not yet implemented.
* Low latency loops are executed together with standard loops.
If no channel is flushing, table prune doesn't walk over routes in nets
and also doesn't walk over importing channel lists. This helps to
alleviate the memory caching burdens a lot.
Add static route attribute to set onlink flag for route next hop. Can be
used to build a dynamically routed IP-in-IP overlay network. Usage:
ifname = "tunl0";
onlink = true;
gw = bgp_next_hop;
The import table does not work reliably together with re-evaluation of
routes due to recursive next hops or flowspec validation. We will at
least document that here, as import tables are completely redesigned and
this issue is fixed in BIRD 3.x branch.
The effective keepalive time now scales relative to the negotiated
hold time, to maintain proportion between the keepalive time and the
hold time. This avoids issues when both keepalive and hold times
were configured, the hold time was negotiated to a smaller value,
but the keepalive time stayed the same.
Add new options 'min hold time' and 'min keepalive time', which reject
session attempts with too small hold time.
Improve validation of config options an their documentation.
Thanks to Alexander Zubkov and Sergei Goriunov for suggestions.
BIRD keeps a previous (old) configuration for the purpose of undo. The
existing code frees it after a new configuration is successfully parsed
during reconfiguration. That causes memory usage spikes as there are
temporarily three configurations (old, current, and new). The patch
changes it to free the old one before parsing the new one (as user
already requested a new config). The disadvantage is that undo is
not available after failed reconfiguration.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
By this, the requesting channels do the timers in their own loops,
avoiding unnecessary synchronization when the central timer went off.
This is of course less effective for now, yet it allows to easily
implement selective reloads in future.
These routines detect the export congestion (as defined by configurable
thresholds) and propagate the state to readers. There are no readers for
now, they will be added in following commits.
Implement BGP roles as described in RFC 9234. It is a mechanism for
route leak prevention and automatic route filtering based on common BGP
topology relationships. It defines role capability (controlled by 'local
role' option) and OTC route attribute, which is used for automatic route
filtering and leak detection.
Minor changes done by commiter.
For loops allow to iterate over elements in compound data like BGP paths
or community lists. The syntax is:
for [ <type> ] <variable> in <expr> do <command-body>
Allow variable declarations mixed with code, also in nested blocks with
proper scoping, and with variable initializers. E.g:
function fn(int a)
{
int b;
int c = 10;
if a > 20 then
{
b = 30;
int d = c * 2;
print a, b, c, d;
}
string s = "Hello";
}
Added an option for export filter to allow for prefiltering based on the
prefix. Routes outside the given prefix are completely ignored. Config
is simple:
export in <net> <filter>;
Use timer (configurable as 'gc period') to schedule routing table
GC/pruning to ensure that prune is done on time but not too often.
Randomize GC timers to avoid concentration of GC events from different
tables in one loop cycle.
Fix a bug that caused minimum inter-GC interval be 5 us instead of 5 s.
Make default 'gc period' adaptive based on number of routing tables,
from 10 s for small setups to 600 s for large ones.
In marge multi-table RS setup, the patch improved time of flushing
a downed peer from 20-30 min to <2 min and removed 40s latencies.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Add BFD protocol option 'strict bind' to use separate listening socket
for each BFD interface bound to its address instead of using shared
listening sockets.
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
Add option 'netlink rx buffer' to specify netlink socket receive buffer
size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.
Thanks to Trisha Biswas and Michal for the original patches.
We can also quite simply allocate bigger blocks. Anyway, we need these
blocks to be aligned to their size which needs one mmap() two times
bigger and then two munmap()s returning the unaligned parts.
The user can specify -B <N> on startup when <N> is the exponent of 2,
setting the block size to 2^N. On most systems, N is 12, anyway if you
know that your configuration is going to eat gigabytes of RAM, you are
almost forced to raise your block size as you may easily get into memory
fragmentation issues or you have to raise your maximum mapping count,
e.g. "sysctl vm.max_map_count=(number)".