On large configurations, too many threads would spawn with one thread
per loop. Therefore, threads may now run multiple loops at once. The
thread count is configurable and may be changed during run. All threads
are spawned on startup.
This change helps with memory bloating. BIRD filters need large
temporary memory blocks to store their stack and also memory management
keeps its hot page storage per-thread.
Known bugs:
* Thread autobalancing is not yet implemented.
* Low latency loops are executed together with standard loops.
If no channel is flushing, table prune doesn't walk over routes in nets
and also doesn't walk over importing channel lists. This helps to
alleviate the memory caching burdens a lot.
BIRD keeps a previous (old) configuration for the purpose of undo. The
existing code frees it after a new configuration is successfully parsed
during reconfiguration. That causes memory usage spikes as there are
temporarily three configurations (old, current, and new). The patch
changes it to free the old one before parsing the new one (as user
already requested a new config). The disadvantage is that undo is
not available after failed reconfiguration.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
By this, the requesting channels do the timers in their own loops,
avoiding unnecessary synchronization when the central timer went off.
This is of course less effective for now, yet it allows to easily
implement selective reloads in future.
These routines detect the export congestion (as defined by configurable
thresholds) and propagate the state to readers. There are no readers for
now, they will be added in following commits.
Implement BGP roles as described in RFC 9234. It is a mechanism for
route leak prevention and automatic route filtering based on common BGP
topology relationships. It defines role capability (controlled by 'local
role' option) and OTC route attribute, which is used for automatic route
filtering and leak detection.
Minor changes done by commiter.
For loops allow to iterate over elements in compound data like BGP paths
or community lists. The syntax is:
for [ <type> ] <variable> in <expr> do <command-body>
Allow variable declarations mixed with code, also in nested blocks with
proper scoping, and with variable initializers. E.g:
function fn(int a)
{
int b;
int c = 10;
if a > 20 then
{
b = 30;
int d = c * 2;
print a, b, c, d;
}
string s = "Hello";
}
Added an option for export filter to allow for prefiltering based on the
prefix. Routes outside the given prefix are completely ignored. Config
is simple:
export in <net> <filter>;
Use timer (configurable as 'gc period') to schedule routing table
GC/pruning to ensure that prune is done on time but not too often.
Randomize GC timers to avoid concentration of GC events from different
tables in one loop cycle.
Fix a bug that caused minimum inter-GC interval be 5 us instead of 5 s.
Make default 'gc period' adaptive based on number of routing tables,
from 10 s for small setups to 600 s for large ones.
In marge multi-table RS setup, the patch improved time of flushing
a downed peer from 20-30 min to <2 min and removed 40s latencies.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Add BFD protocol option 'strict bind' to use separate listening socket
for each BFD interface bound to its address instead of using shared
listening sockets.
Implement flowspec validation procedure as described in RFC 8955 sec. 6
and RFC 9117. The Validation procedure enforces that only routers in the
forwarding path for a network can originate flowspec rules for that
network.
The patch adds new mechanism for tracking inter-table dependencies, which
is necessary as the flowspec validation depends on IP routes, and flowspec
rules must be revalidated when best IP routes change.
The validation procedure is disabled by default and requires that
relevant IP table uses trie, as it uses interval queries for subnets.
Add option 'netlink rx buffer' to specify netlink socket receive buffer
size. Uses SO_RCVBUFFORCE, so it can override rmem_max limit.
Thanks to Trisha Biswas and Michal for the original patches.
We can also quite simply allocate bigger blocks. Anyway, we need these
blocks to be aligned to their size which needs one mmap() two times
bigger and then two munmap()s returning the unaligned parts.
The user can specify -B <N> on startup when <N> is the exponent of 2,
setting the block size to 2^N. On most systems, N is 12, anyway if you
know that your configuration is going to eat gigabytes of RAM, you are
almost forced to raise your block size as you may easily get into memory
fragmentation issues or you have to raise your maximum mapping count,
e.g. "sysctl vm.max_map_count=(number)".