Memory unmapping causes slow address space fragmentation, leading in
extreme cases to failing to allocate pages at all. Removing this problem
by keeping all the pages allocated to us, yet calling madvise() to let
kernel dispose of them.
This adds a little complexity and overhead as we have to keep the
pointers to the free pages, therefore to hold e.g. 1 GB of 4K pages with
8B pointers, we have to store 2 MB of data.
While onlink flag is meaningful only with explicit next hops, it can be
defined also on direct routes. Parse it also in this case to avoid
periodic updates of the same route.
Thanks to Marcin Saklak for the bugreport.
This is a reimplementation of commit 0f2be469f8
by Alexander Zubkov. In the master branch, changes in commit eb937358
broke setting of channel preference for alien routes learned during
scan. The preference was set only for async routes.
The original solution is extended here to accomodate for v3 specifics.
Changes in commit eb937358 broke setting of channel preference for alien
routes learned during scan. The preference was set only for async routes.
Move common attribute processing part of functions krt_learn_async() and
krt_learn_async() to a separate function to have only one place for such
changes.
Seems like the previous patch was too optimistic, as route replace is
still broken even in Linux 4.19 LTS (but fixed in Linux 5.10 LTS) for:
ip route add 2001:db8::/32 via fe80::1 dev eth0
ip route replace 2001:db8::/32 dev eth0
It ends with two routes instead of just the second.
The issue is limited to direct and special type (e.g. unreachable)
routes, the patch restricts route replace for cases when the new route
is a regular route (with a next hop address).
When IPv6 ECMP support first appeared in Linux kernel, it used different
API than IPv4 ECMP. Individual next hops were updated and announced
separately, instead of using RTA_MULTIPATH as in IPv4. This has several
drawbacks and requires complex code to merge received notifications to
one multipath route.
When Linux came with IPv6 RTA_MULTIPATH support, the initial versions
were somewhat buggy, so we kept using the old API for updates (splitting
multipath routes to sequences of route updates), while accepting both
old-style routes and RTA_MULTIPATH routes in scans / notifications.
As IPv6 RTA_MULTIPATH support is here for a long time, this patch fully
switches Netlink to the IPv6 RTA_MULTIPATH API and removes old complex
code for handling individual next hop announces.
The required Linux version is at least 4.11 for reliable operation.
Thanks to Daniel Gröber for the original patch.
Remove compile-time sysdep option CONFIG_ALL_TABLES_AT_ONCE, replace it
with runtime ability to run either separate table scans or shared scan.
On Linux, use separate table scans by default when the netlink socket
option NETLINK_GET_STRICT_CHK is available, but retreat to shared scan
when it fails.
Running separate table scans has advantages where some routing tables are
managed independently, e.g. when multiple routing daemons are running on
the same machine, as kernel routing table modification performance is
significantly reduced when the table is modified while it is being
scanned.
Thanks Daniel Gröber for the original patch and Toke Høiland-Jørgensen
for suggestions.
The learnt routes are now pushed all into the connected table, not only
the best one. This shouldn't do any damage in well managed setups, yet
it should be noted that it is a change of behavior.
If anybody misses a feature which they implemented by misusing this
internal learn table, let us know, we'll consider implementing it in a
better way.
Passing protocol to preexport was in fact a historical relic from the
old times when channels weren't a thing. Refactoring that to match
current extensibility needs.
There were quite a lot of conflicts in flowspec validation code which
ultimately led to some code being a bit rewritten, not only adapted from
this or that branch, yet it is still in a limit of a merge.
For now, all route attributes are stored as eattrs in ea_list. This
should make route manipulation easier and it also allows for a layered
approach of route attributes where updates from filters will be stored
as an overlay over the previous version.
As there is either a nexthop or another destination specification
(or othing in case of ROAs and Flowspec), it may be merged together.
This code is somehow quirky and should be replaced in future by better
implementation of nexthop.
Also flowspec validation result has its own attribute now as it doesn't
have anything to do with route nexthop.
This doesn't do anything more than to put the whole structure inside
adata. The overall performance is certainly going downhill; we'll
optimize this later.
Anyway, this is one of the latest items inside rta and in several
commits we may drop rta completely and move to eattrs-only routes.
The route scope attribute was used for simple user route marking. As
there is a better tool for this (custom attributes), the old and limited
way can be dropped.
Some tokens are both keywords and symbols. For now, we allow only
specific keywords to be redefined; in future, more of the keywords may
be added to this category.
The redefinable keywords must be specified in any .Y file as follows:
toksym: THE_KEYWORD ;
See proto/bgp/config.Y for an example.
Also dropped a lot of unused terminals.
Changes in internal API:
* Every route attribute must be defined as struct ea_class somewhere.
* Registration of route attributes known at startup must be done by
ea_register_init() from protocol build functions.
* Every attribute has now its symbol registered in a global symbol table
defined as SYM_ATTRIBUTE
* All attribute ID's are dynamically allocated.
* Attribute value custom formatting hook is defined in the ea_class.
* Attribute names are the same for display and filters, always prefixed
by protocol name.
Also added some unit testing code for filters with route attributes.
This commit removes the EAF_TYPE_* namespace completely and also for
route attributes, filter-based types T_* are used. This simplifies
fetching and setting route attributes from filters.
Also, there is now union bval which serves as an universal value holder
instead of private unions held separately by eattr and filter code.
Before this change, fetch-update-write and bitmasking was hardcoded in
attribute access code cased by the attribute type. Several filter
instructions are used to do it instead.
As this is certainly going to be a little bit slower than before, the
switch block in attribute access code should be completely removed in
near future, helping with both performance and code cleanliness.
The user interface should have stayed intact.
When BIRD was munmapping too many pages, it sometimes aborted, saying
that munmap failed with "Not enough memory" as the address space was
getting more and more fragmented.
There is a workaround in place, simply keeping that page for future use,
yet it has never been compiled in because I somehow forgot to include
errno.h. And because I also thought that somebody may have ENOMEM not
defined (why?!), there was a check which quietly omitted that
workaround.
Anyway, ENOMEM is POSIX. It's an utter nonsense to check for its
existence. If it doesn't exist, something is broken.