We missed that the protocol spawner violates the prescribed
locking order. When the rtable level is locked, no new protocol can be
started, so we need to:
* create the protocol from a clean mainloop context
* in the protocol start hook, take the socket (sketched below)
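A minimal sketch of the intended flow, assuming BIRD's event and memory
helpers (ev_new_init(), ev_schedule(), mb_alloc()) behave as in v2; the
names spawn_ctx, request_protocol_spawn() and spawn_dynamic_protocol()
are hypothetical and only illustrate the deferral, not the actual code:

    /* Hypothetical illustration: defer protocol creation to the main loop. */
    struct spawn_ctx {
      sock *sk;                               /* accepted socket, adopted later */
    };

    static void spawn_dynamic_protocol(sock *sk);   /* defined elsewhere */

    static void
    spawn_protocol_hook(void *data)
    {
      struct spawn_ctx *ctx = data;

      /* Runs from a clean mainloop context: no rtable level is locked here,
       * so creating the protocol keeps the prescribed locking order. The
       * protocol start hook then takes ctx->sk as its socket. */
      spawn_dynamic_protocol(ctx->sk);
    }

    static void
    request_protocol_spawn(pool *p, sock *sk)
    {
      /* May run with the rtable level locked: never create the protocol
       * here, only enqueue an event for the main loop. */
      struct spawn_ctx *ctx = mb_alloc(p, sizeof(*ctx));
      ctx->sk = sk;
      ev_schedule(ev_new_init(p, spawn_protocol_hook, ctx));
    }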
Testsuite: cf-bgp-autopeer
Fixes: #136
Thanks to Job Snijders <job@fastly.com> for reporting:
https://trubka.network.cz/pipermail/bird-users/2024-December/017980.html
Otherwise it looks like we are sending too much traffic to netlink
every now and then, which is not true. Now we can disambiguate between
in-kernel updates and ignored routes.
If there is no feed pending, the requested one should be
activated immediately; otherwise it is activated only after
the full run finishes, effectively running a full feed first and
then the requested one.
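A rough sketch of the described ordering in plain C; feed_pending,
request_feed(), activate_feed() and feed_done() are made-up names used
only to illustrate the control flow:

    struct feed_request;                            /* opaque here */
    static void activate_feed(struct feed_request *req);  /* defined elsewhere */

    static _Bool feed_pending;                      /* is a feed currently running? */
    static struct feed_request *queued;             /* request postponed until it ends */

    static void
    request_feed(struct feed_request *req)
    {
      if (!feed_pending)
      {
        feed_pending = 1;
        activate_feed(req);                         /* nothing pending: start right away */
      }
      else
        queued = req;                               /* activate only after the full run */
    }

    /* Called when the currently running feed completes. */
    static void
    feed_done(void)
    {
      feed_pending = 0;
      if (queued)
      {
        struct feed_request *req = queued;
        queued = NULL;
        feed_pending = 1;
        activate_feed(req);                         /* full feed first, then the request */
      }
    }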
When many changes are done during reconfiguration, the table may
start pruning old routes before everything has settled down, slowing
down not only the reconfiguration but also the shutdown process.
In our use case, these are impossibly greedy because we often
free memory in a different thread than the one where we allocated it,
forcing the default allocator to scatter the used memory all over the place.
There was a suspicion that maybe the BIRD 3 version of ROA gets the
digesting wrong. This test covers the nastiest corner cases we could
think of, so now we can expect it to be right.
We have quite large critical sections and we need to allocate inside
them. This is something to revise properly later on, yet for now,
instead of slowly but surely growing the virtual memory address space,
it's better to optimize the cold page cache pickup and count the
situations where this happens inside a critical section.
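A simplified sketch of the accounting idea only; pick_hot_page() and
pick_cold_page_or_mmap() are hypothetical helpers, and the counter is
the part the commit message actually talks about:

    #include <stdatomic.h>
    #include <stdint.h>

    static void *pick_hot_page(void);               /* hypothetical fast path */
    static void *pick_cold_page_or_mmap(void);      /* hypothetical slow path */

    /* How many times we had to pick up a cold page while inside a critical
     * section; reported in statistics instead of silently growing the
     * virtual memory address space. */
    static _Atomic uint64_t pages_requested_in_critical_section;

    static void *
    alloc_page_in_critical_section(void)
    {
      void *pg = pick_hot_page();
      if (!pg)
      {
        pg = pick_cold_page_or_mmap();
        atomic_fetch_add_explicit(&pages_requested_in_critical_section, 1,
                                  memory_order_relaxed);
      }
      return pg;
    }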
The resource dumping routines needed to be updated in v3 to use the new
API introduced in v2.
Conflicts:
filter/f-util.c
filter/filter.c
lib/birdlib.h
lib/event.c
lib/mempool.c
lib/resource.c
lib/resource.h
lib/slab.c
lib/timer.c
nest/config.Y
nest/iface.c
nest/iface.h
nest/locks.c
nest/neighbor.c
nest/proto.c
nest/route.h
nest/rt-attr.c
nest/rt-table.c
proto/bfd/bfd.c
proto/bmp/bmp.c
sysdep/unix/io.c
sysdep/unix/krt.c
sysdep/unix/main.c
sysdep/unix/unix.h
The EAttr ID space is dense, so we can just walk once, sweep the whole
input and go home.
There is a little bit of memory inefficiency in always allocating the
largest possible block, yet it isn't too bad.
There are also unit tests for this.
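A rough illustration of the single-pass idea; MAX_EA_ID, struct ea_in
and sweep_eattrs() are illustrative stand-ins (the real bound and types
differ), assuming BIRD's mb_allocz()/mb_free() for the block:

    #define MAX_EA_ID 1024                   /* assumed bound, for illustration */

    struct ea_in {
      unsigned id;                           /* dense ID, always < MAX_EA_ID */
      /* ... attribute payload ... */
    };

    static void
    sweep_eattrs(pool *p, const struct ea_in *in, unsigned count)
    {
      /* One slot per possible ID: always the largest possible block,
       * slightly wasteful for short inputs, but we only walk once. */
      const struct ea_in **slot = mb_allocz(p, MAX_EA_ID * sizeof(*slot));

      for (unsigned i = 0; i < count; i++)   /* the single sweep */
        slot[in[i].id] = &in[i];             /* later occurrence wins */

      /* ... consume slot[] in ID order ... */

      mb_free(slot);
    }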
The function lfjour_cleanup_hook() was scheduled each time any of the
journal recipients reached the end of a block of journal items or read
all of the journal items. Because lfjour_cleanup_hook() can clean up only
those journal items which every recipient has already processed, it was
often called uselessly.
This commit removes most of the useless scheduling. Only some
recipients are given a token allowing them to try to schedule the
cleanup hook. When a recipient wants to schedule the cleanup hook, it
checks whether it has a token. If it does, it decrements the number of
tokens the journal has issued (issued_tokens) and discards its own token.
If issued_tokens reaches zero, the recipient is allowed to schedule the
cleanup hook.
There is a maximum number of tokens a journal can give to its recipients
(max_tokens). A new recipient is given a token in its init, unless the
maximum number of tokens has already been reached. The remaining tokens
are handed out to recipients in lfjour_cleanup_hook().
In the cleanup hook, the issued_tokens number is increased in order to
avoid calling the hook again before it finishes. Then, tokens are given
to the slowest recipients (but never to more than max_tokens recipients).
Before leaving lfjour_cleanup_hook(), the issued_tokens number is
decreased back.
If no other tokens are given out, we have to make sure that
lfjour_cleanup_hook() will be called again. If every item in the journal
has been read by every recipient, tokens are given to random recipients.
If all recipients holding tokens have already finished by now, we give
a token to the first unfinished recipient we find, or we just call the
hook again.
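A heavily simplified sketch of the token accounting described above,
using standard C atomics; MAX_TOKENS, the structs and the helpers are
illustrative only and do not match the real lfjour code:

    #include <stdatomic.h>

    #define MAX_TOKENS 6                     /* stands in for max_tokens */

    struct journal {
      _Atomic unsigned issued_tokens;        /* tokens currently handed out */
    };

    struct recipient {
      struct journal *j;
      _Bool has_token;                       /* may try to schedule the cleanup */
    };

    static void schedule_cleanup_hook(struct journal *j);  /* defined elsewhere */

    /* A recipient finished a block (or all items) and wants cleanup to run. */
    static void
    recipient_done(struct recipient *r)
    {
      if (!r->has_token)
        return;                              /* no token: do not even try */

      r->has_token = 0;
      if (atomic_fetch_sub(&r->j->issued_tokens, 1) == 1)
        schedule_cleanup_hook(r->j);         /* issued_tokens dropped to zero */
    }

    /* Called from the (serialized) cleanup hook to re-arm slow recipients. */
    static void
    grant_token(struct recipient *r)
    {
      struct journal *j = r->j;
      if (atomic_load(&j->issued_tokens) >= MAX_TOKENS)
        return;                              /* never exceed max_tokens */

      atomic_fetch_add(&j->issued_tokens, 1);
      r->has_token = 1;
    }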