After converting BFD to the new IO loop system, reconfiguration never
really worked. Sadly, we missed this case in our testing suite so it
passed under the radar for quite a while.
Thanks to Andrei Dinu <andrei.dinu@digitalit.ro> for reporting and
isolating this issue.
When an OPEN message without capability options was parsed, the remote
role field was not initialized with the proper (non-zero) default value,
so it was interpreted as if 'provider' was announced.
Thanks to Mikhail Grishin for the bugreport.
Backport some changes from branch oz-parametric-hashes. Replace naive
hash function for IPv6 addresses, fix hashing of VPNx (where upper half
of RD was ignored), fix hashing of MPLS labels (where identity was used).
Basic fib_get() / fib_find() test for random prefixes, FIB_WALK() test,
and benchmark for fib_find(). Also generalize and reuse some code from
trie tests.
A forgotten else-clause caused BIRD to treat some pseudo-random place in
memory as fd-pair. This was happening only on startup of the first
thread in group and the value there in memory was typically zero ... and
writing to stdin succeeded.
When running BIRD with stdin not present (like systemd does), it died on
this spurious write. Now it seems to work correctly.
Thanks to Daniel Suchy <danny@danysek.cz> for reporting.
http://trubka.network.cz/pipermail/bird-users/2023-May/016929.html
There was a bug occuring when one thread sought for a src by its global id
and another one was allocating another src with such an ID that it caused
route src global index reallocation. This brief moment of inconsistency
led to a rare use-after-free of the old global index block.
The original algorithm was suffering from an ABA race condition:
A: fp = page_stack
B: completely allocates the same page and writes into it some data
A: unsuspecting, loads (invalid) next = fp->next
B: finishes working with the page and returns it back to page_stack
A: compare-exchange page_stack: fp => next succeeds and writes garbage
to page_stack
Fixed this by using an implicit spinlock in hot page allocator.
If a thread encounters timeout == 0 for poll, it considers itself
"busy" and with some hysteresis it tries to drop loops for others to
pick and thus better distribute work between threads.
The BMP protocol needs OPEN messages of established BGP sessions to
construct appropriate Peer Up messages. Instead of saving them internally
we use OPEN messages stored in BGP instances. This allows BMP instances
to be restarted or enabled later.
Because of this change, we can simplify BMP data structures. No need to
keep track of BGP sessions when we are not started. We have to iterate
over all (established) BGP sessions when the BMP session is established.
This is just a scaffolding now, but some kind of iteration would be
necessary anyway.
Also, the commit cleans up handling of msg/msg_length arguments to be
body/body_length consistently in both rx/tx and peer_up/peer_down calls.
For whatever reason, parser allocated a symbol for every parsed keyword
in each scope. That wasted time and memory. The effect is worsened with
recent changes allowing local scopes, so keywords often promote soft
scopes (with no symbols) to real scopes.
Do not allocate a symbol for a keyword. Take care of keywords that could
be promoted to symbols (kw_sym) and do it explicitly.
The symbol table used just symbol name as a key, and used a trick with
active flag to find symbols in active scopes with one hash table lookup.
The disadvantage is that it can degenerate to O(n) for negative queries
in situations where are many symbols with the same name in different
scopes.
Thanks to Yanko Kaneti for the bugreport.
Memory allocation is a fragile part of BIRD and we need checking that
everybody is using the resource pools in an appropriate way. To assure
this, all the resource pools are associated with locking domains and
every resource manipulation is thoroughly checked whether the
appropriate locking domain is locked.
With transitive resource manipulation like resource dumping or mass free
operations, domains are locked and unlocked on the go, thus we require
pool domains to have higher order than their parent to allow for this
transitive operations.
Adding pool locking revealed some cases of insecure memory manipulation
and this commit fixes that as well.