The problem happened like this:
1. Single route for the given net in table
2. A feed is started
3. The route is deleted (from another thread)
4. The feed finds an empty net, exports nothing, ignores journal (here is bug)
5. The route is added
6. The export transitions from FEEDING to READY
7. While processing the journal, the route deletion and addition combines into noop.
This way routes mysteriously disappeared in specific cases of link instability.
Problem fixed by explicitly marking the empty-net journal entries as processed in step 4.
This is a backport cherry-pick of commits
165156beebcce974e8ea
from the v3.0 branch as we need symbol hashes directly inside their
scopes for more general usage than before.
Due to a race condition between rta_apply_hostentry() and rt_update_hostentry(),
happening when a new route is inserted to a table, this commit makes it mandatory
to lock the next hop resolution table while resolving the next hop.
This may be slow, we'll fix it better in some future release
Most syntactic constructs in BIRD configuration (e.g. protocol options)
are defined as keywords, which are distinct from symbols (user-defined
names for protocols, variables, ...). That may cause backwards
compatibility issue when a new feature is added, as it may collide with
existing user names.
We can allow keywords to be shadowed by symbols in almost all cases to
avoid this issue.
This replaces the previous mechanism, where shadowable symbols have to be
explictly added to kw_syms.
Nonterminal bytestring allows to provide expressions to be evaluated in
places where BYTETEXT is used now: passwords, radv custom option.
Based on the patch from Alexander Zubkov <green@qrator.net>, thanks!
- Rename BYTESTRING lexem to BYTETEXT, not to collide with 'bytestring' type name
- Add bytestring type with id T_BYTESTRING (0x2c)
- Add from_hex() filter function to create bytestring from hex string
- Add filter test cases for bytestring type
Minor changes by committer.
Despite not having defined 'master interface', VRF interfaces should be
treated as being inside respective VRFs. They behave as a loopback for
respective VRFs. Treating the VRF interface as inside the VRF allows
e.g. OSPF to pick up IP addresses defined on the VRF interface.
For this, we also need to tell apart VRF interfaces and regular interfaces.
Extend Netlink code to parse interface type and mark VRF interfaces with
IF_VRF flag.
Based on the patch from Erin Shepherd, thanks!
Now we use rt_notify() and channels for both feed and notifications,
in both import tables (pre-policy) and regular tables (post-policy).
Remove direct walk in bmp_route_monitor_snapshot().
- Manage BMP state through bmp_peer, bmp_stream, bmp_table structures
- Use channels and rt_notify() hook for route announcements
- Add support for post-policy monitoring
- Send End-of-RIB even when there is no routes
- Remove rte_update_in_notify() hook from import tables
- Update import tables to support channels
- Add bmp_hack (no feed / no flush) flag to channels
Basic fib_get() / fib_find() test for random prefixes, FIB_WALK() test,
and benchmark for fib_find(). Also generalize and reuse some code from
trie tests.
There was a bug occuring when one thread sought for a src by its global id
and another one was allocating another src with such an ID that it caused
route src global index reallocation. This brief moment of inconsistency
led to a rare use-after-free of the old global index block.
For whatever reason, parser allocated a symbol for every parsed keyword
in each scope. That wasted time and memory. The effect is worsened with
recent changes allowing local scopes, so keywords often promote soft
scopes (with no symbols) to real scopes.
Do not allocate a symbol for a keyword. Take care of keywords that could
be promoted to symbols (kw_sym) and do it explicitly.
Memory allocation is a fragile part of BIRD and we need checking that
everybody is using the resource pools in an appropriate way. To assure
this, all the resource pools are associated with locking domains and
every resource manipulation is thoroughly checked whether the
appropriate locking domain is locked.
With transitive resource manipulation like resource dumping or mass free
operations, domains are locked and unlocked on the go, thus we require
pool domains to have higher order than their parent to allow for this
transitive operations.
Adding pool locking revealed some cases of insecure memory manipulation
and this commit fixes that as well.
Initial implementation of a basic subset of the BMP (BGP Monitoring
Protocol, RFC 7854) from Akamai team. Submitted for further review
and improvement.
If there are lots of loops in a single thread and only some of the loops
are actually active, the other loops are now kept aside and not checked
until they actually get some timers, events or active sockets.
This should help with extreme loads like 100k tables and protocols.
Also ping and loop pickup mechanism was allowing subtle race
conditions. Now properly handling collisions between loop ping and pickup.
Repeated pipe refeed should not end route refresh as the prune routine
may start pruning otherwise valid routes.
The same applies for BGP repeated route refresh.
When changing default table behavior, I missed that it enabled to
configure multiple master4 and master6 tables. Now BIRD recognizes it
and fails properly.
The import table feed wasn't resetting the table-specific route values
like REF_FILTERED and thus made the route look like filtered even though
it should have been re-evaluated as accepted.
The feature of showing all prefixes inside the given one has been added
in v2.0.9 but not well documented. Fixing it by this update.
Text in doc and commit message added by commiter.
Instead of propagating interface updates as they are loaded from kernel,
they are enqueued and all the notifications are called from a
protocol-specific event. This change allows to break the locking loop
between protocols and interfaces.
Anyway, this change is based on v2 branch to keep the changes between v2
and v3 smaller.
The interface list must be flushed when device protocol is stopped. This
was done in a hardcoded specific hook inside generic protocol routines.
The cleanup hook was originally used for table reference counting late
cleanup, yet it can be also simply used for prettier interface list flush.
There ware missing dependencies for proto-build.c generation, which
sometimes lead to failed builds, and ignores changes in the set of
built protocols. Fix that, and also improve formatting of proto-build.c
Instead of calling custom hooks from object locks, we use standard event
sending mechanism to inform protocols about object lock changes. This is
a backport from version 3 where these events are passed across threads.
This implementation of object locks doesn't use mutexes to lock the
whole data structure. In version 3, this data structure may get accessed
from multiple threads and must be protected by mutex.
Instead of calling custom hooks from object locks, we use standard event
sending mechanism to inform protocols about object lock changes. As
event sending is lockless, the unlocking protocol simply enqueues the
appropriate event to the given loop when the locking is done.
If no channel is flushing, table prune doesn't walk over routes in nets
and also doesn't walk over importing channel lists. This helps to
alleviate the memory caching burdens a lot.
Some CLI actions, notably "show route", are run by queuing an event
somewhere else. If the user closes the socket, in case such an action is
being executed, the CLI must free the socket immediately from the error
hook but the pool must remain until the asynchronous event finishes and
cleans everything up.
During backporting attribute changes from 3.0-branch, some internal
attributes (RIP iface and Babel seqno) leaked to 'show route all' output.
Allow protocols to hide specific attributes with GA_HIDDEN value.
Thanks to Nigel Kukard for the bugreport.
There were some confusion about validity and usage of pflags, which
caused incorrect usage after some flags from (now removed) protocol-
specific area were moved to pflags.
We state that pflags:
- Are secondary data used by protocol-specific hooks
- Can be changed on an existing route (in contrast to copy-on-write
for primary data)
- Are irrelevant for propagation (not propagated when changed)
- Are specific to a routing table (not propagated by pipe)
The patch did these fixes:
- Do not compare pflags in rte_same(), as they may keep cached values
like BGP_REF_STALE, causing spurious propagation.
- Initialize pflags to zero in rte_get_temp(), avoid initialization in
protocol code, fixing at least two forgotten initializations (krt
and one case in babel).
- Improve documentation about pflags
When there is a continuos stream of CLI commands, cli_get_command()
always returns 1 (there is a new command). Anyway, the socket receive
buffer was reset only when there was no command at all, leading to a
strange behavior: after a while, the CLI receive buffer came to its end,
then read() was called with zero size buffer, it returned 0 which was
interpreted as EOF.
The patch fixes that by resetting the buffer position after each command
and moving remaining data at the beginning of buffer.
Thanks to Maria Matejka for examining the bug and for the original bugfix.
When filtered routes (enabled by 'import keep filtered' option) are
updated, they trigger announcements by rte_announce(). For regular
channels (e.g. type RA_OPTIMAL or RA_ANY) such announcement is just
ignored, but in case of RA_ACCEPTED (BGP peer with 'secondary' option)
it just reannounces the old (and still valid) best route.
The patch ensures that such no-change is ignored even for these channels.
Add BGP channel option 'next hop prefer global' that modifies BGP
recursive next hop resolution to use global next hop IPv6 address instead
of link-local next hop IPv6 address for immediate next hop of received
routes.
It is useful to distinguish whehter channel config returned from
channel_config_get() was allocated new, or existing from template.
Caller may want to initialize new ones.
In principle, the channel list is a list of parent struct proto and can
contain general structures of type struct channel, That is useful e.g.
for adding MPLS channels to BGP.
In some specific configurations, it was possible to send BIRD into an
infinite loop of recursive next hop resolution. This was caused by route
priority inversion.
To prevent priority inversions affecting other next hops, we simply
refuse to resolve any next hop if the best route for the matching prefix
is recursive or any other route with the same preference is recursive.
Next hop resolution doesn't change route priority, therefore it is
perfectly OK to resolve BGP next hops e.g. by an OSPF route, yet if the
same (or covering) prefix is also announced by iBGP, by retraction of
the OSPF route we would get a possible priority inversion.
By this, the requesting channels do the timers in their own loops,
avoiding unnecessary synchronization when the central timer went off.
This is of course less effective for now, yet it allows to easily
implement selective reloads in future.
Instead of synchronous notifications, we use the asynchronous export
framework to notify flowspec src route updates. This allows us to
invoke flowspec revalidation without locking collisions.
Instead of synchronous notifications, we use the asynchronous export
framework to notify also hostcache updates. This allows us to do the
hostcache update and the subsequent next hop update notification without
locking collisions.
We can't free the network structures before the export has been cleaned
up, therefore it makes more sense to request prune only after export
cleanup. This change also reduces prune calls on table shutdown.
These routines detect the export congestion (as defined by configurable
thresholds) and propagate the state to readers. There are no readers for
now, they will be added in following commits.