The socket structure has the field rbsize (receive buffer size), which
controls the size of the userspace receive buffer. There is also kernel
receive buffer, which in some cases may be smaller (e.g. on FreeBSD it
is by default ~8k). The patch ensures that the kernel receive buffer is
as large as the userspace receive buffer.
When VRFs are used, BIRD correctly binds listening (and connecting)
sockets to their VRFs but also re-binds accepted sockets to the same VRF.
This is not needed as the interface bind is inherited in this case, and
indeed this redundant bind causes an -EPERM if BIRD is running as
non-root making BIRD close the connection and reject the peer.
Thanks to Christian Svensson for the original patch and Alexander Zubkov
for suggestions.
Some vendors do not fill the checksum for IPv6 UDP packets.
For interoperability with such implementations one can set
UDP_NO_CHECK6_RX socket option on Linux.
Thanks to Ville O for the suggestion.
Minor changes by committer.
The UDP logging had to be substantially rewritten due to a different
logging backend and reconfiguration mechanisms.
Conflicts:
doc/bird.sgml
sysdep/unix/config.Y
sysdep/unix/io.c
sysdep/unix/log.c
sysdep/unix/unix.h
When regular event was added from work event, we did remember that
regular event list was empty and therefore we did not use zero time
in poll(). This leads to ~3 s latency in route reload during
reconfiguration.
This variant of logging avoids calling write() for every log line,
allowing for waitless logging. This makes heavy logging less heavy
and more useful for race condition debugging.
The original logging routines were locking a common mutex. This led to
massive underperformance and unwanted serialization when heavily logging
due to lock contention. Now the logging is lockless, though still
serializing on write() syscalls to the same filedescriptor.
This change also brings in a persistent logging channel structures and
thus avoids writing into active configuration data structures during
regular run.
Add a current_time_now() function which gets an immediate monotonic
timestamp instead of using the cached value from the event loop. This is
useful for callers that need precise times, such as the Babel RTT
measurement code.
Minor changes by committer.
Now sk_open() requires an explicit IO loop to open the socket in. Also
specific functions for socket RX pause / resume are added to allow for
BGP corking.
And last but not least, socket reloop is now synchronous to resolve
weird cases of the target loop stopping before actually picking up the
relooped socket. Now the caller must ensure that both loops are locked
while relooping, and this way all sockets always have their respective
loop.
If there are lots of loops in a single thread and only some of the loops
are actually active, the other loops are now kept aside and not checked
until they actually get some timers, events or active sockets.
This should help with extreme loads like 100k tables and protocols.
Also ping and loop pickup mechanism was allowing subtle race
conditions. Now properly handling collisions between loop ping and pickup.
On large configurations, too many threads would spawn with one thread
per loop. Therefore, threads may now run multiple loops at once. The
thread count is configurable and may be changed during run. All threads
are spawned on startup.
This change helps with memory bloating. BIRD filters need large
temporary memory blocks to store their stack and also memory management
keeps its hot page storage per-thread.
Known bugs:
* Thread autobalancing is not yet implemented.
* Low latency loops are executed together with standard loops.
Log message before aborting due to watchdog timeout. We have to use
async-safe write to debug log, as it is done in signal handler.
Minor changes from committer.
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.
Minor changes done by commiter.
There is a simple universal IO loop, taking care of events, timers and
sockets. Primarily, one instance of a protocol should use exactly one IO
loop to do all its work, as is now done in BFD.
Contrary to previous versions, the loop is now launched and cleaned by
the nest/proto.c code, allowing for a protocol to just request its own
loop by setting the loop's lock order in config higher than the_bird.
It is not supported nor checked if any protocol changed the requested
lock order in reconfigure. No protocol should do it at all.
In previous versions, every thread used its own time structures,
effectively leading to different time in every thread and strange
logging messages.
The time processing code now uses global atomic variables to keep
current time available for fast concurrent reading and safe updates.
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.
Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).