The socket structure has the field rbsize (receive buffer size), which
controls the size of the userspace receive buffer. There is also kernel
receive buffer, which in some cases may be smaller (e.g. on FreeBSD it
is by default ~8k). The patch ensures that the kernel receive buffer is
as large as the userspace receive buffer.
When VRFs are used, BIRD correctly binds listening (and connecting)
sockets to their VRFs but also re-binds accepted sockets to the same VRF.
This is not needed as the interface bind is inherited in this case, and
indeed this redundant bind causes an -EPERM if BIRD is running as
non-root making BIRD close the connection and reject the peer.
Thanks to Christian Svensson for the original patch and Alexander Zubkov
for suggestions.
Some vendors do not fill the checksum for IPv6 UDP packets.
For interoperability with such implementations one can set
UDP_NO_CHECK6_RX socket option on Linux.
Thanks to Ville O for the suggestion.
Minor changes by committer.
When regular event was added from work event, we did remember that
regular event list was empty and therefore we did not use zero time
in poll(). This leads to ~3 s latency in route reload during
reconfiguration.
Add a current_time_now() function which gets an immediate monotonic
timestamp instead of using the cached value from the event loop. This is
useful for callers that need precise times, such as the Babel RTT
measurement code.
Minor changes by committer.
Log message before aborting due to watchdog timeout. We have to use
async-safe write to debug log, as it is done in signal handler.
Minor changes from committer.
Add option to socket interface for nonlocal binding, i.e. binding to an
IP address that is not present on interfaces. This behaviour is enabled
when SKF_FREEBIND socket flag is set. For Linux systems, it is
implemented by IP_FREEBIND socket flag.
Minor changes done by commiter.
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.
Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).
When dynamic BGP with remote range is configured, MD5SIG needs to use
newer socket option (TCP_MD5SIG_EXT) to specify remote addres range for
listening socket.
Thanks to Adam Kułagowski for the suggestion.
The C11 specification allows only sig_atomic_t and _Atomic variable
access. All other accesses to global variables are undefined behavior.
Using int was probably OK on x86 and x86_64; yet there were some reports
from other architectures (especially some MIPS) that in rare cases,
after issuing SIGHUP, BIRD did strange things.
Support for dynamically spawning BGP protocols for incoming connections.
Use 'neighbor range' to specify range of valid neighbor addresses, then
incoming connections from these addresses spawn new BGP instances.
Since v2 we have multiple listening BGP sockets, and each BGP protocol
has associated one of them. Use listening socket that accepted the
incoming connection as a key in the dispatch process so only BGP
protocols assocaited with that listening socket can be selected.
This is necesary for proper dispatch when VRFs are used.
FreeBSD silently changes TTL to 1 when MSG_DONTROUTE is used, even when
it is explicitly set to another value. That breaks TTL security sockets,
including BFD which always uses TTL 255. Bad FreeBSD!
no more warnings
No more warnings over me
And while it is being compiled all the log is black and white
Release BIRD now and then let it flee
(use the melody of well-known Oh Freedom!)
BSD systems cannot use SO_DONTROUTE, because it does not work properly
with multicast packets (perhaps it tries to find iface based on multicast
group address). But we can use MSG_DONTROUTE sendmsg() flag for unicast
packets. Works on FreeBSD, is ignored on OpenBSD and is broken on NetBSD
(i guess due to integrated routing table and ARP table).
Use full time precision to initialize random generator. The old
code was prone to initialize it to the same values in specific
circumstances (boot without RTC, multiple VMs starting at once).
On Linux, setting the ToS will also set the priority and the range of
accepted values is quite limited (masked by 0x1e). Therefore, 0xc0 is
translated to a priority of 0, not something we want, overriding the
"7" priority which was set previously explicitely. To avoid that, just
move setting priority later in the code.
Thanks to Vincent Bernat for the patch.
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.