From now, there are no auxiliary pointers stored in the free slab nodes.
This led to strange debugging problems if use-after-free happened in
slab-allocated structures, especially if the structure's first member is
a next pointer.
This also reduces the memory needed by 1 pointer per allocated object.
OTOH, we now rely on pages being aligned to their size's multiple, which
is quite common anyway.
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.
Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).
The babel protocol code was initialising objects returned from the slab
allocator by assigning to each of the struct members individually, but
wasn't touching the NODE member while doing so. This leads to warnings on
debug builds since commit:
baac700906 ("List expensive check.")
To fix this, introduce an sl_allocz() variant of the slab allocator which
will zero out the memory before returning it, and switch all the babel call
sites to use this version. The overhead for doing this should be negligible
for small objects, and in the case of babel, the largest object being
allocated was being zeroed anyway, so we can drop the memset in
babel_read_tlv().
The RFC 5575 does not explicitly reject flowspec rules without dst part,
it just requires dst part in validation procedure for feasibility, which
we do not implement anyway. Thus flow without dst prefix is syntactically
valid, but unfeasible (if feasibilty testing is done).
Thanks to Alex D. for the bugreport.
When dynamic BGP with remote range is configured, MD5SIG needs to use
newer socket option (TCP_MD5SIG_EXT) to specify remote addres range for
listening socket.
Thanks to Adam Kułagowski for the suggestion.
Use a hierarchical bitmap in a routing table to assign ids to routes, and
then use bitmaps (indexed by route id) in channels to keep track whether
routes were exported. This avoids unreliable and inefficient re-evaluation
of filters for old routes in order to determine whether they were exported.
Basic bitmap is obvious. Hierarchical bitmap is structure of several
bitmaps, where higher levels are conjunctions of intervals on level
below, allowing for efficient lookup of first unset bit.
During NLRI parsing of IPv6 Flowspec, dst prefix was not properly
extracted from NLRI, therefore a received flow was stored in a different
position in flowspec routing table, and was not reachable by command
'show route <flow>'.
Add proper prefix part accessors to flowspec code and use them from BGP
NLRI parsing code.
Thanks to Alex D. for the bugreport.
Instead of having large stack buffer for max amount of AFI/SAFI pairs.
The old code is not correct w.r.t. extendeded option length, as more
AFI/SAFI pairs may fit into the capability option.
Mismatched types to printf(). The old code coincidentally worked on amd64
due to its calling conventions.
Thanks to Maximilian Eschenbacher for the bugreport.
This is a major change of how the filters are interpreted. If everything
works how it should, it should not affect you unless you are hacking the
filters themselves.
Anyway, this change should make a huge improvement in the filter performance
as previous benchmarks showed that our major problem lies in the
recursion itself.
There are also some changes in nest and protocols, related mostly to
spreading const declarations throughout the whole BIRD and also to
refactored dynamic attribute definitions. The need of these came up
during the whole work and it is too difficult to split out these
not-so-related changes.
The new MRT protocol is responsible for periodic RIB table dumps in the
MRT format (RFC 6396). Also the existing code for BGP4MP MRT dumps is
refactored and splitted between BGP to MRT protocols, will be more
integrated into MRT in the future.
Example:
protocol mrt {
table "*";
filename "%N_%F_%T.mrt";
period 60;
}
It is partially based on the old MRT code from Pavel Tvrdik.
no more warnings
No more warnings over me
And while it is being compiled all the log is black and white
Release BIRD now and then let it flee
(use the melody of well-known Oh Freedom!)
BSD systems cannot use SO_DONTROUTE, because it does not work properly
with multicast packets (perhaps it tries to find iface based on multicast
group address). But we can use MSG_DONTROUTE sendmsg() flag for unicast
packets. Works on FreeBSD, is ignored on OpenBSD and is broken on NetBSD
(i guess due to integrated routing table and ARP table).
This patch adds support for source-specific IPv6 routes to BIRD core.
This is based on Dean Luga's original patch, with the review comments
addressed. SADR support is added to network address parsing in confbase.Y
and to the kernel protocol on Linux.
Currently there is no way to mix source-specific and non-source-specific
routes (i.e., SADR tables cannot be connected to non-SADR tables).
Thanks to Toke Hoiland-Jorgensen for the original patch.
Minor changes by Ondrej Santiago Zajicek.
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Basic support for SAFI 4 and 128 (MPLS labeled IP and VPN) for IPv4 and
IPv6. Should work for route reflector, but does not properly handle
originating routes with next hop self.
Based on patches from Jan Matejka.
The patch fixes several bugs introduced in previous changes, simplifies
the protocol by handing routes uniformly, introduces asynchronous route
processing to avoid issues with separate notifications for each next-hop
in ECMP routes, and makes reconfiguration faster by avoiding quadratic
complexity.
From now on, protocol static accepts VPN4 and VPN6 addressess.
With some concerns about VPN6 Route Distinguishers, I finally chose
to have the same format as for VPN4 (where it is defined by RFC 4364).