The code in tm_format_real_time() mixed up two buffers and their
sizes, which may cause crash in MRT dumping code.
Thanks to Piotr Wydrych for the bugreport.
From now, there are no auxiliary pointers stored in the free slab nodes.
This led to strange debugging problems if use-after-free happened in
slab-allocated structures, especially if the structure's first member is
a next pointer.
This also reduces the memory needed by 1 pointer per allocated object.
OTOH, we now rely on pages being aligned to their size's multiple, which
is quite common anyway.
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.
Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).
The babel protocol code was initialising objects returned from the slab
allocator by assigning to each of the struct members individually, but
wasn't touching the NODE member while doing so. This leads to warnings on
debug builds since commit:
baac700906 ("List expensive check.")
To fix this, introduce an sl_allocz() variant of the slab allocator which
will zero out the memory before returning it, and switch all the babel call
sites to use this version. The overhead for doing this should be negligible
for small objects, and in the case of babel, the largest object being
allocated was being zeroed anyway, so we can drop the memset in
babel_read_tlv().
The RFC 5575 does not explicitly reject flowspec rules without dst part,
it just requires dst part in validation procedure for feasibility, which
we do not implement anyway. Thus flow without dst prefix is syntactically
valid, but unfeasible (if feasibilty testing is done).
Thanks to Alex D. for the bugreport.
When dynamic BGP with remote range is configured, MD5SIG needs to use
newer socket option (TCP_MD5SIG_EXT) to specify remote addres range for
listening socket.
Thanks to Adam Kułagowski for the suggestion.
Use a hierarchical bitmap in a routing table to assign ids to routes, and
then use bitmaps (indexed by route id) in channels to keep track whether
routes were exported. This avoids unreliable and inefficient re-evaluation
of filters for old routes in order to determine whether they were exported.
Basic bitmap is obvious. Hierarchical bitmap is structure of several
bitmaps, where higher levels are conjunctions of intervals on level
below, allowing for efficient lookup of first unset bit.
During NLRI parsing of IPv6 Flowspec, dst prefix was not properly
extracted from NLRI, therefore a received flow was stored in a different
position in flowspec routing table, and was not reachable by command
'show route <flow>'.
Add proper prefix part accessors to flowspec code and use them from BGP
NLRI parsing code.
Thanks to Alex D. for the bugreport.
Instead of having large stack buffer for max amount of AFI/SAFI pairs.
The old code is not correct w.r.t. extendeded option length, as more
AFI/SAFI pairs may fit into the capability option.
Mismatched types to printf(). The old code coincidentally worked on amd64
due to its calling conventions.
Thanks to Maximilian Eschenbacher for the bugreport.
This is a major change of how the filters are interpreted. If everything
works how it should, it should not affect you unless you are hacking the
filters themselves.
Anyway, this change should make a huge improvement in the filter performance
as previous benchmarks showed that our major problem lies in the
recursion itself.
There are also some changes in nest and protocols, related mostly to
spreading const declarations throughout the whole BIRD and also to
refactored dynamic attribute definitions. The need of these came up
during the whole work and it is too difficult to split out these
not-so-related changes.
The new MRT protocol is responsible for periodic RIB table dumps in the
MRT format (RFC 6396). Also the existing code for BGP4MP MRT dumps is
refactored and splitted between BGP to MRT protocols, will be more
integrated into MRT in the future.
Example:
protocol mrt {
table "*";
filename "%N_%F_%T.mrt";
period 60;
}
It is partially based on the old MRT code from Pavel Tvrdik.
no more warnings
No more warnings over me
And while it is being compiled all the log is black and white
Release BIRD now and then let it flee
(use the melody of well-known Oh Freedom!)
BSD systems cannot use SO_DONTROUTE, because it does not work properly
with multicast packets (perhaps it tries to find iface based on multicast
group address). But we can use MSG_DONTROUTE sendmsg() flag for unicast
packets. Works on FreeBSD, is ignored on OpenBSD and is broken on NetBSD
(i guess due to integrated routing table and ARP table).
This patch adds support for source-specific IPv6 routes to BIRD core.
This is based on Dean Luga's original patch, with the review comments
addressed. SADR support is added to network address parsing in confbase.Y
and to the kernel protocol on Linux.
Currently there is no way to mix source-specific and non-source-specific
routes (i.e., SADR tables cannot be connected to non-SADR tables).
Thanks to Toke Hoiland-Jorgensen for the original patch.
Minor changes by Ondrej Santiago Zajicek.
The old timer interface is still kept, but implemented by new timers. The
plan is to switch from the old inteface to the new interface, then clean
it up.
Add basic VRF (virtual routing and forwarding) support. Protocols can be
associated with VRFs, such protocols will be restricted to interfaces
assigned to the VRF (as reported by Linux kernel) and will use sockets
bound to the VRF. E.g., different multihop BGP instances can use diffent
kernel routing tables to handle BGP TCP connections.
The VRF support is preliminary, currently there are several limitations:
- Recent Linux kernels (4.11) do not handle correctly sockets bound
to interaces that are part of VRF, so most protocols other than multihop
BGP do not work. This will be fixed by future kernel versions.
- Neighbor cache ignores VRFs. Breaks config with the same prefix on
local interfaces in different VRFs. Not much problem as single hop
protocols do not work anyways.
- Olock code ignores VRFs. Breaks config with multiple BGP peers with the
same IP address in different VRFs.
- Incoming BGP connections are not dispatched according to VRFs.
Breaks config with multiple BGP peers with the same IP address in
different VRFs. Perhaps we would need some kernel API to read VRF of
incoming connection? Or probably use multiple listening sockets in
int-new branch.
- We should handle master VRF interface up/down events and perhaps
disable associated protocols when VRF goes down. Or at least disable
associated interfaces.
- Also we should check if the master iface is really VRF iface and
not some other kind of master iface.
- BFD session request dispatch should be aware of VRFs.
- Perhaps kernel protocol should read default kernel table ID from VRF
iface so it is not necessary to configure it.
- Perhaps we should have per-VRF default table.
Basic support for SAFI 4 and 128 (MPLS labeled IP and VPN) for IPv4 and
IPv6. Should work for route reflector, but does not properly handle
originating routes with next hop self.
Based on patches from Jan Matejka.
The patch fixes several bugs introduced in previous changes, simplifies
the protocol by handing routes uniformly, introduces asynchronous route
processing to avoid issues with separate notifications for each next-hop
in ECMP routes, and makes reconfiguration faster by avoiding quadratic
complexity.
From now on, protocol static accepts VPN4 and VPN6 addressess.
With some concerns about VPN6 Route Distinguishers, I finally chose
to have the same format as for VPN4 (where it is defined by RFC 4364).
Anyway, Bird is now capable to insert both MPLS routes and MPLS encap
routes into kernel.
It was (among others) needed to define platform-specific AF_MPLS to 28
as this constant has been assigned in the linux kernel.
No support for BSD now, it may be added in the future.
Prefix and bucket tables are initialized when entering established state
but not explicitly freed when leaving it (that is handled by protocol
restart). With graceful restart, BGP may enter and leave established
state multiple times without hard protocol restart causing memory leak.
- Unit Testing Framework (BirdTest)
- Integration of BirdTest into the BIRD build system
- Tests for several BIRD modules
Based on squashed Pavel Tvrdik's int-test branch, updated for
current int-new branch.
BIRD passed string from configuration to openlog(), which kept it
internally. After reconfiguration the old string was freed, therefore
openlog had invalid copy.
Thanks to Chris Caputo for the original patch.
Also removed the lib-dir merging with sysdep. Updated #include's
accordingly.
Fixed make doc on recent Debian together with moving generated doc into
objdir.
Moved Makefile.in into root dir
Retired all.o and birdlib.a
Linking the final binaries directly from all the .o files.
This patch implements the IPv6 subset of the Babel routing protocol.
Based on the patch from Toke Hoiland-Jorgensen, with some heavy
modifications and bugfixes.
Thanks to Toke Hoiland-Jorgensen for the original patch.
Add code for manipulation with TCP-MD5 keys in the IPsec SA/SP database
at FreeBSD systems. Now, BGP MD5 authentication (RFC 2385) keys are
handled automatically on both Linux and FreeBSD.
Based on patches from Pavel Tvrdik.
In BIRD, RX has lower priority than TX with the exception of RX from
control socket. The patch replaces heuristic based on socket type with
explicit mark and uses it for both control socket and BGP session waiting
to be established.
This should avoid an issue when during heavy load, outgoing connection
could connect (TX event), send open, but then failed to receive OPEN /
establish in time, not sending notifications between and therefore
got hold timer expired error from the neighbor immediately after it
finally established the connection.
The old linked list implementation used some wild typecasts and required
GCC option -fno-strict-aliasing to work properly. This patch fixes that.
However, we still keep the option due to other potential problems.
(Commited by Ondrej Santiago Zajicek)
The patch adds support for channels, structures connecting protocols and
tables and handling most interactions between them. The documentation is
missing yet.
- Remove `u8 src` from net_add_roaX
- Add `u8 max_pxlen` to net_add_roaX
- Add some missing macro and functions for ROA
- Remove ASN from hash function for ROA
Thanks to Ondrej Santiago Zajicek
Explicit setting of AF_INET(6|) in IP socket creation. BFD set to listen
on v6, without setting the V6ONLY flag to catch both v4 and v6 traffic.
Squashing and minor changes by Ondrej Santiago Zajicek
New data types net_addr and variants (in lib/net.h) describing
network addresses (prefix/pxlen). Modifications of FIB structures
to handle these data types and changing everything to use these
data types instead of prefix/pxlen pairs where possible.
The commit is WiP, some protocols are not yet updated (BGP, Kernel),
and the code contains some temporary scaffolding.
Comments are welcome.
The new RIP implementation fixes plenty of old bugs and also adds support
for many new features: ECMP support, link state support, BFD support,
configurable split horizon and more. Most options are now per-interface.
New LSA checksumming code separates generic Fletcher-16 and OSPF-specific
code and avoids back and forth endianity conversions, making it much more
readable and also several times faster.
I/O:
- BSD: specify src addr on IP sockets by IP_HDRINCL
- BSD: specify src addr on UDP sockets by IP_SENDSRCADDR
- Linux: specify src addr on IP/UDP sockets by IP_PKTINFO
- IPv6: specify src addr on IP/UDP sockets by IPV6_PKTINFO
- Alternative SKF_BIND flag for binding to IP address
- Allows IP/UDP sockets without tx_hook, on these
sockets a packet is discarded when TX queue is full
- Use consistently SOL_ for socket layer values.
OSPF:
- Packet src addr is always explicitly set
- Support for secondary addresses in BSD
- Dynamic RX/TX buffers
- Fixes some minor buffer overruns
- Interface option 'tx length'
- Names for vlink pseudoifaces (vlinkX)
- Vlinks use separate socket for TX
- Vlinks do not use fixed associated iface
- Fixes TTL for direct unicast packets
- Fixes DONTROUTE for OSPF sockets
- Use ifa->ifname instead of ifa->iface->name
Interfaces for OSPF and RIP could be configured to use (and request)
TTL 255 for traffic to direct neighbors.
Thanks to Simon Dickhoven for the original patch for RIPng.
Implements support for IPv6 traffic class, sets higher priority for OSPF
and RIP outgoing packets by default and allows to configure ToS/DS/TClass
IP header field and the local priority of outgoing packets.
Allows to send and receive multiple routes for one network by one BGP
session. Also contains necessary core changes to support this (routing
tables accepting several routes for one network from one protocol).
It needs some more cleanup before merging to the master branch.
- ROA tables, which are used as a basic part for RPKI.
- Commands for examining and modifying ROA tables.
- Filter operators based on ROA tables consistent with RFC 6483.
- BSD kernel syncer is now self-conscious and can learn alien routes
- important bugfix in BSD kernel syncer (crash after protocol restart)
- many minor changes and bugfixes in kernel syncers and neighbor cache
- direct protocol does not generate host and link local routes
- min_scope check is removed, all routes have SCOPE_UNIVERSE by default
- also fixes some remaining compiler warnings
There is no reak callback scheduler and previous behavior causes
bad things during hard congestion (like BGP hold timeouts).
Smart callback scheduler is still missing, but main loop was
changed such that it first processes all tx callbacks (which
are fast enough) (but max 4* per socket) + rx callbacks for CLI,
and in the second phase it processes one rx callback per
socket up to four sockets (as rx callback can be slow when
there are too many protocols, because route redistribution
is done synchronously inside rx callback). If there is event
callback ready, second phase is skipped in 90% of iterations
(to speed up CLI during congestion).
This also fixes bug that timer->recurrent was not cleared
in tm_new() and unexpected recurrence of startup timer
in BGP confused state machine and caused crash.
Prefix sets were broken beyond any repair and have to be reimplemented.
They are reimplemented using a trie with bitmasks in nodes.
There is also change in the interpretation of minus prefix pattern,
but the old interpretation was already inconsistent with
the documentation and broken.
There is also some bugfixes in filter code related to set variables.
Filter code used 'aux' integer field of 'symbol' struct to store ptr
to next symbol and both 'aux2' and 'def' fields for value.
Changed to just 'def' for value and 'aux2' for ptr to next symbol.
Also another minor bugfix.
WALK_LIST_DELSAFE (in ev_run_list) is not safe with regard
to deletion of next node. When some events are rescheduled
during event execution, it may lead to deletion of next
node and some events are skipped. Such skipped nodes remain
in temporary list on stack and the last of them contains
'next' pointer to stack area. When this event is later
scheduled, it damages stack area trying to remove it from
the list, which leads to random crashes with funny
backtraces :-) .
you can delete the socket from anywhere in the hooks and nothing should break.
Also, the receive/transmit buffers are now regular xmalloc()'ed buffers,
not separate resources which would need shuffling around between pools.
sk_close() is gone, use rfree() instead.
with two exceptions:
o Any non-zero field width is automatically replaced by standard
IP address width. This hides dependences on IPv4/IPv6.
o %#I generates hexadecimal form of the address.
Therefore |%I| generates unpadded format, |%1I| full size flush-right,
and |%-1I| full size flush-left format.
Please try compiling your code with --enable-warnings to see them. (The
unused parameter warnings are usually bogus, the unused variable ones
are very useful, but gcc is unable to control them separately.)
but the core routines are there and seem to be working.
o lib/ipv6.[ch] written
o Lexical analyser recognizes IPv6 addresses and when in IPv6
mode, treats pure IPv4 addresses as router IDs.
o Router ID must be configured manually on IPv6 systems.
o Added SCOPE_ORGANIZATION for org-scoped IPv6 multicasts.
o Fixed few places where ipa_(hton|ntoh) was called as a function
returning converted address.
over EFence and also hopefully smaller memory overhead, but sadly it's non-free
for commercial use).
If the DMALLOC_OPTIONS environment variable is not set, switch on `reasonable'
checks by default.
Also introduced mb_allocz() for cleared mb_alloc().
of various callbacks.
Events are just another resource type objects (thus automatically freed
and unlinked when the protocol using them shuts down). Each event can
be linked in at most one event list. For most purposes, just use the
global event list handled by the following functions:
ev_schedule Schedule event to be called at the next event
scheduling point. If the event was already
scheduled, it's just re-linked to the end of the list.
ev_postpone Postpone an already scheduled event, so that it
won't get called. Postponed events can be scheduled
again by ev_schedule().
You can also create custom event lists to build your own synchronization
primitives. Just use:
ev_init_list to initialize an event list
ev_enqueue to schedule event on specified event list
ev_postpone works as well for custom lists
ev_run_list to run all events on your custom list
ev_run to run a specific event and dequeue it
#define L_DEBUG "\001" /* Debugging messages */
#define L_INFO "\002" /* Informational messages */
#define L_WARN "\003" /* Warnings */
#define L_ERR "\004" /* Errors */
#define L_AUTH "\005" /* Authorization failed etc. */
#define L_FATAL "\006" /* Fatal errors */
#define L_TRACE "\002" /* Protocol tracing */
#define L_INFO "\003" /* Informational messages */
#define L_REMOTE "\004" /* Remote protocol errors */
#define L_WARN "\004" /* Local warnings */
#define L_ERR "\005" /* Local errors */
#define L_AUTH "\006" /* Authorization failed etc. */
#define L_FATAL "\007" /* Fatal errors */
#define L_BUG "\010" /* BIRD bugs */
Introduced bug() which is like die(), but with level L_BUG. Protocols
should _never_ call die() as it should be used only during initialization
and on irrecoverable catastrophic events like out of memory.
Also introduced ASSERT() which behaves like normal assert(), but it calls
bug() when assertion fails. When !defined(DEBUGGING), it gets ignored.
- cfg_strcpy() -> cfg_strdup()
- mempool -> linpool, mp_* -> lp_* [to avoid confusion with memblock, mb_*]
Anyway, it might be better to stop ranting about names and do some *real* work.