There was an inefficiency in the initial scan state machine,
causing routes to be fed several times instead of just once.
Now the export startup is postponed until first krt_scan()
finishes and we actually can do the pruning with full information.
This is now done in worker threads and the mainloop needs to do other things,
most notably kernel and CLI, with less overhead of repeatedly checking poll.
Otherwise it looks like we are sending too much traffic to netlink
every other while, which is not true. Now we can disambiguate between
in-kernel updates and ignored routes.
In our usecase, these are impossibly greedy because we often
free memory in a different thread than where we allocate, forcing
the default allocator to scatter the used memory all over the place.
We have quite large critical sections and we need to allocate inside
them. This is something to revise properly later on, yet for now,
instead of slowly but surely growing the virtual memory address space,
it's better to optimize the cold page cache pickup and count situations
where this happened inside the critical section.
The resource dumping routines needed to be updated in v3 to use the new
API introduced in v2.
Conflicts:
filter/f-util.c
filter/filter.c
lib/birdlib.h
lib/event.c
lib/mempool.c
lib/resource.c
lib/resource.h
lib/slab.c
lib/timer.c
nest/config.Y
nest/iface.c
nest/iface.h
nest/locks.c
nest/neighbor.c
nest/proto.c
nest/route.h
nest/rt-attr.c
nest/rt-table.c
proto/bfd/bfd.c
proto/bmp/bmp.c
sysdep/unix/io.c
sysdep/unix/krt.c
sysdep/unix/main.c
sysdep/unix/unix.h
All the 'dump something' CLI commands now have a new mandatory
argument -- name of the file where to dump the data. This allows
for more flexible dumping even for production deployments where
the debug output is by default off.
Also the dump commands are now restricted (they weren't before)
to assure that only the appropriate users can run these time consuming
commands.
Actually, completely rewritten the original patch as in v3, the logging
initialization is much more complex and requires allocation.
This way, to bootstrap properly, the logger has a pre-defined log target
to stderr.
Before this commit, on kernel shutdown, the routes were re-exported by
the regular export but treated as withdraw. This was too hairy and
caused unnecessary complexity of the protocol's state machine.
Instead of that, we found out that it makes more sense to just refeed
the routes synchronously and convert to withdraw. This is done by the
direct export access instead of the channel.
It would (maybe) make more sense to run export filters on this in case
the export filter updates the krt_metric attribute, but as this doesn't
work on regular withdraw anyway, it's better for now to just let it be
and maybe somebody in the future fixes this issue.
Introduced in 08ff0af898, the additional CLI
configuration wasn't properly initialized in the parse-and-exit mode
due to an oversight that cli_init_unix() is not called in this mode.
Thanks to Felix Friedlander for the bugreport.
The socket structure has the field rbsize (receive buffer size), which
controls the size of the userspace receive buffer. There is also kernel
receive buffer, which in some cases may be smaller (e.g. on FreeBSD it
is by default ~8k). The patch ensures that the kernel receive buffer is
as large as the userspace receive buffer.
When VRFs are used, BIRD correctly binds listening (and connecting)
sockets to their VRFs but also re-binds accepted sockets to the same VRF.
This is not needed as the interface bind is inherited in this case, and
indeed this redundant bind causes an -EPERM if BIRD is running as
non-root making BIRD close the connection and reject the peer.
Thanks to Christian Svensson for the original patch and Alexander Zubkov
for suggestions.
This allows to have one main socket for the heavy operations
very restricted just for the appropriate users, whereas the
looking glass socket may be more open.
Implemented an idea originally submitted and requested by Akamai.
Some vendors do not fill the checksum for IPv6 UDP packets.
For interoperability with such implementations one can set
UDP_NO_CHECK6_RX socket option on Linux.
Thanks to Ville O for the suggestion.
Minor changes by committer.