On large configurations, too many threads would spawn with one thread
per loop. Therefore, threads may now run multiple loops at once. The
thread count is configurable and may be changed during run. All threads
are spawned on startup.
This change helps with memory bloating. BIRD filters need large
temporary memory blocks to store their stack and also memory management
keeps its hot page storage per-thread.
Known bugs:
* Thread autobalancing is not yet implemented.
* Low latency loops are executed together with standard loops.
After a suggestion by Santiago, I added the direct list pointer into
events and the events are now using this value to check whether the
route is active or not. Also the whole trick with sentinel node unioned
with event list is now gone.
For debugging, there is also an internal circular buffer to store what
has been recently happening in event code before e.g. a crash happened.
By default, this debug is off and must be manually enabled in
lib/event.c as it eats quite some time and space.
In multithreaded environment, we need to pass messages between workers.
This is done by queuing events to their respective queues. The
double-linked list is not really useful for that as it needs locking
everywhere.
This commit rewrites the event subsystem to use a single-linked list
where events are enqueued by a single atomic instruction and the queue
is processed after atomically moving the whole queue aside.
There is a simple universal IO loop, taking care of events, timers and
sockets. Primarily, one instance of a protocol should use exactly one IO
loop to do all its work, as is now done in BFD.
Contrary to previous versions, the loop is now launched and cleaned by
the nest/proto.c code, allowing for a protocol to just request its own
loop by setting the loop's lock order in config higher than the_bird.
It is not supported nor checked if any protocol changed the requested
lock order in reconfigure. No protocol should do it at all.
In general, events are code handling some some condition, which is
scheduled when such condition happened and executed independently from
I/O loop. Work-events are a subgroup of events that are scheduled
repeatedly until some (often significant) work is done (e.g. feeding
routes to protocol). All scheduled events are executed during each
I/O loop iteration.
Separate work-events from regular events to a separate queue and
rate limit their execution to a fixed number per I/O loop iteration.
That should prevent excess latency when many work-events are
scheduled at one time (e.g. simultaneous reload of many BGP sessions).
This also fixes bug that timer->recurrent was not cleared
in tm_new() and unexpected recurrence of startup timer
in BGP confused state machine and caused crash.
WALK_LIST_DELSAFE (in ev_run_list) is not safe with regard
to deletion of next node. When some events are rescheduled
during event execution, it may lead to deletion of next
node and some events are skipped. Such skipped nodes remain
in temporary list on stack and the last of them contains
'next' pointer to stack area. When this event is later
scheduled, it damages stack area trying to remove it from
the list, which leads to random crashes with funny
backtraces :-) .