We have quite large critical sections and we need to allocate inside
them. This is something to revise properly later on, yet for now,
instead of slowly but surely growing the virtual memory address space,
it's better to optimize the cold page cache pickup and count situations
where this happened inside the critical section.
Explicitly marking domains eligible for RCU synchronization. It's then
forbidden to lock these domains in RCU critical section to avoid
possible deadlock.
On large configurations, too many threads would spawn with one thread
per loop. Therefore, threads may now run multiple loops at once. The
thread count is configurable and may be changed during run. All threads
are spawned on startup.
This change helps with memory bloating. BIRD filters need large
temporary memory blocks to store their stack and also memory management
keeps its hot page storage per-thread.
Known bugs:
* Thread autobalancing is not yet implemented.
* Low latency loops are executed together with standard loops.