Current implementation handles flowspec prefix length and offset only
in bytes, but RFC 8956 (Dissemination of Flow Specification Rules for
IPv6) Section 3.1 [1] and example in Section 3.8.2 [2] states the
pattern should begin right after offset *bits*.
For example, pattern "::1:1234:5678:9800:0/60-104" is currently
serialized as "02 68 3c 01 12 34 56 78 98", but it should shift its
pattern 4 more bits to the left: "02 68 3c 11 23 45 67 89 80".
This patch implements shifting left/right for IPv6 type and use it to
correct the behaviour. Test data are replaced with the correct ones.
Minor changes and test vectors done by committer.
[1]: https://www.rfc-editor.org/rfc/rfc8956.html#section-3.1
[2]: https://www.rfc-editor.org/rfc/rfc8956.html#section-3.8.2
The period of recurent timers was stored in 32b field, despite it was
btime-compatible value in us. Therefore, it was limited to ~72 min,
which mas okay for most purposes, except configurable MRT dump periods.
Thanks to Felix Friedlander for the bugreport.
Some vendors do not fill the checksum for IPv6 UDP packets.
For interoperability with such implementations one can set
UDP_NO_CHECK6_RX socket option on Linux.
Thanks to Ville O for the suggestion.
Minor changes by committer.
Add route attribute gw_mpls_stack to make MPLS stack of route nexthop
accessible from filters. Its type is T_CLIST, which is really not correct
(as it is a list, while T_CLIST is a set). Therefore, we keep this
attribute *undocumented* and it will be *changed* without further notice.
Based on a patch from Trisha Biswas <tbiswas@fastly.com>, thanks!
We can distinguish BGP sessions if at least one side uses a different IP
address. Extend olock mechanism to handle local IP as a part of key, with
optional wildcard, so BGP sessions could local IP in the olock and not
block themselves.
The MPLS subsystem manages MPLS labels and handles their allocation to
MPLS-aware routing protocols. These labels are then attached to IP or VPN
routes representing label switched paths -- LSPs.
There was already a preliminary MPLS support consisting of MPLS label
net_addr, MPLS routing tables with static MPLS routes, remote labels in
next hops, and kernel protocol support.
This patch adds the MPLS domain as a basic structure representing local
label space with dynamic label allocator and configurable label ranges.
To represent LSPs, allocated local labels can be attached as route
attributes to IP or VPN routes with local labels as attributes.
There are several steps for handling LSP routes in routing protocols --
deciding to which forwarding equivalence class (FEC) the LSP route
belongs, allocating labels for new FECs, announcing MPLS routes for new
FECs, attaching labels to LSP routes. The FEC map structure implements
basic code for managing FECs in routing protocols, therefore existing
protocols can be made MPLS-aware by adding FEC map and delegating
most work related to local label management to it.
Add a current_time_now() function which gets an immediate monotonic
timestamp instead of using the cached value from the event loop. This is
useful for callers that need precise times, such as the Babel RTT
measurement code.
Minor changes by committer.
Backport some changes from branch oz-parametric-hashes. Replace naive
hash function for IPv6 addresses, fix hashing of VPNx (where upper half
of RD was ignored), fix hashing of MPLS labels (where identity was used).
The symbol table used just symbol name as a key, and used a trick with
active flag to find symbols in active scopes with one hash table lookup.
The disadvantage is that it can degenerate to O(n) for negative queries
in situations where are many symbols with the same name in different
scopes.
Thanks to Yanko Kaneti for the bugreport.
When lp_save() is called on an empty linpool, then some allocation is
done, then lp_restore() is called, the linpool is restored but the used
chunks are inaccessible. Fix it.
and "%M" formats expect "Input/output error" message but musl returns
"I/O error". Proposed change compares the printf output with string
returned from strerror function for EIO constant.
See-also: https://bugs.gentoo.org/836713
Minor change from committer.
When a linpool is used to allocate a one-off big load of memory, it
makes no sense to keep that amount of memory for future use inside the
linpool. Contrary to previous implementations where the memory was
directly free()d, we now use the page allocator which has an internal
cache which keeps the released pages for us and subsequent allocations
simply get these released pages back.
And even if the page cleanup routine kicks in inbetween, the pages get
only madvise()d, not munmap()ed so performance aspects are negligible.
This may fix some memory usage peaks in extreme cases.
Log message before aborting due to watchdog timeout. We have to use
async-safe write to debug log, as it is done in signal handler.
Minor changes from committer.
There were several requests to allow use of 240.0.0.0/4 as a private
range, and Linux kernel already allows such routes, so perhaps we can
allow that too.
Thanks to Vincent Bernat and others for suggestion and patches.
Alignment of slabs should be at least sizeof(ptr) to avoid unaligned
pointers in slab structures. Fixme: Use proper way to choose alignment
for internal allocators.
Attach a prefix trie to IP/VPN/ROA tables. Use it for net_route() and
net_roa_check(). This leads to 3-5x speedups for IPv4 and 5-10x
speedup for IPv6 of these calls.
TODO:
- Rebuild the trie during rt_prune_table()
- Better way to avoid trie_add_prefix() in net_get() for existing tables
- Make it configurable (?)