The current code pads with an extra character that is then rewritten
into a null character. This isn't necessary with post-C99
implementations of standardized snprintf, so get rid of it.
Also add a note warning that nk_generate_env() and nk_execute()
are not async signal safe and are thus unsuitable for use in
multithreaded processes.
nk_execute() could be rewritten to be async signal safe without much
trouble, as the only problem point is snprintf() which is not guaranteed
to be async signal safe by POSIX.
However, nk_generate_env() performs chroot() if a chroot_path is
specified, and chroot() is not async signal safe in POSIX.
Additionally, malloc() can be called in rare cases where user
information fields are very long, and malloc() is obviously not async
signal safe. Finally, snprintf() is used here, too, but it could be
replaced.
Converting to posix_spawn() is a no-go because posix_spawn() has no
facility for changing rlimits or chroot on the spawned process.
In summary, I don't think the gains are worth it. Multithreaded
processes should just not fork().
If no 'script-file = SCRIPTFILE' is specified in the configuration
file and if no '-X SCRIPTFILE' or '--script-file SCRIPTFILE'
command argument is provided, then this functionality is entirely
inactive and no associated subprocess is spawned.
Otherwise, ndhc will spawn a subprocess that runs as root that has the
sole job of forking off a subprocess that exec's the specified script in
a sanitized and fixed-state environment whenever a new DHCPv4 lease is
acquired.
Note that this script is provided no information about ndhc or the
DHCP state in the environment or in any argument fields; it is the
responsibility of this script to gather whatever information it needs
from either the filesystem or syscalls. This design is intended to
avoid the historical problems that are associated with dhcp clients
invoking scripts.
The path of the scriptfile cannot be changed after ndhc is initially
run; ndhc forks off the privsep script subprocess that executes scripts
after it has read the configuration file and command arguments, but
before it begins processing network data; thus, it is impossible for the
network-handling process to modify or influence the script assuming
proper OS memory protection.
The privsep channel communicates that the script should be run by simply
writing a newline; anything else will result in ndhc terminating itself.
Before the recommended way to update system state after a change in
lease information was to run the fcactus program and watch the
associated leasefile for the interface for modification; now no external
program is needed for this job.
This can be enabled via the s6-notify configure option; see
http://www.skarnet.org/software/s6/notifywhenup.html for details.
ndhc will signal that it is ready when the first valid lease is
obtained. Programs dependent on a working network interface
can then simply use s6-wait on the associated ndhc service dir.
A typical command line option assuming the s6 service directory
notification-fd contains '3' would be '--s6-notify 3', and
a typical configure file option would be 's6-notify 3'.
This notation originally had the sole advantage of implying
that the pointer was non-NULL, but it has seen little use and
now clang presents warnings about it in certain contexts.
This is done by performing one synchronous query for carrier state at
the start of the program; after that, we just monitor the nlsocket for
carrier state changes and update the cached state accordingly.
The benefit is that ifchd needs to do a lot less work and this should
reduce the CPU cycle consumption; prior to this commit, the CPU time
ends up being a few CPU-minutes per month.
Before the handling would constantly acculmulate a prefix of previous
incomplete commands. Now it still has a latent defect where the entire
buffer will be discarded given a spurious command, but ndhc shouldn't
generate such commands so it shouldn't matter.
I only built with this flag to mitigate accidental UB. Now that
UBSan exists, there's no point as UBSan does better and actually
allows offending code to be located and fixed easily.
The existing code tracked the fingerprinting completion state with
a bool, which is insufficient; what is required is a tristate
(none|inprogress|done) where the inprogress state reverts to
none on carrier down, but done state stays done on carrier down.
We could use 32-bit fetches with the same technique on 64-bit
architectures, or SIMD could be used to do very fast 128-bit
fetches, but this isn't a performance bottleneck and this
method is very simple and relatively fast.
That trivial warning fix causes warnings with -Wcast-qual, which
is a more valid warning than whatever was causing the warnings
under -Wall -pedantic with the normal Makefile.
There's really no advantage to using signalfd in ndhc, particularly
since the normal POSIX signal API is now used for handling SIGCHLD in
ndhc-master. So just use the tried and true volatile sig_atomic_t set
and check approach.
The only intended behavior change is in the dhcp RELEASE state --
before there would be a spurious attempt at renewing a nonexistent
lease when the RENEW signal was received.
Suppose a system call such as bind() fails in the sockd subprocess in
request_sockd_fd(). sockd will suicide(). This will send a SIGCHLD to
the master process, which the master process should respond to by
calling suicide(), forcing a process supervisor to respawn the entire
ndhc program.
But, this doesn't reliably happen prior to this commit because of the
interaction between request_sock_fd() and signalfd() [or equivalently
self-pipe-trick] signal handling.
request_sock_fd() makes ndhc-master synchronously wait for a response
from sockd via safe_recvmsg(). The normal goto-like signal handling
path is suppressed when using signalfd() , so when SIGCHLD is received,
it will not be handled until io is dispatched for the signalfd or pipe.
But such code will never be reached because ndhc-master is waiting in
safe_recvmsg() and thus never polls signal fd status.
So, revert to using traditional POSIX sigaction() for SIGCHLD, which
provides exactly the required behavior for proper functioning.
Apparently I had forgotten the counter-intuitive semantics of
signalfd(): it's necessary to BLOCK the signals that will be
handled exclusively by signalfd() so that the default POSIX
signal handling mechanism won't intercept the signals first.
The lack of response to ctrl+c is a legitimate bug that is
now properly fixed; ba046c02c729c fixed that issue, but
regressed the handling of other signals.
If we get no response to three renews (unicast), switch to sending
rebinds (broadcast). Servers are supposed to always reply with
a DHCPACK or DHCPNAK even if the server doesn't update its internal
lease duration database, so this behavior should be RFC compliant.