60 Commits

Author SHA1 Message Date
Nicholas J. Kain
4575f74164 Remove legacy support for exiting after obtaining a DHCP lease. 2020-10-20 06:55:04 -04:00
Nicholas J. Kain
ade4e988af Remove legacy support for forking to background. 2020-10-20 06:55:04 -04:00
Nicholas J. Kain
58067200d6 Remove legacy support for writing a pidfile. 2020-10-20 06:55:04 -04:00
Nicholas J. Kain
4d33c00e04 Use poll() instead of epoll() for ndhc-master. 2020-10-20 05:58:29 -04:00
Nicholas J. Kain
06a541261e Stop using signalfd and audit signal handling code.
There's really no advantage to using signalfd in ndhc, particularly
since the normal POSIX signal API is now used for handling SIGCHLD in
ndhc-master.  So just use the tried-and-true volatile sig_atomic_t
set-and-check approach.

The only intended behavior change is in the DHCP RELEASE state:
previously, receiving the RENEW signal would cause a spurious attempt
to renew a nonexistent lease.
2020-10-20 04:42:58 -04:00
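For readers unfamiliar with it, the set-and-check pattern looks roughly like the sketch below. It is a minimal illustration, not ndhc's code: the flag and function names are hypothetical, and mapping SIGUSR1 to renew and SIGUSR2 to release is an assumption. The handler only records that a signal arrived; the main loop checks and clears the flag on its next iteration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical flags: written by the handler, checked and cleared by the
 * main loop. volatile sig_atomic_t is the classic type a signal handler
 * may safely assign to. */
static volatile sig_atomic_t renew_pending;
static volatile sig_atomic_t release_pending;

static void signal_handler(int signo)
{
    /* Do nothing here except record which signal arrived. */
    if (signo == SIGUSR1)
        renew_pending = 1;   /* assumption: SIGUSR1 means "renew" */
    else if (signo == SIGUSR2)
        release_pending = 1; /* assumption: SIGUSR2 means "release" */
}

int install_handlers(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = signal_handler;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, 0) < 0 || sigaction(SIGUSR2, &sa, 0) < 0)
        return -1;
    return 0;
}

/* Returns 1 and clears the flag if it was set, else 0. */
static int consume(volatile sig_atomic_t *flag)
{
    if (!*flag)
        return 0;
    *flag = 0;
    return 1;
}

/* Polled once per main-loop iteration. */
int consume_renew(void) { return consume(&renew_pending); }
int consume_release(void) { return consume(&release_pending); }
```

Because the flag is only a boolean, several signals arriving between two checks collapse into one request, which is usually what a DHCP client wants.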
Nicholas J. Kain
8d89ca9f19 Reliably force restart when a subprocess has a fatal error.
Suppose a system call such as bind() fails in the sockd subprocess in
request_sockd_fd().  sockd will call suicide(), which sends a SIGCHLD
to the master process; the master process should respond by calling
suicide() itself, forcing a process supervisor to respawn the entire
ndhc program.

But prior to this commit, this does not reliably happen because of the
interaction between request_sockd_fd() and signalfd() (or, equivalently,
self-pipe trick) signal handling.

request_sockd_fd() makes ndhc-master synchronously wait for a response
from sockd via safe_recvmsg().  The normal goto-like signal handling
path is suppressed when using signalfd(), so when SIGCHLD is received,
it will not be handled until I/O is dispatched for the signalfd or pipe.
But that code is never reached, because ndhc-master is waiting in
safe_recvmsg() and thus never polls the signal fd status.

So, revert to using traditional POSIX sigaction() for SIGCHLD, which
provides exactly the required behavior for proper functioning.
2020-10-20 04:41:51 -04:00
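The difference can be demonstrated with a small, hypothetical test program: a SIGCHLD handler installed with sigaction() and without SA_RESTART runs even while the process is blocked in recv(), and the call then fails with EINTR. The function name is invented for illustration, and the sketch only records the signal in a flag, whereas the commit describes ndhc-master calling suicide(). The parent deliberately keeps both socket ends open so that only the signal can end the wait, and the child sleeps briefly so that the parent is already blocked when the child exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

static volatile sig_atomic_t child_exited;

static void sigchld_handler(int signo)
{
    (void)signo;
    /* ndhc-master would force a restart here; this sketch only records it. */
    child_exited = 1;
}

/* Returns 1 if a blocking recv() was interrupted by SIGCHLD, 0 if not,
 * -1 on setup failure. */
int sigchld_interrupts_recv(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_NOCLDSTOP; /* deliberately no SA_RESTART */
    if (sigaction(SIGCHLD, &sa, 0) < 0)
        return -1;

    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Stand-in for a subprocess that dies without ever replying. */
        struct timespec ts = { 0, 200000000L }; /* 200 ms */
        nanosleep(&ts, 0);
        _exit(1);
    }

    /* Both ends stay open in the parent, so no EOF can end this wait;
     * only the signal can. */
    char byte;
    ssize_t r = recv(sv[0], &byte, 1, 0);
    int err = errno;

    waitpid(pid, 0, 0);
    close(sv[0]);
    close(sv[1]);
    return r < 0 && err == EINTR && child_exited;
}
```

Had SIGCHLD instead been blocked and routed through a signalfd, the recv() would simply keep waiting, which is the hang the commit message describes.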
Nicholas J. Kain
f0340b1475 Correct ba046c02c7.
Apparently I had forgotten the counter-intuitive semantics of
signalfd(): it's necessary to BLOCK the signals that will be
handled exclusively by signalfd() so that the default POSIX
signal handling mechanism won't intercept the signals first.

The lack of response to ctrl+c was a legitimate bug, and it is now
properly fixed: ba046c02c7 fixed that issue but regressed the
handling of other signals.
2020-10-19 13:03:35 -04:00
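A minimal Linux-only sketch of the ordering described above: block the signal with sigprocmask() first, then create the signalfd for the same mask. The helper name is hypothetical. Without the SIG_BLOCK step, raise(SIGUSR1) would be handled by the default disposition, which terminates the process, before anything could be read from the descriptor.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: raises 'signo' and returns the signal number read
 * back through a signalfd, or -1 on error. */
int signalfd_roundtrip(int signo)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, signo);

    /* Step 1: block the signal so normal delivery cannot intercept it. */
    if (sigprocmask(SIG_BLOCK, &mask, 0) < 0)
        return -1;

    /* Step 2: create a descriptor that reports pending signals in 'mask'. */
    int sfd = signalfd(-1, &mask, SFD_CLOEXEC);
    if (sfd < 0)
        return -1;

    raise(signo); /* blocked, so it stays pending instead of being delivered */

    struct signalfd_siginfo si;
    ssize_t r = read(sfd, &si, sizeof si); /* consumes the pending signal */
    close(sfd);
    if (r != (ssize_t)sizeof si)
        return -1;
    return (int)si.ssi_signo;
}
```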
Nicholas J. Kain
4df035ced3 Make sure xids in packets sent conform to RFC2131 pg36 table5. 2020-10-19 05:48:52 -04:00
Nicholas J. Kain
ba046c02c7 Make custom SIGUSR signals work again.
These were broken long ago when converting from signal()
to sigprocmask().  This change also makes ctrl+c work again.
2020-09-02 21:04:59 -04:00
Nicholas J. Kain
56b6ae2cd3 Quit using NULL macro. 2018-10-26 07:17:39 -04:00
Nicholas J. Kain
05a075aeb2 Replace '(c)' with 'Copyright'.
'(c)' may not be a valid substitute for 'Copyright' in some legal
domains/interpretations.  So be safe, since I am obviously asserting
copyright on my work.
2018-10-26 07:11:16 -04:00
Nicholas J. Kain
8983df3c86 Update copyright dates. 2018-02-18 08:25:10 -05:00
Nicholas J. Kain
e08d3b15b5 Remove seccomp support.
It breaks with the existing whitelists on the latest glibc and is
just too much maintenance burden.  It also causes the most questions
for new users.

Something like OpenBSD's pledge() would be fine, but I have no
intention of maintaining such a thing.

Most of the value-gain would come from disallowing high-risk
syscalls like ptrace() and the perf syscalls, anyway.

ndhc already uses extensive defense-in-depth and wasn't using
seccomp on non-(x86|x86-64) platforms, so it's not a huge loss.
2018-02-09 03:33:04 -05:00
Nicholas J. Kain
e8d97205e9 Compile cleanly with -Wsign-conversion.
I didn't notice anything that worried me.
2018-02-09 03:16:59 -05:00
Nicholas J. Kain
759b6bd831 Update to the new ncmlib random API. 2017-08-24 02:36:31 -04:00
Nicholas J. Kain
7af3e64a99 arp: Remove reply_offset, and keep previous ARP packet after epoll.
ARP packets aren't split across multiple receive events, so
reply_offset is pointless, and we implicitly assume that the
previous ARP packet data is still available after a forced sleep.
2017-04-10 08:56:11 -04:00
Nicholas J. Kain
4fdde404aa Remove unused client_config_t foreground variable. 2017-01-19 05:18:04 -05:00
Nicholas J. Kain
c38fd2be9b Convert logical booleans in client_config_t to bool type. 2017-01-19 05:13:30 -05:00
Nicholas J. Kain
571b22c4b2 Rename client_state_t init variable to program_init.
Easier to grep.  No functional change.
2017-01-19 05:05:35 -05:00
Nicholas J. Kain
931530786b Convert logically boolean client_state_t variables from uint8_t to bool. 2017-01-19 05:01:23 -05:00
Nicholas J. Kain
b8ee0bd5c2 Update copyright dates to 2017. 2017-01-13 20:15:27 -05:00
Nicholas J. Kain
29498f5341 Remove ifsPrevState and set non-infinite timeout on a send error.
We instead check carrier status as needed.  This approach is more
robust.  For a simple example, imagine link state changes that happen
while the machine is suspended.
2017-01-13 20:15:27 -05:00
Nicholas J. Kain
04ec7c8f4b Update to latest write_pid semantics and don't write pidfile by default.
There was no way to disable writing pidfiles before.

pidfiles are an unreliable method of tracking processes, anyway; process
supervisors are strongly recommended.  If a pidfile is really needed, it
can be explicitly specified.
2016-05-06 15:00:31 -04:00
Nicholas J. Kain
5ab36719f1 If a fd closes unexpectedly in epoll, print error and exit with failure.
Previously, the exit code would indicate success and no error message
would be printed, and determining what would actually happen required a
bit of control flow tracing.

No direct functional change (unless the supervising process cares about
the return code of ndhc on exit).
2015-12-25 02:44:54 -05:00
Nicholas J. Kain
277f0f67c5 When converting timeout from ll to int, also guard against underflow. 2015-05-27 15:00:02 -04:00
Nicholas J. Kain
abb1b54bfe Fix an overflow that can cause spuriously short epoll timeouts.
Lease times and arp timeouts are all calculated using long long,
but epoll takes its timeout argument as an int.  Guard against
timeouts > INT_MAX but < UINT_MAX wrapping and causing spuriously
short timeouts when converted to a signed int.

This problem has been observed in the wild.  Thanks to thypon
for a detailed strace that pointed me towards this issue.
2015-05-27 12:58:42 -04:00
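The guard from this commit, together with the underflow guard in 277f0f67c5, amounts to clamping before the narrowing conversion. As a worked number, a 30-day lease is 30 × 86,400 × 1,000 = 2,592,000,000 ms, which exceeds INT_MAX (2,147,483,647) with a 32-bit int, so an unchecked conversion could wrap to an unintended value. The sketch below clamps into [0, INT_MAX]; the function name is hypothetical, and mapping negative inputs to 0 is an assumption about the desired underflow behavior (for epoll_wait() and poll(), any negative timeout means "wait forever").

```c
#include <limits.h>

/* Hypothetical helper: convert a millisecond timeout computed as long long
 * into the int that epoll_wait()/poll() accept, without wrapping. */
int clamp_timeout_ms(long long ms)
{
    if (ms > INT_MAX)
        return INT_MAX; /* overflow guard: cap instead of wrapping */
    if (ms < 0)
        return 0;       /* underflow guard (assumed policy): expire now */
    return (int)ms;
}
```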
Nicholas J. Kain
ba875d4b2e Failsafe should only trigger if the new timeout is also zero.
This is what I get for rushing!
2015-05-27 12:35:16 -04:00
Nicholas J. Kain
f061a78a18 Fix dumb mistake in patch before last; epoll timeout is in ms, not s. 2015-05-27 12:29:46 -04:00
Nicholas J. Kain
8273b383ab Add a failsafe to prevent epoll busy-spin. 2015-05-27 12:23:16 -04:00
Nicholas J. Kain
b3bd13d45f Fix the return values of dhcp_packet_get and arp_packet_get.
This corrects a bug, introduced in the coroutine conversion, where
stale DHCP packets would get reprocessed, causing very bad behavior.
2015-02-18 11:02:13 -05:00
Nicholas J. Kain
3ede5fbe33 Handle the release and renew signals again. 2015-02-18 07:31:19 -05:00
Nicholas J. Kain
69cf41f1b1 Only process one epoll event at a time.
If ndhc were a high-performance program that handled lots of events,
this change would harm performance.  But it is not, and it implicitly
assumes that events arrive one at a time.  Processing batches would
make it harder to ensure correctness while still never allocating
memory at runtime.

The previous structure was fine when everything was handled
immediately by callbacks, but that is no longer the case.
2015-02-18 05:36:13 -05:00
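With epoll, handling a single event per wakeup can be as simple as passing maxevents = 1 to epoll_wait(), so the only event buffer is one struct on the stack. The sketch below is a hypothetical self-test, not ndhc code: two pipes are made readable at once, and because epoll is level-triggered by default, the second pipe is simply reported by the next call once the first event has been handled.

```c
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical self-test: returns the number of events handled, strictly
 * one per epoll_wait() call, or -1 on error. */
int handle_events_one_at_a_time(void)
{
    int a[2] = { -1, -1 }, b[2] = { -1, -1 };
    int epfd = -1, handled = -1;

    if (pipe(a) < 0 || pipe(b) < 0)
        goto out;
    epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd < 0)
        goto out;
    for (int i = 0; i < 2; ++i) {
        int fd = i == 0 ? a[0] : b[0];
        struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
            goto out;
    }
    /* Make both read ends ready before the first wait. */
    if (write(a[1], "x", 1) != 1 || write(b[1], "y", 1) != 1)
        goto out;

    handled = 0;
    for (;;) {
        struct epoll_event ev;                /* room for exactly one event */
        int n = epoll_wait(epfd, &ev, 1, 0);  /* maxevents == 1, no blocking */
        if (n <= 0)
            break;                            /* nothing ready (or an error) */
        char c;
        if (read(ev.data.fd, &c, 1) != 1) {
            handled = -1;
            break;
        }
        ++handled; /* fully dispatch this event before waiting again */
    }
out:
    if (epfd >= 0)
        close(epfd);
    for (int i = 0; i < 2; ++i) {
        if (a[i] >= 0)
            close(a[i]);
        if (b[i] >= 0)
            close(b[i]);
    }
    return handled;
}
```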
Nicholas J. Kain
99ce918a31 Use a coroutine instead of several callback state machines.
This change makes it much easier to reason about ndhc's behavior
and properly handle errors.

It is a very large changeset, but there is no way to make this
sort of change incrementally.  Lease acquisition has been tested
and works.

It is highly likely that some bugs were both introduced and
squashed here.  Some obvious code cleanups will quickly follow.
2015-02-18 05:31:13 -05:00
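The commit message does not show how ndhc's coroutine is built. As a general illustration only, one well-known way to write a coroutine in plain C is a switch statement that jumps back to the line of the last yield (Simon Tatham's technique). The macro names, the struct, and the lease steps below are all hypothetical.

```c
/* Hypothetical switch-based coroutine macros. Each yield records the
 * current line number; the next call's switch jumps back to that case
 * label. Each COR_YIELD must sit on its own source line. */
struct coro {
    int resume_at; /* 0 = start from the top */
};

#define COR_BEGIN(c)    switch ((c)->resume_at) { case 0:
#define COR_YIELD(c, v) do { (c)->resume_at = __LINE__; return (v); \
                             case __LINE__:; } while (0)
#define COR_END(c)      } (c)->resume_at = 0

enum lease_step { STEP_DISCOVER, STEP_REQUEST, STEP_BOUND, STEP_DONE };

/* Reads top to bottom like blocking code, yet returns after every step. */
enum lease_step next_step(struct coro *c)
{
    COR_BEGIN(c);
    COR_YIELD(c, STEP_DISCOVER);
    COR_YIELD(c, STEP_REQUEST);
    COR_YIELD(c, STEP_BOUND);
    COR_END(c);
    return STEP_DONE;
}
```

One consequence of this design is that local variables do not survive across yields, so any state the sequence needs must live in the struct that is passed back in on each call.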
Nicholas J. Kain
37aa866ae4 Move action dispatch out of main epoll loop. 2015-02-15 06:48:49 -05:00
Nicholas J. Kain
61387408d0 Separate event state gathering from action dispatch in main epoll loop.
This is the first step towards using coroutines.
2015-02-15 06:38:03 -05:00
Nicholas J. Kain
5b82be8b00 If ifchd interactions fail, terminate.
Ideally we would pause and resume state, but for now just bail out.
If ndhc is process-supervised, it will recover to the proper state
quickly.
2015-02-14 20:47:14 -05:00
Nicholas J. Kain
170f87c0e7 Propagate returns through ifchange_(deconfig|bind).
While doing so remove unnecessary argument null checks and
make sure not to alter the stored interface state if the
ifch requests failed.
2015-02-14 19:10:23 -05:00
Nicholas J. Kain
44175bd77c Make ifch requests synchronous just like sockd requests.
This change paves the way for allowing ifch to notify the core ndhc
about failures.  It would be far too difficult to reason about the
state machine if the requests to ifch were asynchronous.

Currently ndhc assumes that ifch requests never fail, but this
is not always true, e.g., because of rfkill.
2015-02-14 16:49:50 -05:00
Nicholas J. Kain
61a48b0fb6 Fix the rfkill waiting. 2015-02-14 15:33:02 -05:00
Nicholas J. Kain
04840c261d Fix some c99 struct initializer uninitialized member warnings
that clang detects and GCC misses.
2015-02-13 23:25:42 -05:00
Nicholas J. Kain
702d8b0c5b Mark pointer arguments that cannot ever be null as [static 1].
Also constify some cases.
2015-02-13 23:14:08 -05:00
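In C99 and later, declaring a parameter as an array with `static 1` tells the compiler that the caller must pass a pointer to at least one valid object, which rules out null; clang, for example, can warn when a null pointer constant is passed. A minimal sketch with a hypothetical struct and function:

```c
/* Hypothetical type used only for this illustration. */
struct lease {
    unsigned int seconds_left;
};

/* [static 1]: 'l' must point to at least one struct lease, so it is never
 * null. const: the callee only reads it. Inside the function, 'l' is an
 * ordinary pointer to const struct lease. */
unsigned int lease_seconds_left(const struct lease l[static 1])
{
    return l->seconds_left;
}
```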
Nicholas J. Kain
911d4cc58e Fix the dhcp state bootstrapping when rfkill is set #3. 2015-02-13 19:08:50 -05:00
Nicholas J. Kain
2e679ed491 Fix the dhcp state bootstrapping when rfkill is set #2. 2015-02-13 18:35:44 -05:00
Nicholas J. Kain
a8af406307 Fix the dhcp state bootstrapping when rfkill is set. 2015-02-13 18:07:14 -05:00
Nicholas J. Kain
79a97131bc Handle the case where the rfkill is set when ndhc is initializing. 2015-02-13 17:50:24 -05:00
Nicholas J. Kain
cf81573082 Fix a dumb typo in the previous commit. 2015-02-13 16:56:56 -05:00
Nicholas J. Kain
e3d4d4c1aa rfkill: Add support for reacting to radio kill switch events.
In order for this to work, the correct rfkill index must be specified
with the rfkill-idx option.

It might be possible to auto-detect the corresponding rfkill-idx option,
but I'm not sure if there's a guaranteed mapping between rfkill name and
interface name, as it seems that rfkills should represent phy devices
and not wlan devices.

The rfkill indexes can be found by checking
/sys/class/rfkill/rfkill<IDX>.
2015-02-13 16:25:36 -05:00
Nicholas J. Kain
c58a071f52 Update copyright dates. 2015-02-13 01:54:57 -05:00
Nicholas J. Kain
07cbd88049 Just use raw sockets for listening to DHCP requests. A UDP SO_BROADCAST
socket was previously used only for receiving RENEWING packets, and it
added needless complexity and was somewhat fragile.
2014-04-16 01:00:36 -04:00
Nicholas J. Kain
0884d96d1e PR_SET_PDEATHSIG is not fully reliable, so instead maintain a pair of
AF_UNIX SOCK_STREAM sockets between the master process and each subprocess,
and poll for the HUP event.

At the same time, be specific about the events that are checked in epoll
when dispatching on an event.
2014-04-15 23:19:24 -04:00
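A minimal sketch of the HUP-based parent-death check on Linux, using only socketpair(), fork(), and poll(); the function name is hypothetical, and the 5-second poll timeout exists only to keep the demo from hanging. The detail that makes it work is that each process closes the other's end: if the child kept a copy of the parent's socket, its own end would never see a hangup.

```c
#define _POSIX_C_SOURCE 200809L
#include <poll.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if the child observed POLLHUP after the parent's end closed,
 * 0 if not, -1 on setup failure. */
int child_sees_parent_hup(void)
{
    int sv[2];
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        close(sv[0]); /* child keeps only its own end */
        struct pollfd pfd = { .fd = sv[1], .events = POLLIN };
        /* POLLHUP is reported in revents even though it is not requested.
         * A real subprocess would poll this alongside its other fds. */
        int n = poll(&pfd, 1, 5000);
        _exit(n == 1 && (pfd.revents & POLLHUP) ? 0 : 1);
    }

    close(sv[1]); /* parent keeps only its own end */
    close(sv[0]); /* simulate the master dying: the kernel closes its fds */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```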