626 Commits

Author SHA1 Message Date
Nicholas J. Kain
8d89ca9f19 Reliably force restart when a subprocess has a fatal error.
Suppose a system call such as bind() fails in the sockd subprocess in
request_sockd_fd().  sockd will suicide().  This will send a SIGCHLD to
the master process, which the master process should respond to by
calling suicide(), forcing a process supervisor to respawn the entire
ndhc program.

But, this doesn't reliably happen prior to this commit because of the
interaction between request_sockd_fd() and signalfd() [or equivalently
self-pipe-trick] signal handling.

request_sockd_fd() makes ndhc-master synchronously wait for a response
from sockd via safe_recvmsg().  The normal goto-like signal handling
path is suppressed when using signalfd(), so when SIGCHLD is received,
it will not be handled until I/O is dispatched for the signalfd or pipe.
But such code will never be reached because ndhc-master is waiting in
safe_recvmsg() and thus never polls signal fd status.

So, revert to using traditional POSIX sigaction() for SIGCHLD, which
provides exactly the required behavior for proper functioning.
2020-10-20 04:41:51 -04:00
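For illustration, the sigaction() approach described in the commit above might look roughly like the following. This is a minimal sketch, not ndhc's actual code: the names sigchld_handler() and install_sigchld_handler() are hypothetical, and the handler simply exits so that a supervisor can respawn the program. Because the handler is delivered asynchronously, it runs even while the process is blocked in a call such as recvmsg().

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical handler, not ndhc's actual code: terminate at once so that a
 * process supervisor respawns the whole program.  Only async-signal-safe
 * calls (write, _exit) are used. */
static void sigchld_handler(int sig)
{
    static const char msg[] = "child exited; terminating\n";
    ssize_t r = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)r;
    (void)sig;
    _exit(EXIT_FAILURE);
}

/* Install the handler with sigaction().  Returns 0 on success, -1 on error.
 * SIGCHLD must not be blocked in the signal mask for this to fire. */
int install_sigchld_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = sigchld_handler;
    sa.sa_flags = SA_NOCLDSTOP; /* report terminated children, not stopped ones */
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGCHLD, &sa, NULL);
}
```

A caller would invoke install_sigchld_handler() once at startup, before forking any subprocess.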
Nicholas J. Kain
f0340b1475 Correct ba046c02c729c.
Apparently I had forgotten the counter-intuitive semantics of
signalfd(): it's necessary to BLOCK the signals that will be
handled exclusively by signalfd() so that the default POSIX
signal handling mechanism won't intercept the signals first.

The lack of response to ctrl+c is a legitimate bug that is
now properly fixed; ba046c02c729c fixed that issue, but
regressed the handling of other signals.
2020-10-19 13:03:35 -04:00
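The signalfd() semantics mentioned in the commit above can be sketched as follows (Linux only). This is an illustrative example under assumptions, not the code from the commit: setup_signalfd() and read_one_signal() are hypothetical names, and the signal set (SIGUSR1, SIGUSR2, SIGINT) is chosen only to match the signals the messages mention. The key step is blocking the signals with sigprocmask() before creating the descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/signalfd.h>
#include <unistd.h>

/* Hypothetical helper: block the signals of interest, then create a
 * signalfd for them.  Returns the descriptor, or -1 on error. */
int setup_signalfd(void)
{
    sigset_t mask;
    sigemptyset(&mask);
    sigaddset(&mask, SIGUSR1);
    sigaddset(&mask, SIGUSR2);
    sigaddset(&mask, SIGINT);
    /* Without this, the default disposition would handle the signal
     * before it could ever be read from the descriptor. */
    if (sigprocmask(SIG_BLOCK, &mask, NULL) < 0)
        return -1;
    return signalfd(-1, &mask, SFD_NONBLOCK | SFD_CLOEXEC);
}

/* Hypothetical helper: read one pending signal from the descriptor.
 * Returns the signal number, or -1 if nothing could be read. */
int read_one_signal(int fd)
{
    struct signalfd_siginfo si;
    ssize_t n = read(fd, &si, sizeof si);
    if (n != (ssize_t)sizeof si)
        return -1;
    return (int)si.ssi_signo;
}
```

In an event loop, the descriptor returned by setup_signalfd() would be registered with epoll or poll, and read_one_signal() called when it becomes readable.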
Nicholas J. Kain
32bc422d0e Add a heuristic to detect when server ignores dhcp renews.
If we get no response to three renews (unicast), switch to sending
rebinds (broadcast).  Servers are supposed to always reply with
a DHCPACK or DHCPNAK even if the server doesn't update its internal
lease duration database, so this behavior should be RFC compliant.
2020-10-19 07:03:03 -04:00
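The heuristic in the commit above could be sketched along these lines. All names here (struct renew_tracker, renew_timed_out(), reply_received(), MAX_UNANSWERED_RENEWS) are hypothetical and are not taken from the ndhc source; only the threshold of three unanswered renews comes from the commit message.

```c
#include <stdbool.h>

/* Threshold from the commit message: three unanswered unicast renews. */
enum { MAX_UNANSWERED_RENEWS = 3 };

/* Hypothetical bookkeeping for the heuristic. */
struct renew_tracker {
    unsigned unanswered_renews; /* renews sent since the last reply */
    bool use_rebind;            /* true once we should broadcast rebinds */
};

/* Call when a renew has timed out without a DHCPACK or DHCPNAK.
 * Returns true if the next request should be a broadcast rebind. */
bool renew_timed_out(struct renew_tracker *t)
{
    if (!t->use_rebind && ++t->unanswered_renews >= MAX_UNANSWERED_RENEWS)
        t->use_rebind = true;
    return t->use_rebind;
}

/* Call when the server replies; resets the heuristic. */
void reply_received(struct renew_tracker *t)
{
    t->unanswered_renews = 0;
    t->use_rebind = false;
}
```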
Nicholas J. Kain
f4365897bc Make renew and rebinding directly track whether DHCPREQUEST was sent.
Before it was inferred by examining timeouts.  Also, simplify
the associated timeout code so that there are no longer effectively
two redundant paths.
2020-10-19 06:36:26 -04:00
Nicholas J. Kain
4df035ced3 Make sure xids in packets sent conform to RFC2131 pg36 table5. 2020-10-19 05:48:52 -04:00
Nicholas J. Kain
5dc35eca6d Merge send_renew() and send_rebind() into send_renew_or_rebind(). 2020-10-19 05:26:47 -04:00
Nicholas J. Kain
23d23c108a DHCPREQUEST in REBINDING state shouldn't have reqip option.
See RFC2131 pg31 paragraph 6.
2020-10-19 05:13:47 -04:00
Nicholas J. Kain
9b8c63d998 Give up on fingerprinting router/gateway if it doesn't reply. 2020-10-19 04:28:06 -04:00
Nicholas J. Kain
87ac82fa45 Add back project() for CMake. Corrects regression in c5a1edd5f6f. 2020-10-19 04:28:06 -04:00
Nicholas J. Kain
b2af308647 Trivial refactoring. 2020-10-19 04:28:06 -04:00
Nicholas J. Kain
7bd551d564 Give up on fingerprinting relay agent/server if it doesn't reply.
Try to send/wait three times; if there's still no response,
assume that the relay agent is ignoring or firewalled from
receiving ARP requests.
2020-10-19 04:28:06 -04:00
Nicholas J. Kain
ba046c02c7 Make custom SIGUSR signals work again.
These were broken long ago when converting from signal()
to sigprocmask().  This change also makes ctrl+c work again.
2020-09-02 21:04:59 -04:00
Nicholas J. Kain
fb143995d2 Silence unused variable warnings in Ragel-generated code. 2019-01-01 01:10:34 -05:00
Nicholas J. Kain
253a97662d sockd: Preserve a const qualifier. 2018-10-26 13:07:37 -04:00
Nicholas J. Kain
3f5285b7ce options.c: Remove an unnecessary cast. 2018-10-26 13:07:19 -04:00
Nicholas J. Kain
c5a1edd5f6 Use more modern CMake syntax. 2018-10-26 13:07:05 -04:00
Nicholas J. Kain
56b6ae2cd3 Quit using NULL macro. 2018-10-26 07:17:39 -04:00
Nicholas J. Kain
05a075aeb2 Replace '(c)' with 'Copyright'.
'(c)' may not be a valid substitute for 'Copyright' in some legal
domains/interpretations.  So be safe, since I obviously am asserting
copyright on my legal work.
2018-10-26 07:11:16 -04:00
Nicholas J. Kain
8983df3c86 Update copyright dates. 2018-02-18 08:25:10 -05:00
Nicholas J. Kain
a66f007931 Trivial documentation updates. 2018-02-18 08:25:10 -05:00
Nicholas J. Kain
e08d3b15b5 Remove seccomp support.
It breaks with the existing whitelists on the latest glibc and is
just too much maintenance burden.  It also causes the most questions
for new users.

Something like OpenBSD's pledge() would be fine, but I have no
intention of maintaining such a thing.

Most of the value-gain would come from disallowing high-risk
syscalls like ptrace() and the perf syscalls, anyway.

ndhc already uses extensive defense-in-depth and wasn't using
seccomp on non-(x86|x86-64) platforms, so it's not a huge loss.
2018-02-09 03:33:04 -05:00
Nicholas J. Kain
e8d97205e9 Compile cleanly with -Wsign-conversion.
I didn't notice anything that worried me.
2018-02-09 03:16:59 -05:00
Nicholas J. Kain
c8dd123a5d README.md: Trivial fix to download links. 2017-10-04 05:33:36 -04:00
Nicholas J. Kain
3e4812eb35 README.md: Cosmetic improvements. 2017-09-23 15:17:59 -04:00
Nicholas J. Kain
8bb00c9c36 Convert the README to Markdown README.md. 2017-09-23 12:18:28 -04:00
Nicholas J. Kain
b6dda8f4f8 Update README. Mention the ncmlib requirement and make it more succinct. 2017-08-24 14:06:56 -04:00
Nicholas J. Kain
759b6bd831 Update to the new ncmlib random API. 2017-08-24 02:36:31 -04:00
Nicholas J. Kain
0732ed5f84 Disable GCC7 warning implicit-fallthrough and unused-const-variable 2017-05-13 13:14:35 -04:00
Nicholas J. Kain
ed44a90114 state: Faster recovery when carrier lost during DHCP init.
If carrier is lost before network fingerprinting is complete, we
have a few problems; first, we don't know whether the network has
changed underneath us.  Second, we've not yet configured the
interface properties, and it is not unlikely that doing so will
fail as the underlying network device may have been destroyed
and recreated during this time (e.g., if ethtool has been run at
start-up time).

Thus, the safest reaction is to terminate and force a supervisor
respawn.  It is best to do this once carrier recovers, not when
the carrier is lost, as it is more likely to minimize delays.
2017-04-10 10:07:34 -04:00
Nicholas J. Kain
369ff59cab arp: Take extra care to preserve last received ARP packet. 2017-04-10 10:07:34 -04:00
Nicholas J. Kain
7af3e64a99 arp: Remove reply_offset, and keep previous ARP packet after epoll.
ARP packets aren't split across multiple receive events, so
reply_offset is pointless, and we implicitly assume that the
previous ARP packet data is still available after a forced sleep.
2017-04-10 08:56:11 -04:00
Nicholas J. Kain
bdad082a62 Remove -Wformat=2 -Wformat-nonliteral for C++. 2017-03-16 23:05:23 -04:00
Nicholas J. Kain
3633d55296 Enable -fno-strict-overflow just to be safer. 2017-03-16 21:59:55 -04:00
Nicholas J. Kain
348e975f2d reinit_shared_deconfig() was not fully resetting arp state. 2017-02-27 11:30:35 -05:00
Nicholas J. Kain
8ca0d28f61 reinit_shared_deconfig() was not resetting state completely.
Newer flags were not being restored properly.
2017-02-27 11:10:14 -05:00
Nicholas J. Kain
2a26acacdd Handle possible clock_gettime() errors in curms().
Use curms() instead of new clock_gettime() call points, too.
2017-02-24 08:51:36 -05:00
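A millisecond clock helper that checks for clock_gettime() errors, as the commit above describes, might look something like this. It is a sketch under assumptions: the name curms_sketch() is hypothetical, and both the choice of CLOCK_MONOTONIC and the decision to exit on failure are guesses, not a description of what ndhc's curms() does.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical stand-in for curms(): current monotonic time in
 * milliseconds.  Exiting on failure is an assumption made for this
 * sketch; a real program may prefer a different error policy. */
long long curms_sketch(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) < 0) {
        perror("clock_gettime");
        exit(EXIT_FAILURE);
    }
    return (long long)ts.tv_sec * 1000LL + ts.tv_nsec / 1000000LL;
}
```

Routing every time query through one helper keeps the error handling in a single place, which is the point of replacing direct clock_gettime() call sites.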
Nicholas J. Kain
a39f1dabfe Fix a typo in clock_gettime error path print that I somehow overlooked. 2017-02-24 08:39:27 -05:00
Nicholas J. Kain
7f08d4b6fb arp: Handle initial gateway query asynchronously and retry failures.
The gateway/router MAC fingerprinting could perhaps be done more
robustly in the face of suspend or carrier loss, but the time window
in which things could get confused is very small and I would rather
just rely on supervisor respawn in that case.

Even this case I don't think I've ever seen.
2017-02-24 07:39:14 -05:00
Nicholas J. Kain
7080850f38 Silence a spurious ARP defense message from previous commit. 2017-02-24 06:57:10 -05:00
Nicholas J. Kain
34a8cd7ad9 arp: Handle initial announcement asynchronously and retry failures.
We need to send two ARP announcements, so these are now done via a
timeout callback so that failures can be handled properly.
2017-02-24 06:57:10 -05:00
Nicholas J. Kain
4d87d5075a Handle carrier interruptions in arp_collision_timeout() better.
Still not ideal; we need to note and retry these errors, but these
changes are preparatory and do not introduce regressions.
2017-02-24 05:36:05 -05:00
Nicholas J. Kain
09d5e9ad3c arp: If first ARP announce fails, rely on the second announce.
The previous approach would desynchronize the state machine if the
carrier is paused after receiving the lease but before sending the
announce, since we have received a lease already.

This change is an improvement but is still not ideal.
2017-02-24 04:57:33 -05:00
Nicholas J. Kain
fa1c5d3a0c Add a comment describing link_set_flags() return values. 2017-01-19 05:22:22 -05:00
Nicholas J. Kain
1c2a39c544 Reorder client_config_t members to reduce structure padding. 2017-01-19 05:20:29 -05:00
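As a generic illustration of the technique named in the commit above, ordering members from largest to smallest alignment usually removes interior padding. The two structs below are hypothetical and are not the real client_config_t layout; on a typical 64-bit ABI the first is likely to occupy 32 bytes and the second 16, but the exact sizes depend on the platform.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout with small members interleaved between large ones;
 * the compiler must insert padding to align each wider member. */
struct config_interleaved {
    bool flag_a;
    uint64_t lease_time;
    bool flag_b;
    uint32_t metric;
    bool flag_c;
};

/* Same members, ordered by decreasing alignment requirement. */
struct config_reordered {
    uint64_t lease_time;
    uint32_t metric;
    bool flag_a;
    bool flag_b;
    bool flag_c;
};

/* Reordering should not make the structure larger. */
_Static_assert(sizeof(struct config_reordered) <=
               sizeof(struct config_interleaved),
               "reordering should not increase structure size");
```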
Nicholas J. Kain
4fdde404aa Remove unused client_config_t foreground variable. 2017-01-19 05:18:04 -05:00
Nicholas J. Kain
c38fd2be9b Convert logical booleans in client_config_t to bool type. 2017-01-19 05:13:30 -05:00
Nicholas J. Kain
571b22c4b2 Rename client_state_t init variable to program_init.
Easier to grep.  No functional change.
2017-01-19 05:05:35 -05:00
Nicholas J. Kain
931530786b Convert logically boolean client_state_t variables from uint8_t to bool. 2017-01-19 05:01:23 -05:00
Nicholas J. Kain
b8ee0bd5c2 Update copyright dates to 2017. 2017-01-13 20:15:27 -05:00
Nicholas J. Kain
29498f5341 Remove ifsPrevState and set non-infinite timeout on a send error.
We instead check carrier status as needed.  This approach is more
robust.  For a simple example, imagine link state changes that happen
while the machine is suspended.
2017-01-13 20:15:27 -05:00