emo/ndhc - ndhc - Project Segfault Git

Author	SHA1	Message	Date
Nicholas J. Kain	4d87d5075a	Handle carrier interruptions in arp_collision_timeout() better. Still not ideal; we need to note and retry these errors, but these changes are preparatory and do not introduce regressions.	2017-02-24 05:36:05 -05:00
Nicholas J. Kain	09d5e9ad3c	arp: If first ARP announce fails, rely on the second announce. The previous approach would desynchronize the state machine if the carrier is paused after receiving the lease but before sending the announce, since we have received a lease already. This change is an improvement but is still not ideal.	2017-02-24 04:57:33 -05:00
Nicholas J. Kain	4fdde404aa	Remove unused client_config_t foreground variable.	2017-01-19 05:18:04 -05:00
Nicholas J. Kain	571b22c4b2	Rename client_state_t init variable to program_init. Easier to grep. No functional change.	2017-01-19 05:05:35 -05:00
Nicholas J. Kain	931530786b	Convert logically boolean client_state_t variables from uint8_t to bool.	2017-01-19 05:01:23 -05:00
Nicholas J. Kain	b8ee0bd5c2	Update copyright dates to 2017.	2017-01-13 20:15:27 -05:00
Nicholas J. Kain	c47630ffca	Rename check_carrier() to carrier_isup() and use bool return.	2017-01-12 05:25:15 -05:00
Nicholas J. Kain	ae16e26d00	arp: Fix case where changing interface properties consistently fails. If changing interface properties fails after getting a lease, it is possible under some strange conditions for the failure to be persistent. This seems to happen if the carrier cycles off and on several times during ndhc initialization. Since this issue is very hard to replicate, the most conservative thing to do here is to simply have ndhc suicide itself so it can be respawned by a process supervisor. Logs of the issue in practice: (carrier is down while the daemon is started here, it seems) 16:57:09.638979845 ndhc-ifch seccomp filter installed. Please disable seccomp if you 16:57:09.638989136 Discovering DHCP servers... 16:57:09.638991371 (send_dhcp_raw) carrier down; sendto would fail 16:57:09.638993318 Failed to send a discover request packet. ... 16:57:13.636519925 Discovering DHCP servers... 16:57:13.651462476 Received IP offer: X from server Y via Z ... 16:57:13.912592571 wan0: Gateway router set to: A 16:57:13.912607463 wan0: arp: Searching for dhcp server and gw addresses... 16:57:14.635532676 wan0: Carrier down. 17:04:32.983897760 wan0: arp: Still looking for gateway hardware address... 17:04:32.984158226 wan0: arp: Still looking for DHCP agent hardware address... 17:04:32.984781255 wan0: Interface is back. Revalidating lease... 17:04:32.985585501 wan0: arp: Gateway hardware address B 17:04:32.985590436 wan0: arp: DHCP agent hardware address C 17:04:38.234857403 wan0: arp: Still waiting for gateway to reply to arp ping... 17:04:38.235109016 wan0: arp: Still waiting for DHCP agent to reply to arp ping... 16:57:24.165620224 wan0: arp: Still waiting for gateway to reply to arp ping... 16:57:29.169621070 wan0: arp: DHCP agent and gateway didn't reply. Getting new lease. 16:57:29.217710616 wan0: Discovering DHCP servers... 16:57:29.249645130 wan0: Received IP offer: X from server Y via Z 16:57:29.249657203 wan0: Sending a selection request for X... 
16:57:29.285632973 wan0: Received ACK: X from server Y via Z 16:57:29.297717159 wan0: arp: Probing for hosts that may conflict with our lease... 16:57:29.360249458 wan0: arp: Probing for hosts that may conflict with our lease... 16:57:29.435114526 wan0: arp: Probing for hosts that may conflict with our lease... 16:57:29.500473345 wan0: Lease of X obtained. Lease time is D seconds. 16:57:29.500485894 wan0: Failed to set the interface IP address and properties! ... And the final two errors repeat. Restarting ndhc by hand instantly fixes the issue. So there's a lot going on -- bizarre clock skew, and carrier flickering on and off.	2015-10-28 20:20:21 -04:00
Nicholas J. Kain	b3bd13d45f	Fix the return values of dhcp_packet_get and arp_packet_get. This corrects a bug where stale dhcp packets would get reprocessed, causing very bad behavior; an issue that was introduced in the coroutine conversion.	2015-02-18 11:02:13 -05:00
Nicholas J. Kain	99ce918a31	Use a coroutine instead of several callback state machines. This change makes it much easier to reason about ndhc's behavior and properly handle errors. It is a very large changeset, but there is no way to make this sort of change incrementally. Lease acquisition is tested to work. It is highly likely that some bugs were both introduced and squashed here. Some obvious code cleanups will quickly follow.	2015-02-18 05:31:13 -05:00
Nicholas J. Kain	61387408d0	Separate event state gathering from action dispatch in main epoll loop. This is the first step towards using coroutines.	2015-02-15 06:38:03 -05:00
Nicholas J. Kain	e874373dcd	Check link carrier via ifch and netlink instead of ioctl. Thus, ioctl can once again be removed from the ndhc seccomp whitelist.	2015-02-15 02:50:29 -05:00
Nicholas J. Kain	5b82be8b00	If ifchd interactions fail, terminate. Ideally we would pause and resume state, but for now just bail out. If ndhc is process-supervised, it will recover to the proper state quickly.	2015-02-14 20:47:14 -05:00
Nicholas J. Kain	56cc05599a	Add error handling for un-notified carrier downs in ifup_action.	2015-02-14 05:39:15 -05:00
Nicholas J. Kain	b6b778831c	Add error handling for un-notified carrier downs when sending packets. If a packet send failed because the carrier went down without a netlink notification, then assume the hardware carrier was lost while the machine was suspended (eg, ethernet cable pulled during suspend). Simulate a netlink carrier down event and freeze the dhcp state machine until a netlink carrier up event is received. The ARP code is not yet handling this issue everywhere, but the window of opportunity for it to happen there is much shorter.	2015-02-14 05:20:04 -05:00
Nicholas J. Kain	04840c261d	Fix some c99 struct initializer uninitialized member warnings that clang detects and GCC misses.	2015-02-13 23:25:42 -05:00
Nicholas J. Kain	702d8b0c5b	Mark pointer arguments that cannot ever be null as [static 1]. Also constify some cases, too.	2015-02-13 23:14:08 -05:00
Nicholas J. Kain	cc806acc0b	Indicate that client_state_t and client_config_t pointer args cannot ever be null. Could possibly improve code generation, and makes the intention clear.	2015-02-13 22:29:03 -05:00
Nicholas J. Kain	b4b6ed8fd5	Check for carrier before sendto() or write() on interface fd. Linux will quietly proceed as if the data were sent even if the carrier is down and nothing actually happened. There is still a tiny race condition where the carrier could drop between the check and the actual write, but we really can't do anything about that and it is a very small race.	2015-02-13 21:53:15 -05:00
Nicholas J. Kain	c58a071f52	Update copyright dates.	2015-02-13 01:54:57 -05:00
Nicholas J. Kain	27c9e2c553	Improve fingerprinting to support DHCP relay agents. Mostly reverts the previous commit and instead teaches ndhc to properly handle the case when it is communicating with a DHCP relay agent on its local segment rather than directly with a DHCP server.	2015-02-12 23:28:54 -05:00
Nicholas J. Kain	a395234a67	Support networks with relay agents that have the DHCP server on a different segment. The network fingerprinting would never complete if the DHCP server was on a different segment before this change, since it would be impossible for the ARP messages sent by ndhc to ever reach the DHCP server (and vice-versa). Now just give up trying to find the hardware address after two tries and assume that the DHCP server cannot be reached by ARP. An alternative would be to fingerprint the relay agent instead, but to do so would require a lot more work as the giaddr field is only meaningful in the client->server message path, not in the server->client path. Thus it would require gathering the source IP for DHCP replies sent by unicast or broadcast and ferrying along this information to the ARP checking code where it would be used in place of the DHCP server address. This is entirely possible to do, but is quite a bit more work.	2015-02-12 20:49:40 -05:00
Nicholas J. Kain	99e21004ea	arp_min_close_fd() will always force the arp fd to be equal to -1, so there is no need to check force_reopen twice.	2014-05-10 21:13:24 -04:00
Nicholas J. Kain	07cbd88049	Just use raw sockets for listening to DHCP requests. A UDP SO_BROADCAST socket was previously used only for receiving RENEWING packets, and it added needless complexity and was somewhat fragile.	2014-04-16 01:00:36 -04:00
Nicholas J. Kain	a9055b5ca5	Update more message prints to prefix with the interface name.	2014-04-15 15:24:22 -04:00
Nicholas J. Kain	a501789e04	Parse config options with ragel and support a configuration file.	2014-04-14 15:06:31 -04:00
Nicholas J. Kain	bb1ff7a506	arp.c: Make logging messages print the associated interface name.	2014-04-07 04:43:21 -04:00
Nicholas J. Kain	6804be2277	Use safe_sendto where necessary, and check for short writes. Also, change many log_lines to log_errors, mostly in ifset.c.	2014-04-07 04:15:02 -04:00
Nicholas J. Kain	e2ee728982	Consolidate all of the global static variables in arp.c into a single struct, and use booleans where appropriate.	2014-04-06 22:12:31 -04:00
Nicholas J. Kain	b761889025	Move source from ndhc/ to src/ since ifchd is no longer a separate program.	2014-04-06 16:57:06 -04:00

30 Commits
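
Several of the commits above describe the same pattern: check the carrier before writing to the interface fd, treat short writes as failures, return bool from the carrier check, and mark pointer arguments that can never be null as [static 1]. The following is a minimal sketch of how those ideas might fit together. It is not ndhc's code: struct demo_state, demo_carrier_isup(), demo_checked_send() and the two fake send functions are hypothetical names invented for illustration, and the real client_state_t, carrier_isup() and safe_sendto() have their own members and signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the client state; not ndhc's client_state_t. */
struct demo_state {
    bool carrier_up;  /* last known carrier status of the interface */
    size_t sent;      /* number of payloads sent successfully */
};

/* [static 1] promises the compiler that cs points to at least one
 * object, so it can never be NULL. */
static bool demo_carrier_isup(const struct demo_state cs[static 1])
{
    return cs->carrier_up;
}

/* Hypothetical send callback so the sketch needs no real socket. */
typedef ssize_t (*demo_send_fn)(const void *buf, size_t len);

/* Fake sender that reports the whole buffer as written. */
static ssize_t demo_full_send(const void *buf, size_t len)
{
    (void)buf;
    return (ssize_t)len;
}

/* Fake sender that reports one byte fewer than requested. */
static ssize_t demo_short_send(const void *buf, size_t len)
{
    (void)buf;
    return len ? (ssize_t)(len - 1) : 0;
}

/* Refuse to send while the carrier is down, and treat an error or a
 * short write as a failure. Returns the byte count or -1. */
static ssize_t demo_checked_send(struct demo_state cs[static 1],
                                 demo_send_fn send_fn,
                                 const void *buf, size_t len)
{
    if (!demo_carrier_isup(cs))
        return -1;  /* carrier down: the kernel may pretend the send worked */
    ssize_t r = send_fn(buf, len);
    if (r < 0 || (size_t)r != len)
        return -1;  /* error or short write */
    cs->sent++;
    return r;
}
```

As the b4b6ed8fd5 message notes, a check like this still leaves a small race: the carrier can drop between the check and the write. The later commits that handle un-notified carrier downs address the case where the send itself fails.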