From ae16e26d004c4bc1c93bf37287c4aba974bf19b6 Mon Sep 17 00:00:00 2001
From: "Nicholas J. Kain"
Date: Wed, 28 Oct 2015 20:20:21 -0400
Subject: [PATCH] arp: Fix case where changing interface properties consistently fails.

If changing interface properties fails after getting a lease, it is
possible under some strange conditions for the failure to be persistent.
This seems to happen if the carrier cycles off and on several times
during ndhc initialization.

Since this issue is very hard to replicate, the most conservative thing
to do here is to simply have ndhc suicide itself so it can be respawned
by a process supervisor.

Logs of the issue in practice:

(carrier is down while the daemon is started here, it seems)
16:57:09.638979845 ndhc-ifch seccomp filter installed. Please disable seccomp if you
16:57:09.638989136 Discovering DHCP servers...
16:57:09.638991371 (send_dhcp_raw) carrier down; sendto would fail
16:57:09.638993318 Failed to send a discover request packet.
...
16:57:13.636519925 Discovering DHCP servers...
16:57:13.651462476 Received IP offer: X from server Y via Z
...
16:57:13.912592571 wan0: Gateway router set to: A
16:57:13.912607463 wan0: arp: Searching for dhcp server and gw addresses...
16:57:14.635532676 wan0: Carrier down.
17:04:32.983897760 wan0: arp: Still looking for gateway hardware address...
17:04:32.984158226 wan0: arp: Still looking for DHCP agent hardware address...
17:04:32.984781255 wan0: Interface is back. Revalidating lease...
17:04:32.985585501 wan0: arp: Gateway hardware address B
17:04:32.985590436 wan0: arp: DHCP agent hardware address C
17:04:38.234857403 wan0: arp: Still waiting for gateway to reply to arp ping...
17:04:38.235109016 wan0: arp: Still waiting for DHCP agent to reply to arp ping...
16:57:24.165620224 wan0: arp: Still waiting for gateway to reply to arp ping...
16:57:29.169621070 wan0: arp: DHCP agent and gateway didn't reply. Getting new lease.
16:57:29.217710616 wan0: Discovering DHCP servers...
16:57:29.249645130 wan0: Received IP offer: X from server Y via Z
16:57:29.249657203 wan0: Sending a selection request for X...
16:57:29.285632973 wan0: Received ACK: X from server Y via Z
16:57:29.297717159 wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.360249458 wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.435114526 wan0: arp: Probing for hosts that may conflict with our lease...
16:57:29.500473345 wan0: Lease of X obtained. Lease time is D seconds.
16:57:29.500485894 wan0: Failed to set the interface IP address and properties!
...

And the final two errors repeat. Restarting ndhc by hand instantly
fixes the issue.

So there's a lot going on -- bizarre clock skew, and carrier flickering
on and off.
---
 src/arp.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/arp.c b/src/arp.c
index 3875fa1..8b77878 100644
--- a/src/arp.c
+++ b/src/arp.c
@@ -499,9 +499,8 @@ int arp_collision_timeout(struct client_state_t cs[static 1], long long nowts)
         garp.last_conflict_ts = 0;
         garp.wake_ts[AS_COLLISION_CHECK] = -1;
         if (ifchange_bind(cs, &garp.dhcp_packet) < 0) {
-            log_warning("%s: Failed to set the interface IP address and properties!",
-                        client_config.interface);
-            return ARPR_FAIL;
+            suicide("%s: Failed to set the interface IP address and properties!",
+                    client_config.interface);
         }
         cs->routerAddr = get_option_router(&garp.dhcp_packet);
         if (arp_get_gw_hwaddr(cs) < 0) {
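
The fail-fast approach described in the commit message (exit on a persistent failure and let a process supervisor respawn the daemon with fresh state) can be illustrated with a small standalone sketch. Everything here is hypothetical and is not ndhc code: die() merely stands in for a suicide()-style helper that logs and exits, worker() simulates a process whose interface change fails on its first few runs, and supervise() is a toy substitute for a real supervisor.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for a suicide()-style helper:
 * log the message to stderr, then terminate the process. */
static void die(const char *fmt, ...)
{
    va_list ap;
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    fputc('\n', stderr);
    exit(EXIT_FAILURE);
}

/* Hypothetical worker: pretends that setting interface properties
 * fails on the first `failures` spawns, then succeeds. */
static void worker(int attempt, int failures)
{
    if (attempt < failures)
        die("attempt %d: failed to set interface properties", attempt);
    exit(EXIT_SUCCESS);
}

/* Toy supervisor: respawn the worker until it exits cleanly or
 * max_spawns is exhausted. Returns the number of spawns used on
 * success, or -1 on error or when the limit is reached. */
static int supervise(int failures, int max_spawns)
{
    for (int attempt = 0; attempt < max_spawns; ++attempt) {
        fflush(NULL);                  /* avoid duplicated buffered output */
        pid_t pid = fork();
        if (pid < 0)
            return -1;
        if (pid == 0)
            worker(attempt, failures); /* child: never returns */

        int status;
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS)
            return attempt + 1;
    }
    return -1;
}
```

The point of the design is that each respawn starts from a clean process image, so whatever state made the failure persistent is discarded rather than retried in place.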