procps/testsuite
Chris Down 866abacf88 pgrep: Support matching on the presence of a userspace signal handler
In production we've had several incidents over the years where a process
has a signal handler registered for SIGHUP or one of the SIGUSR signals
which can be used to signal a request to reload configs, rotate log
files, and the like. While this may seem harmless enough, what we've
seen happen repeatedly is something like the following:

1. A process is using SIGHUP/SIGUSR[12] to request some
   application-handled state change -- reloading configs, rotating a log
   file, etc;
2. This kind of request is deprecated and removed, so the signal handler
   is removed. However, a site where the signal might be sent from is
   missed (often logrotate or a service manager);
3. Because the default disposition of these signals is terminal, sooner
   or later these applications are going to be sent SIGHUP or similar
   and end up unexpectedly killed.

I know for a fact that we're not the only organisation experiencing
this: in general, signal use is pretty tricky to reason about and safely
remove because of the fairly aggressive SIG_DFL behaviour for some
common signals, especially for SIGHUP which has a particularly ambiguous
meaning. Especially in a large, highly interconnected codebase,
reasoning about signal interactions between system configuration and
applications can be highly complex, and it's inevitable that on occasion
a callsite will be missed.

In some cases the right call to avoid this will be to migrate services
towards other forms of IPC for this purpose, but inevitably there will
be some services which must continue using signals, so we need a safe
way to support them.

This patch adds support for the -H/--require-handler flag, which matches
on processes with a userspace handler present for the signal being sent.

With this flag we can enforce that all SIGHUP reload cases and SIGUSR
equivalents use --require-handler. This effectively mitigates the case
we've seen time and time again where SIGHUP is used to rotate log files
or reload configs, but the sending site is mistakenly left present after
the removal of the signal handler, resulting in unintended termination of
the process.
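
To illustrate what "a userspace handler present" means on Linux, here is
a minimal sketch that checks the SigCgt (caught signals) mask the kernel
exposes in /proc/<pid>/status.  It is not necessarily how pgrep
implements the flag.  The helper name has_handler is hypothetical, and
the arithmetic assumes ordinary signal numbers well below 64.

```shell
# has_handler MASK_HEX SIGNUM
# Succeeds if bit (SIGNUM - 1) is set in the hexadecimal SigCgt mask,
# i.e. the process has installed its own handler for that signal.
# (Hypothetical helper for illustration; not part of procps.)
has_handler() {
    [ $(( (0x$1 >> ($2 - 1)) & 1 )) -eq 1 ]
}

# Example: inspect the current shell, if /proc is available (Linux only).
if [ -r "/proc/$$/status" ]; then
    mask=$(awk '/^SigCgt:/ { print $2 }' "/proc/$$/status")
    if has_handler "$mask" 1; then   # 1 is SIGHUP
        echo "SIGHUP is handled"
    else
        echo "SIGHUP is not handled"
    fi
fi
```

A sender that performs this kind of check before signalling would skip
any process whose handler has been removed, instead of killing it.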

Signed-off-by: Chris Down <chris@chrisdown.name>
2023-01-15 04:05:40 +00:00
config testsuite: Update the library tests for new location 2022-08-29 20:40:45 +10:00
free.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
kill.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
lib.test testsuite: Update the library tests for new location 2022-08-29 20:40:45 +10:00
pgrep.test pgrep: Support matching on the presence of a userspace signal handler 2023-01-15 04:05:40 +00:00
pkill.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
pmap.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
ps.test ps: Correct BSD c option 2022-12-19 16:50:12 +11:00
pwait.test pgrep: Support matching on the presence of a userspace signal handler 2023-01-15 04:05:40 +00:00
pwdx.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
skill.test skill: Restore the -p flag functionality 2022-12-12 16:46:36 +11:00
slabtop.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
sysctl.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
uptime.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
vmstat.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
w.test build-sys: Update tests to new binary locations 2022-08-29 19:13:10 +10:00
.gitignore
Makefile.am slabtop: Check for bad d and o option combination 2021-03-11 22:10:37 +11:00
README docs: add testsuite readme file 2012-03-03 18:36:29 +11:00
sysctl_glob_test.conf sysctl: Support systemd glob patterns 2021-09-15 20:07:32 +10:00
sysctl_slash_test.conf sysctl: print dotted keys again 2022-04-09 14:18:28 +10:00

How to use the check suite
--------------------------

You need the DejaGNU package.  Assuming you have it, all you need to do is

make check


Something failed, now what
--------------------------

First determine what did not work.  If only one check failed, you can
run it individually in debugging mode.  For example:

$ runtest -a -de -v w.test/w.exp
Expect binary is /usr/bin/expect
Using /usr/share/dejagnu/runtest.exp as main test driver
[...]

Do not bother capturing the screen output; it is already in testrun.log,
which the test suite generates.

$ ls  testrun.* dbg.log
dbg.log  testrun.log  testrun.sum

The reason why the test failed should be in dbg.log.  Once you have
figured out the reason, you can write a patch fixing w.test/w.exp
and send it upstream.

If you do not know how to fix the issue, or do not have the time,
create a tar.gz file containing the test run logs and submit it to the
upstream maintainers.  Note that in the latter case upstream sometimes
has to ask clarifying questions about the environment where the
problem occurred.
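
One possible way to bundle the logs is sketched below.  The helper name
pack_logs and the file name testsuite-env.txt are hypothetical; the log
file names are the ones listed above.  Including the `uname -a` output
is a suggestion that may save upstream a round of questions about the
environment, not a requirement.

```shell
# pack_logs ARCHIVE
# Bundles whichever DejaGnu log files exist in the current directory,
# plus a one-line description of the system, into ARCHIVE (a tar.gz).
# (Hypothetical helper; the file names chosen here are arbitrary.)
pack_logs() {
    uname -a > testsuite-env.txt
    files=testsuite-env.txt
    for f in testrun.log testrun.sum dbg.log; do
        if [ -f "$f" ]; then
            files="$files $f"
        fi
    done
    # $files is deliberately unquoted so it splits into separate names;
    # none of the names contain spaces.
    tar -czf "$1" $files
}
```

For example, running `pack_logs procps-testsuite-logs.tar.gz` in the
testsuite directory after a failed run produces a single file to attach
to the report.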