From c2bd33e4838eb56bebe2707f6ca6bd05e9df5b24 Mon Sep 17 00:00:00 2001
From: Michael Orlitzky
Date: Mon, 4 Sep 2017 17:58:09 -0400
Subject: [PATCH] service-script-guide.md: new guide for service script authors.

This fixes #162.
---
 service-script-guide.md | 381 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 381 insertions(+)
 create mode 100644 service-script-guide.md

diff --git a/service-script-guide.md b/service-script-guide.md
new file mode 100644
index 00000000..5806b808
--- /dev/null
+++ b/service-script-guide.md
@@ -0,0 +1,381 @@
+This document is aimed at upstream and distribution developers who
+write OpenRC service scripts, either for their own projects, or for
+the packages they maintain. It contains advice, suggestions, tips,
+tricks, hints, and counsel; cautions, warnings, heads-ups,
+admonitions, proscriptions, enjoinders, and reprimands.
+
+It is intended to prevent common mistakes that are found "in the wild"
+by pointing out those mistakes and suggesting alternatives. Each
+thing that you should (or should not) do has a section devoted to it.
+We don't consider anything exotic, and assume that you will use
+start-stop-daemon to manage a fairly typical long-running UNIX
+process.
+
+# Don't write your own start/stop functions
+
+OpenRC is capable of stopping and starting most daemons based on the
+information that you give it. For a well-behaved daemon that
+backgrounds itself and writes its own PID file by default, the
+following OpenRC variables are likely all that you'll need:
+
+  * command
+  * command_args
+  * pidfile
+
+Given those three pieces of information, OpenRC will be able to start
+and stop the daemon on its own. The following is taken from an
+[OpenNTPD](http://www.openntpd.org/) service script:
+
+```sh
+command="/usr/sbin/ntpd"
+
+# The special RC_SVCNAME variable contains the name of this service.
+pidfile="/run/${RC_SVCNAME}.pid"
+command_args="-p ${pidfile}"
+```
+
+If the daemon runs in the foreground by default but has options to
+background itself and to create a pidfile, then you'll also need
+
+  * command_args_background
+
+That variable should contain the flags needed to background your
+daemon, and to make it write a PID file. Take for example the
+following snippet of an
+[NRPE](https://github.com/NagiosEnterprises/nrpe) service script:
+
+```sh
+command="/usr/bin/nrpe"
+command_args="--config=/etc/nagios/nrpe.cfg"
+command_args_background="--daemon"
+pidfile="/run/${RC_SVCNAME}.pid"
+```
+
+Since NRPE runs as *root* by default, it needs no special permissions
+to write to `/run/nrpe.pid`. OpenRC takes care of starting and
+stopping the daemon with the appropriate arguments, even passing the
+`--daemon` flag during startup to force NRPE into the background (NRPE
+knows how to write its own PID file).
+
+But what if the daemon isn't so well behaved? What if it doesn't know
+how to background itself or create a pidfile? If it can do neither,
+then use
+
+  * command_background=true
+
+which will pass `--background` to start-stop-daemon, and additionally
+`--make-pidfile`, causing start-stop-daemon to create the `$pidfile`
+for you (rather than the daemon itself being responsible for creating
+the PID file).
+
+If your daemon doesn't know how to change its own user or group, then
+you can tell start-stop-daemon to launch it as an unprivileged user
+with
+
+  * command_user="user:group"
+
+Finally, if your daemon always forks into the background but fails to
+create a PID file, then your only option is to use
+
+  * procname
+
+With `procname`, OpenRC will try to find the running daemon by
+matching the name of its process. That's not so reliable, but daemons
+shouldn't background themselves without creating a PID file in the
+first place.
+The next example is part of the [CA NetConsole
+Daemon](https://oss.oracle.com/projects/cancd/) service script:
+
+```sh
+command="/usr/sbin/cancd"
+command_args="-p ${CANCD_PORT}
+              -l ${CANCD_LOG_DIR}
+              -o ${CANCD_LOG_FORMAT}"
+command_user="cancd"
+
+# cancd daemonizes itself, but doesn't write a PID file and doesn't
+# have an option to run in the foreground. So, the best we can do
+# is try to match the process name when stopping it.
+procname="cancd"
+```
+
+To recap, in order of preference:
+
+  1. If the daemon backgrounds itself and creates its own PID file, use
+     `pidfile`.
+  2. If the daemon does not background itself (or has an option to run
+     in the foreground) and does not create a PID file, then use
+     `command_background=true` and `pidfile`.
+  3. If the daemon backgrounds itself and does not create a PID file,
+     use `procname` instead of `pidfile`. But, if your daemon has the
+     option to run in the foreground, then you should do that instead
+     (that would be the case in the previous item).
+  4. The last case, where the daemon does not background itself but
+     does create a PID file, doesn't make much sense. If there's a way
+     to disable the daemon's PID file (or, to write it straight into the
+     garbage), then do that, and use `command_background=true`.
+
+# Reloading your daemon's configuration
+
+Many daemons will reload their configuration files in response to a
+signal. Suppose your daemon will reload its configuration in response
+to a `SIGHUP`. It's possible to add a new "reload" command to your
+service script that performs this action. First, tell the service
+script about the new command:
+
+```sh
+extra_started_commands="reload"
+```
+
+We use `extra_started_commands` as opposed to `extra_commands` because
+the "reload" action is only valid while the daemon is running (that
+is, started).
+Now, start-stop-daemon can be used to send the signal to
+the appropriate process (assuming you've defined the `pidfile`
+variable elsewhere):
+
+```sh
+reload() {
+	ebegin "Reloading ${RC_SVCNAME}"
+	start-stop-daemon --signal HUP --pidfile "${pidfile}"
+	eend $?
+}
+```
+
+# Don't restart/reload with a broken config
+
+Often, users will start a daemon, make some configuration change, and
+then attempt to restart the daemon. If the recent configuration change
+contains a mistake, the result will be that the daemon is stopped but
+then cannot be started again (due to the configuration error). It's
+possible to prevent that situation with a function that checks for
+configuration errors, and a combination of the `start_pre` and
+`stop_pre` hooks.
+
+```sh
+checkconfig() {
+	# However you want to check this... Replace the ":" (a no-op that
+	# always succeeds) with a real check; it is only there because a
+	# shell function body can't be empty.
+	:
+}
+
+start_pre() {
+	# If this isn't a restart, make sure that the user's config isn't
+	# busted before we try to start the daemon (this will produce
+	# better error messages than if we just try to start it blindly).
+	#
+	# If, on the other hand, this *is* a restart, then the stop_pre
+	# action will have ensured that the config is usable and we don't
+	# need to do that again.
+	if [ "${RC_CMD}" != "restart" ] ; then
+		checkconfig || return $?
+	fi
+}
+
+stop_pre() {
+	# If this is a restart, check to make sure the user's config
+	# isn't busted before we stop the running daemon.
+	if [ "${RC_CMD}" = "restart" ] ; then
+		checkconfig || return $?
+	fi
+}
+```
+
+To prevent a *reload* with a broken config, keep it simple:
+
+```sh
+reload() {
+	checkconfig || return $?
+	ebegin "Reloading ${RC_SVCNAME}"
+	start-stop-daemon --signal HUP --pidfile "${pidfile}"
+	eend $?
+}
+```
+
+# PID files should be writable only by root
+
+PID files must be writable only by *root*, which means additionally
+that they must live in a *root*-owned directory.
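+
+As a sketch of the safe arrangement, suppose a hypothetical *foo*
+daemon can stay in the foreground (the `--foreground` flag below is
+an assumption, not an option of any particular program). In that
+case start-stop-daemon, running as *root*, can create the PID file
+itself and then launch the daemon as the unprivileged user. This
+assumes that your start-stop-daemon writes the PID file before it
+switches to `command_user`, so it's worth confirming that on your
+system.
+
+```sh
+# Hypothetical "foo" daemon; "--foreground" is an assumed flag.
+command="/usr/sbin/foo"
+command_args="--foreground"
+# start-stop-daemon backgrounds foo and writes ${pidfile} itself.
+command_background=true
+# foo runs unprivileged, but never gets write access to the PID file.
+command_user="foo:foo"
+# /run is root-owned, so no checkpath call is needed.
+pidfile="/run/${RC_SVCNAME}.pid"
+```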
+
+Some daemons run as an unprivileged user account, and create their PID
+files (as the unprivileged user) in a path like
+`/run/foo/foo.pid`. The unprivileged user can usually exploit that to
+kill *root* processes. When a service is stopped, *root* usually sends
+a SIGTERM to whatever PID the file contains, and the unprivileged user
+controls that PID. The main warning sign for that problem is using
+`checkpath` to set ownership on the directory containing the PID
+file. For example,
+
+```sh
+# BAD BAD BAD BAD BAD BAD BAD BAD
+start_pre() {
+	# Ensure that the pidfile directory is writable by the foo user/group.
+	checkpath --directory --mode 0700 --owner foo:foo "/run/foo"
+}
+# BAD BAD BAD BAD BAD BAD BAD BAD
+```
+
+If the *foo* user owns `/run/foo`, then he can put whatever he wants
+in the `/run/foo/foo.pid` file. Even if *root* owns the PID file, the
+*foo* user can delete it and replace it with his own. To avoid
+security concerns, the PID file must be created as *root* and live in
+a *root*-owned directory. If your daemon is responsible for forking
+and writing its own PID file but the PID file is still owned by the
+unprivileged runtime user, then you may have an upstream issue.
+
+Once the PID file is being created as *root* (before dropping
+privileges), it can be written directly to a *root*-owned
+directory. Typically this will be `/run` on Linux, and `/var/run`
+elsewhere. For example, the *foo* daemon might write
+`/run/foo.pid`. No calls to checkpath are needed. Note: there is
+nothing technically wrong with using a directory structure like
+`/run/foo/foo.pid`, so long as *root* owns the PID file and the
+directory containing it.
+
+Ideally (see "Upstream your service scripts"), your service script
+will be integrated upstream and the build system will determine
+which of `/run` or `/var/run` is appropriate.
+For example,
+
+```sh
+pidfile="@piddir@/${RC_SVCNAME}.pid"
+```
+
+A decent example of this is the [Nagios core service
+script](https://github.com/NagiosEnterprises/nagioscore/blob/master/openrc-init.in),
+where the full path to the PID file is specified at build time.
+
+# Don't let the user control the PID file location
+
+It's usually a mistake to let the end user control the PID file
+location through a conf.d variable, for a few reasons:
+
+  1. When the PID file path is controlled by the user, you need to
+     ensure that its parent directory exists and is writable. This
+     adds unnecessary code to the service script.
+
+  2. If the PID file path changes while the service is running, then
+     you'll find yourself unable to stop the service.
+
+  3. The directory that should contain the PID file is best determined
+     by the upstream build system (see "Upstream your service scripts").
+     On Linux, the preferred location these days is `/run`. Other systems
+     still use `/var/run`, though, and a `./configure` script is the
+     best place to decide which one you want.
+
+  4. Nobody cares where the PID file is located, anyway.
+
+Since OpenRC service names must be unique, a value of
+
+```sh
+pidfile="/run/${RC_SVCNAME}.pid"
+```
+
+guarantees that your PID file has a unique name.
+
+# Upstream your service scripts (for distribution developers)
+
+The ideal place for an OpenRC service script is **upstream**. Much like
+systemd services, a well-crafted OpenRC service script should be
+distribution-agnostic. Why upstream? For two reasons. First, having it
+upstream means that there's a single authoritative source for
+improvements. Second, a few paths in every service script are
+dependent upon flags passed to the build system. For example,
+
+```sh
+command=/usr/bin/foo
+```
+
+in an autotools-based build system should really be
+
+```sh
+command=@bindir@/foo
+```
+
+so that the user's value of `--bindir` is respected.
+If you keep the
+service script in your own distribution's repository, then you have to
+keep the command path and package synchronized yourself, and that's no
+fun.
+
+# Be wary of "need net" dependencies
+
+There are two things you need to know about "need net" dependencies:
+
+  1. They are not satisfied by the loopback interface, so "need net"
+     requires some *other* interface to be up.
+
+  2. Depending on the value of `rc_depend_strict` in `rc.conf`, the
+     "need net" will be satisfied when either *any* non-loopback
+     interface is up, or when *all* non-loopback interfaces are up.
+
+The first item means that "need net" is wrong for daemons that are
+happy with `0.0.0.0`, and the second point means that "need net" is
+wrong for daemons that need a particular (for example, the WAN)
+interface. We'll consider the two most common users of "need net":
+network clients that access some network resource, and network servers
+that provide them.
+
+## Network clients
+
+Network clients typically want the WAN interface to be up. That may
+tempt you to depend on the WAN interface; but first, you should ask
+yourself a question: does anything bad happen if the WAN interface is
+not available? In other words, if the administrator wants to disable
+the WAN, should the service be stopped? Usually the answer to that
+question is "no," and in that case, you should forgo the "net"
+dependency entirely.
+
+Suppose, for example, that your service retrieves virus signature
+updates from the internet. In order to do its job correctly, it needs
+a (working) internet connection. However, the service itself does not
+require the WAN interface to be up: if it is, great; otherwise, the
+worst that will happen is that a "server unavailable" warning will be
+logged. The signature update service will not crash, and, perhaps more
+importantly, you don't want it to terminate if the administrator turns
+off the WAN interface for a second.
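+
+For a signature updater like that, the service script simply leaves
+"net" out of its dependencies. A minimal sketch, where the daemon name
+and path are made up, the daemon is assumed to background itself and
+write its own PID file, and `use logger` is only an illustration of a
+soft dependency you might keep:
+
+```sh
+# Hypothetical signature-update daemon.
+command="/usr/sbin/sigupdated"
+pidfile="/run/${RC_SVCNAME}.pid"
+
+depend() {
+	# Deliberately no "need net": without the WAN, the daemon only logs
+	# a warning, so there's no reason to stop it when the WAN goes down.
+	use logger
+}
+```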
+
+## Network servers
+
+Network servers are generally easier to handle than their client
+counterparts. Most server daemons listen on `0.0.0.0` (all addresses)
+by default, and are therefore satisfied to have the loopback interface
+present and operational. OpenRC ships with the loopback service in the
+*boot* runlevel, and therefore most server daemons require no further
+network dependencies.
+
+The exceptions to this rule are those daemons that produce negative
+side effects when the WAN is unavailable. For example, the Nagios
+server daemon will generate "the sky is falling" alerts for as long as
+your monitored hosts are unreachable. So in that case, you should
+require some other interface (often the WAN) to be up. A "need"
+dependency would be appropriate, because you want Nagios to be
+stopped before the network is taken down.
+
+If your daemon can optionally be configured to listen on a particular
+interface, then please see the "Depending on a particular interface"
+section.
+
+## Depending on a particular interface
+
+If you need to depend on one particular interface, usually it's not
+easy to determine programmatically what that interface is. For
+example, if your *sshd* daemon listens on `192.168.1.100` (rather than
+`0.0.0.0`), then you have two problems:
+
+  1. Parsing `sshd_config` to figure that out; and
+
+  2. Determining which network service name corresponds to the
+     interface for `192.168.1.100`.
+
+It's generally a bad idea to parse config files in your service
+scripts, but the second problem is the harder one. Instead, the most
+robust (i.e. the laziest) approach is to make the user specify the
+dependency when he makes a change to sshd_config. Include something
+like the following in the service's conf.d file (for example,
+`/etc/conf.d/sshd`):
+
+```sh
+# Specify the network service that corresponds to the "bind" setting
+# in your configuration file. For example, if you bind to 127.0.0.1,
+# this should be set to "net.lo" which provides the loopback interface.
+rc_need="net.lo"
+```
+
+This is a sensible default for daemons that are happy with `0.0.0.0`,
+but lets the user specify something else, like `rc_need="net.wan"` if
+he needs it. The burden is on the user to determine the appropriate
+service whenever he changes the daemon's configuration file.
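+
+OpenRC reads `rc_need` from the service's conf.d file on its own, so
+the service script itself doesn't have to mention the interface at
+all. Here is a sketch of the `depend()` function for such an sshd
+service script, assuming that the setting above lives in
+`/etc/conf.d/sshd`. The `use logger` line is only an illustrative
+soft dependency.
+
+```sh
+depend() {
+	# No "need net" here. The user's rc_need setting in
+	# /etc/conf.d/sshd supplies the interface dependency (net.lo by
+	# default, or something like net.wan if sshd binds elsewhere).
+	use logger
+}
+```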