| Commit message | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When this option is set to direct, the service restarts without entering a failed
state. Dependent units are not notified of transitory failure.
This is useful for the following use case:
We have a target with Requires=my-service, After=my-service.
my-service.service is a oneshot service and has Restart=on-failure in
its definition.
my-service.service can get stuck for various reasons and time out, in
which case it is restarted. Currently, when it fails the first time, the
target fails, even though my-service is restarted.
The behavior we're looking for is that, as long as my-service keeps being
restarted, the target stays pending, waiting for my-service.service either to
start successfully or to fail for good without being restarted any further.
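For illustration, a minimal sketch of the setup described above, with
hypothetical unit names (only the option names are the ones introduced here):

    # my-service.service
    [Service]
    Type=oneshot
    ExecStart=/usr/bin/my-service
    Restart=on-failure
    RestartMode=direct

    # my.target
    [Unit]
    Requires=my-service.service
    After=my-service.service

With RestartMode=direct, a timeout of my-service.service no longer puts the
unit into the failed state between restart attempts, so my.target keeps
waiting instead of failing immediately.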
|
|
|
|
|
|
|
|
|
|
| |
The announced new behavior for OnFailure= never worked properly,
and we fixed the documentation instead in #27675.
Therefore, let's get rid of the unused logic completely. More at #27594.
The to-be-added RestartMode= option should hopefully cover the use case.
Closes #27594
|
| |
|
|
|
|
|
|
|
|
| |
Just to match service_release_stdio_fd() and service_release_fd_store()
in naming, since they do similar things.
This follows the concept that we "release" resources, and this is all
generically wrapped in "service_release_resources()".
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Oftentimes it is useful to allow the per-service fd store to survive
longer than for a restart. This is useful in various scenarios:
1. An fd to some security relevant object needs to be stashed somewhere,
that should not be cleaned automatically, because the security
enforcement would be dropped then.
2. A user namespace fd should be allocated on first invocation and be
kept around until the user logs out (i.e. systemd --user ends), à la
#16328 (This does not implement what #16318 asks for, but should
solve the use-case discussed there.)
3. There's interest in allowing a concept of "userspace reboots" where the
kernel stays running, and userspace is swapped out (i.e. all services
exit, and the rootfs is transitioned into a new version of it) while
keeping some select resources pinned, very similar to how we
implement a switch root. Thus it is useful to allow services to exit,
while leaving their fds around till the very end.
This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reuses
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.
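A hedged configuration sketch of how this could look in a unit file (the
daemon name and values are illustrative, not from the commit):

    [Service]
    ExecStart=/usr/bin/my-daemon
    FileDescriptorStoreMax=16
    FileDescriptorStorePreserve=yes
    RuntimeDirectory=my-daemon
    RuntimeDirectoryPreserve=yes

Here the fd store would survive past a plain restart and stay around until
the unit is fully unloaded, mirroring the RuntimeDirectoryPreserve= semantics
the setting is modelled after.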
|
|
|
|
|
|
|
|
|
|
| |
Follow-up for #26902 and #26971.
Let's always calculate the next restart interval,
since that's more useful.
For that, we add 1 to s->n_restarts unconditionally,
and rename the RestartUSecCurrent property to RestartUSecNext.
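As a usage note, the renamed property should be queryable the usual way,
e.g. "systemctl show -p RestartUSecNext foo.service" (unit name illustrative).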
|
|\
| |
| | |
pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service…
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When a service deactivates and is then automatically restarted via
Restart= we currently quickly transition through
SERVICE_DEAD/SERVICE_FAILED, which is weird given that it's not the
normal ("permanent") dead/failed state, but a transitory one that we
immediately leave again. We do this so that software that looks for
failures/successes can take notice, even if we restart as a consequence
of the deactivation.
Let's clean this up a bit: let's introduce two new states:
SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART
that are used for the transitory states. Both the SERVICE_DEAD and
SERVICE_DEAD_BEFORE_AUTO_RESTART will map to the high-level
UNIT_INACTIVE state though. (and similar for the respective failed
states). This means the high-level state machine won't change by this,
only the low-level one.
This clearly separates the substates, which makes the state engine
cleaner, and allows clients to follow precisely whether we are in a
transitory dead/failed state, or a permanent one, by looking at the
service substate. Moreover, it allows us to remove the 'n_keep_fd_store'
counter, which so far we used to ensure the fd store was not released during
the transitory dead/failed state but only during the permanent one. Since we
can now distinguish these states properly, we can just use that.
This has been bugging me for a while. Let's clean this up.
Note that the unit restart logic is already nicely covered in the
testsuite, hence this adds no new tests for that.
And yes, this could be considered a compat break, but so far we took the
liberty to make changes to the low-level state machine (i.e. SERVICE_xyz
states, sometimes called "substates") without considering this a bad
breakage; the high-level state machine (i.e. UNIT_xyz states) should
be considered API that cannot be changed.
|
|\ \
| |/
|/| |
core: Introduce unit private exec runtime
|
| |
| |
| |
| |
| | |
This is just another piece of runtime data so let's store it in
ExecRuntime alongside the other runtime data.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, exec runtimes can be shared between units (using
JoinsNamespaceOf=). Let's introduce a concept of a private exec
runtime that isn't shared with JoinsNamespaceOf=. The existing
ExecRuntime struct is renamed to ExecRuntimeShared and becomes a
private member of the new private ExecRuntime.
|
| |
| |
| |
| | |
Preparation for the next commit
|
|/
|
|
|
|
|
|
|
|
| |
interval between restarts
RestartSteps= accepts a positive integer as the number of steps
to take to increase the interval between auto-restarts from
RestartSec= to RestartSecMax=, or 0 to disable it.
Closes #6129
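An illustrative snippet, using the option names as spelled in this commit
message (they may have been renamed before release):

    [Service]
    ExecStart=/usr/bin/my-daemon
    Restart=on-failure
    RestartSec=100ms
    RestartSteps=10
    RestartSecMax=1min

With such a configuration the delay between automatic restarts would grow
in 10 steps from 100ms up to 1 minute.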
|
|
|
|
| |
Closes #25963
|
|
|
|
| |
Fixes: #6162
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To notify users of kill events from systemd-oomd we now use
`SERVICE_FAILURE_OOM_KILL` as the failure result.
`unit_check_oomd_kill` now calls `notify_cgroup_oom` to
update the service result to `oom-kill`.
We add a new xattr `user.oomd_ooms` to keep track of the OOM kills
initiated by systemd-oomd, this helps us resolve a race between sending
SIGKILL to processes and checking for OOM kill status from the xattr.
Related to: #20649
|
|
|
|
|
|
|
|
|
|
|
|
| |
A first step towards removing blocking calls to the D-Bus broker from PID 1.
There's a lot more to go (i.e. grep src/core/ for sd_bus_creds
basically), but it's a start.
Removing blocking calls to the D-Bus broker deals systematically with
deadlocks caused by dbus-daemon blocking on synchronous IPC calls back
to PID 1 (e.g. Varlink calls through nss-systemd). Bugs such as #15316.
Also-see: https://github.com/systemd/systemd/pull/22038#issuecomment-1042958390
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For per-connection socket instances we currently maintain three fields
related to the socket: a reference to the Socket unit, the connection fd,
and a reference to the SocketPeer object that counts socket peers.
Let's synchronize their lifetime, i.e. always set all three of them
together or unset them together, so that their reference counters stay
in sync.
This will in particular ensure that we'll drop the SocketPeer reference
whenever we leave an active state of the service unit, i.e. at the same
time we close the fd for it.
Fixes: #20685
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This introduces `ExitType=main|cgroup` for services.
Similar to how `Type` specifies the launch of a service, `ExitType` is
concerned with how systemd determines that a service exited.
- If set to `main` (the current behavior), the service manager will consider
the unit stopped when the main process exits.
- The `cgroup` exit type is meant for applications whose forking model is not
known ahead of time and which might not have a specific main process.
The service will stay running as long as at least one process in the cgroup
is running. This is intended for transient or automatically generated
services, such as graphical applications inside of a desktop environment.
Motivation for this is #16805. The original PR (#18782) was reverted (#20073)
after realizing that the exit status of "the last process in the cgroup" can't
reliably be known (#19385).
This version instead uses the main process exit status if there is one and just
listens to the cgroup empty event otherwise.
The advantages of a service with `ExitType=cgroup` over scopes are:
- Integrated logging / stdout redirection
- Avoids the race / synchronisation issue between launch and scope creation
- More extensive use of drop-ins and thus distro-level configuration:
by moving from scopes to services we can have drop-ins that will affect
properties that can only be set during service creation,
like `OOMPolicy` and security-related properties
- It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously
only services can be used in the generator, not scopes.
[1] https://bugs.kde.org/show_bug.cgi?id=433299
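A minimal sketch of a unit opting into the new behaviour (application name
hypothetical):

    [Service]
    ExitType=cgroup
    ExecStart=/usr/bin/my-graphical-app

Such a service would be considered running as long as any process remains in
its cgroup, and the main process exit status is used for the result when it
is known.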
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit cb0e818f7cc2499d81ef143e5acaa00c6e684711.
After this was merged, some design and implementation issues were discovered,
see the discussion in #18782 and #19385. They certainly can be fixed, but so
far nobody has stepped up, and we're nearing a release. Hopefully, this feature
can be merged again after a rework.
Fixes #19345.
|
|
|
|
|
|
|
| |
We should test that both serialization and deserialization work properly.
But the serialization/deserialization code is deeply entwined with the
manager state, and I think quite a bit of refactoring will be required before
this is possible. But let's at least add this simple test for now.
|
| |
|
|
|
|
|
|
|
|
|
| |
As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617.
This does not touch anything exposed in src/systemd. Changing the defines there
would be a compatibility break.
Note that tests are broken after this commit. They will be fixed in the next one.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The usual behaviour when a timeout expires is to terminate/kill the
service. This is what users usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.
This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:
* 'terminate': the default behaviour, as it has always been,
* 'abort': triggers the watchdog machinery and will send SIGABRT
(unless WatchdogSignal= was changed), and
* 'kill': directly sends SIGKILL.
To handle the stop failure mode in the stop-post state too, a new
final-watchdog state needs to be introduced.
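An illustrative unit using the new options (daemon name and timeout values
are hypothetical):

    [Service]
    ExecStart=/usr/bin/my-daemon
    TimeoutStartSec=30s
    TimeoutStartFailureMode=abort
    TimeoutStopSec=30s
    TimeoutStopFailureMode=kill

With this, a hanging start would trigger the watchdog/SIGABRT path (useful
for core dumps), while a hanging stop would be SIGKILLed right away.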
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Suppose a service has WatchdogSec set to 2 seconds in its unit file. I
then start the service and WatchdogUSec is set correctly:
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=2s
Now I call `sd_notify(0, "WATCHDOG_USEC=10000000")`. The new timer seems
to have taken effect, since I only send `WATCHDOG=1` every 4 seconds,
and systemd isn't triggering the watchdog handler. However, `systemctl
show` still shows WatchdogUSec as 2s:
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=2s
This seems surprising, since this "original" watchdog timer isn't the
one taking effect any more. This patch makes it so that we instead
display the new watchdog timer after sd_notify(WATCHDOG_USEC):
% systemctl --user show psi-notify -p WatchdogUSec
WatchdogUSec=10s
Fixes #15726.
|
|
|
|
|
| |
This replaces manual string splitting and unescaping with
extract_first_word.
|
| |
|
|
|
|
| |
Closes #10596
|
|
|
|
|
|
| |
The implementation is pretty straightforward: when we get a request to
clean some type of resources, we fork off a process doing that, and while
it is running we are in the "cleaning" state.
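For example, a request such as "systemctl clean --what=cache foo.service"
(unit name illustrative) would put the unit into this "cleaning" state while
the forked-off worker removes the corresponding resources.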
|
|
|
|
| |
Closes #11654
|
| |
|
|
|
|
| |
Follow-up for dc653bf487bae9d1ddf794442bf4176fee173b41 (#11211).
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When shooting down a service with SIGABRT the user might want to have a
much longer stop timeout than on regular stops/shutdowns. Especially in
the face of short stop timeouts the time might not be sufficient to
write huge core dumps before the service is killed.
This commit adds a dedicated (Default)TimeoutAbortSec= timer that is
used when stopping a service via SIGABRT. In all other cases the
existing TimeoutStopSec= is used. The timer value is unset by default,
in which case the special handling is skipped and TimeoutStopSec= is used
for the 'stop-watchdog' state, keeping the old behaviour.
If the service is in the 'stop-watchdog' state and should then be
stopped explicitly, we still go to 'stop-sigterm' and re-apply the usual
TimeoutStopSec= timeout.
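A hedged example of a unit that opts into the longer abort timeout (values
illustrative):

    [Service]
    ExecStart=/usr/bin/my-daemon
    WatchdogSec=10s
    TimeoutStopSec=10s
    TimeoutAbortSec=10min

Here a watchdog-triggered SIGABRT would get up to 10 minutes to write its
core dump, while regular stops still time out after 10 seconds.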
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new per-service OOMPolicy= (along with a global
DefaultOOMPolicy=) that controls what to do if a process of the service
is killed by the kernel's OOM killer. It has three different values:
"continue" (old behaviour), "stop" (terminate the service), "kill" (let
the kernel kill all the service's processes).
On top of that, track OOM killer events per unit: generate a per-unit
structured, recognizable log message when we see an OOM killer event,
and put the service in a failure state if an OOM killer event was seen
and the selected policy was not "continue". A new "result" is defined
for this case: "oom-kill".
All of this relies on new cgroupv2 kernel functionality: the
"memory.events" notification interface and the "memory.oom.group"
attribute (which makes the kernel kill all cgroup processes
automatically).
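A small illustrative configuration (service name hypothetical):

    [Service]
    ExecStart=/usr/bin/my-daemon
    OOMPolicy=stop

and, for the global default, something along the lines of:

    # /etc/systemd/system.conf
    [Manager]
    DefaultOOMPolicy=continue

With OOMPolicy=stop, the service is terminated and marked failed with the new
"oom-kill" result once an OOM killer event is seen for one of its processes.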
|
|
|
|
|
|
| |
cache lines
Some "pahole" spelunking.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
that
When we start a service process we pass the selected watchdog timeout to
it with the $WATCHDOG_USEC environment variable. If the unit file is
reconfigured later, we need to make sure to continue to honour the
original timeout, i.e. what $WATCHDOG_USEC was set to, otherwise we'll
expect the ping at a different time than the service process is sending it
to us.
Hence, whenever we start a unit, save the watchdog timeout, and stick to
that for everything we do.
Fixes: #9467
|
|
|
|
| |
Let's better be safe than sorry, and put a limit on what we receive.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Users are often surprised that "systemd-run" command lines like
"systemd-run -p User=idontexist /bin/true" will return successfully,
even though the logs show that the process couldn't be invoked, as the
user "idontexist" doesn't exist. This is because Type=simple will only
wait until fork() succeeded before returning start-up success.
This patch adds a new service type Type=exec, which is very similar to
Type=simple, but waits until the child process completed the execve()
before returning success. It uses a pipe that has O_CLOEXEC set for this
logic, so that the kernel automatically sends POLLHUP on it when the
execve() succeeded but leaves the pipe open if not. This means PID 1
waits exactly until the execve() succeeded in the child, and not longer
and not shorter, which is the desired functionality.
Making use of this new functionality, the command line
"systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail,
as expected.
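The unit-file equivalent of the systemd-run example above would be something
like (hypothetical unit):

    [Service]
    Type=exec
    User=idontexist
    ExecStart=/bin/true

With Type=simple such a unit would report start-up success despite never
executing anything; with Type=exec the start job fails because the execve()
never succeeds.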
|
|
|
|
|
|
|
|
|
|
|
| |
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and the git repository, much more accurate and fine-grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times when version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
Hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously we were a bit sloppy with the index and size types of arrays:
we'd regularly use unsigned. While I don't think this ever resulted in
real issues I think we should be more careful there and follow a
stricter regime: unless there's a strong reason not to use size_t for
array sizes and indexes, size_t it should be. Any allocations we do
ultimately will use size_t anyway, and converting back and forth between
unsigned and size_t will always be a source of problems.
Note that on 32bit machines "unsigned" and "size_t" are equivalent, and
on 64bit machines our arrays shouldn't grow that large anyway, and if
they do we have a problem. However, we usually have protections against
that kind of overly large allocation, but not so much against overflows,
hence let's add them.
So yeah, it's a story of the current code being already "good enough",
but I think some extra type hygiene is better.
This patch tries to be comprehensive, but it probably isn't and I missed
a few cases. But I guess we can cover that later as we notice it. Among
smaller fixes, this changes:
1. strv_length()'s return type becomes size_t
2. the unit file changes array size becomes size_t
3. DNS answer and query array sizes become size_t
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
|
|
|
|
|
|
|
|
|
|
| |
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc.) are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd be nice
to obtain explicit acks from all involved authors before doing that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes.
If A stops before B's restart timeout expires, it'll be started again as part
of B's dependent jobs. However, if stopping takes longer than the timeout, the
running stop job collides with the start job, which also cancels B's start
job. The result is that neither A nor B is active.
Currently, when a service with automatic restarting fails, it transitions
through following states:
1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
2) SERVICE_AUTO_RESTART while restart timer is running.
The StopWhenUnneeded= check takes place in service_enter_dead between the two
states mentioned above. We temporarily store the auto restart flag to query it
during the check. Because we don't return control to the main event loop, this
new service unit flag needn't be serialized.
This patch prevents the pathological situation where a service with Restart=
won't restart automatically. As a side effect it also avoids restarting the
dependency unit with StopWhenUnneeded=yes.
Fixes: #7377
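An illustrative pair of units matching the scenario (names and values
hypothetical):

    # a.service
    [Unit]
    StopWhenUnneeded=yes
    [Service]
    ExecStart=/usr/bin/service-a

    # b.service
    [Unit]
    Requires=a.service
    After=a.service
    [Service]
    ExecStart=/usr/bin/service-b
    Restart=on-failure
    RestartSec=5s

If service-b fails and stopping service-a takes longer than the 5s restart
delay, the colliding jobs previously left both units inactive; with this
patch the restart of b (and hence a) proceeds as expected.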
|
|\
| |
| | |
Fix delegation in the unified hierarchy + more cgroup work
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This introduces a new function unit_prepare_exec() that encapsulates a
number of calls we do in preparation for spawning off some processes in
all our unit types that do so.
This allows us to neatly unify a bit of code between unit types and
shorten our code.
|
|/
|
|
|
| |
All kinds of units can fail, hence it makes sense to offer this as a
generic concept for all unit types.
|