| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
| |
Avoids the need to maintain the same list over and over again, and
link it to the defition table in the implementation as a reminder
too
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
These operations might require slow I/O, and thus might block PID1's main
loop for an undeterminated amount of time. Instead of performing them
inline, fork a worker process and stash away the D-Bus message, and reply
once we get a SIGCHILD indicating they have completed. That way we don't
break compatibility and callers can continue to rely on the fact that when
they get the method reply the operation either succeeded or failed.
To keep backward compatibility, unlike reload control processes, these
are ran inside init.scope and not the target cgroup. Unlike ExecReload,
this is under our control and is not defined by the unit. This is necessary
because previously the operation also wasn't ran from the target cgroup,
so suddenly forking a copy-on-write copy of pid1 into the target cgroup
will make memory usage spike, and if there is a MemoryMax= or MemoryHigh=
set and the cgroup is already close to the limit, it will cause an OOM
kill, where previously it would have worked fine.
|
| |
|
| |
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This commit overhauls the way freeze/thaw works recursively:
First, it introduces new FreezerActions that are like the existing
FREEZE and THAW but indicate that the action was initiated by a parent
unit. We also refactored the code to pass these FreezerActions through
the whole call stack so that we can make use of them. FreezerState was
extended similarly, to be able to differentiate between a unit that's
frozen manually and a unit that's frozen because a parent is frozen.
Next, slices were changed to check recursively that all their child
units can be frozen before it attempts to freeze them. This is different
from the previous behavior, that would just check if the unit's type
supported freezing at all. This cleans up the code, and also ensures
that the behavior of slices corresponds to the unit's actual ability
to be frozen
Next, we make it so that if you FREEZE a slice, it'll PARENT_FREEZE
all of its children. Similarly, if you THAW a slice it will PARENT_THAW
its children.
Finally, we use the new states available to us to refactor the code
that actually does the cgroup freezing. The code now looks at the unit's
existing freezer state and the action being requested, and decides what
next state is most appropriate. Then it puts the unit in that state.
For instance, a RUNNING unit with a request to PARENT_FREEZE will
put the unit into the PARENT_FREEZING state. As another example, a
FROZEN unit who's parent is also FROZEN will transition to
PARENT_FROZEN in response to a request to THAW.
Fixes https://github.com/systemd/systemd/issues/30640
Fixes https://github.com/systemd/systemd/issues/15850
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 1483892a421ca34bc841a8e8b1f385744c0407ed.
As the commit says, it does not solve the race. Moreover, it introduces
an regression #28410.
Also, checking by `path_is_mount_point()` may trigger automount. From
statx(2),
> AT_NO_AUTOMOUNT
> Don't automount the terminal ("basename") component of pathname
> if it is a directory that is an automount point.
Similar statements can be found in fstatat(2), which is used in the
fallback call for statx() in glibc, and name_to_handle_at(2), which is
used as the fallback when statx() failed.
So, `path_is_mount_point()` may _do_ trigger automount for parent paths.
That should be avoided especially on shutdown.
The original issue #25527 that is 'fixed' by the commit is not serious,
and should be fixed by making umount command handle path gracefully:
https://github.com/util-linux/util-linux/issues/2132
Fixes #28410.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new job mode will enqueue a start job for a unit, and all units
depending on the unit will get a restart job enqueued. This is then used
for automatic sevice restarts: the unit itself is only started, the
depending units restarted. This way the unit will not go down
unnecessarily, triggering OnSuccess= needlessly.
This also introduces a new state SERVICE_AUTO_RESTART_QUEUED that is
entered once the restart jobs are enqueued. Previously we'd stay in
SERVICE_AUTO_RESTART, but that's problematic, since we'd lose
information whether we still need to enqueue the restart job during a
serialization/deserialization cycle or not. By having an explicit state
for this we know exactly whether we still need to enqueue the job or
not. It's also good since when we are in SERVICE_AUTO_RESTART_QUEUED we
want to act on unit_start(), but on SERVICE_AUTO_RESTART we want to wait
for the holdoff time to pass before we act on unit_start().
Fixes: #27722
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Notifications from /proc/self/mountinfo are async, so if we stop a
service (and while doing so get rid of the credentials mount point of
it), then it will take a while until the notification reaches us and we
actually scan the table again. In particular as we nowadays ratelimit
notifications on the table, since it's so inefficient. And as I learnt
the ratelimiting is actually quite regularly hit during shutdown, where
a flurry of umount events are genreated. Hence, let's check if a mount
point is actually a mountpoint before trying to unmount it. And if it
isn't let's wait for the notification to come in.
(This race might be triggred not just by us on ourselves btw: there are
other daemons that unmount stuff when stopping where the race also
exists, but might simply be harder to trigger: if during service
shutdown these services remove some mount then they might collide with
us doing the same. After all, we have the rule to unmount everything
mounted automatically for you during shutdown.)
In the long run we should also start making us of this when it becomes
available: https://github.com/util-linux/util-linux/issues/2132 With
that we can make issues like this go away entirely from our side of
things at least.
Fixes: #25527
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Oftentimes it is useful to allow the per-service fd store to survive
longer than for a restart. This is useful in various scenarios:
1. An fd to some security relevant object needs to be stashed somewhere,
that should not be cleaned automatically, because the security
enforcement would be dropped then.
2. A user namespace fd should be allocated on first invocation and be
kept around until the user logs out (i.e. systemd --user ends), á la
#16328 (This does not implement what #16318 asks for, but should
solve the use-case discussed there.)
3. There's interest in allow a concept of "userspace reboots" where the
kernel stays running, and userspace is swapped out (i.e. all services
exit, and the rootfs transitioned into a new version of it) while
keeping some select resources pinned, very similar to how we
implement a switch root. Thus it is useful to allow services to exit,
while leaving their fds around till the very end.
This is exposed through a new FileDescriptorStorePreserve= setting that
is closely modelled after RuntimeDirectoryPreserve= (in fact it reused
the same internal type), since we want similar behaviour in the end, and
quite often they probably want to be used together.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a service deactivates and is then automatically restarted via
Restart= we currently quickly transition through
SERVICE_DEAD/SERVICE_FAILED. Which is weird given it's not the
normal ("permanent") dead/failed state, but a transitory one we
immediately leave from again. We do this so that software that looks for
failures/successes can take notice, even if we restart as a consequence
of the deactivation.
Let's clean this up a bit: let's introduce two new states:
SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART
that are used for the transitory states. Both the SERVICE_DEAD and
SERVICE_DEAD_BEFORE_AUTO_RESTART will map to the high-level
UNIT_INACTIVE state though. (and similar for the respective failed
states). This means the high-level state machine won't change by this,
only the low-level one.
This clearly seperates the substates, which makes the state engine
cleaner, and allows clients to follow precisely whether we are in a
transitory dead/failed state, or a permanent one, by looking at the
service substate. Moreover it allows us to remove the 'n_keep_fd_store'
which so far we used to ensure the fdstore was not released during this
transitory dead/failed state but only during the permanent one. Since we
can now distinguish these states properly we can just use that.
This has been bugging me for a while. Let's clean this up.
Note that the unit restart logic is already nicely covered in the
testsiute, hence this adds no new tests for that.
And yes, this could be considered a compat break, but sofar we took the
liberty to make changes to the low-level state machine (i.e. SERVICE_xyz
states, sometimes called "substates") without considering this a bad
breakage – the high-level state machine (i.e. UNIT_xyz states) should
be considered API that cannot be changed.
|
|
|
|
| |
Fixes: #6162
|
|
|
|
|
|
|
|
|
| |
Previously it was possible to set delegate property for scope, but you
were not able to allow unprivileged process to manage the scope's cgroup
hierarchy. This is useful when launching manager process that will run
unprivileged but is supposed to manage its own (scope) sub-hierarchy.
Fixes #21683
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is like a really strong version of Wants=, that keeps starting the
specified unit if it is ever found inactive.
This is an alternative to Restart= inside a unit, acknowledging the fact
that whether to keep restarting the unit is sometimes not a property of
the unit itself but the state of the system.
This implements a part of what #4263 requests. i.e. there's no
distinction between "always" and "opportunistic". We just dumbly
implement "always" and become active whenever we see no job queued for
an inactive unit that is supposed to be upheld.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is similar to OnFailure= but is activated whenever a unit returns
into inactive state successfully.
I was always afraid of adding this, since it effectively allows building
loops and makes our engine Turing complete, but it pretty much already
was it was just hidden.
Given that we have per-unit ratelimits as well as an event loop global
ratelimit I feel safe to add this finally, given it actually is useful.
Fixes: #13386
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This takes inspiration from PropagatesReloadTo=, but propagates
stop jobs instead of restart jobs.
This is defined based on exactly two atoms: UNIT_ATOM_PROPAGATE_STOP +
UNIT_ATOM_RETROACTIVE_STOP_ON_STOP. The former ensures that when the
unit the dependency is originating from is stopped based on user
request, we'll propagate the stop job to the target unit, too. In
addition, when the originating unit suddenly stops from external causes
the stopping is propagated too. Note that this does *not* include the
UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT atom (which is used by BoundBy=),
i.e. this dependency is purely about propagating "edges" and not
"levels", i.e. it's about propagating specific events, instead of
continious states.
This is supposed to be useful for dependencies between .mount units and
their backing .device units. So far we either placed a BindsTo= or
Requires= dependency between them. The former gave a very clear binding
of the to units together, however was problematic if users establish
mounnts manually with different block device sources than our
configuration defines, as we there might come to the conclusion that the
backing device was absent and thus we need to umount again what the user
mounted. By combining Requires= with the new StopPropagatedFrom= (i.e.
the inverse PropagateStopTo=) we can get behaviour that matches BindsTo=
in every single atom but one: UNIT_ATOM_CANNOT_BE_ACTIVE_WITHOUT is
absent, and hence the level-triggered logic doesn't apply.
Replaces: #11340
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's add an implicit reverse dep OnFailureOf=. This is exposed via the
bus to make things more debuggable: you can now ask systemd for which
units a specific unit is the failure handler.
OnFailure= was the only dependency type that had no inverse, this fixes
that.
Now that deps are a bit cheaper, it should be OK to add deps that only
serve debug purposes.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The slice a unit is assigned to is currently a UnitRef reference. Let's
turn it into a proper dependency, to simplify and clean up code a bit.
Now that new dep types are cheaper, deps should generally be preferable
over everything else, if the concept applies.
This brings one major benefit: we often have to iterate through all unit
a slice contains. So far we iterated through all Before= dependencies of
the slice unit to achieve that, filtering out unrelated units, and
taking benefit of the fact that slice units are implicitly ordered
Before= the units they contain. By making Slice= a proper dependency,
and having an accompanying SliceOf= dependency type, this is much
simpler and nicer as we can directly enumerate the units a slice
contains.
The forward dependency is actually called InSlice internally, since we
already used the UNIT_SLICE name as UnitType field. However, since we
don't intend to expose the dependency to users as dep anyway (we already
have the regular Slice D-Bus property for this) this shouldn't matter.
The SliceOf= implicit dependency type (the erverse of Slice=/InSlice=)
is exported over the bus, to make things a bit nicer to debug and
discoverable.
|
|
|
|
|
| |
Let's make this a bit more readable by implementing this via a
translation table, indexed by the state.
|
|
|
|
|
|
|
|
|
| |
The property is never set by systemd, only reset after a stop or restart or
reload. It may externally be set to mark the unit for a later restart/reload.
I wasn't sure whether to configure the property only for the types where this
makes sense (Service, Swap, etc). But Restart() method is defined on the unit,
and also having this always under the same property name is more convenient.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The usual behaviour when a timeout expires is to terminate/kill the
service. This is what user usually want in production systems. To debug
services that fail to start/stop (especially sporadic failures) it
might be necessary to trigger the watchdog machinery and write core
dumps, though. Likewise, it is usually just a waste of time to
gracefully stop a stuck service. Instead it might save time to go
directly into kill mode.
This commit adds two new options to services: TimeoutStartFailureMode=
and TimeoutStopFailureMode=. Both take the same values and tweak the
behavior of systemd when a start/stop timeout expires:
* 'terminate': is the default behaviour as it has always been,
* 'abort': triggers the watchdog machinery and will send SIGABRT
(unless WatchdogSignal was changed) and
* 'kill' will directly send SIGKILL.
To handle the stop failure mode in stop-post state too a new
final-watchdog state needs to be introduced.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
With cgroup v2 the cgroup freezer is implemented as a cgroup
attribute called cgroup.freeze. cgroup can be frozen by writing "1"
to the file and kernel will send us a notification through
"cgroup.events" after the operation is finished and processes in the
cgroup entered quiescent state, i.e. they are not scheduled to
run. Writing "0" to the attribute file does the inverse and process
execution is resumed.
This commit exposes above low-level functionality through systemd's DBus
API. Each unit type must provide specialized implementation for these
methods, otherwise, we return an error. So far only service, scope, and
slice unit types provide the support. It is possible to check if a
given unit has the support using CanFreeze() DBus property.
Note that DBus API has a synchronous behavior and we dispatch the reply
to freeze/thaw requests only after the kernel has notified us that
requested operation was completed.
|
| |
|
| |
|
| |
|
|
|
|
| |
Closes #10596
|
|
|
|
|
| |
It's a special case of strjoin(), so no need to keep both. In particular
as typing strjoin() is even shoert than strappend().
|
|
|
|
|
|
| |
The implementation is pretty straight-foward: when we get a request to
clean some type of resources we fork off a process doing that, and while
it is running we are in the "cleaning" state.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds basic infrastructure to implement a "clean" operation for unit
types. This "clean" operation is supposed to remove on-disk resources of
units, and is supposed to be used in a later commit to clean our
RuntimeDirectory=, StateDirectory= and so on of service units.
Later commits will open this up to the bus, and hook up service units
with this.
This also adds a new generic ActiveState called UNIT_MAINTENANCE. It's
supposed to cover all kinds of "maintainance" state of units.
Specifically, this is supposed to cover the "cleaning" operations later
added for service units which might take a bit of time. This high-level,
generic, abstract state is called UNIT_MAINTENANCE instead of the
more specific "UNIT_CLEANING", since I think this should be kept open
for different operations possibly later on that could be nicely subsumed
under this (for example, maybe a recursive chown()ing operation could be
covered by this, and similar).
|
|
|
|
|
|
|
|
|
|
| |
Allows configuring the watchdog signal (with a default of SIGABRT).
This allows an alternative to SIGABRT when coredumps are not desirable.
Appropriate references to SIGABRT or aborting were renamed to reflect
more liberal watchdog signals.
Closes #8658
|
|
|
|
|
|
|
|
|
|
|
| |
These lines are generally out-of-date, incomplete and unnecessary. With
SPDX and git repository much more accurate and fine grained information
about licensing and authorship is available, hence let's drop the
per-file copyright notice. Of course, removing copyright lines of others
is problematic, hence this commit only removes my own lines and leaves
all others untouched. It might be nicer if sooner or later those could
go away too, making git the only and accurate source of authorship
information.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This part of the copyright blurb stems from the GPL use recommendations:
https://www.gnu.org/licenses/gpl-howto.en.html
The concept appears to originate in times where version control was per
file, instead of per tree, and was a way to glue the files together.
Ultimately, we nowadays don't live in that world anymore, and this
information is entirely useless anyway, as people are very welcome to
copy these files into any projects they like, and they shouldn't have to
change bits that are part of our copyright header for that.
hence, let's just get rid of this old cruft, and shorten our codebase a
bit.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since bb28e68477a3a39796e4999a6cbc6ac6345a9159 parsing failures of
certain unit file settings will result in load failures of units. This
introduces a new load state "bad-setting" that is entered in precisely
this case.
With this addition error messages on bad settings should be a lot more
explicit, as we don't have to show some generic "errno" error in that
case, but can explicitly say that a bad setting is at fault.
Internally this unit load state is entered as soon as any configuration
loader call returns ENOEXEC. Hence: config parser calls should return
ENOEXEC now for such essential unit file settings. Turns out, they
generally already do.
Fixes: #9107
|
|
|
|
|
|
|
|
|
|
| |
Files which are installed as-is (any .service and other unit files, .conf
files, .policy files, etc), are left as is. My assumption is that SPDX
identifiers are not yet that well known, so it's better to retain the
extended header to avoid any doubt.
I also kept any copyright lines. We can probably remove them, but it'd nice to
obtain explicit acks from all involved authors before doing that.
|
|
|
|
|
| |
This follows what the kernel is doing, c.f.
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
|
|
It always bothered me a bit that unit-name.[ch] contains so many
definitions that aren't really have much to do with unit nameing, for
example all the unit state definitions.
With this patch unit-name.[ch] is split into two: the file now contains
only the unit naming related operations, and everything else is split
out into a new set of files unit-def.[ch]. That's mostly unit state
stuff as well as dbus path and interface name operations.
No functional changes. This just moves code around.
(Note as both .c files include each other's headers this doesn't make
the build simpler or anything. All it does is make the C files a bit
shorter, and medicate my pretend OCD)
|