path: root/src/core/scope.c
Commit message (Author, Age, Files, Lines)
* scope: refuse activation of scopes if no PIDs to add are left (Lennart Poettering, 2021-10-27, 1 file, -0/+6)
| | | | | | | | | | If all processes we are supposed to add are gone by the time we are ready to do so, let's fail. This is heavily based on Cunlong Li's work, who thankfully tracked this down. Replaces: #20577
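The check described above can be sketched roughly as follows. This is a minimal illustration, not the code from scope.c: the helper name `scope_count_live_pids()` and the `-ENOENT` return value are assumptions made for the example.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical helper: returns how many of the given PIDs still exist,
 * or -ENOENT if none are left (the error code is an assumption). */
static int scope_count_live_pids(const pid_t *pids, size_t n) {
        int alive = 0;

        for (size_t i = 0; i < n; i++) {
                if (pids[i] <= 0)
                        continue; /* never probe process groups */

                /* Signal 0 only performs the existence/permission check.
                 * EPERM means the process exists but we may not signal it. */
                if (kill(pids[i], 0) == 0 || errno == EPERM)
                        alive++;
        }

        return alive > 0 ? alive : -ENOENT;
}
```

A caller would refuse to start the scope when the helper returns a negative value, instead of creating an empty cgroup.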
* core: fix the return type for xxx_running_timeout() functions (Frantisek Sumsal, 2021-09-29, 1 file, -1/+1)
| | | | | | | | otherwise we might return an invalid value, since `usec_t` is 64-bit, whereas `int` might not be. Follow-up to: 5918a93 Fixes: #20872
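The truncation this commit guards against can be reproduced with a small sketch. The two function names below are hypothetical, and the example assumes a platform where `int` is 32 bits wide.

```c
#include <stdint.h>

typedef uint64_t usec_t; /* 64-bit microsecond type, as the message describes */

/* Buggy variant: the 64-bit value is squeezed through int on return. */
static int running_timeout_as_int(usec_t deadline) {
        return (int) deadline;
}

/* Fixed variant: the return type matches the type of the value. */
static usec_t running_timeout_as_usec(usec_t deadline) {
        return deadline;
}
```

With a deadline of 5,000,000,000 µs (5000 s), the first variant cannot return the original value on such a platform, while the second one passes it through unchanged.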
* core: implement RuntimeMaxDeltaSec directive (Albert Brox, 2021-09-28, 1 file, -4/+22)
|
* Drop the text argument from assert_not_reached() (Zbigniew Jędrzejewski-Szmek, 2021-08-03, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | In general we almost never hit those asserts in production code, so users see them very rarely, if ever. But either way, we just need something that users can pass to the developers. We have quite a few of those asserts, and some have fairly nice messages, but many are like "WTF?" or "???" or "unexpected something". The error that is printed includes the file location, and function name. In almost all functions there's at most one assert, so the function name alone is enough to identify the failure for a developer. So we don't get much extra from the message, and we might just as well drop them. Dropping them makes our code a tiny bit smaller, and most importantly, improves development experience by making it easy to insert such an assert in the code without thinking how to phrase the argument.
* core: align string tables (Zbigniew Jędrzejewski-Szmek, 2021-07-19, 1 file, -2/+2)
|
* tree-wide: add FORMAT_TIMESPAN() (Zbigniew Jędrzejewski-Szmek, 2021-07-09, 1 file, -2/+1)
|
* core: disable event sources before unreffing them (Zbigniew Jędrzejewski-Szmek, 2021-05-12, 1 file, -2/+2)
| | | | | | | | | | | | | | | | This mirrors the change done for systemd-resolved in 97935302283729c9206b84f5e00b1aff0f78ad19. Quoting that patch: > We generally operate on the assumption that a source is "gone" as soon as we > unref it. This is generally true because we have the only reference. But if > something else holds the reference, our unref doesn't really stop the source > and it could fire again. In particular, we take temporary references from sd-event code, and when called from an sd-event callback, we could temporarily see this elevated reference count. This patch doesn't seem to change anything, but I think it's nicer to do the same change as in other places and not rely on _unref() immediately disabling the source.
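The concern quoted above can be illustrated with a toy reference-counted source. This is a simplified model with hypothetical names (`ToySource`, `toy_source_*`), not the sd-event data structures: it only shows why a plain unref leaves the source armed while another reference is held.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for an event source; not the sd-event implementation. */
typedef struct ToySource {
        unsigned n_ref;
        bool enabled;
} ToySource;

static ToySource *toy_source_new(void) {
        ToySource *s = calloc(1, sizeof(ToySource));
        if (!s)
                return NULL;
        s->n_ref = 1;
        s->enabled = true;
        return s;
}

static ToySource *toy_source_ref(ToySource *s) {
        s->n_ref++;
        return s;
}

/* Drops one reference and frees the source when the last one is gone.
 * Always returns NULL so callers can reset their pointer in one step. */
static ToySource *toy_source_unref(ToySource *s) {
        if (s && --s->n_ref == 0)
                free(s);
        return NULL;
}

/* Disable first, then drop our reference: the source cannot fire again
 * even if someone else still holds a temporary reference. */
static ToySource *toy_source_disable_unref(ToySource *s) {
        if (s)
                s->enabled = false;
        return toy_source_unref(s);
}
```

If a dispatcher holds a temporary reference, `toy_source_unref()` leaves `enabled` set, whereas `toy_source_disable_unref()` clears it regardless of the reference count.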
* scope: on unified, make sure to unwatch all PIDs once they've been moved to the cgroup scope (Franck Bui, 2020-12-01, 1 file, -5/+7)
| | | | | | | | | | | Commit 428a9f6f1d0396b9eacde2b38d667cbe3f15eb55 freed u->pids which is problematic since the references to this unit in m->watch_pids were no longer removed when the unit was freed. This patch makes sure to clean all these refs up before freeing u->pids by calling unit_unwatch_all_pids().
* core/scope: use set_ensure_put() (Yu Watanabe, 2020-11-27, 1 file, -5/+1)
|
* core: use SYNTHETIC_ERRNO() macro (Yu Watanabe, 2020-11-27, 1 file, -4/+2)
|
* core: serialize u->pids until the processes have been moved to the scope cgroup (Franck Bui, 2020-11-20, 1 file, -2/+35)
| | | | | | Otherwise if a daemon-reload happens somewhere between the enqueue of the job start for the scope unit and scope_start() then u->pids might be lost and none of the processes specified by "PIDs=" will be moved into the scope cgroup.
* license: LGPL-2.1+ -> LGPL-2.1-or-later (Yu Watanabe, 2020-11-09, 1 file, -1/+1)
|
* core: clean up inactive/failed {service|scope}'s cgroups when the last process exits (Anita Zhang, 2020-10-27, 1 file, -0/+5)
| | | | | | | | | | | | | | If processes remain in the unit's cgroup after the final SIGKILL is sent and the unit has exceeded stop timeout, don't release the unit's cgroup information. Pid1 will have failed to `rmdir` the cgroup path due to processes remaining in the cgroup and releasing would leave the cgroup path on the file system with no tracking for pid1 to clean it up. Instead, keep the information around until the last process exits and pid1 sends the cgroup empty notification. The service/scope can then prune the cgroup if the unit is inactive/failed.
* core: add ManagedOOM*= properties to configure systemd-oomd on the unit (Anita Zhang, 2020-10-08, 1 file, -0/+1)
| | | | | This adds the hook ups so it can be read with the usual systemd utilities. Used in later commits by systemd-oomd.
* pid1: convert to the new scheme (Zbigniew Jędrzejewski-Szmek, 2020-05-05, 1 file, -1/+0)
| | | | | | | | In all the other cases, I think the code was clearer with the static table. Here, not so much. And because of the existing dump code, the vtables cannot be made static and need to remain exported. I still think it's worth making the change to have the cmdline introspection, but I'm disappointed with how this came out.
* core: introduce support for cgroup freezer (Michal Sekletár, 2020-04-30, 1 file, -0/+3)
| | | | | | | | | | | | | | | | | | | | With cgroup v2 the cgroup freezer is implemented as a cgroup attribute called cgroup.freeze. A cgroup can be frozen by writing "1" to the file, and the kernel will send us a notification through "cgroup.events" after the operation is finished and the processes in the cgroup have entered a quiescent state, i.e. they are not scheduled to run. Writing "0" to the attribute file does the inverse and process execution is resumed. This commit exposes the above low-level functionality through systemd's DBus API. Each unit type must provide a specialized implementation for these methods, otherwise we return an error. So far only service, scope, and slice unit types provide the support. It is possible to check if a given unit has the support using CanFreeze() DBus property. Note that the DBus API has synchronous behavior and we dispatch the reply to freeze/thaw requests only after the kernel has notified us that the requested operation was completed.
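The low-level write described above might look like the following sketch. The function name `cgroup_set_frozen()` is hypothetical, and the example only performs the write to `cgroup.freeze`; it does not wait for the "cgroup.events" notification that the commit message mentions.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: writes "1" (freeze) or "0" (thaw) to
 * <cgroup_dir>/cgroup.freeze. Returns 0 on success or a negative
 * errno-style value on failure. */
static int cgroup_set_frozen(const char *cgroup_dir, bool freeze) {
        char path[4096];
        FILE *f;
        int n, r = 0;

        n = snprintf(path, sizeof path, "%s/cgroup.freeze", cgroup_dir);
        if (n < 0 || (size_t) n >= sizeof path)
                return -ENAMETOOLONG;

        f = fopen(path, "w");
        if (!f)
                return -errno;

        if (fputs(freeze ? "1" : "0", f) == EOF)
                r = -EIO;

        /* Buffered write errors may only surface when the stream is closed. */
        if (fclose(f) != 0 && r == 0)
                r = -errno;

        return r;
}
```

On a real cgroup v2 hierarchy the caller would pass a directory below /sys/fs/cgroup and needs write access to the attribute file.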
* core: clearly refuse OnFailure= deps on units that can't fail (Lennart Poettering, 2020-01-09, 1 file, -0/+1)
| | | | | | | | | Similarly, refuse triggering deps on units that cannot trigger. And rework how we ignore After= dependencies on device units, to work the same way. See: #14142
* Merge pull request #13423 from pwithnall/12035-session-time-limits (Zbigniew Jędrzejewski-Szmek, 2019-10-28, 1 file, -7/+34)
|\ | | | | Add `RuntimeMaxSec=` support to scope units (time-limited login sessions)
| * scope: Support RuntimeMaxSec= directive in scope units (Philip Withnall, 2019-10-28, 1 file, -2/+17)
| | | | | | | | | | | | | | | | | | | | | | | | Just as `RuntimeMaxSec=` is supported for service units, add support for it to scope units. This will gracefully kill a scope after the timeout expires from the moment the scope enters the running state. This could be used for time-limited login sessions, for example. Signed-off-by: Philip Withnall <withnall@endlessm.com> Fixes: #12035
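The timeout is measured from the moment the scope enters the running state, so the deadline is simply that timestamp plus the configured limit. A minimal sketch, assuming a 64-bit microsecond type where the maximum value means "no limit"; the helper names are hypothetical:

```c
#include <stdint.h>

typedef uint64_t usec_t;
#define USEC_INFINITY ((usec_t) UINT64_MAX) /* assumed "no limit" marker */

/* Overflow-safe addition that saturates at USEC_INFINITY. */
static usec_t usec_add_saturating(usec_t a, usec_t b) {
        if (a == USEC_INFINITY || b == USEC_INFINITY || a > USEC_INFINITY - b)
                return USEC_INFINITY;
        return a + b;
}

/* Deadline = time the scope entered the running state + RuntimeMaxSec=. */
static usec_t scope_runtime_deadline(usec_t running_since, usec_t runtime_max) {
        return usec_add_saturating(running_since, runtime_max);
}
```

A timer armed for this deadline would then stop the scope, and an infinite deadline means no timer is needed.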
| * scope: Refactor timer handling on coldplug (Philip Withnall, 2019-07-29, 1 file, -5/+17)
| | | | | | | | | | | | | | Factor it out into a helper function which is a bit easier to expand in future. This introduces no functional changes. Signed-off-by: Philip Withnall <withnall@endlessm.com>
* | core: adjust load functions for other unit types to be more like service (Zbigniew Jędrzejewski-Szmek, 2019-10-11, 1 file, -15/+20)
| | | | | | | | | | | | | | No functional change, just adjusting code to follow the same pattern everywhere. In particular, never call _verify() on an already loaded unit, but return early from the caller instead. This makes the code a bit easier to follow.
* | core: turn unit_load_fragment_and_dropin_optional() into a flag (Zbigniew Jędrzejewski-Szmek, 2019-10-11, 1 file, -1/+2)
| | | | | | | | | | | | | | | | | | | | unit_load_fragment_and_dropin() and unit_load_fragment_and_dropin_optional() are really the same, with one minor difference in behaviour. Let's drop the second function. "_optional" in the name suggests that it's the "dropin" part that is optional. (Which it is, but in this case, we mean the fragment to be optional.) I think the new version with a flag is easier to understand.
* | cgroup: analyze: Report memory configurations that deviate from systemd (Chris Down, 2019-10-03, 1 file, -1/+1)
|/ | | | | | This is the most basic consumer of the new systemd-vs-kernel checker, both acting as a reasonable standalone exerciser of the code, and also as a way for easy inspection of deviations from systemd internal state.
* core: add new call unit_reset_accounting() (Lennart Poettering, 2019-04-12, 1 file, -2/+1)
| | | | | | | | It's a simple wrapper for resetting both IP and CPU accounting in one go. This will become particularly useful when we also need this to reset IO accounting (to be added in a later commit).
* scope: tiny cleanup: UNIT(s) -> u (Franck Bui, 2019-03-20, 1 file, -4/+4)
| | | | No functional changes.
* core: whenever we change state of a unit, force out PropertiesChanged bus signal (Lennart Poettering, 2018-12-01, 1 file, -0/+4)
| | | | | | | | | | | | | | | | | This allows clients to follow our internal state changes safely. Previously, quick state changes (for example, when we restart a unit due to Restart= after it quickly transitioned through DEAD/FAILED states) would be coalesced into one bus signal event; with this change there's the guarantee that all state changes after the unit was announced once are reflected on the bus. Note we do this kind of guaranteed flushing only for unit state changes, not for other unit property changes, where clients still have to expect coalescing. This is because the unit state is a very important, high-level concept. Fixes: #10185
* core: introduce a helper function to wrap unit_log_{success,failure} (Zbigniew Jędrzejewski-Szmek, 2018-11-16, 1 file, -5/+1)
| | | | | It's inline so that the compiler can easily optimize away the call to get status string.
* core: log a recognizable message when a unit succeeds, too (Lennart Poettering, 2018-11-16, 1 file, -1/+3)
| | | | | | We already are doing it on failure, let's do it on success, too. Fixes: #10265
* core: make log messages about units entering a 'failed' state recognizable (Lennart Poettering, 2018-11-16, 1 file, -1/+1)
| | | | | Let's make this recognizable, and carry result information in a structured fashion.
* core: rework serialization (Lennart Poettering, 2018-10-26, 1 file, -4/+5)
| | | | | | | | | | | | | | | | | Let's be more careful with what we serialize: let's ensure we never serialize strings that are longer than LONG_LINE_MAX, so that we know we can read them back with read_line(…, LONG_LINE_MAX, …) safely. In order to implement this all serialization functions are moved to serialize.[ch], and internally will do line size checks. We'd rather skip a serialization line (with a loud warning) than write an overly long line out. Of course, this is just a second level protection, after all the data we serialize shouldn't be this long in the first place. While we are at it also clean up logging: while serializing make sure to always log about errors immediately. Also, (void)ify all calls we don't expect errors in (or catch errors as part of the general fflush_and_check() at the end).
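The "skip rather than write an overly long line" policy can be sketched as below. The function name `serialize_item_checked()`, its return convention, and the tiny `EXAMPLE_LINE_MAX` cap are assumptions for illustration; the real limit is LONG_LINE_MAX as defined in the systemd sources.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Illustrative cap only; much smaller than any realistic limit. */
#define EXAMPLE_LINE_MAX 64

/* Hypothetical helper: writes "key=value\n", or skips the item with a
 * warning if the resulting line would exceed the cap.
 * Returns 1 if written, 0 if skipped, negative errno-style value on
 * I/O error. */
static int serialize_item_checked(FILE *f, const char *key, const char *value) {
        if (strlen(key) + 1 + strlen(value) > EXAMPLE_LINE_MAX) {
                fprintf(stderr, "Refusing to serialize overly long item '%s'.\n", key);
                return 0;
        }

        if (fprintf(f, "%s=%s\n", key, value) < 0)
                return -EIO;

        return 1;
}
```

Because oversized items never reach the file, a reader that enforces the same cap per line can parse the result without hitting its limit.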
* pid1: drop unused path parameter to add_two_dependencies_by_name() (Zbigniew Jędrzejewski-Szmek, 2018-09-15, 1 file, -1/+1)
|
* tree-wide: remove Lennart's copyright lines (Lennart Poettering, 2018-06-14, 1 file, -3/+0)
| | | | | | | | | | | These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurb (Lennart Poettering, 2018-06-14, 1 file, -2/+0)
| | | | | | | | | | | | | | | | This part of the copyright blurb stems from the GPL use recommendations: https://www.gnu.org/licenses/gpl-howto.en.html The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that. Hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* core: add a couple of more error cases that should result in "bad-setting" (Lennart Poettering, 2018-06-11, 1 file, -1/+1)
| | | | | This changes a number of EINVAL cases to ENOEXEC, so that we enter "bad-setting" state if they fail.
* core: enumerate perpetual units in a separate per-unit-type method (Lennart Poettering, 2018-06-07, 1 file, -2/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously the enumerate() callback defined for each unit type would do two things: 1. It would create perpetual units (i.e. -.slice, system.slice, -.mount and init.scope). 2. It would enumerate units from /proc/self/mountinfo, /proc/swaps and the udev database. With this change these two parts are split into two separate methods: enumerate() now only does #2, while enumerate_perpetual() is responsible for #1. Why make this change? Well, perpetual units should have a slightly different effect than those found through enumeration: as perpetual units should be up unconditionally, perpetually and thus never change state, they should also not pull in deps by their state changing, not even when the state is first set to active. Thus, their state is generally initialized through the per-device coldplug() method in similar fashion to how the deserialized state from a previous run would be put into place. OTOH units found through regular enumeration should result in state changes (and thus pull in deps due to state changes), hence their state should be put in effect in the catchup() method instead. Hence, given this difference, let's also separate the functions, so that the rule is: 1. What is created in enumerate_perpetual() should be started in coldplug(). 2. What is created in enumerate() should be started in catchup().
* core: watch PIDs of scope units right after starting them (Lennart Poettering, 2018-06-05, 1 file, -0/+3)
| | | | | | Scope units don't have a main or control process we can watch, hence let's explicitly watch the PIDs contained in them early on, just to make things more robust and have at least something to watch.
* core: rework how we track service and scope PIDs (Lennart Poettering, 2018-06-05, 1 file, -15/+14)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reworks how systemd tracks processes on cgroupv1 systems where cgroup notification is not reliable. Previously, whenever we had reason to believe that new processes showed up or got removed we'd scan the cgroup of the scope or service unit for new processes, and would tidy up the list of PIDs previously watched. This scanning is relatively slow, and does not scale well. With this change behaviour is changed: instead of scanning for new/removed processes right away we do this work in a per-unit deferred event loop job. This event source is scheduled at a very low priority, so that it is executed when we have time but does not starve other event sources. This has two benefits: this expensive work is coalesced, if events happen in quick succession, and we won't delay SIGCHLD handling for too long. This patch basically replaces all direct invocation of unit_watch_all_pids() in scope.c and service.c with invocations of the new unit_enqueue_rewatch_pids() call which just enqueues a request of watching/tidying up the PID sets (with one exception: in scope_enter_signal() and service_enter_signal() we'll still do unit_watch_all_pids() synchronously first, since we really want to know all processes we are about to kill so that we can track them properly). Moreover, all direct invocations of unit_tidy_watch_pids() and unit_synthesize_cgroup_empty_event() are removed too, when the unit_enqueue_rewatch_pids() call is invoked, as the queued job will run those operations too. All of this is done on cgroupsv1 systems only, and is disabled on cgroupsv2 systems as cgroup-empty notifications are reliable there, and we do not need SIGCHLD events to track processes there. Fixes: #9138
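The coalescing effect of the deferred job can be shown with a toy model. The `ToyUnit` type and the two functions are hypothetical stand-ins; the real code schedules a low-priority event source rather than setting a flag.

```c
#include <stdbool.h>

/* Toy model of a unit with a pending "rewatch PIDs" request. */
typedef struct ToyUnit {
        bool rewatch_queued; /* a deferred job is already pending */
        unsigned n_scans;    /* how many expensive cgroup scans actually ran */
} ToyUnit;

/* Request a rescan. Repeated requests before the job runs collapse into one. */
static void toy_enqueue_rewatch(ToyUnit *u) {
        u->rewatch_queued = true;
}

/* Called by the (imaginary) event loop once nothing more urgent is pending. */
static void toy_dispatch_rewatch(ToyUnit *u) {
        if (!u->rewatch_queued)
                return;

        u->rewatch_queued = false;
        u->n_scans++; /* stands in for: scan cgroup, tidy watched PIDs, check for empty */
}
```

Three requests in quick succession followed by one dispatch result in a single scan, which is the saving the commit message describes.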
* core: don't trigger OnFailure= deps when a unit is going to restart (Lennart Poettering, 2018-06-01, 1 file, -1/+1)
| | | | | | | | | | | | This adds a flags parameter to unit_notify() which can be used to pass additional notification information to the function. We then make the old reload_failure boolean parameter one of these flags, and then add a new flag that lets unit_notify() know whether we are configured to restart the service. Note that this adjusts behaviour of systemd to match what the docs say. Fixes: #8398
* core: enforce that scope units can be started only once (Lennart Poettering, 2018-04-27, 1 file, -0/+1)
| | | | | | | | Scope units are populated from PIDs specified by the bus client. We do that when a scope is started. We really shouldn't allow scopes to be started multiple times, as the PIDs then might be heavily out of date. Moreover, clients should have the guarantee that any scope they allocate has a clear runtime cycle which is not repetitive.
* tree-wide: drop license boilerplate (Zbigniew Jędrzejewski-Szmek, 2018-04-06, 1 file, -13/+0)
| | | | | | | | | | Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt. I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.
* core: add new bus call for migrating foreign processes to scope/service units (Lennart Poettering, 2018-02-12, 1 file, -1/+1)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a new bus call to service and scope units called AttachProcesses() that moves arbitrary processes into the cgroup of the unit. The primary user for this new API is systemd itself: the systemd --user instance uses this call of the systemd --system instance to migrate processes if itself gets the request to migrate processes and the kernel refuses this due to access restrictions. The primary use-case of this is to make "systemd-run --scope --user …" invoked from user session scopes work correctly on pure cgroupsv2 environments. There, the kernel refuses to migrate processes between two unprivileged-owned cgroups unless the requestor as well as the ownership of the closest parent cgroup all match. This however is not the case between the session-XYZ.scope unit of a login session and the user@ABC.service of the systemd --user instance. The new logic always tries to move the processes on its own, but if that doesn't work when being the user manager, then the system manager is asked to do it instead. The new operation is relatively restrictive: it will only allow to move the processes like this if the caller is root, or the UID of the target unit, caller and process all match. Note that this means that unprivileged users cannot attach processes to scope units, as those do not have "owning" users (i.e. they have no User= field). Fixes: #3388
* cgroup: add a new "can_delegate" flag to the unit vtable, and set it for scope and service units only (Lennart Poettering, 2018-02-12, 1 file, -0/+1)
| | | | | | | | | | | | | | | | Currently we allowed delegation for all units with cgroup backing except for slices. Let's make this a bit more strict for now, and only allow this in service and scope units. Let's also add a generic accessor unit_cgroup_delegate() for checking whether a unit has delegation turned on that checks the new bool first. Also, when doing transient units, let's explicitly refuse turning on delegation for unit types that don't support it. This is mostly cosmetic as we wouldn't act on the delegation request anyway, but certainly helpful for debugging.
* Merge pull request #8107 from sourcejedi/pedant (Zbigniew Jędrzejewski-Szmek, 2018-02-06, 1 file, -3/+0)
|\ | | | | core: a couple of tidyups to synthesized units
| * slice, scope: IgnoreOnIsolate=yes is already the default (Alan Jenkins, 2018-02-04, 1 file, -3/+0)
| | | | | | | | | | | | | | | | | | | | | | | | `IgnoreOnIsolate=yes` is the default for slices and scopes. So it's not essential to set it on root.slice or init.scope. We don't need to worry about a bad unit file configuration. Any attempt to stop these units should fail, since we mark them as `perpetual`. Also since init.scope cannot be stopped, there is no point setting `KillSignal=SIGRTMIN+14`. According to both documentation and testing, KillSignal= does not affect the behaviour of `systemctl kill`.
* | core: unify call we use to synthesize cgroup empty events when we stopped watching any unit PIDs (Lennart Poettering, 2018-01-23, 1 file, -9/+6)
|/ | | | | | | | This code is very similar in scope and service units, let's unify it in one function. This changes little for service units, but for scope units makes sure we go through the cgroup queue, which is something we should do anyway.
* core: generalize the cgroup empty check on GC (Lennart Poettering, 2017-11-25, 1 file, -14/+0)
| | | | | | | | Let's move the cgroup empty check for all unit types into the generic unit_check_gc() call, out of the per-unit-type _check_gc() type. This not only allows us to share some code, but also hooks up mount and socket units with this kind of check, for free, as it was missing there previously.
* core: track scope controllers on the bus (Lennart Poettering, 2017-11-23, 1 file, -4/+11)
| | | | | | | | | This watches controllers on the bus, and unsets them automatically when they disappear. Note that this is primarily a cosmetic fix. Since unique bus names are not recycled, there's strictly no need to forget about them, but it's a lot nicer to do so.
* core: serialize the "controller" field in scope units (Lennart Poettering, 2017-11-23, 1 file, -0/+11)
| | | | | We forgot to serialize it previously, hence daemon reload flushed it out, since we also didn't write it to any unit file...
* Add SPDX license identifiers to source files under the LGPL (Zbigniew Jędrzejewski-Szmek, 2017-11-19, 1 file, -0/+1)
| | | | | This follows what the kernel is doing, c.f. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
* core: implement /run/systemd/units/-based path for passing unit info from PID 1 to journald (Lennart Poettering, 2017-11-16, 1 file, -0/+2)
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | And let's make use of it to implement two new unit settings with it: 1. LogLevelMax= is a new per-unit setting that may be used to configure log priority filtering: set it to LogLevelMax=notice and only messages of level "notice" and lower (i.e. more important) will be processed, all others are dropped. 2. LogExtraFields= is a new per-unit setting for configuring per-unit journal fields, that are implicitly included in every log record generated by the unit's processes. It takes field/value pairs in the form of FOO=BAR. Also, related to this, one existing unit setting is ported to this new facility: 3. The invocation ID is now pulled from /run/systemd/units/ instead of cgroupfs xattrs. This substantially relaxes requirements of systemd on the kernel version and the privileges it runs with (specifically, cgroupfs xattrs are not available in containers, since they are stored in kernel memory, and hence are unsafe to permit to lesser privileged code). /run/systemd/units/ is a new directory, which contains a number of files and symlinks encoding the above information. PID 1 creates and manages these files, and journald reads them from there. Note that this is supposed to be a direct path between PID 1 and the journal only, due to the special runtime environment the journal runs in. Normally, today we shouldn't introduce new interfaces that (mis-)use a file system as IPC framework, and instead just use an IPC system, but this is very hard to do between the journal and PID 1, as long as the IPC system is a subject PID 1 manages, and itself a client to the journal. This patch cleans up a couple of types used in journal code: specifically we switch to size_t for a couple of memory-sizing values, as size_t is the right choice for everything that is memory. Fixes: #4089 Fixes: #3041 Fixes: #4441
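The LogLevelMax= filtering rule ("notice and lower, i.e. more important") relies on syslog priorities being numerically smaller for more important messages. A minimal sketch of the comparison, with a hypothetical function name and the assumption that a negative maximum means the setting is unset:

```c
#include <stdbool.h>
#include <syslog.h>

/* Hypothetical predicate: should a message of the given syslog priority be
 * processed under a per-unit maximum level? Treating a negative maximum as
 * "unset" is an assumption made for this sketch. */
static bool log_level_passes(int priority, int log_level_max) {
        if (log_level_max < 0)
                return true;

        /* LOG_PRI() strips the facility bits; lower value = more important. */
        return LOG_PRI(priority) <= log_level_max;
}
```

With a maximum of LOG_NOTICE, messages at LOG_ERR and LOG_NOTICE pass, while LOG_INFO and LOG_DEBUG are dropped.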