path: root/src/core/service.h (follow)
Commit message (Author, Age, Files, Lines)
* service: add new RestartMode option (Richard Phibel, 2023-07-06, 1 file, -0/+11)
| When this option is set to direct, the service restarts without entering a failed state. Dependent units are not notified of transitory failure.
|
| This is useful for the following use case: we have a target with Requires=my-service, After=my-service. my-service.service is a oneshot service and has Restart=on-failure in its definition. my-service.service can get stuck for various reasons and time out, in which case it is restarted. Currently, when it fails the first time, the target fails, even though my-service is restarted. The behavior we're looking for is that, as long as my-service keeps being restarted, the target stays pending, waiting for my-service.service to either start successfully or fail without being restarted again.
* core: get rid of unused Service.will_auto_restart logic (Mike Yuan, 2023-05-24, 1 file, -2/+0)
| The announced new behavior for OnFailure= never worked properly, and we've fixed the documentation instead in #27675. Therefore, let's get rid of the unused logic completely.
|
| More at #27594. The to-be-added RestartMode= option should hopefully cover the use case.
|
| Closes #27594
* core: rename RestartSecMax to RestartMaxDelaySec (Mike Yuan, 2023-05-17, 1 file, -1/+1)
|
* service: rename service_close_socket_fd() → service_release_socket_fd() (Lennart Poettering, 2023-04-13, 1 file, -1/+1)
| Just to match service_release_stdio_fd() and service_release_fd_store() in the name, since they do similar things. This follows the concept that we "release" resources, and this is all generically wrapped in "service_release_resources()".
* service: add ability to pin fd store (Lennart Poettering, 2023-04-13, 1 file, -0/+1)
| Oftentimes it is useful to allow the per-service fd store to survive for longer than a restart. This is useful in various scenarios:
|
| 1. An fd to some security relevant object needs to be stashed somewhere, that should not be cleaned automatically, because the security enforcement would be dropped then.
| 2. A user namespace fd should be allocated on first invocation and be kept around until the user logs out (i.e. systemd --user ends), à la #16328. (This does not implement what #16318 asks for, but should solve the use-case discussed there.)
| 3. There's interest in allowing a concept of "userspace reboots" where the kernel stays running, and userspace is swapped out (i.e. all services exit, and the rootfs is transitioned into a new version of it) while keeping some select resources pinned, very similar to how we implement a switch root. Thus it is useful to allow services to exit, while leaving their fds around till the very end.
|
| This is exposed through a new FileDescriptorStorePreserve= setting that is closely modelled after RuntimeDirectoryPreserve= (in fact it reuses the same internal type), since we want similar behaviour in the end, and quite often they probably want to be used together.
* core: always calculate the next restart interval (Mike Yuan, 2023-03-31, 1 file, -1/+1)
| Follow-up for #26902 and #26971
|
| Let's always calculate the next restart interval since that's more useful. For that, we add 1 to s->n_restarts unconditionally, and change RestartUSecCurrent property to RestartUSecNext.
* Merge pull request #26971 from poettering/autostart-dead-failed (Lennart Poettering, 2023-03-29, 1 file, -1/+0)
|\   pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service…
| * pid1: introduce new SERVICE_{DEAD|FAILED}_BEFORE_AUTO_RESTART service substates (Lennart Poettering, 2023-03-29, 1 file, -1/+0)
| | When a service deactivates and is then automatically restarted via Restart= we currently quickly transition through SERVICE_DEAD/SERVICE_FAILED. This is weird, given it's not the normal ("permanent") dead/failed state, but a transitory one we immediately leave again. We do this so that software that looks for failures/successes can take notice, even if we restart as a consequence of the deactivation.
| |
| | Let's clean this up a bit: let's introduce two new states, SERVICE_DEAD_BEFORE_AUTO_RESTART and SERVICE_FAILED_BEFORE_AUTO_RESTART, that are used for the transitory states. Both SERVICE_DEAD and SERVICE_DEAD_BEFORE_AUTO_RESTART will map to the high-level UNIT_INACTIVE state though (and similarly for the respective failed states). This means the high-level state machine won't change by this, only the low-level one.
| |
| | This clearly separates the substates, which makes the state engine cleaner, and allows clients to follow precisely whether we are in a transitory dead/failed state, or a permanent one, by looking at the service substate.
| |
| | Moreover it allows us to remove the 'n_keep_fd_store' which so far we used to ensure the fdstore was not released during this transitory dead/failed state but only during the permanent one. Since we can now distinguish these states properly we can just use that.
| |
| | This has been bugging me for a while. Let's clean this up.
| |
| | Note that the unit restart logic is already nicely covered in the testsuite, hence this adds no new tests for that.
| |
| | And yes, this could be considered a compat break, but so far we took the liberty to make changes to the low-level state machine (i.e. SERVICE_xyz states, sometimes called "substates") without considering this a bad breakage; the high-level state machine (i.e. UNIT_xyz states) should be considered API that cannot be changed.
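The substate-to-state mapping described in the commit above can be sketched in C as follows. This is a minimal illustration, not the code from the commit: the enums are a simplified subset, `SERVICE_RUNNING`, `UNIT_ACTIVE` and both function names are assumptions made for the example, and `UNIT_FAILED` is inferred from the remark that the failed states map "similarly".

```c
#include <assert.h>

/* Simplified, illustrative subset of the low-level service substates. */
typedef enum {
    SERVICE_DEAD,
    SERVICE_DEAD_BEFORE_AUTO_RESTART,   /* transitory: an auto-restart follows */
    SERVICE_FAILED,
    SERVICE_FAILED_BEFORE_AUTO_RESTART, /* transitory: an auto-restart follows */
    SERVICE_RUNNING,                    /* assumption: stands in for all other substates */
} ServiceState;

/* Simplified high-level unit states. */
typedef enum { UNIT_ACTIVE, UNIT_INACTIVE, UNIT_FAILED } UnitActiveState;

/* Both the permanent and the transitory substate map to the same high-level state. */
UnitActiveState service_state_to_unit_state(ServiceState s) {
    switch (s) {
    case SERVICE_DEAD:
    case SERVICE_DEAD_BEFORE_AUTO_RESTART:
        return UNIT_INACTIVE;
    case SERVICE_FAILED:
    case SERVICE_FAILED_BEFORE_AUTO_RESTART:
        return UNIT_FAILED;
    default:
        return UNIT_ACTIVE;
    }
}

/* Clients that care can still tell the transitory substates apart. */
int service_state_is_transitory(ServiceState s) {
    return s == SERVICE_DEAD_BEFORE_AUTO_RESTART ||
           s == SERVICE_FAILED_BEFORE_AUTO_RESTART;
}
```

A client that only watches the high-level state sees no difference, while one that inspects the substate can skip work (such as releasing stored fds) when `service_state_is_transitory()` is true.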
* | Merge pull request #26968 from DaanDeMeyer/exec-runtime (Lennart Poettering, 2023-03-29, 1 file, -1/+0)
|\ \   core: Introduce unit private exec runtime
| |/
|/|
| * core: Move DynamicCreds into ExecRuntime (Daan De Meyer, 2023-03-27, 1 file, -1/+0)
| | This is just another piece of runtime data so let's store it in ExecRuntime alongside the other runtime data.
| * core: Introduce unit private exec runtime (Daan De Meyer, 2023-03-27, 1 file, -1/+1)
| | Currently, exec runtimes can be shared between units (using JoinsNamespaceOf=). Let's introduce a concept of a private exec runtime that isn't shared with JoinsNamespaceOf=. The existing ExecRuntime struct is renamed to ExecSharedRuntime and becomes a private member of the new private ExecRuntime.
| * execute: Rename ExecRuntime to ExecSharedRuntime (Daan De Meyer, 2023-03-27, 1 file, -1/+1)
| | Preparation for next commit
* | core: add RestartSteps= and RestartSecMax= for exponentially increasing interval between restarts (Mike Yuan, 2023-03-27, 1 file, -0/+4)
|/
| RestartSteps= accepts a positive integer as the number of steps to take to increase the interval between auto-restarts from RestartSec= to RestartSecMax=, or 0 to disable it.
|
| Closes #6129
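One way to picture an interval that grows exponentially from RestartSec= to RestartSecMax= over a fixed number of steps is geometric interpolation. The sketch below is an assumption-laden illustration, not the formula used in the commit: the function names, the zero-based restart counter and the bisection-based root helper (used only to keep the example free of libm) are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* x raised to a non-negative integer power, by repeated multiplication. */
static double ipow(double x, unsigned n) {
    double r = 1.0;
    while (n-- > 0)
        r *= x;
    return r;
}

/* n-th root of x (x >= 1, n >= 1) by bisection. */
static double nth_root(double x, unsigned n) {
    double lo = 1.0, hi = x;
    for (int i = 0; i < 200; i++) {
        double mid = (lo + hi) / 2;
        if (ipow(mid, n) < x)
            lo = mid;
        else
            hi = mid;
    }
    return hi;
}

/* Delay before restart number n_restarts (0-based), growing geometrically
 * from base_usec to max_usec in `steps` steps; steps == 0 disables growth. */
uint64_t example_restart_delay_usec(uint64_t base_usec, uint64_t max_usec,
                                    unsigned steps, unsigned n_restarts) {
    if (steps == 0 || base_usec == 0 || base_usec >= max_usec)
        return base_usec;
    if (n_restarts >= steps)
        return max_usec;

    /* Per-step growth factor g such that base * g^steps == max. */
    double g = nth_root((double) max_usec / (double) base_usec, steps);
    return (uint64_t) ((double) base_usec * ipow(g, n_restarts));
}
```

With a base of 100 ms, a maximum of 1.6 s and four steps, the per-step factor is about 2, so the delays roughly double until the maximum is reached and then stay there.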
* core: support overriding NOTIFYACCESS= through sd-notify during runtime (Mike Yuan, 2023-03-21, 1 file, -0/+6)
| Closes #25963
* pid1: add new Type=notify-reload service type (Lennart Poettering, 2023-01-10, 1 file, -7/+11)
| Fixes: #6162
* core: add OpenFile setting (Richard Phibel, 2023-01-10, 1 file, -0/+3)
|
* core/oomd: Use oom-kill ServiceResult for oomd (Nishal Kulkarni, 2022-03-22, 1 file, -1/+1)
| To notify the user of kill events from systemd-oomd we now use `SERVICE_FAILURE_OOM_KILL` as the failure result. `unit_check_oomd_kill` now calls `notify_cgroup_oom` to update the service result to `oom-kill`.
|
| We add a new xattr `user.oomd_ooms` to keep track of the OOM kills initiated by systemd-oomd. This helps us resolve a race between sending SIGKILL to processes and checking for OOM kill status from the xattr.
|
| Related to: #20649
* pid1: lookup owning PID of BusName= name of services asynchronously (Lennart Poettering, 2022-02-18, 1 file, -0/+2)
| A first step of removing blocking calls to the D-Bus broker from PID 1. There's a lot more to go (i.e. grep src/core/ for sd_bus_creds basically), but it's a start.
|
| Removing blocking calls to the D-Bus broker deals systematically with deadlocks caused by dbus-daemon blocking on synchronous IPC calls back to PID1 (e.g. Varlink calls through nss-systemd). Bugs such as #15316.
|
| Also-see: https://github.com/systemd/systemd/pull/22038#issuecomment-1042958390
* socket: always pass socket, fd and SocketPeer ownership to service together (Lennart Poettering, 2021-11-25, 1 file, -4/+5)
| For per-connection socket instances we currently maintain three fields related to the socket: a reference to the Socket unit, the connection fd, and a reference to the SocketPeer object that counts socket peers. Let's synchronize their lifetime, i.e. always set all three together or unset them together, so that their reference counters stay in sync.
|
| This will in particular ensure that we'll drop the SocketPeer reference whenever we leave an active state of the service unit, i.e. at the same time we close the fd for it.
|
| Fixes: #20685
* Reintroduce ExitType (Henri Chain, 2021-11-08, 1 file, -0/+11)
| This introduces `ExitType=main|cgroup` for services. Similar to how `Type` specifies the launch of a service, `ExitType` is concerned with how systemd determines that a service exited.
|
| - If set to `main` (the current behavior), the service manager will consider the unit stopped when the main process exits.
| - The `cgroup` exit type is meant for applications whose forking model is not known ahead of time and which might not have a specific main process. The service will stay running as long as at least one process in the cgroup is running. This is intended for transient or automatically generated services, such as graphical applications inside of a desktop environment.
|
| Motivation for this is #16805. The original PR (#18782) was reverted (#20073) after realizing that the exit status of "the last process in the cgroup" can't reliably be known (#19385).
|
| This version instead uses the main process exit status if there is one and just listens to the cgroup empty event otherwise.
|
| The advantages of a service with `ExitType=cgroup` over scopes are:
|
| - Integrated logging / stdout redirection
| - Avoids the race / synchronisation issue between launch and scope creation
| - More extensive use of drop-ins and thus distro-level configuration: by moving from scopes to services we can have drop-ins that will affect properties that can only be set during service creation, like `OOMPolicy` and security-related properties
| - It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously only services can be used in the generator, not scopes.
|
| [1] https://bugs.kde.org/show_bug.cgi?id=433299
* core: implement RuntimeMaxDeltaSec directive (Albert Brox, 2021-09-28, 1 file, -0/+1)
|
* Revert "Introduce ExitType" (Zbigniew Jędrzejewski-Szmek, 2021-06-30, 1 file, -11/+0)
| This reverts commit cb0e818f7cc2499d81ef143e5acaa00c6e684711.
|
| After this was merged, some design and implementation issues were discovered, see the discussion in #18782 and #19385. They certainly can be fixed, but so far nobody has stepped up, and we're nearing a release. Hopefully, this feature can be merged again after a rework.
|
| Fixes #19345.
* test-unit-serialize: add a very basic test that command deserialization works (Zbigniew Jędrzejewski-Szmek, 2021-04-26, 1 file, -0/+3)
| We should test that both serialization and deserialization work properly. But the serialization/deserialization code is deeply entwined with the manager state, and I think quite a bit of refactoring will be required before this is possible. But let's at least add this simple test for now.
* Introduce ExitType (Henri Chain, 2021-03-31, 1 file, -0/+11)
|
* tree-wide: use -EINVAL for enum invalid values (Zbigniew Jędrzejewski-Szmek, 2021-02-10, 1 file, -6/+6)
| As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617.
|
| This does not touch anything exposed in src/systemd. Changing the defines there would be a compatibility break.
|
| Note that tests are broken after this commit. They will be fixed in the next one.
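The idea of giving an enum's "invalid" member the value -EINVAL can be sketched as follows. The enum, its members and the lookup function are hypothetical names chosen for this example, not identifiers from the commit; the point is only that a failed lookup yields a value that is simultaneously "invalid enum" and a negative errno-style error.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical enum: the invalid marker doubles as an errno-style error code. */
typedef enum {
    EXAMPLE_MODE_NORMAL,
    EXAMPLE_MODE_DIRECT,
    EXAMPLE_MODE_MAX,
    EXAMPLE_MODE_INVALID = -EINVAL,
} ExampleMode;

static const char *const example_mode_table[EXAMPLE_MODE_MAX] = {
    [EXAMPLE_MODE_NORMAL] = "normal",
    [EXAMPLE_MODE_DIRECT] = "direct",
};

/* Returns the matching enum value, or EXAMPLE_MODE_INVALID (== -EINVAL). */
ExampleMode example_mode_from_string(const char *s) {
    if (!s)
        return EXAMPLE_MODE_INVALID;

    for (int i = 0; i < EXAMPLE_MODE_MAX; i++)
        if (strcmp(example_mode_table[i], s) == 0)
            return (ExampleMode) i;

    return EXAMPLE_MODE_INVALID;
}
```

A caller can then write `if (m < 0) return m;` and propagate the lookup failure as an ordinary error code without translating it first.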
* license: LGPL-2.1+ -> LGPL-2.1-or-later (Yu Watanabe, 2020-11-09, 1 file, -1/+1)
|
* core: let user define start-/stop-timeout behaviour (Jan Klötzke, 2020-06-09, 1 file, -0/+13)
| The usual behaviour when a timeout expires is to terminate/kill the service. This is what users usually want in production systems. To debug services that fail to start/stop (especially sporadic failures) it might be necessary to trigger the watchdog machinery and write core dumps, though. Likewise, it is usually just a waste of time to gracefully stop a stuck service. Instead it might save time to go directly into kill mode.
|
| This commit adds two new options to services: TimeoutStartFailureMode= and TimeoutStopFailureMode=. Both take the same values and tweak the behavior of systemd when a start/stop timeout expires:
|
| * 'terminate': is the default behaviour as it has always been,
| * 'abort': triggers the watchdog machinery and will send SIGABRT (unless WatchdogSignal was changed) and
| * 'kill' will directly send SIGKILL.
|
| To handle the stop failure mode in stop-post state too a new final-watchdog state needs to be introduced.
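The three failure modes listed above amount to choosing which signal is sent first when a timeout expires. A minimal sketch, assuming that 'terminate' corresponds to SIGTERM (the commit only says it keeps the previous default behaviour) and using hypothetical type and function names:

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical enum mirroring the three values named in the commit message. */
typedef enum {
    EXAMPLE_TIMEOUT_TERMINATE,
    EXAMPLE_TIMEOUT_ABORT,
    EXAMPLE_TIMEOUT_KILL,
} ExampleTimeoutFailureMode;

/* Pick the first signal to send on timeout. watchdog_signal is the configured
 * watchdog signal, SIGABRT unless it was changed. */
int example_timeout_signal(ExampleTimeoutFailureMode mode, int watchdog_signal) {
    switch (mode) {
    case EXAMPLE_TIMEOUT_ABORT:
        return watchdog_signal; /* goes through the watchdog machinery */
    case EXAMPLE_TIMEOUT_KILL:
        return SIGKILL;         /* skip the graceful phase entirely */
    case EXAMPLE_TIMEOUT_TERMINATE:
    default:
        return SIGTERM;         /* assumption: the pre-existing default */
    }
}
```

Passing the watchdog signal in as a parameter reflects the "unless WatchdogSignal was changed" caveat: the abort mode follows whatever signal is configured rather than hard-coding SIGABRT.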
* service: Display updated WatchdogUSec from sd_notify (Chris Down, 2020-05-27, 1 file, -0/+5)
| Suppose a service has WatchdogSec set to 2 seconds in its unit file. I then start the service and WatchdogUSec is set correctly:
|
|     % systemctl --user show psi-notify -p WatchdogUSec
|     WatchdogUSec=2s
|
| Now I call `sd_notify(0, "WATCHDOG_USEC=10000000")`. The new timer seems to have taken effect, since I only send `WATCHDOG=1` every 4 seconds, and systemd isn't triggering the watchdog handler. However, `systemctl show` still shows WatchdogUSec as 2s:
|
|     % systemctl --user show psi-notify -p WatchdogUSec
|     WatchdogUSec=2s
|
| This seems surprising, since this "original" watchdog timer isn't the one taking effect any more. This patch makes it so that we instead display the new watchdog timer after sd_notify(WATCHDOG_USEC):
|
|     % systemctl --user show psi-notify -p WatchdogUSec
|     WatchdogUSec=10s
|
| Fixes #15726.
* core: (De-)Serialize poll flag for fds in fdstore (Kenny Levinsen, 2020-04-30, 1 file, -0/+1)
| This replaces manual string splitting and unescaping with extract_first_word.
* core: move timeout_clean_usec from Service to ExecContext (Yu Watanabe, 2019-08-28, 1 file, -1/+0)
|
* core: ExecCondition= for services (Anita Zhang, 2019-07-17, 1 file, -0/+2)
| Closes #10596
* core: hook up service unit type with the new clean operation (Lennart Poettering, 2019-07-11, 1 file, -0/+2)
| The implementation is pretty straightforward: when we get a request to clean some type of resources we fork off a process doing that, and while it is running we are in the "cleaning" state.
* core: add ExecStartXYZEx= with dbus support for executable prefixes (Anita Zhang, 2019-05-31, 1 file, -0/+3)
| Closes #11654
* core: add assertion in two inline functions (Yu Watanabe, 2019-04-14, 1 file, -0/+1)
|
* core: change type of Service::timeout_abort_set to bool (Yu Watanabe, 2019-04-14, 1 file, -1/+1)
| Follow-up for dc653bf487bae9d1ddf794442bf4176fee173b41 (#11211).
* service: handle abort stops with dedicated timeout (Jan Klötzke, 2019-04-12, 1 file, -0/+6)
| When shooting down a service with SIGABRT the user might want to have a much longer stop timeout than on regular stops/shutdowns. Especially in the face of short stop timeouts the time might not be sufficient to write huge core dumps before the service is killed.
|
| This commit adds a dedicated (Default)TimeoutAbortSec= timer that is used when stopping a service via SIGABRT. In all other cases the existing TimeoutStopSec= is used. The timer value is unset by default to skip the special handling and use TimeoutStopSec= for state 'stop-watchdog' to keep the old behaviour.
|
| If the service is in state 'stop-watchdog' and the service should be stopped explicitly we still go to 'stop-sigterm' and re-apply the usual TimeoutStopSec= timeout.
* core: implement OOMPolicy= and watch cgroups for OOM killings (Lennart Poettering, 2019-04-09, 1 file, -0/+3)
| This adds a new per-service OOMPolicy= (along with a global DefaultOOMPolicy=) that controls what to do if a process of the service is killed by the kernel's OOM killer. It has three different values: "continue" (old behaviour), "stop" (terminate the service), "kill" (let the kernel kill all the service's processes).
|
| On top of that, track OOM killer events per unit: generate a per-unit structured, recognizable log message when we see an OOM killer event, and put the service in a failure state if an OOM killer event was seen and the selected policy was not "continue". A new "result" is defined for this case: "oom-kill".
|
| All of this relies on new cgroupv2 kernel functionality: the "memory.events" notification interface and the "memory.oom.group" attribute (which makes the kernel kill all cgroup processes automatically).
* tree-wide: reorder various structures to make them smaller and use fewer cache lines (Lennart Poettering, 2019-03-27, 1 file, -1/+1)
| Some "pahole" spelunking.
* service: when starting a service make a copy of the watchdog timeout and use that (Lennart Poettering, 2018-10-26, 1 file, -2/+3)
| When we start a service process we pass the selected watchdog timeout to it with the $WATCHDOG_USEC environment variable. If the unit file is reconfigured later, we need to make sure to continue to honour the original timeout, i.e. what $WATCHDOG_USEC was set to, otherwise we'll expect the ping at a different time than the service process is sending it to us.
|
| Hence, whenever we start a unit, save the watchdog timeout, and stick to that for everything we do.
|
| Fixes: #9467
* core: enforce a limit on STATUS= texts recvd from services (Lennart Poettering, 2018-10-26, 1 file, -0/+2)
| Let's better be safe than sorry, and put a limit on what we receive.
* core: introduce new Type=exec service type (Lennart Poettering, 2018-07-25, 1 file, -0/+4)
| Users are often surprised that "systemd-run" command lines like "systemd-run -p User=idontexist /bin/true" will return successfully, even though the logs show that the process couldn't be invoked, as the user "idontexist" doesn't exist. This is because Type=simple will only wait until fork() succeeded before returning start-up success.
|
| This patch adds a new service type Type=exec, which is very similar to Type=simple, but waits until the child process completed the execve() before returning success. It uses a pipe that has O_CLOEXEC set for this logic, so that the kernel automatically sends POLLHUP on it when the execve() succeeded but leaves the pipe open if not. This means PID 1 waits exactly until the execve() succeeded in the child, and not longer and not shorter, which is the desired functionality.
|
| Making use of this new functionality, the command line "systemd-run -p User=idontexist -p Type=exec /bin/true" will now fail, as expected.
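The close-on-exec pipe trick described above can be sketched in plain POSIX C. This is an illustration under assumptions, not the code from the commit: `example_spawn_wait_exec()` is a hypothetical helper, it uses a blocking read() where the commit speaks of POLLHUP, and it additionally sends errno over the pipe when the exec fails, which the commit message does not mention.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork and exec path; return 0 once the exec has succeeded in the child
 * (its PID is stored in *ret_pid), or a negative errno-style value. */
int example_spawn_wait_exec(const char *path, char *const argv[], pid_t *ret_pid) {
    int fds[2], err = 0;
    ssize_t n;

    if (pipe(fds) < 0)
        return -errno;

    /* Close-on-exec: a successful exec closes the write end automatically. */
    if (fcntl(fds[1], F_SETFD, FD_CLOEXEC) < 0) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        return -err;
    }

    pid_t pid = fork();
    if (pid < 0) {
        err = errno;
        close(fds[0]);
        close(fds[1]);
        return -err;
    }

    if (pid == 0) {
        /* Child: on exec failure, report errno through the still-open pipe. */
        close(fds[0]);
        execv(path, argv);
        err = errno;
        ssize_t w = write(fds[1], &err, sizeof err);
        (void) w;
        _exit(127);
    }

    /* Parent: EOF means the write end vanished, i.e. the exec succeeded. */
    close(fds[1]);
    do
        n = read(fds[0], &err, sizeof err);
    while (n < 0 && errno == EINTR);
    close(fds[0]);

    if (n == 0) {
        *ret_pid = pid;
        return 0;
    }

    waitpid(pid, NULL, 0); /* reap the child that failed to exec */
    return n == (ssize_t) sizeof err ? -err : -EIO;
}
```

The parent returns as soon as the exec outcome is known, neither earlier (as with waiting only for fork()) nor later (as with waiting for the program to finish); the caller remains responsible for reaping a successfully started child.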
* tree-wide: remove Lennart's copyright lines (Lennart Poettering, 2018-06-14, 1 file, -4/+0)
| These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched. It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurb (Lennart Poettering, 2018-06-14, 1 file, -2/+0)
| This part of the copyright blurb stems from the GPL use recommendations:
|
| https://www.gnu.org/licenses/gpl-howto.en.html
|
| The concept appears to originate in times where version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that.
|
| Hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* core: undo the dependency inversion between unit.h and all unit types (Felipe Sateler, 2018-05-15, 1 file, -0/+4)
|
* tree-wide: be more careful with the type of array sizes (Lennart Poettering, 2018-04-27, 1 file, -1/+1)
| Previously we were a bit sloppy with the index and size types of arrays, we'd regularly use unsigned. While I don't think this ever resulted in real issues I think we should be more careful there and follow a stricter regime: unless there's a strong reason not to use size_t for array sizes and indexes, size_t it should be. Any allocations we do ultimately will use size_t anyway, and converting forth and back between unsigned and size_t will always be a source of problems.
|
| Note that on 32bit machines "unsigned" and "size_t" are equivalent, and on 64bit machines our arrays shouldn't grow that large anyway, and if they do we have a problem, however that kind of overly large allocation we have protections for usually, but for overflows we do not have that so much, hence let's add it.
|
| So yeah, it's a story of the current code being already "good enough", but I think some extra type hygiene is better.
|
| This patch tries to be comprehensive, but it probably isn't and I missed a few cases. But I guess we can cover that later as we notice it. Among smaller fixes, this changes:
|
| 1. strv_length()'s return type becomes size_t
| 2. the unit file changes array size becomes size_t
| 3. DNS answer and query array sizes become size_t
|
| Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=76745
* tree-wide: drop license boilerplate (Zbigniew Jędrzejewski-Szmek, 2018-04-06, 1 file, -13/+0)
| Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc), are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt.
|
| I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.
* service: Don't stop unneeded units needed by restarted service (#7526) (Michal Koutný, 2017-12-05, 1 file, -0/+2)
| An auto-restarted unit B may depend on unit A with StopWhenUnneeded=yes. If A stops before B's restart timeout expires, it'll be started again as part of B's dependent jobs. However, if stopping takes longer than the timeout, A's running stop job collides with the start job, which also cancels B's start job. The result is that neither A nor B is active.
|
| Currently, when a service with automatic restarting fails, it transitions through the following states:
|
| 1) SERVICE_FAILED or SERVICE_DEAD to indicate the failure,
| 2) SERVICE_AUTO_RESTART while the restart timer is running.
|
| The StopWhenUnneeded= check takes place in service_enter_dead between the two states mentioned above. We temporarily store the auto restart flag to query it during the check. Because we don't return control to the main event loop, this new service unit flag needn't be serialized.
|
| This patch prevents the pathological situation where the service with Restart= won't restart automatically. As a side effect it also avoids restarting the dependency unit with StopWhenUnneeded=yes.
|
| Fixes: #7377
* Merge pull request #7381 from poettering/cgroup-unified-delegate-rework (Zbigniew Jędrzejewski-Szmek, 2017-11-22, 1 file, -2/+0)
|\   Fix delegation in the unified hierarchy + more cgroup work
| * core: unify common code for preparing for forking off unit processes (Lennart Poettering, 2017-11-21, 1 file, -2/+0)
| | This introduces a new function unit_prepare_exec() that encapsulates a number of calls we do in preparation for spawning off some processes in all our unit types that do so.
| |
| | This allows us to neatly unify a bit of code between unit types and shorten our code.
* | core: generalize FailureAction= move it from service to unit (Lennart Poettering, 2017-11-20, 1 file, -2/+0)
|/
| All kinds of units can fail, hence it makes sense to offer this as generic concept for all unit types.