path: root/src/core
Commit message | Author | Age | Files | Lines
* core/service: service_add_fd_store() consumes passed fd | Yu Watanabe | 11 days | 1 | -3/+1
|   Without this change, the fd is closed twice on failure.
|
|   Fixes a bug introduced by dff9808a628c31b7ecb1f1aba8fdc3be06ce8372.
|   Fixes #35288.
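The "consumes the passed fd" convention mentioned in this commit can be illustrated with a minimal sketch. This is not systemd's code: `add_fd_store()` and `is_open()` below are hypothetical Python stand-ins, and the `fail` switch exists only to simulate an error. The point is that once a callee takes ownership of a descriptor in every outcome, the caller must not close it again on the error path.

```python
import os

def add_fd_store(store, fd, fail=False):
    """Hypothetical stand-in: takes ownership of `fd` on success AND on failure."""
    try:
        if fail:
            raise RuntimeError("simulated failure")
        store.append(fd)   # success: the store now owns the fd
    except Exception:
        os.close(fd)       # failure: the callee closes it, exactly once
        raise

def is_open(fd):
    """Return True if `fd` currently refers to an open file descriptor."""
    try:
        os.fstat(fd)
        return True
    except OSError:
        return False
```

A caller that also closed the descriptor after a failed call would be closing it a second time, which is the kind of double close the commit message describes.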
* core/exec-invoke: suppress placeholder home only in build_environment() | Mike Yuan | 2024-11-19 | 1 | -8/+8
|   Currently, get_fixed_user() employs USER_CREDS_SUPPRESS_PLACEHOLDER, meaning the home path is set to NULL if it's empty or root. However, the path is also used for applying WorkingDirectory=~, and we'd spuriously use the invoking user's home as fallback even if User= is changed in that case.
|
|   Let's instead delegate such suppression to build_environment(), so that home is properly initialized for usage at other steps.
|
|   shell doesn't actually suffer from such a problem, but it's changed too for consistency.
|
|   Alternative to #34789
* core/exec-invoke: minor cleanup for apply_working_directory() error handling | Mike Yuan | 2024-11-19 | 1 | -15/+7
|   Assign exit_status at the same site where the error log is emitted, for readability.
* basic/user-util: split out placeholder suppression from USER_CREDS_CLEAN into its own flag | Mike Yuan | 2024-11-19 | 1 | -1/+1
|   No functional change, preparation for later commits.
* pid1: make clear that $WATCHDOG_USEC is set for the shutdown binary, noone else | Lennart Poettering | 2024-11-15 | 1 | -0/+5
|   We use the $WATCHDOG_USEC variable for two closely related purposes: as part of the sd_watchdog_enabled() protocol for implementing service watchdogs, and as part of the protocol between the service manager and systemd-shutdown across the PID 1 execve() transition during shutdown.
|
|   Apparently some exitrd tools got confused by the latter use. Let's address that by setting $WATCHDOG_PID to 1, in accordance with the sd_watchdog_enabled() protocol, to make clear this is only intended for PID 1 and nothing else.
|
|   Replaces: #35135
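To see why setting $WATCHDOG_PID to 1 helps, here is a minimal sketch of the receiving side of the sd_watchdog_enabled() protocol. It is a simplified Python stand-in, not the libsystemd implementation: `watchdog_enabled()` is a hypothetical helper, and the real function also reports parse errors and can unset the variables.

```python
import os

def watchdog_enabled(environ, own_pid):
    """Return the watchdog timeout in microseconds, or 0 if none applies to us.

    Simplified sketch: the timeout counts only if $WATCHDOG_USEC is set and
    $WATCHDOG_PID is either unset or names the calling process.
    """
    usec = environ.get("WATCHDOG_USEC")
    if usec is None:
        return 0
    pid = environ.get("WATCHDOG_PID")
    if pid is not None and int(pid) != own_pid:
        return 0  # the setting is addressed to a different process
    return int(usec)

# Typical use in a service: watchdog_enabled(os.environ, os.getpid())
```

With WATCHDOG_PID=1 in the environment, any tool that follows the protocol and is not PID 1 treats the watchdog setting as not meant for it.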
* dbus-manager: add missing word 'unit' to PK message | Lennart Poettering | 2024-11-12 | 1 | -1/+1
|
* introduce report_errno_and_exit() helper (#35028) | Luca Boccassi | 2024-11-06 | 1 | -14/+11
|\    This is a follow-up for https://github.com/systemd/systemd/pull/34853, in particular for this comment:
| |   https://github.com/systemd/systemd/pull/34853#discussion_r1825837705
| * use report_errno_and_exit() in src/core/exec-invoke.c | Ivan Kruglov | 2024-11-06 | 1 | -14/+11
| |
* | core/manager: silence false-positive warning by coverity | Yu Watanabe | 2024-11-06 | 1 | -1/+1
| |   Follow-up for 406f1775017a5631bc91a1f53ac5e50f4fbfac0c.
| |   Closes CID#1564897.
* | pid1: stop refusing to boot with cgroup v1 | Zbigniew Jędrzejewski-Szmek | 2024-11-06 | 1 | -11/+1
|/    Since v256 we completely fail to boot if v1 is configured. Fedora 41 was just released with v256.7 and this is probably the first major exposure of users to this code. It turns out not to work very well. Fedora switched to v2 as default in F31 (2019) and at that time some people added configuration to use v1 either because of Docker or for other reasons. But it's been long enough ago that people don't remember this and are now very unhappy when the system refuses to boot after an upgrade.
|
|   Refusing to boot is also unnecessarily punishing to users. For machines that are used remotely, this could mean somebody needs to physically access the machine. For other users, the machine might be the only way to access the net and help, and people might not know how to set kernel parameters without some docs. And because this is in systemd, after an upgrade all boot choices are affected, and it's not possible to e.g. select an older kernel for boot. And crashing the machine doesn't really serve our goal either: we were giving a hint how to continue using v1 and nothing else.
|
|   If the new override is configured, warn and immediately boot to v1. If v1 is configured w/o the override, warn and wait 30 s and boot to v2. Also give a hint how to switch to v2.
|
|   https://bugzilla.redhat.com/show_bug.cgi?id=2323323
|   https://bugzilla.redhat.com/show_bug.cgi?id=2323345
|   https://bugzilla.redhat.com/show_bug.cgi?id=2322467
|   https://www.reddit.com/r/Fedora/comments/1gfcyw9/refusing_to_run_under_cgroup_01_sy_specified_on/
|
|   The advice is to set systemd.unified_cgroup_hierarchy=1 (instead of removing systemd.unified_cgroup_hierarchy=0). I think this is easier to convey. Users who understand what is going on can just remove the option instead.
|
|   The caching is dropped in cg_is_legacy_wanted(). It turns out that the order in which those functions are called during early setup is very fragile. If cg_is_legacy_wanted() is called before we have set up the v2 hierarchy, we incorrectly cache a true answer. The function is called just a handful of times at most, so we don't really need to cache the response.
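The boot policy described in this commit message can be written down as a small decision function. This is only a sketch in Python, not the PID 1 code: `choose_cgroup_hierarchy()` and its parameters are hypothetical names, and it assumes the override matters only when v1 was requested.

```python
def choose_cgroup_hierarchy(v1_requested, override_set):
    """Return (hierarchy, delay_seconds, warn) per the policy in the commit message.

    Hypothetical helper: both parameters are illustrative booleans.
    """
    if not v1_requested:
        return ("v2", 0, False)   # nothing special configured
    if override_set:
        return ("v1", 0, True)    # warn, then boot to v1 immediately
    return ("v2", 30, True)       # warn, wait 30 s, then boot to v2
```

In every branch the machine boots; what varies is the hierarchy chosen, the delay, and whether a warning is printed.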
* tree-wide: time-out → timeout | Zbigniew Jędrzejewski-Szmek | 2024-11-05 | 1 | -1/+1
|   For justification, see 3f9a0a522f2029e9295ea5e9984259022be88413.
* core: Introduce PrivatePIDs= | Daan De Meyer | 2024-11-05 | 15 | -9/+494
|   This new setting allows unsharing the pid namespace in a unit. Because you have to fork to get a process into a pid namespace, we fork in systemd-executor to get into the new pid namespace. The parent then sends the pid of the child process back to the manager and exits while the child process continues on with the rest of exec_invoke() and then executes the actual payload.
|
|   Communicating the child pid is done via a new pidref socket pair that is set up on manager startup.
|
|   We unshare the PID namespace right before the mount namespace so we mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes to mount procfs.
|
|   When running unprivileged in a user session, the user namespace is set up first to allow for the PID namespace to be unshared. However, when running in privileged mode, we unshare the user namespace last to ensure the user namespace does not own the PID namespace and cannot break out of the sandbox.
|
|   Note we disallow Type=forking services from using PrivatePIDs=yes since the init process inside the PID namespace must not exit for other processes in the namespace to exist.
|
|   Note Daan De Meyer did the original work for this commit with Ryan Wilson addressing follow-ups.
|
|   Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
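The fork-and-report step described above can be sketched in a few lines. This is a loose Python illustration, not systemd-executor: `spawn_and_report_pid()` is a hypothetical name, no PID namespace is actually unshared (that would need privileges), and a plain integer is sent over an ordinary socket pair where systemd uses its pidref socket pair.

```python
import os
import socket
import struct

def spawn_and_report_pid():
    """Fork an intermediate that forks the payload, reports its PID, and exits.

    Returns (intermediate_pid, payload_pid) as seen by the calling "manager".
    """
    manager_end, executor_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_DGRAM)
    executor = os.fork()
    if executor == 0:
        # Intermediate process: the real code would unshare the PID namespace here.
        manager_end.close()
        payload = os.fork()
        if payload == 0:
            executor_end.close()
            os._exit(0)  # stand-in for continuing setup and exec'ing the payload
        executor_end.send(struct.pack("!I", payload))  # report the child's PID
        os._exit(0)      # the intermediate exits; the child lives on
    # Manager side: learn the payload PID, then reap the intermediate.
    executor_end.close()
    data = manager_end.recv(4)
    manager_end.close()
    os.waitpid(executor, 0)
    return executor, struct.unpack("!I", data)[0]
```

The manager ends up tracking a process it did not fork directly, which is why the PID has to be communicated explicitly.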
* exec-invoke: Add debug logging for setup_private_users() | Daan De Meyer | 2024-11-04 | 1 | -7/+7
|
* efivars: Remove STRINGIFY() helper macros | Daan De Meyer | 2024-11-02 | 1 | -1/+1
|   The names of these conflict with macros from efi.h that we'll move to efi-fundamental.h in a later commit. Let's avoid the conflict by getting rid of these helpers. Arguably this also improves readability by clearly indicating we're passing arbitrary strings and not constants to the macros when we invoke them.
* core: add id-mapped mount support for Exec directories | Andres Beltran | 2024-11-01 | 4 | -8/+106
|
* core/service: don't propagate stop jobs if RestartMode=direct (#34768) | Lennart Poettering | 2024-11-01 | 17 | -123/+169
|\    Fixes https://github.com/systemd/systemd/issues/34758
| * core/service: don't propagate stop jobs if RestartMode=direct | Mike Yuan | 2024-10-27 | 1 | -4/+5
| |   The goal of RestartMode=direct is to make restarts invisible to dependents, so auto restart jobs shouldn't bring them down at all. So far we only skipped going through failed/dead states in service_enter_dead(), i.e. the unit would never be considered dead. But when constructing the restart transaction, the stop job would be propagated to dependents. Consider the following 2 units:
| |
| |   dependent.target:
| |       [Unit]
| |       BindsTo=a.service
| |       After=a.service
| |
| |   a.service:
| |       [Service]
| |       ExecStart=bash -c 'sleep 100 && exit 1'
| |       Restart=on-failure
| |       RestartMode=direct
| |
| |   Before this commit, even though BindsTo= isn't triggered since a.service never failed, when a.service auto-restarts, dependent.target is also restarted. Let's suppress it by using JOB_REPLACE instead of JOB_RESTART_DEPENDENCIES in service_enter_restart().
| |
| |   Fixes #34758
| |
| |   The example above is subtly different from the original report, to illustrate that the new behavior makes sense for less exotic use cases too.
| * core: make refuse_late_merge a proper attr of Job and introduce TRANSACTION_REENQUEUE_ANCHOR | Mike Yuan | 2024-10-27 | 5 | -38/+65
| |
| * core/manager: introduce manager_add_job_full() which takes extra TransactionAddFlags | Mike Yuan | 2024-10-27 | 12 | -37/+62
| |   No functional change. Preparation for later commits.
| * core/job: trivial modernization | Mike Yuan | 2024-10-27 | 2 | -15/+18
| |
| * core: drop effectively unused UNIT_ATOM_PROPAGATE_RESTART | Mike Yuan | 2024-10-27 | 3 | -31/+21
| |   Restart jobs are always run as stop jobs initially, and later get converted to start jobs by the job engine. Hence UNIT_ATOM_PROPAGATE_STOP should and does cover the restart case, as currently all dep types with _RESTART also carry _STOP. Drop UNIT_ATOM_PROPAGATE_RESTART.
| * core/service: use log_unit_* where appropriate | Mike Yuan | 2024-10-27 | 1 | -3/+3
| |
* | core: add read-only flag for exec directories | Luca Boccassi | 2024-11-01 | 6 | -45/+98
| |   When an exec directory is shared between services, this allows one of the services to be the producer of files, and the other the consumer, without letting the consumer modify the shared files.
| |
| |   This will be especially useful in conjunction with id-mapped exec directories so that fully sandboxed services can share directories in one direction, safely.
* | core: make mount(8) and swapon(8) inherit SMACK label from systemd | Łukasz Stelmach | 2024-10-30 | 2 | -0/+6
| |   By default mount(8), umount(8), swapon(8) and swapoff(8) should run with the SMACK label inherited from systemd rather than the default one meant for services.
| |
| |   Fixes: aa5ae9711ef3cd0c69b7fcfbd65bca05fb704a8a
| |   Follow-up-for: 20bbf5ee4c6c80599a91e7a4b7474e931a27db4a
* | core: add EXEC_DIRECTORY_TYPE_SHALL_CHOWN() helper | Lennart Poettering | 2024-10-30 | 3 | -5/+12
| |   Let's make ConfigurationDirectory= a bit less "special-casey", by hiding the fact that it's the only per-service dir we do not do chown()ing for inside of a new EXEC_DIRECTORY_TYPE_SHALL_CHOWN() helper.
* | core/service: support sd_notify() MAINPIDFD=1 and MAINPIDFDID= | Mike Yuan | 2024-10-29 | 1 | -27/+89
| |   These serve as race-free alternatives for MAINPID= notification.
* | socket: support setting ownership of message queues | David Michael | 2024-10-28 | 1 | -0/+8
| |   This applies the existing SocketUser=/SocketGroup= options to units defining a POSIX message queue, bringing them in line with UNIX sockets and FIFOs. They are set on the file descriptor rather than a file system path because the /dev/mqueue path interface is an optional mount unit.
* | cgroup: Add support for ProtectControlGroups= private and strict | Ryan Wilson | 2024-10-28 | 6 | -12/+167
| |   This commit adds two settings, private and strict, to the ProtectControlGroups= property. Private will unshare the cgroup namespace and mount a read-write private cgroup2 filesystem at /sys/fs/cgroup. Strict does the same except the mount is read-only. Since the unit is running in a cgroup namespace, the new root of /sys/fs/cgroup is the unit's own cgroup.
| |
| |   We also add a new dbus property ProtectControlGroupsEx which accepts strings instead of a boolean. This will allow users to use private/strict via dbus and systemd-run in addition to service files.
| |
| |   Note private and strict fall back to no and yes respectively if the kernel doesn't support cgroup2 or the system is not using the unified hierarchy.
| |
| |   Fixes: #34634
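The fallback rule in the last paragraph of this commit message amounts to a small mapping. The following Python sketch only illustrates that rule: `effective_protect_control_groups()` and the `cgroup2_unified` flag are hypothetical names, not part of systemd.

```python
def effective_protect_control_groups(setting, cgroup2_unified):
    """Map a configured ProtectControlGroups= value to the one that takes effect.

    Hypothetical helper: `cgroup2_unified` stands for "the kernel supports
    cgroup2 and the system uses the unified hierarchy".
    """
    fallback = {"private": "no", "strict": "yes"}
    if setting in fallback and not cgroup2_unified:
        return fallback[setting]
    return setting
```

Under this reading, strict degrades to the older read-only protection (yes) and private degrades to no protection.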
* | core: Refactor ProtectControlGroups= to use enum vs bool | Ryan Wilson | 2024-10-28 | 10 | -25/+85
|/    This commit refactors ProtectControlGroups= from using a boolean in the dbus/execute backend to using an enum. There is no functional change but this will allow adding new non-boolean values (e.g. strict, private) a la PrivateHome.
* core: Add RootDirectory= path to error message if directory does not exist | Ryan Wilson | 2024-10-27 | 1 | -10/+26
|   A colleague reported when RootDirectory= does not exist, systemd reports an error like:
|
|   ```
|   Failed to set up mount namespacing: No such file or directory
|   ```
|
|   Unfortunately, with large spec files, it can be hard to diagnose which path systemd is talking about. Thus, to make the error message more helpful and similar to mount error messages, we add the root directory/image path into the error message like:
|
|   ```
|   Failed to set up mount namespacing: /tmp/thisdoesnotexist: No such file or directory
|   ```
* core/execute: Rename error_path -> reterr_path/ret_path per coding guidelines | Ryan Wilson | 2024-10-27 | 3 | -18/+18
|   This is a non-functional change to ensure that error_path, which is used to print out the offending mount causing an error, follows the coding guidelines.
* core/cgroup: rename CGROUP_PRESSURE_WATCH_ON/OFF -> CGROUP_PRESSURE_WATCH_YES/NO | Yu Watanabe | 2024-10-26 | 3 | -7/+7
|   No functional change, but let's print yes/no rather than on/off in systemd-analyze.
|
|   Similar to 2e8a581b9cc1132743c2341fc334461096266ad4 and edd3f4d9b7a63dc9a142ef20119e80d1d9527f2f. (Note, the commit messages of those commits are wrong, as parse_boolean() supports on/off anyway.)
* tree-wide: replace for loop with FOREACH_ELEMENT or FOREACH_ARRAY macros (#34893) | Integral | 2024-10-26 | 4 | -15/+13
|
* core: make sure that if PAMName= is set we always do the full user changing even if no user is specified explicitly | Lennart Poettering | 2024-10-24 | 1 | -3/+18
|   When PAMName= is set this should be enough to go through our entire user changing story, so that PAM is definitely run, and environment variables definitely pulled in and so on.
|
|   Previously, it would happen that under some circumstances we might not do this when transitioning from root to root itself even though PAM was enabled.
|
|   Fixes: #34682
* Merge pull request #34799 from YHNdnzj/service-followups | Mike Yuan | 2024-10-24 | 4 | -87/+82
|\    core: follow-ups for live mount
| * core: clean up errors for live mounting | Mike Yuan | 2024-10-22 | 3 | -55/+38
| |   * Use SD_BUS_ERROR_NOT_SUPPORTED where appropriate
| |   * Use Service object in service_can_live_mount()
| |   * Include errno in bus error message
| * core/service: fix one wording | Mike Yuan | 2024-10-22 | 1 | -1/+1
| |
| * core/service: add missing serialization for Service.live_mount_result | Mike Yuan | 2024-10-22 | 1 | -3/+13
| |
| * core/service: call service_enter_running() if live mount fails | Mike Yuan | 2024-10-22 | 1 | -1/+1
| |   service_enter_running() would re-arm the timer for RuntimeMaxSec=, hence it should be called instead of disabling the timer completely when a live mount operation fails, in a similar fashion to service_enter_reload_by_notify().
| * core/service: introduce service_live_mount_finish() | Mike Yuan | 2024-10-22 | 1 | -8/+7
| |   that combines updating Service.live_mount_result and service_mount_request_reply()
| * core/service: place occurrences of SERVICE_MOUNTING closer to reload states | Mike Yuan | 2024-10-22 | 2 | -21/+20
| |
| * core/unit: put the reload job back to queue if unit is refreshing | Mike Yuan | 2024-10-22 | 1 | -1/+5
| |
* | Merge pull request #34834 from yuwata/protect-home-tmpfs-read-only | Yu Watanabe | 2024-10-23 | 1 | -21/+21
|\ \    core/namespace: make ProtectHome=tmpfs makes /home and friends read-only as documented
| * | core/namespace: replace MOUNT_PRIVATE_TMP_READ_ONLY with MOUNT_PRIVATE_TMP with .read_only = true | Yu Watanabe | 2024-10-23 | 1 | -10/+5
| | |
| * | core/namespace: coding style cleanups | Yu Watanabe | 2024-10-23 | 1 | -6/+6
| | |
| * | core/namespace: honor MountEntry.read_only, .options, and so on in static entries | Yu Watanabe | 2024-10-23 | 1 | -5/+10
| | |   Otherwise, ProtectHome=tmpfs makes /home/ and friends not read-only. Also, mount options for /run/ specified in MountAPIVFS=yes are not applied.
| | |
| | |   The function append_static_mounts() was introduced in 5327c910d2fc1ae91bd0b891be92b30379c7467b, but at that time, there were neither .read_only nor .options in the struct. But when the struct was later extended, the function was not updated and the new fields were not copied from the static table. The fields have been used in static tables since e4da7d8c796a1fd11ecfa80fb8a48eac9e823f06, and also in 94293d65cd4125347e21b3e423d0e245226b1be2.
| | |
| | |   Fixes #34825.
* | | core: don't forget about fallback_smack_process_label | Łukasz Stelmach | 2024-10-23 | 1 | -1/+1
|/ /    Call setup_smack() also when only fallback_smack_process_label is set.
| |
| |   Fixes: 75689fb2d41f
* / fileio: port write_string_file() to LabelOps, and thus add WRITE_STRING_FILE_LABEL flag | Lennart Poettering | 2024-10-22 | 2 | -3/+1
|/    Given that we have the LabelOps abstraction these days, we can teach write_string_file() to use it, which means we can get rid of fileio-label.[ch] as a separate concept.
|
|   (The only reason that fileio-label.[ch] existed independently of fileio.[ch] was that the former potentially linked to libselinux, and thus had to be in src/shared/ while the other always was in src/basic/. But the LabelOps vtable provides us with a nice work-around.)
* Merge pull request #34403 from poettering/askpw-per-user | Lennart Poettering | 2024-10-21 | 2 | -31/+38
|\    modernize the ask-password logic, and add unpriv askpw agents to the concept
| * core: modernize askpw handling a bit | Lennart Poettering | 2024-10-21 | 2 | -31/+38
| |