path: root/src/core
Commit message | Author | Age | Files | Lines
* unit: add jobs that were skipped because of ratelimit back to run_queue | Michal Sekletar | 2021-11-29 | 1 | -0/+9
| The assumption in edc027b was that a job we first skipped because of an active ratelimit is still in run_queue, so we trigger the queue and dispatch it in the next iteration. In fact, we remove jobs from run_queue in job_run_and_invalidate() before we call unit_start(). Hence, if we want to attempt to run the job again in the future, we need to add it back to run_queue. Fixes #21458
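The run_queue behavior described above can be sketched as follows. This is a minimal, hypothetical model: `Job`, `job_add_to_run_queue()`, `run_queue_pop()` and `job_run_sketch()` are illustrative names, not systemd's actual types or functions, and a boolean flag stands in for the ratelimit check. It shows why a job that is dequeued before its start attempt must be put back explicitly when the start is deferred.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified job and run queue; not systemd's real types. */
typedef struct Job {
        bool in_run_queue;
        bool ratelimited;   /* stands in for "start blocked by an active ratelimit" */
        bool started;
} Job;

#define RUN_QUEUE_MAX 16
static Job *run_queue[RUN_QUEUE_MAX];
static size_t n_run_queue;

static void job_add_to_run_queue(Job *j) {
        assert(j);
        if (j->in_run_queue)
                return;
        assert(n_run_queue < RUN_QUEUE_MAX);
        run_queue[n_run_queue++] = j;
        j->in_run_queue = true;
}

/* Dispatch takes the job off the queue *before* trying to start it. */
static Job *run_queue_pop(void) {
        if (n_run_queue == 0)
                return NULL;
        Job *j = run_queue[--n_run_queue];
        j->in_run_queue = false;
        return j;
}

/* A start deferred by a ratelimit must re-queue the job: merely re-triggering
 * dispatch later would find nothing to run, since the job is no longer queued. */
static int job_run_sketch(Job *j) {
        assert(j);
        if (j->ratelimited) {
                job_add_to_run_queue(j);
                return -EAGAIN;
        }
        j->started = true;
        return 0;
}
```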
* namespace: allow ProcSubset=pid with some ProtectKernel options | Topi Miettinen | 2021-11-27 | 1 | -8/+34
| When `/proc` is successfully mounted with only the pid subset due to `ProcSubset=pid`, the protective mounts for `ProtectKernelTunables=yes` and `ProtectKernelLogs=yes` on non-pid `/proc` paths fail because those paths don't exist. But the pid-only option may have failed gracefully (for example, because of an ancient kernel), so let's still try the mounts, but don't treat it as fatal if they don't succeed.
* json: add new JSON_BUILD_CONST_STRING() macro | Lennart Poettering | 2021-11-25 | 1 | -8/+8
| This macro is like JSON_BUILD_STRING() but uses our JSON library's ability to use literal strings directly as JsonVariant objects. This also changes our whole codebase to use the new macro whenever we build JSON objects from literal strings. (I tried to make this automatic, i.e. to detect nicely in JSON_BUILD_STRING() whether something is a literal string, but I couldn't find a way.) This should reduce the memory usage of our JSON code a bit: constant strings we use very often will now be shared and mapped directly from the ELF image.
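The memory saving comes from referencing a literal instead of copying it. A minimal sketch of that idea, assuming a hypothetical `StrValue` type (not the real JsonVariant API): the constant constructor stores the literal's address, while the generic one makes a heap copy.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string value: either owns a heap copy or borrows a literal. */
typedef struct StrValue {
        const char *str;
        bool owned;   /* true: heap copy that must be freed */
} StrValue;

/* Generic constructor: the input may be temporary, so it is copied. */
static int str_value_new_copy(StrValue *v, const char *s) {
        assert(v && s);
        char *c = strdup(s);
        if (!c)
                return -ENOMEM;
        *v = (StrValue) { .str = c, .owned = true };
        return 0;
}

/* Constant constructor: only valid for strings with static storage duration, such as
 * literals, which live in the program image for its whole lifetime; nothing is copied. */
static void str_value_new_const(StrValue *v, const char *literal) {
        assert(v && literal);
        *v = (StrValue) { .str = literal, .owned = false };
}

static void str_value_done(StrValue *v) {
        assert(v);
        if (v->owned)
                free((char *) v->str);
        *v = (StrValue) { .str = NULL, .owned = false };
}
```

Nothing in a `const char *` says whether it points at a literal, which is why, in this sketch, the caller has to pick the constant constructor explicitly.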
* Merge pull request #21503 from poettering/ioprio-fix | Yu Watanabe | 2021-11-25 | 3 | -9/+12
|\ work around linux 5.15 ioprio API breakage
| * core: normalize ioprio values we acquire from kernel | Lennart Poettering | 2021-11-24 | 1 | -1/+1
| | Linux 5.15 broke the API of ioprio_get(): instead of returning IOPRIO_CLASS_NONE when that is set, it now returns IOPRIO_CLASS_BE, which is what this actually is (the former is just an alias for the latter with a priority value of 4). Let's hide the differences between old and new kernels here and always normalize to what the new kernels do.
| * ioprio: normalize io priority values in configuration | Lennart Poettering | 2021-11-24 | 2 | -4/+4
| | Let's always say IOPRIO_CLASS_BE when IOPRIO_CLASS_NONE is set.
| * ioprio-util: add macro for default ioprio settings | Lennart Poettering | 2021-11-24 | 2 | -4/+4
| | IOPRIO_CLASS_NONE with any priority value is actually an alias for IOPRIO_CLASS_BE with priority value 4, which is the default ioprio for all processes. We got this right in one place but wrong in three others (where we assumed the default value was 0, not 4). Let's add a macro that encodes this properly and use it everywhere.
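The ioprio encoding behind these commits can be sketched as below. The constants mirror the Linux layout (class in the bits above bit 13, level in the low bits; NONE=0, RT=1, BE=2, IDLE=3), but the `IOP_*` names and `ioprio_normalize_sketch()` are local to this example, not systemd's actual macros. It folds any IOPRIO_CLASS_NONE value into best-effort level 4, as the commit messages describe.

```c
#include <assert.h>

/* Local names mirroring the Linux ioprio layout: class << 13 | level. */
#define IOP_CLASS_SHIFT 13
#define IOP_PRIO_MASK ((1 << IOP_CLASS_SHIFT) - 1)
enum { IOP_CLASS_NONE = 0, IOP_CLASS_RT = 1, IOP_CLASS_BE = 2, IOP_CLASS_IDLE = 3 };

#define IOP_PRIO_VALUE(class, data) (((class) << IOP_CLASS_SHIFT) | (data))
#define IOP_PRIO_CLASS(v) ((v) >> IOP_CLASS_SHIFT)
#define IOP_PRIO_DATA(v) ((v) & IOP_PRIO_MASK)

/* The default for all processes: best-effort level 4, not the raw value 0. */
#define IOP_DEFAULT_CLASS_AND_PRIO IOP_PRIO_VALUE(IOP_CLASS_BE, 4)

/* Map the old "NONE" representation to what newer kernels report. */
static int ioprio_normalize_sketch(int v) {
        assert(v >= 0);
        return IOP_PRIO_CLASS(v) == IOP_CLASS_NONE ? IOP_DEFAULT_CLASS_AND_PRIO : v;
}
```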
| * shared: split out ioprio related stuff into ioprio-util.[ch] | Lennart Poettering | 2021-11-24 | 3 | -0/+3
| | No actual code changes, just some splitting out.
* | Merge pull request #21508 from poettering/conn-count-fix | Yu Watanabe | 2021-11-25 | 3 | -28/+37
|\ \ pid1: fix connection counting
| * | socket: various modernizations | Lennart Poettering | 2021-11-25 | 1 | -12/+13
| | |
| * | socket: always pass socket, fd and SocketPeer ownership to service together | Lennart Poettering | 2021-11-25 | 3 | -16/+24
| | | For per-connection socket instances we currently maintain three fields related to the socket: a reference to the Socket unit, the connection fd, and a reference to the SocketPeer object that counts socket peers. Let's synchronize their lifetimes, i.e. always set all three together or unset them together, so that their reference counters stay in sync. In particular, this ensures that we drop the SocketPeer reference whenever the service unit leaves an active state, i.e. at the same time we close its fd. Fixes: #20685
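A minimal sketch of keeping the three fields in lockstep, assuming hypothetical `ConnState`/`RefObj` types that model only reference counting (systemd's real Socket and SocketPeer objects carry much more). Setting and clearing always touch all three fields, so the peer count cannot outlive the fd.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical stand-in: only the reference count is modelled. */
typedef struct RefObj { unsigned n_ref; } RefObj;

static RefObj *ref_obj_ref(RefObj *o) { assert(o); o->n_ref++; return o; }
static RefObj *ref_obj_unref(RefObj *o) { if (o) { assert(o->n_ref > 0); o->n_ref--; } return NULL; }

/* A service's view of its per-connection socket: all three fields move together. */
typedef struct ConnState {
        RefObj *socket;   /* reference to the Socket unit */
        int fd;           /* the connection fd, owned */
        RefObj *peer;     /* reference to the object counting socket peers */
} ConnState;

static void conn_state_set(ConnState *c, RefObj *socket, int fd, RefObj *peer) {
        assert(c && socket && fd >= 0);
        assert(!c->socket && c->fd < 0 && !c->peer);   /* never half-initialized */
        c->socket = ref_obj_ref(socket);
        c->fd = fd;                                    /* takes ownership */
        c->peer = peer ? ref_obj_ref(peer) : NULL;
}

static void conn_state_unset(ConnState *c) {
        assert(c);
        c->socket = ref_obj_unref(c->socket);
        if (c->fd >= 0)
                close(c->fd);
        c->fd = -1;
        c->peer = ref_obj_unref(c->peer);              /* dropped together with the fd */
}
```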
* | | build: fix build without seccomp | Dominique Martinet | 2021-11-25 | 1 | -23/+23
|/ / - execute.c: the bpf functions were in the middle of an #if HAVE_SECCOMP block for no reason
| |  - test-fd-util.c: make seccomp-util.h includable without depending on <seccomp.h>, and hardcode is_seccomp_available() to return false in this case.
| |  Also fix a stray DEFINED(): HAVE_SECCOMP is defined as 0, so a plain #if should be used, as everywhere else.
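The `#if` versus `defined()` point can be illustrated with a small sketch. It assumes, as the commit states, that the build always defines HAVE_SECCOMP (as 0 when seccomp is unavailable); the fallback `#define` exists only so the snippet compiles on its own, and the stub `is_seccomp_available()` mirrors the behavior described rather than systemd's exact code.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: the build system emits HAVE_SECCOMP as 0 or 1; default to 0 here. */
#ifndef HAVE_SECCOMP
#define HAVE_SECCOMP 0
#endif

#if HAVE_SECCOMP
#include <seccomp.h>
bool is_seccomp_available(void);   /* real builds would probe the kernel */
#else
/* Without libseccomp, report seccomp as unavailable so callers skip filtering. */
static inline bool is_seccomp_available(void) {
        return false;
}
#endif

/* The pitfall: the macro is defined even when its value is 0, so a defined() test
 * is true in seccomp-less builds; "#if HAVE_SECCOMP" tests the value instead. */
#if defined(HAVE_SECCOMP)
static const bool seccomp_macro_is_defined = true;
#else
static const bool seccomp_macro_is_defined = false;
#endif
```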
* / unit_is_bound_by_inactive: fix return pointer check | Dominique Martinet | 2021-11-24 | 1 | -1/+1
|/ `*ret_culprit` should be set if a non-NULL ret_culprit pointer has been passed; checking the previous `*ret_culprit` value does not make sense. This caused the culprit not to be assigned properly, leading to a pid1 crash when a unit could not be stopped. Fixes: #21476
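The bug pattern is easy to show in isolation. In this hypothetical sketch (`UnitSketch` and `bound_by_inactive_sketch()` are illustrative, not the real `unit_is_bound_by_inactive()`), the output parameter is tested for being non-NULL before it is written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified unit record. */
typedef struct UnitSketch {
        const char *name;
        bool active;
} UnitSketch;

/* Returns true if any unit we are bound by is inactive; optionally reports which one. */
static bool bound_by_inactive_sketch(const UnitSketch *binders, size_t n, const char **ret_culprit) {
        assert(binders || n == 0);
        for (size_t i = 0; i < n; i++) {
                if (binders[i].active)
                        continue;
                /* Correct: test whether the caller asked for the culprit. The buggy form,
                 * "if (*ret_culprit)", tested the caller's previous value instead, so a
                 * caller that initialized it to NULL never received the culprit. */
                if (ret_culprit)
                        *ret_culprit = binders[i].name;
                return true;
        }
        return false;
}
```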
* bpf: fix memleak in restrict_fs_bpf | Julia Kartseva | 2021-11-24 | 1 | -1/+1
| Memory allocated in the bpf skeleton was not freed. Wrap the pointer in _cleanup_. Fixes: #21471
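For readers unfamiliar with `_cleanup_`: it is built on the GCC/Clang `cleanup` variable attribute, which calls a function with a pointer to the variable when the variable goes out of scope. The sketch below defines a local `_cleanup_` macro in that style and a hypothetical `SkelSketch` object standing in for the bpf skeleton; a counter makes the automatic frees observable.

```c
#include <assert.h>
#include <stdlib.h>

/* Local stand-in for a _cleanup_() macro built on the cleanup attribute. */
#define _cleanup_(f) __attribute__((cleanup(f)))

static int n_freed;   /* counts destroy calls so the effect is observable */

typedef struct SkelSketch { int dummy; } SkelSketch;   /* stand-in for a bpf skeleton */

static void skel_sketch_destroy(SkelSketch *s) {
        free(s);
        n_freed++;
}

/* The cleanup function receives a pointer to the variable, i.e. SkelSketch **. */
static void skel_sketch_destroyp(SkelSketch **s) {
        if (*s)
                skel_sketch_destroy(*s);
}

static int setup_sketch(int fail_early) {
        _cleanup_(skel_sketch_destroyp) SkelSketch *obj = calloc(1, sizeof(SkelSketch));
        if (!obj)
                return -1;
        if (fail_early)
                return -2;   /* obj is destroyed automatically on this early return */
        return 0;            /* ...and on the normal return as well */
}
```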
* extension-release.d/: add a new field SYSEXT_SCOPE= for clarifying what a system extension is for | Lennart Poettering | 2021-11-23 | 1 | -1/+1
| This should make things a bit more robust, since it ensures a system extension can only be applied to the right environments. Right now three different "scopes" are defined:
|   1. "system" (for regular OS systems, after the initrd transition)
|   2. "initrd" (for sysext images that apply to the initrd environment)
|   3. "portable" (for sysext images that apply to portable images)
| If not specified, we imply a default of "system portable", i.e. any image where the field is not specified is implicitly OK for application to OS images and for portable services, but not for initrds.
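A possible matching rule for the field, sketched under the assumptions that the value is a whitespace-separated list and that an unset field means "system portable" as described above; `sysext_scope_matches()` is a hypothetical helper, not systemd's parser.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Does a SYSEXT_SCOPE= value (NULL if the field is absent) include `scope`? */
static bool sysext_scope_matches(const char *field, const char *scope) {
        const char *p = field ? field : "system portable";   /* implied default */
        size_t n;

        assert(scope);
        n = strlen(scope);
        for (;;) {
                p += strspn(p, " \t");            /* skip separators */
                if (*p == '\0')
                        return false;
                size_t len = strcspn(p, " \t");   /* length of this word */
                if (len == n && strncmp(p, scope, n) == 0)
                        return true;              /* whole-word match only */
                p += len;
        }
}
```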
* core/automount: Add ExtraOptions field | Andrew Stone | 2021-11-23 | 4 | -3/+39
|
* core: prefix functions to avoid identical static function names | Christian Göttsche | 2021-11-20 | 1 | -14/+14
| The function name `method_reload` is used for static functions in both dbus-unit.c and dbus-manager.c. Since the previous commit adds the function name to the audit information on SELinux denials, rename the one in dbus-unit.c (and its relatives), as most of the functions in src/core/dbus-unit.c are already prefixed with `bus_unit_`.
* selinux: name mac_selinux_generic_access_check as internal function | Christian Göttsche | 2021-11-20 | 2 | -9/+9
| `mac_selinux_generic_access_check()` should not be called directly, only via the wrapper macros `mac_selinux_access_check` and `mac_selinux_unit_access_check`.
* selinux: improve debug log format | Christian Göttsche | 2021-11-20 | 1 | -1/+1
| path might be NULL when checking against the system permissions, so wrap it with strna(). The command line might not be available over D-Bus, and thus cl might be empty. Print "n/a" instead of the empty string.
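A minimal sketch of the two normalizations, using local helpers modeled on the strna() mentioned above; `strna_sketch()`, `empty_to_na_sketch()` and `format_denial_sketch()` are illustrative names, not systemd's code. Normalizing matters because passing NULL for `%s` is undefined behavior in ISO C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* NULL becomes "n/a"; any other string, including "", is passed through. */
static const char *strna_sketch(const char *s) {
        return s ? s : "n/a";
}

/* NULL and "" both become "n/a". */
static const char *empty_to_na_sketch(const char *s) {
        return s && *s ? s : "n/a";
}

static int format_denial_sketch(char *buf, size_t size, const char *path, const char *cl) {
        assert(buf);
        return snprintf(buf, size, "path=%s cmdline=%s",
                        strna_sketch(path), empty_to_na_sketch(cl));
}
```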
* selinux: add function name to audit data | Christian Göttsche | 2021-11-20 | 2 | -7/+17
| Include the systemd C function name in the audit message to make denials easier to debug, similar to how kernel denial messages include the syscall name.
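One way to capture the calling function's name is to have the wrapper macros pass `__func__`, which is evaluated in the function where the macro is expanded. The sketch below assumes that approach with hypothetical names; it also shows why two static functions both named `method_reload` would produce indistinguishable audit records, which motivates the renaming in the dbus-unit.c commit above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

static char last_audit[128];   /* stands in for the emitted audit record */

/* Internal check: callers go through the wrapper macro below instead. */
static int access_check_internal_sketch(const char *permission, const char *function) {
        assert(permission && function);
        snprintf(last_audit, sizeof last_audit, "permission=%s function=%s", permission, function);
        return 0;
}

/* __func__ is evaluated where the macro expands, i.e. inside the caller's body. */
#define access_check_sketch(permission) access_check_internal_sketch((permission), __func__)

static int method_reload_sketch(void) {
        return access_check_sketch("reload");
}
```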
* tree-wide: port various places over to open_mkdir_at() | Lennart Poettering | 2021-11-17 | 1 | -7/+10
|
* shared: clean up mkdir.h/label.h situation | Lennart Poettering | 2021-11-16 | 12 | -12/+12
| Previously the mkdir_label() family of calls was implemented in src/shared/mkdir-label.c, but its functions were declared partly in src/shared/label.h and partly in src/basic/mkdir.h (!!). That's weird (and wrong). Let's clean this up and add a proper mkdir-label.h matching the .c file.
* tree-wide: use new RET_NERRNO() helper at various places | Lennart Poettering | 2021-11-16 | 3 | -22/+13
|
* shared: split out UID allocation range stuff from user-record.h | Lennart Poettering | 2021-11-13 | 1 | -1/+1
| user-record.[ch] are about the UserRecord JSON stuff, while the UID allocation range stuff (i.e. login.defs handling) is a very different thing and complex enough on its own, so let's give it its own .c/.h files. No code changes, just some splitting out of code.
* Merge pull request #21320 from poettering/namespace-mkdir-umask | Lennart Poettering | 2021-11-12 | 1 | -24/+25
|\ make pid1 namespace code independent of umask
| * namespace: make tmp dir handling code independent of umask too | Lennart Poettering | 2021-11-12 | 1 | -5/+7
| | Let's make all code in namespace.c robust against an unusual umask. This doesn't matter much, given that the parent directories we deal with here almost certainly exist already, but let's clean this up anyway and make it fully clean.
| * namespace: make whole namespace_setup() work regardless of configured umask | Lennart Poettering | 2021-11-12 | 1 | -3/+4
| | Let's reset the umask during the whole namespace_setup() logic, so that none of our mkdir() and mknod() calls are subject to whatever umask might currently be set. This mostly moves the umask save/restore logic out of mount_private_dev() and into the outer stack frame of namespace_setup(). Fixes #19899
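The umask handling can be sketched as a save/clear/restore around the filesystem calls. For brevity this hypothetical `make_dir_exact_mode_sketch()` wraps a single mkdir(); the commit applies the same idea once around the whole of namespace_setup().

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Create a directory with exactly `mode`, independent of the caller's umask. */
static int make_dir_exact_mode_sketch(const char *path, mode_t mode) {
        mode_t saved;
        int r = 0;

        assert(path);
        saved = umask(0);          /* umask() returns the previous mask */
        if (mkdir(path, mode) < 0)
                r = -errno;
        umask(saved);              /* restore on success and failure alike */
        return r;
}
```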
| * namespace: rebreak a few comments | Lennart Poettering | 2021-11-12 | 1 | -16/+14
| |
* | execute: always log a warning when setting SELinux context fails | Topi Miettinen | 2021-11-12 | 1 | -6/+12
| | Also update the manual page to explain how the transition can still fail.
* | Change gendered terms to be gender-neutral (#21325) | Emily Gonyer | 2021-11-12 | 1 | -1/+1
|/ Some typos are also fixed.
* pid1: add a manager_trigger_run_queue() helper | Lennart Poettering | 2021-11-12 | 4 | -12/+19
| We now have two different places where we re-trigger the run queue. Let's unify them under a common function that is part of the Manager code. Follow-up for #20953
* Merge pull request #20953 from msekletar/mount-ratelimit-followup-20329 | Lennart Poettering | 2021-11-12 | 9 | -25/+49
|\ Delay running mount start jobs when the /p/s/mountinfo event source is rate limited
| * mount: retrigger run queue after ratelimit expired to run delayed mount start jobs | Michal Sekletar | 2021-11-11 | 1 | -0/+21
| | Fixes #20329
| * mount: make mount units start jobs not runnable if /p/s/mountinfo ratelimit is in effect | Michal Sekletar | 2021-11-11 | 1 | -0/+3
| |
| * core: rename/generalize UNIT(u)->test_start_limit() hook | Michal Sekletar | 2021-11-11 | 9 | -25/+25
| | Up until now, the main reason we didn't proceed with starting a unit was an exceeded start limit burst. However, for unit types like mounts, another reason could be an active ratelimit on the /proc/self/mountinfo event source. That means our mount unit state may not reflect the current kernel state. Hence, we need to attempt to re-run the start job after the ratelimit on the event source expires. As we will be introducing a reason other than the start limit, let's rename the virtual function that implements the check.
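A sketch of a generalized start check that can refuse for more than one reason, assuming a simple interval/burst ratelimit. `RateLimitSketch`, `ratelimit_below_sketch()` and `mount_can_start_sketch()` are hypothetical names, and the error codes are illustrative choices: a deferrable condition returns -EAGAIN so the caller can retry later, while an exhausted start limit is final.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* At most `burst` events per `interval_usec`; a new window starts after it elapses. */
typedef struct RateLimitSketch {
        uint64_t interval_usec;
        unsigned burst;
        uint64_t begin;
        unsigned num;
} RateLimitSketch;

static bool ratelimit_below_sketch(RateLimitSketch *rl, uint64_t now) {
        assert(rl);
        if (rl->num == 0 || now >= rl->begin + rl->interval_usec) {
                rl->begin = now;   /* open a new window */
                rl->num = 0;
        }
        if (rl->num < rl->burst) {
                rl->num++;
                return true;
        }
        return false;
}

/* Generalized "can this unit start now?" hook for a mount-like unit. */
static int mount_can_start_sketch(RateLimitSketch *start_limit, bool mountinfo_ratelimited, uint64_t now) {
        if (mountinfo_ratelimited)
                return -EAGAIN;      /* state may be stale: retry once the ratelimit expires */
        if (!ratelimit_below_sketch(start_limit, now))
                return -ECANCELED;   /* start limit hit: give up */
        return 0;
}
```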
* | Merge pull request #21241 from wat-ze-hex/2021-11-04-fix-bpf-foreign-realization | Luca Boccassi | 2021-11-11 | 2 | -11/+14
|\ \
| |/
|/| core, bpf: fix bpf-foreign cgroup controller realization
| * core: check fs type of BPFProgram= property path | Julia Kartseva | 2021-11-11 | 1 | -0/+10
| | Tests:
| | ```
| | % stat --file-system --format="%T" /root/bpf/trivial/
| | bpf_fs
| | % systemd-nspawn -D/ --volatile=yes \
| |     --property=BPFProgram=egress:/root/bpf/trivial/cgroup_skb_egress \
| |     --quiet -- ping -c 5 -W 1 ::1
| | PING ::1(::1) 56 data bytes
| | --- ::1 ping statistics ---
| | 5 packets transmitted, 0 received, 100% packet loss, time 4110ms
| | ```
| | ```
| | % stat --file-system --format='%T' /root/meh
| | btrfs
| | % systemd-nspawn -D/ --volatile=yes --property=BPFProgram=egress:/root/meh --quiet -- ping -c 5 -W 1 ::1
| | ```
| | sudo ./build/systemd-nspawn \
| |     -D/ --volatile=yes --property=BPFProgram=egress:/home/hex --quiet -- \
| |     ping -c 1 -W 1 ::1
| | PING ::1(::1) 56 data bytes
| | 64 bytes from ::1: icmp_seq=1 ttl=64 time=0.017 ms
| | --- ::1 ping statistics ---
| | 1 packets transmitted, 1 received, 0% packet loss, time 0ms
| * core: fix bpf-foreign cg controller realization | Julia Kartseva | 2021-11-11 | 2 | -11/+4
| | Requiring the /sys/fs/bpf path to be a mount point at the moment of cgroup controller realization does more harm than good, because:
| |   * Realization happens early during boot, and the mount point may not be ready at that time. That happens if mounts are made by a .mount unit (the issue we encountered).
| |   * The BPF filesystem may be mounted at another location.
| | Remove the check. Instead, verify that the path provided by BPFProgram= is within a BPF filesystem when unit properties are parsed. Split into two commits for simple backporting.
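The filesystem-type check can be done with statfs(2), comparing `f_type` against the bpffs magic number 0xcafe4a11 (BPF_FS_MAGIC in <linux/magic.h>), which is the same property `stat --file-system --format=%T` reports as `bpf_fs` in the tests above. The helper name `path_is_on_bpffs()` is hypothetical, and this is not systemd's implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/vfs.h>

/* bpffs superblock magic, defined locally so the sketch builds without kernel headers. */
#ifndef BPF_FS_MAGIC
#define BPF_FS_MAGIC 0xcafe4a11
#endif

/* Returns 1 if `path` is on a BPF filesystem, 0 if on another filesystem,
 * or a negative errno if it cannot be examined. */
static int path_is_on_bpffs(const char *path) {
        struct statfs sfs;

        assert(path);
        if (statfs(path, &sfs) < 0)
                return -errno;
        /* f_type's width differs between architectures; compare the low 32 bits. */
        return (uint32_t) sfs.f_type == (uint32_t) BPF_FS_MAGIC;
}
```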
* | scope: count successful cgroup additions when delegating via D-Bus | Jonas Witschel | 2021-11-11 | 1 | -1/+4
| | Since commit 8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 ("scope: refuse activation of scopes if no PIDs to add are left"), all "systemd-run --scope --user" calls fail because cgroup attachments delegated to the system instance are not counted towards successful additions. Fix this by incrementing the return value when unit_attach_pid_to_cgroup_via_bus() succeeds, similar to what happens when cg_attach() succeeds directly. Note that this can *not* distinguish the case where unit_attach_pid_to_cgroup_via_bus() ran successfully but all processes to attach have gone away in the meantime, unlike the checks that commit 8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 adds for the system instance. This is because, even though unit_attach_pid_to_cgroup_via_bus() leads to an internal unit_attach_pids_to_cgroup() call, the return value over D-Bus does not include the number of successfully attached processes and is always NULL on success. Fixes: #21297
* | escape: add flags argument to quote_command_line() | Lennart Poettering | 2021-11-11 | 1 | -6/+5
|/ That way, we can reuse the call in one more place (see later patch).
* core: replace slice dependencies as they get added | Anita Zhang | 2021-11-10 | 5 | -7/+15
| Define a "UNIT_DEPENDENCY_SLICE_PROPERTY" UnitDependencyMask flag that is used when adding slices to the dependencies hashmap. This flag is used to remove slice dependencies when they get overridden by new ones. Fixes #20182
* Merge pull request #20813 from unusual-thoughts/exittype_v2 | Zbigniew Jędrzejewski-Szmek | 2021-11-08 | 6 | -58/+95
|\ Reintroduce ExitType
| * Reintroduce ExitType | Henri Chain | 2021-11-08 | 6 | -58/+95
| | This introduces `ExitType=main|cgroup` for services. Similar to how `Type` specifies the launch of a service, `ExitType` is concerned with how systemd determines that a service exited.
| | - If set to `main` (the current behavior), the service manager will consider the unit stopped when the main process exits.
| | - The `cgroup` exit type is meant for applications whose forking model is not known ahead of time and which might not have a specific main process. The service will stay running as long as at least one process in the cgroup is running. This is intended for transient or automatically generated services, such as graphical applications inside of a desktop environment.
| | The motivation for this is #16805. The original PR (#18782) was reverted (#20073) after realizing that the exit status of "the last process in the cgroup" can't reliably be known (#19385). This version instead uses the main process exit status if there is one, and just listens to the cgroup empty event otherwise.
| | The advantages of a service with `ExitType=cgroup` over scopes are:
| | - Integrated logging / stdout redirection
| | - Avoids the race / synchronisation issue between launch and scope creation
| | - More extensive use of drop-ins and thus distro-level configuration: by moving from scopes to services, we can have drop-ins that affect properties that can only be set during service creation, like `OOMPolicy` and security-related properties
| | - It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously only services can be used in the generator, not scopes.
| | [1] https://bugs.kde.org/show_bug.cgi?id=433299
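The exit decision described above can be sketched as a small pure function. The enum and helpers are hypothetical; in particular, returning 0 for "no main process" in `service_exit_status_sketch()` is an assumption of this sketch, since the commit only says that the main process status is used when there is one.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum ExitTypeSketch {
        EXIT_TYPE_MAIN,     /* stopped when the main process exits (existing behavior) */
        EXIT_TYPE_CGROUP,   /* stopped when the cgroup becomes empty */
} ExitTypeSketch;

/* Should the service now be considered exited? */
static bool service_exited_sketch(ExitTypeSketch t, bool main_exited, bool cgroup_empty) {
        assert(t == EXIT_TYPE_MAIN || t == EXIT_TYPE_CGROUP);
        switch (t) {
        case EXIT_TYPE_MAIN:
                return main_exited;
        case EXIT_TYPE_CGROUP:
                return cgroup_empty;   /* other processes may outlive the main one */
        }
        return false;
}

/* Use the main process's status when one was recorded; the status of "the last
 * process in the cgroup" is never consulted. 0 otherwise is an assumption. */
static int service_exit_status_sketch(bool have_main_status, int main_status) {
        return have_main_status ? main_status : 0;
}
```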
* | build: preserve correct mode when generating files via jinja2 | Christian Brauner | 2021-11-08 | 1 | -4/+2
| | When using "capture : true" in custom_target()s, the mode of the source file is not preserved when the generated file is not installed, so it needs to be tweaked manually. Switch from output capture to creating the target file directly and copying the permissions from the input file. Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
* | exec: Add TTYRows and TTYColumns properties to set TTY dimensions | Daan De Meyer | 2021-11-05 | 6 | -5/+65
|/
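TTYRows= and TTYColumns= come down to setting the terminal's window size. A minimal sketch using the standard TIOCGWINSZ/TIOCSWINSZ ioctls; the helper name is hypothetical, and treating 0 as "leave this dimension unchanged" is an assumption, not documented systemd behavior.

```c
#include <assert.h>
#include <errno.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Apply rows/columns to a terminal fd; 0 keeps the current value (assumption). */
static int tty_set_size_sketch(int fd, unsigned short rows, unsigned short cols) {
        struct winsize ws;

        assert(fd >= 0);
        if (ioctl(fd, TIOCGWINSZ, &ws) < 0)   /* fails with ENOTTY on non-terminals */
                return -errno;
        if (rows > 0)
                ws.ws_row = rows;
        if (cols > 0)
                ws.ws_col = cols;
        if (ioctl(fd, TIOCSWINSZ, &ws) < 0)
                return -errno;
        return 0;
}
```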
* Merge pull request #20138 from keszybz/coding-style-variable-decls | Luca Boccassi | 2021-11-05 | 2 | -7/+18
|\ A coding style tweak and checking of sd_notify() calls and voidification of pager_open()
| * Make pager_open() return void | Zbigniew Jędrzejewski-Szmek | 2021-11-03 | 1 | -1/+1
| |
| * manager: fix confusion when to send READY=1 | Zbigniew Jędrzejewski-Szmek | 2021-11-03 | 1 | -3/+3
| | I got the logic reversed in 6d9326595592f98e8126eacb4176acd8c3516d5c. Let's just remove the conditionalization of the status message: if we're sending something, we might as well always attach READY=1; the extra few bytes don't make much of a difference. FWIW, it seems that this bug didn't cause problems, probably because we'd send READY=1 either from user_manager_send_ready() or from a later call to manager_send_ready().
| * tree-wide: drop "f" from sd_notify() calls with a static string | Zbigniew Jędrzejewski-Szmek | 2021-11-03 | 1 | -3/+3
| | If we don't need to do any formatting, let's optimize things a bit.
| * tree-wide: warn when sd_notify fails with READY=1 or FDSTOREREMOVE=1 | Zbigniew Jędrzejewski-Szmek | 2021-11-03 | 1 | -6/+17
| | Most sd_notify() calls are like log_info(): the result is only informative, and if they fail, it's best to ignore it. But if a call with READY=1 fails, the unit may enter a failed state, so we should warn about this. Similarly for FDSTOREREMOVE=1: the manager may be left with a stale fd, at the very least wasting resources.
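To make the return-value policy concrete, here is a minimal sketch of the notification protocol itself: one datagram sent to the AF_UNIX socket named by $NOTIFY_SOCKET (a leading `@` denotes the abstract namespace). `notify_sketch()` mirrors sd_notify()'s documented convention (0 if no socket is configured, positive on success, negative errno on failure) but omits many details of the real implementation, such as passing credentials or file descriptors.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int notify_sketch(const char *state) {
        const char *e = getenv("NOTIFY_SOCKET");
        struct sockaddr_un sa = { .sun_family = AF_UNIX };
        size_t len;
        int fd, r = 1;

        assert(state);
        if (!e)
                return 0;                       /* not running under a service manager */
        len = strlen(e);
        if (len < 2 || len >= sizeof(sa.sun_path) || (e[0] != '/' && e[0] != '@'))
                return -EINVAL;
        memcpy(sa.sun_path, e, len);
        if (sa.sun_path[0] == '@')
                sa.sun_path[0] = '\0';          /* abstract namespace */

        fd = socket(AF_UNIX, SOCK_DGRAM | SOCK_CLOEXEC, 0);
        if (fd < 0)
                return -errno;
        if (sendto(fd, state, strlen(state), MSG_NOSIGNAL,
                   (struct sockaddr *) &sa, offsetof(struct sockaddr_un, sun_path) + len) < 0)
                r = -errno;
        close(fd);
        return r;
}

/* Policy from the commit: a failed READY=1 matters, so report it. */
static int notify_ready_sketch(void) {
        int r = notify_sketch("READY=1");
        if (r < 0)
                fprintf(stderr, "Failed to send READY=1 notification: %s\n", strerror(-r));
        return r;
}
```

In this sketch only READY=1 failures are reported; purely informational status updates would simply ignore the return value.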