| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
| |
Assumption in edc027b was that job we first skipped because of active
ratelimit is still in run_queue. Hence we trigger the queue and dispatch
it in the next iteration. Actually we remove jobs from run_queue in
job_run_and_invalidate() before we call unit_start(). Hence if we want
to attempt to run the job again in the future we need to add it back
to run_queue.
Fixes #21458
|
|
|
|
|
|
|
|
|
| |
In case `/proc` is successfully mounted with pid tree subset only due to
`ProcSubset=pid`, the protective mounts for `ProtectKernelTunables=yes` and
`ProtectKernelLogs=yes` to non-pid `/proc` paths are failing because the paths
don't exist. But the pid only option may have failed gracefully (for example
because of ancient kernel), so let's try the mounts but it's not fatal if they
don't succeed.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This macro is like JSON_BUILD_STRING() but uses our json library's
ability to use literal strings directly as JsonVariant objects.
The changes all our codebase to use this new macro whenever we build
JSON objects from literal strings.
(I tried to make this automatic, i.e. to detect in JSON_BUILD_STRING()
whether something is a literal string nicely and thus do this stuff
automatically, but I couldn't find a way.)
This should reduce memory usage of our JSON code a bit. Constant strings
we use very often will now be shared and mapped directly from the ELF
image.
|
|\
| |
| | |
work around linux 5.15 ioprio API breakage
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Linux 5.15 broke API in ioprio_get(): instead of returning
IOPRIO_CLASS_NONE when that's set it now returns IOPRIO_CLASS_BE, which
is what this actually is (the former is just an alias for the latter
with a priority value of 4).
Let's hide the differences between old and new kernels here, and always
normalize to what the new kernels do.
|
| |
| |
| |
| | |
Let's always say IOPRIO_CLASS_BE when IOPRIO_CALSS_NONE is set.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
IOPRIO_CLASS_NONE with any priority value actually is an alias for
IOPRIO_CLASS_BE with priority value 4 – which is the default ioprio for
all processes.
We got this right at one place, but wrong at three others (where we
assumed the default value was 0, not 4). Let's add a
macro that encodes this properly, and use it everywhere.
|
| |
| |
| |
| | |
No actual code changes, just some splitting out.
|
|\ \
| | |
| | | |
pid1: fix connection counting
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Per-connection socket instances we currently maintain three fields
related to the socket: a reference to the Socket unit, the connection fd,
and a reference to the SocketPeer object that counts socket peers.
Let's synchronize their lifetime, i.e. always set them all three
together or unset them together, so that their reference counters stay
synchronous.
THis will in particuar ensure that we'll drop the SocketPeer reference
whenever we leave an active state of the service unit, i.e. at the same
time we close the fd for it.
Fixes: #20685
|
|/ /
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- execute.c: bpf functions were in the middle of an #if HAVE_SECCOMP
block for no reason
- test-fd-util.c: make seccomp-util.h includable without depending on
<seccomp.h>, and make is_seccomp_available() hardcoded to returning
false in this case.
Also fix a stray DEFINED() -- HAVE_SECCOMP is defined as 0, so normal
#if should be used like everywhere else.
|
|/
|
|
|
|
|
|
|
|
| |
*ret_culprit should be set if ret_culprit has been passed a non-null value,
checking the previous *ret_culprit value does not make sense.
This would cause the culprit to not properly be assigned, leading to
pid1 crash when a unit could not be stopped.
Fixes: #21476
|
|
|
|
|
|
| |
Memory allocated in bpf skeleton is not freed. Wrap ptr in _cleanup_.
Fixes: #21471
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
system extension is for
This should make things a bit more robust since it ensures system
extension can only applied to the right environments. Right now three
different "scopes" are defined:
1. "system" (for regular OS systems, after the initrd transition)
2. "initrd" (for sysext images that apply to the initrd environment)
3. "portable" (for sysext images that apply to portable images)
If not specified we imply a default of "system portable", i.e. any image
where the field is not specified is implicitly OK for application to OS
images and for portable services – but not for initrds.
|
| |
|
|
|
|
|
|
|
|
|
| |
The function name `method_reload` is used both in dbus-unit.c and
dbus-manager.c for static functions.
With the previous addition of adding the function name to the audit
information on SELinux denials, rename the one (and its relatives) in
dbus-unit.c as most of the functions in src/core/dbus-unit.c are already
prefixed with `bus_unit_`.
|
|
|
|
|
|
| |
`mac_selinux_generic_access_check()` should not be called directly, only
via the wrapper macros `mac_selinux_access_check` and
`mac_selinux_unit_access_check`.
|
|
|
|
|
|
|
|
| |
path might be NULL when checking against the system permissions, so wrap
with strna().
The command line might not be available over D-Bus and thus cl might be
empty. Print "n/a" instead of the empty string.
|
|
|
|
|
|
| |
Include the systemd C function name in the audit message to improve the
debug ability on denials.
Similar like kernel denial messages include the syscall name.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Previously the mkdir_label() family of calls was implemented in
src/shared/mkdir-label.c but its functions partly declared ins
src/shared/label.h and partly in src/basic/mkdir.h (!!). That's weird
(and wrong).
Let's clean this up, and add a proper mkdir-label.h matching the .c
file.
|
| |
|
|
|
|
|
|
|
|
| |
user-record.[ch] are about the UserRecord JSON stuff, and the UID
allocation range stuff (i.e. login.defs handling) is a very different
thing, and complex enough on its own, let's give it its own c/h files.
No code changes, just some splitting out of code.
|
|\
| |
| | |
make pid1 namespace code independent of umask
|
| |
| |
| |
| |
| |
| |
| | |
Let's make all code in namespace.c robust towards weird umask. This
doesn't matter too much given that the parent dirs we deal here almost
certainly exist anyway, but let's clean this up anyway and make it fully
clean.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Let's reset the umask during the whole namespace_setup() logic, so that
all our mkdir() + mknod() are not subjected to whatever umask might
currently be set.
This mostly moves the umask save/restore logic out of
mount_private_dev() and into the stack frame of namespace_setup() that
is further out.
Fixes #19899
|
| | |
|
| |
| |
| |
| | |
Update also manual page to explain how the transition can still fail.
|
|/
|
| |
Some typos are also fixed.
|
|
|
|
|
|
|
|
| |
We have two different places where we re-trigger the run queue now.
let's unify it under a common function, that is part of the Manager
code.
Follow-up for #20953
|
|\
| |
| | |
Delay running mount start jobs when we /p/s/mountinfo event source is rate limited
|
| |
| |
| |
| |
| |
| | |
start jobs
Fixes #20329
|
| |
| |
| |
| | |
is in effect
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Up until now the main reason why we didn't proceed with starting the
unit was exceed start limit burst. However, for unit types like mounts
the other reason could be effective ratelimit on /proc/self/mountinfo
event source. That means our mount unit state may not reflect current
kernel state. Hence, we need to attempt to re-run the start job again
after ratelimit on event source expires.
As we will be introducing another reason than start limit let's rename
the virtual function that implements the check.
|
|\ \
| |/
|/| |
core, bpf: fix bpf-foreign cgroup controller realization
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Tests:
```
% stat --file-system --format="%T" /root/bpf/trivial/
bpf_fs
% systemd-nspawn -D/ --volatile=yes \
--property=BPFProgram=egress:/root/bpf/trivial/cgroup_skb_egress \
--quiet -- ping -c 5 -W 1 ::1
PING ::1(::1) 56 data bytes
--- ::1 ping statistics ---
5 packets transmitted, 0 received, 100% packet loss, time 4110ms
```
```
% stat --file-system --format='%T' /root/meh
btrfs
% systemd-nspawn -D/ --volatile=yes --property=BPFProgram=egress:/root/meh
--quiet -- ping -c 5 -W 1 ::1
```
sudo ./build/systemd-nspawn \
-D/ --volatile=yes --property=BPFProgram=egress:/home/hex --quiet -- \
ping -c 1 -W 1 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.017 ms
--- ::1 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Requiring /sys/fs/bpf path to be a mount point at the moment of cgroup
controllers realization does more harm than good, because:
* Realization happens early on boot, the mount point may not be ready at
the time. That happens if mounts are made by a .mount unit (the issue we
encountered).
* BPF filesystem may be mounted on another point.
Remove the check. Instead verify that path provided by BPFProgram= is
within BPF fs when unit properties are parsed.
Split in two commits for simple backport.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since commit 8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 ("scope: refuse
activation of scopes if no PIDs to add are left") all "systemd-run --scope
--user" calls fail because cgroup attachments delegated to the system instance
are not counted towards successful additions. Fix this by incrementing the
return value in case unit_attach_pid_to_cgroup_via_bus() succeeds, similar to
what happens when cg_attach() succeeds directly.
Note that this can *not* distinguish the case when
unit_attach_pid_to_cgroup_via_bus() has been run successfully, but all
processes to attach are gone in the meantime, unlike the checks that commit
8d3e4ac7cd37200d1431411a4b98925a24b7d9b3 adds for the system instance. This is
because even though unit_attach_pid_to_cgroup_via_bus() leads to an internal
unit_attach_pids_to_cgroup() call, the return value over D-Bus does not include
the number of successfully attached processes and is always NULL on success.
Fixes: #21297
|
|/
|
|
| |
That way, we can reuse the call at one more place (see later patch).
|
|
|
|
|
|
|
|
| |
Defines a "UNIT_DEPENDENCY_SLICE_PROPERTY" UnitDependencyMask type that
is used when adding slices to the dependencies hashmap. This type is
used to remove slice dependencies when they get overridden by new ones.
Fixes #20182
|
|\
| |
| | |
Reintroduce ExitType
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This introduces `ExitType=main|cgroup` for services.
Similar to how `Type` specifies the launch of a service, `ExitType` is
concerned with how systemd determines that a service exited.
- If set to `main` (the current behavior), the service manager will consider
the unit stopped when the main process exits.
- The `cgroup` exit type is meant for applications whose forking model is not
known ahead of time and which might not have a specific main process.
The service will stay running as long as at least one process in the cgroup
is running. This is intended for transient or automatically generated
services, such as graphical applications inside of a desktop environment.
Motivation for this is #16805. The original PR (#18782) was reverted (#20073)
after realizing that the exit status of "the last process in the cgroup" can't
reliably be known (#19385)
This version instead uses the main process exit status if there is one and just
listens to the cgroup empty event otherwise.
The advantages of a service with `ExitType=cgroup` over scopes are:
- Integrated logging / stdout redirection
- Avoids the race / synchronisation issue between launch and scope creation
- More extensive use of drop-ins and thus distro-level configuration:
by moving from scopes to services we can have drop ins that will affect
properties that can only be set during service creation,
like `OOMPolicy` and security-related properties
- It makes systemd-xdg-autostart-generator usable by fixing [1], as obviously
only services can be used in the generator, not scopes.
[1] https://bugs.kde.org/show_bug.cgi?id=433299
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When using "capture : true" in custom_target()s the mode of the source
file is not preserved when the generated file is not installed and so
needs to be tweaked manually. Switch from output capture to creating the
target file and copy the permissions from the input file.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
|
|/ |
|
|\
| |
| | |
A coding style tweak and checking of sd_notify() calls and voidification of pager_open()
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I got the logic reversed in 6d9326595592f98e8126eacb4176acd8c3516d5c.
Let's just remove the conditionalization of the status message: if we're
sending something, we might just as well always attach READY=1, the extra
few bytes don't make much of a difference.
FWIW, it seems that this bug didn't cause problems, probably because we'd send
READY=1 either from user_manager_send_ready() or from a later call to
manager_send_ready().
|
| |
| |
| |
| | |
If we don't need to do any formatting, let's optimize things a bit.
|
| |
| |
| |
| |
| |
| |
| |
| | |
Most sd_notify() calls are like log_info() — the result is only informative
and if they fail, it's best ignore this. But if a call with READY=1 fails,
the unit may enter a failed state, so we should warn about this. Similarly
for FSTOREREMOVE=1: the manager may be left with a stale fd, at least wasting
resources.
|