| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This new setting allows unsharing the pid namespace in a unit. Because
you have to fork to get a process into a pid namespace, we fork in
systemd-executor to get into the new pid namespace. The parent then
sends the pid of the child process back to the manager and exits while
the child process continues on with the rest of exec_invoke() and then
executes the actual payload.
Communicating the child pid is done via a new pidref socket pair that is
set up on manager startup.
We unshare the PID namespace right before the mount namespace so we
mount procfs correctly. Note PrivatePIDs=yes always implies MountAPIVFS=yes
to mount procfs.
When running unprivileged in a user session, user namespace is set up first
to allow for PID namespace to be unshared. However, when running in
privileged mode, we unshare the user namespace last to ensure the user
namespace does not own the PID namespace and cannot break out of the sandbox.
Note we disallow Type=forking services from using PrivatePIDs=yes since the
init process inside the PID namespace must not exit for other processes in
the namespace to exist.
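The fork-and-report flow described above can be sketched roughly as follows. This is a minimal illustration, not the systemd-executor code: spawn_payload_report_pid() is a hypothetical helper, a per-call socketpair stands in for the pidref socket pair, and the payload simply exits where the real one would continue and execve(). When unshare() fails (for example with EPERM for unprivileged callers) the sketch carries on without a namespace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 and the payload PID on success, negative errno on failure. */
static int spawn_payload_report_pid(pid_t *ret_payload) {
        int sv[2], status;
        pid_t executor, payload = 0;
        ssize_t n;

        assert(ret_payload);

        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, sv) < 0)
                return -errno;

        executor = fork();
        if (executor < 0) {
                int e = errno;
                close(sv[0]);
                close(sv[1]);
                return -e;
        }
        if (executor == 0) {
                /* "Executor": only processes forked from now on enter the new PID namespace.
                 * Failure is tolerated here so that the sketch also runs unprivileged. */
                (void) unshare(CLONE_NEWPID);

                payload = fork();
                if (payload < 0)
                        _exit(1);
                if (payload == 0)
                        _exit(0); /* the real payload would finish its setup and execve() here */

                /* Report the child's PID, as seen from our own namespace, then exit. */
                n = write(sv[1], &payload, sizeof(payload));
                _exit(n == (ssize_t) sizeof(payload) ? 0 : 1);
        }

        /* "Manager" side: read the PID, then reap the short-lived executor. */
        close(sv[1]);
        n = read(sv[0], &payload, sizeof(payload));
        close(sv[0]);

        if (waitpid(executor, &status, 0) < 0)
                return -errno;
        if (n != (ssize_t) sizeof(payload))
                return -EIO;

        *ret_payload = payload;
        return 0;
}
```

The PID reported is the one valid in the executor's namespace; inside a successfully unshared namespace the payload itself would see PID 1.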
Note Daan De Meyer did the original work for this commit with Ryan Wilson
addressing follow-ups.
Co-authored-by: Daan De Meyer <daan.j.demeyer@gmail.com>
|
|
|
|
|
|
| |
When specified, bootctl install will also set up secure boot
auto-enrollment. For now, we sign all variables using the same
certificate and key pair.
|
|
|
|
|
|
|
|
| |
The names of these conflict with macros from efi.h that we'll move
to efi-fundamental.h in a later commit. Let's avoid the conflict by
getting rid of these helpers. Arguably this also improves readability
by clearly indicating we're passing arbitrary strings and not constants
to the macros when we invoke them.
|
|
|
|
| |
for a given path
|
|\
| |
| |
| |
| |
| |
| |
| | |
This makes use of the new TIOCGPTPEER pty ioctl() for directly opening a
PTY peer, without going via path names. This is nice because it closes a
race around allocating and opening the peer. It also has the nice
benefit that if we acquired an fd originating from some other
namespace/container, we can directly derive the peer fd from it, without
having to reenter the namespace.
|
| |
| |
| |
| |
| | |
This opens a pty peer in one go, and uses the new race-free TIOCGPTPEER
ioctl() to do so – if it is available.
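A minimal sketch of such a helper, assuming a Linux system with glibc. pty_open_peer_sketch() is a hypothetical name, and the choice of which errors trigger the path-based fallback (ENOTTY and EINVAL) is an assumption of this sketch.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: opens the peer of the PTY master, returns an fd or negative errno. */
static int pty_open_peer_sketch(int master_fd, int flags) {
        char name[128];
        int fd, r;

        assert(master_fd >= 0);

#ifdef TIOCGPTPEER
        /* Race-free: derive the peer directly from the master fd, no path involved. */
        fd = ioctl(master_fd, TIOCGPTPEER, flags);
        if (fd >= 0)
                return fd;
        if (errno != ENOTTY && errno != EINVAL)
                return -errno;
#endif

        /* Fallback for kernels without the ioctl: go via the path name. */
        r = ptsname_r(master_fd, name, sizeof(name));
        if (r != 0)
                return -r; /* ptsname_r() returns the error number directly */

        fd = open(name, flags);
        return fd < 0 ? -errno : fd;
}
```

Typical use would be to allocate the master with posix_openpt(), call unlockpt() on it, and then pass something like O_RDWR|O_NOCTTY|O_CLOEXEC as flags.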
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Various fixes:
1. Adds O_CLOEXEC for two socketpair()s where we forgot it.
2. Uses FORK_WAIT instead of manual wait_for_terminate_and_check()
invocations.
3. Prefixes opaque NULL/0 arguments with comments saying what they are.
4. Adds a bunch of assert()s, and changes flag validation in
open_terminal() to be assert() (since flag mistakes are programming
errors, not runtime errors).
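For the first item: with socketpair() the close-on-exec bit is requested via SOCK_CLOEXEC in the type argument. A minimal sketch, with socketpair_cloexec() being a hypothetical wrapper name:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/socket.h>

/* Hypothetical wrapper: creates a socket pair whose fds are not leaked across execve(). */
static int socketpair_cloexec(int fds[2]) {
        assert(fds);

        if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, 0, fds) < 0)
                return -errno;

        return 0;
}
```

Whether the flag took effect can be checked with fcntl(fd, F_GETFD) and FD_CLOEXEC.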
|
|\ \
| | |
| | |
| | |
| | |
| | | |
Fixes: #34604
Prompted by that I realized we do not correctly recognize both "ST"
sequences we want to recognize, fix that.
|
| | | |
|
| | | |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
OSC sequences can be closed with one of three terminators:
1. ASCII code 7, aka BEL, aka ^G, aka \x07, aka \a
2. Byte 156, aka \x9c
3. Pair of ASCII code 27 followed by ASCII code 92, aka \x1b\x5c
Of these, BEL causes problems in some corner case scenarios (see #34604).
Hence switch away from it wherever we use it, and prefer \x1b\x5c
instead. That's preferable over \x9c, since the latter byte can also
occur inside valid UTF-8 sequences. See discussion here for example:
https://gist.github.com/egmontkob/eb114294efbcd5adb1944c9f3cb5feda#the-escape-sequence
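To illustrate the three terminators, here is a small byte-level sketch. osc_terminator_length() and osc_format() are hypothetical helpers, not the functions touched by this commit; the first recognizes any of the three terminators, the second emits an OSC sequence closed with the preferred \x1b\x5c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define OSC_START "\x1b]"
#define OSC_ST    "\x1b\\"

/* Returns the length of the OSC terminator at the start of p (n bytes available), or 0 if none. */
static size_t osc_terminator_length(const char *p, size_t n) {
        assert(p || n == 0);

        if (n >= 2 && p[0] == '\x1b' && p[1] == '\\')
                return 2; /* ESC \ */
        if (n >= 1 && (p[0] == '\a' || (unsigned char) p[0] == 0x9c))
                return 1; /* BEL or \x9c */

        return 0;
}

/* Formats an OSC sequence carrying payload, terminated with ESC \. Returns what snprintf() returns. */
static int osc_format(char *buf, size_t size, const char *payload) {
        assert(buf);
        assert(payload);

        return snprintf(buf, size, OSC_START "%s" OSC_ST, payload);
}
```

A parser would accept all three forms on input, while code generating sequences would only ever emit the two-byte one.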
Fixes: #34604
|
| |/ |
|
| |
| |
| |
| |
| |
| | |
Instead of passing a boolean picking the destruction method, just have
different functions. That's much nicer in context of _cleanup_, and how
we usually do things.
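The pattern can be sketched like this. The Scratch type and all function names are made up for illustration (the object this commit actually touches is not shown here); the point is only that each destruction method gets its own function, so it can be named directly in a cleanup attribute.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical object owning a path on disk. */
typedef struct Scratch {
        char *path;
} Scratch;

static Scratch *scratch_new(const char *path) {
        Scratch *s;

        assert(path);

        s = calloc(1, sizeof(*s));
        if (!s)
                return NULL;

        s->path = strdup(path);
        if (!s->path) {
                free(s);
                return NULL;
        }

        return s;
}

/* Destruction method 1: release memory only. */
static Scratch *scratch_free(Scratch *s) {
        if (!s)
                return NULL;

        free(s->path);
        free(s);
        return NULL;
}

/* Destruction method 2: remove the file, then release memory. */
static Scratch *scratch_unlink_and_free(Scratch *s) {
        if (!s)
                return NULL;

        (void) unlink(s->path);
        return scratch_free(s);
}

/* Wrappers with the signature the cleanup attribute expects. */
static void scratch_freep(Scratch **s) {
        scratch_free(*s);
}

static void scratch_unlink_and_freep(Scratch **s) {
        scratch_unlink_and_free(*s);
}
```

A variable declared with __attribute__((cleanup(scratch_unlink_and_freep))) then picks its destruction method at the declaration, with no boolean to thread through.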
|
| | |
|
| | |
|
| | |
|
|\ \
| | |
| | | |
Follow-ups for #34761.
|
| | | |
|
| |/ |
|
|/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Setting this flag is a noop without a corresponding call to
posix_spawnattr_setsigdefault.
If we call posix_spawnattr_setsigdefault with a full signal set,
it causes glibc's posix_spawn implementation to call sigaction 63 times,
once for each signal. That seems wasteful.
This feature is really only useful for signals which have their
disposition set to SIG_IGN. Otherwise the disposition gets set to
SIG_DFL automatically, either by clone(CLONE_CLEAR_SIGHAND) or the
subsequent execve.
As far as I can tell, systemd does not have any signals set to SIG_IGN
under normal operating conditions.
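For reference, this is how the flag and the signal set interact when only a few signals need resetting. A minimal sketch; spawnattr_reset_signals() is a hypothetical helper and not something this commit adds.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <spawn.h>
#include <stddef.h>

/* Hypothetical helper: ask posix_spawn() to reset only the listed signals to SIG_DFL in the child.
 * Returns 0 on success, negative errno-style value on failure. */
static int spawnattr_reset_signals(posix_spawnattr_t *attr, const int *signals, size_t n) {
        sigset_t set;
        short flags = 0;
        int r;

        assert(attr);
        assert(signals || n == 0);

        sigemptyset(&set);
        for (size_t i = 0; i < n; i++)
                if (sigaddset(&set, signals[i]) < 0)
                        return -errno;

        /* The set names the signals to reset; without it the flag below does nothing. */
        r = posix_spawnattr_setsigdefault(attr, &set);
        if (r != 0)
                return -r;

        r = posix_spawnattr_getflags(attr, &flags);
        if (r != 0)
                return -r;

        r = posix_spawnattr_setflags(attr, flags | POSIX_SPAWN_SETSIGDEF);
        return -r;
}
```

Passing a short list, say just SIGPIPE, keeps the number of signals the spawn implementation has to touch small, compared to handing it a full set.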
|
| |
|
|
|
|
| |
(#34893)
|
|
|
|
|
| |
This makes use of the infra introduced in 229d4a980607e9478cf1935793652ddd9a14618b to indicate visually on each prompt that we are in superuser mode temporarily.
|
| |
|
|\
| |
| | |
core: follow-ups for live mount
|
| | |
|
| | |
|
| | |
|
| |
| |
| |
| |
| | |
Let's move it close to label_ops_set(), since it is somewhat symmetric
to it.
|
| | |
|
| |
| |
| |
| |
| |
| | |
This brings two benefits: we will label the created file only if it is
actually created, and we can correctly delete any file we create again
on failure.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
WRITE_STRING_FILE_LABEL flag
Given that we have the LabelOps abstraction these days, we can teach
write_string_file() to use it, which means we can get rid of
fileio-label.[ch] as a separate concept.
(The only reason that fileio-label.[ch] existed independently of
fileio.[ch] was that the former potentially linked to libselinux, and
thus had to be in src/shared/ while the other always was in src/basic/.
But the LabelOps vtable provides us with a nice work-around.)
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dangling symlink
One of the big mistakes of Linux is that when you create a file with
open() and O_CREAT, and the path already exists as a dangling symlink,
the symlink is followed and the file it points to is created.
This has resulted in many vulnerabilities, and triggered the creation of
the O_NOFOLLOW flag, addressing the problem.
O_NOFOLLOW is less than ideal in many ways, but in particular one: when
actually creating a file it makes sense to set, because it is a problem
to follow final symlinks in that case. But if the file is already
existing, it actually does make sense to follow the symlinks. With
openat_report_new() we distinguish these two cases anyway (the whole
function exists only to distinguish the create and the exists-already
case after all), hence let's do something about this: let's simply never
create files "through symlinks".
This can be implemented very easily: just pass O_NOFOLLOW to the 2nd
openat() call, where we actually create files.
And then basically remove 0dd82dab91eaac5e7b17bd5e9a1e07c6d2b78dca
again, because we don't need to care anymore, we already will see ELOOP
when we touch a symlink.
Note that this change means that openat_report_new() will thus start to
deviate from plain openat() behaviour in this one small detail: when
actually creating files we will *never* follow the symlink. That should
be a systematic improvement of security.
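A simplified sketch of the two-step logic, assuming a bounded retry loop. open_report_new_sketch() is a hypothetical stand-in and leaves out whatever else the real openat_report_new() handles.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in: opens path, reporting whether this call created it.
 * Returns an fd on success, negative errno on failure. */
static int open_report_new_sketch(int dir_fd, const char *path, int flags, mode_t mode, bool *ret_created) {
        assert(path);
        assert(ret_created);

        for (unsigned attempts = 7; attempts > 0; attempts--) {
                int fd;

                /* 1st call: open what already exists. Symlinks are followed as usual here. */
                fd = openat(dir_fd, path, flags & ~(O_CREAT | O_EXCL), mode);
                if (fd >= 0) {
                        *ret_created = false;
                        return fd;
                }
                if (errno != ENOENT)
                        return -errno;

                /* 2nd call: nothing usable there, so create, but never through a symlink. */
                fd = openat(dir_fd, path, flags | O_CREAT | O_EXCL | O_NOFOLLOW, mode);
                if (fd >= 0) {
                        *ret_created = true;
                        return fd;
                }
                if (errno != EEXIST)
                        return -errno;

                /* The path appeared between the two calls; try again, but not forever. */
        }

        return -EEXIST;
}
```

With a dangling symlink at path, the sketch returns an error and the symlink's target is left uncreated.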
Fixes: #34088
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
error path
For SELinux it is essential that we reset the file creation label both
in the success and in the error path, hence do so.
Moreover, when calling the label post ops do it if possible with the
opened fd of the inode itself, rather than always going via its path,
simply to reduce the attack surface.
|
| |
| |
| |
| |
| |
| |
| | |
If openat_report_new() fails, then 'made_file' will be false, as no file
was created, hence there's no need to skip the unlinkat() explicitly
early, given that we check for 'made_file' anyway in the error path. The
extra error code checks are hence entirely redundant.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We have two distinct implementations of the post hook.
1. For SELinux we just reset the selinux label we told the kernel
earlier to use for new inodes.
2. For SMACK we might apply an xattr to the specified file.
The two calls are quite different: the first call we want to call in all
cases (failure or success), the latter only if we actually managed to
create an inode, in which case it is called on the inode.
|
| |
| |
| |
| | |
We didn't go through it at all if label_ops_post() failed.
|
|/ |
|
|\
| |
| | |
modernize the ask-password logic, and add unpriv askpw agents to the concept
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Previously, we were using touch(), which usually works fine, because the
path should always refer to an existing directory, in which case it just
updates the timestamp. However, if the dir does not exist yet (which
shouldn't happen), it would be created as a regular file, which is just
wrong.
Hence, let's instead create the dir as dir if it is missing, and then
update its timestamp.
|
|/
|
|
|
|
|
|
|
|
| |
to a TTY
Let's provide a mechanism to select the number of screen columns for
rebreaking comments in Varlink IDL when not connected to a TTY, by
honouring the $COLUMNS env var then too. Previously we'd only honour it
when connected to a TTY, but it's also useful otherwise for rebreaking
ridiculously long comments, hence honour it in this case too.
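The lookup order can be sketched as below. columns_sketch() is a hypothetical helper, and the fallback of 80 columns is an assumption of this sketch rather than something taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical helper: number of columns to break text at, for output going to fd. */
static unsigned columns_sketch(int fd) {
        struct winsize ws;
        const char *e;

        assert(fd >= 0);

        /* $COLUMNS wins whether or not fd is a TTY. */
        e = getenv("COLUMNS");
        if (e && *e) {
                char *end = NULL;
                unsigned long v;

                errno = 0;
                v = strtoul(e, &end, 10);
                if (errno == 0 && *end == '\0' && v > 0 && v <= USHRT_MAX)
                        return (unsigned) v;
        }

        /* Otherwise ask the terminal, if there is one. */
        if (ioctl(fd, TIOCGWINSZ, &ws) >= 0 && ws.ws_col > 0)
                return ws.ws_col;

        return 80; /* assumed default for this sketch */
}
```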
|
| |
|
|
|
|
|
|
|
| |
Previously, GREEDY_REALLOC_APPEND would compile perfectly fine and cause
subtle memory corruption if the caller messes up the type they're passing
in (i.e. by forgetting to pass-by-reference when appending a Type* to an
array of Type*). Now this will lead to compilation failure.
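One way to get such a compile-time check, shown on a simplified append macro. ARRAY_APPEND_CHECKED() and array_append_raw() are hypothetical and much simpler than GREEDY_REALLOC_APPEND; the technique used here (GNU C statement expressions plus __builtin_types_compatible_p) is an assumption and not necessarily what the actual change does.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Untyped worker: appends n_from items of the given size. Overflow checks omitted in this sketch. */
static int array_append_raw(void **array, size_t *n, const void *from, size_t n_from, size_t size) {
        void *q;

        assert(array);
        assert(n);
        assert(size > 0);
        assert(from || n_from == 0);

        if (n_from == 0)
                return 0;

        q = realloc(*array, (*n + n_from) * size);
        if (!q)
                return -ENOMEM;

        memcpy((char *) q + *n * size, from, n_from * size);
        *array = q;
        *n += n_from;
        return 0;
}

/* Typed front-end: refuses to compile unless 'from' points to the array's element type. */
#define ARRAY_APPEND_CHECKED(array, n, from, n_from)                                      \
        ({                                                                                \
                _Static_assert(__builtin_types_compatible_p(__typeof__(*(array)),         \
                                                            __typeof__(*(from))),         \
                               "appended items must have the array's element type");      \
                void *_p = (array);                                                       \
                int _r = array_append_raw(&_p, &(n), (from), (n_from), sizeof(*(array))); \
                (array) = _p;                                                             \
                _r;                                                                       \
        })
```

With an array of const char* and a single item s, passing &s compiles, while passing s itself trips the static assertion because *s is a char and not a const char*.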
|
| |
|
|
|
|
| |
Follow-up for de34ec188c4d4f682a337445aa7753259cd7f821.
|
|\
| |
| | |
fileio: write_string_file() naming clean-ups
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Let's rename the "_ts" flavour of these calls "_full" instead, exposing
the full functionality. And then keep two more minimal versions around:
one "_at" (which has the ts parameter suppressed, but keeps the dir_fd
one). And one without suffix (which suppresses both).
Do the same for the label versions of these calls.
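The resulting naming scheme looks roughly like this. The write_text*() functions are hypothetical stand-ins with a much smaller feature set than write_string_file(); only the layering of the three variants is the point.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* "_full": every parameter exposed, i.e. both dir_fd and the timestamp. */
static int write_text_full(int dir_fd, const char *path, const char *text, const struct timespec *ts) {
        size_t len;
        ssize_t n;
        int fd, r = 0;

        assert(path);
        assert(text);

        fd = openat(dir_fd, path, O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0644);
        if (fd < 0)
                return -errno;

        len = strlen(text);
        n = write(fd, text, len);
        if (n < 0)
                r = -errno;
        else if ((size_t) n != len)
                r = -EIO;
        else if (ts) {
                /* Apply the same timestamp as atime and mtime. */
                const struct timespec both[2] = { *ts, *ts };

                if (futimens(fd, both) < 0)
                        r = -errno;
        }

        close(fd);
        return r;
}

/* "_at": timestamp suppressed, dir_fd kept. */
static inline int write_text_at(int dir_fd, const char *path, const char *text) {
        return write_text_full(dir_fd, path, text, NULL);
}

/* No suffix: both suppressed. */
static inline int write_text(const char *path, const char *text) {
        return write_text_at(AT_FDCWD, path, text);
}
```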
|
|/
|
|
|
|
|
|
|
|
| |
This PidRef just tracks some data, but cannot be used for any active
operation.
Background: for https://github.com/systemd/systemd/pull/34703 it makes
sense to track explicitly if some PidRef is not a local one, so that we
never attempt to for example "kill a remote process" and thus
accidentally hit the wrong process (i.e. a local one by the same PID).
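The idea can be sketched as follows. DemoPidRef and demo_pidref_kill() are hypothetical and do not mirror the real PidRef layout; the explicit "remote" boolean and the choice of EREMOTE as the error are assumptions of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical, reduced process reference. */
typedef struct DemoPidRef {
        pid_t pid;
        bool remote; /* true: the PID belongs to some other system, it is data only */
} DemoPidRef;

/* Sends sig to the referenced process, refusing any active operation on remote references. */
static int demo_pidref_kill(const DemoPidRef *ref, int sig) {
        assert(ref);

        if (ref->pid <= 0)
                return -ESRCH;
        if (ref->remote)
                return -EREMOTE; /* never hit a local process that merely shares the PID */

        if (kill(ref->pid, sig) < 0)
                return -errno;

        return 0;
}
```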
|
| |
|
|\
| |
| | |
machined: switch remaining Varlink overs over to use json_dispatch_pidref() and friends
|