| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
| |
Co-authored-by: Tom Coldrick <thomas.coldrick@codethink.co.uk>
Co-authored-by: Abderrahim Kitouni <abderrahim.kitouni@codethink.co.uk>
|
|
|
|
|
|
|
|
|
|
| |
Previously, importd was only accessible via D-Bus, which required it to
be a late boot service. Now that we have Varlink we can rearrange things
to become early-boot activated, just after the image directories are
mounted.
This will later allow us to have generator that auto-downloads images on
boot.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Otherwise, if a service unit that requests LogNamespace= stopped before
systemd-journald@.service is started, logs generated by the service will be
lost, as systemd-journald@.socket is stopped and
systemd-journald@.service will never started.
To prevent the issue, let's introduce another implicit dependency to
a oneshot service that explicitly synchronizes a namespaced journal file
when the log namespace is not needed anymore.
Fixes #32604.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
assigning resources to them
This adds a small, socket-activated Varlink daemon that can delegate UID
ranges for user namespaces to clients asking for it.
The primary call is AllocateUserRange() where the user passes in an
uninitialized userns fd, which is then set up.
There are other calls that allow assigning a mount fd to a userns
allocated that way, to set up permissions for a cgroup subtree, and to
allocate a veth for such a user namespace.
Since the UID assignments are supposed to be transitive, i.e. not
permanent, care is taken to ensure that users cannot create inodes owned
by these UIDs, so that persistancy cannot be acquired. This is
implemented via a BPF-LSM module that ensures that any member of a
userns allocated that way cannot create files unless the mount it
operates on is owned by the userns itself, or is explicitly
allowelisted.
BPF LSM program with contributions from Alexei Starovoitov.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
stale HibernateLocation EFI variable
Currently, if the HibernateLocation EFI variable exists,
but we failed to resume from it, the boot carries on
without clearing the stale variable. Therefore, the subsequent
boots would still be waiting for the device timeout,
unless the variable is purged manually.
There's no point to keep trying to resume after a successful
switch-root, because the hibernation image state
would have been invalidated by then. OTOH, we don't
want to clear the variable prematurely either,
i.e. in initrd, since if the resume device is the same
as root one, the boot won't succeed and the user might
be able to try resuming again. So, let's introduce a
unit that only runs after switch-root and clears the var.
Fixes #32021
|
| |
|
|\
| |
| | |
New capsule@.service feature
|
| | |
|
| | |
|
|/
|
|
|
|
|
|
|
|
|
|
|
| |
This new passive target is supposed to be pulled in by SSH
implementations and should be reached when remote SSH access is
possible. The idea is that this target can be used as indicator for
other components to determine if and when SSH access is possible.
One specific usecase for this is the new sd_notify() logic in PID 1 that
sends its own supervisor notifications whenever target units are
reached. This can be used to precisely schedule SSH connections from
host to VM/container, or just to identify systems where SSH is even
available.
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Then, this introduces systemd-networkd-persistent-storage.service.
systemd-networkd.service is an early starting service. So, at the time
it is started, the persistent storage for the service may not be ready,
and we cannot use StateDirectory=systemd/network in
systemd-networkd.service.
The newly added systemd-networkd-persistent-storage.service creates the
state directory for networkd, and notify systemd-networkd that the
directory is usable.
|
| |
|
|
|
|
|
| |
For now, just super basic functionality: return the list of boot menu
entries, and read/write the reboot to firmware flag
|
| |
|
|
|
|
|
| |
This can be used to make or delete a PCR policy via Varlink. It can also
be used to query the current event log in CEL format.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Distributions apparently only compile a subset of TPM2 drivers into the
kernel. For those not compiled it but provided as kmod we need a
synchronization point: we must wait before the first TPM2 interaction
until the driver is available and accessible.
This adds a tpm2.target unit as such a synchronization point. It's
ordered after /dev/tpmrm0, and is pulled in by a generator whenever we
detect that the kernel reported a TPM2 to exist but we have no device
for it yet.
This should solve the issue, but might create problems: if there are TPM
devices supported by firmware that we don't have Linux drivers for we'll
hang for a bit. Hence let's add a kernel cmdline switch to disable (or
alternatively force) this logic.
Fixes: #30164
|
| |
|
|
|
|
|
|
|
| |
This extends what systemd-firstboot does and runs on first boots only
and either processes user records passed in via credentials to create,
or asks the user interactively to create one (only if no regular user
exists yet).
|
|
|
|
| |
(This is disabled by default, for now)
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This implements a "storage target mode", similar to what MacOS provides
since a long time as "Target Disk Mode":
https://en.wikipedia.org/wiki/Target_Disk_Mode
This implementation is relatively simple:
1. a new generic target "storage-target-mode.target" is added, which
when booted into defines the target mode.
2. a small tool and service "systemd-storagetm.service" is added which
exposes a specific device or all devices as NVMe-TCP devices over the
network. NVMe-TCP appears to be hot shit right now how to expose
block devices over the network. And it's really simple to set up via
configs, hence our code is relatively short and neat.
The idea is that systemd-storagetm.target can be extended sooner or
later, for example to expose block devices also as USB mass storage
devices and similar, in case the system has "dual mode" USB controller
that can also work as device, not just as host. (And people could also
plug in sharing as NBD, iSCSI, whatever they want.)
How to use this? Boot into your system with a kernel cmdline of
"rd.systemd.unit=storage-target-mode.target ip=link-local", and you'll see on
screen the precise "nvme connect" command line to make the relevant
block devices available locally on some other machine. This all requires
that the target mode stuff is included in the initrd of course. And the
system will the stay in the initrd forever.
Why bother? Primarily three use-cases:
1. Debug a broken system: with very few dependencies during boot get
access to the raw block device of a broken machine.
2. Migrate from system to another system, by dd'ing the old to the new
directly.
3. Installing an OS remotely on some device (for example via Thunderbolt
networking)
(And there might be more, for example the ability to boot from a
laptop's disk on another system)
Limitations:
1. There's no authentication/encryption. Hence: use this on local links
only.
2. NVMe target mode on Linux supports r/w operation only. Ideally, we'd
have a read-only mode, for security reasons, and default to it.
Future love:
1. We should have another mode, where we simply expose the homed LUKS
home dirs like that.
2. Some lightweight hookup with plymouth, to display a (shortened)
version of the info we write to the console.
To test all this, just run:
mkosi --kernel-command-line-extra="rd.systemd.unit=storage-target-mode.target" qemu
|
| |
|
|
|
|
|
|
|
|
| |
This is primarily supposed to be a 1st step with varlinkifying our
various command line tools, and excercise in how this might look like
across our codebase one day. However, at AllSystemsGo! 2023 it was
requested that we provide an API to do a PCR measurement along with a
matching event log record, and this provides that.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds an explicit service for initializing the TPM2 SRK. This is
implicitly also done by systemd-cryptsetup, hence strictly speaking
redundant, but doing this early has the benefit that we can parallelize
this in a nicer way. This also write a copy of the SRK public key in PEM
format to /run/ + /var/lib/, thus pinning the disk image to the TPM.
Making the SRK public key is also useful for allowing easy offline
encryption for a specific TPM.
Sooner or later we should probably grow what this service does, the
above is just the first step. For example, the service should probably
offer the ability to reset the TPM (clear the owner hierarchy?) on a
factory reset, if such a policy is needed. And we might want to install
some default AK (?).
Fixes: #27986
Also see: #22637
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Before this commit, the hibernate location logic only exists in
the generator. Also, we compare device nodes (devnode_same()) and
clear EFI variable HibernateLocation in the generator too. This is
not ideal though: when the generator gets to run, udev hasn't yet
started, so effectively devnode_same() always fails. Moreover, if
the boot process is interrupted by e.g. battery-check, the hibernate
information is lost.
Therefore, let's split out the logic of finding hibernate location.
The generator only does the initial validation of system info and
enables systemd-hibernate-resume.service, and when the service
actually runs we validate everything again, which includes comparing
the device nodes and clearing the EFI variable. This should make
things more robust, plus systems that don't utilize a systemd-enabled
initrd can use the exact same logic to resume using the EFI variable.
I.e., systemd-hibernate-resume can be used standalone.
|
|
|
|
|
|
|
| |
- add reference to the service unit in the man page,
- fix several indentation and typos,
- replace '(uint64_t) -1' with 'UINT64_MAX',
- drop unnecessary 'continue'.
|
|\
| |
| | |
systemd-bsod: Add "--continuous" option
|
| | |
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes tmpfiles, sysusers, and udevd invoked in the following order:
1. systemd-tmpfiles-setup-dev-early.service
Create device nodes gracefully, that is, create device nodes anyway
by ignoring unknown users and groups.
2. systemd-sysusers.service
Create users and groups, to make later invocations of tmpfiles and
udevd can resolve necessary users and groups.
3. systemd-tmpfiles-setup-dev.service
Adjust owners of previously created device nodes.
4. systemd-udevd.service
Process all devices. Especially to make block devices active and can
be mountable.
5. systemd-tmpfiles-setup.service
Setup basic filesystem.
Follow-up for b42482af904ae0b94a6e4501ec595448f0ba1c06.
Fixes #28653.
Replaces #28681 and #28732.
|
|/
|
|
|
|
| |
The script is mostly equivalent to 'mkdir -p' and 'ln -sfr'.
Let's replace it with install_emptydir() builtin function and
inline meson call.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let's rename the unit to systemd-battery-check.service. We usually want
to name our own unit files like our tools they wrap, in particular if
they are entirely defined by us (i.e. not just wrappers of foreign
concepts)
While we are at it, also hook this in from initrd.target, and order it
against initrd-root-device.target so that it runs before the root device
is possibly written to (i.e. mounted or fsck'ed).
This is heavily inspired by @aafeijoo-suse's PR #28208, but quite
different ;-)
|
|
|
|
| |
print message and shut down.
|
|
|
|
|
|
| |
main-func.h
Preparation for #27247
|
|
|
|
|
|
|
|
| |
This also merges two arrays units and in_units, and uses dictionary
for declaring units.
This also fixes the condition handling, that previously only two
conditions were handled and rests were ignored.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds a new mechanism for rebooting, a form of "userspace reboot"
hereby dubbed "soft-reboot". It will stop all services as in a usual
shutdown, possibly transition into a new root fs and then issue a fresh
initial transaction. The kernel is not replaced.
File descriptors can be passed over, thus opening the door for leaving
certain resources around between such reboots.
Usecase: this is an extremely quick way to reset userspace fully when
updating image based systems, without going through a full
hardware/firmware/boot loader/kernel/initrd cycle. It minimizes "grayout time"
for OS updates. (In particular when combined with kernel live patching)
|
|
|
|
|
|
|
|
|
|
|
| |
This mimics what we already have for cryptsetup services: the slice they
are placed in (they have their own slice since that's what we do by
default for instantiated services) shouldn't conflict with
shutdown.target, so that veritysetup services can stay around until the
very end (which is what we want for the root and usr verity volumes).
It's literally just a copy of the same unit we already have for
cryptsetup, just with an updated description string.
|
| |
|
|
|
|
|
|
|
| |
This drops all mentions of gnu-efi and its manual build machinery. A
future commit will bring bootloader builds back. A new bootloader meson
option is now used to control whether to build sd-boot and its userspace
tooling.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
since we don't have systemd-pcrphase built anyway, which breaks the tests:
...
I: Attempting to install /usr/lib/systemd/systemd-networkd-wait-online (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-network-generator (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-oomd (based on unit file reference)
I: Attempting to install /usr/lib/systemd/systemd-pcrphase (based on unit file reference)
W: Failed to install '/usr/lib/systemd/systemd-pcrphase'
make: *** [Makefile:4: setup] Error 1
make: Leaving directory '/root/systemd/test/TEST-01-BASIC'
Follow-up to 04959faa632272a8fc9cdac3121b2e4af721c1b6.
|
|
|
|
|
|
| |
If we use gpt-auto-generator, automatically measure root fs and /var.
Otherwise, add x-systemd.measure option to request this.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The systemd-growfs@.service units are currently written in full for each
file system to grow. Which is kinda pointless given that (besides an
optional ordering dep) they contain always the same definition. Let's
fix that and add a static template for this logic, that the generator
simply instantiates (and adds an ordering dep for).
This mimics how systemd-fsck@.service is handled. Similar to the wait
that for root fs there's a special instance systemd-fsck-root.service
we also add a special instance systemd-growfs-root.service for the root
fs, since it has slightly different deps.
Fixes: #20788
See: #10014
|
|
|
|
|
|
|
| |
We want PCR 15 to be useful for binding per-system policy to. Let's
measure the machine ID into it, to ensure that every OS we can
distinguish will get a different PCR (even if the root disk encryption
key is already measured into it).
|
|
|
|
|
|
|
|
|
|
| |
Before this patch the only way to prevent journald from reading the audit
messages was to mask systemd-journald-audit.socket. However this had main
drawback that downstream couldn't ship the socket disabled by default (beside
the fact that masking units is not supposed to be the usual way to disable
them).
Fixes #15777
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
systemd-boot-random-seed.service
This renames systemd-boot-system-token.service to
systemd-boot-random-seed.service and conditions it less strictly.
Previously, the job of the service was to write a "system token" EFI
variable if it was missing. It called "bootctl --graceful random-seed"
for that. With this change we condition it more liberally: instead of
calling it only when the "system token" EFI variable isn't set, we call
it whenever a boot loader interface compatible boot loader is used. This
means, previously it was invoked on the first boot only: now it is
invoked at every boot.
This doesn#t change the command that is invoked. That's because
previously already the "bootctl --graceful random-seed" did two things:
set the system token if not set yet *and* refresh the random seed in the
ESP. Previousy we put the focus on the former, now we shift the focus to
the latter.
With this simple change we can replace the logic
f913c784ad4c93894fd6cb2590738113dff5a694 added, but from a service that
can run much later and doesn't keep the ESP pinned.
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds two more phases to the PCR boot phase logic: "sysinit" +
"final".
The "sysinit" one is placed between sysinit.target and basic.target.
It's good to have a milestone in this place, since this is after all
file systems/LUKS volumes are in place (which sooner or later should
result in measurements of their own) and before services are started
(where we should be able to rely on them to be complete).
This is particularly useful to make certain secrets available for
mounting secondary file systems, but making them unavailable later.
This breaks API in a way (as measurements during runtime will change),
but given that the pcrphase stuff wasn't realeased yet should be OK.
|
| |
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This makes use of the option switch that was added in the previous commit.
We used a pretty big hammer on a relatively small nail: we would do daemon-reload
and (in principle) allow any configuration to be changed. But in fact we only
made use of this in systemd-fstab-generator. systemd-fstab-generator filters
out all mountpoints except /usr and those marked with x-initrd.mount, i.e. on
a big majority of systems it wouldn't do anything.
Also, since systemd-fstab-generator first parses /proc/cmdline, and then
initrd's /etc/fstab, and only then /sysroot/etc/fstab, configuration in the
host would only matter if it the same mountpoint wasn't configured "earlier".
So the config in the host could be used for new mountpoints, but it couldn't
be used to amend configuration for existing mountpoints. And we wouldn't actually
remount anything, so mountpoints that were already mounted wouldn't be affected,
even if did change some config.
In the new scheme, we will parse /sysroot/etc/fstab and explicitly start
sysroot-usr.mount and other units that we just wrote. In most cases (as written
above), this will actually result in no units being created or started.
If the generator is invoked on a system with /sysroot/etc/fstab present,
behaviour is not changed and we'll create units as before. This is needed so
that if daemon-reload is later at some points, we don't "lose" those units.
There's a minor bugfix here: we honour x-initrd.mount for swaps, but we
wouldn't restart swap.target, i.e. the new swaps wouldn't necessarilly be
pulled in immediately.
|