path: root/src/nspawn
Commit message | Author | Age | Files | Lines
* nspawn: bring back the word `may` in error text | Olle Lundberg | 2021-05-17 | 1 | -1/+1
|   In change set 6c045a999800c62368470938307951bb669f5afc, the error text for the old flag `--private-users-chown` was repurposed for the new flag `--private-users-ownership=chown`, and in doing so the word `may` was dropped, leaving the error text grammatically incorrect.
* nspawn: introduce --private-users-ownership=map|auto | Lennart Poettering | 2021-05-07 | 3 | -2/+46
|   This adds two new values to --private-users-ownership=: "map" and "auto".
|
|   "map" exposes the kernel 5.12 idmap feature pretty much 1:1. It fails if the kernel or the file system in use doesn't support ID mapping.
|
|   "auto" is a bit smarter: if we can make ID mapping work, we'll use it; otherwise we fall back to classic chown()ing. We'll also use chown()ing if we detect that an image is already ID shifted, both to increase compatibility with the status quo ante and to simplify our code paths, since the mappings become a lot simpler if we only have to map from zero to something else, instead of from anything to anything else.
|
|   The short -U switch and --private-users=pick will now imply --private-users-ownership=auto instead of --private-users-ownership=chown, since the new logic should be the much better choice.
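A minimal C sketch of the "auto" decision described above, assuming two pre-computed inputs: whether ID-mapped mounts work for the kernel and file system at hand, and whether the image already looks ID shifted. The enum and function names are hypothetical, not nspawn's actual code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the --private-users-ownership= values. */
typedef enum {
        OWNERSHIP_OFF,
        OWNERSHIP_CHOWN,
        OWNERSHIP_MAP,
        OWNERSHIP_AUTO,
} Ownership;

/* Resolve "auto" to a concrete mode: prefer ID mapping, but use chown()
 * if the image is already shifted or if ID mapping is unavailable. */
static Ownership resolve_auto_ownership(bool idmap_supported, bool image_already_shifted) {
        if (image_already_shifted)
                return OWNERSHIP_CHOWN;   /* keeps the mapping simple: 0 -> something */
        if (idmap_supported)
                return OWNERSHIP_MAP;
        return OWNERSHIP_CHOWN;           /* classic fallback */
}
```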
* nspawn: drop an unnecessary local variable | Lennart Poettering | 2021-05-07 | 1 | -3/+3
|
* dissect-image: add support for optionally mounting images with idmapping on | Lennart Poettering | 2021-05-07 | 1 | -0/+2
|
* nspawn: tighten userns UID shift/range checks | Lennart Poettering | 2021-05-07 | 2 | -8/+12
|   Let's add a helper that ensures the UID shift/range parameters actually fit together.
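One way such a helper could look, as a sketch under stated assumptions: UIDs are 32-bit, the range must be non-empty, and the mapped range must stop before (uid_t) -1, which is reserved as the invalid UID. The function name and the exact rules are hypothetical; nspawn may enforce additional constraints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical check: mapping `range` UIDs starting at host UID `shift`
 * must be non-empty and must not reach UINT32_MAX, i.e. (uid_t) -1. */
static bool uid_shift_range_valid(uint32_t shift, uint32_t range) {
        if (range == 0)
                return false;
        /* Add in 64 bit so the sum cannot wrap around. The last mapped UID is
         * shift + range - 1, so shift + range <= UINT32_MAX keeps it below -1. */
        return (uint64_t) shift + range <= UINT32_MAX;
}
```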
* mount-util: add helper that ensures something is a mount point | Lennart Poettering | 2021-05-07 | 1 | -5/+3
|
* nspawn: replace boolean --private-users-chown by enum | Lennart Poettering | 2021-05-07 | 4 | -31/+99
|   This replaces --private-users-chown by an enum value --private-users-ownership=off|chown. Otherwise it changes very little. This is mostly preparation for a follow-up commit adding a new "map" mode, using kernel 5.12 UID mapping mounts.
|
|   Note that this does alter code flow a bit: the new enum already knows three different values instead of the old true/false pair. Besides "off" and "chown" it knows -EINVAL, i.e. whenever the value wasn't set explicitly. This value is changed to "off" or "chown" before use, thus retaining compatibility with the previous status quo, except that it won't override explicit configuration anymore. Thus, if you explicitly request --private-users=pick, you can now combine it with an explicit --private-users-ownership=off if you like, which will give you a container that runs under its own UID set, but whose files keep the ownership of the original image. This makes little sense except perhaps for debugging, but if requested explicitly, I think it's OK to implement.
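The "unset" handling can be sketched like this, assuming the systemd convention of an enum member equal to -EINVAL as the "not configured" marker. The rule for choosing between "off" and "chown" (chown only when the UID range is picked automatically) is an assumption for illustration; all names are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical enum; the negative member means "not set explicitly". */
typedef enum {
        OWNERSHIP_OFF,
        OWNERSHIP_CHOWN,
        _OWNERSHIP_INVALID = -EINVAL,
} Ownership;

/* Explicit configuration always wins; only an unset value gets a default.
 * Assumed default rule: chown when the UID range was picked automatically. */
static Ownership resolve_ownership(Ownership configured, bool uid_range_picked) {
        if (configured >= 0)
                return configured;
        return uid_range_picked ? OWNERSHIP_CHOWN : OWNERSHIP_OFF;
}
```

This is what lets an explicit "off" survive next to --private-users=pick, as described above.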
* nspawn: add high-level option for identity userns mapping | Lennart Poettering | 2021-05-07 | 1 | -5/+19
|   A userns identity 1:1 mapping is a pretty useful concept, since it isolates capability sets between containers and the host, even if it doesn't map any UID ranges. Let's support it with an explicit concept.
|
|   (Note that this is identical to --private-users=0:65536, which in turn is identical to --private-users=0, but I think it makes sense to emphasize this as a high-level concept worth supporting.)
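For reference, the kernel's /proc/<pid>/uid_map format takes three numbers per line: the first ID inside the namespace, the first ID outside it, and the range length. The identity mapping of 65536 UIDs is therefore the line "0 0 65536". The helper below is a hypothetical sketch that only formats such a line; it does not write it anywhere.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Format "<inside> <outside> <count>\n" into buf. Returns the length written,
 * or -1 if formatting failed or the buffer was too small. */
static int format_uid_map_line(char *buf, size_t size, uint32_t inside, uint32_t outside, uint32_t count) {
        int n = snprintf(buf, size, "%" PRIu32 " %" PRIu32 " %" PRIu32 "\n", inside, outside, count);
        if (n < 0 || (size_t) n >= size)
                return -1;
        return n;
}
```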
* Merge pull request #19391 from poettering/dissect-grow | Zbigniew Jędrzejewski-Szmek | 2021-05-07 | 1 | -2/+2
|\   Optionally, grow file systems to partition size when mounting them via GPT auto-discovery
| * tree-wide: enable automatic growing of file systems in images in various tools that deal with OS images | Lennart Poettering | 2021-04-23 | 1 | -2/+2
| |   Let's enable this in all tools that intend to write to the OS images. It's not conditionalized for now, as there already is conditionalization via the existence or absence of the flag in the GPT partition table (and it's opt-in), hence it should be OK to just enable this by default for now if the flag is set.
* | string-util: add strextendf() helper that allows extending some allocated string via a format string | Lennart Poettering | 2021-05-07 | 1 | -24/+13
| |   It's not going to be efficient if called in inner loops, but it's oh so handy, and we have some code that does this:
| |
| |           asprintf(&p, "%s…", b, …);
| |           free(b);
| |           b = TAKE_PTR(p);
| |
| |   which can now be replaced by the quicker and easier to read:
| |
| |           strextendf(&b, "…", …);
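A minimal, self-contained sketch of a strextendf()-style helper, using the usual two-pass vsnprintf() approach (measure first, then format into the grown buffer). The name strextendf_sketch() is hypothetical, and this is not necessarily how systemd implements strextendf().

```c
#include <assert.h>
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append printf()-formatted text to the heap string *x (which may be NULL). */
static int strextendf_sketch(char **x, const char *format, ...) {
        va_list ap;
        size_t old = *x ? strlen(*x) : 0;
        int n;

        /* Pass 1: measure how many bytes the formatted text needs. */
        va_start(ap, format);
        n = vsnprintf(NULL, 0, format, ap);
        va_end(ap);
        if (n < 0)
                return -EINVAL;

        /* realloc(NULL, ...) behaves like malloc(), so a NULL start is fine. */
        char *p = realloc(*x, old + (size_t) n + 1);
        if (!p)
                return -ENOMEM;

        /* Pass 2: format directly behind the existing contents. */
        va_start(ap, format);
        vsnprintf(p + old, (size_t) n + 1, format, ap);
        va_end(ap);

        *x = p;
        return 0;
}
```

Each call does one realloc(), which is why such a helper is convenient but not meant for tight inner loops.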
* | nspawn: fix the sections .nspawn settings are placed in | Lennart Poettering | 2021-05-06 | 1 | -2/+2
|/   The actual section names are quite different from what the comment so far suggested. Fix that.
* dissect: ignore udev database entries from before the loopback attachment | Lennart Poettering | 2021-04-20 | 1 | -0/+1
|   This tries to shorten the race of device reuse a bit more: let's ignore udev database entries that are older than the time when we started to use a loopback device. This doesn't fix the whole loopback device raciness mess, but it makes the race window a bit shorter.
* dissect: ignore old uevents when waiting for loopback partition scan | Lennart Poettering | 2021-04-20 | 1 | -0/+1
|   Let's drop all monitor uevents that were enqueued before we actually started setting up the device. This doesn't fix the race, but it makes the race window smaller: since we cannot determine the uevent seqnum and the loopback attachment atomically, there's a tiny window where uevents might be generated by the device which we mistake for being associated with our use of the loopback device.
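Both heuristics (ignoring udev database entries older than the loopback attachment, and dropping uevents queued before setup began) boil down to a freshness filter. A hypothetical sketch, assuming the caller recorded the kernel's uevent sequence number (readable from /sys/kernel/uevent_seqnum) and a timestamp before starting setup; a value of 0 is treated as "unknown" here. As the commit messages say, this narrows the race window but does not close it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record; real code would read these from device/uevent objects. */
typedef struct {
        uint64_t seqnum;          /* kernel uevent sequence number, 0 if unknown */
        uint64_t timestamp_usec;  /* creation time of event/db entry, 0 if unknown */
} DeviceEvent;

/* Keep only events that cannot predate the start of our device setup. */
static bool event_is_relevant(const DeviceEvent *e, uint64_t seqnum_before_setup, uint64_t setup_start_usec) {
        if (e->seqnum != 0 && e->seqnum <= seqnum_before_setup)
                return false;   /* queued before we touched the device */
        if (e->timestamp_usec != 0 && e->timestamp_usec < setup_start_usec)
                return false;   /* stale entry from a previous user */
        return true;
}
```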
* Merge pull request #18971 from poettering/sysusers-creds | Lennart Poettering | 2021-03-31 | 1 | -3/+4
|\   Let's read LoadCredential=/SetCredential= style credentials in sysusers/firstboot and when asking for passwords
| * util: add creds-util.[ch] with helpers for dealing with credentials | Lennart Poettering | 2021-03-26 | 1 | -3/+4
| |
* | dissect-image: split DISSECT_IMAGE_REQUIRE_ROOT in two | Lennart Poettering | 2021-03-16 | 1 | -0/+1
| |   Previously, the flag did two things at once: first, it enabled support for using a generic partition as the root fs if there was only one, and allowed partition-table-less images to be used as the root fs; second, it insisted that there was a root fs and failed if not.
| |
| |   Let's split this into two separate options so that they can be used independently of each other. There are cases where one wants one without the other (e.g. when inspecting things with the systemd-dissect tool it should be OK to do so even if the image has no root fs), and it's cleaner anyway.
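A sketch of the split into two independent bits, with hypothetical names and values: one bit relaxes what may serve as the root file system, the other decides whether its absence is an error. This illustrates the idea only; it is not the actual dissect-image logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flags, one per responsibility of the old combined flag. */
typedef enum {
        DISSECT_GENERIC_ROOT = 1 << 0,  /* lone generic partition / table-less image may be root */
        DISSECT_REQUIRE_ROOT = 1 << 1,  /* fail if no root file system was found */
} DissectFlags;

/* Returns 0 on success, -1 if a root fs is required but missing. */
static int check_root(DissectFlags flags, bool found_designated_root, bool found_generic) {
        bool have_root = found_designated_root ||
                (found_generic && (flags & DISSECT_GENERIC_ROOT));

        if (!have_root && (flags & DISSECT_REQUIRE_ROOT))
                return -1;
        return 0;
}
```

With the bits separated, an inspection tool can pass only the first one and still succeed on an image without a root fs.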
* | tree-wide: make use of DISSECT_IMAGE_USR_NO_ROOT in various tools | Lennart Poettering | 2021-03-16 | 1 | -5/+18
|/   Let's make use of the new dissection mode in all tools where this makes sense, which is all tools that dissect images, except for those which inherently operate on state/configuration and for which an image without state or configuration is therefore useless (e.g. the --image= switch of systemd-tmpfiles/systemd-firstboot/…).
* tree-wide: use UINT64_MAX or friends | Yu Watanabe | 2021-03-04 | 3 | -36/+36
|
* Refactor network namespace specific functions into generic helpers | Xℹ Ruoyao | 2021-03-03 | 1 | -1/+1
|
* json: rename json_dispatch_{integer,unsigned} -> json_dispatch_{intmax,uintmax} | Anita Zhang | 2021-02-26 | 1 | -2/+2
|   Prompted by https://bugzilla.redhat.com/show_bug.cgi?id=1930875, in which I had previously used json_dispatch_unsigned and passed a return variable of type unsigned, when json_dispatch_unsigned actually writes a uintmax_t.
* signal-util: make -1 termination of ignore_signals() argument list unnecessary | Lennart Poettering | 2021-02-25 | 1 | -2/+2
|   Clean up ignore_signals() + default_signals() + sigaction_many() a bit: make it unnecessary to explicitly terminate the signal list with -1. Merge all three calls into a single function that is just called with slightly different parameters. And eliminate an unnecessary extra iteration in its inner for() loop.
|
|   No change in behaviour.
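The usual C technique for dropping a sentinel terminator is a variadic macro that wraps its arguments in a compound-literal array, so the element count can be computed with sizeof. A hypothetical sketch with a single worker function serving both the "ignore" and "default" cases (names are not systemd's):

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Apply one disposition to every signal in an array of known length. */
static int sigaction_many_sketch(void (*handler)(int), const int *sigs, size_t n) {
        struct sigaction sa;
        int r = 0;

        memset(&sa, 0, sizeof sa);
        sa.sa_handler = handler;
        sigemptyset(&sa.sa_mask);

        for (size_t i = 0; i < n; i++)
                if (sigaction(sigs[i], &sa, NULL) < 0)
                        r = -1;   /* keep going, report failure at the end */
        return r;
}

/* Expands to "array, element count": no -1 terminator needed. */
#define SIG_LIST(...) (const int[]) { __VA_ARGS__ }, sizeof((const int[]) { __VA_ARGS__ }) / sizeof(int)
#define ignore_signals_sketch(...)  sigaction_many_sketch(SIG_IGN, SIG_LIST(__VA_ARGS__))
#define default_signals_sketch(...) sigaction_many_sketch(SIG_DFL, SIG_LIST(__VA_ARGS__))
```

A call then reads ignore_signals_sketch(SIGPIPE, SIGUSR1), and forgetting a terminator is no longer possible.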
* tree-wide: use in_addr_is_set() or friends | Yu Watanabe | 2021-02-17 | 1 | -2/+2
|
* Merge pull request #18007 from fw-strlen/ipv6_masq_and_dnat | Lennart Poettering | 2021-02-16 | 3 | -14/+20
|\   Support IPv6 for masquerade and DNAT in nspawn and networkd
| * nspawn: expose container IPv6 address too | Florian Westphal | 2021-01-19 | 3 | -14/+20
| |   Extend nspawn so it can keep track of one IPv4 and one IPv6 address.
* | Merge pull request #18601 from keszybz/env-assign-cleanup | Lennart Poettering | 2021-02-16 | 1 | -7/+3
|\ \   Envvar assignment cleanup
| * | basic/env-util: add variant of strv_env_replace() that does strdup internally | Zbigniew Jędrzejewski-Szmek | 2021-02-15 | 1 | -7/+3
| | |
* | | Move and rename parse_path_argument() function | Zbigniew Jędrzejewski-Szmek | 2021-02-15 | 1 | -6/+7
|/ /   This fits better in shared/, and the new parse-argument.c file is a good home for it.
* | tree-wide: use free_and_strdup_warn() | Yu Watanabe | 2021-02-11 | 1 | -4/+1
| |
* | tree-wide: propagate error code from _from_string() functions | Zbigniew Jędrzejewski-Szmek | 2021-02-10 | 2 | -7/+5
| |   Now that we know we have something useful, no need to make an answer up.
* | tree-wide: use -EINVAL for enum invalid values | Zbigniew Jędrzejewski-Szmek | 2021-02-10 | 2 | -7/+7
| |   As suggested in https://github.com/systemd/systemd/pull/11484#issuecomment-775288617.
| |
| |   This does not touch anything exposed in src/systemd. Changing the defines there would be a compatibility break.
| |
| |   Note that tests are broken after this commit. They will be fixed in the next one.
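A sketch of the convention, using a hypothetical enum modeled on --private-users-ownership=: the invalid marker equals -EINVAL, so a failed _from_string() lookup already is a usable error code that callers can propagate unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical enum; the invalid marker doubles as an errno-style error. */
typedef enum {
        OWNERSHIP_OFF,
        OWNERSHIP_CHOWN,
        OWNERSHIP_MAP,
        OWNERSHIP_AUTO,
        _OWNERSHIP_MAX,
        _OWNERSHIP_INVALID = -EINVAL,
} Ownership;

static const char *const ownership_table[_OWNERSHIP_MAX] = {
        [OWNERSHIP_OFF]   = "off",
        [OWNERSHIP_CHOWN] = "chown",
        [OWNERSHIP_MAP]   = "map",
        [OWNERSHIP_AUTO]  = "auto",
};

/* Returns the enum value, or -EINVAL (== _OWNERSHIP_INVALID) for unknown input. */
static Ownership ownership_from_string(const char *s) {
        if (!s)
                return _OWNERSHIP_INVALID;
        for (int i = 0; i < _OWNERSHIP_MAX; i++)
                if (strcmp(ownership_table[i], s) == 0)
                        return (Ownership) i;
        return _OWNERSHIP_INVALID;
}
```

A caller can then write `r = ownership_from_string(arg); if (r < 0) return r;` instead of inventing its own error value.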
* | shared: rename machine-image.[ch] → discover-image.[ch] | Lennart Poettering | 2021-02-03 | 1 | -1/+1
| |   The old name dates from when this was used to discover "machine" images, as managed by machined/machinectl. But nowadays this is also used by portable services and system extensions, hence let's use a more generic name for this API. Taking inspiration from "dissect-image.[ch]", let's call this "discover-image.[ch]".
| |
| |   This is pure renaming, no other changes.
* | tree-wide: Drop custom formatting for printf() help messages | Daan De Meyer | 2021-01-31 | 1 | -6/+7
| |   I think this formatting was originally used because it simplified adding new options to the help messages. However, these days, most tools' help messages end with "\nSee the %s for details.\n", so the final line almost never has to be edited, which eliminates the benefit of the custom formatting used for printf() help messages. Let's make things more consistent and use the same formatting for printf() help messages that we use everywhere else.
| |
| |   Prompted by https://github.com/systemd/systemd/pull/18355#discussion_r567241580
* | tree-wide: add SPDX header on all scripts and helpers | Zbigniew Jędrzejewski-Szmek | 2021-01-28 | 1 | -0/+1
| |   Even though many of those scripts are very simple, it is easier to include the header than to try to say whether each of those files is trivial enough not to require one.
* | tree-wide: ignore messages with too long control data | Lennart Poettering | 2021-01-20 | 1 | -0/+4
|/   Apparently SELinux inserts control data into AF_UNIX datagrams where we don't expect it, which throws off our control data size calculations. This looks like something to fix in SELinux, but we should still handle it gracefully and just drop the offending datagram and continue.
|
|   recvmsg_safe() actually already drops the datagram; it's just a matter of actually ignoring EXFULL (which it generates if the control data is too large) in the right places.
|
|   This commit does so wherever recvmsg_safe() is used on an AF_UNIX/SOCK_DGRAM socket that does not carry purely internal communication.
|
|   Fixes: #17795
|   Follow-up for: 3691bcf3c5eebdcca5b4f1c51c745441c57a6cd1
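A hypothetical sketch of the mechanism: a recvmsg() wrapper that reports kernel control-data truncation (MSG_CTRUNC) as -EXFULL, so that callers can skip the datagram and keep reading. This mirrors the described behaviour of recvmsg_safe() but is not its implementation; a production version should also close any file descriptors that did arrive in a truncated control buffer.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Returns the datagram size, -errno on failure, or -EXFULL if the kernel
 * had to truncate the control data (the datagram is consumed either way). */
static ssize_t recvmsg_checked(int fd, struct msghdr *msg, int flags) {
        ssize_t n = recvmsg(fd, msg, flags);
        if (n < 0)
                return -errno;
        if (msg->msg_flags & MSG_CTRUNC)
                return -EXFULL;
        return n;
}
```

A receive loop would then treat -EXFULL like "nothing useful arrived" and continue, rather than failing the whole event source.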
* machine-image: properly support searching for images below some --root= path | Lennart Poettering | 2021-01-19 | 1 | -1/+1
|   systemd-sysext supports --root= for everything but the image discovery. Fix that.
* meson: move test or fuzzer definitions to relevant meson.build in subdirectories | Yu Watanabe | 2021-01-18 | 1 | -0/+14
|
* fuzzers: move several fuzzers | Yu Watanabe | 2021-01-18 | 4 | -0/+58
|
* meson: make the second and third elements of tests or fuzzers optional | Yu Watanabe | 2021-01-18 | 1 | -1/+1
|   Then, we can shorten many test definitions.
* nspawn: minor modernization | Zbigniew Jędrzejewski-Szmek | 2021-01-15 | 1 | -28/+9
|
* nspawn: make rootfs relative to OCI bundle path | Arian van Putten | 2021-01-12 | 1 | -1/+17
|   This is in line with the OCI runtime spec:
|
|       On POSIX platforms, path is either an absolute path or a relative path to the bundle. For example, with a bundle at /to/bundle and a root filesystem at /to/bundle/rootfs, the path value can be either /to/bundle/rootfs or rootfs. The value SHOULD be the conventional rootfs.
|
|   (https://github.com/opencontainers/runtime-spec/blob/master/config.md)
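A minimal sketch of the path rule quoted from the spec: absolute root.path values are used as-is, and relative ones are joined to the bundle directory. The function name is hypothetical, and the sketch does not normalize duplicate slashes.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns a malloc()ed path, or NULL on allocation failure. */
static char *resolve_oci_root(const char *bundle, const char *root_path) {
        int absolute = root_path[0] == '/';
        const char *prefix = absolute ? "" : bundle;   /* relative: prepend bundle */
        const char *sep = absolute ? "" : "/";
        size_t len = strlen(prefix) + strlen(sep) + strlen(root_path) + 1;

        char *p = malloc(len);
        if (!p)
                return NULL;
        snprintf(p, len, "%s%s%s", prefix, sep, root_path);
        return p;
}
```

Both spellings from the spec example, "rootfs" and "/to/bundle/rootfs", resolve to the same directory for a bundle at /to/bundle.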
* nspawn: sort headers | Yu Watanabe | 2020-12-18 | 1 | -2/+1
|
* Merge pull request #17026 from fw-strlen/nft_16 | Lennart Poettering | 2020-12-16 | 3 | -26/+39
|\   Add networkd/nspawn nftables backend
| * firewall-util: add nftables backend | Florian Westphal | 2020-12-16 | 1 | -5/+0
| |   The idea is to use a static ruleset, added when the first attempt to add a masquerade or dnat rule is made.
| |
| |   The alternative would be to add the ruleset when the init function is called. The disadvantage is that this enables connection tracking and NAT in the kernel (as the ruleset needs this to work), which comes with some overhead that might not be needed (no nspawn usage and no IPMasquerade option set).
| |
| |   There is no additional dependency on the 'nft' userspace binary or other libraries. sd-netlink's nfnetlink backend is used to modify the nftables ruleset.
| |
| |   The commit message/comments still use nft syntax since that is what users will see when they use the nft tool to list the ruleset.
| |
| |   The added initial skeleton (added on the first fw_add_masquerade/local_dnat call) looks like this:
| |
| |       table ip io.systemd.nat {
| |           set masq_saddr {
| |               type ipv4_addr
| |               flags interval
| |               elements = { 192.168.59.160/28 }
| |           }
| |
| |           map map_port_ipport {
| |               type inet_proto . inet_service : ipv4_addr . inet_service
| |               elements = { tcp . 2222 : 192.168.59.169 . 22 }
| |           }
| |
| |           chain prerouting {
| |               type nat hook prerouting priority dstnat + 1; policy accept;
| |               fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
| |           }
| |
| |           chain output {
| |               type nat hook output priority -99; policy accept;
| |               ip daddr != 127.0.0.0/8 oif "lo" dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
| |           }
| |
| |           chain postrouting {
| |               type nat hook postrouting priority srcnat + 1; policy accept;
| |               ip saddr @masq_saddr masquerade
| |           }
| |       }
| |
| |   Subsequent calls to fw_add_masquerade/add_local_dnat will then only add/delete the element/mapping to masq_saddr and map_port_ipport, i.e. the ruleset doesn't change; only the set/map content does.
| |
| |   Running test-firewall-util with this backend gives the following output on a parallel 'nft monitor':
| |
| |       $ nft monitor
| |       add table ip io.systemd.nat
| |       add chain ip io.systemd.nat prerouting { type nat hook prerouting priority dstnat + 1; policy accept; }
| |       add chain ip io.systemd.nat output { type nat hook output priority -99; policy accept; }
| |       add chain ip io.systemd.nat postrouting { type nat hook postrouting priority srcnat + 1; policy accept; }
| |       add set ip io.systemd.nat masq_saddr { type ipv4_addr; flags interval; }
| |       add map ip io.systemd.nat map_port_ipport { type inet_proto . inet_service : ipv4_addr . inet_service; }
| |       add rule ip io.systemd.nat prerouting fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
| |       add rule ip io.systemd.nat output ip daddr != 127.0.0.0/8 fib daddr type local dnat ip addr . port to meta l4proto . th dport map @map_port_ipport
| |       add rule ip io.systemd.nat postrouting ip saddr @masq_saddr masquerade
| |       add element ip io.systemd.nat masq_saddr { 10.1.2.3 }
| |       add element ip io.systemd.nat masq_saddr { 10.0.2.0/28 }
| |       delete element ip io.systemd.nat masq_saddr { 10.0.2.0/28 }
| |       delete element ip io.systemd.nat masq_saddr { 10.1.2.3 }
| |       add element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.4 . 815 }
| |       delete element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.4 . 815 }
| |       add element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.5 . 815 }
| |       delete element ip io.systemd.nat map_port_ipport { tcp . 4711 : 1.2.3.5 . 815 }
| |       CTRL-C
| |
| |   Things not implemented/supported:
| |
| |   1. Change monitoring. The kernel allows userspace to learn about changes made by other clients (using nfnetlink notifications). It would be possible to detect when e.g. someone removes the systemd nat table. This would need more work. It's also not clear how to react to external changes; it doesn't seem like a good idea to just auto-undo everything.
| |
| |   2. 'set masq_saddr' doesn't handle overlaps. Example:
| |
| |          fw_add_masquerade(true, AF_INET, "10.0.0.0", 16);
| |          fw_add_masquerade(true, AF_INET, "10.0.0.0", 8); /* fails */
| |
| |      With the iptables backend the second call works, as it adds an independent iptables rule. With the nftables backend, the range 10.0.0.0-10.255.255.255 clashes with the existing range of 10.0.0.0-10.0.255.255, so the second add gets rejected by the kernel. This will generate an error message from networkd ("Could not enable IP masquerading: File exists"). To resolve this, we would need to keep track of the added elements and perform range merging when overlaps are detected. However, the add requests are done using the configured network on a device, so no overlaps should occur in normal setups.
| |
| |   IPv6 support is added in a separate changeset.
| |
| |   Fixes: #13307
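Why the second fw_add_masquerade() call in the example fails can be checked with a little prefix arithmetic: two IPv4 prefixes overlap exactly when they agree on the bits covered by the shorter prefix. A hypothetical helper (not part of firewall-util), with addresses given as host-byte-order integers:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Netmask for an IPv4 prefix length 0..32; 0 is special-cased because
 * shifting a 32-bit value by 32 is undefined behaviour. */
static uint32_t prefix_mask(unsigned prefixlen) {
        return prefixlen == 0 ? 0 : UINT32_MAX << (32 - prefixlen);
}

/* Overlap iff both prefixes agree on the bits of the shorter one. */
static bool ipv4_prefixes_overlap(uint32_t a, unsigned alen, uint32_t b, unsigned blen) {
        uint32_t m = prefix_mask(alen < blen ? alen : blen);
        return (a & m) == (b & m);
}
```

For 10.0.0.0/16 and 10.0.0.0/8 the shorter prefix is /8 and both start with 10, so they overlap; range merging would have to start from a check like this.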
| * firewall-util: introduce context structure | Florian Westphal | 2020-12-16 | 3 | -17/+35
| |   For the planned nft backend we have three choices:
| |
| |   - open/close a new nfnetlink socket for every operation
| |   - keep an nfnetlink socket open internally
| |   - expose an opaque fw_ctx and stash all internal data there.
| |
| |   Originally I opted for the second option, but during review it was suggested to avoid static storage duration because of perceived problems with threaded applications.
| |
| |   This adds fw_ctx and new/free functions, then converts the existing API and nspawn and networkd to use it.
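The third option is the classic opaque-handle pattern. A sketch with hypothetical names: callers only see an incomplete struct type, and the constructor/destructor pair owns whatever state the backend needs (here a placeholder socket descriptor that a backend would open lazily).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/* What a header would expose: an incomplete type, internals stay hidden. */
typedef struct FirewallContext FirewallContext;

/* What only the implementation file would see. */
struct FirewallContext {
        int nfnl_fd;   /* hypothetical cached nfnetlink socket, -1 while not open */
};

static int fw_ctx_new_sketch(FirewallContext **ret) {
        FirewallContext *ctx = calloc(1, sizeof *ctx);
        if (!ctx)
                return -ENOMEM;
        ctx->nfnl_fd = -1;   /* a backend would open it on first use */
        *ret = ctx;
        return 0;
}

/* Returns NULL so callers can write: ctx = fw_ctx_free_sketch(ctx); */
static FirewallContext *fw_ctx_free_sketch(FirewallContext *ctx) {
        if (!ctx)
                return NULL;
        if (ctx->nfnl_fd >= 0)
                close(ctx->nfnl_fd);
        free(ctx);
        return NULL;
}
```

Because every caller owns its own context, nothing lives in static storage, which addresses the threading concern raised in review.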
| * nspawn: pass userdata pointer, not inet_addr union | Florian Westphal | 2020-12-16 | 2 | -4/+4
| |   The next patch will need to pass two pointers to the callback instead of just the addr mask. The caller will pass a compound structure, so make this 'void *userdata' to de-clutter the next patch.
* | Move hostname setup logic to new shared/hostname-setup.[ch] | Zbigniew Jędrzejewski-Szmek | 2020-12-16 | 1 | -0/+1
|/   No functional change, just moving a bunch of things around. Previously we needed a rather complicated setup to test hostname_setup(), because the code was in src/core/. Now that things are moved to src/shared/, we can just test it like any other function. The test is still "unsafe" because hostname_setup() may modify the hostname.
* hostname-util: flagsify hostname_is_valid(), drop machine_name_is_valid() | Lennart Poettering | 2020-12-15 | 3 | -5/+5
|   Let's clean up hostname_is_valid() a bit: let's turn the second boolean argument into a more explanatory flags field, and add a flag that accepts the special name ".host" as valid. This is useful for the container logic, where the special hostname ".host" refers to the "root container", i.e. the host system itself, and can be specified at various places.
|
|   Let's also get rid of machine_name_is_valid(). It was just an alias, which is confusing, and even more so now that we have the flags parameter.
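A greatly simplified sketch of the flags approach, with hypothetical flag names: the special ".host" name is accepted only when the caller opts in. The label checks are deliberately minimal and do not reproduce all of hostname_is_valid()'s rules (for example, length limits).

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical flags replacing a bare boolean argument. */
typedef enum {
        HOSTNAME_ALLOW_TRAILING_DOT = 1 << 0,
        HOSTNAME_ALLOW_DOT_HOST     = 1 << 1,  /* ".host" = the host system itself */
} HostnameFlags;

static bool hostname_valid_sketch(const char *s, HostnameFlags flags) {
        if (!s || !*s)
                return false;
        if (strcmp(s, ".host") == 0)
                return flags & HOSTNAME_ALLOW_DOT_HOST;

        size_t n = strlen(s);
        if (s[n - 1] == '.') {
                if (!(flags & HOSTNAME_ALLOW_TRAILING_DOT))
                        return false;
                n--;   /* ignore the trailing dot for the label checks */
        }

        bool label_start = true;
        for (size_t i = 0; i < n; i++) {
                char c = s[i];
                if (c == '.') {
                        if (label_start)
                                return false;   /* empty label */
                        label_start = true;
                } else if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
                           (c >= '0' && c <= '9') || c == '-')
                        label_start = false;
                else
                        return false;
        }
        return !label_start;   /* the last label must be non-empty */
}
```

Named flags make call sites like hostname_valid_sketch(name, HOSTNAME_ALLOW_DOT_HOST) self-explanatory, which a bare true/false argument is not.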
* nspawn: remove outdated comment regarding bpffs | Ilya Dmitrichenko | 2020-12-14 | 1 | -1/+1
|   bpffs has fully respected mount namespaces since kernel version 4.7.
|
|   References:
|   - https://github.com/torvalds/linux/commit/e27f4a942a0ee4b84567a3c6cfa84f273e55cbb7
|   - https://github.com/torvalds/linux/commit/612bacad78ba6d0a91166fc4487af114bac172a8
* systemd-nspawn: Allow setting ambient capability set | Torsten Hilbrich | 2020-12-07 | 3 | -6/+41
|   The old code was only able to pass the value 0 for the inheritable and ambient capability sets when a non-root user was specified. However, sometimes it is useful to run a program in its own container with a user specification and some capabilities set. This is needed when the capabilities cannot be provided by file capabilities (because the file system is mounted with MS_NOSUID for additional security).
|
|   This commit introduces the option --ambient-capability and the config file option AmbientCapability=. Both are used in a similar way to the existing Capability= setting. It changes the inheritable and ambient sets (which are 0 by default). The code also checks that the settings for the bounding set (as defined by Capability= and DropCapability=) and the setting for the ambient set (as defined by AmbientCapability=) are compatible. Otherwise, the operation would fail anyway.
|
|   Due to the current use of -1 to indicate no support for the ambient capability set, the special value "all" cannot be supported. Also, setting the ambient capability set is restricted to running a single program in the container payload.
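A sketch of one plausible compatibility rule, assuming capability sets are represented as 64-bit masks (bit N = capability number N): every requested ambient capability must also be in the bounding set. The exact checks nspawn performs may differ, and the function name is hypothetical. Bit numbers 10, 12 and 21 are CAP_NET_BIND_SERVICE, CAP_NET_ADMIN and CAP_SYS_ADMIN in <linux/capability.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define BIT_CAP_NET_BIND_SERVICE (UINT64_C(1) << 10)
#define BIT_CAP_NET_ADMIN        (UINT64_C(1) << 12)
#define BIT_CAP_SYS_ADMIN        (UINT64_C(1) << 21)

/* Assumed rule: the ambient set must be a subset of the bounding set. */
static bool ambient_compatible(uint64_t bounding, uint64_t ambient) {
        return (ambient & ~bounding) == 0;
}
```

The description above also notes that -1 already means "no ambient support", so an all-ones mask cannot double as "all"; this sketch sidesteps that by never using a sentinel.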