path: root/src/shared/seccomp-util.c
Commit message [Author, Age, Files, Lines]
* seccomp: add PARISC (HPPA support) [Sam James, 2022-06-28, 1, -2/+33]
| We have to skip the W^X protections as we need executable memory on PARISC for now. Kernel work is in progress (started w/ 5.18).
|
| Closes: https://github.com/systemd/systemd/issues/23180
* seccomp-util: make @known include @obsolete [Yu Watanabe, 2022-06-17, 1, -0/+1]
| @known is generated from syscall-list.txt, which is generated from kernel headers. So, some syscalls in @obsolete may not be listed in syscall-list.txt.
* seccomp: fix a typo in error message [Frantisek Sumsal, 2022-05-31, 1, -1/+1]
|
* manager: prohibit clone3() in seccomp filters [Zbigniew Jędrzejewski-Szmek, 2022-04-19, 1, -0/+15]
| RestrictNamespaces should block clone3() like flatpak:
| https://github.com/flatpak/flatpak/commit/a10f52a7565c549612c92b8e736a6698a53db330
|
| clone3() passes arguments in a structure referenced by a pointer, so we can't filter on the flags as with clone(). Let's disallow the whole function call.
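The reasoning in this commit can be illustrated with a toy model. The sketch below is not the libseccomp or systemd API: every `toy_` name is hypothetical, the flags are assumed to sit in the first argument word (the position differs on some architectures), and returning ENOSYS for clone3() is an assumption chosen so that callers could fall back to clone().

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Toy model, not the libseccomp or systemd API: a seccomp filter only sees
 * the syscall number and six raw argument words; it cannot follow pointers. */
enum toy_syscall { TOY_CLONE, TOY_CLONE3 };

#define TOY_CLONE_NEWNS UINT64_C(0x00020000) /* value of CLONE_NEWNS on Linux */

/* Returns 0 to allow the call, or an errno value to fail it with. */
int toy_restrict_namespaces(enum toy_syscall nr, const uint64_t args[6]) {
        assert(args);

        switch (nr) {
        case TOY_CLONE:
                /* clone(): the flags are an argument word, so they can be tested.
                 * Assumption: flags are in args[0]. */
                return (args[0] & TOY_CLONE_NEWNS) ? EPERM : 0;
        case TOY_CLONE3:
                /* clone3(): args[0] is only a pointer to struct clone_args, so the
                 * flags are out of reach and the whole call is refused.
                 * Assumption: ENOSYS, so that callers may fall back to clone(). */
                return ENOSYS;
        }
        return 0;
}
```

In this model a clone() call without namespace flags is still allowed, while every clone3() call is refused regardless of what the structure behind the pointer contains.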
* shared/seccomp: add note about clone2() being unimportant [Zbigniew Jędrzejewski-Szmek, 2022-04-19, 1, -0/+3]
| In case anyone else starts wondering whether it should be listed as I did…
* tree-wide: add a space after if, switch, for, and while [Yu Watanabe, 2022-04-01, 1, -1/+1]
|
* strv: make iterator in STRV_FOREACH() declaread in the loop [Yu Watanabe, 2022-03-19, 1, -1/+0]
| This also avoids multiple evaluations in STRV_FOREACH_BACKWARDS()
* seccomp: move arch_prctl to @default [Zbigniew Jędrzejewski-Szmek, 2022-01-07, 1, -1/+1]
| It was reported as used by the linker:
|
| > [It is] called in the setup of ld-linux-x86-64.so.2 from _dl_sysdep_start.
| > My local call stack (with LTO):
| >
| > #0 init_cpu_features.constprop.0 (/usr/lib64/ld-linux-x86-64.so.2)
| > #1 _dl_sysdep_start (/usr/lib64/ld-linux-x86-64.so.2)
| > #2 _dl_start (/usr/lib64/ld-linux-x86-64.so.2)
| > #3 _start (/usr/lib64/ld-linux-x86-64.so.2)
| >
| > Looking through the source, I think it's this (links for glibc 2.34):
| > - First dl_platform_init calls _dl_x86_init_cpu_features, a wrapper for init_cpu_features.
| > - Then init_cpu_features calls get_cet_status.
| > - At last, get_cet_status invokes arch_prctl.
|
| Fixes #22033.
* seccomp-util: include missing_syscall_def.h to make __SNR_foo mapped to __NR_foo [Yu Watanabe, 2022-01-02, 1, -7/+4]
| Fixes #21969.
* seccomp: move mprotect to @default [Zbigniew Jędrzejewski-Szmek, 2021-11-14, 1, -1/+1]
| With glibc-2.34.9000-17.fc36.x86_64, dynamically linked programs newly fail in early init with a restrictive syscall filter that does not include @system-service. I think this is caused by 2dd87703d4386f2776c5b5f375a494c91d7f9fe4:
|
|     Author: Florian Weimer <fweimer@redhat.com>
|     Date: Mon May 10 10:31:41 2021 +0200
|
|     nptl: Move changing of stack permissions into ld.so
|
|     All the stack lists are now in _rtld_global, so it is possible to change stack permissions directly from there, instead of calling into libpthread to do the change.
|
| It seems that this call will now be very widely used, so let's just move it to default to avoid too many failures.
* nspawn: add --suppress-sync=yes mode for turning sync() and friends into NOPs via seccomp [Lennart Poettering, 2021-10-20, 1, -0/+95]
| This is supposed to be used by package/image builders such as mkosi to speed up building, since it allows us to suppress sync() inside a container. This does what Debian's eatmydata tool does, but for a container, and via seccomp (instead of LD_PRELOAD).
* seccomp: Always install filters for native architecture [Benjamin Berg, 2021-09-30, 1, -0/+4]
| The commit 6597686865ff ("seccomp: don't install filters for archs that can't use syscalls") introduced a regression where filters may not be installed for the "native" architecture. This means that setting SystemCallArchitectures=native for a unit effectively disables the SystemCallFilter= and SystemCallLog= options.
|
| Conceptually, we have two filter stages:
|  1. architecture used for syscall (SystemCallArchitectures=)
|  2. syscall + architecture combination (SystemCallFilter=)
|
| The above commit tried to optimize the filter generation by skipping the second level filtering when it is not required.
|
| However, systemd will never fully block the "native" architecture using the first level filter. This makes the code a lot simpler, as systemd can execve() the target binary using its own architecture. And, it should be perfectly fine as the "native" architecture will always be the one with the most restrictive seccomp filtering.
|
| Said differently, the bug arises because (on x86_64):
|  1. x86_64 is permitted by libseccomp already
|  2. native != x86_64
|  3. the loop wants to block x86_64 because the permitted set only contains "native" (i.e. "native" != "x86_64")
|  4. x86_64 is marked as blocked in seccomp_local_archs
|
| Thereby we have an inconsistency, where it is marked as blocked in the seccomp_local_archs array but it is allowed by libseccomp. i.e. we will skip generating filter stage 2 without having stage 1 in place.
|
| The fix is simple, we just skip the native architecture when looping seccomp_local_archs. This way the inconsistency cannot happen.
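The fix described in this commit can be sketched as a small loop. This is a toy model under stated assumptions, not the code in seccomp-util.c: the `toy_` names and the architecture tokens are hypothetical stand-ins, and the real function works on libseccomp architecture constants.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for architecture tokens; not libseccomp's constants. */
enum toy_arch { TOY_ARCH_NATIVE, TOY_ARCH_X86, TOY_ARCH_X86_64, TOY_ARCH_X32, TOY_ARCH_BLOCKED };

/* Marks every local architecture that is not in the permitted set as blocked,
 * except the native one, which is never blocked at this stage.
 * Returns the number of architectures newly marked as blocked. */
size_t toy_restrict_archs(enum toy_arch *local, size_t n_local, enum toy_arch native,
                          const enum toy_arch *permitted, size_t n_permitted) {
        size_t blocked = 0;

        assert(local);
        assert(permitted || n_permitted == 0);

        for (size_t i = 0; i < n_local; i++) {
                bool found = false;

                if (local[i] == TOY_ARCH_BLOCKED)
                        continue; /* already blocked earlier */

                if (local[i] == native)
                        continue; /* the fix: skip the native architecture */

                for (size_t j = 0; j < n_permitted; j++)
                        if (permitted[j] == local[i])
                                found = true;

                if (!found) {
                        local[i] = TOY_ARCH_BLOCKED;
                        blocked++;
                }
        }

        return blocked;
}
```

With a permitted set that contains only the "native" token, the native entry would be marked as blocked if the skip were removed, because the token never compares equal to the concrete architecture. That is the inconsistency the commit message walks through.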
* seccomp: move sched_getaffinity() from @system-service to @default [Lennart Poettering, 2021-07-27, 1, -1/+1]
| See: https://github.com/systemd/systemd/pull/20191#issuecomment-881982739
|
| In general, we shouldn't blanket move syscalls like this into @default, given that glibc actually does have fallbacks, afaics. However, as long as the syscalls are "read-only" and thus benign, I figure it's a safe thing to do. But we should probably stick to an "if in doubt, don't" rule, and put these syscalls in @system-service as default, but not into @default.
|
| I think in the real world @system-service is the sensible group people should use, and not @default actually.
* seccomp: drop getrandom() from @system-service [Lennart Poettering, 2021-07-27, 1, -1/+0]
| It's included in @default now, since 14f4b1b568907350d023d1429c1aa4aaa8925f22, and since @system-service pulls that in we can drop it from @system-service.
|
| Follow-up for #20191
* malloc() uses getrandom now [Cristian Rodríguez, 2021-07-23, 1, -0/+1]
| glibc master has used getrandom() in malloc since https://sourceware.org/git/?p=glibc.git;a=commit;h=fc859c304898a5ec72e0ba5269ed136ed0ea10e1, so getrandom() should be in the default set to avoid forcing all non-trivial programs to fall back to a PRNG.
* seccomp: drop quotactl_path() again from filter sets [Lennart Poettering, 2021-06-15, 1, -1/+0]
| In the light of https://lwn.net/Articles/859679/ let's drop quotactl_path() again from the filter set list, as it got backed out again in 5.13-rc3.
|
| It's likely going to be replaced by quotactl_fd() eventually, but that hasn't made its way into the tree yet, hence let's not replace the entry for now.
|
| This partially reverts 34254e599a28529bdb89f91571adeaf7c76d9f43.
* seccomp: add some recently added syscalls to filter groups [Lennart Poettering, 2021-06-09, 1, -0/+4]
|
* seccomp: do not ignore deny-listed syscalls with errno when list is allow-list [Yu Watanabe, 2021-03-08, 1, -4/+6]
| Previously, if the hashmap was an allow-list and a new deny-listed syscall was added, seccomp_parse_syscall_filter() simply dropped the new syscall from the hashmap even if an error number was specified.
|
| This makes the 'allow-list' hashmap store two types of entries:
| - allow-listed syscalls, which are stored with negative value (-1).
| - deny-listed syscalls, which are stored with specified errno.
|
| Fixes #18916.
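The two kinds of entry described here can be told apart by the stored value alone. The following is a minimal sketch of that encoding, assuming only what the commit message states; `toy_verdict` and `toy_allow_list_lookup()` are hypothetical names and do not exist in systemd.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Value stored for an allow-listed syscall, as stated in the commit message. */
#define TOY_ALLOWED (-1)

/* Hypothetical result type for interpreting one stored hashmap value. */
struct toy_verdict {
        bool allowed;     /* true: the syscall is allow-listed */
        int error_number; /* only meaningful when allowed is false */
};

/* Interprets the value stored for a syscall in an allow-list hashmap. */
struct toy_verdict toy_allow_list_lookup(int stored_value) {
        struct toy_verdict v = { .allowed = stored_value == TOY_ALLOWED, .error_number = 0 };

        if (!v.allowed) {
                /* A deny-listed entry carries the errno to fail the call with. */
                assert(stored_value >= 0);
                v.error_number = stored_value;
        }

        return v;
}
```

For example, a stored value of -1 reads as "allowed", while a stored EACCES reads as "denied, fail with EACCES".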
* seccomp: use FLAGS_SET() macro [Yu Watanabe, 2021-03-08, 1, -5/+5]
|
* core,seccomp: refuse to specify errno for allow-listed syscalls [Yu Watanabe, 2021-03-08, 1, -0/+3]
|
* seccomp: fix comment and change variable name [Yu Watanabe, 2021-03-08, 1, -7/+9]
|
* seccomp_restrict_sxid: return ENOSYS for openat2() [Mike Gilbert, 2021-01-27, 1, -2/+4]
| We reject all openat2() calls because it is currently not possible to inspect its flags parameter via seccomp.
|
| Fallback code is more likely to look for ENOSYS than EPERM.
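Why ENOSYS is the friendlier error can be seen from the usual shape of fallback code. The sketch below is a generic pattern, not code from glibc or systemd: the `toy_` functions are hypothetical stand-ins that simulate a filtered openat2() and a working older call, so no real syscall is made.

```c
#include <assert.h>
#include <errno.h>

typedef int (*toy_open_fn)(const char *path);

/* Tries the newer call first. Only ENOSYS ("not implemented") triggers the
 * fallback; any other error, such as EPERM, is passed through to the caller. */
int toy_open_with_fallback(toy_open_fn newer, toy_open_fn older, const char *path) {
        int fd;

        assert(newer);
        assert(older);
        assert(path);

        fd = newer(path);
        if (fd >= 0 || errno != ENOSYS)
                return fd;

        return older(path);
}

/* Stand-in: a filter that fails the newer call with ENOSYS. */
int toy_openat2_enosys(const char *path) { (void) path; errno = ENOSYS; return -1; }

/* Stand-in: a filter that fails the newer call with EPERM. */
int toy_openat2_eperm(const char *path) { (void) path; errno = EPERM; return -1; }

/* Stand-in: the older call succeeds and returns a made-up descriptor. */
int toy_openat(const char *path) { (void) path; return 3; }
```

With the ENOSYS variant the caller ends up with a descriptor from the older call; with the EPERM variant the open simply fails, which is the behaviour this commit avoids.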
* util: move parse_syscall_and_errno() to seccomp-util.c [Yu Watanabe, 2021-01-18, 1, -0/+38]
| This makes parse-util.c independent of seccomp-util.c, which is located in src/shared.
* seccomp: don't install filters for archs that can't use syscalls [Greg Depoire--Ferrer, 2020-12-10, 1, -17/+30]
| When seccomp_restrict_archs is called, architectures that are blocked are replaced by the SECCOMP_LOCAL_ARCH_BLOCKED marker so that they are not disabled again and filters are not installed for them.
|
| This can make some services that use SystemCallArchitectures= and SystemCallFilter= start faster.
* shared/seccomp-util: address family filtering is broken on ppc [Zbigniew Jędrzejewski-Szmek, 2020-11-26, 1, -3/+3]
| This reverts the gist of da1921a5c396547261c8c7fcd94173346eb3b718 and 0d9fca76bb69e162265b2d25cb79f1890c0da31b (for ppc).
|
| Quoting #17559:
| > libseccomp 2.5 added socket syscall multiplexing on ppc64(el):
| > https://github.com/seccomp/libseccomp/pull/229
| >
| > Like with i386, s390 and s390x this breaks socket argument filtering, so
| > RestrictAddressFamilies doesn't work.
| >
| > This causes the unit test to fail:
| > /* test_restrict_address_families */
| > Operating on architecture: ppc
| > Failed to install socket family rules for architecture ppc, skipping: Operation canceled
| > Operating on architecture: ppc64
| > Failed to add socket() rule for architecture ppc64, skipping: Invalid argument
| > Operating on architecture: ppc64-le
| > Failed to add socket() rule for architecture ppc64-le, skipping: Invalid argument
| > Assertion 'fd < 0' failed at src/test/test-seccomp.c:424, function test_restrict_address_families(). Aborting.
| >
| > The socket filters can't be added so `socket(AF_UNIX, SOCK_DGRAM, 0);` still
| > works, triggering the assertion.
|
| Fixes #17559.
* seccomp: also move munmap into @default syscall filter set [Yu Watanabe, 2020-11-24, 1, -1/+1]
| Follow-up for 5abede3247591248718026cb8be6cd231de7728b.
* seccomp: move brk+mmap+mmap2 into @default syscall filter set [Lennart Poettering, 2020-11-19, 1, -3/+3]
| These three syscalls are internally used by libc's memory allocation logic, i.e. ultimately back malloc(). Allocating a bit of memory is so basic, it should just be in the default set.
|
| This fixes a couple of issues with asan/msan and the seccomp tests: when asan/msan is used some additional, large memory allocations take place in the background, and unless mmap/mmap2/brk are allowlisted these will fail, aborting the test prematurely.
* license: LGPL-2.1+ -> LGPL-2.1-or-later [Yu Watanabe, 2020-11-09, 1, -1/+1]
|
* seccomp: allow turning off of seccomp filtering via env var [Lennart Poettering, 2020-11-05, 1, -4/+14]
| Fixes: #17504
|
| (While we are at it, also move $SYSTEMD_SECCOMP_LOG= env var description into the right document section)
|
| Also suggested in: https://github.com/systemd/systemd/issues/17245#issuecomment-704773603
* shared/seccomp-util: move stime() to @obsolete [Topi Miettinen, 2020-11-04, 1, -1/+1]
| Quoting the manual page of stime(2): "Starting with glibc 2.31, this function is no longer available to newly linked applications and is no longer declared in <time.h>."
* seccomp: allowlist close_range() by default in @basic-io [Lennart Poettering, 2020-10-14, 1, -0/+1]
|
* tree-wide: assorted coccinelle fixes [Frantisek Sumsal, 2020-10-09, 1, -2/+2]
|
* seccomp-util: fix typo in help message [Samanta Navarro, 2020-10-03, 1, -1/+1]
|
* seccomp-util: add cacheflush() syscall to @default syscall set [Lennart Poettering, 2020-09-30, 1, -0/+1]
| This is like membarrier() I guess and basically just exposes CPU functionality via kernel syscall on some archs. Let's whitelist it for everyone.
|
| Fixes: #17197
* exec: SystemCallLog= directive [Topi Miettinen, 2020-09-15, 1, -0/+4]
| With new directive SystemCallLog= it's possible to list system calls to be logged. This can be used for auditing or temporarily when constructing system call filters.
|
| ---
| v5: drop intermediary, update HASHMAP_FOREACH_KEY() use
| v4: skip useless debug messages, actually parse directive
| v3: don't declare unused variables with old libseccomp
| v2: fix build without seccomp or old libseccomp
* exec: Add kill action to system call filters [Topi Miettinen, 2020-09-15, 1, -1/+3]
| Define explicit action "kill" for SystemCallErrorNumber=.
|
| In addition to errno code, allow specifying "kill" as action for SystemCallFilter=.
|
| ---
| v7: seccomp_parse_errno_or_action() returns -EINVAL if !HAVE_SECCOMP
| v6: use streq_ptr(), let errno_to_name() handle bad values, kill processes, init syscall_errno
| v5: actually use seccomp_errno_or_action_to_string(), don't fail bus unit parsing without seccomp
| v4: fix build without seccomp
| v3: drop log action
| v2: action -> number
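A setting that accepts either an errno code or the word "kill" needs a parser that returns both kinds of result through one value. The following is a simplified sketch of that idea, not the body of seccomp_parse_errno_or_action(): `toy_parse_errno_or_action()` and `TOY_ERRNO_KILL` are hypothetical, only decimal numbers are handled (errno names such as "EPERM" are not), and the upper bound of 4095 is an assumption about the largest usable errno value.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical marker meaning "kill the process" instead of returning an errno. */
#define TOY_ERRNO_KILL INT_MAX

/* Assumption: largest errno value a filter can return. */
#define TOY_ERRNO_MAX 4095

/* Parses "kill" or a decimal errno number.
 * Returns TOY_ERRNO_KILL, the errno number, or -EINVAL on malformed input. */
int toy_parse_errno_or_action(const char *s) {
        char *end = NULL;
        long v;

        assert(s);

        if (strcmp(s, "kill") == 0)
                return TOY_ERRNO_KILL;

        errno = 0;
        v = strtol(s, &end, 10);
        if (errno != 0 || end == s || *end != '\0' || v < 0 || v > TOY_ERRNO_MAX)
                return -EINVAL;

        return (int) v;
}
```

Using a marker outside the valid errno range keeps the two result kinds apart without a second output parameter.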
* tree-wide: define iterator inside of the macro [Zbigniew Jędrzejewski-Szmek, 2020-09-08, 1, -7/+4]
|
* tree-wide: drop pointless zero initialization (#16900) [fangxiuning, 2020-08-29, 1, -1/+1]
|
* Merge pull request #16819 from keszybz/seccomp-enosys [Zbigniew Jędrzejewski-Szmek, 2020-08-25, 1, -16/+43]
|\
| | Return ENOSYS in nspawn for "unknown" syscalls
| * shared/seccomp-util: added functionality to make list of filtred syscalls [Zbigniew Jędrzejewski-Szmek, 2020-08-24, 1, -7/+32]
| | While at it, start removing the "seccomp_" prefix from our own functions. It is used by libseccomp.
| * shared/seccomp: reduce scope of indexing variables [Zbigniew Jędrzejewski-Szmek, 2020-08-24, 1, -9/+5]
| |
| * shared: add @known syscall list [Zbigniew Jędrzejewski-Szmek, 2020-08-24, 1, -0/+6]
| |
* | Request seccomp logging if SYSTEMD_LOG_SECCOMP environment variable is set. [Steve Dodd, 2020-08-21, 1, -0/+9]
| |
* | seccomp: add support for riscv64 [Aurelien Jarno, 2020-08-21, 1, -4/+26]
|/
| This patch adds seccomp support to the riscv64 architecture. seccomp support is available in the riscv64 kernel since version 5.5, and it has just been added to the libseccomp library.
|
| riscv64 uses generic syscalls like aarch64, so I used that architecture as a reference to find which code has to be modified.
|
| With this patch, the testsuite passes successfully, including the test-seccomp test. The system boots and works fine with kernel 5.4 (i.e. without seccomp support) and kernel 5.5 (i.e. with seccomp support). I have also verified that the "SystemCallFilter=~socket" option prevents a service from using the ping utility when running on kernel 5.5.
* shared/seccomp: use _cleanup_ in one more place [Zbigniew Jędrzejewski-Szmek, 2020-08-19, 1, -10/+6]
| (cherry picked from commit 27605d6a836d85563faf41db9f7a72883d44c0ff)
* shared/seccomp: do not use ifdef guards around textual syscall names [Zbigniew Jędrzejewski-Szmek, 2020-08-19, 1, -6/+2]
| It is possible that we will be running with an upgraded libseccomp, in which case libseccomp might know the syscall name, even if the number is not known at the time when systemd is being compiled. The guard only serves to break such upgrades, by requiring that we also recompile systemd.
|
| For s390-specific syscalls, use a define to exclude them, so that we don't try to filter them on other arches.
|
| (cherry picked from commit 6cf852e79eb0eced2f77653941f9c75c3bd79386)
* Newer Glibc use faccessat2 to implement faccessat [Michael Scherer, 2020-08-16, 1, -0/+1]
| cf https://repo.or.cz/glibc.git/commit/3d3ab573a5f3071992cbc4f57d50d1d29d55bde2
|
| This causes breakage on Fedora Rawhide: https://bugzilla.redhat.com/show_bug.cgi?id=1869030
* tree-wide: avoid some loaded terms [Lennart Poettering, 2020-06-25, 1, -14/+13]
| https://tools.ietf.org/html/draft-knodel-terminology-02
| https://lwn.net/Articles/823224/
|
| This gets rid of most but not all occasions of these loaded terms:
|
| 1. scsi_id and friends are something that is supposed to be removed from our tree (see #7594)
| 2. The test suite defines an API used by the ubuntu CI. We can remove this too later, but this needs to be done in sync with the ubuntu CI.
| 3. In some cases the terms are part of APIs we call or where we expose concepts the kernel names the way it names them. (In particular all remaining uses of the word "slave" in our codebase are like this, it's used by the POSIX PTY layer, by the network subsystem, the mount API and the block device subsystem). Getting rid of the term in these contexts would mean doing some major fixes of the kernel ABI first.
|
| Regarding the replacements: when whitelist/blacklist is used as a noun we replace it with allow list/deny list, and when used as a verb with allow-list/deny-list.
* tree-wide: use set_ensure_put() [Zbigniew Jędrzejewski-Szmek, 2020-06-22, 1, -10/+5]
| Patch contains a coccinelle script, but it only works in some cases. Many parts were converted by hand.
|
| Note: I did not fix errors in return value handling. This will be done separately to keep the patch comprehensible. No functional change is intended in this patch.
* seccomp: filter openat2() entirely in seccomp_restrict_sxid() [Lennart Poettering, 2020-06-03, 1, -0/+16]
|