path: root/src/shared/seccomp-util.c
Commit message | Author | Age | Files | Lines
* seccomp: add new 5.1 syscall pidfd_send_signal() to filter set list | Lennart Poettering | 2019-05-28 | 1 | -0/+1
* seccomp: add scmp_act_kill_process() helper that returns SCMP_ACT_KILL_PROCESS if supported | Lennart Poettering | 2019-05-24 | 1 | -0/+15
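Here is a minimal sketch of the selection logic such a helper could perform. The constants are local stand-ins whose values mirror libseccomp's SCMP_ACT_KILL_THREAD and SCMP_ACT_KILL_PROCESS. `pick_kill_action()` is a hypothetical name. The assumption that libseccomp API level 3 is the first to support process-wide kills comes from our reading of seccomp_api_get(3), not from this log. In real code the level would presumably come from seccomp_api_get(), and the second argument from an `#ifdef SCMP_ACT_KILL_PROCESS` check.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins mirroring libseccomp's SCMP_ACT_KILL_THREAD and
 * SCMP_ACT_KILL_PROCESS; this sketch does not link against libseccomp. */
#define SKETCH_ACT_KILL_THREAD  UINT32_C(0x00000000)
#define SKETCH_ACT_KILL_PROCESS UINT32_C(0x80000000)

/* Hypothetical helper: return the process-wide kill action when both the
 * headers we built against and the runtime library support it, otherwise
 * fall back to killing only the offending thread. */
static uint32_t pick_kill_action(int api_level, bool built_with_kill_process) {
        /* Assumption: API level 3 is the first with SCMP_ACT_KILL_PROCESS. */
        if (built_with_kill_process && api_level >= 3)
                return SKETCH_ACT_KILL_PROCESS;
        return SKETCH_ACT_KILL_THREAD;
}
```

The thread-level kill is only a fallback. In a multi-threaded service it would leave the remaining threads running.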
* seccomp: check more error codes from seccomp_load() | Anita Zhang | 2019-04-12 | 1 | -11/+11
  We noticed in our tests that occasionally SystemCallFilter= would fail to set and the service would run with no syscall filtering. Most of the time the same tests would apply the filter and fail the service as expected. While it's not totally clear why this happens, we noticed seccomp_load() in the systemd code base would fail open for all errors except EPERM and EACCES.

  ENOMEM, EINVAL, and EFAULT seem like reasonable values to add to the error set based on what I gather from libseccomp code and man pages:
  -ENOMEM: out of memory, failed to allocate space for a libseccomp structure, or would exceed a defined constant
  -EINVAL: kernel isn't configured to support the operations, args are invalid (to seccomp_load(), seccomp(), or prctl())
  -EFAULT: addresses passed as args are invalid
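Here is a sketch of the fail-closed check this commit describes, written as a standalone predicate. `seccomp_load_error_is_fatal()` is a hypothetical name. The errno set is taken from the commit message above; before this change only EPERM and EACCES were treated as fatal.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: decide whether a negative errno returned by
 * seccomp_load() should abort startup (fail closed) or merely be logged
 * (fail open). */
static bool seccomp_load_error_is_fatal(int r) {
        switch (abs(r)) {
        case EPERM:   /* permission denied when installing the filter */
        case EACCES:
        case ENOMEM:  /* allocation failed inside libseccomp */
        case EINVAL:  /* kernel lacks support, or arguments were invalid */
        case EFAULT:  /* an address passed to the kernel was invalid */
                return true;
        default:
                return false;
        }
}
```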
* Merge pull request #12198 from keszybz/seccomp-parsing-logging | Zbigniew Jędrzejewski-Szmek | 2019-04-03 | 1 | -2/+2
  Seccomp parsing logging cleanup
  * pid1: pass unit name to seccomp parser when we have no file location | Zbigniew Jędrzejewski-Szmek | 2019-04-03 | 1 | -2/+2
    Building on the previous commit, let's pass the unit name when parsing a D-Bus message or the built-in whitelist, which is better than nothing. seccomp_parse_syscall_filter() is not needed anymore, so it is removed, and seccomp_parse_syscall_filter_full() is renamed to take its place.
* seccomp: rework how the S[UG]ID filter is installed | Zbigniew Jędrzejewski-Szmek | 2019-04-03 | 1 | -106/+138
  If we know that a syscall is undefined on the given architecture, don't even try to add it. Try to install the filter even if some syscalls fail. Also use a helper function to make the whole thing a bit less magic.

  This allows the S[UG]ID test to pass on arm64.
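Here is a sketch of the "skip what is undefined, keep going on failure" strategy. It assumes a table where a negative number marks a syscall that does not exist on the target architecture, similar to the negative pseudo-numbers that libseccomp's SCMP_SYS() yields for syscalls missing on the build architecture. The table type, `add_rules_best_effort()` and `demo_add_rule()` are hypothetical. In real code the callback would wrap seccomp_rule_add().

```c
#include <stddef.h>

/* Hypothetical table entry: syscall name and its number on the target
 * architecture, or a negative value if it does not exist there. */
struct syscall_entry {
        const char *name;
        int nr;
};

/* Hypothetical callback standing in for a per-syscall seccomp_rule_add(). */
typedef int (*add_rule_fn)(int nr);

/* Demo stand-in only: pretend that rules for even numbers fail. */
static int demo_add_rule(int nr) {
        return nr % 2 == 0 ? -1 : 0;
}

/* Add a rule for every syscall defined on this architecture. Undefined
 * syscalls are skipped without trying, and failures are counted instead of
 * aborting, so the rest of the filter still gets installed. Returns the
 * number of failures; *n_added receives the number of successful rules. */
static size_t add_rules_best_effort(const struct syscall_entry *tbl, size_t n,
                                    add_rule_fn add_rule, size_t *n_added) {
        size_t n_failed = 0;

        *n_added = 0;
        for (size_t i = 0; i < n; i++) {
                if (tbl[i].nr < 0)
                        continue;           /* not defined here: don't even try */
                if (add_rule(tbl[i].nr) < 0)
                        n_failed++;         /* remember, but keep going */
                else
                        (*n_added)++;
        }
        return n_failed;
}
```

The caller can then choose between two outcomes: a partial filter with a logged warning, or treating `n_added == 0` as a hard failure.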
* seccomp: introduce seccomp_restrict_suid_sgid() for blocking chmod() for suid/sgid files | Lennart Poettering | 2019-04-02 | 1 | -0/+132
* seccomp: add debug messages to seccomp_protect_hostname() | Lennart Poettering | 2019-04-02 | 1 | -2/+6
* seccomp: add rseq() to default list of syscalls to whitelist | Lennart Poettering | 2019-03-28 | 1 | -0/+1
  Apparently glibc is going to call this implicitly soon, hence let's whitelist this by default.

  Fixes: #12127
* seccomp: allow shmat to be a separate syscall on architectures which use a multiplexer | Zbigniew Jędrzejewski-Szmek | 2019-03-15 | 1 | -1/+2
  After https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817, those syscalls have their separate numbers and we can block them. But glibc might still use the old ones. So let's just do a best-effort block and not assume anything about how effective it is.
* seccomp: shm{get,at,dt} now have their own numbers everywhere | Zbigniew Jędrzejewski-Szmek | 2019-03-15 | 1 | -5/+0
  E.g. on i686:

  (previously)
  arch x86: SCMP_SYS(mmap) = 90
  arch x86: SCMP_SYS(mmap2) = 192
  arch x86: SCMP_SYS(shmat) = -221
  arch x86: SCMP_SYS(shmat) = -221
  arch x86: SCMP_SYS(shmdt) = -222

  (now)
  arch x86: SCMP_SYS(mmap) = 90
  arch x86: SCMP_SYS(mmap2) = 192
  arch x86: SCMP_SYS(shmat) = 397
  arch x86: SCMP_SYS(shmat) = 397
  arch x86: SCMP_SYS(shmdt) = 398

  The relevant commit seems to be https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0d6040d46817.
* util: split out nulstr related stuff to nulstr-util.[ch] | Lennart Poettering | 2019-03-14 | 1 | -2/+2
* core: ProtectHostname= feature | Topi Miettinen | 2019-02-20 | 1 | -0/+37
  Let services use a private UTS namespace. In addition, a seccomp filter is installed on set{host,domain}name, and read-only bind mounts are set up on /proc/sys/kernel/{host,domain}name.
* seccomp: drop mincore() from @system-service syscall filter group | Lennart Poettering | 2019-01-16 | 1 | -1/+0
  Previously, this system call was included in @system-service since it is a "getter" only, i.e. it only queries information and doesn't change anything, and hence was considered not risky. However, as it turns out, mincore() is actually security sensitive, see the discussion here:

  https://lwn.net/Articles/776034/

  Hence, let's adjust the system call filter and drop mincore() from it.

  This constitutes a compatibility break to some level, however I presume we can get away with this as the system call is pretty exotic. The fact that it is pretty exotic is also reflected by the fact that the kernel intends to substantially change the behaviour of the system call soon (see the linked LWN article).
* seccomp-util: drop process_vm_readv from @debug group | Lennart Poettering | 2018-11-30 | 1 | -2/+0
  It's already part of @ipc, so there is no need to have it in both. Given that @ipc is much more popular (as it is part of @system-service, for example), let's not define it a second time.
* coccinelle: make use of SYNTHETIC_ERRNO | Zbigniew Jędrzejewski-Szmek | 2018-11-22 | 1 | -4/+4
  Ideally, coccinelle would strip unnecessary braces too. But I do not see any option in coccinelle for this, so instead, I edited the patch text using search&replace to remove the braces. Unfortunately this is not fully automatic; in particular, it didn't deal well with if-else-if-else blocks and ifdefs, so there is an increased likelihood of bugs in such spots.

  I also removed the part of the patch that coccinelle generated for udev, where we return -1 for failure. This should be fixed independently.
* seccomp: add some missing syscalls to filter sets | Lennart Poettering | 2018-11-16 | 1 | -0/+3
* shared: fix typo | Zbigniew Jędrzejewski-Szmek | 2018-11-10 | 1 | -1/+1
* tree-wide: replace 'unsigned int' with 'unsigned' | Yu Watanabe | 2018-10-19 | 1 | -1/+1
* seccomp: tighten checking of seccomp filter creation | Zbigniew Jędrzejewski-Szmek | 2018-09-24 | 1 | -10/+16
  In seccomp code, the code is changed to propagate errors which are about anything other than unknown/unimplemented syscalls. I *think* such errors should not happen in normal usage, but so far we would summarily ignore all errors, so that part is uncertain. If it turns out that other errors occur and should be ignored, this should be added later.

  In nspawn, we would count the number of added filters, but didn't use this for anything. Drop that part. The comments suggested that seccomp_add_syscall_filter_item() returned negative if the syscall is unknown, but this wasn't true: it returns 0. An error at this point can only mean that the syscall was known but couldn't be added. If the error comes from our internal whitelist in nspawn, treat this as an error, because it means that our internal table is wrong. If the error comes from user arguments, warn and ignore. (If some syscall is not known on the current architecture, it is still silently ignored.)
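The error policy in the last paragraph can be sketched as follows. `handle_add_failure()` is a hypothetical name, and the message text is illustrative. It only models what happens after a syscall that is known on this architecture could not be added.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical sketch: entries from our own built-in whitelist indicate a
 * bug in our table and propagate the error; entries from user arguments are
 * warned about and skipped. Returns 0 to continue, negative to abort. */
static int handle_add_failure(const char *name, int r, bool from_builtin_list) {
        if (r >= 0)
                return 0;                       /* nothing failed */
        if (from_builtin_list)
                return r;                       /* our table is wrong: fail */
        fprintf(stderr, "Failed to add rule for system call %s(), ignoring.\n", name);
        return 0;                               /* user input: warn, go on */
}
```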
* seccomp: reduce logging about failure to add syscall to seccomp | Zbigniew Jędrzejewski-Szmek | 2018-09-24 | 1 | -26/+31
  Our logs are full of:

  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldstat() / -10037, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call get_thread_area() / -10076, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call set_thread_area() / -10079, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldfstat() / -10034, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldolduname() / -10036, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call oldlstat() / -10035, ignoring: Numerical argument out of domain
  Sep 19 09:22:10 autopkgtest systemd[690]: Failed to add rule for system call waitpid() / -10073, ignoring: Numerical argument out of domain
  ...

  This is pointless and makes debug logs hard to read. Let's keep the logs in test code, but disable them in nspawn and pid1. This is done through a function parameter because those functions operate recursively and it's not possible for the caller to log meaningfully.

  There should be no functional change, except the skipped debug logs.
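Here is a sketch of why the logging decision is passed as a parameter. Group expansion recurses, and only the outermost caller knows whether messages about unknown syscalls are worth emitting. The group table, the `syscall_known()` stand-in and `expand()` are all hypothetical. oldstat is treated as unknown only to mirror the log excerpt above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical group table: each group lists syscall names or nested
 * "@group" names, NULL-terminated. */
struct sc_group {
        const char *name;
        const char *const *members;
};

static const char *const group_a[] = { "reboot", "kexec_load", NULL };
static const char *const group_b[] = { "@a", "swapon", "oldstat", NULL };
static const struct sc_group groups[] = { { "@a", group_a }, { "@b", group_b } };

/* Hypothetical stand-in for "is this syscall known on this architecture". */
static bool syscall_known(const char *name) {
        return strcmp(name, "oldstat") != 0;
}

static const struct sc_group *find_group(const char *name) {
        for (size_t i = 0; i < sizeof groups / sizeof groups[0]; i++)
                if (strcmp(groups[i].name, name) == 0)
                        return &groups[i];
        return NULL;
}

/* Expand one item into syscalls, recursing into groups. log_missing is
 * handed down unchanged, so the top-level caller decides for the whole tree. */
static int expand(const char *item, bool log_missing, int *n_added) {
        if (item[0] == '@') {
                const struct sc_group *g = find_group(item);
                if (!g)
                        return -1;                      /* unknown group */
                for (const char *const *m = g->members; *m; m++) {
                        int r = expand(*m, log_missing, n_added);
                        if (r < 0)
                                return r;
                }
                return 0;
        }
        if (!syscall_known(item)) {
                if (log_missing)
                        fprintf(stderr, "Skipping unknown system call %s()\n", item);
                return 0;                               /* skip silently otherwise */
        }
        (*n_added)++;                                   /* a real filter would add a rule here */
        return 0;
}
```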
* seccomp: permit specifying multiple errnos for a syscall | Lucas Werkmeister | 2018-09-07 | 1 | -4/+2
  If more than one errno is specified for a syscall in SystemCallFilter=, use the last one instead of reporting an error. This is especially useful when used with system call sets:

  SystemCallFilter=@privileged:EPERM @reboot

  This will block any system call requiring super-user capabilities with EPERM, except for attempts to reboot the system, which will immediately terminate the process. (@reboot is included in @privileged.)

  This also effectively fixes #9939, since specifying different errnos for “the same syscall” (same pseudo syscall number) is no longer an error.
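The "last errno wins" rule amounts to insert-or-replace semantics in the map from syscall number to errno. Here is a minimal sketch in which a fixed-size array stands in for the hashmap. `struct errno_map` and `errno_map_put()` are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical fixed-size map from syscall number to errno, standing in
 * for the hashmap used while parsing SystemCallFilter=. */
struct errno_map {
        int nr[16];
        int err[16];
        size_t n;
};

/* Record an errno for a syscall. If the syscall is already present, the new
 * value replaces the old one rather than being reported as a conflict. */
static int errno_map_put(struct errno_map *m, int nr, int err) {
        for (size_t i = 0; i < m->n; i++)
                if (m->nr[i] == nr) {
                        m->err[i] = err;        /* last one wins */
                        return 0;
                }
        if (m->n >= sizeof m->nr / sizeof m->nr[0])
                return -ENOMEM;                 /* map full */
        m->nr[m->n] = nr;
        m->err[m->n] = err;
        m->n++;
        return 0;
}
```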
* seccomp: improve error reporting | Lucas Werkmeister | 2018-08-29 | 1 | -1/+11
  Only report OOM if that was actually the error of the operation, explicitly report the possible error that a syscall was already blocked with a different errno and translate that into a more sensible errno (EEXIST only makes sense in connection to the hashmap), and pass through all other potential errors unmodified.

  Part of #9939.
* seccomp: add swapcontext into @process for ppc32 | Lion Yang | 2018-07-03 | 1 | -0/+1
  Some modern programming languages use userspace context switches to implement coroutines. 32-bit PowerPC is special in that it needs the "swapcontext" syscall to get contexts or switch between them.

  Adding this rule should fix #9485.
* seccomp: explain why we use setuid rather than @setuid in @privileged | Lennart Poettering | 2018-06-14 | 1 | -1/+1
* seccomp: add new system call filter, suitable as default whitelist for system services | Lennart Poettering | 2018-06-14 | 1 | -0/+69
  Currently we employ mostly system call blacklisting for our system services. Let's add a new system call filter group @system-service that helps turn this around into a whitelist by default.

  The new group is very similar to nspawn's default filter list, but in some ways more restricted (as sethostname() and suchlike shouldn't be available to most system services just like that) and in others more relaxed (for example @keyring is blocked in nspawn since it's not properly virtualized yet in the kernel, but is fine for regular system services).
* tree-wide: remove Lennart's copyright lines | Lennart Poettering | 2018-06-14 | 1 | -3/+0
  These lines are generally out-of-date, incomplete and unnecessary. With SPDX and git repository much more accurate and fine grained information about licensing and authorship is available, hence let's drop the per-file copyright notice. Of course, removing copyright lines of others is problematic, hence this commit only removes my own lines and leaves all others untouched.

  It might be nicer if sooner or later those could go away too, making git the only and accurate source of authorship information.
* tree-wide: drop 'This file is part of systemd' blurb | Lennart Poettering | 2018-06-14 | 1 | -2/+0
  This part of the copyright blurb stems from the GPL use recommendations:

  https://www.gnu.org/licenses/gpl-howto.en.html

  The concept appears to originate in times when version control was per file, instead of per tree, and was a way to glue the files together. Ultimately, we nowadays don't live in that world anymore, and this information is entirely useless anyway, as people are very welcome to copy these files into any projects they like, and they shouldn't have to change bits that are part of our copyright header for that.

  Hence, let's just get rid of this old cruft, and shorten our codebase a bit.
* nsflsgs: drop namespace_flag_{from,to}_string() | Yu Watanabe | 2018-05-05 | 1 | -1/+1
  This also drops namespace_flag_to_string_many_with_check(), and renames namespace_flag_{from,to}_string_many() to namespace_flags_{from,to}_string().
* tree-wide: drop license boilerplate | Zbigniew Jędrzejewski-Szmek | 2018-04-06 | 1 | -13/+0
  Files which are installed as-is (any .service and other unit files, .conf files, .policy files, etc.) are left as is. My assumption is that SPDX identifiers are not yet that well known, so it's better to retain the extended header to avoid any doubt.

  I also kept any copyright lines. We can probably remove them, but it'd be nice to obtain explicit acks from all involved authors before doing that.
* tree-wide: use TAKE_PTR() and TAKE_FD() macros | Yu Watanabe | 2018-04-05 | 1 | -2/+1
* Partially revert "seccomp: add mmap and address family restrictions for MIPS" (#8563) | James Cowgill | 2018-03-23 | 1 | -10/+4
  This reverts the mmap parts of f5aeac1439d64905c7b1b57042c39589dd31e3a6, but keeps the part which restricts address families, which works correctly.

  Unfortunately the MIPS toolchains still do not implement PT_GNU_STACK. This means that while the commit to restrict mmap on MIPS was "correct", it had the side effect of causing pthread_create to fail because glibc tries to allocate an executable stack for new threads in the absence of PT_GNU_STACK. We should wait until PT_GNU_STACK is implemented in all the relevant parts of the toolchain (at least gcc and glibc) before enabling this again.
* seccomp: add mmap and address family restrictions for MIPS (#8547) | James Cowgill | 2018-03-22 | 1 | -4/+16
* seccomp: enable RestrictAddressFamilies on ppc (#8505) | Mathieu Malaterre | 2018-03-20 | 1 | -1/+1
  In commit da1921a5c3, ppc64/ppc64el were added as supported architectures for socketcall() for the POWER family. Extend the support to the 32-bit architectures.
* seccomp: rework functions for parsing system call filters | Lennart Poettering | 2018-02-27 | 1 | -15/+19
  This reworks system call filter parsing, and replaces a couple of "bool" function arguments by a single flags parameter.

  This shouldn't change behaviour, except for one case: when we recursively call our parsing function on our own syscall list, then we'll lower the log level to LOG_DEBUG from LOG_WARNING, because at that point any problem is in our own code rather than in the user configuration we are parsing, and hence we shouldn't generate confusing warnings about syntax errors.

  Fixes: #8261
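Here is a sketch of the flags-instead-of-bools pattern, combined with the log level choice described above. All flag names are hypothetical, including `PARSE_RECURSIVE`, which this log does not mention. LOG_DEBUG and LOG_WARNING come from <syslog.h>.

```c
#include <syslog.h>

/* Hypothetical flag set replacing several bool parameters. */
typedef enum ParseFlags {
        PARSE_INVERT    = 1 << 0,  /* entries are blacklisted, not whitelisted */
        PARSE_WHITELIST = 1 << 1,  /* the filter as a whole is a whitelist */
        PARSE_LOG       = 1 << 2,  /* emit messages at all */
        PARSE_RECURSIVE = 1 << 3,  /* hypothetical: parsing our own built-in list */
} ParseFlags;

/* Flags to pass when the parser recurses into one of our own lists. */
static ParseFlags flags_for_recursion(ParseFlags f) {
        return f | PARSE_RECURSIVE;
}

/* User configuration gets a warning; problems found while recursing into
 * our own lists are bugs on our side and only merit a debug message. */
static int parse_log_level(ParseFlags flags) {
        return (flags & PARSE_RECURSIVE) ? LOG_DEBUG : LOG_WARNING;
}
```

A single flags word also keeps call sites readable. A call like `parse(..., PARSE_LOG | PARSE_WHITELIST)` says more than `parse(..., true, false, true)`.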
* seccomp: allow x86-64 syscalls on x32, used by the VDSO (fix #8060) | Alan Jenkins | 2018-02-02 | 1 | -4/+22
  The VDSO provided by the kernel for x32 uses x86-64 syscalls instead of x32 ones.

  I think we can safely allow this; the set of x86-64 syscalls should be very similar to the x32 ones. The real point is not to allow *x86* syscalls, because some of those are inconveniently multiplexed and we're apparently not able to block the specific actions we want to.
* seccomp-util: fix alarming debug message (#8002, #8001) | Alan Jenkins | 2018-01-31 | 1 | -1/+1
  Booting with `systemd.log_level=debug` and looking in `dmesg -u` showed messages like this:

  systemd[433]: Failed to add rule for system call n/a() / 156, ignoring: Numerical argument out of domain

  This commit fixes it to:

  systemd[449]: Failed to add rule for system call _sysctl() / 156, ignoring: Numerical argument out of domain

  Some of the messages could be even more misleading, e.g. we were reporting that utimensat() / 320 was skipped as non-existent on x86, when actually the syscall number 320 is kexec_file_load() on x86-64.

  The problem was that syscall NRs are looked up (and correctly passed to libseccomp) as native syscall NRs. But we forgot that when we tried to go back from the syscall NR to the name.

  I think the natural way to write this would be seccomp_syscall_resolve_num(nr), however there is no such function. I couldn't work out a short comment that would make this clearer. FWIW I wrote it up as a ticket for libseccomp instead.

  https://github.com/seccomp/libseccomp/issues/104
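The bug is easy to reproduce with two tiny per-architecture tables, because the same number names different syscalls on different architectures. Assume an x86-64 host: there, 156 and 320 are the native numbers of _sysctl() and kexec_file_load(), while in the x86 (i386) table 320 is utimensat(). The tables and `resolve()` are hypothetical excerpts. In libseccomp terms, the fix presumably corresponds to calling seccomp_syscall_resolve_num_arch() with SCMP_ARCH_NATIVE rather than with the secondary architecture.

```c
#include <stddef.h>

/* Hypothetical excerpts of two per-architecture syscall tables. */
struct nr_name {
        int nr;
        const char *name;
};

static const struct nr_name table_x86_64[] = { { 156, "_sysctl" }, { 320, "kexec_file_load" } };
static const struct nr_name table_x86[]    = { { 320, "utimensat" } };

/* Map a number back to a name using the given table, or NULL if absent
 * (which a logger would print as "n/a"). The table must belong to the same
 * architecture the number was looked up for, i.e. the native one here. */
static const char *resolve(const struct nr_name *t, size_t n, int nr) {
        for (size_t i = 0; i < n; i++)
                if (t[i].nr == nr)
                        return t[i].name;
        return NULL;
}
```

Resolving the native 156 against the x86 excerpt returns NULL, which is the "n/a()" case. Resolving the native 320 against it returns "utimensat", which is the misleading case.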
* Merge pull request #7695 from yuwata/transient-socket | Lennart Poettering | 2017-12-23 | 1 | -0/+59
  DBus-API: implement transient socket unit
  * core,seccomp: fix logic to parse syscall filter in dbus-execute.c | Yu Watanabe | 2017-12-23 | 1 | -0/+59
    If multiple SystemCallFilter= settings were sent over the bus, some of them whitelists and others blacklists, the parse result was corrupted. This fixes the parse logic; it is now the same as the one used in load-fragment.c.
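The merge rule can be modelled with a bitmask of syscalls. This sketch follows the semantics documented for SystemCallFilter= in systemd.exec(5), as we understand them. The first assignment fixes whether the filter is a whitelist or a blacklist. Later assignments of the same kind extend the set, and assignments of the opposite kind remove from it. `struct filter_state` and `filter_apply()` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: each bit stands for one syscall. */
struct filter_state {
        bool initialized;
        bool whitelist;    /* mode fixed by the first assignment */
        uint64_t set;
};

/* Apply one SystemCallFilter= assignment to the accumulated state. */
static void filter_apply(struct filter_state *s, bool is_whitelist, uint64_t syscalls) {
        if (!s->initialized) {
                s->initialized = true;
                s->whitelist = is_whitelist;
                s->set = 0;
        }
        if (is_whitelist == s->whitelist)
                s->set |= syscalls;    /* same kind: extend the set */
        else
                s->set &= ~syscalls;   /* opposite kind: carve out of the set */
}
```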
* shared/seccomp: add mmap handling for powerpc | Mathieu Malaterre | 2017-12-22 | 1 | -1/+2
  Also remove the warning:

  ./src/shared/seccomp-util.c:1414:2: warning: #warning "Consider adding the right mmap() syscall definitions here!" [-Wcpp]
   #warning "Consider adding the right mmap() syscall definitions here!"
* tree-wide: add DEBUG_LOGGING macro that checks whether debug logging is on (#7645) | Lennart Poettering | 2017-12-15 | 1 | -1/+1
  This makes things a bit easier to read I think, and also makes sure we always use the _unlikely_ wrapper around it, which so far we used sometimes and other times we didn't. Let's clean that up.
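Here is a self-contained sketch of such a macro. `_unlikely_()` follows the naming of systemd's helper but is defined locally via __builtin_expect() (GCC/Clang). `max_log_level` and this `log_get_max_level()` are stand-ins for the real logging state.

```c
#include <syslog.h>

/* Local branch-prediction hint in the style of systemd's _unlikely_(). */
#define _unlikely_(x) (__builtin_expect(!!(x), 0))

/* Hypothetical stand-in for the current maximum log level. */
static int max_log_level = LOG_INFO;

static int log_get_max_level(void) {
        return max_log_level;
}

/* One macro, so every call site gets the same check and the same hint. */
#define DEBUG_LOGGING _unlikely_(log_get_max_level() >= LOG_DEBUG)
```

A typical use guards expensive formatting: `if (DEBUG_LOGGING) { /* build and log a detailed message */ }`.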
* Add SPDX license identifiers to source files under the LGPL | Zbigniew Jędrzejewski-Szmek | 2017-11-19 | 1 | -0/+1
  This follows what the kernel is doing, cf. https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=5fd54ace4721fc5ce2bb5aef6318fcf17f421460.
* shared/seccomp: skip pkey_mprotect protections if the syscall is unknown | Zbigniew Jędrzejewski-Szmek | 2017-11-13 | 1 | -0/+2
  When compiling with an old kernel on architectures for which the number is not defined in missing.h, a warning is generated in missing.h. Let's just skip the protection in this case, to allow the build to proceed.
* shared/seccomp: disallow pkey_mprotect the same as mprotect for W^X mappings (#7295) | Zbigniew Jędrzejewski-Szmek | 2017-11-12 | 1 | -0/+6
  The MemoryDenyWriteExecute= policy could be bypassed by using pkey_mprotect instead of mprotect to create an executable writable mapping.

  The impact is mitigated by the fact that the man page says "Note that this feature is fully available on x86-64, and partially on x86", so hopefully people do not rely on it as a sole security measure.

  Found by Karin Hossen and Thomas Imbert from Sogeti ESEC R&D.

  https://bugs.launchpad.net/bugs/1725348
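The policy can be written as two predicates on the prot argument, shown here as plain C rather than as seccomp rules. The function names are hypothetical. The key point of this fix is that pkey_mprotect(), like mprotect(), takes prot as its third argument, so the same check has to be applied to both syscalls.

```c
#include <stdbool.h>
#include <sys/mman.h>

/* mmap(): refuse mappings that are writable and executable at once. */
static bool mmap_prot_denied(int prot) {
        return (prot & (PROT_WRITE | PROT_EXEC)) == (PROT_WRITE | PROT_EXEC);
}

/* mprotect() and pkey_mprotect(): refuse making any mapping executable.
 * Leaving pkey_mprotect() unchecked is the bypass described above. */
static bool mprotect_prot_denied(int prot) {
        return (prot & PROT_EXEC) != 0;
}
```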
* seccomp: include ARM set_tls in @default (#7297) | Lennart Poettering | 2017-11-12 | 1 | -0/+1
  Fixes: #7135
* core: add support to specify errno in SystemCallFilter= | Yu Watanabe | 2017-11-11 | 1 | -8/+14
  This makes each system call in a SystemCallFilter= blacklist optionally take an errno name or number after a colon. The errno takes precedence over the one given by SystemCallErrorNumber=.

  Cf. #7173.
  Closes #7169.
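Here is a sketch of parsing one such item, of the form "name" or "name:errno". `parse_item()` is hypothetical and recognizes only EPERM, EACCES and decimal numbers. The real parser presumably accepts any errno name. The upper bound of 4095 is also an assumption.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser: copy the syscall name into buf and store the errno
 * in *err, using default_err (e.g. SystemCallErrorNumber=) if none is given.
 * Returns 0 on success or -EINVAL on malformed input. */
static int parse_item(const char *item, char *buf, size_t bufsz, int default_err, int *err) {
        const char *colon = strchr(item, ':');
        size_t len = colon ? (size_t) (colon - item) : strlen(item);

        if (len == 0 || len >= bufsz)
                return -EINVAL;
        memcpy(buf, item, len);
        buf[len] = '\0';

        if (!colon) {
                *err = default_err;             /* no per-syscall errno given */
                return 0;
        }

        const char *e = colon + 1;
        if (strcmp(e, "EPERM") == 0)
                *err = EPERM;
        else if (strcmp(e, "EACCES") == 0)
                *err = EACCES;
        else {
                char *end;
                long v = strtol(e, &end, 10);   /* numeric errno */
                if (*e == '\0' || *end != '\0' || v <= 0 || v > 4095)
                        return -EINVAL;
                *err = (int) v;
        }
        return 0;
}
```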
* Fix typo in statx macro (#7180) | Antonio Rojas | 2017-11-10 | 1 | -1/+1
  This makes statx properly whitelisted on supported systems.
* seccomp: port @privileged to use @reboot + @swap | Lennart Poettering | 2017-10-05 | 1 | -5/+2
  Let's reuse two groups we already defined to make @privileged a bit shorter.
* seccomp: there is no "kexec" syscall | Lennart Poettering | 2017-10-05 | 1 | -1/+1
  It's called "kexec_load".
* seccomp: add three more seccomp groups | Lennart Poettering | 2017-10-05 | 1 | -7/+36
  @aio → asynchronous IO calls
  @sync → msync/fsync/... and friends
  @chown → changing file ownership

  (Also, change @privileged to reference @chown now, instead of the individual syscalls it contains.)