diff options
author | Eric W. Biederman <ebiederm@xmission.com> | 2021-09-22 18:24:02 +0200 |
---|---|---|
committer | Eric W. Biederman <ebiederm@xmission.com> | 2021-10-08 19:06:02 +0200 |
commit | 0258b5fd7c7124b87e185a1a9322d2c66b1876b7 (patch) | |
tree | 55800c054b48245841afec2f62605d0ef8990dd6 /kernel | |
parent | coredump: Don't perform any cleanups before dumping core (diff) | |
download | linux-0258b5fd7c7124b87e185a1a9322d2c66b1876b7.tar.xz linux-0258b5fd7c7124b87e185a1a9322d2c66b1876b7.zip |
coredump: Limit coredumps to a single thread group
Today when a signal is delivered with a handler of SIG_DFL whose
default behavior is to generate a core dump not only that process but
every process that shares the mm is killed.
In the case of vfork this looks like a real world problem. Consider
the following well defined sequence.
if (vfork() == 0) {
execve(...);
_exit(EXIT_FAILURE);
}
If a signal that generates a core dump is received after vfork but
before the execve changes the mm the process that called vfork will
also be killed (as the mm is shared).
Similarly if the execve fails after the point of no return the kernel
delivers SIGSEGV which will kill both the exec'ing process and because
the mm is shared the process that called vfork as well.
As far as I can tell this behavior is a violation of people's
reasonable expectations, POSIX, and is unnecessarily fragile when the
system is low on memory.
Solve this by making a userspace visible change to only kill a single
process/thread group. This is possible because Jann Horn recently
modified[1] the coredump code so that the mm can safely be modified
while the coredump is happening. With LinuxThreads long gone I don't
expect anyone to have a notice this behavior change in practice.
To accomplish this move the core_state pointer from mm_struct to
signal_struct, which allows different thread groups to coredump
simultatenously.
In zap_threads remove the work to kill anything except for the current
thread group.
v2: Remove core_state from the VM_BUG_ON_MM print to fix
compile failure when CONFIG_DEBUG_VM is enabled.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
[1] a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Diffstat (limited to 'kernel')
-rw-r--r-- | kernel/exit.c | 13 | ||||
-rw-r--r-- | kernel/fork.c | 1 |
2 files changed, 4 insertions, 10 deletions
diff --git a/kernel/exit.c b/kernel/exit.c index 774e6b5061b8..2b355e926c13 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -342,23 +342,18 @@ kill_orphaned_pgrp(struct task_struct *tsk, struct task_struct *parent) static void coredump_task_exit(struct task_struct *tsk) { struct core_state *core_state; - struct mm_struct *mm; - - mm = tsk->mm; - if (!mm) - return; /* * Serialize with any possible pending coredump. - * We must hold mmap_lock around checking core_state + * We must hold siglock around checking core_state * and setting PF_POSTCOREDUMP. The core-inducing thread * will increment ->nr_threads for each thread in the * group without PF_POSTCOREDUMP set. */ - mmap_read_lock(mm); + spin_lock_irq(&tsk->sighand->siglock); tsk->flags |= PF_POSTCOREDUMP; - core_state = mm->core_state; - mmap_read_unlock(mm); + core_state = tsk->signal->core_state; + spin_unlock_irq(&tsk->sighand->siglock); if (core_state) { struct core_thread self; diff --git a/kernel/fork.c b/kernel/fork.c index 9bd9f2da9e41..c8adb76982f7 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1044,7 +1044,6 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p, seqcount_init(&mm->write_protect_seq); mmap_init_lock(mm); INIT_LIST_HEAD(&mm->mmlist); - mm->core_state = NULL; mm_pgtables_bytes_init(mm); mm->map_count = 0; mm->locked_vm = 0; |