summaryrefslogtreecommitdiffstats
path: root/kernel/task_work.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* task_work: allow TWA_SIGNAL without a rescheduling IPIJens Axboe2022-04-301-6/+19
| | | | | | | | | | | | | | | | | | Some use cases don't always need an IPI when sending a TWA_SIGNAL notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except it doesn't send an IPI to the target task. It merely sets TIF_NOTIFY_SIGNAL and wakes up the task. This can be useful in avoiding a forceful transition to the kernel if the task is running in userspace. Depending on the task_work in question, it may be quite fine waiting for the next reschedule or kernel enter anyway, or the use case may even have other mechanisms for hinting to the task that a transition may be useful. This can drive more cooperative scheduling of task_work. Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>
* resume_user_mode: Move to resume_user_mode.hEric W. Biederman2022-03-101-1/+1
| | | | | | | | | | | | | | | Move set_notify_resume and tracehook_notify_resume into resume_user_mode.h. While doing that rename tracehook_notify_resume to resume_user_mode_work. Update all of the places that included tracehook.h for these functions to include resume_user_mode.h instead. Update all of the callers of tracehook_notify_resume to call resume_user_mode_work. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-12-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* task_work: Introduce task_work_pendingEric W. Biederman2022-03-101-1/+1
| | | | | | | | | | | | | | Wrap the test of task->task_works in a helper function to make it clear what is being tested. All of the other readers of task->task_work use READ_ONCE and this is even necessary on current as other processes can update task->task_work. So for consistency I have added READ_ONCE into task_work_pending. Reviewed-by: Kees Cook <keescook@chromium.org> Link: https://lkml.kernel.org/r/20220309162454.123006-7-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
* kasan: record task_work_add() call stackWalter Wu2021-04-301-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Why record task_work_add() call stack? Syzbot reports many use-after-free issues for task_work, see [1]. After seeing the free stack and the current auxiliary stack, we think they are useless, we don't know where the work was registered. This work may be the free call stack, so we miss the root cause and don't solve the use-after-free. Add the task_work_add() call stack into the KASAN auxiliary stack in order to improve KASAN reports. It helps programmers solve use-after-free issues. [1]: https://groups.google.com/g/syzkaller-bugs/search?q=kasan%20use-after-free%20task_work_run Link: https://lkml.kernel.org/r/20210316024410.19967-1-walter-zh.wu@mediatek.com Signed-off-by: Walter Wu <walter-zh.wu@mediatek.com> Suggested-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Jens Axboe <axboe@kernel.dk> Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Matthias Brugger <matthias.bgg@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* task_work: add helper for more targeted task_work cancelingJens Axboe2021-04-121-7/+28
| | | | | | | | | | | | | | The only exported helper we have right now is task_work_cancel(), which cancels any task_work from a given task where func matches the queued work item. This is a bit too coarse for some use cases. Add a task_work_cancel_match() that allows to more specifically target individual work items outside of purely the callback function used. task_work_cancel() can be trivially implemented on top of that, hence do so. Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* task_work: remove legacy TWA_SIGNAL pathJens Axboe2020-12-121-29/+1
| | | | | | All archs now support TIF_NOTIFY_SIGNAL. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'core-entry-notify-signal' of ↵Jens Axboe2020-11-091-12/+29
|\ | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into tif-task_work.arch Core changes to support TASK_NOTIFY_SIGNAL * tag 'core-entry-notify-signal' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: task_work: Use TIF_NOTIFY_SIGNAL if available entry: Add support for TIF_NOTIFY_SIGNAL signal: Add task_sigpending() helper
| * task_work: Use TIF_NOTIFY_SIGNAL if availableJens Axboe2020-10-291-12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the arch supports TIF_NOTIFY_SIGNAL, then use that for TWA_SIGNAL as it's more efficient than using the signal delivery method. This is especially true on threaded applications, where ->sighand is shared across threads, but it's also lighter weight on non-shared cases. io_uring is a heavy consumer of TWA_SIGNAL based task_work. A test with threads shows a nice improvement running an io_uring based echo server. stock kernel: 0.01% <= 0.1 milliseconds 95.86% <= 0.2 milliseconds 98.27% <= 0.3 milliseconds 99.71% <= 0.4 milliseconds 100.00% <= 0.5 milliseconds 100.00% <= 0.6 milliseconds 100.00% <= 0.7 milliseconds 100.00% <= 0.8 milliseconds 100.00% <= 0.9 milliseconds 100.00% <= 1.0 milliseconds 100.00% <= 1.1 milliseconds 100.00% <= 2 milliseconds 100.00% <= 3 milliseconds 100.00% <= 3 milliseconds 1378930.00 requests per second ~1600% CPU 1.38M requests/second, and all 16 CPUs are maxed out. patched kernel: 0.01% <= 0.1 milliseconds 98.24% <= 0.2 milliseconds 99.47% <= 0.3 milliseconds 99.99% <= 0.4 milliseconds 100.00% <= 0.5 milliseconds 100.00% <= 0.6 milliseconds 100.00% <= 0.7 milliseconds 100.00% <= 0.8 milliseconds 100.00% <= 0.9 milliseconds 100.00% <= 1.2 milliseconds 1666111.38 requests per second ~1450% CPU 1.67M requests/second, and we're no longer just hammering on the sighand lock. The original reporter states: "For 5.7.15 my benchmark achieves 1.6M qps and system cpu is at ~80%. for 5.7.16 or later it achieves only 1M qps and the system cpu is is at ~100%" with the only difference there being that TWA_SIGNAL is used unconditionally in 5.7.16, since it's required to be able to handle the inability to run task_work if the application is waiting in the kernel already on an event that needs task_work run to be satisfied. Also see commit 0ba9c9edcd15. Reported-by: Roman Gershman <romger@amazon.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Link: https://lore.kernel.org/r/20201026203230.386348-5-axboe@kernel.dk
* | task_work: cleanup notification modesJens Axboe2020-10-171-10/+20
|/ | | | | | | | | | | | | | | | | | | | | | | | | A previous commit changed the notification mode from true/false to an int, allowing notify-no, notify-yes, or signal-notify. This was backwards compatible in the sense that any existing true/false user would translate to either 0 (on notification sent) or 1, the latter which mapped to TWA_RESUME. TWA_SIGNAL was assigned a value of 2. Clean this up properly, and define a proper enum for the notification mode. Now we have: - TWA_NONE. This is 0, same as before the original change, meaning no notification requested. - TWA_RESUME. This is 1, same as before the original change, meaning that we use TIF_NOTIFY_RESUME. - TWA_SIGNAL. This uses TIF_SIGPENDING/JOBCTL_TASK_WORK for the notification. Clean up all the callers, switching their 0/1/false/true to using the appropriate TWA_* mode for notifications. Fixes: e91b48162332 ("task_work: teach task_work_add() to do signal_wake_up()") Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* task_work: only grab task signal lock when neededJens Axboe2020-08-131-1/+7
| | | | | | | | | | | | | | | | | | | | | | | If JOBCTL_TASK_WORK is already set on the targeted task, then we need not go through {lock,unlock}_task_sighand() to set it again and queue a signal wakeup. This is safe as we're checking it _after_ adding the new task_work with cmpxchg(). The ordering is as follows: task_work_add() get_signal() -------------------------------------------------------------- STORE(task->task_works, new_work); STORE(task->jobctl); mb(); mb(); LOAD(task->jobctl); LOAD(task->task_works); This speeds up TWA_SIGNAL handling quite a bit, which is important now that io_uring is relying on it for all task_work deliveries. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Jann Horn <jannh@google.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* task_work: teach task_work_add() to do signal_wake_up()Oleg Nesterov2020-06-301-2/+14
| | | | | | | | | | | | | | | | | | So that the target task will exit the wait_event_interruptible-like loop and call task_work_run() asap. The patch turns "bool notify" into 0,TWA_RESUME,TWA_SIGNAL enum, the new TWA_SIGNAL flag implies signal_wake_up(). However, it needs to avoid the race with recalc_sigpending(), so the patch also adds the new JOBCTL_TASK_WORK bit included in JOBCTL_PENDING_MASK. TODO: once this patch is merged we need to change all current users of task_work_add(notify = true) to use TWA_RESUME. Cc: stable@vger.kernel.org # v5.7 Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* task_work_run: don't take ->pi_lock unconditionallyOleg Nesterov2020-03-021-4/+14
| | | | | | | | | | | | As Peter pointed out, task_work() can avoid ->pi_lock and cmpxchg() if task->task_works == NULL && !PF_EXITING. And in fact the only reason why task_work_run() needs ->pi_lock is the possible race with task_work_cancel(), we can optimize this code and make the locking more clear. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge branch 'linus' into locking/core, to resolve conflictsIngo Molnar2017-11-071-0/+1
|\ | | | | | | | | | | | | | | | | | | Conflicts: include/linux/compiler-clang.h include/linux/compiler-gcc.h include/linux/compiler-intel.h include/uapi/linux/stddef.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | locking/barriers: Convert users of lockless_dereference() to READ_ONCE()Will Deacon2017-10-241-1/+1
|/ | | | | | | | | | | | | | READ_ONCE() now has an implicit smp_read_barrier_depends() call, so it can be used instead of lockless_dereference() without any change in semantics. Signed-off-by: Will Deacon <will.deacon@arm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1508840570-22169-4-git-send-email-will.deacon@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* task_work: Replace spin_unlock_wait() with lock/unlock pairOleg Nesterov2017-07-251-6/+2
| | | | | | | | | | | | | There is no agreed-upon definition of spin_unlock_wait()'s semantics, and it appears that all callers could do just as well with a lock/unlock pair. This commit therefore replaces the spin_unlock_wait() call in task_work_run() with a spin_lock_irq() and a spin_unlock_irq() aruond the cmpxchg() dequeue loop. This should be safe from a performance perspective because ->pi_lock is local to the task and because calls to the other side of the race, task_work_cancel(), should be rare. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
* task_work: use READ_ONCE/lockless_dereference, avoid pi_lock if !task_worksOleg Nesterov2016-08-031-4/+6
| | | | | | | | | | | | | | | | | | | Change task_work_cancel() to use lockless_dereference(), this is what the code really wants but we didn't have this helper when it was written. Also add the fast-path task->task_works == NULL check, in the likely case this task has no pending works and we can avoid spin_lock(task->pi_lock). While at it, change other users of ACCESS_ONCE() to use READ_ONCE(). Link: http://lkml.kernel.org/r/20160610150042.GA13868@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Andrea Parri <parri.andrea@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* locking/spinlock: Update spin_unlock_wait() usersPeter Zijlstra2016-06-141-1/+0
| | | | | | | | | | | | | | | With the modified semantics of spin_unlock_wait() a number of explicit barriers can be removed. Also update the comment for the do_exit() usecase, as that was somewhat stale/obscure. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* task_work: remove fifo ordering guaranteeEric Dumazet2015-09-051-10/+2
| | | | | | | | | | | | | | | | | | | | | | In commit f341861fb0b ("task_work: add a scheduling point in task_work_run()") I fixed a latency problem adding a cond_resched() call. Later, commit ac3d0da8f329 added yet another loop to reverse a list, bringing back the latency spike : I've seen in some cases this loop taking 275 ms, if for example a process with 2,000,000 files is killed. We could add yet another cond_resched() in the reverse loop, or we can simply remove the reversal, as I do not think anything would depend on order of task_work_add() submitted works. Fixes: ac3d0da8f329 ("task_work: Make task_work_add() lockless") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* task_work: documentationOleg Nesterov2013-09-121-0/+36
| | | | | | | | No functional changes, just comments. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* task_work: minor cleanupsOleg Nesterov2013-09-121-2/+2
| | | | | | | | | | Trivial. Remove the unnecessary "work = NULL" initialization and turn read_barrier_depends() into smp_read_barrier_depends() in task_work_cancel(). Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* task_work: task_work_add() should not succeed after exit_task_work()Oleg Nesterov2012-09-131-6/+16
| | | | | | | | | | | | | | | | | | | | | | ed3e694d "move exit_task_work() past exit_files() et.al" destroyed the add/exit synchronization we had, the caller itself should ensure task_work_add() can't race with the exiting task. However, this is not convenient/simple, and the only user which tries to do this is buggy (see the next patch). Unless the task is current, there is simply no way to do this in general. Change exit_task_work()->task_work_run() to use the dummy "work_exited" entry to let task_work_add() know it should fail. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191211.GA4228@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* task_work: Make task_work_add() locklessOleg Nesterov2012-09-131-47/+48
| | | | | | | | | | | | | | | | | | | | Change task_work's to use llist-like code to avoid pi_lock in task_work_add(), this makes it useable under rq->lock. task_work_cancel() and task_work_run() still use pi_lock to synchronize with each other. (This is in preparation for a deadlock fix.) Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20120826191209.GA4221@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* task_work: add a scheduling point in task_work_run()Eric Dumazet2012-08-211-0/+1
| | | | | | | | | | | | | It seems commit 4a9d4b024a31 ("switch fput to task_work_add") re- introduced the problem addressed in 944be0b22472 ("close_files(): add scheduling point") If a server process with a lot of files (say 2 million tcp sockets) is killed, we can spend a lot of time in task_work_run() and trigger a soft lockup. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* deal with task_work callbacks adding more workAl Viro2012-07-221-12/+14
| | | | | | | | | It doesn't matter on normal return to userland path (we'll recheck the NOTIFY_RESUME flag anyway), but in case of exit_task_work() we'll need that as soon as we get callbacks capable of triggering more task_work_add(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* move exit_task_work() past exit_files() et.al.Al Viro2012-07-221-19/+11
| | | | | | ... and get rid of PF_EXITING check in task_work_add(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* merge task_work and rcu_head, get rid of separate allocation for keyring caseAl Viro2012-07-221-7/+7
| | | | | | | | task_work and rcu_head are identical now; merge them (calling the result struct callback_head, rcu_head #define'd to it), kill separate allocation in security/keys since we can just use cred->rcu now. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* trim task_work: get rid of hlistAl Viro2012-07-221-31/+33
| | | | | | | | | | layout based on Oleg's suggestion; single-linked list, task->task_works points to the last element, forward pointer from said last element points to head. I'd still prefer much more regular scheme with two pointers in task_work, but... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* task_work_add: generic process-context callbacksOleg Nesterov2012-05-241-0/+84
Provide a simple mechanism that allows running code in the (nonatomic) context of the arbitrary task. The caller does task_work_add(task, task_work) and this task executes task_work->func() either from do_notify_resume() or from do_exit(). The callback can rely on PF_EXITING to detect the latter case. "struct task_work" can be embedded in another struct, still it has "void *data" to handle the most common/simple case. This allows us to kill the ->replacement_session_keyring hack, and potentially this can have more users. Performance-wise, this adds 2 "unlikely(!hlist_empty())" checks into tracehook_notify_resume() and do_exit(). But at the same time we can remove the "replacement_session_keyring != NULL" checks from arch/*/signal.c and exit_creds(). Note: task_work_add/task_work_run abuses ->pi_lock. This is only because this lock is already used by lookup_pi_state() to synchronize with do_exit() setting PF_EXITING. Fortunately the scope of this lock in task_work.c is really tiny, and the code is unlikely anyway. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: David Howells <dhowells@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Richard Kuo <rkuo@codeaurora.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexander Gordeev <agordeev@redhat.com> Cc: Chris Zankel <chris@zankel.net> Cc: David Smith <dsmith@redhat.com> Cc: "Frank Ch. Eigler" <fche@redhat.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>