| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk
Pull printk updates from Petr Mladek:
- Allow to sort mixed lines by an extra information about the caller
- Remove no longer used LOG_PREFIX.
- Some clean up and documentation update.
* tag 'printk-for-5.1' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk:
printk/docs: Add extra integer types to printk-formats
printk: Remove no longer used LOG_PREFIX.
lib/vsprintf: Remove %pCr remnant in comment
printk: Pass caller information to log_store().
printk: Add caller information to printk() output.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A few commonly used integer types were absent from this table, so add
them.
Link: https://github.com/ClangBuiltLinux/linux/issues/378
Suggested-by: Nick Desaulniers <ndesaulniers@google.com>
Link: http://lkml.kernel.org/r/20190303123647.22020-1-louis@kragniz.eu
Cc: pmladek@suse.com
Cc: geert+renesas@glider.be
Cc: andriy.shevchenko@linux.intel.com
Cc: linux-doc@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Cc: clang-built-linux@googlegroups.com
Cc: ndesaulniers@google.com
Cc: jflat@chromium.org
Cc: Louis Taylor <louis@kragniz.eu>
Signed-off-by: Louis Taylor <louis@kragniz.eu>
[pmladek@suse.com: sorted both variants the same way by size]
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When commit 5becfb1df5ac8e49 ("kmsg: merge continuation records while
printing") introduced LOG_PREFIX, we used KERN_DEFAULT etc. as a flag
for setting LOG_PREFIX in order to tell whether to call cont_add()
(i.e. whether to append the message to "struct cont").
But since commit 4bcc595ccd80decb ("printk: reinstate KERN_CONT for
printing continuation lines") inverted the behavior (i.e. don't append
the message to "struct cont" unless KERN_CONT is specified) and commit
5aa068ea4082b39e ("printk: remove games with previous record flags")
removed the last LOG_PREFIX check, setting LOG_PREFIX via KERN_DEFAULT
etc. is no longer meaningful.
Therefore, we can remove LOG_PREFIX and make KERN_DEFAULT empty string.
Link: http://lkml.kernel.org/r/1550829580-9189-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
To: Steven Rostedt <rostedt@goodmis.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Support for "%pCr" was removed, but a reference in a comment was
forgotten.
Fixes: 666902e42fd8344b ("lib/vsprintf: Remove atomic-unsafe support for %pCr")
Link: http://lkml.kernel.org/r/20190228105315.744-1-geert+renesas@glider.be
To: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When thread1 called printk() which did not end with '\n', and then thread2
called printk() which ends with '\n' before thread1 calls pr_cont(), the
partial content saved into "struct cont" is flushed by thread2 despite the
partial content was generated by thread1. This leads to confusing output
as if the partial content was generated by thread2. Fix this problem by
passing correct caller information to log_store().
Before:
[ T8533] abcdefghijklm
[ T8533] ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ T8532] nopqrstuvwxyz
[ T8532] abcdefghijklmnopqrstuvwxyz
[ T8533] abcdefghijklm
[ T8533] ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ T8532] nopqrstuvwxyz
After:
[ T8507] abcdefghijklm
[ T8508] ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ T8507] nopqrstuvwxyz
[ T8507] abcdefghijklmnopqrstuvwxyz
[ T8507] abcdefghijklm
[ T8508] ABCDEFGHIJKLMNOPQRSTUVWXYZ
[ T8507] nopqrstuvwxyz
Link: http://lkml.kernel.org/r/1550314773-8607-1-git-send-email-penguin-kernel@I-love.SAKURA.ne.jp
To: Dmitry Vyukov <dvyukov@google.com>
To: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
To: Steven Rostedt <rostedt@goodmis.org>
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
[pmladek: broke 80-column rule where it made more harm than good]
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Sometimes we want to print a series of printk() messages to consoles
without being disturbed by concurrent printk() from interrupts and/or
other threads. But we can't enforce printk() callers to use their local
buffers because we need to ask them to make too much changes. Also, even
buffering up to one line inside printk() might cause failing to emit
an important clue under critical situation.
Therefore, instead of trying to help buffering, let's try to help
reconstructing messages by saving caller information as of calling
log_store() and adding it as "[T$thread_id]" or "[C$processor_id]"
upon printing to consoles.
Some examples for console output:
[ 1.222773][ T1] x86: Booting SMP configuration:
[ 2.779635][ T1] pci 0000:00:01.0: PCI bridge to [bus 01]
[ 5.069193][ T268] Fusion MPT base driver 3.04.20
[ 9.316504][ C2] random: fast init done
[ 13.413336][ T3355] Initialized host personality
Some examples for /dev/kmsg output:
6,496,1222773,-,caller=T1;x86: Booting SMP configuration:
6,968,2779635,-,caller=T1;pci 0000:00:01.0: PCI bridge to [bus 01]
SUBSYSTEM=pci
DEVICE=+pci:0000:00:01.0
6,1353,5069193,-,caller=T268;Fusion MPT base driver 3.04.20
5,1526,9316504,-,caller=C2;random: fast init done
6,1575,13413336,-,caller=T3355;Initialized host personality
Note that this patch changes max length of messages which can be printed
by printk() or written to /dev/kmsg interface from 992 bytes to 976 bytes,
based on an assumption that userspace won't try to write messages hitting
that border line to /dev/kmsg interface.
Link: http://lkml.kernel.org/r/93f19e57-5051-c67d-9af4-b17624062d44@i-love.sakura.ne.jp
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Petr Mladek <pmladek@suse.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest update fromShuah Khan:
- ir test compile warnings fixes
- seccomp test fixes and improvements from Tycho Andersen and Kees Cook
- ftrace fixes to non-POSIX-compliant constructs in colored output code
and handling absence of tput from Juerg Haefliger
* tag 'linux-kselftest-5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/ftrace: Handle the absence of tput
selftests/ftrace: Replace \e with \033
selftests/ftrace: Replace echo -e with printf
selftests: ir: skip when non-root user runs the test
selftests: ir: skip when lirc device doesn't exist.
selftests: ir: fix warning: "%s" directive output may be truncated ’ directive output may be truncated
selftests/seccomp: Actually sleep for 1/10th second
selftests/harness: Update named initializer syntax
selftests: unshare userns in seccomp pidns testcases
selftests: set NO_NEW_PRIVS bit in seccomp user tests
selftests: skip seccomp get_metadata test if not real root
selftest: include stdio.h in kselftest.h
selftests: fix typo in seccomp_bpf.c
selftests: don't kill child immediately in get_metadata() test
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In environments where tput is not available, we get the following
error
$ ./ftracetest: 163: [: Illegal number:
because ncolors is an empty string. Fix that by setting it to 0 if the
tput command fails.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The \e sequence character is not POSIX. Fix that by using \033 instead.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
echo -e is not POSIX. Depending on what /bin/sh is, we can get
incorrect output like:
$ -e -n [1] Basic trace file check
$ -e [PASS]
Fix that by using printf instead.
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Juerg Haefliger <juergh@canonical.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Skip instead of fail when non-root user runs the test.
Signed-off-by: Shuah Khan <shuah@kernel.org>
Acked-by: Sean Young <sean@mess.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Skip instead of fail when lirc device doesn't exist.
Signed-off-by: Shuah Khan <shuah@kernel.org>
Acked-by: Sean Young <sean@mess.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
directive output may be truncated
Fix the following warning by sizing the buffer to max. of sysfs
path max. size + d_name max. size.
gcc -Wall -O2 -I../../../include/uapi ir_loopback.c -o ../tools/testing/selftests/ir/ir_loopback
ir_loopback.c: In function ‘lirc_open’:
ir_loopback.c:71:37: warning: ‘%s’ directive output may be truncated writing up to 255 bytes into a region of size 95 [-Wformat-truncation=]
snprintf(buf, sizeof(buf), "/dev/%s", dent->d_name);
^~
In file included from /usr/include/stdio.h:862:0,
from ir_loopback.c:14:
/usr/include/x86_64-linux-gnu/bits/stdio2.h:64:10: note: ‘__builtin___snprintf_chk’ output between 6 and 261 bytes into a destination of size 100
return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1,
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
__bos (__s), __fmt, __va_arg_pack ());
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Signed-off-by: Shuah Khan <shuah@kernel.org>
Acked-by: Sean Young <sean@mess.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Clang noticed that some none-zero sleep()s were actually using zero
anyway. This switches to nanosleep() to gain sub-second granularity.
seccomp_bpf.c:2625:9: warning: implicit conversion from 'double' to
'unsigned int' changes value from 0.1 to 0 [-Wliteral-conversion]
sleep(0.1);
~~~~~ ^~~
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The harness was still using old-style GNU named initializer syntax.
Fix this so Clang will stop warning:
seccomp_bpf.c:2924:1: warning: use of GNU old-style field designator extension
[-Wgnu-designator]
./../kselftest_harness.h:147:25: note: expanded from macro 'TEST'
^
./../kselftest_harness.h:172:5: note: expanded from macro '__TEST_IMPL'
fn: &test_name, termsig: _signal }; \
^
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The pid ns cannot be unshare()d as an unprivileged user without owning the
userns as well. Let's unshare the userns so that we can subsequently
unshare the pidns.
This also means that we don't need to set the no new privs bit as in the
other test cases, since we're unsharing the userns.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
seccomp() doesn't allow users who aren't root in their userns to attach
filters unless they have the nnp bit set, so let's set it so that these
tests can pass when run as an unprivileged user.
This idea stolen from the other seccomp tests, which use this trick :)
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The get_metadata() test requires real root, so let's skip it if we're not
real root.
Note that I used XFAIL here because that's what the test does later if
CONFIG_CHEKCKPOINT_RESTORE happens to not be enabled. After looking at the
code, there doesn't seem to be a nice way to skip tests defined as TEST(),
since there's no return code (I tried exit(KSFT_SKIP), but that didn't work
either...). So let's do it this way to be consistent, and easier to fix
when someone comes along and fixes it.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
While playing around with a way to skip the seccomp get_metadata test, I
noticed that this header uses printf() without defining it, leading to,
../kselftest.h: In function ‘ksft_print_header’:
../kselftest.h:61:3: warning: implicit declaration of function ‘printf’ [-Wimplicit-function-declaration]
printf("TAP version 13\n");
^~~~~~
../kselftest.h:61:3: warning: incompatible implicit declaration of built-in function ‘printf’
../kselftest.h:61:3: note: include ‘<stdio.h>’ or provide a declaration of ‘printf’
if user code doesn't also use printf.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There used to be an explanation here because it could trigger lockdep
previously, but now we're not doing recursive locking, so it really is just
for grins.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This this test forks a child, and then the parent waits for a write() to a
pipe signalling the child is ready to be attached to. If something in the
child ASSERTs before it does this write, the test will hang waiting for it.
Instead, let's EXPECT, so that execution continues until we do the write.
Any failure after that is fine and can ASSERT.
Signed-off-by: Tycho Andersen <tycho@tycho.ws>
Acked-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Shuah Khan <shuah@kernel.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull gcc-plugins updates from Kees Cook:
"This adds additional type coverage to the existing structleak plugin
and adds a large set of selftests to help evaluate stack variable
zero-initialization coverage.
That can be used to test whatever instrumentation might be performing
zero-initialization: either with the structleak plugin or with Clang's
coming "-ftrivial-auto-var-init=zero" option.
Summary:
- Add scalar and array initialization coverage
- Refactor Kconfig to make options more clear
- Add self-test module for testing automatic initialization"
* tag 'gcc-plugins-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
lib: Introduce test_stackinit module
gcc-plugins: structleak: Generalize to all variable types
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Adds test for stack initialization coverage. We have several build options
that control the level of stack variable initialization. This test lets us
visualize which options cover which cases, and provide tests for some of
the pathological padding conditions the compiler will sometimes fail to
initialize.
All options pass the explicit initialization cases and the partial
initializers (even with padding):
test_stackinit: u8_zero ok
test_stackinit: u16_zero ok
test_stackinit: u32_zero ok
test_stackinit: u64_zero ok
test_stackinit: char_array_zero ok
test_stackinit: small_hole_zero ok
test_stackinit: big_hole_zero ok
test_stackinit: trailing_hole_zero ok
test_stackinit: packed_zero ok
test_stackinit: small_hole_dynamic_partial ok
test_stackinit: big_hole_dynamic_partial ok
test_stackinit: trailing_hole_dynamic_partial ok
test_stackinit: packed_dynamic_partial ok
test_stackinit: small_hole_static_partial ok
test_stackinit: big_hole_static_partial ok
test_stackinit: trailing_hole_static_partial ok
test_stackinit: packed_static_partial ok
test_stackinit: packed_static_all ok
test_stackinit: packed_dynamic_all ok
test_stackinit: packed_runtime_all ok
The results of the other tests (which contain no explicit initialization),
change based on the build's configured compiler instrumentation.
No options:
test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_runtime_partial FAIL (uninit bytes: 23)
test_stackinit: big_hole_runtime_partial FAIL (uninit bytes: 127)
test_stackinit: trailing_hole_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: packed_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: small_hole_runtime_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_runtime_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_runtime_all FAIL (uninit bytes: 7)
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none FAIL (uninit bytes: 24)
test_stackinit: big_hole_none FAIL (uninit bytes: 128)
test_stackinit: trailing_hole_none FAIL (uninit bytes: 32)
test_stackinit: packed_none FAIL (uninit bytes: 32)
test_stackinit: user FAIL (uninit bytes: 32)
test_stackinit: failures: 25
CONFIG_GCC_PLUGIN_STRUCTLEAK_USER=y
This only tries to initialize structs with __user markings, so
only the difference from above is now the "user" test passes:
test_stackinit: small_hole_static_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_static_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_static_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_dynamic_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_dynamic_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_dynamic_all FAIL (uninit bytes: 7)
test_stackinit: small_hole_runtime_partial FAIL (uninit bytes: 23)
test_stackinit: big_hole_runtime_partial FAIL (uninit bytes: 127)
test_stackinit: trailing_hole_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: packed_runtime_partial FAIL (uninit bytes: 24)
test_stackinit: small_hole_runtime_all FAIL (uninit bytes: 3)
test_stackinit: big_hole_runtime_all FAIL (uninit bytes: 61)
test_stackinit: trailing_hole_runtime_all FAIL (uninit bytes: 7)
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none FAIL (uninit bytes: 24)
test_stackinit: big_hole_none FAIL (uninit bytes: 128)
test_stackinit: trailing_hole_none FAIL (uninit bytes: 32)
test_stackinit: packed_none FAIL (uninit bytes: 32)
test_stackinit: user ok
test_stackinit: failures: 24
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF=y
This initializes all structures passed by reference (scalars and strings
remain uninitialized):
test_stackinit: small_hole_static_all ok
test_stackinit: big_hole_static_all ok
test_stackinit: trailing_hole_static_all ok
test_stackinit: small_hole_dynamic_all ok
test_stackinit: big_hole_dynamic_all ok
test_stackinit: trailing_hole_dynamic_all ok
test_stackinit: small_hole_runtime_partial ok
test_stackinit: big_hole_runtime_partial ok
test_stackinit: trailing_hole_runtime_partial ok
test_stackinit: packed_runtime_partial ok
test_stackinit: small_hole_runtime_all ok
test_stackinit: big_hole_runtime_all ok
test_stackinit: trailing_hole_runtime_all ok
test_stackinit: u8_none FAIL (uninit bytes: 1)
test_stackinit: u16_none FAIL (uninit bytes: 2)
test_stackinit: u32_none FAIL (uninit bytes: 4)
test_stackinit: u64_none FAIL (uninit bytes: 8)
test_stackinit: char_array_none FAIL (uninit bytes: 16)
test_stackinit: switch_1_none FAIL (uninit bytes: 8)
test_stackinit: switch_2_none FAIL (uninit bytes: 8)
test_stackinit: small_hole_none ok
test_stackinit: big_hole_none ok
test_stackinit: trailing_hole_none ok
test_stackinit: packed_none ok
test_stackinit: user ok
test_stackinit: failures: 7
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL=y
This initializes all variables, so it matches above with the scalars
and arrays included:
test_stackinit: small_hole_static_all ok
test_stackinit: big_hole_static_all ok
test_stackinit: trailing_hole_static_all ok
test_stackinit: small_hole_dynamic_all ok
test_stackinit: big_hole_dynamic_all ok
test_stackinit: trailing_hole_dynamic_all ok
test_stackinit: small_hole_runtime_partial ok
test_stackinit: big_hole_runtime_partial ok
test_stackinit: trailing_hole_runtime_partial ok
test_stackinit: packed_runtime_partial ok
test_stackinit: small_hole_runtime_all ok
test_stackinit: big_hole_runtime_all ok
test_stackinit: trailing_hole_runtime_all ok
test_stackinit: u8_none ok
test_stackinit: u16_none ok
test_stackinit: u32_none ok
test_stackinit: u64_none ok
test_stackinit: char_array_none ok
test_stackinit: switch_1_none ok
test_stackinit: switch_2_none ok
test_stackinit: small_hole_none ok
test_stackinit: big_hole_none ok
test_stackinit: trailing_hole_none ok
test_stackinit: packed_none ok
test_stackinit: user ok
test_stackinit: all tests passed!
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This adjusts structleak to also work with non-struct types when they
are passed by reference, since those variables may leak just like
anything else. This is exposed via an improved set of Kconfig options.
(This does mean structleak is slightly misnamed now.)
Building with CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL should give the
kernel complete initialization coverage of all stack variables passed
by reference, including padding (see lib/test_stackinit.c).
Using CONFIG_GCC_PLUGIN_STRUCTLEAK_VERBOSE to count added initializations
under defconfig:
..._BYREF: 5945 added initializations
..._BYREF_ALL: 16606 added initializations
There is virtually no change to text+data size (both have less than 0.05%
growth):
text data bss dec hex filename
19502103 5051456 1917000 26470559 193e89f vmlinux.stock
19513412 5051456 1908808 26473676 193f4cc vmlinux.byref
19516974 5047360 1900616 26464950 193d2b6 vmlinux.byref_all
The measured performance difference is in the noise for hackbench and
kernel build benchmarks:
Stock:
5x hackbench -g 20 -l 1000
Mean: 10.649s
Std Dev: 0.339
5x kernel build (4-way parallel)
Mean: 261.98s
Std Dev: 1.53
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF:
5x hackbench -g 20 -l 1000
Mean: 10.540s
Std Dev: 0.233
5x kernel build (4-way parallel)
Mean: 260.52s
Std Dev: 1.31
CONFIG_GCC_PLUGIN_STRUCTLEAK_BYREF_ALL:
5x hackbench -g 20 -l 1000
Mean: 10.320
Std Dev: 0.413
5x kernel build (4-way parallel)
Mean: 260.10
Std Dev: 0.86
This does not yet solve missing padding initialization for structures
on the stack that are never passed by reference (which should be a tiny
minority). Hopefully this will be more easily addressed by upstream
compiler fixes after clarifying the C11 padding initialization
specification.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull pstore cleanups from Kees Cook:
- Remove some needless memory allocations (Yue Hu, Kees Cook)
- Add zero-length checks to avoid no-op calls (Yue Hu)
* tag 'pstore-v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
pstore/ram: Avoid needless alloc during header write
pstore/ram: Add kmsg hlen zero check to ramoops_pstore_write()
pstore/ram: Move initialization earlier
pstore: Avoid writing records with zero size
pstore/ram: Replace dummy_data heap memory with stack memory
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since the header is a fixed small maximum size, just use a stack variable
to avoid memory allocation in the write path.
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If zero-length header happened in ramoops_write_kmsg_hdr(), that means
we will not be able to read back dmesg record later, since it will be
treated as invalid header in ramoops_pstore_read(). So we should not
execute the following code but return the error.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since only one single ramoops area allowed at a time, other probes
(like device tree) are meaningless, as it will waste CPU resources.
So let's check for being already initialized first.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Sometimes pstore_console_write() will write records with zero size
to persistent ram zone, which is unnecessary. It will only increase
resource consumption. Also adjust ramoops_write_kmsg_hdr() to have
same logic if memory allocation fails.
Signed-off-by: Yue Hu <huyue2@yulong.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In ramoops_register_dummy() dummy_data is allocated via kzalloc()
then it will always occupy the heap space after register platform
device via platform_device_register_data(), but it will not be
used any more. So let's free it for system usage, replace it with
stack memory is better due to small size.
Signed-off-by: Yue Hu <huyue2@yulong.com>
[kees: add required memset and adjust sizeof() argument]
Signed-off-by: Kees Cook <keescook@chromium.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fixes: ("MAINTAINERS: Update from @linux.vnet.ibm.com to @linux.ibm.com")
Reviewed-by: Paul E. McKenney <paulmck@linux.ibm.com>
Signed-off-by: Baruch Siach <baruch@tkos.co.il>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull io_uring IO interface from Jens Axboe:
"Second attempt at adding the io_uring interface.
Since the first one, we've added basic unit testing of the three
system calls, that resides in liburing like the other unit tests that
we have so far. It'll take a while to get full coverage of it, but
we're working towards it. I've also added two basic test programs to
tools/io_uring. One uses the raw interface and has support for all the
various features that io_uring supports outside of standard IO, like
fixed files, fixed IO buffers, and polled IO. The other uses the
liburing API, and is a simplified version of cp(1).
This adds support for a new IO interface, io_uring.
io_uring allows an application to communicate with the kernel through
two rings, the submission queue (SQ) and completion queue (CQ) ring.
This allows for very efficient handling of IOs, see the v5 posting for
some basic numbers:
https://lore.kernel.org/linux-block/20190116175003.17880-1-axboe@kernel.dk/
Outside of just efficiency, the interface is also flexible and
extendable, and allows for future use cases like the upcoming NVMe
key-value store API, networked IO, and so on. It also supports async
buffered IO, something that we've always failed to support in the
kernel.
Outside of basic IO features, it supports async polled IO as well.
This particular feature has already been tested at Facebook months ago
for flash storage boxes, with 25-33% improvements. It makes polled IO
actually useful for real world use cases, where even basic flash sees
a nice win in terms of efficiency, latency, and performance. These
boxes were IOPS bound before, now they are not.
This series adds three new system calls. One for setting up an
io_uring instance (io_uring_setup(2)), one for submitting/completing
IO (io_uring_enter(2)), and one for aux functions like registrating
file sets, buffers, etc (io_uring_register(2)). Through the help of
Arnd, I've coordinated the syscall numbers so merge on that front
should be painless.
Jon did a writeup of the interface a while back, which (except for
minor details that have been tweaked) is still accurate. Find that
here:
https://lwn.net/Articles/776703/
Huge thanks to Al Viro for helping getting the reference cycle code
correct, and to Jann Horn for his extensive reviews focused on both
security and bugs in general.
There's a userspace library that provides basic functionality for
applications that don't need or want to care about how to fiddle with
the rings directly. It has helpers to allow applications to easily set
up an io_uring instance, and submit/complete IO through it without
knowing about the intricacies of the rings. It also includes man pages
(thanks to Jeff Moyer), and will continue to grow support helper
functions and features as time progresses. Find it here:
git://git.kernel.dk/liburing
Fio has full support for the raw interface, both in the form of an IO
engine (io_uring), but also with a small test application (t/io_uring)
that can exercise and benchmark the interface"
* tag 'io_uring-2019-03-06' of git://git.kernel.dk/linux-block:
io_uring: add a few test tools
io_uring: allow workqueue item to handle multiple buffered requests
io_uring: add support for IORING_OP_POLL
io_uring: add io_kiocb ref count
io_uring: add submission polling
io_uring: add file set registration
net: split out functions related to registering inflight socket files
io_uring: add support for pre-mapped user IO buffers
block: implement bio helper to add iter bvec pages to bio
io_uring: batch io_kiocb allocation
io_uring: use fget/fput_many() for file references
fs: add fget_many() and fput_many()
io_uring: support for IO polling
io_uring: add fsync support
Add io_uring IO interface
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This adds two test programs in tools/io_uring/ that demonstrate both
the raw io_uring API (and all features) through a small benchmark
app, io_uring-bench, and the liburing exposed API in a simplified
cp(1) implementation through io_uring-cp.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Right now we punt any buffered request that ends up triggering an
-EAGAIN to an async workqueue. This works fine in terms of providing
async execution of them, but it also can create quite a lot of work
queue items. For sequentially buffered IO, it's advantageous to
serialize the issue of them. For reads, the first one will trigger a
read-ahead, and subsequent request merely end up waiting on later pages
to complete. For writes, devices usually respond better to streamed
sequential writes.
Add state to track the last buffered request we punted to a work queue,
and if the next one is sequential to the previous, attempt to get the
previous work item to handle it. We limit the number of sequential
add-ons to the a multiple (8) of the max read-ahead size of the file.
This should be a good number for both reads and wries, as it defines the
max IO size the device can do directly.
This drastically cuts down on the number of context switches we need to
handle buffered sequential IO, and a basic test case of copying a big
file with io_uring sees a 5x speedup.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is basically a direct port of bfe4037e722e, which implements a
one-shot poll command through aio. Description below is based on that
commit as well. However, instead of adding a POLL command and relying
on io_cancel(2) to remove it, we mimic the epoll(2) interface of
having a command to add a poll notification, IORING_OP_POLL_ADD,
and one to remove it again, IORING_OP_POLL_REMOVE.
To poll for a file descriptor the application should submit an sqe of
type IORING_OP_POLL. It will poll the fd for the events specified in the
poll_events field.
Unlike poll or epoll without EPOLLONESHOT this interface always works in
one shot mode, that is once the sqe is completed, it will have to be
resubmitted.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Based-on-code-from: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We'll use this for the POLL implementation. Regular requests will
NOT be using references, so initialize it to 0. Any real use of
the io_kiocb ref will initialize it to at least 2.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This enables an application to do IO, without ever entering the kernel.
By using the SQ ring to fill in new sqes and watching for completions
on the CQ ring, we can submit and reap IOs without doing a single system
call. The kernel side thread will poll for new submissions, and in case
of HIPRI/polled IO, it'll also poll for completions.
By default, we allow 1 second of active spinning. This can by changed
by passing in a different grace period at io_uring_register(2) time.
If the thread exceeds this idle time without having any work to do, it
will set:
sq_ring->flags |= IORING_SQ_NEED_WAKEUP.
The application will have to call io_uring_enter() to start things back
up again. If IO is kept busy, that will never be needed. Basically an
application that has this feature enabled will guard it's
io_uring_enter(2) call with:
read_barrier();
if (*sq_ring->flags & IORING_SQ_NEED_WAKEUP)
io_uring_enter(fd, 0, 0, IORING_ENTER_SQ_WAKEUP);
instead of calling it unconditionally.
It's mandatory to use fixed files with this feature. Failure to do so
will result in the application getting an -EBADF CQ entry when
submitting IO.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We normally have to fget/fput for each IO we do on a file. Even with
the batching we do, the cost of the atomic inc/dec of the file usage
count adds up.
This adds IORING_REGISTER_FILES, and IORING_UNREGISTER_FILES opcodes
for the io_uring_register(2) system call. The arguments passed in must
be an array of __s32 holding file descriptors, and nr_args should hold
the number of file descriptors the application wishes to pin for the
duration of the io_uring instance (or until IORING_UNREGISTER_FILES is
called).
When used, the application must set IOSQE_FIXED_FILE in the sqe->flags
member. Then, instead of setting sqe->fd to the real fd, it sets sqe->fd
to the index in the array passed in to IORING_REGISTER_FILES.
Files are automatically unregistered when the io_uring instance is torn
down. An application need only unregister if it wishes to register a new
set of fds.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We need this functionality for the io_uring file registration, but
we cannot rely on it since CONFIG_UNIX can be modular. Move the helpers
to a separate file, that's always builtin to the kernel if CONFIG_UNIX is
m/y.
No functional changes in this patch, just moving code around.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
If we have fixed user buffers, we can map them into the kernel when we
setup the io_uring. That avoids the need to do get_user_pages() for
each and every IO.
To utilize this feature, the application must call io_uring_register()
after having setup an io_uring instance, passing in
IORING_REGISTER_BUFFERS as the opcode. The argument must be a pointer to
an iovec array, and the nr_args should contain how many iovecs the
application wishes to map.
If successful, these buffers are now mapped into the kernel, eligible
for IO. To use these fixed buffers, the application must use the
IORING_OP_READ_FIXED and IORING_OP_WRITE_FIXED opcodes, and then
set sqe->index to the desired buffer index. sqe->addr..sqe->addr+seq->len
must point to somewhere inside the indexed buffer.
The application may register buffers throughout the lifetime of the
io_uring instance. It can call io_uring_register() with
IORING_UNREGISTER_BUFFERS as the opcode to unregister the current set of
buffers, and then register a new set. The application need not
unregister buffers explicitly before shutting down the io_uring
instance.
It's perfectly valid to setup a larger buffer, and then sometimes only
use parts of it for an IO. As long as the range is within the originally
mapped region, it will work just fine.
For now, buffers must not be file backed. If file backed buffers are
passed in, the registration will fail with -1/EOPNOTSUPP. This
restriction may be relaxed in the future.
RLIMIT_MEMLOCK is used to check how much memory we can pin. A somewhat
arbitrary 1G per buffer size is also imposed.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For an ITER_BVEC, we can just iterate the iov and add the pages
to the bio directly. For now, we grab a reference to those pages,
and release them normally on IO completion. This isn't really needed
for the normal case of O_DIRECT from/to a file, but some of the more
esoteric use cases (like splice(2)) will unconditionally put the
pipe buffer pages when the buffers are released. Until we can manage
that case properly, ITER_BVEC pages are treated like normal pages
in terms of reference counting.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Similarly to how we use the state->ios_left to know how many references
to get to a file, we can use it to allocate the io_kiocb's we need in
bulk.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a separate io_submit_state structure, to cache some of the things
we need for IO submission.
One such example is file reference batching. io_submit_state. We get as
many references as the number of sqes we are submitting, and drop
unused ones if we end up switching files. The assumption here is that
we're usually only dealing with one fd, and if there are multiple,
hopefuly they are at least somewhat ordered. Could trivially be extended
to cover multiple fds, if needed.
On the completion side we do the same thing, except this is trivially
done just locally in io_iopoll_reap().
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Some uses cases repeatedly get and put references to the same file, but
the only exposed interface is doing these one at the time. As each of
these entail an atomic inc or dec on a shared structure, that cost can
add up.
Add fget_many(), which works just like fget(), except it takes an
argument for how many references to get on the file. Ditto fput_many(),
which can drop an arbitrary number of references to a file.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add support for a polled io_uring instance. When a read or write is
submitted to a polled io_uring, the application must poll for
completions on the CQ ring through io_uring_enter(2). Polled IO may not
generate IRQ completions, hence they need to be actively found by the
application itself.
To use polling, io_uring_setup() must be used with the
IORING_SETUP_IOPOLL flag being set. It is illegal to mix and match
polled and non-polled IO on an io_uring.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add a new fsync opcode, which either syncs a range if one is passed,
or the whole file if the offset and length fields are both cleared
to zero. A flag is provided to use fdatasync semantics, that is only
force out metadata which is required to retrieve the file data, but
not others like metadata.
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The submission queue (SQ) and completion queue (CQ) rings are shared
between the application and the kernel. This eliminates the need to
copy data back and forth to submit and complete IO.
IO submissions use the io_uring_sqe data structure, and completions
are generated in the form of io_uring_cqe data structures. The SQ
ring is an index into the io_uring_sqe array, which makes it possible
to submit a batch of IOs without them being contiguous in the ring.
The CQ ring is always contiguous, as completion events are inherently
unordered, and hence any io_uring_cqe entry can point back to an
arbitrary submission.
Two new system calls are added for this:
io_uring_setup(entries, params)
Sets up an io_uring instance for doing async IO. On success,
returns a file descriptor that the application can mmap to
gain access to the SQ ring, CQ ring, and io_uring_sqes.
io_uring_enter(fd, to_submit, min_complete, flags, sigset, sigsetsize)
Initiates IO against the rings mapped to this fd, or waits for
them to complete, or both. The behavior is controlled by the
parameters passed in. If 'to_submit' is non-zero, then we'll
try and submit new IO. If IORING_ENTER_GETEVENTS is set, the
kernel will wait for 'min_complete' events, if they aren't
already available. It's valid to set IORING_ENTER_GETEVENTS
and 'min_complete' == 0 at the same time, this allows the
kernel to return already completed events without waiting
for them. This is useful only for polling, as for IRQ
driven IO, the application can just check the CQ ring
without entering the kernel.
With this setup, it's possible to do async IO with a single system
call. Future developments will enable polled IO with this interface,
and polled submission as well. The latter will enable an application
to do IO without doing ANY system calls at all.
For IRQ driven IO, an application only needs to enter the kernel for
completions if it wants to wait for them to occur.
Each io_uring is backed by a workqueue, to support buffered async IO
as well. We will only punt to an async context if the command would
need to wait for IO on the device side. Any data that can be accessed
directly in the page cache is done inline. This avoids the slowness
issue of usual threadpools, since cached data is accessed as quickly
as a sync interface.
Sample application: http://git.kernel.dk/cgit/fio/plain/t/io_uring.c
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull block layer updates from Jens Axboe:
"Not a huge amount of changes in this round, the biggest one is that we
finally have Mings multi-page bvec support merged. Apart from that,
this pull request contains:
- Small series that avoids quiescing the queue for sysfs changes that
match what we currently have (Aleksei)
- Series of bcache fixes (via Coly)
- Series of lightnvm fixes (via Mathias)
- NVMe pull request from Christoph. Nothing major, just SPDX/license
cleanups, RR mp policy (Hannes), and little fixes (Bart,
Chaitanya).
- BFQ series (Paolo)
- Save blk-mq cpu -> hw queue mapping, removing a pointer indirection
for the fast path (Jianchao)
- fops->iopoll() added for async IO polling, this is a feature that
the upcoming io_uring interface will use (Christoph, me)
- Partition scan loop fixes (Dongli)
- mtip32xx conversion from managed resource API (Christoph)
- cdrom registration race fix (Guenter)
- MD pull from Song, two minor fixes.
- Various documentation fixes (Marcos)
- Multi-page bvec feature. This brings a lot of nice improvements
with it, like more efficient splitting, larger IOs can be supported
without growing the bvec table size, and so on. (Ming)
- Various little fixes to core and drivers"
* tag 'for-5.1/block-20190302' of git://git.kernel.dk/linux-block: (117 commits)
block: fix updating bio's front segment size
block: Replace function name in string with __func__
nbd: propagate genlmsg_reply return code
floppy: remove set but not used variable 'q'
null_blk: fix checking for REQ_FUA
block: fix NULL pointer dereference in register_disk
fs: fix guard_bio_eod to check for real EOD errors
blk-mq: use HCTX_TYPE_DEFAULT but not 0 to index blk_mq_tag_set->map
block: optimize bvec iteration in bvec_iter_advance
block: introduce mp_bvec_for_each_page() for iterating over page
block: optimize blk_bio_segment_split for single-page bvec
block: optimize __blk_segment_map_sg() for single-page bvec
block: introduce bvec_nth_page()
iomap: wire up the iopoll method
block: add bio_set_polled() helper
block: wire up block device iopoll method
fs: add an iopoll method to struct file_operations
loop: set GENHD_FL_NO_PART_SCAN after blkdev_reread_part()
loop: do not print warn message if partition scan is successful
block: bounce: make sure that bvec table is updated
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When the current bvec can be merged to the 1st segment, the bio's front
segment size has to be updated.
However, dcebd755926b doesn't consider that case, then bio's front
segment size may not be correct.
This patch fixes this issue.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Omar Sandoval <osandov@fb.com>
Fixes: dcebd755926b ("block: use bio_for_each_bvec() to compute multi-page bvec count")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Replace hard coded function name register_blkdev with __func__, to
improve robustness and to conform to the Linux kernel coding
style. Issue found using checkpatch.
Signed-off-by: Keyur Patel <iamkeyur96@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|