| Commit message (Collapse) | Author | Files | Lines |
|
We encountered workloads that have very long wake up list on large
systems. A waker takes a long time to traverse the entire wake list and
execute all the wake functions.
We saw page wait list that are up to 3700+ entries long in tests of
large 4 and 8 socket systems. It took 0.8 sec to traverse such list
during wake up. Any other CPU that contends for the list spin lock will
spin for a long time. It is a result of the numa balancing migration of
hot pages that are shared by many threads.
Multiple CPUs waking are queued up behind the lock, and the last one
queued has to wait until all CPUs did all the wakeups.
The page wait list is traversed with interrupt disabled, which caused
various problems. This was the original cause that triggered the NMI
watch dog timer in: https://patchwork.kernel.org/patch/9800303/ . Only
extending the NMI watch dog timer there helped.
This patch bookmarks the waker's scan position in wake list and break
the wake up walk, to allow access to the list before the waker resume
its walk down the rest of the wait list. It lowers the interrupt and
rescheduling latency.
This patch also provides a performance boost when combined with the next
patch to break up page wakeup list walk. We saw 22% improvement in the
will-it-scale file pread2 test on a Xeon Phi system running 256 threads.
[ v2: Merged in Linus' changes to remove the bookmark_wake_function, and
simply access to flags. ]
Reported-by: Kan Liang <kan.liang@intel.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
... and __initconst if applicable.
Based on similar work for an older kernel in the Grsecurity patch.
[JD: fix toshiba-wmi build]
[JD: add htcpen]
[JD: move __initconst where checkscript wants it]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jean Delvare <jdelvare@suse.de>
|
|
The page_owner stacktrace always begin as follows:
[<ffffff987bfd48f4>] save_stack+0x40/0xc8
[<ffffff987bfd4da8>] __set_page_owner+0x3c/0x6c
These two entries do not provide any useful information and limits the
available stacktrace depth. The page_owner stacktrace was skipping
caller function from stack entries but this was missed with commit
f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Example page_owner entry after the patch:
Page allocated via order 0, mask 0x8(ffffff80085fb714)
PFN 654411 type Movable Block 639 type CMA Flags 0x0(ffffffbe5c7f12c0)
[<ffffff9b64989c14>] post_alloc_hook+0x70/0x80
...
[<ffffff9b651216e8>] msm_comm_try_state+0x5f8/0x14f4
[<ffffff9b6512486c>] msm_vidc_open+0x5e4/0x7d0
[<ffffff9b65113674>] msm_v4l2_open+0xa8/0x224
Link: http://lkml.kernel.org/r/1504078343-28754-2-git-send-email-guptap@codeaurora.org
Fixes: f2ca0b557107 ("mm/page_owner: use stackdepot to store stacktrace")
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The stacktraces always begin as follows:
[<c00117b4>] save_stack_trace_tsk+0x0/0x98
[<c0011870>] save_stack_trace+0x24/0x28
...
This is because the stack trace code includes the stack frames for
itself. This is incorrect behaviour, and also leads to "skip" doing the
wrong thing (which is the number of stack frames to avoid recording.)
Perversely, it does the right thing when passed a non-current thread.
Fix this by ensuring that we have a known constant number of frames
above the main stack trace function, and always skip these.
This was fixed for arch arm by commit 3683f44c42e9 ("ARM: stacktrace:
avoid listing stacktrace functions in stacktrace")
Link: http://lkml.kernel.org/r/1504078343-28754-1-git-send-email-guptap@codeaurora.org
Signed-off-by: Prakash Gupta <guptap@codeaurora.org>
Cc: Russell King <rmk+kernel@arm.linux.org.uk>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
GFP_TEMPORARY was introduced by commit e12ba74d8ff3 ("Group short-lived
and reclaimable kernel allocations") along with __GFP_RECLAIMABLE. It's
primary motivation was to allow users to tell that an allocation is
short lived and so the allocator can try to place such allocations close
together and prevent long term fragmentation. As much as this sounds
like a reasonable semantic it becomes much less clear when to use the
highlevel GFP_TEMPORARY allocation flag. How long is temporary? Can the
context holding that memory sleep? Can it take locks? It seems there is
no good answer for those questions.
The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
__GFP_RECLAIMABLE which in itself is tricky because basically none of
the existing caller provide a way to reclaim the allocated memory. So
this is rather misleading and hard to evaluate for any benefits.
I have checked some random users and none of them has added the flag
with a specific justification. I suspect most of them just copied from
other existing users and others just thought it might be a good idea to
use without any measuring. This suggests that GFP_TEMPORARY just
motivates for cargo cult usage without any reasoning.
I believe that our gfp flags are quite complex already and especially
those with highlevel semantic should be clearly defined to prevent from
confusion and abuse. Therefore I propose dropping GFP_TEMPORARY and
replace all existing users to simply use GFP_KERNEL. Please note that
SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
so they will be placed properly for memory fragmentation prevention.
I can see reasons we might want some gfp flag to reflect shorterm
allocations but I propose starting from a clear semantic definition and
only then add users with proper justification.
This was been brought up before LSF this year by Matthew [1] and it
turned out that GFP_TEMPORARY really doesn't have a clear semantic. It
seems to be a heuristic without any measured advantage for most (if not
all) its current users. The follow up discussion has revealed that
opinions on what might be temporary allocation differ a lot between
developers. So rather than trying to tweak existing users into a
semantic which they haven't expected I propose to simply remove the flag
and start from scratch if we really need a semantic for short term
allocations.
[1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
[akpm@linux-foundation.org: fix typo]
[akpm@linux-foundation.org: coding-style fixes]
[sfr@canb.auug.org.au: drm/i915: fix up]
Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.org
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Neil Brown <neilb@suse.de>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
gcc-7 points out that a negative port_num value would overflow the
string buffer:
drivers/infiniband/hw/mlx4/sysfs.c: In function 'mlx4_ib_device_register_sysfs':
drivers/infiniband/hw/mlx4/sysfs.c:251:16: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:251:2: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
drivers/infiniband/hw/mlx4/sysfs.c:303:17: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
drivers/infiniband/hw/mlx4/sysfs.c:303:3: note: 'sprintf' output between 2 and 11 bytes into a destination of size 10
While we should be able to assume that port_num is positive here, making
the buffer one byte longer has no downsides and avoids the warning.
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Link: http://lkml.kernel.org/r/20170714120720.906842-23-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
gcc points out a minor bug in the handling of unknown cookie types,
which could result in a string overflow when the integer is copied into
a 3-byte string:
fs/fscache/object-list.c: In function 'fscache_objlist_show':
fs/fscache/object-list.c:265:19: error: 'sprintf' may write a terminating nul past the end of the destination [-Werror=format-overflow=]
sprintf(_type, "%02u", cookie->def->type);
^~~~~~
fs/fscache/object-list.c:265:4: note: 'sprintf' output between 3 and 4 bytes into a destination of size 3
This is currently harmless as no code sets a type other than 0 or 1, but
it makes sense to use snprintf() here to avoid overflowing the array if
that changes.
Link: http://lkml.kernel.org/r/20170714120720.906842-22-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
With gcc 4.1.2:
lib/test_bitmap.c:189: warning: integer constant is too large for `long' type
lib/test_bitmap.c:190: warning: integer constant is too large for `long' type
lib/test_bitmap.c:194: warning: integer constant is too large for `long' type
lib/test_bitmap.c:195: warning: integer constant is too large for `long' type
Add the missing "ULL" suffix to fix this.
Link: http://lkml.kernel.org/r/1505040523-31230-1-git-send-email-geert@linux-m68k.org
Fixes: 60ef690018b262dd ("bitmap: introduce BITMAP_FROM_U64()")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Acked-by: Yury Norov <ynorov@caviumnetworks.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
In NOMMU configurations, we get a warning about a variable that has become
unused:
fs/proc/task_nommu.c: In function 'nommu_vma_show':
fs/proc/task_nommu.c:148:28: error: unused variable 'priv' [-Werror=unused-variable]
Link: http://lkml.kernel.org/r/20170911200231.3171415-1-arnd@arndb.de
Fixes: 1240ea0dc3bb ("fs, proc: remove priv argument from is_stack")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
gcc-4.4.4 has issues with initialization of anonymous unions:
drivers/media/cec/cec-adap.c: In function 'cec_queue_msg_fh':
drivers/media/cec/cec-adap.c:184: error: unknown field 'lost_msgs' specified in initializer
work around this.
Fixes: 6b2bbb08747a5 ("media: cec: rework the cec event handling")
Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: Hans Verkuil <hans.verkuil@cisco.com>
Cc: Maxime Ripard <maxime.ripard@free-electrons.com>
Cc: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
IDR only supports non-negative IDs. There used to be a 'WARN_ON_ONCE(id <
0)' in idr_replace(), but it was intentionally removed by commit
2e1c9b286765 ("idr: remove WARN_ON_ONCE() on negative IDs").
Then it was added back by commit 0a835c4f090a ("Reimplement IDR and IDA
using the radix tree"). However it seems that adding it back was a
mistake, given that some users such as drm_gem_handle_delete()
(DRM_IOCTL_GEM_CLOSE) pass in a value from userspace to idr_replace(),
allowing the WARN_ON_ONCE to be triggered. drm_gem_handle_delete()
actually just wants idr_replace() to return an error code if the ID is
not allocated, including in the case where the ID is invalid (negative).
So once again remove the bogus WARN_ON_ONCE().
This bug was found by syzkaller, which encountered the following
warning:
WARNING: CPU: 3 PID: 3008 at lib/idr.c:157 idr_replace+0x1d8/0x240 lib/idr.c:157
Kernel panic - not syncing: panic_on_warn set ...
CPU: 3 PID: 3008 Comm: syzkaller218828 Not tainted 4.13.0-rc4-next-20170811 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Call Trace:
fixup_bug+0x40/0x90 arch/x86/kernel/traps.c:190
do_trap_no_signal arch/x86/kernel/traps.c:224 [inline]
do_trap+0x260/0x390 arch/x86/kernel/traps.c:273
do_error_trap+0x120/0x390 arch/x86/kernel/traps.c:310
do_invalid_op+0x1b/0x20 arch/x86/kernel/traps.c:323
invalid_op+0x1e/0x30 arch/x86/entry/entry_64.S:930
RIP: 0010:idr_replace+0x1d8/0x240 lib/idr.c:157
RSP: 0018:ffff8800394bf9f8 EFLAGS: 00010297
RAX: ffff88003c6c60c0 RBX: 1ffff10007297f43 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800394bfa78
RBP: ffff8800394bfae0 R08: ffffffff82856487 R09: 0000000000000000
R10: ffff8800394bf9a8 R11: ffff88006c8bae28 R12: ffffffffffffffff
R13: ffff8800394bfab8 R14: dffffc0000000000 R15: ffff8800394bfbc8
drm_gem_handle_delete+0x33/0xa0 drivers/gpu/drm/drm_gem.c:297
drm_gem_close_ioctl+0xa1/0xe0 drivers/gpu/drm/drm_gem.c:671
drm_ioctl_kernel+0x1e7/0x2e0 drivers/gpu/drm/drm_ioctl.c:729
drm_ioctl+0x72e/0xa50 drivers/gpu/drm/drm_ioctl.c:825
vfs_ioctl fs/ioctl.c:45 [inline]
do_vfs_ioctl+0x1b1/0x1520 fs/ioctl.c:685
SYSC_ioctl fs/ioctl.c:700 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:691
entry_SYSCALL_64_fastpath+0x1f/0xbe
Here is a C reproducer:
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/drm.h>
int main(void)
{
int cardfd = open("/dev/dri/card0", O_RDONLY);
ioctl(cardfd, DRM_IOCTL_GEM_CLOSE,
&(struct drm_gem_close) { .handle = -1 } );
}
Link: http://lkml.kernel.org/r/20170906235306.20534-1-ebiggers3@gmail.com
Fixes: 0a835c4f090a ("Reimplement IDR and IDA using the radix tree")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Matthew Wilcox <mawilcox@microsoft.com>
Cc: <stable@vger.kernel.org> [v4.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Another merge window, another MAINTAINERS file disaster.
People have serious problems with the alphabet and sorting, and poor
Jérôme Glisse and Radim Krčmář get their names mangled by locale issues,
turning them into some mangled mess (probably others do too, but those
two stood out when sorting things again).
And we now have two copies of the same 'AS3645A LED FLASH CONTROLLER
DRIVER' in the tree and in the MAINTAINERS file, but that's a separate
issue - the duplication is real, and I left them as two entries for the
same name.
This does not try to sort the actual section pattern entries, although I
may end up doing that later.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commits:
7dcf90e9e032 ("PCI: hv: Use vPCI protocol version 1.2")
628f54cc6451 ("x86/hyper-v: Support extended CPU ranges for TLB flush hypercalls")
added the same definition and they came in through different trees.
Fix the duplication.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: devel@linuxdriverproject.org
Link: http://lkml.kernel.org/r/20170911150620.3998-1-vkuznets@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Allocate the hypervisor callback IDT entry early in the boot sequence.
The previous code would allocate the entry as part of registering the handler
when the vmbus driver loaded, and this caused a problem for the IDT cleanup
that Thomas is working on for v4.15.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: apw@canonical.com
Cc: devel@linuxdriverproject.org
Cc: gregkh@linuxfoundation.org
Cc: jasowang@redhat.com
Cc: olaf@aepfle.de
Link: http://lkml.kernel.org/r/20170908231557.2419-1-kys@exchange.microsoft.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Jeremy Fitzhardinge is stepping down as a paravirt maintainer. I'll
replace him.
While at it, update the file list to the actual pattern.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Link: http://lkml.kernel.org/r/20170905143407.9227-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
With removal of lguest some of the paravirt functions are no longer
needed:
->read_cr4()
->store_idt()
->set_pmd_at()
->set_pud_at()
->pte_update()
Remove them.
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: akataria@vmware.com
Cc: boris.ostrovsky@oracle.com
Cc: chrisw@sous-sol.org
Cc: jeremy@goop.org
Cc: rusty@rustcorp.com.au
Cc: virtualization@lists.linux-foundation.org
Cc: xen-devel@lists.xenproject.org
Link: http://lkml.kernel.org/r/20170904102527.25409-1-jgross@suse.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
cpu_init() is weird: it's called rather late (after early
identification and after most MMU state is initialized) on the boot
CPU but is called extremely early (before identification) on secondary
CPUs. It's called just late enough on the boot CPU that its CR4 value
isn't propagated to mmu_cr4_features.
Even if we put CR4.PCIDE into mmu_cr4_features, we'd hit two
problems. First, we'd crash in the trampoline code. That's
fixable, and I tried that. It turns out that mmu_cr4_features is
totally ignored by secondary_start_64(), though, so even with the
trampoline code fixed, it wouldn't help.
This means that we don't currently have CR4.PCIDE reliably initialized
before we start playing with cpu_tlbstate. This is very fragile and
tends to cause boot failures if I make even small changes to the TLB
handling code.
Make it more robust: initialize CR4.PCIDE earlier on the boot CPU
and propagate it to secondary CPUs in start_secondary().
( Yes, this is ugly. I think we should have improved mmu_cr4_features
to actually control CR4 during secondary bootup, but that would be
fairly intrusive at this stage. )
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Reported-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Tested-by: Sai Praneeth Prakhya <sai.praneeth.prakhya@intel.com>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Jiri reported a resume-from-hibernation failure triggered by PCID.
The root cause appears to be rather odd. The hibernation asm
restores a CR3 value that comes from the image header. If the image
kernel has PCID on, it's entirely reasonable for this CR3 value to
have one of the low 12 bits set. The restore code restores it with
CR4.PCIDE=0, which means that those low 12 bits are accepted by the
CPU but are either ignored or interpreted as a caching mode. This
is odd, but still works. We blow up later when the image kernel
restores CR4, though, since changing CR4.PCIDE with CR3[11:0] != 0
is illegal. Boom!
FWIW, it's entirely unclear to me what's supposed to happen if a PAE
kernel restores a non-PAE image or vice versa. Ditto for LA57.
Reported-by: Jiri Kosina <jikos@kernel.org>
Tested-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 660da7c9228f ("x86/mm: Enable CR4.PCIDE on supported systems")
Link: http://lkml.kernel.org/r/18ca57090651a6341e97083883f9e814c4f14684.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If we hit the VM_BUG_ON(), we're detecting a genuinely bad situation,
but we're very unlikely to get a useful call trace.
Make it a warning instead.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bpetkov@suse.de>
Cc: Jiri Kosina <jikos@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/3b4e06bbb382ca54a93218407c93925ff5871546.1504847163.git.luto@kernel.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
If using a kernel with CONFIG_XFS_RT=y and we set the RHINHERIT flag on
a directory in a filesystem that does not have a realtime device and
create a new file in that directory, it gets marked as a real time file.
When data is written and a fsync is issued, the filesystem attempts to
flush a non-existent rt device during the fsync process.
This results in a crash dereferencing a null buftarg pointer in
xfs_blkdev_issue_flush():
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
IP: xfs_blkdev_issue_flush+0xd/0x20
.....
Call Trace:
xfs_file_fsync+0x188/0x1c0
vfs_fsync_range+0x3b/0xa0
do_fsync+0x3d/0x70
SyS_fsync+0x10/0x20
do_syscall_64+0x4d/0xb0
entry_SYSCALL64_slow_path+0x25/0x25
Setting RT inode flags does not require special privileges so any
unprivileged user can cause this oops to occur. To reproduce, confirm
kernel is compiled with CONFIG_XFS_RT=y and run:
# mkfs.xfs -f /dev/pmem0
# mount /dev/pmem0 /mnt/test
# mkdir /mnt/test/foo
# xfs_io -c 'chattr +t' /mnt/test/foo
# xfs_io -f -c 'pwrite 0 5m' -c fsync /mnt/test/foo/bar
Or just run xfstests with MKFS_OPTIONS="-d rtinherit=1" and wait.
Kernels built with CONFIG_XFS_RT=n are not exposed to this bug.
Fixes: f538d4da8d52 ("[XFS] write barrier support")
Cc: <stable@vger.kernel.org>
Signed-off-by: Richard Wareing <rwareing@fb.com>
Signed-off-by: Dave Chinner <david@fromorbit.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Once we encounter I/O interruption during issuing discards, we will delay
long time before next round, but if system status is I/O idle during the
time, it may loses opportunity to issue discards. So this patch changes
to hurry up to issue discard after io interruption.
Besides, this patch also fixes to issue discards accurately with assigned
rate.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Fix below incorrect display when reading discard_granularity sysfs node.
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
$ echo 32 > /sys/fs/f2fs/<device>/discard_granularity
$ cat /sys/fs/f2fs/<device>/discard_granularity
$ 16
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
Add a bugon in f2fs_evict_inode to detect inconsistent status between
inode cache and related node page cache.
Signed-off-by: Chao Yu <yuchao0@huawei.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
When packaging the perf userland application into an AppImage, the
wait() call in perf stat returned too early. It turned out that some
other child process exited, but not the one perf stat launched:
$ sudo strace -e fork,execve,clone,wait4 -f ./perf-x86_64.AppImage stat sleep 1
execve("./perf-git.3a73b7f9-x86_64.AppImage", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x7ffec1bbf050 /* 18 vars */) = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3912
strace: Process 3912 attached
[pid 3912] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f6a6e7efe50) = 3914
strace: Process 3914 attached
[pid 3912] +++ exited with 0 +++
[pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=3912, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
[pid 3914] clone(strace: Process 3915 attached
child_stack=0x7f6a6d9fefb0, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|CLONE_CHILD_CLEARTID, parent_tidptr=0x7f6a6d9ff9d0, tls=0x7f6a6d9ff700, child_tidptr=0x7f6a6d9ff9d0) = 3915
[pid 3911] execve("/tmp/.mount_perf-g6VYMpl/AppRun", ["./perf-git.3a73b7f9-x86_64.AppIm"..., "stat", "sleep", "1"], 0x14aab70 /* 21 vars */) = 0
[pid 3911] clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7f4ae113c4d0) = 3916
strace: Process 3916 attached
[pid 3911] wait4(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 3912
[pid 3916] execve("/usr/libexec/perf-core/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/tmp/./sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/.bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/lib/icecream/libexec/icecc/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/ssd2/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/.bin/kf5/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/ssd2/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/projects/compiled/other/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/home/milian/projects/compiled/kf5/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/local/sbin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/local/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */) = -1 ENOENT (No such file or directory)
[pid 3916] execve("/usr/bin/sleep", ["sleep", "1"], 0x27d3650 /* 22 vars */
Performance counter stats for 'sleep 1':
<not counted> task-clock
<not counted> context-switches
<not counted> cpu-migrations
<not counted> page-faults
<not counted> cycles
<not counted> instructions
<not counted> branches
<not counted> branch-misses
0.000047194 seconds time elapsed
[pid 3916] --- SIGTERM {si_signo=SIGTERM, si_code=SI_USER, si_pid=3911, si_uid=0} ---
[pid 3916] +++ killed by SIGTERM +++
[pid 3911] --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=3916, si_uid=0, si_status=SIGTERM, si_utime=0, si_stime=0} ---
[pid 3915] --- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3914, si_uid=0} ---
[pid 3911] +++ exited with 0 +++
[pid 3915] --- SIGHUP {si_signo=SIGHUP, si_code=SI_USER, si_pid=3914, si_uid=0} ---
[pid 3915] +++ exited with 0 +++
+++ exited with 0 +++
This patch uses waitpid instead to ensure the call waits for the
debuggee application launched by 'perf stat'. This fixes 'perf stat'
when launched from an AppImage:
$ ./perf-x86_64.AppImage stat sleep 1
Performance counter stats for 'sleep 1':
0.357235 task-clock (msec) # 0.000 CPUs utilized
1 context-switches # 0.003 M/sec
0 cpu-migrations # 0.000 K/sec
50 page-faults # 0.140 M/sec
1269602 cycles # 3.554 GHz
654278 instructions # 0.52 insn per cycle
129963 branches # 363.803 M/sec
7082 branch-misses # 5.45% of all branches
1.000633420 seconds time elapsed
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170912152523.4497-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Previously the part behind "perf-" was interpreted as an internal perf
command. If the suffix could not be handled, the execution was stopped.
This makes it impossible to launch perf binaries that got renamed to
have the `perf-` prefix. This is e.g. the case for appimages (e.g.
"perf-x86_64.AppImage"), but would also apply to all other scenarios
where users symlink or rename perf themselves:
Status quo with the broken behavior:
$ ln -s ./perf ./perf-custom-suffix
$ ./perf-custom-suffix list
cannot handle custom-suffix internally$
Also note the missing newline at the end of the error message.
With this patch applied, the above works properly:
$ ./perf-custom-suffix list
List of pre-defined events (to be used in -e):
...
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Acked-by: David Ahern <dsahern@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170911111422.31903-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
I'm forever late for editing my kernel cmdline, add a runtime knob to
disable the "sched_debug" thing.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.142924283@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Migrating tasks to offline CPUs is a pretty big fail, warn about it.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20170907150614.094206976@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
The load balancer applies cpu_active_mask to whatever sched_domains it
finds, however in the case of active_balance there is a hole between
setting rq->{active_balance,push_cpu} and running the stop_machine
work doing the actual migration.
The @push_cpu can go offline in this window, which would result in us
moving a task onto a dead cpu, which is a fairly bad thing.
Double check the active mask before the stop work does the migration.
CPU0 CPU1
<SoftIRQ>
stop_machine(takedown_cpu)
load_balance() cpu_stopper_thread()
... work = multi_cpu_stop
stop_one_cpu_nowait( /* wait for CPU0 */
.func = active_load_balance_cpu_stop
);
</SoftIRQ>
cpu_stopper_thread()
work = multi_cpu_stop
/* sync with CPU1 */
take_cpu_down()
<idle>
play_dead();
work = active_load_balance_cpu_stop
set_task_cpu(p, CPU1); /* oops!! */
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170907150614.044460912@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
On CPU hot unplug, when parking the last kthread we'll try and
schedule into idle to kill the CPU. This last schedule can (and does)
trigger newidle balance because at this point the sched domains are
still up because of commit:
77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Obviously pulling tasks to an already offline CPU is a bad idea, and
all balancing operations _should_ be subject to cpu_active_mask, make
it so.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: 77d1dfda0e79 ("sched/topology, cpuset: Avoid spurious/wrong domain rebuilds")
Link: http://lkml.kernel.org/r/20170907150613.994135806@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Currently section->from_system_config is being checked multiple times.
item->from_system_config should be checked instead, when iterating thru
the items in a section. Fix it.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1504754325-9724-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
We currently update the 'next' variable only with a single step value.
But it's possible the 'adv' update is bigger than single 'step' value.
This would leave 'next' value under counted and force unnecessary
ui_progress__ops->update calls.
Calculate the amount of steps we need for 'adv' update and increase the
'next' with that amounts of steps.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Unlikely, but we could have ui_progress__init being called with total <
16, which would set the next and step variables to 0. That would force
unnecessary ui_progress__ops->update calls because 'next' would never
raise.
Forcing the next and step values to be always > 0.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Do not carry the perf.data file descriptor into the workload process and
close it when perf executes the workload.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-2-jolsa@kernel.org
[ Add definitions for O_CLOEXEC for older systems ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Do not use -D_FORTIFY_SOURCE=2 for DEBUG build as it seems to mess up
with debuginfo, which results in bad gdb experience.
We already do that for tools/perf/.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When cross compiling perf and I want to link against a self-compiled
libunwind, I usually make the custom path where the libunwind headers
exist visible by adding the libunwind prefix to the include path when
compiling perf, i.e.:
~~~~~
$ ls $HOME/projects/compiled/other/include/
libunwind-coredump.h libunwind.h libunwind-x86_64.h
libunwind-common.h libunwind-dynamic.h libunwind-ptrace.h
unwind.h
$ make EXTRA_CFLAGS="-I$HOME/projects/compiled/other/include/
~~~~~~
Note the `unwind.h` header from libunwind which leads to compile
errors when compiling tests/dwarf-unwind.c, since it shadows perf's
util/unwind.h:
~~~~~
tests/dwarf-unwind.c:41:32: error: ‘struct unwind_entry’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
static int unwind_entry(struct unwind_entry *entry, void *arg)
^~~~~~~~~~~~
tests/dwarf-unwind.c: In function ‘unwind_entry’:
tests/dwarf-unwind.c:44:22: error: dereferencing pointer to incomplete type ‘struct unwind_entry’
char *symbol = entry->sym ? entry->sym->name : NULL;
^~
tests/dwarf-unwind.c: In function ‘unwind_thread’:
tests/dwarf-unwind.c:92:8: error: implicit declaration of function ‘unwind__get_entries’; did you mean ‘unwind_entry’? [-Werror=implicit-function-declaration]
err = unwind__get_entries(unwind_entry, &cnt, thread,
^~~~~~~~~~~~~~~~~~~
unwind_entry
tests/dwarf-unwind.c:92:8: error: nested extern declaration of ‘unwind__get_entries’ [-Werror=nested-externs]
~~~~~~
Fix this compile error by specificing an explicit include of perf's
unwind.h in the util folder.
Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170906150209.12579-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
When cross building to android r15c (and older versions) on Fedora 26
we notice these:
/opt/android-ndk-r15c/platforms/android-24/arch-arm/usr/include/sys/cdefs.h:332:0: note: this is the location of the previous definition
For __aligned, __packed and __noreturn, so guard those with ifdefs to
avoid drowning useful warnings in these.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-d7w3fa9c22dtmrwbedos6ie1@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
|
|
Commit b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up origin")
verifies that the origin lower inode stored in the overlayfs inode matched
the inode of a copy up origin dentry found by lookup.
There is a false positive result in that check when lower fs does not
support file handles and copy up origin cannot be followed by file handle
at lookup time.
The false negative happens when finding an overlay inode in cache on a
copied up overlay dentry lookup. The overlay inode still 'remembers' the
copy up origin inode, but the copy up origin dentry is not available for
verification.
Relax the check in case copy up origin dentry is not available.
Fixes: b9ac5c274b8c ("ovl: hash overlay non-dir inodes by copy up...")
Cc: <stable@vger.kernel.org> # v4.13
Reported-by: Jordi Pujol <jordipujolp@gmail.com>
Signed-off-by: Amir Goldstein <amir73il@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
The previous commit spotted that "Tarball successfully created ..."
is displayed even if the "tar" command returns error code because
it is followed by "| ${compress}".
Let the build fail instead of printing the successful message since
if the "tar" command fails, the output may not be what users expect.
Avoid the use of the pipe. While we are here, refactor the script
removing the use of sub-shell, ${compress}, ${file_ext}.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
$tmpdir/lib is created by "make modules_install". It does not exist
if CONFIG_MODULES is disabled, then tar reports the following messages:
tar: lib: Cannot stat: No such file or directory
tar: Exiting with failure status due to previous errors
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
|
|
The refreshed argument isn't used by any caller, get rid of it.
Use a helper for just updating the inode (no need to fill in a kstat).
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
If the IOCB_DSYNC flag is set a sync is not being performed by
fuse_file_write_iter.
Honor IOCB_DSYNC/IOCB_SYNC by setting O_DYSNC/O_SYNC respectively in the
flags filed of the write request.
We don't need to sync data or metadata, since fuse_perform_write() does
write-through and the filesystem is responsible for updating file times.
Original patch by Vitaly Zolotusky.
Reported-by: Nate Clark <nate@neworld.us>
Cc: Vitaly Zolotusky <vitaly@unitc.com>.
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|
Commit 0b6e9ea041e6 ("fuse: Add support for pid namespaces") broke
Sandstorm.io development tools, which have been sending FUSE file
descriptors across PID namespace boundaries since early 2014.
The above patch added a check that prevented I/O on the fuse device file
descriptor if the pid namespace of the reader/writer was different from the
pid namespace of the mounter. With this change passing the device file
descriptor to a different pid namespace simply doesn't work. The check was
added because pids are transferred to/from the fuse userspace server in the
namespace registered at mount time.
To fix this regression, remove the checks and do the following:
1) the pid in the request header (the pid of the task that initiated the
filesystem operation) is translated to the reader's pid namespace. If a
mapping doesn't exist for this pid, then a zero pid is used. Note: even if
a mapping would exist between the initiator task's pid namespace and the
reader's pid namespace the pid will be zero if either mapping from
initator's to mounter's namespace or mapping from mounter's to reader's
namespace doesn't exist.
2) The lk.pid value in setlk/setlkw requests and getlk reply is left alone.
Userspace should not interpret this value anyway. Also allow the
setlk/setlkw operations if the pid of the task cannot be represented in the
mounter's namespace (pid being zero in that case).
Reported-by: Kenton Varda <kenton@sandstorm.io>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Fixes: 0b6e9ea041e6 ("fuse: Add support for pid namespaces")
Cc: <stable@vger.kernel.org> # v4.12+
Cc: Eric W. Biederman <ebiederm@xmission.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
|
|
The touchpad in the Asus laptop models X505BA/BP and X542BA/BP is
unresponsive after suspend/resume. The following error appears during
resume:
i2c_hid i2c-ELAN1300:00: failed to reset device.
The problem here is that i2c_hid does not notice the interrupt being
generated at this point, because the GPIO is no longer configured
for interrupts.
Fix this by saving pinctrl-amd pin registers during suspend and
restoring them at resume time.
Based on code from pinctrl-intel.
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Drake <drake@endlessm.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
ALSA sequencer core has a mechanism to load the enumerated devices
automatically, and it's performed in an off-load work. This seems
causing some race when a sequencer is removed while the pending
autoload work is running. As syzkaller spotted, it may lead to some
use-after-free:
BUG: KASAN: use-after-free in snd_rawmidi_dev_seq_free+0x69/0x70
sound/core/rawmidi.c:1617
Write of size 8 at addr ffff88006c611d90 by task kworker/2:1/567
CPU: 2 PID: 567 Comm: kworker/2:1 Not tainted 4.13.0+ #29
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: events autoload_drivers
Call Trace:
__dump_stack lib/dump_stack.c:16 [inline]
dump_stack+0x192/0x22c lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351 [inline]
kasan_report+0x230/0x340 mm/kasan/report.c:409
__asan_report_store8_noabort+0x1c/0x20 mm/kasan/report.c:435
snd_rawmidi_dev_seq_free+0x69/0x70 sound/core/rawmidi.c:1617
snd_seq_dev_release+0x4f/0x70 sound/core/seq_device.c:192
device_release+0x13f/0x210 drivers/base/core.c:814
kobject_cleanup lib/kobject.c:648 [inline]
kobject_release lib/kobject.c:677 [inline]
kref_put include/linux/kref.h:70 [inline]
kobject_put+0x145/0x240 lib/kobject.c:694
put_device+0x25/0x30 drivers/base/core.c:1799
klist_devices_put+0x36/0x40 drivers/base/bus.c:827
klist_next+0x264/0x4a0 lib/klist.c:403
next_device drivers/base/bus.c:270 [inline]
bus_for_each_dev+0x17e/0x210 drivers/base/bus.c:312
autoload_drivers+0x3b/0x50 sound/core/seq_device.c:117
process_one_work+0x9fb/0x1570 kernel/workqueue.c:2097
worker_thread+0x1e4/0x1350 kernel/workqueue.c:2231
kthread+0x324/0x3f0 kernel/kthread.c:231
ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:425
The fix is simply to assure canceling the autoload work at removing
the device.
Reported-by: Andrey Konovalov <andreyknvl@google.com>
Tested-by: Andrey Konovalov <andreyknvl@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|
|
Since commit dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs
dynamically"), the irqs for gpio are not statically allocated during in
gpiochip_irqchip_add.
This driver was based on this assumption for initializing the mask
associated to each interrupt this led to a NULL pointer crash in the
kernel:
Unable to handle kernel NULL pointer dereference at virtual address 00000000
Mem abort info:
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000068
CM = 0, WnR = 1
[0000000000000000] user address but active_mm is swapper
Internal error: Oops: 96000044 [#1] PREEMPT SMP
Modules linked in:
CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.13.0-06657-g3b9f8ed25dbe #576
Hardware name: Marvell Armada 3720 Development Board DB-88F3720-DDR3 (DT)
task: ffff80001d908000 task.stack: ffff000008068000
PC is at armada_37xx_pinctrl_probe+0x5f8/0x670
LR is at armada_37xx_pinctrl_probe+0x5e8/0x670
pc : [<ffff000008e25cdc>] lr : [<ffff000008e25ccc>] pstate: 60000045
sp : ffff00000806bb80
x29: ffff00000806bb80 x28: 0000000000000024
x27: 000000000000000c x26: 0000000000000001
x25: ffff80001efee760 x24: 0000000000000000
x23: ffff80001db6f570 x22: ffff80001db6f438
x21: 0000000000000000 x20: ffff80001d9f4810
x19: ffff80001db6f418 x18: 0000000000000000
x17: 0000000000000001 x16: 0000000000000019
x15: ffffffffffffffff x14: 0140000000000000
x13: 0000000000000000 x12: 0000000000000030
x11: 0101010101010101 x10: 0000000000000040
x9 : ffff000009923580 x8 : ffff80001d400248
x7 : ffff80001d400270 x6 : 0000000000000000
x5 : ffff80001d400248 x4 : ffff80001d400270
x3 : 0000000000000000 x2 : 0000000000000001
x1 : 0000000000000001 x0 : 0000000000000000
Process swapper/0 (pid: 1, stack limit = 0xffff000008068000)
Call trace:
Exception stack(0xffff00000806ba40 to 0xffff00000806bb80)
ba40: 0000000000000000 0000000000000001 0000000000000001 0000000000000000
ba60: ffff80001d400270 ffff80001d400248 0000000000000000 ffff80001d400270
ba80: ffff80001d400248 ffff000009923580 0000000000000040 0101010101010101
baa0: 0000000000000030 0000000000000000 0140000000000000 ffffffffffffffff
bac0: 0000000000000019 0000000000000001 0000000000000000 ffff80001db6f418
bae0: ffff80001d9f4810 0000000000000000 ffff80001db6f438 ffff80001db6f570
bb00: 0000000000000000 ffff80001efee760 0000000000000001 000000000000000c
bb20: 0000000000000024 ffff00000806bb80 ffff000008e25ccc ffff00000806bb80
bb40: ffff000008e25cdc 0000000060000045 ffff00000806bb60 ffff0000081189b8
bb60: ffffffffffffffff ffff00000811cf1c ffff00000806bb80 ffff000008e25cdc
[<ffff000008e25cdc>] armada_37xx_pinctrl_probe+0x5f8/0x670
[<ffff00000859d8c8>] platform_drv_probe+0x58/0xb8
[<ffff00000859bb44>] driver_probe_device+0x22c/0x2d8
[<ffff00000859bcac>] __driver_attach+0xbc/0xc0
[<ffff000008599c84>] bus_for_each_dev+0x4c/0x98
[<ffff00000859b440>] driver_attach+0x20/0x28
[<ffff00000859af90>] bus_add_driver+0x1b8/0x228
[<ffff00000859c648>] driver_register+0x60/0xf8
[<ffff00000859df64>] __platform_driver_probe+0x74/0x130
[<ffff000008e256dc>] armada_37xx_pinctrl_driver_init+0x20/0x28
[<ffff000008083980>] do_one_initcall+0x38/0x128
[<ffff000008e00cf4>] kernel_init_freeable+0x188/0x22c
[<ffff0000089b56e8>] kernel_init+0x10/0x100
[<ffff000008084bb0>] ret_from_fork+0x10/0x18
Code: f9403fa2 12001341 1100075a 9ac12041 (b9000001)
---[ end trace 8b0f4e05e1603208 ]---
This patch moves the initialization of the mask field in the irq_startup
function. However some callbacks such as irq_set_type and irq_set_wake
could be called before irq_startup. For those functions the mask is
computed at each call which is not a issue as these functions are not
located in a hot path but are used sporadically for configuration.
Fixes: dc749a09ea5e ("gpiolib: allow gpio irqchip to map irqs
dynamically")
Cc: <stable@vger.kernel.org>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
info->groups[] has info->ngroups elements so these comparisons should be
>= instead of >.
Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Baolin Wang <baolin.wang@spreadtrum.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
devm_pinctrl_get() could fail with ERR_PTR(-ENOMEM) so I have added a
check for that. I also reversed the other IS_ERR() test because it was
a little confusing to test one way and then the opposite a couple lines
later.
Fixes: 41d32cfce1ae ("pinctrl: sprd: Add Spreadtrum pin control driver")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The Spreadtrum pinctrl drivers are only useful when building for a
Spreadtrum platform.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
Fix build errors when CONFIG_OF is not enabled.
Also, the pinctrl-sprd-sc9860 driver uses functions from the pinctrl-sprd
driver, so the former should depend on the latter driver.
../drivers/pinctrl/sprd/pinctrl-sprd.c: In function 'sprd_dt_node_to_map':
../drivers/pinctrl/sprd/pinctrl-sprd.c:290:2: error: implicit declaration of function 'pinconf_generic_parse_dt_config' [-Werror=implicit-function-declaration]
ret = pinconf_generic_parse_dt_config(np, pctldev, &configs,
^
../drivers/pinctrl/sprd/pinctrl-sprd.c: At top level:
../drivers/pinctrl/sprd/pinctrl-sprd.c:844:44: error: array type has incomplete element type
static const struct pinconf_generic_params sprd_dt_params[] = {
^
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Baolin Wang <baolin.wang@spreadtrum.com>
Cc: Linus Walleij <linus.walleij@linaro.org>
Cc: linux-gpio@vger.kernel.org
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|
|
The functions sprd_pmx_get_function_count, sprd_pmx_get_function_name
and sprd_pmx_get_function_groups are local to the source and do not
need to be in global scope, so make them static.
Cleans up sparse warnings:
"symbol 'sprd_pmx_get_function_count' was not declared. Should it be
static?"
"symbol 'sprd_pmx_get_function_name' was not declared. Should it be
static?"
"symbol 'sprd_pmx_get_function_groups' was not declared. Should it be
static?"
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
|