| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Purpose of compaction is to get a high order page. Currently, if we
find high-order page while searching migration target page, we break it
to order-0 pages and use them as migration target. It is contrary to
purpose of compaction, so disallow high-order page to be used for
migration target.
Additionally, clean-up logic in suitable_migration_target() to simplify
the code. There is no functional changes from this clean-up.
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We had a report about strange OOM killer strikes on a PPC machine
although there was a lot of swap free and a tons of anonymous memory
which could be swapped out. In the end it turned out that the OOM was a
side effect of zone reclaim which wasn't unmapping and swapping out and
so the system was pushed to the OOM. Although this sounds like a bug
somewhere in the kswapd vs. zone reclaim vs. direct reclaim
interaction numactl on the said hardware suggests that the zone reclaim
should not have been set in the first place:
node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
node 0 size: 0 MB
node 0 free: 0 MB
node 2 cpus:
node 2 size: 7168 MB
node 2 free: 6019 MB
node distances:
node 0 2
0: 10 40
2: 40 10
So all the CPUs are associated with Node0 which doesn't have any memory
while Node2 contains all the available memory. Node distances cause an
automatic zone_reclaim_mode enabling.
Zone reclaim is intended to keep the allocations local but this doesn't
make any sense on the memoryless nodes. So let's exclude such nodes for
init_zone_allows_reclaim which evaluates zone reclaim behavior and
suitable reclaim_nodes.
Signed-off-by: Michal Hocko <mhocko@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Tested-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
| |
The described issue now occurs inside mmap_region(). And unfortunately
is still valid.
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We abort direct reclaim if we find the zone is ready for compaction.
Sometimes the zone is just a promoted highmem zone to force a scan of
highmem, which is not the intended zone the caller want to allocate a
page from. In this situation, setting aborted_reclaim to indicate the
caller turned back to retry the allocation is waste of time and could
cause a loop in __alloc_pages_slowpath().
This patch does not check compaction_ready() on promoted zones to avoid
the above situation. Only set aborted_reclaim if the caller intended
zone is ready for compaction.
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We promote sc->gfp_mask to __GFP_HIGHMEM to forcibly scan highmem if
there are too many buffer_heads pinning highmem. See cc715d99e5 ("mm:
vmscan: forcibly scan highmem if there are too many buffer_heads pinning
highmem").
This patch restores sc->gfp_mask to its caller original value after
finishing the scan job, to avoid the impact on other invocations from
its upper caller, such as vmpressure_prio(), shrink_slab().
Signed-off-by: Weijie Yang <weijie.yang@samsung.com>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The NUMA scanning code can end up iterating over many gigabytes of
unpopulated memory, especially in the case of a freshly started KVM
guest with lots of memory.
This results in the mmu notifier code being called even when there are
no mapped pages in a virtual address range. The amount of time wasted
can be enough to trigger soft lockup warnings with very large KVM
guests.
This patch moves the mmu notifier call to the pmd level, which
represents 1GB areas of memory on x86-64. Furthermore, the mmu notifier
code is only called from the address in the PMD where present mappings
are first encountered.
The hugetlbfs code is left alone for now; hugetlb mappings are not
relocatable, and as such are left alone by the NUMA code, and should
never trigger this problem to begin with.
Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Xing Gang <gang.xing@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Sasha reported the following bug using trinity
kernel BUG at mm/mprotect.c:149!
invalid opcode: 0000 [#1] PREEMPT SMP DEBUG_PAGEALLOC
Dumping ftrace buffer:
(ftrace buffer empty)
Modules linked in:
CPU: 20 PID: 26219 Comm: trinity-c216 Tainted: G W 3.14.0-rc5-next-20140305-sasha-00011-ge06f5f3-dirty #105
task: ffff8800b6c80000 ti: ffff880228436000 task.ti: ffff880228436000
RIP: change_protection_range+0x3b3/0x500
Call Trace:
change_protection+0x25/0x30
change_prot_numa+0x1b/0x30
task_numa_work+0x279/0x360
task_work_run+0xae/0xf0
do_notify_resume+0x8e/0xe0
retint_signal+0x4d/0x92
The VM_BUG_ON was added in -mm by the patch "mm,numa: reorganize
change_pmd_range". The race existed without the patch but was just
harder to hit.
The problem is that a transhuge check is made without holding the PTL.
It's possible at the time of the check that a parallel fault clears the
pmd and inserts a new one which then triggers the VM_BUG_ON check. This
patch removes the VM_BUG_ON but fixes the race by rechecking transhuge
under the PTL when marking page tables for NUMA hinting and bailing if a
race occurred. It is not a problem for calls to mprotect() as they hold
mmap_sem for write.
Signed-off-by: Mel Gorman <mgorman@suse.de>
Reported-by: Sasha Levin <sasha.levin@oracle.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reorganize the order of ifs in change_pmd_range a little, in preparation
for the next patch.
[akpm@linux-foundation.org: fix indenting, per David]
Signed-off-by: Rik van Riel <riel@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Reported-by: Xing Gang <gang.xing@hp.com>
Tested-by: Chegu Vinod <chegu_vinod@hp.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
huge_pte_offset() could return NULL, so we need NULL check to avoid
potential NULL pointer dereferences.
Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
- Convert spinlock/static array to va_format (inspired by Joe Perches
help on previous logging patches).
- Convert printk(KERN_ERR to pr_warn in __ntfs_warning.
- Convert printk(KERN_ERR to pr_err in __ntfs_error.
- Convert printk(KERN_DEBUG to pr_debug in __ntfs_debug. (Note that
__ntfs_debug is still guarded by #if DEBUG)
- Improve !DEBUG to parse all arguments (Joe Perches).
- Sparse pr_foo() conversions in super.c
NTFS, NTFS-fs prefixes as well as 'warning' and 'error' were removed :
pr_foo() automatically adds module name and error level is already
specified.
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Anton Altaparmakov <anton@tuxera.com>
Cc: Joe Perches <joe@perches.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client updates from Trond Myklebust:
"Highlights include:
- Stable fix for a use after free issue in the NFSv4.1 open code
- Fix the SUNRPC bi-directional RPC code to account for TCP segmentation
- Optimise usage of readdirplus when confronted with 'ls -l' situations
- Soft mount bugfixes
- NFS over RDMA bugfixes
- NFSv4 close locking fixes
- Various NFSv4.x client state management optimisations
- Rename/unlink code cleanups"
* tag 'nfs-for-3.15-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (28 commits)
nfs: pass string length to pr_notice message about readdir loops
NFSv4: Fix a use-after-free problem in open()
SUNRPC: rpc_restart_call/rpc_restart_call_prepare should clear task->tk_status
SUNRPC: Don't let rpc_delay() clobber non-timeout errors
SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
SUNRPC: Ensure call_status() deals correctly with SOFTCONN tasks
NFSv4: Ensure we respect soft mount timeouts during trunking discovery
NFSv4: Schedule recovery if nfs40_walk_client_list() is interrupted
NFS: advertise only supported callback netids
SUNRPC: remove KERN_INFO from dprintk() call sites
SUNRPC: Fix large reads on NFS/RDMA
NFS: Clean up: revert increase in READDIR RPC buffer max size
SUNRPC: Ensure that call_bind times out correctly
SUNRPC: Ensure that call_connect times out correctly
nfs: emit a fsnotify_nameremove call in sillyrename codepath
nfs: remove synchronous rename code
nfs: convert nfs_rename to use async_rename infrastructure
nfs: make nfs_async_rename non-static
nfs: abstract out code needed to complete a sillyrename
NFSv4: Clear the open state flags if the new stateid does not match
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is no guarantee that the strings in the nfs_cache_array will be
NULL-terminated. In the event that we end up hitting a readdir loop, we
need to ensure that we pass the warning message the length of the
string.
Reported-by: Lachlan McIlroy <lmcilroy@redhat.com>
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If we interrupt the nfs4_wait_for_completion_rpc_task() call in
nfs4_run_open_task(), then we don't prevent the RPC call from
completing. So freeing up the opendata->f_attr.mdsthreshold
in the error path in _nfs4_do_open() leads to a use-after-free
when the XDR decoder tries to decode the mdsthreshold information
from the server.
Fixes: 82be417aa37c0 (NFSv4.1 cache mdsthreshold values on OPEN)
Tested-by: Steve Dickson <SteveD@redhat.com>
Cc: stable@vger.kernel.org # 3.5+
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
When restarting an rpc call, we should not be carrying over data from the
previous call.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| | |
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Don't schedule an rpc_delay before checking to see if the task
is a SOFTCONN because the tk_callback from the delay (__rpc_atrun)
clears the task status before the rpc_exit_task can be run.
Signed-off-by: Steve Dickson <steved@redhat.com>
Fixes: 561ec1603171c (SUNRPC: call_connect_status should recheck...)
Link: http://lkml.kernel.org/r/5329CF7C.7090308@RedHat.com
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| | |
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| | |
Tested-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If a timeout or a signal interrupts the NFSv4 trunking discovery
SETCLIENTID_CONFIRM call, then we don't know whether or not the
server has changed the callback identifier on us.
Assume that it did, and schedule a 'path down' recovery...
Tested-by: Steve Dickson <steved@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
NFSv4.0 clients use the SETCLIENTID operation to inform NFS servers
how to contact a client's callback service. If a server cannot
contact a client's callback service, that server will not delegate
to that client, which results in a performance loss.
Our client advertises "rdma" as the callback netid when the forward
channel is "rdma". But our client always starts only "tcp" and
"tcp6" callback services.
Instead of advertising the forward channel netid, advertise "tcp"
or "tcp6" as the callback netid, based on the value of the
clientaddr mount option, since those are what our client currently
supports.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=69171
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The use of KERN_INFO causes garbage characters to appear when
debugging is enabled.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
After commit a11a2bf4, "SUNRPC: Optimise away unnecessary data moves
in xdr_align_pages", Thu Aug 2 13:21:43 2012, READs larger than a
few hundred bytes via NFS/RDMA no longer work. This commit exposed
a long-standing bug in rpcrdma_inline_fixup().
I reproduce this with an rsize=4096 mount using the cthon04 basic
tests. Test 5 fails with an EIO error.
For my reproducer, kernel log shows:
NFS: server cheating in read reply: count 4096 > recvd 0
rpcrdma_inline_fixup() is zeroing the xdr_stream::page_len field,
and xdr_align_pages() is now returning that value to the READ XDR
decoder function.
That field is set up by xdr_inline_pages() by the READ XDR encoder
function. As far as I can tell, it is supposed to be left alone
after that, as it describes the dimensions of the reply xdr_stream,
not the contents of that stream.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=68391
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Security labels go with each directory entry, thus they are always
stored in the page cache, not in the head buffer. The length of the
reply that goes in head[0] should not have changed to support
NFSv4.2 labels.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |\ |
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the rpcbind server is unavailable, we still want the RPC client
to respect the timeout.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When the server is unavailable due to a networking error, etc, we want
the RPC client to respect the timeout delays when attempting to reconnect.
Reported-by: Neil Brown <neilb@suse.de>
Fixes: 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If a file is sillyrenamed, then the generic vfs_unlink code will skip
emitting fsnotify events for it.
This patch has the sillyrename code do that instead.
In truth this is a little bit odd since we aren't actually removing the
dentry per-se, but renaming it. Still, this is probably the right thing
to do since it's what userland apps expect to see when an unlink()
occurs or some file is renamed on top of the dentry.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Now that nfs_rename uses the async infrastructure, we can remove this.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
There isn't much sense in maintaining two separate versions of rename
code. Convert nfs_rename to use the asynchronous rename infrastructure
that nfs_sillyrename uses, and emulate synchronous behavior by having
the task just wait on the reply.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
...and move the prototype for nfs_sillyrename to internal.h.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The async rename code is currently "polluted" with some parts that are
really just for sillyrenames. Add a new "complete" operation vector to
the nfs_renamedata to separate out the stuff that just needs to be done
for a sillyrename.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Tested-by: Anna Schumaker <Anna.Schumaker@netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
RFC3530 and RFC5661 both prescribe that the 'opaque' field of the
open stateid returned by new OPEN/OPEN_DOWNGRADE/CLOSE calls for
the same file and open owner should match.
If this is not the case, assume that the open state has been lost,
and that we need to recover it.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The stateid and state->flags should be updated atomically under
protection of the state->seqlock.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the server returns a completely new layout stateid in response to our
LAYOUTGET, then make sure to free any existing layout segments.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If the filehandles match, but the igrab() fails, or the layout is
freed before we can get it, then just return NULL.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It is not sufficient to compare filehandles when we receive a layout
recall from the server; we also need to check that the layout stateids
match.
Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
This patch is in preparation for the NFSv4.1 parallel open capability.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Subtraction of signed integers does not have well defined wraparound
semantics in the C99 standard. In order to be wraparound-safe, we
have to use unsigned subtraction, and then cast the result.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Try to detect 'ls -l' by having nfs_getattr() look at whether or not
there is an opendir() file descriptor for the parent directory.
If so, then assume that we want to force use of readdirplus in order
to avoid the multiple GETATTR calls over the wire.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since TCP is a stream protocol, our callback read code needs to take into
account the fact that RPC callbacks are not always confined to a single
TCP segment.
This patch adds support for multiple TCP segments by ensuring that we
only remove the rpc_rqst structure from the 'free backchannel requests'
list once the data has been completely received. We rely on the fact
that TCP data is ordered for the duration of the connection.
Reported-by: shaobingqing <shaobingqing@bwstor.com.cn>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull module updates from Rusty Russell:
"Nothing major: the stricter permissions checking for sysfs broke a
staging driver; fix included. Greg KH said he'd take the patch but
hadn't as the merge window opened, so it's included here to avoid
breaking build"
* tag 'modules-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
staging: fix up speakup kobject mode
Use 'E' instead of 'X' for unsigned module taint flag.
VERIFY_OCTAL_PERMISSIONS: stricter checking for sysfs perms.
kallsyms: fix percpu vars on x86-64 with relocation.
kallsyms: generalize address range checking
module: LLVMLinux: Remove unused function warning from __param_check macro
Fix: module signature vs tracepoints: add new TAINT_UNSIGNED_MODULE
module: remove MODULE_GENERIC_TABLE
module: allow multiple calls to MODULE_DEVICE_TABLE() per module
module: use pr_cont
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It uses the unnecessary S_IFREG bit which broke when my
stricter-checking-for-mode patch went in.
Since we're fixing it anyway, the extra level of indirection is
confusing for readers (ROOT_W == rw-r--r-- for example).
Also, many of these are other-writable. Is that really intended?
I'll-queue-this-patch-up-in-a-bit-by: Greg KH <greg@kroah.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Takashi Iwai <tiwai@suse.de> says:
> The letter 'X' has been already used for SUSE kernels for very long
> time, to indicate the external supported modules. Can the new flag be
> changed to another letter for avoiding conflict...?
> (BTW, we also use 'N' for "no support", too.)
Note: this code should be cleaned up, so we don't have such maps in
three places!
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Summary of http://lkml.org/lkml/2014/3/14/363 :
Ted: module_param(queue_depth, int, 444)
Joe: 0444!
Rusty: User perms >= group perms >= other perms?
Joe: CLASS_ATTR, DEVICE_ATTR, SENSOR_ATTR and SENSOR_ATTR_2?
Side effect of stricter permissions means removing the unnecessary
S_IFREG from several callers.
Note that the BUILD_BUG_ON_ZERO((perm) & 2) test was removed: a fair
number of drivers fail this test, so that will be the debate for a
future patch.
Suggested-by: Joe Perches <joe@perches.com>
Acked-by: Bjorn Helgaas <bhelgaas@google.com> for drivers/pci/slot.c
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <jlbec@evilplan.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
x86-64 has a problem: per-cpu variables are actually represented by
their absolute offsets within the per-cpu area, but the symbols are
not emitted as absolute. Thus kallsyms naively creates them as offsets
from _text, meaning their values change if the kernel is relocated
(especially noticeable with CONFIG_RANDOMIZE_BASE):
$ egrep ' (gdt_|_(stext|_per_cpu_))' /root/kallsyms.nokaslr
0000000000000000 D __per_cpu_start
0000000000004000 D gdt_page
0000000000014280 D __per_cpu_end
ffffffff810001c8 T _stext
ffffffff81ee53c0 D __per_cpu_offset
$ egrep ' (gdt_|_(stext|_per_cpu_))' /root/kallsyms.kaslr1
000000001f200000 D __per_cpu_start
000000001f204000 D gdt_page
000000001f214280 D __per_cpu_end
ffffffffa02001c8 T _stext
ffffffffa10e53c0 D __per_cpu_offset
Making them absolute symbols is the Right Thing, but requires fixes to
the relocs tool. So for the moment, we add a --absolute-percpu option
which makes them absolute from a kallsyms perspective:
$ egrep ' (gdt_|_(stext|_per_cpu_))' /proc/kallsyms # no KASLR
0000000000000000 A __per_cpu_start
000000000000a000 A gdt_page
0000000000013040 A __per_cpu_end
ffffffff802001c8 T _stext
ffffffff8099b180 D __per_cpu_offset
ffffffff809a3000 D __per_cpu_load
$ egrep ' (gdt_|_(stext|_per_cpu_))' /proc/kallsyms # With KASLR
0000000000000000 A __per_cpu_start
000000000000a000 A gdt_page
0000000000013040 A __per_cpu_end
ffffffff89c001c8 T _stext
ffffffff8a39d180 D __per_cpu_offset
ffffffff8a3a5000 D __per_cpu_load
Based-on-the-original-screenplay-by: Andy Honig <ahonig@google.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Kees Cook <keescook@chromium.org>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This refactors the address range checks to be generalized instead of
specific to text range checks, in preparation for other range checks.
Also extracts logic for "is the symbol absolute" into a function.
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This code makes a compile time type check that is optimized away. Clang
complains that it generates an unused function:
linux/kernel/panic.c:471:1: warning: unused function '__check_panic'
[-Wunused-function]
core_param(panic, panic_timeout, int, 0644);
^
linux/moduleparam.h:283:2: note: expanded from macro
'core_param'
param_check_##type(name, &(var)); \
^
<scratch space>:87:1: note: expanded from here
param_check_int
^
linux/moduleparam.h:369:34: note: expanded from macro
'param_check_int'
#define param_check_int(name, p) __param_check(name, p, int)
^
linux/moduleparam.h:349:22: note: expanded from macro
'__param_check'
static inline type *__check_##name(void) { return(p); }
^
<scratch space>:88:1: note: expanded from here
__check_panic
GCC won't complain for a static inline function but would if it was just
a static function.
Adding the unused attribute to the function declaration removes the warning.
Per request from Rusty Russell it is marked as __always_unused as the code
is meant to be optimized away.
This code works for both GCC and clang.
Signed-off-by: Mark Charlebois <charlebm@gmail.com>
Signed-off-by: Behan Webster <behanw@converseincode.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Users have reported being unable to trace non-signed modules loaded
within a kernel supporting module signature.
This is caused by tracepoint.c:tracepoint_module_coming() refusing to
take into account tracepoints sitting within force-loaded modules
(TAINT_FORCED_MODULE). The reason for this check, in the first place, is
that a force-loaded module may have a struct module incompatible with
the layout expected by the kernel, and can thus cause a kernel crash
upon forced load of that module on a kernel with CONFIG_TRACEPOINTS=y.
Tracepoints, however, specifically accept TAINT_OOT_MODULE and
TAINT_CRAP, since those modules do not lead to the "very likely system
crash" issue cited above for force-loaded modules.
With kernels having CONFIG_MODULE_SIG=y (signed modules), a non-signed
module is tainted re-using the TAINT_FORCED_MODULE taint flag.
Unfortunately, this means that Tracepoints treat that module as a
force-loaded module, and thus silently refuse to consider any tracepoint
within this module.
Since an unsigned module does not fit within the "very likely system
crash" category of tainting, add a new TAINT_UNSIGNED_MODULE taint flag
to specifically address this taint behavior, and accept those modules
within Tracepoints. We use the letter 'X' as a taint flag character for
a module being loaded that doesn't know how to sign its name (proposed
by Steven Rostedt).
Also add the missing 'O' entry to trace event show_module_flags() list
for the sake of completeness.
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
NAKed-by: Ingo Molnar <mingo@redhat.com>
CC: Thomas Gleixner <tglx@linutronix.de>
CC: David Howells <dhowells@redhat.com>
CC: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
MODULE_DEVICE_TABLE() calles MODULE_GENERIC_TABLE(); make it do the
work directly. This also removes a wart introduced in the last patch,
where the alias is defined to be an unknown struct type "struct
type##__##name##_device_id" instead of "struct type##_device_id" (it's
an extern so GCC doesn't care, but it's wrong).
The other user of MODULE_GENERIC_TABLE (ISAPNP_CARD_TABLE) is unused,
so delete it.
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Commit 78551277e4df5: "Input: i8042 - add PNP modaliases" had a bug, where the
second call to MODULE_DEVICE_TABLE() overrode the first resulting in not all
the modaliases being exposed.
This fixes the problem by including the name of the device_id table in the
__mod_*_device_table alias, allowing us to export several device_id tables
per module.
Suggested-by: Kay Sievers <kay@vrfy.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|