| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull IDA updates from Matthew Wilcox:
"A better IDA API:
id = ida_alloc(ida, GFP_xxx);
ida_free(ida, id);
rather than the cumbersome ida_simple_get(), ida_simple_remove().
The new IDA API is similar to ida_simple_get() but better named. The
internal restructuring of the IDA code removes the bitmap
preallocation nonsense.
I hope the net -200 lines of code is convincing"
* 'ida-4.19' of git://git.infradead.org/users/willy/linux-dax: (29 commits)
ida: Change ida_get_new_above to return the id
ida: Remove old API
test_ida: check_ida_destroy and check_ida_alloc
test_ida: Convert check_ida_conv to new API
test_ida: Move ida_check_max
test_ida: Move ida_check_leaf
idr-test: Convert ida_check_nomem to new API
ida: Start new test_ida module
target/iscsi: Allocate session IDs from an IDA
iscsi target: fix session creation failure handling
drm/vmwgfx: Convert to new IDA API
dmaengine: Convert to new IDA API
ppc: Convert vas ID allocation to new IDA API
media: Convert entity ID allocation to new IDA API
ppc: Convert mmu context allocation to new IDA API
Convert net_namespace to new IDA API
cb710: Convert to new IDA API
rsxx: Convert to new IDA API
osd: Convert to new IDA API
sd: Convert to new IDA API
...
|
| |
| |
| |
| | |
Signed-off-by: Matthew Wilcox <willy@infradead.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Merge yet more updates from Andrew Morton:
- the rest of MM
- various misc fixes and tweaks
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (22 commits)
mm: Change return type int to vm_fault_t for fault handlers
lib/fonts: convert comments to utf-8
s390: ebcdic: convert comments to UTF-8
treewide: convert ISO_8859-1 text comments to utf-8
drivers/gpu/drm/gma500/: change return type to vm_fault_t
docs/core-api: mm-api: add section about GFP flags
docs/mm: make GFP flags descriptions usable as kernel-doc
docs/core-api: split memory management API to a separate file
docs/core-api: move *{str,mem}dup* to "String Manipulation"
docs/core-api: kill trailing whitespace in kernel-api.rst
mm/util: add kernel-doc for kvfree
mm/util: make strndup_user description a kernel-doc comment
fs/proc/vmcore.c: hide vmcoredd_mmap_dumps() for nommu builds
treewide: correct "differenciate" and "instanciate" typos
fs/afs: use new return type vm_fault_t
drivers/hwtracing/intel_th/msu.c: change return type to vm_fault_t
mm: soft-offline: close the race against page allocation
mm: fix race on soft-offlining free huge pages
namei: allow restricted O_CREAT of FIFOs and regular files
hfs: prevent crash on exit from failed search
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Almost all files in the kernel are either plain text or UTF-8 encoded. A
couple however are ISO_8859-1, usually just a few characters in a C
comments, for historic reasons.
This converts them all to UTF-8 for consistency.
Link: http://lkml.kernel.org/r/20180724111600.4158975-1-arnd@arndb.de
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Simon Horman <horms@verge.net.au> [IPVS portion]
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> [IIO]
Acked-by: Michael Ellerman <mpe@ellerman.id.au> [powerpc]
Acked-by: Rob Herring <robh@kernel.org>
Cc: Joe Perches <joe@perches.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Samuel Ortiz <sameo@linux.intel.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Rob Herring <robh+dt@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull NFS client updates from Anna Schumaker:
"These patches include adding async support for the v4.2 COPY
operation. I think Bruce is planning to send the server patches for
the next release, but I figured we could get the client side out of
the way now since it's been in my tree for a while. This shouldn't
cause any problems, since the server will still respond with
synchronous copies even if the client requests async.
Features:
- Add support for asynchronous server-side COPY operations
Stable bufixes:
- Fix an off-by-one in bl_map_stripe() (v3.17+)
- NFSv4 client live hangs after live data migration recovery (v4.9+)
- xprtrdma: Fix disconnect regression (v4.18+)
- Fix locking in pnfs_generic_recover_commit_reqs (v4.14+)
- Fix a sleep in atomic context in nfs4_callback_sequence() (v4.9+)
Other bugfixes and cleanups:
- Optimizations and fixes involving NFS v4.1 / pNFS layout handling
- Optimize lseek(fd, SEEK_CUR, 0) on directories to avoid locking
- Immediately reschedule writeback when the server replies with an
error
- Fix excessive attribute revalidation in nfs_execute_ok()
- Add error checking to nfs_idmap_prepare_message()
- Use new vm_fault_t return type
- Return a delegation when reclaiming one that the server has
recalled
- Referrals should inherit proto setting from parents
- Make rpc_auth_create_args a const
- Improvements to rpc_iostats tracking
- Fix a potential reference leak when there is an error processing a
callback
- Fix rmdir / mkdir / rename nlink accounting
- Fix updating inode change attribute
- Fix error handling in nfsn4_sp4_select_mode()
- Use an appropriate work queue for direct-write completion
- Don't busy wait if NFSv4 session draining is interrupted"
* tag 'nfs-for-4.19-1' of git://git.linux-nfs.org/projects/anna/linux-nfs: (54 commits)
pNFS: Remove unwanted optimisation of layoutget
pNFS/flexfiles: ff_layout_pg_init_read should exit on error
pNFS: Treat RECALLCONFLICT like DELAY...
pNFS: When updating the stateid in layoutreturn, also update the recall range
NFSv4: Fix a sleep in atomic context in nfs4_callback_sequence()
NFSv4: Fix locking in pnfs_generic_recover_commit_reqs
NFSv4: Fix a typo in nfs4_init_channel_attrs()
NFSv4: Don't busy wait if NFSv4 session draining is interrupted
NFS recover from destination server reboot for copies
NFS add a simple sync nfs4_proc_commit after async COPY
NFS handle COPY ERR_OFFLOAD_NO_REQS
NFS send OFFLOAD_CANCEL when COPY killed
NFS export nfs4_async_handle_error
NFS handle COPY reply CB_OFFLOAD call race
NFS add support for asynchronous COPY
NFS COPY xdr handle async reply
NFS OFFLOAD_CANCEL xdr
NFS CB_OFFLOAD xdr
NFS: Use an appropriate work queue for direct-write completion
NFSv4: Fix error handling in nfs4_sp4_select_mode()
...
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
I found that injecting disconnects with v4.18-rc resulted in
random failures of the multi-threaded git regression test.
The root cause appears to be that, after a reconnect, the
RPC/RDMA transport is waking pending RPCs before the transport has
posted enough Receive buffers to receive the Replies. If a Reply
arrives before enough Receive buffers are posted, the connection
is dropped. A few connection drops happen in quick succession as
the client and server struggle to regain credit synchronization.
This regression was introduced with commit 7c8d9e7c8863 ("xprtrdma:
Move Receive posting to Receive handler"). The client is supposed to
post a single Receive when a connection is established because
it's not supposed to send more than one RPC Call before it gets
a fresh credit grant in the first RPC Reply [RFC 8166, Section
3.3.3].
Unfortunately there appears to be a longstanding bug in the Linux
client's credit accounting mechanism. On connect, it simply dumps
all pending RPC Calls onto the new connection. It's possible it has
done this ever since the RPC/RDMA transport was added to the kernel
ten years ago.
Servers have so far been tolerant of this bad behavior. Currently no
server implementation ever changes its credit grant over reconnects,
and servers always repost enough Receives before connections are
fully established.
The Linux client implementation used to post a Receive before each
of these Calls. This has covered up the flooding send behavior.
I could try to correct this old bug so that the client sends exactly
one RPC Call and waits for a Reply. Since we are so close to the
next merge window, I'm going to instead provide a simple patch to
post enough Receives before a reconnect completes (based on the
number of credits granted to the previous connection).
The spurious disconnects will be gone, but the client will still
send multiple RPC Calls immediately after a reconnect.
Addressing the latter problem will wait for a merge window because
a) I expect it to be a large change requiring lots of testing, and
b) obviously the Linux client has interoperated successfully since
day zero while still being broken.
Fixes: 7c8d9e7c8863 ("xprtrdma: Move Receive posting to ... ")
Cc: stable@vger.kernel.org # v4.18+
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Remove trailing whitespace and blank line at EOF
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
After a live data migration event at the NFS server, the client may send
I/O requests to the wrong server, causing a live hang due to repeated
recovery events. On the wire, this will appear as an I/O request failing
with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
NFS4ERR_BADSSESSION is returned because the session ID being used was
issued by the other server and is not valid at the old server.
The failure is caused by async worker threads having cached the transport
(xprt) in the rpc_task structure. After the migration recovery completes,
the task is redispatched and the task resends the request to the wrong
server based on the old value still present in tk_xprt.
The solution is to recompute the tk_xprt field of the rpc_task structure
so that the request goes to the correct server.
Signed-off-by: Bill Baker <bill.baker@oracle.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Tested-by: Helen Chao <helen.chao@oracle.com>
Fixes: fb43d17210ba ("SUNRPC: Use the multipath iterator to assign a ...")
Cc: stable@vger.kernel.org # v4.9+
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Smatch complains that "num" can be uninitialized when kstrtoul() returns
-ERANGE. It's true enough, but basically harmless in this case.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
clones
The existing rpc_print_iostats has a few shortcomings. First, the naming
is not consistent with other functions in the kernel that display stats.
Second, it is really displaying stats for an rpc_clnt structure as it
displays both xprt stats and per-op stats. Third, it does not handle
rpc_clnt clones, which is important for the one in-kernel tree caller
of this function, the NFS client's nfs_show_stats function.
Fix all of the above by renaming the rpc_print_iostats to
rpc_clnt_show_stats and looping through any rpc_clnt clones via
cl_parent.
Once this interface is fixed, this addresses a problem with NFSv4.
Before this patch, the /proc/self/mountstats always showed incorrect
counts for NFSv4 lease and session related opcodes such as SEQUENCE,
RENEW, SETCLIENTID, CREATE_SESSION, etc. These counts were always 0
even though many ops would go over the wire. The reason for this is
there are multiple rpc_clnt structures allocated for any given NFSv4
mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
which only handled one of them, nfs_server->client. Fix these counts
by calling sunrpc's new rpc_clnt_show_stats() function, which handles
cloned rpc_clnt structs and prints the stats together.
Note that one side-effect of the above is that multiple mounts from
the same NFS server will show identical counts in the above ops due
to the fact the one rpc_clnt (representing the NFSv4 client state)
is shared across mounts.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add a helper function to add the metrics in two rpc_iostats structures.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Refactor the output of the metrics for one RPC op into an internal
function. No functional change.
Signed-off-by: Dave Wysochanski <dwysocha@redhat.com>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This turns rpc_auth_create_args into a const as it gets passed through the
auth stack.
Signed-off-by: Sargun Dhillon <sargun@sargun.me>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull nfsd updates from Bruce Fields:
"Chuck Lever fixed a problem with NFSv4.0 callbacks over GSS from
multi-homed servers.
The only new feature is a minor bit of protocol (change_attr_type)
which the client doesn't even use yet.
Other than that, various bugfixes and cleanup"
* tag 'nfsd-4.19-1' of git://linux-nfs.org/~bfields/linux: (27 commits)
sunrpc: Add comment defining gssd upcall API keywords
nfsd: Remove callback_cred
nfsd: Use correct credential for NFSv4.0 callback with GSS
sunrpc: Extract target name into svc_cred
sunrpc: Enable the kernel to specify the hostname part of service principals
sunrpc: Don't use stack buffer with scatterlist
rpc: remove unneeded variable 'ret' in rdma_listen_handler
nfsd: use true and false for boolean values
nfsd: constify write_op[]
fs/nfsd: Delete invalid assignment statements in nfsd4_decode_exchange_id
NFSD: Handle full-length symlinks
NFSD: Refactor the generic write vector fill helper
svcrdma: Clean up Read chunk path
svcrdma: Avoid releasing a page in svc_xprt_release()
nfsd: Mark expected switch fall-through
sunrpc: remove redundant variables 'checksumlen','blocksize' and 'data'
nfsd: fix leaked file lock with nfs exported overlayfs
nfsd: don't advertise a SCSI layout for an unsupported request_queue
nfsd: fix corrupted reply to badly ordered compound
nfsd: clarify check_op_ordering
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
During review, it was found that the target, service, and srchost
keywords are easily conflated. Add an explainer.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
NFSv4.0 callback needs to know the GSS target name the client used
when it established its lease. That information is available from
the GSS context created by gssproxy. Make it available in each
svc_cred.
Note this will also give us access to the real target service
principal name (which is typically "nfs", but spec does not require
that).
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A multi-homed NFS server may have more than one "nfs" key in its
keytab. Enable the kernel to pick the key it wants as a machine
credential when establishing a GSS context.
This is useful for GSS-protected NFSv4.0 callbacks, which are
required by RFC 7530 S3.3.3 to use the same principal as the service
principal the client used when establishing its lease.
A complementary modification to rpc.gssd is required to fully enable
this feature.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Fedora got a bug report from NFS:
kernel BUG at include/linux/scatterlist.h:143!
...
RIP: 0010:sg_init_one+0x7d/0x90
..
make_checksum+0x4e7/0x760 [rpcsec_gss_krb5]
gss_get_mic_kerberos+0x26e/0x310 [rpcsec_gss_krb5]
gss_marshal+0x126/0x1a0 [auth_rpcgss]
? __local_bh_enable_ip+0x80/0xe0
? call_transmit_status+0x1d0/0x1d0 [sunrpc]
call_transmit+0x137/0x230 [sunrpc]
__rpc_execute+0x9b/0x490 [sunrpc]
rpc_run_task+0x119/0x150 [sunrpc]
nfs4_run_exchange_id+0x1bd/0x250 [nfsv4]
_nfs4_proc_exchange_id+0x2d/0x490 [nfsv4]
nfs41_discover_server_trunking+0x1c/0xa0 [nfsv4]
nfs4_discover_server_trunking+0x80/0x270 [nfsv4]
nfs4_init_client+0x16e/0x240 [nfsv4]
? nfs_get_client+0x4c9/0x5d0 [nfs]
? _raw_spin_unlock+0x24/0x30
? nfs_get_client+0x4c9/0x5d0 [nfs]
nfs4_set_client+0xb2/0x100 [nfsv4]
nfs4_create_server+0xff/0x290 [nfsv4]
nfs4_remote_mount+0x28/0x50 [nfsv4]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.35+0x54/0x160
nfs_do_root_mount+0x7f/0xc0 [nfsv4]
nfs4_try_mount+0x43/0x70 [nfsv4]
? get_nfs_version+0x21/0x80 [nfs]
nfs_fs_mount+0x789/0xbf0 [nfs]
? pcpu_alloc+0x6ca/0x7e0
? nfs_clone_super+0x70/0x70 [nfs]
? nfs_parse_mount_options+0xb40/0xb40 [nfs]
mount_fs+0x3b/0x16a
vfs_kern_mount.part.35+0x54/0x160
do_mount+0x1fd/0xd50
ksys_mount+0xba/0xd0
__x64_sys_mount+0x21/0x30
do_syscall_64+0x60/0x1f0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This is BUG_ON(!virt_addr_valid(buf)) triggered by using a stack
allocated buffer with a scatterlist. Convert the buffer for
rc4salt to be dynamically allocated instead.
Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1615258
Signed-off-by: Laura Abbott <labbott@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The ret is not modified after initalization, So just remove the variable
and return 0.
Signed-off-by: zhong jiang <zhongjiang@huawei.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
I've given up on the idea of zero-copy handling of SYMLINK on the
server side. This is because the Linux VFS symlink API requires the
symlink pathname to be in a NUL-terminated kmalloc'd buffer. The
NUL-termination is going to be problematic (watching out for
landing on a page boundary and dealing with a 4096-byte pathname).
I don't believe that SYMLINK creation is on a performance path or is
requested frequently enough that it will cause noticeable CPU cache
pollution due to data copies.
There will be two places where a transport callout will be necessary
to fill in the rqstp: one will be in the svc_fill_symlink_pathname()
helper that is used by NFSv2 and NFSv3, and the other will be in
nfsd4_decode_create().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
fill_in_write_vector() is nearly the same logic as
svc_fill_write_vector(), but there are a few differences so that
the former can handle multiple WRITE payloads in a single COMPOUND.
svc_fill_write_vector() can be adjusted so that it can be used in
the NFSv4 WRITE code path too. Instead of assuming the pages are
coming from rq_args.pages, have the caller pass in the page list.
The immediate benefit is a reduction of code duplication. It also
prevents the NFSv4 WRITE decoder from passing an empty vector
element when the transport has provided the payload in the xdr_buf's
page array.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Simplify the error handling at the tail of recv_read_chunk() by
re-arranging rq_pages[] housekeeping and documenting it properly.
NB: In this path, svc_rdma_recvfrom returns zero. Therefore no
subsequent reply processing is done on the svc_rqstp, and thus the
rq_respages field does not need to be updated.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
svc_xprt_release() invokes svc_free_res_pages(), which releases
pages between rq_respages and rq_next_page.
Historically, the RPC/RDMA transport has set these two pointers to
be different by one, which means:
- one page gets released when svc_recv returns 0. This normally
happens whenever one or more RDMA Reads need to be dispatched to
complete construction of an RPC Call.
- one page gets released after every call to svc_send.
In both cases, this released page is immediately refilled by
svc_alloc_arg. There does not seem to be a reason for releasing this
page.
To avoid this unnecessary memory allocator traffic, set rq_next_page
more carefully.
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Variables 'checksumlen','blocksize' and 'data' are being assigned,
but are never used, hence they are redundant and can be removed.
Fix the following warning:
net/sunrpc/auth_gss/gss_krb5_wrap.c:443:7: warning: variable ‘blocksize’ set but not used [-Wunused-but-set-variable]
net/sunrpc/auth_gss/gss_krb5_crypto.c:376:15: warning: variable ‘checksumlen’ set but not used [-Wunused-but-set-variable]
net/sunrpc/xprtrdma/svc_rdma.c:97:9: warning: variable ‘data’ set but not used [-Wunused-but-set-variable]
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\ \ \ \ \
| |_|_|/ /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull more rdma updates from Jason Gunthorpe:
"This is the SMC cleanup promised, a randconfig regression fix, and
kernel oops fix.
Summary:
- Switch SMC over to rdma_get_gid_attr and remove the compat
- Fix a crash in HFI1 with some BIOS's
- Fix a randconfig failure"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/ucm: fix UCM link error
IB/hfi1: Invalid NUMA node information can cause a divide by zero
RDMA/smc: Replace ib_query_gid with rdma_get_gid_attr
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
All RDMA ULPs should be using rdma_get_gid_attr instead of
ib_query_gid. Convert SMC to use the new API.
In the process correct some confusion with gid_type - if attr->ndev is
!NULL then gid_type can never be IB_GID_TYPE_IB by
definition. IB_GID_TYPE_ROCE shares the same enum value and is probably
what was intended here.
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull ceph updates from Ilya Dryomov:
"The main things are support for cephx v2 authentication protocol and
basic support for rbd images within namespaces (myself).
Also included are y2038 conversion patches from Arnd, a pile of
miscellaneous fixes from Chengguang and Zheng's feature bit
infrastructure for the filesystem"
* tag 'ceph-for-4.19-rc1' of git://github.com/ceph/ceph-client: (40 commits)
ceph: don't drop message if it contains more data than expected
ceph: support cephfs' own feature bits
crush: fix using plain integer as NULL warning
libceph: remove unnecessary non NULL check for request_key
ceph: refactor error handling code in ceph_reserve_caps()
ceph: refactor ceph_unreserve_caps()
ceph: change to void return type for __do_request()
ceph: compare fsc->max_file_size and inode->i_size for max file size limit
ceph: add additional size check in ceph_setattr()
ceph: add additional offset check in ceph_write_iter()
ceph: add additional range check in ceph_fallocate()
ceph: add new field max_file_size in ceph_fs_client
libceph: weaken sizeof check in ceph_x_verify_authorizer_reply()
libceph: check authorizer reply/challenge length before reading
libceph: implement CEPHX_V2 calculation mode
libceph: add authorizer challenge
libceph: factor out encrypt_authorizer()
libceph: factor out __ceph_x_decrypt()
libceph: factor out __prepare_write_connect()
libceph: store ceph_auth_handshake pointer in ceph_connection
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Fixes the following sparse warnings:
net/ceph/crush/mapper.c:517:76: warning: Using plain integer as NULL pointer
net/ceph/crush/mapper.c:728:68: warning: Using plain integer as NULL pointer
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
request_key never return NULL,so no need do non-NULL check.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Allow for extending ceph_x_authorize_reply in the future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Avoid scribbling over memory if the received reply/challenge is larger
than the buffer supplied with the authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Derive the signature from the entire buffer (both AES cipher blocks)
instead of using just the first half of the first block, leaving out
data_crc entirely.
This addresses CVE-2018-1129.
Link: http://tracker.ceph.com/issues/24837
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When a client authenticates with a service, an authorizer is sent with
a nonce to the service (ceph_x_authorize_[ab]) and the service responds
with a mutation of that nonce (ceph_x_authorize_reply). This lets the
client verify the service is who it says it is but it doesn't protect
against a replay: someone can trivially capture the exchange and reuse
the same authorizer to authenticate themselves.
Allow the service to reject an initial authorizer with a random
challenge (ceph_x_authorize_challenge). The client then has to respond
with an updated authorizer proving they are able to decrypt the
service's challenge and that the new authorizer was produced for this
specific connection instance.
The accepting side requires this challenge and response unconditionally
if the client side advertises they have CEPHX_V2 feature bit.
This addresses CVE-2018-1128.
Link: http://tracker.ceph.com/issues/24836
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Will be used for encrypting both the initial and updated authorizers.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Will be used for decrypting the server challenge which is only preceded
by ceph_x_encrypt_header.
Drop struct_v check to allow for extending ceph_x_encrypt_header in the
future.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Will be used for sending ceph_msg_connect with an updated authorizer,
after the server challenges the initial authorizer.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
We already copy authorizer_reply_buf and authorizer_reply_buf_len into
ceph_connection. Factoring out __prepare_write_connect() requires two
more: authorizer_buf and authorizer_buf_len. Store the pointer to the
handshake in con->auth rather than piling on.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sage Weil <sage@redhat.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Remove blank lines at end of file and trailing whitespace.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The request mtime field is used all over ceph, and is currently
represented as a 'timespec' structure in Linux. This changes it to
timespec64 to allow times beyond 2038, modifying all users at the
same time.
[ Remove now redundant ts variable in writepage_nounlock(). ]
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
ceph_con_keepalive_expired() is the last user of timespec_add() and some
of the last uses of ktime_get_real_ts(). Replacing this with timespec64
based interfaces lets us remove that deprecated API.
I'm introducing new ceph_encode_timespec64()/ceph_decode_timespec64()
here that take timespec64 structures and convert to/from ceph_timespec,
which is defined to have an unsigned 32-bit tv_sec member. This extends
the range of valid times to year 2106, avoiding the year 2038 overflow.
The ceph file system portion still uses the old functions for inode
timestamps, this will be done separately after the VFS layer is converted.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Don't mention "mount" -- in the rbd case it is "mapping".
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
There is no reason to continue option parsing after detecting
bad option.
[ Return match_int() errors from ceph_parse_options() to match the
behaviour of parse_rbd_opts_token() and parse_fsopt_token(). ]
Signed-off-by: Chengguang Xu <cgxu519@gmx.com>
Reviewed-by: Ilya Dryomov <idryomov@gmail.com>
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
| | |_|/ /
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The wire format dictates that payload_len fits into 4 bytes. Take u32
instead of size_t to reflect that.
All callers pass a small integer, so no changes required.
Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
|
|\ \ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Pull networking fixes from David Miller:
1) Fix races in IPVS, from Tan Hu.
2) Missing unbind in matchall classifier, from Hangbin Liu.
3) Missing act_ife action release, from Vlad Buslov.
4) Cure lockdep splats in ila, from Cong Wang.
5) veth queue leak on link delete, from Toshiaki Makita.
6) Disable isdn's IIOCDBGVAR ioctl, it exposes kernel addresses. From
Kees Cook.
7) RCU usage fixup in XDP, from Tariq Toukan.
8) Two TCP ULP fixes from Daniel Borkmann.
9) r8169 needs REALTEK_PHY as a Kconfig dependency, from Heiner
Kallweit.
10) Always take tcf_lock with BH disabled, otherwise we can deadlock
with rate estimator code paths. From Vlad Buslov.
11) Don't use MSI-X on RTL8106e r8169 chips, they don't resume properly.
From Jian-Hong Pan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (41 commits)
ip6_vti: fix creating fallback tunnel device for vti6
ip_vti: fix a null pointer deferrence when create vti fallback tunnel
r8169: don't use MSI-X on RTL8106e
net: lan743x_ptp: convert to ktime_get_clocktai_ts64
net: sched: always disable bh when taking tcf_lock
ip6_vti: simplify stats handling in vti6_xmit
bpf: fix redirect to map under tail calls
r8169: add missing Kconfig dependency
tools/bpf: fix bpf selftest test_cgroup_storage failure
bpf, sockmap: fix sock_map_ctx_update_elem race with exist/noexist
bpf, sockmap: fix map elem deletion race with smap_stop_sock
bpf, sockmap: fix leakage of smap_psock_map_entry
tcp, ulp: fix leftover icsk_ulp_ops preventing sock from reattach
tcp, ulp: add alias for all ulp modules
bpf: fix a rcu usage warning in bpf_prog_array_copy_core()
samples/bpf: all XDP samples should unload xdp/bpf prog on SIGTERM
net/xdp: Fix suspicious RCU usage warning
net/mlx5e: Delete unneeded function argument
Documentation: networking: ti-cpsw: correct cbs parameters for Eth1 100Mb
isdn: Disable IIOCDBGVAR
...
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
When set fb_tunnels_only_for_init_net to 1, don't create fallback tunnel
device for vti6 when a new namespace is created.
Tested:
[root@builder2 ~]# modprobe ip6_tunnel
[root@builder2 ~]# modprobe ip6_vti
[root@builder2 ~]# echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
[root@builder2 ~]# unshare -n
[root@builder2 ~]# ip link
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group
default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
After set fb_tunnels_only_for_init_net to 1, the itn->fb_tunnel_dev will
be NULL and will cause following crash:
[ 2742.849298] BUG: unable to handle kernel NULL pointer dereference at 0000000000000941
[ 2742.851380] PGD 800000042c21a067 P4D 800000042c21a067 PUD 42aaed067 PMD 0
[ 2742.852818] Oops: 0002 [#1] SMP PTI
[ 2742.853570] CPU: 7 PID: 2484 Comm: unshare Kdump: loaded Not tainted 4.18.0-rc8+ #2
[ 2742.855163] Hardware name: Fedora Project OpenStack Nova, BIOS seabios-1.7.5-11.el7 04/01/2014
[ 2742.856970] RIP: 0010:vti_init_net+0x3a/0x50 [ip_vti]
[ 2742.858034] Code: 90 83 c0 48 c7 c2 20 a1 83 c0 48 89 fb e8 6e 3b f6 ff 85 c0 75 22 8b 0d f4 19 00 00 48 8b 93 00 14 00 00 48 8b 14 ca 48 8b 12 <c6> 82 41 09 00 00 04 c6 82 38 09 00 00 45 5b c3 66 0f 1f 44 00 00
[ 2742.861940] RSP: 0018:ffff9be28207fde0 EFLAGS: 00010246
[ 2742.863044] RAX: 0000000000000000 RBX: ffff8a71ebed4980 RCX: 0000000000000013
[ 2742.864540] RDX: 0000000000000000 RSI: 0000000000000013 RDI: ffff8a71ebed4980
[ 2742.866020] RBP: ffff8a71ea717000 R08: ffffffffc083903c R09: ffff8a71ea717000
[ 2742.867505] R10: 0000000000000000 R11: 0000000000000000 R12: ffff8a71ebed4980
[ 2742.868987] R13: 0000000000000013 R14: ffff8a71ea5b49c0 R15: 0000000000000000
[ 2742.870473] FS: 00007f02266c9740(0000) GS:ffff8a71ffdc0000(0000) knlGS:0000000000000000
[ 2742.872143] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2742.873340] CR2: 0000000000000941 CR3: 000000042bc20006 CR4: 00000000001606e0
[ 2742.874821] Call Trace:
[ 2742.875358] ops_init+0x38/0xf0
[ 2742.876078] setup_net+0xd9/0x1f0
[ 2742.876789] copy_net_ns+0xb7/0x130
[ 2742.877538] create_new_namespaces+0x11a/0x1d0
[ 2742.878525] unshare_nsproxy_namespaces+0x55/0xa0
[ 2742.879526] ksys_unshare+0x1a7/0x330
[ 2742.880313] __x64_sys_unshare+0xe/0x20
[ 2742.881131] do_syscall_64+0x5b/0x180
[ 2742.881933] entry_SYSCALL_64_after_hwframe+0x44/0xa9
Reproduce:
echo 1 > /proc/sys/net/core/fb_tunnels_only_for_init_net
modprobe ip_vti
unshare -n
Fixes: 79134e6ce2c9 ("net: do not create fallback tunnels for non-default namespaces")
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Recently, ops->init() and ops->dump() of all actions were modified to
always obtain tcf_lock when accessing private action state. Actions that
don't depend on tcf_lock for synchronization with their data path use
non-bh locking API. However, tcf_lock is also used to protect rate
estimator stats in softirq context by timer callback.
Change ops->init() and ops->dump() of all actions to disable bh when using
tcf_lock to prevent deadlock reported by following lockdep warning:
[ 105.470398] ================================
[ 105.475014] WARNING: inconsistent lock state
[ 105.479628] 4.18.0-rc8+ #664 Not tainted
[ 105.483897] --------------------------------
[ 105.488511] inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
[ 105.494871] swapper/16/0 [HC0[0]:SC1[1]:HE1:SE0] takes:
[ 105.500449] 00000000f86c012e (&(&p->tcfa_lock)->rlock){+.?.}, at: est_fetch_counters+0x3c/0xa0
[ 105.509696] {SOFTIRQ-ON-W} state was registered at:
[ 105.514925] _raw_spin_lock+0x2c/0x40
[ 105.519022] tcf_bpf_init+0x579/0x820 [act_bpf]
[ 105.523990] tcf_action_init_1+0x4e4/0x660
[ 105.528518] tcf_action_init+0x1ce/0x2d0
[ 105.532880] tcf_exts_validate+0x1d8/0x200
[ 105.537416] fl_change+0x55a/0x268b [cls_flower]
[ 105.542469] tc_new_tfilter+0x748/0xa20
[ 105.546738] rtnetlink_rcv_msg+0x56a/0x6d0
[ 105.551268] netlink_rcv_skb+0x18d/0x200
[ 105.555628] netlink_unicast+0x2d0/0x370
[ 105.559990] netlink_sendmsg+0x3b9/0x6a0
[ 105.564349] sock_sendmsg+0x6b/0x80
[ 105.568271] ___sys_sendmsg+0x4a1/0x520
[ 105.572547] __sys_sendmsg+0xd7/0x150
[ 105.576655] do_syscall_64+0x72/0x2c0
[ 105.580757] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 105.586243] irq event stamp: 489296
[ 105.590084] hardirqs last enabled at (489296): [<ffffffffb507e639>] _raw_spin_unlock_irq+0x29/0x40
[ 105.599765] hardirqs last disabled at (489295): [<ffffffffb507e745>] _raw_spin_lock_irq+0x15/0x50
[ 105.609277] softirqs last enabled at (489292): [<ffffffffb413a6a3>] irq_enter+0x83/0xa0
[ 105.618001] softirqs last disabled at (489293): [<ffffffffb413a800>] irq_exit+0x140/0x190
[ 105.626813]
other info that might help us debug this:
[ 105.633976] Possible unsafe locking scenario:
[ 105.640526] CPU0
[ 105.643325] ----
[ 105.646125] lock(&(&p->tcfa_lock)->rlock);
[ 105.650747] <Interrupt>
[ 105.653717] lock(&(&p->tcfa_lock)->rlock);
[ 105.658514]
*** DEADLOCK ***
[ 105.665349] 1 lock held by swapper/16/0:
[ 105.669629] #0: 00000000a640ad99 ((&est->timer)){+.-.}, at: call_timer_fn+0x10b/0x550
[ 105.678200]
stack backtrace:
[ 105.683194] CPU: 16 PID: 0 Comm: swapper/16 Not tainted 4.18.0-rc8+ #664
[ 105.690249] Hardware name: Supermicro SYS-2028TP-DECR/X10DRT-P, BIOS 2.0b 03/30/2017
[ 105.698626] Call Trace:
[ 105.701421] <IRQ>
[ 105.703791] dump_stack+0x92/0xeb
[ 105.707461] print_usage_bug+0x336/0x34c
[ 105.711744] mark_lock+0x7c9/0x980
[ 105.715500] ? print_shortest_lock_dependencies+0x2e0/0x2e0
[ 105.721424] ? check_usage_forwards+0x230/0x230
[ 105.726315] __lock_acquire+0x923/0x26f0
[ 105.730597] ? debug_show_all_locks+0x240/0x240
[ 105.735478] ? mark_lock+0x493/0x980
[ 105.739412] ? check_chain_key+0x140/0x1f0
[ 105.743861] ? __lock_acquire+0x836/0x26f0
[ 105.748323] ? lock_acquire+0x12e/0x290
[ 105.752516] lock_acquire+0x12e/0x290
[ 105.756539] ? est_fetch_counters+0x3c/0xa0
[ 105.761084] _raw_spin_lock+0x2c/0x40
[ 105.765099] ? est_fetch_counters+0x3c/0xa0
[ 105.769633] est_fetch_counters+0x3c/0xa0
[ 105.773995] est_timer+0x87/0x390
[ 105.777670] ? est_fetch_counters+0xa0/0xa0
[ 105.782210] ? lock_acquire+0x12e/0x290
[ 105.786410] call_timer_fn+0x161/0x550
[ 105.790512] ? est_fetch_counters+0xa0/0xa0
[ 105.795055] ? del_timer_sync+0xd0/0xd0
[ 105.799249] ? __lock_is_held+0x93/0x110
[ 105.803531] ? mark_held_locks+0x20/0xe0
[ 105.807813] ? _raw_spin_unlock_irq+0x29/0x40
[ 105.812525] ? est_fetch_counters+0xa0/0xa0
[ 105.817069] ? est_fetch_counters+0xa0/0xa0
[ 105.821610] run_timer_softirq+0x3c4/0x9f0
[ 105.826064] ? lock_acquire+0x12e/0x290
[ 105.830257] ? __bpf_trace_timer_class+0x10/0x10
[ 105.835237] ? __lock_is_held+0x25/0x110
[ 105.839517] __do_softirq+0x11d/0x7bf
[ 105.843542] irq_exit+0x140/0x190
[ 105.847208] smp_apic_timer_interrupt+0xac/0x3b0
[ 105.852182] apic_timer_interrupt+0xf/0x20
[ 105.856628] </IRQ>
[ 105.859081] RIP: 0010:cpuidle_enter_state+0xd8/0x4d0
[ 105.864395] Code: 46 ff 48 89 44 24 08 0f 1f 44 00 00 31 ff e8 cf ec 46 ff 80 7c 24 07 00 0f 85 1d 02 00 00 e8 9f 90 4b ff fb 66 0f 1f 44 00 00 <4c> 8b 6c 24 08 4d 29 fd 0f 80 36 03 00 00 4c 89 e8 48 ba cf f7 53
[ 105.884288] RSP: 0018:ffff8803ad94fd20 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
[ 105.892494] RAX: 0000000000000000 RBX: ffffe8fb300829c0 RCX: ffffffffb41e19e1
[ 105.899988] RDX: 0000000000000007 RSI: dffffc0000000000 RDI: ffff8803ad9358ac
[ 105.907503] RBP: ffffffffb6636300 R08: 0000000000000004 R09: 0000000000000000
[ 105.914997] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000004
[ 105.922487] R13: ffffffffb6636140 R14: ffffffffb66362d8 R15: 000000188d36091b
[ 105.929988] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.935232] do_idle+0x28e/0x320
[ 105.938817] ? arch_cpu_idle_exit+0x40/0x40
[ 105.943361] ? mark_lock+0x8c1/0x980
[ 105.947295] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.952619] cpu_startup_entry+0xc2/0xd0
[ 105.956900] ? cpu_in_idle+0x20/0x20
[ 105.960830] ? _raw_spin_unlock_irqrestore+0x32/0x60
[ 105.966146] ? trace_hardirqs_on_caller+0x141/0x2d0
[ 105.971391] start_secondary+0x2b5/0x360
[ 105.975669] ? set_cpu_sibling_map+0x1330/0x1330
[ 105.980654] secondary_startup_64+0xa5/0xb0
Taking tcf_lock in sample action with bh disabled causes lockdep to issue a
warning regarding possible irq lock inversion dependency between tcf_lock,
and psample_groups_lock that is taken when holding tcf_lock in sample init:
[ 162.108959] Possible interrupt unsafe locking scenario:
[ 162.116386] CPU0 CPU1
[ 162.121277] ---- ----
[ 162.126162] lock(psample_groups_lock);
[ 162.130447] local_irq_disable();
[ 162.136772] lock(&(&p->tcfa_lock)->rlock);
[ 162.143957] lock(psample_groups_lock);
[ 162.150813] <Interrupt>
[ 162.153808] lock(&(&p->tcfa_lock)->rlock);
[ 162.158608]
*** DEADLOCK ***
In order to prevent potential lock inversion dependency between tcf_lock
and psample_groups_lock, extract call to psample_group_get() from tcf_lock
protected section in sample action init function.
Fixes: 4e232818bd32 ("net: sched: act_mirred: remove dependency on rtnl lock")
Fixes: 764e9a24480f ("net: sched: act_vlan: remove dependency on rtnl lock")
Fixes: 729e01260989 ("net: sched: act_tunnel_key: remove dependency on rtnl lock")
Fixes: d77284956656 ("net: sched: act_sample: remove dependency on rtnl lock")
Fixes: e8917f437006 ("net: sched: act_gact: remove dependency on rtnl lock")
Fixes: b6a2b971c0b0 ("net: sched: act_csum: remove dependency on rtnl lock")
Fixes: 2142236b4584 ("net: sched: act_bpf: remove dependency on rtnl lock")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Same as ip_vti, use iptunnel_xmit_stats to updates stats in tunnel xmit
code path.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| |\ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Daniel Borkmann says:
====================
pull-request: bpf 2018-08-18
The following pull-request contains BPF updates for your *net* tree.
The main changes are:
1) Fix a BPF selftest failure in test_cgroup_storage due to rlimit
restrictions, from Yonghong.
2) Fix a suspicious RCU rcu_dereference_check() warning triggered
from removing a device's XDP memory allocator by using the correct
rhashtable lookup function, from Tariq.
3) A batch of BPF sockmap and ULP fixes mainly fixing leaks and races
as well as enforcing module aliases for ULPs. Another fix for BPF
map redirect to make them work again with tail calls, from Daniel.
4) Fix XDP BPF samples to unload their programs upon SIGTERM, from Jesper.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Commits 109980b894e9 ("bpf: don't select potentially stale ri->map
from buggy xdp progs") and 7c3001313396 ("bpf: fix ri->map_owner
pointer on bpf_prog_realloc") tried to mitigate that buggy programs
using bpf_redirect_map() helper call do not leave stale maps behind.
Idea was to add a map_owner cookie into the per CPU struct redirect_info
which was set to prog->aux by the prog making the helper call as a
proof that the map is not stale since the prog is implicitly holding
a reference to it. This owner cookie could later on get compared with
the program calling into BPF whether they match and therefore the
redirect could proceed with processing the map safely.
In (obvious) hindsight, this approach breaks down when tail calls are
involved since the original caller's prog->aux pointer does not have
to match the one from one of the progs out of the tail call chain,
and therefore the xdp buffer will be dropped instead of redirected.
A way around that would be to fix the issue differently (which also
allows to remove related work in fast path at the same time): once
the life-time of a redirect map has come to its end we use it's map
free callback where we need to wait on synchronize_rcu() for current
outstanding xdp buffers and remove such a map pointer from the
redirect info if found to be present. At that time no program is
using this map anymore so we simply invalidate the map pointers to
NULL iff they previously pointed to that instance while making sure
that the redirect path only reads out the map once.
Fixes: 97f91a7cf04f ("bpf: add bpf_redirect_map helper routine")
Fixes: 109980b894e9 ("bpf: don't select potentially stale ri->map from buggy xdp progs")
Reported-by: Sebastiano Miano <sebastiano.miano@polito.it>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|