| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.
Commit afe3c3fd5392b2f0066930abc5dbd3f4b14a0f13 "svcrpc: fix failures to
handle -1 uid's and gid's" fixed part of the problem, but overlooked the
gid upcall--the kernel can request supplementary gid's for the -1 uid,
but mountd's attempt write a response will get -EINVAL.
Symptoms were nfsd failing to reply to the first attempt to use a newly
negotiated krb5 context.
Reported-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Tested-by: Sven Geggus <lists@fuchsschwanzdomain.de>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a cache entry is replaced, the "expiry_time" get set to
zero by a call to "cache_fresh_locked(..., 0)" at the end of
"sunrpc_cache_update".
This low expiry time makes cache_check() think that the 'refresh_age'
is negative, so the 'age' is comparatively large and a refresh is
triggered.
However refreshing a replaced entry it pointless, it cannot achieve
anything useful.
So teach cache_check to ignore a low refresh_age when expiry_time
is zero.
Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
commit d202cce8963d9268ff355a386e20243e8332b308
sunrpc: never return expired entries in sunrpc_cache_lookup
moved the 'entry is expired' test from cache_check to
sunrpc_cache_lookup, so that it happened early and some races could
safely be ignored.
However the ip_map (in svcauth_unix.c) has a separate single-item
cache which allows quick lookup without locking. An entry in this
case would not be subject to the expiry test and so could be used
well after it has expired.
This is not normally a big problem because the first time it is used
after it is expired an up-call will be scheduled to refresh the entry
(if it hasn't been scheduled already) and the old entry will then
be invalidated. So on the second attempt to use it after it has
expired, ip_map_cached_get will discard it.
However that is subtle and not ideal, so replace the "!cache_valid"
test with "cache_is_expired".
In doing this we drop the test on the "CACHE_VALID" bit. This is
unnecessary as the bit is never cleared, and an entry will only
be cached if the bit is set.
Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible for a race to set CACHE_PENDING after cache_clean()
has removed a cache entry from the cache.
If CACHE_PENDING is still set when the entry is finally 'put',
the cache_dequeue() will never happen and we can leak memory.
So set a new flag 'CACHE_CLEANED' when we remove something from
the cache, and don't queue any upcall if it is set.
If CACHE_PENDING is set before CACHE_CLEANED, the call that
cache_clean() makes to cache_fresh_unlocked() will free memory
as needed. If CACHE_PENDING is set after CACHE_CLEANED, the
test in sunrpc_cache_pipe_upcall will ensure that the memory
is not allocated.
Reported-by: <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cache_fresh_unlocked() is called when a cache entry
has been updated and ensures that if there were any
pending upcalls, they are cleared.
So every time we update a cache entry, we should call this,
and this should be the only way that we try to clear
pending calls (that sort of uniformity makes code sooo much
easier to read).
try_to_negate_entry() will (possibly) mark an entry as
negative. If it doesn't, it is because the entry already
is VALID.
So the entry will be valid on exit, so it is appropriate to
call cache_fresh_unlocked().
So tidy up try_to_negate_entry() to do that, and remove
partial open-coded cache_fresh_unlocked() from the one
call-site of try_to_negate_entry().
In the other branch of the 'switch(cache_make_upcall())',
we again have a partial open-coded version of cache_fresh_unlocked().
Replace that with a real call.
And again in cache_clean(), use a real call to cache_fresh_unlocked().
These call sites might previously have called
cache_revisit_request() if CACHE_PENDING wasn't set.
This is never necessary because cache_revisit_request() can
only do anything if the item is in the cache_defer_hash,
However any time that an item is added to the cache_defer_hash
(setup_deferral), the code immediately tests CACHE_PENDING,
and removes the entry again if it is clear. So all other
places we only need to 'cache_revisit_request' if we've
just cleared CACHE_PENDING.
Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We currently queue an upcall after setting CACHE_PENDING,
and dequeue after clearing CACHE_PENDING.
So a request should only be present when CACHE_PENDING is set.
However we don't combine the test and the enqueue/dequeue in
a protected region, so it is possible (if unlikely) for a race
to result in a request being queued without CACHE_PENDING set,
or a request to be absent despite CACHE_PENDING.
So: include a test for CACHE_PENDING inside the regions of
enqueue and dequeue where queue_lock is held, and abort
the operation if the value is not as expected.
Also remove the early 'return' from cache_dequeue() to ensure that it
always removes all entries: As there is no locking between setting
CACHE_PENDING and calling sunrpc_cache_pipe_upcall it is not
inconceivable for some other thread to clear CACHE_PENDING and then
someone else to set it and call sunrpc_cache_pipe_upcall, both before
the original threads completed the call.
With this, it perfectly safe and correct to:
- call cache_dequeue() if and only if we have just
cleared CACHE_PENDING
- call sunrpc_cache_pipe_upcall() (via cache_make_upcall)
if and only if we have just set CACHE_PENDING.
Reported-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: Bodo Stroesser <bstroesser@ts.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Though clients we care about mostly don't do this, it is possible for
rpc requests to be sent in multiple fragments. Here we have a sanity
check to ensure that the final received rpc isn't too small--except that
the number we're actually checking is the length of just the final
fragment, not of the whole rpc. So a perfectly legal rpc that's
unluckily fragmented could cause the server to close the connection
here.
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we detect that an rpc is too short, we abort and close the
connection. Except, there's a bug here: we're leaving sk_datalen
nonzero without leaving any pages in the sk_pages array. The most
likely result of the inconsistency is a subsequent crash in
svc_tcp_clear_pages.
Also demote the BUG_ON in svc_tcp_clear_pages to a WARN.
Cc: stable@kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
|
|
|
| |
Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
This will make it easier to enforce SP4_MACH_CRED, which needs to
compare the mechanism used on the exchange_id with that used on
protected operations.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|
|
|
|
|
| |
Common helper to zero out fields of the svc_cred.
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\
| |
| |
| | |
Merge bugfixes into my for-3.11 branch.
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As of f025adf191924e3a75ce80e130afcd2485b53bb8 "sunrpc: Properly decode
kuids and kgids in RPC_AUTH_UNIX credentials" any rpc containing a -1
(0xffff) uid or gid would fail with a badcred error.
Reported symptoms were xmbc clients failing on upgrade of the NFS
server; examination of the network trace showed them sending -1 as the
gid.
Reported-by: Julian Sikorski <belegdol@gmail.com>
Tested-by: Julian Sikorski <belegdol@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: stable@vger.kernel.org
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Somebody noticed LTP was complaining about O_NONBLOCK opens of
/proc/net/rpc/use-gss-proxy succeeding and then a following read
hanging.
I'm not convinced LTP really has any business opening random proc files
and expecting them to behave a certain way. Maybe this isn't really a
bug.
But in any case the O_NONBLOCK behavior could be useful for someone that
wants to test whether gss-proxy is up without waiting.
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This should return zero on success and -EBUSY on error so the type
needs to be int instead of bool.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The cache_detail(*detail) in function cache_is_valid is not used any
more.
Signed-off-by: fanchaoting <fanchaoting@cn.fujitsu.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|/
|
|
|
|
|
|
|
| |
XPRT_BOUND is set on server backchannel xprts by xs_setup_bc_tcp()
(using xprt_set_bound()), and is never cleared, so ->rpcbind() will
never need to be called.
Reported-by: "Myklebust, Trond" <Trond.Myklebust@netapp.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull audit changes from Eric Paris:
"Al used to send pull requests every couple of years but he told me to
just start pushing them to you directly.
Our touching outside of core audit code is pretty straight forward. A
couple of interface changes which hit net/. A simple argument bug
calling audit functions in namei.c and the removal of some assembly
branch prediction code on ppc"
* git://git.infradead.org/users/eparis/audit: (31 commits)
audit: fix message spacing printing auid
Revert "audit: move kaudit thread start from auditd registration to kaudit init"
audit: vfs: fix audit_inode call in O_CREAT case of do_last
audit: Make testing for a valid loginuid explicit.
audit: fix event coverage of AUDIT_ANOM_LINK
audit: use spin_lock in audit_receive_msg to process tty logging
audit: do not needlessly take a lock in tty_audit_exit
audit: do not needlessly take a spinlock in copy_signal
audit: add an option to control logging of passwords with pam_tty_audit
audit: use spin_lock_irqsave/restore in audit tty code
helper for some session id stuff
audit: use a consistent audit helper to log lsm information
audit: push loginuid and sessionid processing down
audit: stop pushing loginid, uid, sessionid as arguments
audit: remove the old depricated kernel interface
audit: make validity checking generic
audit: allow checking the type of audit message in the user filter
audit: fix build break when AUDIT_DEBUG == 2
audit: remove duplicate export of audit_enabled
Audit: do not print error when LSMs disabled
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
parameters by itself
__audit_socketcall is an extern function.
better to check its parameters by itself.
also can return error code, when fail (find invalid parameters).
also use macro instead of real hard code number
also give related comments for it.
Signed-off-by: Chen Gang <gang.chen@asianux.com>
[eparis: fix the return value when !CONFIG_AUDIT]
Signed-off-by: Eric Paris <eparis@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull nfsd fixes from Bruce Fields:
"Small fixes for two bugs and two warnings"
* 'for-3.10' of git://linux-nfs.org/~bfields/linux:
nfsd: fix oops when legacy_recdir_name_error is passed a -ENOENT error
SUNRPC: fix decoding of optional gss-proxy xdr fields
SUNRPC: Refactor gssx_dec_option_array() to kill uninitialized warning
nfsd4: don't allow owner override on 4.1 CLAIM_FH opens
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The current code works, but sort of by accident: it obviously didn't
intend the error return to be interpreted as "true".
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
net/sunrpc/auth_gss/gss_rpc_xdr.c: In function ‘gssx_dec_option_array’:
net/sunrpc/auth_gss/gss_rpc_xdr.c:258: warning: ‘creds’ may be used uninitialized in this function
Return early if count is zero, to make it clearer to the compiler (and the
casual reviewer) that no more processing is done.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull more NFS client bugfixes from Trond Myklebust:
- Ensure that we match the 'sec=' mount flavour against the server list
- Fix the NFSv4 byte range locking in the presence of delegations
- Ensure that we conform to the NFSv4.1 spec w.r.t. freeing lock
stateids
- Fix a pNFS data server connection race
* tag 'nfs-for-3.10-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS4.1 Fix data server connection race
NFSv3: match sec= flavor against server list
NFSv4.1: Ensure that we free the lock stateid on the server
NFSv4: Convert nfs41_free_stateid to use an asynchronous RPC call
SUNRPC: Don't spam syslog with "Pseudoflavor not found" messages
NFSv4.x: Fix handling of partially delegated locks
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Just convert those messages to dprintk()s so that they can be used
when debugging.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Rather than having logic to calculate inner protocol in every
tunnel gso handler move it to gso code. This simplifies code.
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Cong Wang <amwang@redhat.com>
Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull networking fixes from David Miller:
"Just a small pile of fixes"
1) Fix race conditions in IP fragmentation LRU list handling, from
Konstantin Khlebnikov.
2) vfree() is no longer verboten in interrupts, so deferring is
pointless, from Al Viro.
3) Conversion from mutex to semaphore in netpoll left trylock test
inverted, caught by Dan Carpenter.
4) 3c59x uses wrong base address when releasing regions, from Sergei
Shtylyov.
5) Bounds checking in TIPC from Dan Carpenter.
6) Fastopen cookies should not be expired as aggressively as other TCP
metrics. From Eric Dumazet.
7) Fix retrieval of MAC address in ibmveth, from Ben Herrenschmidt.
8) Don't use "u16" in virtio user headers, from Stephen Hemminger
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
tipc: potential divide by zero in tipc_link_recv_fragment()
tipc: add a bounds check in link_recv_changeover_msg()
net/usb: new driver for RTL8152
3c59x: fix freeing nonexistent resource on driver unload
netpoll: inverted down_trylock() test
rps_dev_flow_table_release(): no need to delay vfree()
fib_trie: no need to delay vfree()
net: frag, fix race conditions in LRU list maintenance
tcp: do not expire TCP fastopen cookies
net/eth/ibmveth: Fixup retrieval of MAC address
virtio: don't expose u16 in userspace api
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The worry here is that fragm_sz could be zero since it comes from
skb->data.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The bearer_id here comes from skb->data and it can be a number from 0 to
7. The problem is that the ->links[] array has only 2 elements so I
have added a range check.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The return value is reversed from mutex_trylock().
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The same story as with fib_trie patch - vfree() from RCU callbacks
is legitimate now.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that vfree() can be called from interrupt contexts, there's no
need to play games with schedule_work() to escape calling vfree()
from RCU callbacks.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch fixes race between inet_frag_lru_move() and inet_frag_lru_add()
which was introduced in commit 3ef0eb0db4bf92c6d2510fe5c4dc51852746f206
("net: frag, move LRU list maintenance outside of rwlock")
One cpu already added new fragment queue into hash but not into LRU.
Other cpu found it in hash and tries to move it to the end of LRU.
This leads to NULL pointer dereference inside of list_move_tail().
Another possible race condition is between inet_frag_lru_move() and
inet_frag_lru_del(): move can happens after deletion.
This patch initializes LRU list head before adding fragment into hash and
inet_frag_lru_move() doesn't touches it if it's empty.
I saw this kernel oops two times in a couple of days.
[119482.128853] BUG: unable to handle kernel NULL pointer dereference at (null)
[119482.132693] IP: [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.136456] PGD 2148f6067 PUD 215ab9067 PMD 0
[119482.140221] Oops: 0000 [#1] SMP
[119482.144008] Modules linked in: vfat msdos fat 8021q fuse nfsd auth_rpcgss nfs_acl nfs lockd sunrpc ppp_async ppp_generic bridge slhc stp llc w83627ehf hwmon_vid snd_hda_codec_hdmi snd_hda_codec_realtek kvm_amd k10temp kvm snd_hda_intel snd_hda_codec edac_core radeon snd_hwdep ath9k snd_pcm ath9k_common snd_page_alloc ath9k_hw snd_timer snd soundcore drm_kms_helper ath ttm r8169 mii
[119482.152692] CPU 3
[119482.152721] Pid: 20, comm: ksoftirqd/3 Not tainted 3.9.0-zurg-00001-g9f95269 #132 To Be Filled By O.E.M. To Be Filled By O.E.M./RS880D
[119482.161478] RIP: 0010:[<ffffffff812ede89>] [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.166004] RSP: 0018:ffff880216d5db58 EFLAGS: 00010207
[119482.170568] RAX: 0000000000000000 RBX: ffff88020882b9c0 RCX: dead000000200200
[119482.175189] RDX: 0000000000000000 RSI: 0000000000000880 RDI: ffff88020882ba00
[119482.179860] RBP: ffff880216d5db58 R08: ffffffff8155c7f0 R09: 0000000000000014
[119482.184570] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88020882ba00
[119482.189337] R13: ffffffff81c8d780 R14: ffff880204357f00 R15: 00000000000005a0
[119482.194140] FS: 00007f58124dc700(0000) GS:ffff88021fcc0000(0000) knlGS:0000000000000000
[119482.198928] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[119482.203711] CR2: 0000000000000000 CR3: 00000002155f0000 CR4: 00000000000007e0
[119482.208533] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[119482.213371] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[119482.218221] Process ksoftirqd/3 (pid: 20, threadinfo ffff880216d5c000, task ffff880216d3a9a0)
[119482.223113] Stack:
[119482.228004] ffff880216d5dbd8 ffffffff8155dcda 0000000000000000 ffff000200000001
[119482.233038] ffff8802153c1f00 ffff880000289440 ffff880200000014 ffff88007bc72000
[119482.238083] 00000000000079d5 ffff88007bc72f44 ffffffff00000002 ffff880204357f00
[119482.243090] Call Trace:
[119482.248009] [<ffffffff8155dcda>] ip_defrag+0x8fa/0xd10
[119482.252921] [<ffffffff815a8013>] ipv4_conntrack_defrag+0x83/0xe0
[119482.257803] [<ffffffff8154485b>] nf_iterate+0x8b/0xa0
[119482.262658] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.267527] [<ffffffff815448e4>] nf_hook_slow+0x74/0x130
[119482.272412] [<ffffffff8155c7f0>] ? inet_del_offload+0x40/0x40
[119482.277302] [<ffffffff8155d068>] ip_rcv+0x268/0x320
[119482.282147] [<ffffffff81519992>] __netif_receive_skb_core+0x612/0x7e0
[119482.286998] [<ffffffff81519b78>] __netif_receive_skb+0x18/0x60
[119482.291826] [<ffffffff8151a650>] process_backlog+0xa0/0x160
[119482.296648] [<ffffffff81519f29>] net_rx_action+0x139/0x220
[119482.301403] [<ffffffff81053707>] __do_softirq+0xe7/0x220
[119482.306103] [<ffffffff81053868>] run_ksoftirqd+0x28/0x40
[119482.310809] [<ffffffff81074f5f>] smpboot_thread_fn+0xff/0x1a0
[119482.315515] [<ffffffff81074e60>] ? lg_local_lock_cpu+0x40/0x40
[119482.320219] [<ffffffff8106d870>] kthread+0xc0/0xd0
[119482.324858] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.329460] [<ffffffff816c32dc>] ret_from_fork+0x7c/0xb0
[119482.334057] [<ffffffff8106d7b0>] ? insert_kthread_work+0x40/0x40
[119482.338661] Code: 00 00 55 48 8b 17 48 b9 00 01 10 00 00 00 ad de 48 8b 47 08 48 89 e5 48 39 ca 74 29 48 b9 00 02 20 00 00 00 ad de 48 39 c8 74 7a <4c> 8b 00 4c 39 c7 75 53 4c 8b 42 08 4c 39 c7 75 2b 48 89 42 08
[119482.343787] RIP [<ffffffff812ede89>] __list_del_entry+0x29/0xd0
[119482.348675] RSP <ffff880216d5db58>
[119482.353493] CR2: 0000000000000000
Oops happened on this path:
ip_defrag() -> ip_frag_queue() -> inet_frag_lru_move() -> list_move_tail() -> __list_del_entry()
Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Florian Westphal <fw@strlen.de>
Cc: Eric Dumazet <edumazet@google.com>
Cc: David S. Miller <davem@davemloft.net>
Acked-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
TCP metric cache expires entries after one hour.
This probably make sense for TCP RTT/RTTVAR/CWND, but not
for TCP fastopen cookies.
Its better to try previous cookie. If it appears to be obsolete,
server will send us new cookie anyway.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|\ \ \ \ \
| |/ / / /
|/| | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client
Pull Ceph changes from Alex Elder:
"This is a big pull.
Most of it is culmination of Alex's work to implement RBD image
layering, which is now complete (yay!).
There is also some work from Yan to fix i_mutex behavior surrounding
writes in cephfs, a sync write fix, a fix for RBD images that get
resized while they are mapped, and a few patches from me that resolve
annoying auth warnings and fix several bugs in the ceph auth code."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (254 commits)
rbd: fix image request leak on parent read
libceph: use slab cache for osd client requests
libceph: allocate ceph message data with a slab allocator
libceph: allocate ceph messages with a slab allocator
rbd: allocate image object names with a slab allocator
rbd: allocate object requests with a slab allocator
rbd: allocate name separate from obj_request
rbd: allocate image requests with a slab allocator
rbd: use binary search for snapshot lookup
rbd: clear EXISTS flag if mapped snapshot disappears
rbd: kill off the snapshot list
rbd: define rbd_snap_size() and rbd_snap_features()
rbd: use snap_id not index to look up snap info
rbd: look up snapshot name in names buffer
rbd: drop obj_request->version
rbd: drop rbd_obj_method_sync() version parameter
rbd: more version parameter removal
rbd: get rid of some version parameters
rbd: stop tracking header object version
rbd: snap names are pointer to constant data
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Create a slab cache to manage allocation of ceph_osdc_request
structures.
This resolves:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Create a slab cache to manage ceph_msg_data structure allocation.
This is part of:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Create a slab cache to manage ceph_msg structure allocation.
This is part of:
http://tracker.ceph.com/issues/3926
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This creates a new source file "net/ceph/snapshot.c" to contain
utility routines related to ceph snapshot contexts. The main
motivation was to define ceph_create_snap_context() as a common way
to create these structures, but I've moved the definitions of
ceph_get_snap_context() and ceph_put_snap_context() there too.
(The benefit of inlining those is very small, and I'd rather
keep this collection of functions together.)
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A WATCH op includes an object version. The version that's supplied
is incorrectly byte-swapped osd_req_op_watch_init() where it's first
assigned (it's been this way since that code was first added).
The result is that the version sent to the osd is wrong, because
that value gets byte-swapped again in osd_req_encode_op(). This
is the source of a sparse warning related to improper byte order in
the assignment.
The approach of using the version to avoid a race is deprecated
(see http://tracker.ceph.com/issues/3871), and the watch parameter
is no longer even examined by the osd. So fix the assignment in
osd_req_op_watch_init() so it no longer does the byte swap.
This resolves:
http://tracker.ceph.com/issues/3847
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Add the ability to provide an array of pages as outbound request
data for object class method calls.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch makes four small changes in the ceph messenger.
While getting copyup functionality working I found two bugs in the
messenger. Existing paths through the code did not trigger these
problems, but they're fixed here:
- In ceph_msg_data_pagelist_cursor_init(), the cursor's
last_piece field was being checked against the length
supplied. This was OK until this commit: ccba6d98 libceph:
implement multiple data items in a message That commit changed
the cursor init routines to allow lengths to be supplied that
exceeded the size of the current data item. Because of this,
we have to use the assigned cursor resid field rather than the
provided length in determining whether the cursor points to
the last piece of a data item.
- In ceph_msg_data_add_pages(), a BUG_ON() was erroneously
catching attempts to add page data to a message if the message
already had data assigned to it. That was OK until that same
commit, at which point it was fine for messages to have
multiple data items. It slipped through because that BUG_ON()
call was present twice in that function. (You can never be too
careful.)
In addition two other minor things are changed:
- In ceph_msg_data_cursor_init(), the local variable "data" was
getting assigned twice.
- In ceph_msg_data_advance(), it was assumed that the
type-specific advance routine would set new_piece to true
after it advanced past the last piece. That may have been
fine, but since we check for that case we might as well set it
explicitly in ceph_msg_data_advance().
This resolves:
http://tracker.ceph.com/issues/4762
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Allow osd request ops that aren't otherwise structured (not class,
extent, or watch ops) to specify "raw" data to be used to hold
incoming data for the op. Make use of this capability for the osd
STAT op.
Prefix the name of the private function osd_req_op_init() with "_",
and expose a new function by that (earlier) name whose purpose is to
initialize osd ops with (only) implied data.
For now we'll just support the use of a page array for an osd op
with incoming raw data.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There are a bunch of functions defined to encapsulate getting the
address of a data field for a particular op in an osd request.
They're all defined the same way, so create a macro to take the
place of all of them.
Two of these are used outside the osd client code, so preserve them
(but convert them to use the new macro internally). Stop exporting
the ones that aren't used elsewhere.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In the incremental move toward supporting distinct data items in an
osd request some of the functions had "write_request" parameters to
indicate, basically, whether the data belonged to in_data or the
out_data. Now that we maintain the data fields in the op structure
there is no need to indicate the direction, so get rid of the
"write_request" parameters.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
An osd request currently has two callbacks. They inform the
initiator of the request when we've received confirmation for the
target osd that a request was received, and when the osd indicates
all changes described by the request are durable.
The only time the second callback is used is in the ceph file system
for a synchronous write. There's a race that makes some handling of
this case unsafe. This patch addresses this problem. The error
handling for this callback is also kind of gross, and this patch
changes that as well.
In ceph_sync_write(), if a safe callback is requested we want to add
the request on the ceph inode's unsafe items list. Because items on
this list must have their tid set (by ceph_osd_start_request()), the
request added *after* the call to that function returns. The
problem with this is that there's a race between starting the
request and adding it to the unsafe items list; the request may
already be complete before ceph_sync_write() even begins to put it
on the list.
To address this, we change the way the "safe" callback is used.
Rather than just calling it when the request is "safe", we use it to
notify the initiator the bounds (start and end) of the period during
which the request is *unsafe*. So the initiator gets notified just
before the request gets sent to the osd (when it is "unsafe"), and
again when it's known the results are durable (it's no longer
unsafe). The first call will get made in __send_request(), just
before the request message gets sent to the messenger for the first
time. That function is only called by __send_queued(), which is
always called with the osd client's request mutex held.
We then have this callback function insert the request on the ceph
inode's unsafe list when we're told the request is unsafe. This
will avoid the race because this call will be made under protection
of the osd client's request mutex. It also nicely groups the setup
and cleanup of the state associated with managing unsafe requests.
The name of the "safe" callback field is changed to "unsafe" to
better reflect its new purpose. It has a Boolean "unsafe" parameter
to indicate whether the request is becoming unsafe or is now safe.
Because the "msg" parameter wasn't used, we drop that.
This resolves the original problem reportedin:
http://tracker.ceph.com/issues/4706
Reported-by: Yan, Zheng <zheng.z.yan@intel.com>
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Yan, Zheng <zheng.z.yan@intel.com>
Reviewed-by: Sage Weil <sage@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Right now the data for a method call is specified via a pointer and
length, and it's copied--along with the class and method name--into
a pagelist data item to be sent to the osd. Instead, encode the
data in a data item separate from the class and method names.
This will allow large amounts of data to be supplied to methods
without copying. Only rbd uses the class functionality right now,
and when it really needs this it will probably need to use a page
array rather than a page list. But this simple implementation
demonstrates the functionality on the osd client, and that's enough
for now.
This resolves:
http://tracker.ceph.com/issues/4104
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Change the names of the functions that put data on a pagelist to
reflect that we're adding to whatever's already there rather than
just setting it to the one thing. Currently only one data item is
ever added to a message, but that's about to change.
This resolves:
http://tracker.ceph.com/issues/2770
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This patch adds support to the messenger for more than one data item
in its data list.
A message data cursor has two more fields to support this:
- a count of the number of bytes left to be consumed across
all data items in the list, "total_resid"
- a pointer to the head of the list (for validation only)
The cursor initialization routine has been split into two parts: the
outer one, which initializes the cursor for traversing the entire
list of data items; and the inner one, which initializes the cursor
to start processing a single data item.
When a message cursor is first initialized, the outer initialization
routine sets total_resid to the length provided. The data pointer
is initialized to the first data item on the list. From there, the
inner initialization routine finishes by setting up to process the
data item the cursor points to.
Advancing the cursor consumes bytes in total_resid. If the resid
field reaches zero, it means the current data item is fully
consumed. If total_resid indicates there is more data, the cursor
is advanced to point to the next data item, and then the inner
initialization routine prepares for using that. (A check is made at
this point to make sure we don't wrap around the front of the list.)
The type-specific init routines are modified so they can be given a
length that's larger than what the data item can support. The resid
field is initialized to the smaller of the provided length and the
length of the entire data item.
When total_resid reaches zero, we're done.
This resolves:
http://tracker.ceph.com/issues/3761
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
In place of the message data pointer, use a list head which links
through message data items. For now we only support a single entry
on that list.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Rather than having a ceph message data item point to the cursor it's
associated with, have the cursor point to a data item. This will
allow a message cursor to be used for more than one data item.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A message will only be processing a single data item at a time, so
there's no need for each data item to have its own cursor.
Move the cursor embedded in the message data structure into the
message itself. To minimize the impact, keep the data->cursor
field, but make it be a pointer to the cursor in the message.
Move the definition of ceph_msg_data above ceph_msg_data_cursor so
the cursor can point to the data without a forward definition rather
than vice-versa.
This and the upcoming patches are part of:
http://tracker.ceph.com/issues/3761
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>
|