| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
| |
This patch introduces background_gc=sync, which enables synchronous
cleaning in the background.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
This patch introduces a new ioctl for users who want to trigger a
checkpoint from userspace.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch drops batched gc triggered through ioctl, since the user can
easily control gc by building a loop around the ->ioctl call.
We support synchronous gc by forcing FG_GC in f2fs_gc, so the user can be
sure that all blocks gced in this round are persistent on the device by
the time the ioctl returns.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
When searching for a victim during gc, if there are no dirty segments in
the filesystem, we still take the time to search the whole dirty segment
map. That search is unnecessary, so skip it in this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
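The early exit described in this commit can be sketched in user-space C as follows. This is only an illustration: struct dirty_info, pick_victim and NO_VICTIM are hypothetical names, not the f2fs data structures, and the sketch assumes the caller keeps nr_dirty in sync with the map.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_VICTIM ((size_t)-1)

/* Hypothetical stand-in for the dirty segment bookkeeping. */
struct dirty_info {
	const bool *dirty_map;	/* one flag per segment */
	size_t nr_segments;	/* number of entries in dirty_map */
	size_t nr_dirty;	/* number of set flags, maintained by the caller */
};

/*
 * Return the index of the first dirty segment, or NO_VICTIM.
 * *scanned reports how many map entries were inspected.
 */
static size_t pick_victim(const struct dirty_info *di, size_t *scanned)
{
	assert(di != NULL && scanned != NULL);
	*scanned = 0;

	/* Fast path: with no dirty segments, walking the map finds nothing. */
	if (di->nr_dirty == 0)
		return NO_VICTIM;

	for (size_t i = 0; i < di->nr_segments; i++) {
		(*scanned)++;
		if (di->dirty_map[i])
			return i;
	}
	return NO_VICTIM;
}
```

With nr_dirty equal to zero the function returns without touching the map, which is the saving the commit message describes.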
|
|
|
|
|
|
|
|
|
|
|
| |
When doing gc, we search for a victim in the dirty map, starting from the
position of the last victim. Once we reach the end of the dirty map, we
reset the search position and then search the whole dirty map. So sometimes
we search the range [victim, last] twice, which is redundant. This patch
avoids this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
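One way to visit every position exactly once when starting from the last victim is a wrap-around scan. The sketch below is an assumption about the general technique, not the code in this patch; search_from and NO_VICTIM are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NO_VICTIM ((size_t)-1)

/*
 * Scan dirty_map circularly, starting at last_victim and wrapping to 0,
 * so that each position is inspected at most once.
 * *visited reports how many positions were inspected.
 */
static size_t search_from(const bool *dirty_map, size_t nr_segments,
			  size_t last_victim, size_t *visited)
{
	assert(nr_segments > 0 && last_victim < nr_segments);
	*visited = 0;

	for (size_t step = 0; step < nr_segments; step++) {
		size_t pos = (last_victim + step) % nr_segments;

		(*visited)++;
		if (dirty_map[pos])
			return pos;
	}
	return NO_VICTIM;
}
```

Even in the worst case, where nothing is dirty, the loop inspects nr_segments entries rather than rescanning the range [victim, last].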
|
|
|
|
|
|
|
|
|
|
|
|
| |
The hit stat of the extent cache keeps increasing until remount, and we
use the atomic_t type for the stat variable, so it can easily overflow
when we query the extent cache frequently on a long-running fs.
To avoid that, this patch uses atomic64_t for the hit stat variables.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
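The wraparound can be reproduced in user space with C11 atomics. This is a sketch under stated assumptions: it uses <stdatomic.h> rather than the kernel's atomic_t and atomic64_t, and struct hit_stat_sketch and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical counters modelling a 32-bit and a 64-bit hit statistic. */
struct hit_stat_sketch {
	_Atomic uint32_t hits32;
	_Atomic uint64_t hits64;
};

/* Start both counters at the same value. */
static void hit_stat_init(struct hit_stat_sketch *s, uint32_t start)
{
	assert(s != NULL);
	atomic_init(&s->hits32, start);
	atomic_init(&s->hits64, start);
}

/* Record one cache hit in both counters. */
static void record_hit(struct hit_stat_sketch *s)
{
	atomic_fetch_add(&s->hits32, 1);
	atomic_fetch_add(&s->hits64, 1);
}

static uint32_t read_hits32(struct hit_stat_sketch *s)
{
	return atomic_load(&s->hits32);
}

static uint64_t read_hits64(struct hit_stat_sketch *s)
{
	return atomic_load(&s->hits64);
}
```

Starting both counters at UINT32_MAX and recording one more hit wraps the 32-bit counter to 0, while the 64-bit counter continues to 4294967296.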
|
|
|
|
|
|
| |
This patch introduces f2fs_kvmalloc to avoid -ENOMEM during mount.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
If we do not call get_victim first, we cannot get a new victim in the retry
path.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch fixes the count of freed sections maintained during garbage
collection when a foreground gc is triggered.
Besides, when a foreground gc is running on the currently selected section
and we fail to gc one segment, it's better to abandon gcing the remaining
segments in that section. We will select the next victim for foreground gc
anyway, so gc on the remaining segments of the previous section only adds
overhead and causes long latency for the caller.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
This patch fixes ->setattr to update ctime and mtime correctly when
truncating a file to a larger size.
The bug is reported by xfstest generic/313 as below:
generic/313 2s ... - output mismatch (see ./results/generic/313.out.bad)
--- tests/generic/313.out 2015-08-04 15:28:53.430798882 +0800
+++ results/generic/313.out.bad 2015-09-28 17:04:27.294278016 +0800
@@ -1,2 +1,4 @@
QA output created by 313
Silence is golden
+ctime not updated after truncate up
+mtime not updated after truncate up
...
(Run 'diff -u tests/generic/313.out tests/generic/313.out.bad' to see the entire diff)
Ran: generic/313
Failures: generic/313
Failed 1 of 1 tests
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, we skipped dentry block writes when wbc is SYNC_NONE, there is
no memory pressure, and the number of dirty pages is pretty small.
But we didn't skip normal data writes, so the skipping has little impact
on overall performance.
Moreover, by skipping some data writes, kworker falls into an infinite loop
trying to write blocks when many dir inodes have only one dentry block.
So, this patch removes skipping data writes.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Protecting the recovery flow with cp_rwsem is not needed, since we have
already prevented any checkpoint from being triggered by locking cp_mutex.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In update_sit_info, we use div_u64 to handle the 'u64 divided by u64' case,
but div_u64 can only handle a 32-bit divisor, so our u64 divisor is
truncated when passed to div_u64. This results in a wrong calculation when
showing f2fs debug info, as below:
BDF: 464, avg. vblocks: 23509
(BDF should never exceed 100)
So change to div64_u64 to handle this case correctly.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
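The truncation can be illustrated with a user-space model. The two helpers below are hypothetical; they only mirror the parameter widths named in the commit message (a 32-bit divisor versus a 64-bit divisor) and are not the kernel implementations.

```c
#include <assert.h>
#include <stdint.h>

/* Model of a helper whose divisor parameter is only 32 bits wide. */
static uint64_t model_div_u64(uint64_t dividend, uint32_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}

/* Model of a helper that accepts a full 64-bit divisor. */
static uint64_t model_div64_u64(uint64_t dividend, uint64_t divisor)
{
	assert(divisor != 0);
	return dividend / divisor;
}
```

For a divisor of 0x100000064 (2^32 + 100) and a dividend of 50 times that value, the 64-bit model returns 50. Narrowing the divisor to 32 bits leaves only 100, so the 32-bit model returns 2147483698, which is the kind of inflated result shown in the debug output above.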
|
|
|
|
|
|
|
| |
This patch adds a new helper __try_update_largest_extent for cleanup.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
recover_inline_data
This fixes error handling in recover_inline_data for its calls to
truncate_inline_inode and truncate_blocks. Check whether each call
returns an error code or the boolean value false, and if so return false
to the caller of recover_inline_data immediately, as we cannot continue
after either function fails.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
Switching the extent_cache option dynamically on remount may cause a
consistency issue between the extent cache and the dnode page. This patch
avoids that condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
We introduced F2FS_GET_BLOCK_READ in commit e2b4e2bc8865 ("f2fs: fix
incorrect mapping for bmap"), but forgot to use this flag in the right
place. Fix it.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
Here is an oops reported with the following message when testing
generic/019 of xfstests:
------------[ cut here ]------------
kernel BUG at /home/yuchao/git/f2fs-dev/segment.c:882!
invalid opcode: 0000 [#1] SMP
Modules linked in: zram lz4_compress lz4_decompress f2fs(O) ip6table_filter ip6_tables ebtable_nat ebtables nf_conntrack_ipv4
nf_def
CPU: 2 PID: 25441 Comm: fio Tainted: G O 4.3.0-rc1+ #6
Hardware name: Hewlett-Packard HP Z220 CMT Workstation/1790, BIOS K51 v01.61 05/16/2013
task: ffff8803f4e85580 ti: ffff8803fd61c000 task.ti: ffff8803fd61c000
RIP: 0010:[<ffffffffa0784981>] [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
RSP: 0018:ffff8803fd61f918 EFLAGS: 00010246
RAX: 00000000000007ed RBX: 0000000000000224 RCX: 000000000000001f
RDX: 0000000000000800 RSI: ffffffffffffffff RDI: ffff8803f56f4300
RBP: ffff8803fd61f978 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000024 R11: ffff8800d23bbd78 R12: ffff8800d0ef0000
R13: 0000000000000224 R14: 0000000000000000 R15: 0000000000000001
FS: 00007f827ff85700(0000) GS:ffff88041ea80000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffff600000 CR3: 00000003fef17000 CR4: 00000000001406e0
Stack:
000007ea00000002 0000000100000001 ffff8803f6456248 000007ed0000002b
0000000000000224 ffff880404d1aa20 ffff8803fd61f9c8 ffff8800d0ef0000
ffff8803f6456248 0000000000000001 00000000ffffffff ffffffffa078f358
Call Trace:
[<ffffffffa0785b87>] allocate_segment_by_default+0x1a7/0x1f0 [f2fs]
[<ffffffffa078322c>] allocate_data_block+0x17c/0x360 [f2fs]
[<ffffffffa0779521>] __allocate_data_block+0x131/0x1d0 [f2fs]
[<ffffffffa077a995>] f2fs_direct_IO+0x4b5/0x580 [f2fs]
[<ffffffff811510ae>] generic_file_direct_write+0xae/0x160
[<ffffffff811518f5>] __generic_file_write_iter+0xd5/0x1f0
[<ffffffff81151e07>] generic_file_write_iter+0xf7/0x200
[<ffffffff81319e38>] ? apparmor_file_permission+0x18/0x20
[<ffffffffa0768480>] ? f2fs_fallocate+0x1190/0x1190 [f2fs]
[<ffffffffa07684c6>] f2fs_file_write_iter+0x46/0x90 [f2fs]
[<ffffffff8120b4fe>] aio_run_iocb+0x1ee/0x290
[<ffffffff81700f7e>] ? mutex_lock+0x1e/0x50
[<ffffffff8120a1d7>] ? aio_read_events+0x207/0x2b0
[<ffffffff8120b913>] do_io_submit+0x373/0x630
[<ffffffff8120a4f6>] ? SyS_io_getevents+0x56/0xb0
[<ffffffff8120bbe0>] SyS_io_submit+0x10/0x20
[<ffffffff81703857>] entry_SYSCALL_64_fastpath+0x12/0x6a
Code: 45 c8 48 8b 78 10 e8 9f 23 bf e0 41 8b 8c 24 cc 03 00 00 89 c7 31 d2 89 c6 89 d8 29 df f7 f1 29 d1 39 cf 0f 83 be fd ff ff eb
RIP [<ffffffffa0784981>] new_curseg+0x321/0x330 [f2fs]
RSP <ffff8803fd61f918>
---[ end trace 2e577d7f711ddb86 ]---
The reason is that in the generic/019 test we trigger an artificial IO
error in the block layer through debugfs. After that, prefree segments are
no longer freed, because we always skip gc and checkpoint once an IO error
has occurred.
Meanwhile, fio with the aio engine generates a large number of direct IOs,
which keep allocating space in free segments until we run out of them.
This eventually results in a panic in new_curseg, as no free segment can
be found.
So, this patch changes direct_IO to return EIO in this condition.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
truncate_data_blocks_range can truncate in batches, which applies all
changes to dnode page content, dnode page status, extent cache, and block
count together.
But previously, truncate_hole() always truncated one block in the dnode
page at a time by invoking truncate_data_blocks_range(,1), which made
things slow.
This patch changes truncate_hole() to truncate all target blocks in one
direct node as a single batch inside truncate_data_blocks_range, which
makes our punch hole operation in ->fallocate more efficient.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
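The cost difference between per-block and batched truncation can be modelled with a counter for the per-call bookkeeping. This is a simplified sketch: struct node_sketch, truncate_range and punch_one_by_one are hypothetical, and a single counter stands in for the dnode status, extent cache and block count updates.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_ADDRS 8

/* Hypothetical stand-in for one direct node. */
struct node_sketch {
	unsigned int addr[SKETCH_ADDRS];	/* 0 means unallocated */
	size_t metadata_updates;		/* per-call bookkeeping runs */
};

/* Free count blocks starting at ofs; bookkeeping runs once per call. */
static size_t truncate_range(struct node_sketch *n, size_t ofs, size_t count)
{
	size_t freed = 0;

	assert(ofs <= SKETCH_ADDRS && count <= SKETCH_ADDRS - ofs);
	for (size_t i = ofs; i < ofs + count; i++) {
		if (n->addr[i] != 0) {
			n->addr[i] = 0;
			freed++;
		}
	}
	if (freed > 0)
		n->metadata_updates++;
	return freed;
}

/* Old behaviour in this model: one call, and one update, per block. */
static size_t punch_one_by_one(struct node_sketch *n, size_t ofs, size_t count)
{
	size_t freed = 0;

	for (size_t i = 0; i < count; i++)
		freed += truncate_range(n, ofs + i, 1);
	return freed;
}
```

Punching four allocated blocks one by one runs the bookkeeping four times in this model, while a single truncate_range call over the same range runs it once.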
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix 2 potential problems:
1. when the largest extent needs to be invalidated, it is reset in
__drop_largest_extent, which makes the later __is_extent_same always
return false and leaves the largest extent unchanged. Now we update it
properly.
2. when an extent is split and the latter part remains in the tree, next_en
should be the latter part instead of the next extent of the original
extent. This would cause a merge failure if there were an in-place update.
Although there is none, I think this fix still makes the code less
ambiguous.
This patch also simplifies the code for invalidating extents, and optimizes
the procedure that splits an extent into two.
There are a few modifications since the last patch:
1. prev_en is now updated properly.
2. more code and branches are simplified.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
Now that we update extents by range, fofs may not be on the largest
extent if the new extent overlaps with it. So add a new function
to drop the largest extent properly.
Signed-off-by: Fan li <fanofcode.li@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
| |
This patch avoids producing new checkpoint blocks before the previous meta
pages have been written completely.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
We got dentry pages from high_mem, and its address space directly goes into the
decryption path via f2fs_fname_disk_to_usr.
But, sg_init_one assumes the address is not from high_mem, so we can get this
panic since it doesn't call kmap_high but kunmap_high is triggered at the end.
kernel BUG at ../../../../../../kernel/mm/highmem.c:290!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP ARM
...
(kunmap_high+0xb0/0xb8) from [<c0114534>] (__kunmap_atomic+0xa0/0xa4)
(__kunmap_atomic+0xa0/0xa4) from [<c035f028>] (blkcipher_walk_done+0x128/0x1ec)
(blkcipher_walk_done+0x128/0x1ec) from [<c0366c24>] (crypto_cbc_decrypt+0xc0/0x170)
(crypto_cbc_decrypt+0xc0/0x170) from [<c0367148>] (crypto_cts_decrypt+0xc0/0x114)
(crypto_cts_decrypt+0xc0/0x114) from [<c035ea98>] (async_decrypt+0x40/0x48)
(async_decrypt+0x40/0x48) from [<c032ca34>] (f2fs_fname_disk_to_usr+0x124/0x304)
(f2fs_fname_disk_to_usr+0x124/0x304) from [<c03056fc>] (f2fs_fill_dentries+0xac/0x188)
(f2fs_fill_dentries+0xac/0x188) from [<c03059c8>] (f2fs_readdir+0x1f0/0x300)
(f2fs_readdir+0x1f0/0x300) from [<c0218054>] (vfs_readdir+0x90/0xb4)
(vfs_readdir+0x90/0xb4) from [<c0218418>] (SyS_getdents64+0x64/0xcc)
(SyS_getdents64+0x64/0xcc) from [<c0105ba0>] (ret_fast_syscall+0x0/0x30)
Cc: <stable@vger.kernel.org>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| |
In this patch, we reorganize f2fs_map_blocks to make the block mapping
flow clearer by using the following structure:
/* check status of mapping */
if (unmapped) {
	/* blkaddr == NULL_ADDR || blkaddr == NEW_ADDR */
	if (create) {
		/* write path, handle dio write case here */
		alloc_and_map;
	} else {
		/*
		 * handle read cases from all call paths:
		 * 1. generic read;
		 * 2. dio read;
		 * 3. fiemap;
		 * 4. bmap
		 */
	}
}
/* map buffer_header */
Besides, this patch correctly handles a missing case for dio write:
when __allocate_data_blocks fails, f2fs_map_blocks does not allocate
blocks correctly for preallocated blocks, but returns with an unmapped
buffer head, which results in failure of the dio write.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
| |
This function should be static.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a potential overflow issue when calculating the size of an object.
When we left shift index by PAGE_CACHE_SHIFT bits and the type of index is
only 32 bits wide, as on a 32-bit architecture, the left shift overflows,
e.g.:
pgoff_t index = 0xFFFFFFFF;
loff_t size = index << PAGE_CACHE_SHIFT;
size: 0xFFFFF000
So we should cast index to a 64-bit type to avoid this issue.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
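The example in this commit message can be reproduced in user space by modelling the 32-bit index with uint32_t. The names below are hypothetical, and SKETCH_PAGE_SHIFT assumes 4 KiB pages, which matches the 0xFFFFF000 result quoted above.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

static_assert(SKETCH_PAGE_SHIFT < 32,
	      "shift must be smaller than the 32-bit index width");

/* Shift performed in 32 bits: the high bits are lost before widening. */
static int64_t size_truncated(uint32_t index)
{
	return index << SKETCH_PAGE_SHIFT;
}

/* Widen first, then shift: the full value is preserved. */
static int64_t size_widened(uint32_t index)
{
	return (int64_t)index << SKETCH_PAGE_SHIFT;
}
```

For index 0xFFFFFFFF, size_truncated yields 0xFFFFF000 as in the message, while size_widened yields 0xFFFFFFFF000.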
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When shrinking extent cache, we have two steps in the flow:
1) shrink objects which are unreferenced by inodes;
2) shrink objects from LRU list of extent cache.
In step 1, if we haven't shrunk enough objects, we try step 2. But before
that we didn't update the search position, which may point to the last
inode index in the global extent tree, so we fail to shrink objects by
traversing all inodes' extent trees.
In this patch, we fix this by resetting the search position to the
beginning of the global extent tree.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
This patch verifies the file type early in f2fs_fallocate as a cleanup,
and also adds the missing verification for expand_inode_data.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
| |
As the comment says, we don't need to call f2fs_lock_op in write_inode to
prevent producing dirty node pages all the time.
That happens only when there are not enough free sections, and we can avoid
that by calling balance_fs prior to that.
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
| |
This number is referenced by checkpoint under node_write lock.
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
f2fs_ioc_release_volatile_write
This fixes the return statement at the end of
f2fs_ioc_release_volatile_write, which incorrectly returns zero. The
preceding call to punch_hole can fail, so return its return value directly
in order to signal the failure to callers of
f2fs_ioc_release_volatile_write.
Signed-off-by: Nicholas Krause <xerofoify@gmail.com>
Reviewed-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|
|
|
|
|
|
|
| |
Rename trace_f2fs_update_extent_tree to trace_f2fs_update_extent_tree_range,
then expand and enable it to trace batched extent info updates.
Signed-off-by: Chao Yu <chao2.yu@samsung.com>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull NFS client bugfixes from Trond Myklebust:
"Highlights include:
Bugfixes:
- Fix a use-after-free bug in the RPC/RDMA client
- Fix a write performance regression
- Fix up page writeback accounting
- Don't try to reclaim unused state owners
- Fix a NFSv4 nograce recovery hang
- reset states to use open_stateid when returning delegation
voluntarily
- Fix a tracepoint NULL-pointer dereference"
* tag 'nfs-for-4.3-3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFS: Fix a tracepoint NULL-pointer dereference
nfs4: reset states to use open_stateid when returning delegation voluntarily
NFSv4: Fix a nograce recovery hang
NFSv4.1: nfs4_opendata_check_deleg needs to handle NFS4_OPEN_CLAIM_DELEG_CUR_FH
NFSv4: Don't try to reclaim unused state owners
NFS: Fix a write performance regression
NFS: Fix up page writeback accounting
xprtrdma: disconnect and flush cqs before freeing buffers
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Running xfstest generic/013 with the tracepoint nfs:nfs4_open_file
enabled produces a NULL-pointer dereference when calculating fileid and
filehandle of the opened file. Fix this by checking if state is NULL
before trying to use the inode pointer.
Reported-by: Olga Kornievskaia <aglo@umich.edu>
Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When the client goes to return a delegation, it should always update any
nfs4_state currently set up to use that delegation stateid to instead
use the open stateid. It already does do this in some cases,
particularly in the state recovery code, but not currently when the
delegation is voluntarily returned (e.g. in advance of a RENAME). This
causes the client to try to continue using the delegation stateid after
the DELEGRETURN, e.g. in LAYOUTGET.
Set the nfs4_state back to using the open stateid in
nfs4_open_delegation_recall, just before clearing the
NFS_DELEGATED_STATE bit.
Signed-off-by: Jeff Layton <jeff.layton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Since commit 5cae02f42793130e1387f4ec09c4d07056ce9fa5 an OPEN_CONFIRM should
have a privileged sequence in the recovery case to allow nograce recovery to
proceed for NFSv4.0.
Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| | |
We need to warn against broken NFSv4.1 servers that try to hand out
delegations in response to NFS4_OPEN_CLAIM_DELEG_CUR_FH.
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, we don't test if the state owner is in use before we try to
recover it. The problem is that if the refcount is zero, then the
state owner will be waiting on the lru list for garbage collection.
The expectation in that case is that if you bump the refcount, then
you must also remove the state owner from the lru list. Otherwise
the call to nfs4_put_state_owner will corrupt that list by trying
to add our state owner a second time.
Avoid the whole problem by just skipping state owners that hold no
state.
Reported-by: Andrew W Elble <aweits@rit.edu>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If all other conditions in nfs_can_extend_write() are met, and there
are no locks, then we should be able to assume close-to-open semantics
and the ability to extend our write to cover the whole page.
With this patch, the xfstests generic/074 test completes in 242s instead
of >1400s on my test rig.
Fixes: bd61e0a9c852 ("locks: convert posix locks to file_lock_context")
Cc: Jeff Layton <jlayton@primarydata.com>
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently, we are crediting all the calls to nfs_writepages_callback()
(i.e. the nfs_writepages() callback) to nfs_writepage(). Aside from
being inconsistent with the behaviour of the equivalent readpage/readpages
accounting, this also means that we cannot distinguish between bulk writes
and single page writebacks (which confuses the 'nfsiostat -p' tool).
Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull CIFS fixes from Steve French:
"Two fixes for problems pointed out by automated tools.
Thanks PaX/grsecurity team and Dan Carpenter (and the Smatch tool)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
[CIFS] Update cifs version number
[SMB3] Do not fall back to SMBWriteX in set_file_size error cases
[SMB3] Missing null tcon check
|
| |
| |
| |
| |
| |
| | |
Update modinfo cifs.ko version number to 2.08
Signed-off-by: Steve French <steve.french@primarydata.com>
|
| | |
The error paths in set_file_size for cifs and smb3 are incorrect.
In the unlikely event that a server did not support set file info
of the file size, the code incorrectly falls back to trying SMBWriteX
(note that only the original core SMB Write, used for example by DOS,
can set the file size this way - this actually does not work for the more
recent SMBWriteX). The idea was since the old DOS SMB Write could set
the file size if you write zero bytes at that offset then use that if
server rejects the normal set file info call.
Fortunately the SMBWriteX will never be sent on the wire (except when
file size is zero) since the length and offset fields were reversed
in the two places in this function that call SMBWriteX causing
the fall back path to return an error. It is also important to never call
an SMB request from an SMB2/SMB3 session (which theoretically would
be possible, and can cause a brief session drop, although the client
recovers) so this should be fixed. In practice this path does not happen
with modern servers but the error fall back to SMBWriteX is clearly wrong.
Remove the calls to SMBWriteX in the error paths in cifs_set_file_size.
Pointed out by PaX/grsecurity team
Signed-off-by: Steve French <steve.french@primarydata.com>
Reported-by: PaX Team <pageexec@freemail.hu>
CC: Emese Revfy <re.emese@gmail.com>
CC: Brad Spengler <spender@grsecurity.net>
CC: Stable <stable@vger.kernel.org>
|
| |
| |
| |
| |
| |
| |
| | |
Pointed out by Dan Carpenter via smatch code analysis tool
CC: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Steve French <steve.french@primarydata.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 46c043ede471 ("mm: take i_mmap_lock in unmap_mapping_range() for
DAX") moved some code in __dax_pmd_fault() that was responsible for
zeroing newly allocated PMD pages. The new location didn't properly set
up 'kaddr', so when run this code resulted in a NULL pointer BUG.
Fix this by getting the correct 'kaddr' via bdev_direct_access().
Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
Reported-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Matthew Wilcox <willy@linux.intel.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: Dave Chinner <david@fromorbit.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| |/
|/|
| |
| |
| |
| |
| |
| |
| | |
Pull UBI/UBIFS fixes from Richard Weinberger:
"This contains three bug fixes for both UBI and UBIFS"
* tag 'upstream-4.3-rc4' of git://git.infradead.org/linux-ubifs:
UBI: return ENOSPC if no enough space available
UBI: Validate data_size
UBIFS: Kill unneeded locking in ubifs_init_security
|
| | |
Fixes the following lockdep splat:
[ 1.244527] =============================================
[ 1.245193] [ INFO: possible recursive locking detected ]
[ 1.245193] 4.2.0-rc1+ #37 Not tainted
[ 1.245193] ---------------------------------------------
[ 1.245193] cp/742 is trying to acquire lock:
[ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[ 1.245193]
[ 1.245193] but task is already holding lock:
[ 1.245193] (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[ 1.245193]
[ 1.245193] other info that might help us debug this:
[ 1.245193] Possible unsafe locking scenario:
[ 1.245193]
[ 1.245193] CPU0
[ 1.245193] ----
[ 1.245193] lock(&sb->s_type->i_mutex_key#9);
[ 1.245193] lock(&sb->s_type->i_mutex_key#9);
[ 1.245193]
[ 1.245193] *** DEADLOCK ***
[ 1.245193]
[ 1.245193] May be due to missing lock nesting notation
[ 1.245193]
[ 1.245193] 2 locks held by cp/742:
[ 1.245193] #0: (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
[ 1.245193] #1: (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
[ 1.245193]
[ 1.245193] stack backtrace:
[ 1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
[ 1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
[ 1.245193] ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
[ 1.245193] ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
[ 1.245193] 000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
[ 1.245193] Call Trace:
[ 1.245193] [<ffffffff814f6f49>] dump_stack+0x4c/0x65
[ 1.245193] [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
[ 1.245193] [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
[ 1.245193] [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
[ 1.245193] [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
[ 1.245193] [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
[ 1.245193] [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
[ 1.245193] [<ffffffff81195d15>] vfs_create+0x95/0xc0
[ 1.245193] [<ffffffff8119929c>] path_openat+0x7cc/0x1280
[ 1.245193] [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
[ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[ 1.245193] [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
[ 1.245193] [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
[ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[ 1.245193] [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
[ 1.245193] [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
[ 1.245193] [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
[ 1.245193] [<ffffffff81189bd9>] do_sys_open+0x129/0x200
[ 1.245193] [<ffffffff81189cc9>] SyS_open+0x19/0x20
[ 1.245193] [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f
While the lockdep splat is a false positive, because path_openat holds i_mutex
of the parent directory and ubifs_init_security() tries to acquire i_mutex
of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
in vain because it is only being called in the inode allocation path
and therefore nobody else can see the inode yet.
Cc: stable@vger.kernel.org # 3.20-
Reported-and-tested-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reviewed-and-tested-by: Dongsheng Yang <yangds.fnst@cn.fujitsu.com>
Signed-off-by: Richard Weinberger <richard@nod.at>
Signed-off-by: dedekind1@gmail.com
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull CIFS fixes from Steve French:
"Four fixes from testing at the recent SMB3 Plugfest including two
important authentication ones (one fixes authentication problems to
some popular servers when clock times differ more than two hours
between systems, the other fixes Kerberos authentication for SMB3)"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
fix encryption error checks on mount
[SMB3] Fix sec=krb5 on smb3 mounts
cifs: use server timestamp for ntlmv2 authentication
disabling oplocks/leases via module parm enable_oplocks broken for SMB3
|
| | |
| | |
| | |
| | | |
Signed-off-by: Steve French <steve.french@primarydata.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Kerberos, which is very important for security, was only enabled for
CIFS not SMB2/SMB3 mounts (e.g. vers=3.0)
Patch based on the information detailed in
http://thread.gmane.org/gmane.linux.kernel.cifs/10081/focus=10307
to enable Kerberized SMB2/SMB3
a) SMB2_negotiate: enable/use decode_negTokenInit in SMB2_negotiate
b) SMB2_sess_setup: handle Kerberos sectype and replicate Kerberos
SMB1 processing done in sess_auth_kerberos
Signed-off-by: Noel Power <noel.power@suse.com>
Signed-off-by: Jim McDonough <jmcd@samba.org>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <steve.french@primarydata.com>
|