| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Ext4 commits for 3.3 merge window
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (32 commits)
ext4: fix undefined behavior in ext4_fill_flex_info()
ext4: make more symbols static
ext4: make local symbol ext4_initxattrs static
jbd2: fix hung processes in jbd2_journal_lock_updates()
ext4: reserve new feature flag codepoints
ext4: Report max_batch_time option correctly
ext4: add missing ext4_resize_end on error paths
ext4: let ext4_group_add() use common code
ext4: let ext4_group_extend() use common code
ext4: add new online resize interface
ext4: add a new function which adds a flex group to a fs
ext4: add a new function which allocates bitmaps and inode tables
ext4: pass verify_reserved_gdb() the number of group decriptors
ext4: add a function which updates the super block during online resizing
ext4: add a function which sets up a block group descriptors of a flex bg
ext4: add a function which sets up group blocks of a flex bg
ext4: add a structure which will be used by 64bit-resize interface
ext4: add a function which adds a new group descriptors to a fs
ext4: add a function which extends a group without checking parameters
ext4: use proper little-endian bitops
...
|
| |\
| | |
| | |
| | |
| | | |
Conflicts:
fs/ext4/ioctl.c
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Commit 503358ae01b70ce6909d19dd01287093f6b6271c ("ext4: avoid divide by
zero when trying to mount a corrupted file system") fixes CVE-2009-4307
by performing a sanity check on s_log_groups_per_flex, since it can be
set to a bogus value by an attacker.
sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex < 2) { ... }
This patch fixes two potential issues in the previous commit.
1) The sanity check might only work on architectures like PowerPC.
On x86, 5 bits are used for the shifting amount. That means, given a
large s_log_groups_per_flex value like 36, groups_per_flex = 1 << 36
is essentially 1 << 4 = 16, rather than 0. This will bypass the check,
leaving s_log_groups_per_flex and groups_per_flex inconsistent.
2) The sanity check relies on undefined behavior, i.e., oversized shift.
A standard-confirming C compiler could rewrite the check in unexpected
ways. Consider the following equivalent form, assuming groups_per_flex
is unsigned for simplicity.
groups_per_flex = 1 << sbi->s_log_groups_per_flex;
if (groups_per_flex == 0 || groups_per_flex == 1) {
We compile the code snippet using Clang 3.0 and GCC 4.6. Clang will
completely optimize away the check groups_per_flex == 0, leaving the
patched code as vulnerable as the original. GCC keeps the check, but
there is no guarantee that future versions will do the same.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
A couple more functions can reasonably be made static if desired.
Signed-off-by: Eric Sandeen <sandeen@redhat.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ext4_initxattrs symbol is used only in this file, so it should be
declared static.
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Toshiyuki Okajima found out that when running
for ((i=0; i < 100000; i++)); do
if ((i%2 == 0)); then
chattr +j /mnt/file
else
chattr -j /mnt/file
fi
echo "0" >> /mnt/file
done
process sometimes hangs indefinitely in jbd2_journal_lock_updates().
Toshiyuki identified that the following race happens:
jbd2_journal_lock_updates() |jbd2_journal_stop()
---------------------------------------+---------------------------------------
write_lock(&journal->j_state_lock) | .
++journal->j_barrier_count | .
spin_lock(&tran->t_handle_lock) | .
atomic_read(&tran->t_updates) //not 0 |
| atomic_dec_and_test(&tran->t_updates)
| // t_updates = 0
| wake_up(&journal->j_wait_updates)
prepare_to_wait() | // no process is woken up.
spin_unlock(&tran->t_handle_lock) |
write_unlock(&journal->j_state_lock) |
schedule() // never return |
We fix the problem by first calling prepare_to_wait() and only after that
checking t_updates in jbd2_journal_lock_updates().
Reported-and-analyzed-by: Toshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Reserve the ext4 features flags EXT4_FEATURE_RO_COMPAT_METADATA_CSUM,
EXT4_FEATURE_INCOMPAT_INLINEDATA, and EXT4_FEATURE_INCOMPAT_LARGEDIR.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently the value reported for max_batch_time is really the
value of min_batch_time.
Reported-by: Russell Coker <russell@coker.com.au>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Online resize ioctls 'EXT4_IOC_GROUP_EXTEND' and 'EXT4_IOC_GROUP_ADD'
call ext4_resize_begin() to check permissions and to set the
EXT4_RESIZING bit lock, they do their work and they must finish with
ext4_resize_end() which calls clear_bit_unlock() to unlock and to
avoid -EBUSY errors for the next resize operations.
This patch adds the missing ext4_resize_end() calls on error paths.
Patch tested.
Cc: stable@vger.kernel.org
Signed-off-by: Djalal Harouni <tixxdz@opendz.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch lets ext4_group_add() call ext4_flex_group_add().
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ext4_group_extend_no_check() is moved out from ext4_group_extend(),
this patch lets ext4_group_extend() call ext4_group_extentd_no_check()
instead.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds new online resize interface, whose input argument is a
64-bit integer indicating how many blocks there are in the resized fs.
In new resize impelmentation, all work like allocating group tables
are done by kernel side, so the new resize interface can support
flex_bg feature and prepares ground for suppoting resize with features
like bigalloc and exclude bitmap. Besides these, user-space tools just
passes in the new number of blocks.
We delay initializing the bitmaps and inode tables of added groups if
possible and add multi groups (a flex groups) each time, so new resize
is very fast like mkfs.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a new function named ext4_flex_group_add() which adds a
flex group to a fs. The function is used by 64bit-resize interface.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a new function named ext4_allocates_group_table()
which allocates block bitmaps, inode bitmaps and inode tables for a
flex groups and is used by resize code.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The 64bit resizer adds a flex group each time, so verify_reserved_gdb
can not use s_groups_count directly, it should use the number of group
decriptors before the added group.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a function named ext4_update_super() which updates
super block so the newly created block groups are visible to the file
system. This code is copied from ext4_group_add().
The function will be used by new resize implementation.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a function named ext4_setup_new_descs which sets up the
block group descriptors of a flex bg.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a function named setup_new_flex_group_blocks() which
sets up group blocks of a flex bg.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a structure which will be used by 64bit-resize interface.
Two functions which allocate and destroy the structure respectively are
added.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch adds a function named ext4_add_new_descs() which adds one
or more new group descriptors to a fs and whose code is copied from
ext4_group_add().
The function will be used by new resize implementation.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This patch added a function named ext4_group_extend_no_check() whose code
is copied from ext4_group_extend(). ext4_group_extend_no_check() assumes
the parameter is valid and has been checked by caller.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
ext4_{set,clear}_bit() is defined as __test_and_{set,clear}_bit_le() for
ext4. Only two ext4_{set,clear}_bit() calls check the return value. The
rest of calls ignore the return value and they can be replaced with
__{set,clear}_bit_le().
This changes ext4_{set,clear}_bit() from __test_and_{set,clear}_bit_le()
to __{set,clear}_bit_le() and introduces ext4_test_and_{set,clear}_bit()
for the two places where old bit needs to be returned.
This ext4_{set,clear}_bit() change is considered safe, because if someone
uses these macros without noticing the change, new ext4_{set,clear}_bit
don't have return value and causes compiler errors where the return value
is used.
This also removes unused ext4_find_first_zero_bit().
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The functions ext4_block_truncate_page() and ext4_block_zero_page_range()
are no longer used, so remove them.
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix ext4_debug format in ext4_ext_handle_uninitialized_extents() and
ext4_end_io_dio().
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Currently, we clear revoked flag only when a block is reused. However,
this can tigger a false journal error. Consider a situation when a block
is used as a meta block and is deleted(revoked) in ordered mode, then the
block is allocated as a data block to a file. At this moment, user changes
the file's journal mode from ordered to journaled and truncates the file.
The block will be considered re-revoked by journal because it has revoked
flag still pending from the last transaction and an assertion triggers.
We fix the problem by keeping the revoked status more uptodate - we clear
revoked flag when switching revoke tables to reflect there is no revoked
buffers in current transaction any more.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
It's necessary to flush the journal when switching away from
data=journal mode. This is because there are no revoke records when
data blocks are journalled, but revoke records are required in the
other journal modes.
However, it is not necessary to flush the journal when switching into
data=journal mode, and flushing the journal is expensive. So let's
avoid it in that case.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
delalloc blocks should be allocated before changing journal mode,
otherwise they can not be allocated and even more truncate on
delalloc blocks could triggre BUG by flushing delalloc buffers.
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In the code to support EXT4_IOC_MOVE_EXT, ext4_ioctl calls
file_remove_suid() after the call to ext4_move_extents() if any
extents has been moved. There are at least three things wrong with
this. First, file_remove_suid() should be called with i_mutex down,
which is not here. Second, it should be called before the donor file
has been modified, to avoid a potential race condition. Third, and
most importantly, it's pointless, because ext4_file_extents() already
checks if the donor file has the setuid or setgid bit set, and will
return an error in that case. So the first two objections don't
really matter, since file_remove_suid() will never need to modify the
inode in any case.
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We found performance regression when using bigalloc with "nodelalloc"
(1MB cluster size):
1. mke2fs -C 1048576 -O ^has_journal,bigalloc /dev/sda
2. mount -o nodelalloc /dev/sda /test/
3. time dd if=/dev/zero of=/test/io bs=1048576 count=1024
The "dd" will cost about 2 seconds to finish, but if we mke2fs without
"bigalloc", "dd" will only cost less than 1 second.
The reason is: when using ext4 with "nodelalloc", it will call
ext4_find_delalloc_cluster() nearly everytime it call
ext4_ext_map_blocks(), and ext4_find_delalloc_range() will also scan
all pages in cluster because no buffer is "delayed". A cluster has
256 pages (1MB cluster), so it will scan 256 * 256k pags when creating
a 1G file. That severely hurts the performance.
Therefore, we return immediately from ext4_find_delalloc_range() in
nodelalloc mode, since by definition there can't be any delalloc
pages.
Signed-off-by: Robin Dong <sanbai@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In get_implied_cluster_alloc(), rr_cluster_end was being
defined and set, but was never used. Removed this.
Signed-off-by: Curt Wohlgemuth <curtw@google.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When insert_inode_locked() fails in ext4_new_inode() it most likely means inode
bitmap got corrupted and we allocated again inode which is already in use. Also
doing unlock_new_inode() during error recovery is wrong since the inode does
not have I_NEW set. Fix the problem by jumping to fail: (instead of fail_drop:)
which declares filesystem error and does not call unlock_new_inode().
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | | |
Signed-off-by: Zheng Liu <wenqing.lz@taobao.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
pa_inode in group_pa is set NULL in ext4_mb_new_group_pa, so
pa_inode should be not referenced.
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ericvh/v9fs:
fs/9p: iattr_valid flags are kernel internal flags map them to 9p values.
fs/9p: We should not allocate a new inode when creating hardlines.
fs/9p: v9fs_stat2inode should update suid/sgid bits.
9p: Reduce object size with CONFIG_NET_9P_DEBUG
fs/9p: check schedule_timeout_interruptible return value
Fix up trivial conflicts in fs/9p/{vfs_inode.c,vfs_inode_dotl.c} due to
debug messages having changed to use p9_debug() on one hand, and the
changes for umode_t on the other.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Kernel internal values can change, add protocol values for these constant
and use them.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Don't do new_inode_from fid in case of hardlink creation. This ensures
that link count for hardlink files get updated properly. Earlier link count
was not updated on removing a hardlink with cache mode enabled.
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Create a new helper that update the permission bits and use
that, instead of opencoding the logic.
Reported and bisected by: M. Mohan Kumar <mohan@in.ibm.com>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Reduce object size by deduplicating formats.
Use vsprintf extension %pV.
Rename P9_DPRINTK uses to p9_debug, align arguments.
Add function for _p9_debug and macro to add __func__.
Add missing "\n"s to p9_debug uses.
Remove embedded function names as p9_debug adds it.
Remove P9_EPRINTK macro and convert use to pr_<level>.
Add and use pr_fmt and pr_<level>.
$ size fs/9p/built-in.o*
text data bss dec hex filename
62133 984 16000 79117 1350d fs/9p/built-in.o.new
67342 984 16928 85254 14d06 fs/9p/built-in.o.old
$ size net/9p/built-in.o*
text data bss dec hex filename
88792 4148 22024 114964 1c114 net/9p/built-in.o.new
94072 4148 23232 121452 1da6c net/9p/built-in.o.old
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
In v9fs_file_do_lock() we need to check return value of
schedule_timeout_interruptible() and exit the loop when it
returns nonzero, otherwise the loop is not really interruptible
and after the signal, the loop is no longer throttled by
P9_LOCK_TIMEOUT.
Signed-off-by: Jim Garlick <garlick.jim@gmail.com>
Signed-off-by: Eric Van Hensbergen <ericvh@gmail.com>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
* 'nfs-for-3.3' of git://git.linux-nfs.org/projects/trondmy/linux-nfs:
NFSv4: Change the default setting of the nfs4_disable_idmapping parameter
NFSv4: Save the owner/group name string when doing open
NFS: Remove pNFS bloat from the generic write path
pnfs-obj: Must return layout on IO error
pnfs-obj: pNFS errors are communicated on iodata->pnfs_error
NFS: Cache state owners after files are closed
NFS: Clean up nfs4_find_state_owners_locked()
NFSv4: include bitmap in nfsv4 get acl data
nfs: fix a minor do_div portability issue
NFSv4.1: cleanup comment and debug printk
NFSv4.1: change nfs4_free_slot parameters for dynamic slots
NFSv4.1: cleanup init and reset of session slot tables
NFSv4.1: fix backchannel slotid off-by-one bug
nfs: fix regression in handling of context= option in NFSv4
NFS - fix recent breakage to NFS error handling.
NFS: Retry mounting NFSROOT
SUNRPC: Clean up the RPCSEC_GSS service ticket requests
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Now that the use of numeric uids/gids is officially sanctioned in
RFC3530bis, it is time to change the default here to 'enabled'.
By doing so, we ensure that NFSv4 copies the behaviour of NFSv3 when we're
using the default AUTH_SYS authentication (i.e. when the client uses the
numeric uids/gids as authentication tokens), so that when new files are
created, they will appear to have the correct user/group.
It also fixes a number of backward compatibility issues when migrating
from NFSv3 to NFSv4 on a platform where the server uses different uid/gid
mappings than the client.
Note also that this setting has been successfully tested against servers
that do not support numeric uids/gids at several Connectathon/Bakeathon
events at this point, and the fall back to using string names/groups has
been shown to work well in all those test cases.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
...so that we can do the uid/gid mapping outside the asynchronous RPC
context.
This fixes a bug in the current NFSv4 atomic open code where the client
isn't able to determine what the true uid/gid fields of the file are,
(because the asynchronous nature of the OPEN call denies it the ability
to do an upcall) and so fills them with default values, marking the
inode as needing revalidation.
Unfortunately, in some cases, the VFS will do some additional sanity
checks on the file, and may override the server's decision to allow
the open because it sees the wrong owner/group fields.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We have no business doing any this in the standard write release path.
Get rid of it, and put it in the pNFS layer.
Also, while we're at it, get rid of the completely bogus unlock/relock
semantics that were present in nfs_writeback_release_full(). It is
not only unnecessary, but actually dangerous to release the write lock
just in order to take it again in nfs_page_async_flush(). Better just
to open code the pgio operations in a pnfs helper.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
As mandated by the standard. In case of an IO error, a pNFS
objects layout driver must return it's layout. This is because
all device errors are reported to the server as part of the
layout return buffer.
This is implemented the same way PNFS_LAYOUTRET_ON_SETATTR
is done, through a bit flag on the pnfs_layoutdriver_type->flags
member. The flag is set by the layout driver that wants a
layout_return preformed at pnfs_ld_{write,read}_done in case
of an error.
(Though I have not defined a wrapper like pnfs_ld_layoutret_on_setattr
because this code is never called outside of pnfs.c and pnfs IO
paths)
Without this patch 3.[0-2] Kernels leak memory and have an annoying
WARN_ON after every IO error utilizing the pnfs-obj driver.
[This patch is for 3.2 Kernel. 3.1/0 Kernels need a different patch]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Some time along the way pNFS IO errors were switched to
communicate with a special iodata->pnfs_error member instead
of the regular RPC members. But objlayout was not switched
over.
Fix that!
Without this fix any IO error is hanged, because IO is not
switched to MDS and pages are never cleared or read.
[Applies to 3.2.0. Same bug different patch for 3.1/0 Kernels]
CC: Stable Tree <stable@kernel.org>
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Servers have a finite amount of memory to store NFSv4 open and lock
owners. Moreover, servers may have a difficult time determining when
they can reap their state owner table, thanks to gray areas in the
NFSv4 protocol specification. Thus clients should be careful to reuse
state owners when possible.
Currently Linux is not too careful. When a user has closed all her
files on one mount point, the state owner's reference count goes to
zero, and it is released. The next OPEN allocates a new one. A
workload that serially opens and closes files can run through a large
number of open owners this way.
When a state owner's reference count goes to zero, slap it onto a free
list for that nfs_server, with an expiry time. Garbage collect before
looking for a state owner. This makes state owners for active users
available for re-use.
Now that there can be unused state owners remaining at umount time,
purge the state owner free list when a server is destroyed. Also be
sure not to reclaim unused state owners during state recovery.
This change has benefits for the client as well. For some workloads,
this approach drops the number of OPEN_CONFIRM calls from the same as
the number of OPEN calls, down to just one. This reduces wire traffic
and thus open(2) latency. Before this patch, untarring a kernel
source tarball shows the OPEN_CONFIRM call counter steadily increasing
through the test. With the patch, the OPEN_CONFIRM count remains at 1
throughout the entire untar.
As long as the expiry time is kept short, I don't think garbage
collection should be terribly expensive, although it does bounce the
clp->cl_lock around a bit.
[ At some point we should rationalize the use of the nfs_server
->destroy method. ]
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
[Trond: Fixed a garbage collection race and a few efficiency issues]
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
There's no longer a need to check the so_server field in the state
owner, because nowadays the RB tree we search for state owners
contains owners for that only server.
Make nfs4_find_state_owners_locked() use the same tree searching logic
as nfs4_insert_state_owner_locked().
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The NFSv4 bitmap size is unbounded: a server can return an arbitrary
sized bitmap in an FATTR4_WORD0_ACL request. Replace using the
nfs4_fattr_bitmap_maxsz as a guess to the maximum bitmask returned by a server
with the inclusion of the bitmap (xdr length plus bitmasks) and the acl data
xdr length to the (cached) acl page data.
This is a general solution to commit e5012d1f "NFSv4.1: update
nfs4_fattr_bitmap_maxsz" and fixes hitting a BUG_ON in xdr_shrink_bufhead
when getting ACLs.
Fix a bug in decode_getacl that returned -EINVAL on ACLs > page when getxattr
was called with a NULL buffer, preventing ACL > PAGE_SIZE from being retrieved.
Cc: stable@kernel.org
Signed-off-by: Andy Adamson <andros@netapp.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This change modifies filelayout_get_dense_offset() to use the functions
in math64.h and thus avoid a 32-bit platform compile error trying to
use do_div() on an s64 type.
Signed-off-by: Chris Metcalf <cmetcalf@tilera.com>
Reviewed-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
|