| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse updates from Miklos Szeredi:
- Add page -> folio conversions (Joanne Koong, Josef Bacik)
- Allow max size of fuse requests to be configurable with a sysctl
(Joanne Koong)
- Allow FOPEN_DIRECT_IO to take advantage of async code path (yangyun)
- Fix large kernel reads (like a module load) in virtio_fs (Hou Tao)
- Fix attribute inconsistency in case readdirplus (and plain lookup in
corner cases) is racing with inode eviction (Zhang Tianci)
- Fix a WARN_ON triggered by virtio_fs (Asahi Lina)
* tag 'fuse-update-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: (30 commits)
virtiofs: dax: remove ->writepages() callback
fuse: check attributes staleness on fuse_iget()
fuse: remove pages for requests and exclusively use folios
fuse: convert direct io to use folios
mm/writeback: add folio_mark_dirty_lock()
fuse: convert writebacks to use folios
fuse: convert retrieves to use folios
fuse: convert ioctls to use folios
fuse: convert writes (non-writeback) to use folios
fuse: convert reads to use folios
fuse: convert readdir to use folios
fuse: convert readlink to use folios
fuse: convert cuse to use folios
fuse: add support in virtio for requests using folios
fuse: support folios in struct fuse_args_pages and fuse_copy_pages()
fuse: convert fuse_notify_store to use folios
fuse: convert fuse_retrieve to use folios
fuse: use the folio based vmstat helpers
fuse: convert fuse_writepage_need_send to take a folio
fuse: convert fuse_do_readpage to use folios
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When using FUSE DAX with virtiofs, cache coherency is managed by the
host. Disk persistence is handled via fsync() and friends, which are
passed directly via the FUSE layer to the host. Therefore, there's no
need to do dax_writeback_mapping_range(). All that ends up doing is a
cache flush operation, which is not caught by KVM and doesn't do much,
since the host and guest are already cache-coherent.
Since dax_writeback_mapping_range() checks that the inode block size is
equal to PAGE_SIZE, this fixes a spurious WARN when virtiofs is used
with a mismatched guest PAGE_SIZE and virtiofs backing FS block size
(this happens, for example, when it's a tmpfs and the host and guest
have a different PAGE_SIZE). FUSE DAX does not require any particular FS
block size, since it always performs DAX mappings in aligned 2MiB
blocks.
See discussion in [1].
[1] https://lore.kernel.org/lkml/20241101-dax-page-size-v1-1-eedbd0c6b08f@asahilina.net/T/#u
[SzM: remove the empty callback]
Suggested-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Asahi Lina <lina@asahilina.net>
Acked-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Function fuse_direntplus_link() might call fuse_iget() to initialize a new
fuse_inode and change its attributes. If fi->attr_version is always
initialized with 0, even if the attributes returned by the FUSE_READDIR
request is staled, as the new fi->attr_version is 0, fuse_change_attributes
will still set the staled attributes to inode. This wrong behaviour may
cause file size inconsistency even when there is no changes from
server-side.
To reproduce the issue, consider the following 2 programs (A and B) are
running concurrently,
A B
---------------------------------- --------------------------------
{ /fusemnt/dir/f is a file path in a fuse mount, the size of f is 0. }
readdir(/fusemnt/dir) start
//Daemon set size 0 to f direntry
fallocate(f, 1024)
stat(f) // B see size 1024
echo 2 > /proc/sys/vm/drop_caches
readdir(/fusemnt/dir) reply to kernel
Kernel set 0 to the I_NEW inode
stat(f) // B see size 0
In the above case, only program B is modifying the file size, however, B
observes file size changing between the 2 'readonly' stat() calls. To fix
this issue, we should make sure readdirplus still follows the rule of
attr_version staleness checking even if the fi->attr_version is lost due to
inode eviction.
To identify this situation, the new fc->evict_ctr is used to record whether
the eviction of inodes occurs during the readdirplus request processing.
If it does, the result of readdirplus may be inaccurate; otherwise, the
result of readdirplus can be trusted. Although this may still lead to
incorrect invalidation, considering the relatively low frequency of
evict occurrences, it should be acceptable.
Link: https://lore.kernel.org/lkml/20230711043405.66256-2-zhangjiachen.jaycee@bytedance.com/
Link: https://lore.kernel.org/lkml/20241114070905.48901-1-zhangtianci.1997@bytedance.com/
Reported-by: Jiachen Zhang <zhangjiachen.jaycee@bytedance.com>
Suggested-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Zhang Tianci <zhangtianci.1997@bytedance.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
All fuse requests use folios instead of pages for transferring data.
Remove pages from the requests and exclusively use folios.
No functional changes.
[SzM: rename back folio_descs -> descs, etc.]
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert direct io requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert writeback requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert retrieve requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert ioctl requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert non-writeback write requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert read requests to use folios instead of pages.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert readdir requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert readlink requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert cuse requests to use a folio instead of a page.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Until all requests have been converted to use folios instead of pages,
virtio will need to support both types. Once all requests have been
converted, then virtio will support just folios.
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This adds support in struct fuse_args_pages and fuse_copy_pages() for
using folios instead of pages for transferring data. Both folios and
pages must be supported right now in struct fuse_args_pages and
fuse_copy_pages() until all request types have been converted to use
folios. Once all have been converted, then
struct fuse_args_pages and fuse_copy_pages() will only support folios.
Right now in fuse, all folios are one page (large folios are not yet
supported). As such, copying folio->page is sufficient for copying
the entire folio in fuse_copy_pages().
No functional changes.
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This function creates pages in an inode and copies data into them,
update the function to use a folio instead of a page, and use the
appropriate folio helpers.
[SzM: use filemap_grab_folio()]
[Hau Tao: The third argument of folio_zero_range() should be the length to
be zeroed, not the total length. Fix it by using folio_zero_segment()
instead in fuse_notify_store()]
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
We're just looking for pages in a mapping, use a folio and the folio
lookup function directly instead of using the page helper.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to make it easier to switch to folios in the fuse_args_pages
update the places where we update the vmstat counters for writeback to
use the folio related helpers. On the inc side this is easy as we
already have the folio, on the dec side we have to page_folio() the
pages for now.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
fuse_writepage_need_send is called by fuse_writepages_fill() which
already has a folio. Change fuse_writepage_need_send() to take a folio
instead, add a helper to check if the folio range is under writeback and
use this, as well as the appropriate folio helpers in the rest of the
function. Update fuse_writepage_need_send() to pass in the folio
directly.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that the buffered write path is using folios, convert
fuse_do_readpage() to take a folio instead of a page, update it to use
the appropriate folio helpers, and update the callers to pass in the
folio directly instead of a page.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This combines the file_remove_privs() and file_update_time() call into
one call.
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Bernd Schubert <bschubert@ddn.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert this to grab the folio directly, and update all the helpers to
use the folio related functions.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert this to grab the folio directly, and update all the helpers to
use the folio related functions.
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Convert this to grab the folio from the fuse_args_pages and use the
appropriate folio related functions.
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Joanne Koong <joannelkoong@gmail.com>
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Currently we're using the __readahead_batch() helper which populates our
fuse_args_pages->pages array with pages. Convert this to use the newer
folio based pattern which is to call readahead_folio() to get the next
folio in the read ahead batch. I've updated the code to use things like
folio_size() and to take into account larger folio sizes, but this is
purely to make that eventual work easier to do, we currently will not
get large folios so this is more future proofing than actual support.
[SzM: remove check for readahead_folio() won't return NULL (at least for
now) so remove ugly assign in conditional.]
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
fuse_send_readpages() waits for writeback on each page. This can be
replaced by a single call to fuse_range_is_writeback().
[SzM: split this off from "fuse: convert readahead to use folios"]
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When invoking virtio_fs_enqueue_req() through kworker, both the
allocation of the sg array and the bounce buffer still use GFP_ATOMIC.
Considering the size of the sg array may be greater than PAGE_SIZE, use
GFP_NOFS instead of GFP_ATOMIC to lower the possibility of memory
allocation failure and to avoid unnecessarily depleting the atomic
reserves. GFP_NOFS is not passed to virtio_fs_enqueue_req() directly,
GFP_KERNEL and memalloc_nofs_{save|restore} helpers are used instead.
It may seem OK to pass GFP_NOFS to virtio_fs_enqueue_req() as well when
queuing the request for the first time, but this is not the case. The
reason is that fuse_request_queue_background() may call
->queue_request_and_unlock() while holding fc->bg_lock, which is a
spin-lock. Therefore, still use GFP_ATOMIC for it.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When trying to insert a 10MB kernel module kept in a virtio-fs with cache
disabled, the following warning was reported:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 404 at mm/page_alloc.c:4551 ......
Modules linked in:
CPU: 1 PID: 404 Comm: insmod Not tainted 6.9.0-rc5+ #123
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996) ......
RIP: 0010:__alloc_pages+0x2bf/0x380
......
Call Trace:
<TASK>
? __warn+0x8e/0x150
? __alloc_pages+0x2bf/0x380
__kmalloc_large_node+0x86/0x160
__kmalloc+0x33c/0x480
virtio_fs_enqueue_req+0x240/0x6d0
virtio_fs_wake_pending_and_unlock+0x7f/0x190
queue_request_and_unlock+0x55/0x60
fuse_simple_request+0x152/0x2b0
fuse_direct_io+0x5d2/0x8c0
fuse_file_read_iter+0x121/0x160
__kernel_read+0x151/0x2d0
kernel_read+0x45/0x50
kernel_read_file+0x1a9/0x2a0
init_module_from_file+0x6a/0xe0
idempotent_init_module+0x175/0x230
__x64_sys_finit_module+0x5d/0xb0
x64_sys_call+0x1c3/0x9e0
do_syscall_64+0x3d/0xc0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
......
</TASK>
---[ end trace 0000000000000000 ]---
The warning is triggered as follows:
1) syscall finit_module() handles the module insertion and it invokes
kernel_read_file() to read the content of the module first.
2) kernel_read_file() allocates a 10MB buffer by using vmalloc() and
passes it to kernel_read(). kernel_read() constructs a kvec iter by
using iov_iter_kvec() and passes it to fuse_file_read_iter().
3) virtio-fs disables the cache, so fuse_file_read_iter() invokes
fuse_direct_io(). As for now, the maximal read size for kvec iter is
only limited by fc->max_read. For virtio-fs, max_read is UINT_MAX, so
fuse_direct_io() doesn't split the 10MB buffer. It saves the address and
the size of the 10MB-sized buffer in out_args[0] of a fuse request and
passes the fuse request to virtio_fs_wake_pending_and_unlock().
4) virtio_fs_wake_pending_and_unlock() uses virtio_fs_enqueue_req() to
queue the request. Because virtiofs need DMA-able address, so
virtio_fs_enqueue_req() uses kmalloc() to allocate a bounce buffer for
all fuse args, copies these args into the bounce buffer and passed the
physical address of the bounce buffer to virtiofsd. The total length of
these fuse args for the passed fuse request is about 10MB, so
copy_args_to_argbuf() invokes kmalloc() with a 10MB size parameter and
it triggers the warning in __alloc_pages():
if (WARN_ON_ONCE_GFP(order > MAX_PAGE_ORDER, gfp))
return NULL;
5) virtio_fs_enqueue_req() will retry the memory allocation in a
kworker, but it won't help, because kmalloc() will always return NULL
due to the abnormal size and finit_module() will hang forever.
A feasible solution is to limit the value of max_read for virtio-fs, so
the length passed to kmalloc() will be limited. However it will affect
the maximal read size for normal read. And for virtio-fs write initiated
from kernel, it has the similar problem but now there is no way to limit
fc->max_write in kernel.
So instead of limiting both the values of max_read and max_write in
kernel, introducing use_pages_for_kvec_io in fuse_conn and setting it as
true in virtiofs. When use_pages_for_kvec_io is enabled, fuse will use
pages instead of pointer to pass the KVEC_IO data.
After switching to pages for KVEC_IO data, these pages will be used for
DMA through virtio-fs. If these pages are backed by vmalloc(),
{flush|invalidate}_kernel_vmap_range() are necessary to flush or
invalidate the cache before the DMA operation. So add two new fields in
fuse_args_pages to record the base address of vmalloc area and the
condition indicating whether invalidation is needed. Perform the flush
in fuse_get_user_pages() for write operations and the invalidation in
fuse_release_user_pages() for read operations.
It may seem necessary to introduce another field in fuse_conn to
indicate that these KVEC_IO pages are used for DMA, However, considering
that virtio-fs is currently the only user of use_pages_for_kvec_io, just
reuse use_pages_for_kvec_io to indicate that these pages will be used
for DMA.
Fixes: a62a8ef9d97d ("virtio-fs: add virtiofs filesystem")
Signed-off-by: Hou Tao <houtao1@huawei.com>
Tested-by: Jingbo Xu <jefflexu@linux.alibaba.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Commit 23c94e1cdcbf ("fuse: Switch to using async direct IO
for FOPEN_DIRECT_IO") gave the async direct IO code path in the
fuse_direct_read_iter() and fuse_direct_write_iter(). But since
these two functions are only called under FOPEN_DIRECT_IO is set,
it seems that we can also use the async direct IO even the flag
IOCB_DIRECT is not set to enjoy the async direct IO method. Also
move the definition of fuse_io_priv to where it is used in fuse_
direct_write_iter.
Signed-off-by: yangyun <yangyun50@huawei.com>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Introduce the capability to dynamically configure the max pages limit
(FUSE_MAX_MAX_PAGES) through a sysctl. This allows system administrators
to dynamically set the maximum number of pages that can be used for
servicing requests in fuse.
Previously, this is gated by FUSE_MAX_MAX_PAGES which is statically set
to 256 pages. One result of this is that the buffer size for a write
request is limited to 1 MiB on a 4k-page system.
The default value for this sysctl is the original limit (256 pages).
$ sysctl -a | grep max_pages_limit
fs.fuse.max_pages_limit = 256
$ sysctl -n fs.fuse.max_pages_limit
256
$ echo 1024 | sudo tee /proc/sys/fs/fuse/max_pages_limit
1024
$ sysctl -n fs.fuse.max_pages_limit
1024
$ echo 65536 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument
$ echo 0 | sudo tee /proc/sys/fs/fuse/max_pages_limit
tee: /proc/sys/fs/fuse/max_pages_limit: Invalid argument
$ echo 65535 | sudo tee /proc/sys/fs/fuse/max_pages_limit
65535
$ sysctl -n fs.fuse.max_pages_limit
65535
Signed-off-by: Joanne Koong <joannelkoong@gmail.com>
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix the code that cleans up left-over unlinked files.
Various fixes and minor improvements in deleting files cached or held
open remotely.
- Simplify the use of dlm's DLM_LKF_QUECVT flag.
- A few other minor cleanups.
* tag 'gfs2-for-6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (21 commits)
gfs2: Prevent inode creation race
gfs2: Only defer deletes when we have an iopen glock
gfs2: Simplify DLM_LKF_QUECVT use
gfs2: gfs2_evict_inode clarification
gfs2: Make gfs2_inode_refresh static
gfs2: Use get_random_u32 in gfs2_orlov_skip
gfs2: Randomize GLF_VERIFY_DELETE work delay
gfs2: Use mod_delayed_work in gfs2_queue_try_to_evict
gfs2: Update to the evict / remote delete documentation
gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode
gfs2: Clean up delete work processing
gfs2: Minor delete_work_func cleanup
gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glock
gfs2: Rename dinode_demise to evict_behavior
gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETE
gfs2: Faster gfs2_upgrade_iopen_glock wakeups
KMSAN: uninit-value in inode_go_dump (5)
gfs2: Fix unlinked inode cleanup
gfs2: Allow immediate GLF_VERIFY_DELETE work
gfs2: Initialize gl_no_formal_ino earlier
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When a request to evict an inode comes in over the network, we are
trying to grab an inode reference via the iopen glock's gl_object
pointer. There is a very small probability that by the time such a
request comes in, inode creation hasn't completed and the I_NEW flag is
still set. To deal with that, wait for the inode and then check if
inode creation was successful.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The mechanism to defer deleting unlinked inodes is tied to
delete_work_func(), which is tied to iopen glocks. When we don't have
an iopen glock, we must carry out deletes immediately instead.
Fixes a NULL pointer dereference in gfs2_evict_inode().
Fixes: 8c21c2c71e66 ("gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The DLM_LKF_QUECVT flag needs to be set for "upward" lock conversions to
ensure fairness, but setting it for "downward" lock conversions will
lead to a failure. The flag is currently set based on the GLF_BLOCKING
flag and it's not immediately obvious why this is correct. Simplify
things by figuring out if a lock conversion is "upward" by looking at
the before and after locking modes instead of relying on the
GLF_BLOCKING flag.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is
never initialized, but that isn't obvious; if it did initialize gh and
then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to
release it. To clarify the code, change gfs2_evict_inode() to always
check if gh needs to be released, no matter what evict_should_delete()
returns.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Function gfs2_inode_refresh() is only used in fs/gfs2/glops.c.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Use get_random_u32() instead of get_random_bytes() to remove the last
remaining call to get_random_bytes().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Randomize the delay of GLF_VERIFY_DELETE work. This avoids thundering
herd problems when multiple nodes schedule that kind of work in response
to an inode being unlinked remotely.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In the unlikely case that we're trying to queue GLF_TRY_TO_EVICT work
for an inode that already has GLF_VERIFY_DELETE work queued, we want to
make sure that the GLF_TRY_TO_EVICT work gets scheduled immediately
instead of waiting for the delayed work timer to expire.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Try to be a bit more clear and remove some duplications. We cannot
actually get rid of the verification step eventually, so remove the
comment saying so.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Function delete_work_func() was previously assuming that the
GLF_TRY_TO_EVICT and GLF_VERIFY_DELETE flags won't both be set at the
same time, but there probably are races in which that can happen, so
handle that case correctly.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Move those definitions into the the scope in which they are used.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
In case an iopen glock cannot be upgraded, function
gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode()
whether deleting the inode should be deferred or skipped altogether.
Change the function to return the appropriate enum evict_behavior value
to indicate that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.
In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move function needs_demote() to glock.h and rename it to
glock_needs_demote(). In handle_callback(), wake up the glock when
setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE
flag already triggered a wake-up.)
With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock()
to wake up when either of those flags is set for the inode glock: the
faster we can react to contention, the better.
The GLF_PENDING_DEMOTE flag is only used for inode glocks (see
gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in
gfs2_drop_inode(). Still, using glock_needs_demote() there as well
makes the code a little easier to read.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When mounting of a corrupted disk image fails, the error message printed
can reference uninitialized inode fields. To prevent that from happening,
always initialize those fields.
Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes from remote as well as deleting unlinked inodes at a
later point. These two kinds of work were then split into two kinds of
work, and the two places in the code were deferred deletion of inodes is
required accidentally ended up queuing the wrong kind of work. This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.
Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().
Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add an argument to gfs2_queue_verify_delete() that allows it to queue
GLF_VERIFY_DELETE work for immediate execution. This is used in the
next patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|