summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge tag 'vboxsf-v5.14-1' of ↵Linus Torvalds2021-07-133-38/+116
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux Pull vboxsf fixes from Hans de Goede: "This adds support for the atomic_open directory-inode op to vboxsf. Note this is not just an enhancement this also fixes an actual issue which users are hitting, see the commit message of the "boxsf: Add support for the atomic_open directory-inode" patch" * tag 'vboxsf-v5.14-1' of git://git.kernel.org/pub/scm/linux/kernel/git/hansg/linux: vboxsf: Add support for the atomic_open directory-inode op vboxsf: Add vboxsf_[create|release]_sf_handle() helpers vboxsf: Make vboxsf_dir_create() return the handle for the created file vboxsf: Honor excl flag to the dir-inode create op
| * vboxsf: Add support for the atomic_open directory-inode opHans de Goede2021-06-231-0/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Opening a new file is done in 2 steps on regular filesystems: 1. Call the create inode-op on the parent-dir to create an inode to hold the meta-data related to the file. 2. Call the open file-op to get a handle for the file. vboxsf however does not really use disk-backed inodes because it is based on passing through file-related system-calls through to the hypervisor. So both steps translate to an open(2) call being passed through to the hypervisor. With the handle returned by the first call immediately being closed again. Making 2 open calls for a single open(..., O_CREATE, ...) calls has 2 problems: a) It is not really efficient. b) It actually breaks some apps. An example of b) is doing a git clone inside a vboxsf mount. When git clone tries to create a tempfile to store the pak files which is downloading the following happens: 1. vboxsf_dir_mkfile() gets called with a mode of 0444 and succeeds. 2. vboxsf_file_open() gets called with file->f_flags containing O_RDWR. When the host is a Linux machine this fails because doing a open(..., O_RDWR) on a file which exists and has mode 0444 results in an -EPERM error. Other network-filesystems and fuse avoid the problem of needing to pass 2 open() calls to the other side by using the atomic_open directory-inode op. This commit fixes git clone not working inside a vboxsf mount, by adding support for the atomic_open directory-inode op. As an added bonus this should also make opening new files faster. The atomic_open implementation is modelled after the atomic_open implementations from the 9p and fuse code. Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Reported-by: Ludovic Pouzenc <bugreports@pouzenc.fr> Signed-off-by: Hans de Goede <hdegoede@redhat.com>
| * vboxsf: Add vboxsf_[create|release]_sf_handle() helpersHans de Goede2021-06-232-27/+51
| | | | | | | | | | | | | | | | | | | | Factor out the code to create / release a struct vboxsf_handle into 2 new helper functions. This is a preparation patch for adding atomic_open support. Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Hans de Goede <hdegoede@redhat.com>
| * vboxsf: Make vboxsf_dir_create() return the handle for the created fileHans de Goede2021-06-231-7/+11
| | | | | | | | | | | | | | | | | | Make vboxsf_dir_create() optionally return the vboxsf-handle for the created file. This is a preparation patch for adding atomic_open support. Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Hans de Goede <hdegoede@redhat.com>
| * vboxsf: Honor excl flag to the dir-inode create opHans de Goede2021-06-231-7/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Honor the excl flag to the dir-inode create op, instead of behaving as if it is always set. Note the old behavior still worked most of the time since a non-exclusive open only calls the create op, if there is a race and the file is created between the dentry lookup and the calling of the create call. While at it change the type of the is_dir parameter to the vboxsf_dir_create() helper from an int to a bool, to be consistent with the use of bool for the excl parameter. Fixes: 0fd169576648 ("fs: Add VirtualBox guest shared folder (vboxsf) support") Signed-off-by: Hans de Goede <hdegoede@redhat.com>
* | Merge tag 'for-5.14-rc1-tag' of ↵Linus Torvalds2021-07-139-286/+687
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs zoned mode fixes from David Sterba: - fix deadlock when allocating system chunk - fix wrong mutex unlock on an error path - fix extent map splitting for append operation - update and fix message reporting unusable chunk space - don't block when background zone reclaim runs with balance in parallel * tag 'for-5.14-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: fix wrong mutex unlock on failure to allocate log root tree btrfs: don't block if we can't acquire the reclaim lock btrfs: properly split extent_map for REQ_OP_ZONE_APPEND btrfs: rework chunk allocation to avoid exhaustion of the system chunk array btrfs: fix deadlock with concurrent chunk allocations involving system chunks btrfs: zoned: print unusable percentage when reclaiming block groups btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_work
| * | btrfs: zoned: fix wrong mutex unlock on failure to allocate log root treeFilipe Manana2021-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When syncing the log, if we fail to allocate the root node for the log root tree: 1) We are unlocking fs_info->tree_log_mutex, but at this point we have not yet locked this mutex; 2) We have locked fs_info->tree_root->log_mutex, but we end up not unlocking it; So fix this by unlocking fs_info->tree_root->log_mutex instead of fs_info->tree_log_mutex. Fixes: e75f9fd194090e ("btrfs: zoned: move log tree node allocation out of log_root_tree->log_mutex") CC: stable@vger.kernel.org # 5.13+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: don't block if we can't acquire the reclaim lockJohannes Thumshirn2021-07-071-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we can't acquire the reclaim_bgs_lock on block group reclaim, we block until it is free. This can potentially stall for a long time. While reclaim of block groups is necessary for a good user experience on a zoned file system, there still is no need to block as it is best effort only, just like when we're deleting unused block groups. CC: stable@vger.kernel.org # 5.13 Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: properly split extent_map for REQ_OP_ZONE_APPENDNaohiro Aota2021-07-071-29/+118
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Damien reported a test failure with btrfs/209. The test itself ran fine, but the fsck ran afterwards reported a corrupted filesystem. The filesystem corruption happens because we're splitting an extent and then writing the extent twice. We have to split the extent though, because we're creating too large extents for a REQ_OP_ZONE_APPEND operation. When dumping the extent tree, we can see two EXTENT_ITEMs at the same start address but different lengths. $ btrfs inspect dump-tree /dev/nullb1 -t extent ... item 19 key (269484032 EXTENT_ITEM 126976) itemoff 15470 itemsize 53 refs 1 gen 7 flags DATA extent data backref root FS_TREE objectid 257 offset 786432 count 1 item 20 key (269484032 EXTENT_ITEM 262144) itemoff 15417 itemsize 53 refs 1 gen 7 flags DATA extent data backref root FS_TREE objectid 257 offset 786432 count 1 The duplicated EXTENT_ITEMs originally come from wrongly split extent_map in extract_ordered_extent(). Since extract_ordered_extent() uses create_io_em() to split an existing extent_map, we will have split->orig_start != split->start. Then, it will be logged with non-zero "extent data offset". Finally, the logged entries are replayed into a duplicated EXTENT_ITEM. Introduce and use proper splitting function for extent_map. The function is intended to be simple and specific usage for extract_ordered_extent() e.g. not supporting compression case (we do not allow splitting compressed extent_map anyway). There was a question raised by Qu, in summary why we want to split the extent map (and not the bio): The problem is not the limit on the zone end, which as you mention is the same as the block group end. The problem is that data write use zone append (ZA) operations. ZA BIOs cannot be split so a large extent may need to be processed with multiple ZA BIOs, While that is also true for regular writes, the major difference is that ZA are "nameless" write operation giving back the written sectors on completion. And ZA operations may be reordered by the block layer (not intentionally though). Combine both of these characteristics and you can see that the data for a large extent may end up being shuffled when written resulting in data corruption and the impossibility to map the extent to some start sector. To avoid this problem, zoned btrfs uses the principle "one data extent == one ZA BIO". So large extents need to be split. This is unfortunate, but we can revisit this later and optimize, e.g. merge back together the fragments of an extent once written if they actually were written sequentially in the zone. Reported-by: Damien Le Moal <damien.lemoal@wdc.com> Fixes: d22002fd37bd ("btrfs: zoned: split ordered extent when bio is sent") CC: stable@vger.kernel.org # 5.12+ CC: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: rework chunk allocation to avoid exhaustion of the system chunk arrayFilipe Manana2021-07-077-184/+546
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit eafa4fd0ad0607 ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations") fixed a problem that resulted in exhausting the system chunk array in the superblock when there are many tasks allocating chunks in parallel. Basically too many tasks enter the first phase of chunk allocation without previous tasks having finished their second phase of allocation, resulting in too many system chunks being allocated. That was originally observed when running the fallocate tests of stress-ng on a PowerPC machine, using a node size of 64K. However that commit also introduced a deadlock where a task in phase 1 of the chunk allocation waited for another task that had allocated a system chunk to finish its phase 2, but that other task was waiting on an extent buffer lock held by the first task, therefore resulting in both tasks not making any progress. That change was later reverted by a patch with the subject "btrfs: fix deadlock with concurrent chunk allocations involving system chunks", since there is no simple and short solution to address it and the deadlock is relatively easy to trigger on zoned filesystems, while the system chunk array exhaustion is not so common. This change reworks the chunk allocation to avoid the system chunk array exhaustion. It accomplishes that by making the first phase of chunk allocation do the updates of the device items in the chunk btree and the insertion of the new chunk item in the chunk btree. This is done while under the protection of the chunk mutex (fs_info->chunk_mutex), in the same critical section that checks for available system space, allocates a new system chunk if needed and reserves system chunk space. This way we do not have chunk space reserved until the second phase completes. The same logic is applied to chunk removal as well, since it keeps reserved system space long after it is done updating the chunk btree. For direct allocation of system chunks, the previous behaviour remains, because otherwise we would deadlock on extent buffers of the chunk btree. Changes to the chunk btree are by large done by chunk allocation and chunk removal, which first reserve chunk system space and then later do changes to the chunk btree. The other remaining cases are uncommon and correspond to adding a device, removing a device and resizing a device. All these other cases do not pre-reserve system space, they modify the chunk btree right away, so they don't hold reserved space for a long period like chunk allocation and chunk removal do. The diff of this change is huge, but more than half of it is just addition of comments describing both how things work regarding chunk allocation and removal, including both the new behavior and the parts of the old behavior that did not change. CC: stable@vger.kernel.org # 5.12+ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: fix deadlock with concurrent chunk allocations involving system chunksFilipe Manana2021-07-073-69/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a task attempting to allocate a new chunk verifies that there is not currently enough free space in the system space_info and there is another task that allocated a new system chunk but it did not finish yet the creation of the respective block group, it waits for that other task to finish creating the block group. This is to avoid exhaustion of the system chunk array in the superblock, which is limited, when we have a thundering herd of tasks allocating new chunks. This problem was described and fixed by commit eafa4fd0ad0607 ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations"). However there are two very similar scenarios where this can lead to a deadlock: 1) Task B allocated a new system chunk and task A is waiting on task B to finish creation of the respective system block group. However before task B ends its transaction handle and finishes the creation of the system block group, it attempts to allocate another chunk (like a data chunk for an fallocate operation for a very large range). Task B will be unable to progress and allocate the new chunk, because task A set space_info->chunk_alloc to 1 and therefore it loops at btrfs_chunk_alloc() waiting for task A to finish its chunk allocation and set space_info->chunk_alloc to 0, but task A is waiting on task B to finish creation of the new system block group, therefore resulting in a deadlock; 2) Task B allocated a new system chunk and task A is waiting on task B to finish creation of the respective system block group. By the time that task B enter the final phase of block group allocation, which happens at btrfs_create_pending_block_groups(), when it modifies the extent tree, the device tree or the chunk tree to insert the items for some new block group, it needs to allocate a new chunk, so it ends up at btrfs_chunk_alloc() and keeps looping there because task A has set space_info->chunk_alloc to 1, but task A is waiting for task B to finish creation of the new system block group and release the reserved system space, therefore resulting in a deadlock. In short, the problem is if a task B needs to allocate a new chunk after it previously allocated a new system chunk and if another task A is currently waiting for task B to complete the allocation of the new system chunk. Unfortunately this deadlock scenario introduced by the previous fix for the system chunk array exhaustion problem does not have a simple and short fix, and requires a big change to rework the chunk allocation code so that chunk btree updates are all made in the first phase of chunk allocation. And since this deadlock regression is being frequently hit on zoned filesystems and the system chunk array exhaustion problem is triggered in more extreme cases (originally observed on PowerPC with a node size of 64K when running the fallocate tests from stress-ng), revert the changes from that commit. The next patch in the series, with a subject of "btrfs: rework chunk allocation to avoid exhaustion of the system chunk array" does the necessary changes to fix the system chunk array exhaustion problem. Reported-by: Naohiro Aota <naohiro.aota@wdc.com> Link: https://lore.kernel.org/linux-btrfs/20210621015922.ewgbffxuawia7liz@naota-xeon/ Fixes: eafa4fd0ad0607 ("btrfs: fix exhaustion of the system chunk array due to concurrent allocations") CC: stable@vger.kernel.org # 5.12+ Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Tested-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Tested-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: zoned: print unusable percentage when reclaiming block groupsJohannes Thumshirn2021-07-071-2/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we're automatically reclaiming a zone, because its zone_unusable value is above the reclaim threshold, we're only logging how much percent of the zone's capacity are used, but not how much of the capacity is unusable. Also print the percentage of the unusable space in the block group before we're reclaiming it. Example: BTRFS info (device sdg): reclaiming chunk 230686720 with 13% used 86% unusable CC: stable@vger.kernel.org # 5.13 Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
| * | btrfs: zoned: fix types for u64 division in btrfs_reclaim_bgs_workDavid Sterba2021-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The types in calculation of the used percentage in the reclaiming messages are both u64, though bg->length is either 1GiB (non-zoned) or the zone size in the zoned mode. The upper limit on zone size is 8GiB so this could theoretically overflow in the future, right now the values fit. Fixes: 18bb8bbf13c1 ("btrfs: zoned: automatically reclaim zones") CC: stable@vger.kernel.org # 5.13 Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
* | | Merge tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2021-07-1012-29/+135
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs fixes from Steve French: "13 cifs/smb3 fixes. Most are to address minor issues pointed out by Coverity. Also includes a packet signing enhancement and mount improvement" * tag '5.14-rc-smb3-fixes-part2' of git://git.samba.org/sfrench/cifs-2.6: cifs: update internal version number cifs: prevent NULL deref in cifs_compose_mount_options() SMB3.1.1: Add support for negotiating signing algorithm cifs: use helpers when parsing uid/gid mount options and validate them CIFS: Clarify SMB1 code for POSIX Lock CIFS: Clarify SMB1 code for rename open file CIFS: Clarify SMB1 code for delete CIFS: Clarify SMB1 code for SetFileSize smb3: fix typo in header file CIFS: Clarify SMB1 code for UnixSetPathInfo CIFS: Clarify SMB1 code for UnixCreateSymLink cifs: clarify SMB1 code for UnixCreateHardLink cifs: make locking consistent around the server session status
| * | | cifs: update internal version numberSteve French2021-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | To 2.33 Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: prevent NULL deref in cifs_compose_mount_options()Paulo Alcantara2021-07-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The optional @ref parameter might contain an NULL node_name, so prevent dereferencing it in cifs_compose_mount_options(). Addresses-Coverity: 1476408 ("Explicit null dereferenced") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | SMB3.1.1: Add support for negotiating signing algorithmSteve French2021-07-094-11/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Support for faster packet signing (using GMAC instead of CMAC) can now be negotiated to some newer servers, including Windows. See MS-SMB2 section 2.2.3.17. This patch adds support for sending the new negotiate context with the first of three supported signing algorithms (AES-CMAC) and decoding the response. A followon patch will add support for sending the other two (including AES-GMAC, which is fastest) and changing the signing algorithm used based on what was negotiated. To allow the client to request GMAC signing set module parameter "enable_negotiate_signing" to 1. Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Reviewed-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: use helpers when parsing uid/gid mount options and validate themRonnie Sahlberg2021-07-092-5/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the nice helpers to initialize and the uid/gid/cred_uid when passed as mount arguments. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Acked-by: Pavel Shilovsky <pshilovsky@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for POSIX LockSteve French2021-07-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for SMB1 PosixLock. This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711520 ("Out of bounds write") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for rename open fileSteve French2021-07-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for SMB1 RenameOpenFile. This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711521 ("Out of bounds write") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for deleteSteve French2021-07-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for SMB1 SetFileDisposition (which is used to unlink a file by setting the delete on close flag). This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711524 ("Out of bounds write") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for SetFileSizeSteve French2021-07-071-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for setting the file size using SMB1. This changeset doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711525 ("Out of bounds write") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | smb3: fix typo in header fileSteve French2021-07-051-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although it compiles, the test robot correctly noted: 'cifsacl.h' file not found with <angled> include; use "quotes" instead Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for UnixSetPathInfoSteve French2021-07-031-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for doing SetPathInfo (setattr) when using the Unix extensions. This doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711528 ("Out of bounds read") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | CIFS: Clarify SMB1 code for UnixCreateSymLinkSteve French2021-07-031-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity also complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes) for creating SMB1 symlinks when using the Unix extensions. This doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711530 ("Out of bounds read") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: clarify SMB1 code for UnixCreateHardLinkSteve French2021-07-032-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Coverity complains about the way we calculate the offset (starting from the address of a 4 byte array within the header structure rather than from the beginning of the struct plus 4 bytes). This doesn't change the address but makes it slightly clearer. Addresses-Coverity: 711529 ("Out of bounds read") Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | cifs: make locking consistent around the server session statusSteve French2021-07-033-1/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were three places where we were not taking the spinlock around updates to server->tcpStatus when it was being modified. To be consistent (also removes Coverity warning) and to remove possibility of race best to lock all places where it is updated. Two of the three were in initialization of the field and can't race - but added lock around the other. Addresses-Coverity: 1399512 ("Data race condition") Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | | Merge tag 'io_uring-5.14-2021-07-09' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-091-123/+68
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull io_uring fixes from Jens Axboe: "A few fixes that should go into this merge. One fixes a regression introduced in this release, others are just generic fixes, mostly related to handling fallback task_work" * tag 'io_uring-5.14-2021-07-09' of git://git.kernel.dk/linux-block: io_uring: remove dead non-zero 'poll' check io_uring: mitigate unlikely iopoll lag io_uring: fix drain alloc fail return code io_uring: fix exiting io_req_task_work_add leaks io_uring: simplify task_work func io_uring: fix stuck fallback reqs
| * | | | io_uring: remove dead non-zero 'poll' checkJens Axboe2021-07-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Colin reports that Coverity complains about checking for poll being non-zero after having dereferenced it multiple times. This is a valid complaint, and actually a leftover from back when this code was based on the aio poll code. Kill the redundant check. Link: https://lore.kernel.org/io-uring/fe70c532-e2a7-3722-58a1-0fa4e5c5ff2c@canonical.com/ Reported-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: mitigate unlikely iopoll lagPavel Begunkov2021-07-081-1/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have requests like IORING_OP_FILES_UPDATE that don't go through ->iopoll_list but get completed in place under ->uring_lock, and so after dropping the lock io_iopoll_check() should expect that some CQEs might have get completed in a meanwhile. Currently such events won't be accounted in @nr_events, and the loop will continue to poll even if there is enough of CQEs. It shouldn't be a problem as it's not likely to happen and so, but not nice either. Just return earlier in this case, it should be enough. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/66ef932cc66a34e3771bbae04b2953a8058e9d05.1625747741.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: fix drain alloc fail return codePavel Begunkov2021-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After a recent change io_drain_req() started to fail requests with result=0 in case of allocation failure, where it should be and have been -ENOMEM. Fixes: 76cc33d79175a ("io_uring: refactor io_req_defer()") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/e068110ac4293e0c56cfc4d280d0f22b9303ec08.1625682153.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: fix exiting io_req_task_work_add leaksPavel Begunkov2021-07-011-46/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If one entered io_req_task_work_add() not seeing PF_EXITING, it will set a ->task_state bit and try task_work_add(), which may fail by that moment. If that happens the function would try to cancel the request. However, in a meanwhile there might come other io_req_task_work_add() callers, which will see the bit set and leave their requests in the list, which will never be executed. Don't propagate an error, but clear the bit first and then fallback all requests that we can splice from the list. The callback functions have to be able to deal with PF_EXITING, so poll and apoll was modified via changing io_poll_rewait(). Fixes: 7cbf1722d5fc ("io_uring: provide FIFO ordering for task_work") Reported-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/060002f19f1fdbd130ba24aef818ea4d3080819b.1625142209.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: simplify task_work funcPavel Begunkov2021-07-011-44/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since we don't really use req->task_work anymore, get rid of it together with the nasty ->func aliasing between ->io_task_work and ->task_work, and hide ->fallback_node inside of io_task_work. Also, as task_work is gone now, replace the callback type from task_work_func_t to a function taking io_kiocb to avoid casting and simplify code. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
| * | | | io_uring: fix stuck fallback reqsPavel Begunkov2021-07-011-41/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When task_work_add() fails, we use ->exit_task_work to queue the work. That will be run only in the cancellation path, which happens either when the ctx is dying or one of tasks with inflight requests is exiting or executing. There is a good chance that such a request would just get stuck in the list potentially hodling a file, all io_uring rsrc recycling or some other resources. Nothing terrible, it'll go away at some point, but we don't want to lock them up for longer than needed. Replace that hand made ->exit_task_work with delayed_work + llist inspired by fput_many(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | Merge tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-blockLinus Torvalds2021-07-091-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more block updates from Jens Axboe: "A combination of changes that ended up depending on both the driver and core branch (and/or the IDE removal), and a few late arriving fixes. In detail: - Fix io ticks wrap-around issue (Chunguang) - nvme-tcp sock locking fix (Maurizio) - s390-dasd fixes (Kees, Christoph) - blk_execute_rq polling support (Keith) - blk-cgroup RCU iteration fix (Yu) - nbd backend ID addition (Prasanna) - Partition deletion fix (Yufen) - Use blk_mq_alloc_disk for mmc, mtip32xx, ubd (Christoph) - Removal of now dead block request types due to IDE removal (Christoph) - Loop probing and control device cleanups (Christoph) - Device uevent fix (Christoph) - Misc cleanups/fixes (Tetsuo, Christoph)" * tag 'block-5.14-2021-07-08' of git://git.kernel.dk/linux-block: (34 commits) blk-cgroup: prevent rcu_sched detected stalls warnings while iterating blkgs block: fix the problem of io_ticks becoming smaller nvme-tcp: can't set sk_user_data without write_lock loop: remove unused variable in loop_set_status() block: remove the bdgrab in blk_drop_partitions block: grab a device refcount in disk_uevent s390/dasd: Avoid field over-reading memcpy() dasd: unexport dasd_set_target_state block: check disk exist before trying to add partition ubd: remove dead code in ubd_setup_common nvme: use return value from blk_execute_rq() block: return errors from blk_execute_rq() nvme: use blk_execute_rq() for passthrough commands block: support polling through blk_execute_rq block: remove REQ_OP_SCSI_{IN,OUT} block: mark blk_mq_init_queue_data static loop: rewrite loop_exit using idr_for_each_entry loop: split loop_lookup loop: don't allow deleting an unspecified loop device loop: move loop_ctl_mutex locking into loop_add ...
| * | | | | block: remove REQ_OP_SCSI_{IN,OUT}Christoph Hellwig2021-06-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | With the legacy IDE driver gone drivers now use either REQ_OP_DRV_* or REQ_OP_SCSI_*, so unify the two concepts of passthrough requests into a single one. Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | | | | | Merge tag 'for-linus-5.14-rc1' of ↵Linus Torvalds2021-07-099-22/+51
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs Pull UBIFS updates from Richard Weinberger: - Fix for a race xattr list and modification - Various minor fixes (spelling, return codes, ...) * tag 'for-linus-5.14-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs: ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inode ubifs: Fix spelling mistakes ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattr ubifs: Fix races between xattr_{set|get} and listxattr operations ubifs: fix snprintf() checking ubifs: journal: Fix error return code in ubifs_jnl_write_inode()
| * | | | | | ubifs: Set/Clear I_LINKABLE under i_lock for whiteout inodeZhihao Cheng2021-06-221-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfstests-generic/476 reports a warning message as below: WARNING: CPU: 2 PID: 30347 at fs/inode.c:361 inc_nlink+0x52/0x70 Call Trace: do_rename+0x502/0xd40 [ubifs] ubifs_rename+0x8b/0x180 [ubifs] vfs_rename+0x476/0x1080 do_renameat2+0x67c/0x7b0 __x64_sys_renameat2+0x6e/0x90 do_syscall_64+0x66/0xe0 entry_SYSCALL_64_after_hwframe+0x44/0xae Following race case can cause this: rename_whiteout(Thread 1) wb_workfn(Thread 2) ubifs_rename do_rename __writeback_single_inode spin_lock(&inode->i_lock) whiteout->i_state |= I_LINKABLE inode->i_state &= ~dirty; ---- How race happens on i_state: (tmp = whiteout->i_state | I_LINKABLE) (tmp = inode->i_state & ~dirty) (whiteout->i_state = tmp) (inode->i_state = tmp) ---- spin_unlock(&inode->i_lock) inc_nlink(whiteout) WARN_ON(!(inode->i_state & I_LINKABLE)) !!! Fix to add i_lock to avoid i_state update race condition. Fixes: 9e0a1fff8db56ea ("ubifs: Implement RENAME_WHITEOUT") Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | ubifs: Fix spelling mistakesZheng Yongjun2021-06-226-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix some spelling mistakes in comments: withoug ==> without numer ==> number aswell ==> as well referes ==> refers childs ==> children unnecesarry ==> unnecessary Signed-off-by: Zheng Yongjun <zhengyongjun3@huawei.com> Reviewed-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | ubifs: Remove ui_mutex in ubifs_xattr_get and change_xattrZhihao Cheng2021-06-181-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since ubifs_xattr_get and ubifs_xattr_set cannot being executed parallelly after importing @host_ui->xattr_sem, now we can remove ui_mutex imported by commit ab92a20bce3b4c2 ("ubifs: make ubifs_[get|set]xattr atomic"). @xattr_size, @xattr_names and @xattr_cnt can't be out of protection by @host_ui->mutex yet, they are sill accesed in other places, such as pack_inode() called by ubifs_write_inode() triggered by page-writeback. Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | ubifs: Fix races between xattr_{set|get} and listxattr operationsZhihao Cheng2021-06-183-11/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | UBIFS may occur some problems with concurrent xattr_{set|get} and listxattr operations, such as assertion failure, memory corruption, stale xattr value[1]. Fix it by importing a new rw-lock in @ubifs_inode to serilize write operations on xattr, concurrent read operations are still effective, just like ext4. [1] https://lore.kernel.org/linux-mtd/20200630130438.141649-1-houtao1@huawei.com Fixes: 1e51764a3c2ac05a23 ("UBIFS: add new flash file system") Cc: stable@vger.kernel.org # v2.6+ Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by: Sascha Hauer <s.hauer@pengutronix.de> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | ubifs: fix snprintf() checkingDan Carpenter2021-06-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The snprintf() function returns the number of characters (not counting the NUL terminator) that it would have printed if we had space. This buffer has UBIFS_DFS_DIR_LEN characters plus one extra for the terminator. Printing UBIFS_DFS_DIR_LEN is okay but anything higher will result in truncation. Thus the comparison needs to be change from == to >. These strings are compile time constants so this patch doesn't affect runtime. Fixes: ae380ce04731 ("UBIFS: lessen the size of debugging info data structure") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Alexander Dahl <ada@thorsis.com> Signed-off-by: Richard Weinberger <richard@nod.at>
| * | | | | | ubifs: journal: Fix error return code in ubifs_jnl_write_inode()Zhen Lei2021-06-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix to return a negative error code from the error handling case instead of 0, as done elsewhere in this function. Fixes: 9ca2d7326444 ("ubifs: Limit number of xattrs per inode") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Richard Weinberger <richard@nod.at>
* | | | | | | Merge tag 'ext4_for_linus_stable' of ↵Linus Torvalds2021-07-098-128/+113
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Ext4 regression and bug fixes" * tag 'ext4_for_linus_stable' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: inline jbd2_journal_[un]register_shrinker() ext4: fix flags validity checking for EXT4_IOC_CHECKPOINT ext4: fix possible UAF when remounting r/o a mmp-protected file system ext4: use ext4_grp_locked_error in mb_find_extent ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblock Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"
| * | | | | | | ext4: inline jbd2_journal_[un]register_shrinker()Theodore Ts'o2021-07-083-99/+62
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function jbd2_journal_unregister_shrinker() was getting called twice when the file system was getting unmounted. On Power and ARM platforms this was causing kernel crash when unmounting the file system, when a percpu_counter was destroyed twice. Fix this by removing jbd2_journal_[un]register_shrinker() functions, and inlining the shrinker setup and teardown into journal_init_common() and jbd2_journal_destroy(). This means that ext4 and ocfs2 now no longer need to know about registering and unregistering jbd2's shrinker. Also, while we're at it, rename the percpu counter from j_jh_shrink_count to j_checkpoint_jh_count, since this makes it clearer what this counter is intended to track. Link: https://lore.kernel.org/r/20210705145025.3363130-1-tytso@mit.edu Fixes: 4ba3fcdde7e3 ("jbd2,ext4: add a shrinker to release checkpointed buffers") Reported-by: Jon Hunter <jonathanh@nvidia.com> Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Tested-by: Jon Hunter <jonathanh@nvidia.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | | | | | ext4: fix flags validity checking for EXT4_IOC_CHECKPOINTTheodore Ts'o2021-07-081-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the correct bitmask when checking for any not-yet-supported flags. Link: https://lore.kernel.org/r/20210702173425.1276158-1-tytso@mit.edu Fixes: 351a0a3fbc35 ("ext4: add ioctl EXT4_IOC_CHECKPOINT") Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Leah Rumancik <leah.rumancik@gmail.com>
| * | | | | | | ext4: fix possible UAF when remounting r/o a mmp-protected file systemTheodore Ts'o2021-07-082-17/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 618f003199c6 ("ext4: fix memory leak in ext4_fill_super"), after the file system is remounted read-only, there is a race where the kmmpd thread can exit, causing sbi->s_mmp_tsk to point at freed memory, which the call to ext4_stop_mmpd() can trip over. Fix this by only allowing kmmpd() to exit when it is stopped via ext4_stop_mmpd(). Link: https://lore.kernel.org/r/20210707002433.3719773-1-tytso@mit.edu Reported-by: Ye Bin <yebin10@huawei.com> Bug-Report-Link: <20210629143603.2166962-1-yebin10@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz>
| * | | | | | | ext4: use ext4_grp_locked_error in mb_find_extentStephen Brennan2021-07-011-4/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5d1b1b3f492f ("ext4: fix BUG when calling ext4_error with locked block group") introduces ext4_grp_locked_error to handle unlocking a group in error cases. Otherwise, there is a possibility of a sleep while atomic. However, since 43c73221b3b1 ("ext4: replace BUG_ON with WARN_ON in mb_find_extent()"), mb_find_extent() has contained a ext4_error() call while a group spinlock is held. Replace this with ext4_grp_locked_error. Fixes: 43c73221b3b1 ("ext4: replace BUG_ON with WARN_ON in mb_find_extent()") Cc: <stable@vger.kernel.org> # 4.14+ Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com> Link: https://lore.kernel.org/r/20210623232114.34457-1-stephen.s.brennan@oracle.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | | | | | ext4: fix WARN_ON_ONCE(!buffer_uptodate) after an error writing the superblockYe Bin2021-07-012-3/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a writeback of the superblock fails with an I/O error, the buffer is marked not uptodate. However, this can cause a WARN_ON to trigger when we attempt to write superblock a second time. (Which might succeed this time, for cerrtain types of block devices such as iSCSI devices over a flaky network.) Try to detect this case in flush_stashed_error_work(), and also change __ext4_handle_dirty_metadata() so we always set the uptodate flag, not just in the nojournal case. Before this commit, this problem can be repliciated via: 1. dmsetup create dust1 --table '0 2097152 dust /dev/sdc 0 4096' 2. mount /dev/mapper/dust1 /home/test 3. dmsetup message dust1 0 addbadblock 0 10 4. cd /home/test 5. echo "XXXXXXX" > t After a few seconds, we got following warning: [ 80.654487] end_buffer_async_write: bh=0xffff88842f18bdd0 [ 80.656134] Buffer I/O error on dev dm-0, logical block 0, lost async page write [ 85.774450] EXT4-fs error (device dm-0): ext4_check_bdev_write_error:193: comm kworker/u16:8: Error while async write back metadata [ 91.415513] mark_buffer_dirty: bh=0xffff88842f18bdd0 [ 91.417038] ------------[ cut here ]------------ [ 91.418450] WARNING: CPU: 1 PID: 1944 at fs/buffer.c:1092 mark_buffer_dirty.cold+0x1c/0x5e [ 91.440322] Call Trace: [ 91.440652] __jbd2_journal_temp_unlink_buffer+0x135/0x220 [ 91.441354] __jbd2_journal_unfile_buffer+0x24/0x90 [ 91.441981] __jbd2_journal_refile_buffer+0x134/0x1d0 [ 91.442628] jbd2_journal_commit_transaction+0x249a/0x3240 [ 91.443336] ? put_prev_entity+0x2a/0x200 [ 91.443856] ? kjournald2+0x12e/0x510 [ 91.444324] kjournald2+0x12e/0x510 [ 91.444773] ? woken_wake_function+0x30/0x30 [ 91.445326] kthread+0x150/0x1b0 [ 91.445739] ? commit_timeout+0x20/0x20 [ 91.446258] ? kthread_flush_worker+0xb0/0xb0 [ 91.446818] ret_from_fork+0x1f/0x30 [ 91.447293] ---[ end trace 66f0b6bf3d1abade ]--- Signed-off-by: Ye Bin <yebin10@huawei.com> Link: https://lore.kernel.org/r/20210615090537.3423231-1-yebin10@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | | | | | Revert "ext4: consolidate checks for resize of bigalloc into ext4_resize_begin"Theodore Ts'o2021-07-012-4/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function ext4_resize_begin() gets called from three different places, and online resize for bigalloc file systems is disallowed from the old-style online resize (EXT4_IOC_GROUP_ADD and EXT4_IOC_GROUP_EXTEND), but it *is* supposed to be allowed via EXT4_IOC_RESIZE_FS. This reverts commit e9f9f61d0cdcb7f0b0b5feb2d84aa1c5894751f3.