summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* f2fs: support age threshold based garbage collectionChao Yu2020-09-1113-64/+632
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are several issues in current background GC algorithm: - valid blocks is one of key factors during cost overhead calculation, so if segment has less valid block, however even its age is young or it locates hot segment, CB algorithm will still choose the segment as victim, it's not appropriate. - GCed data/node will go to existing logs, no matter in-there datas' update frequency is the same or not, it may mix hot and cold data again. - GC alloctor mainly use LFS type segment, it will cost free segment more quickly. This patch introduces a new algorithm named age threshold based garbage collection to solve above issues, there are three steps mainly: 1. select a source victim: - set an age threshold, and select candidates beased threshold: e.g. 0 means youngest, 100 means oldest, if we set age threshold to 80 then select dirty segments which has age in range of [80, 100] as candiddates; - set candidate_ratio threshold, and select candidates based the ratio, so that we can shrink candidates to those oldest segments; - select target segment with fewest valid blocks in order to migrate blocks with minimum cost; 2. select a target victim: - select candidates beased age threshold; - set candidate_radius threshold, search candidates whose age is around source victims, searching radius should less than the radius threshold. - select target segment with most valid blocks in order to avoid migrating current target segment. 3. merge valid blocks from source victim into target victim with SSR alloctor. Test steps: - create 160 dirty segments: * half of them have 128 valid blocks per segment * left of them have 384 valid blocks per segment - run background GC Benefit: GC count and block movement count both decrease obviously: - Before: - Valid: 86 - Dirty: 1 - Prefree: 11 - Free: 6001 (6001) GC calls: 162 (BG: 220) - data segments : 160 (160) - node segments : 2 (2) Try to move 41454 blocks (BG: 41454) - data blocks : 40960 (40960) - node blocks : 494 (494) IPU: 0 blocks SSR: 0 blocks in 0 segments LFS: 41364 blocks in 81 segments - After: - Valid: 87 - Dirty: 0 - Prefree: 4 - Free: 6008 (6008) GC calls: 75 (BG: 76) - data segments : 74 (74) - node segments : 1 (1) Try to move 12813 blocks (BG: 12813) - data blocks : 12544 (12544) - node blocks : 269 (269) IPU: 0 blocks SSR: 12032 blocks in 77 segments LFS: 855 blocks in 2 segments Signed-off-by: Chao Yu <yuchao0@huawei.com> [Jaegeuk Kim: fix a bug along with pinfile in-mem segment & clean up] Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: point man pages for some f2fs utilsJaegeuk Kim2020-09-101-2/+44
| | | | | | | This patch adds some missing contexts related to f2fs-tools in f2fs documentation. Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: Use generic casefolding supportDaniel Rosenberg2020-09-105-91/+20
| | | | | | | | | | | | | This switches f2fs over to the generic support provided in the previous patch. Since casefolded dentries behave the same in ext4 and f2fs, we decrease the maintenance burden by unifying them, and any optimizations will immediately apply to both. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* fs: Add standard casefolding supportDaniel Rosenberg2020-09-102-0/+103
| | | | | | | | | | | | | | | | | This adds general supporting functions for filesystems that use utf8 casefolding. It provides standard dentry_operations and adds the necessary structures in struct super_block to allow this standardization. The new dentry operations are functionally equivalent to the existing operations in ext4 and f2fs, apart from the use of utf8_casefold_hash to avoid an allocation. By providing a common implementation, all users can benefit from any optimizations without needing to port over improvements. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* unicode: Add utf8_casefold_hashDaniel Rosenberg2020-09-102-1/+25
| | | | | | | | | | | | | | This adds a case insensitive hash function to allow taking the hash without needing to allocate a casefolded copy of the string. The existing d_hash implementations for casefolding allocate memory within rcu-walk, by avoiding it we can be more efficient and avoid worrying about a failed allocation. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: compress: use more readable atomic_t type for {cic,dic}.refChao Yu2020-09-103-10/+10
| | | | | | | | | | refcount_t type variable should never be less than one, so it's a little bit hard to understand when we use it to indicate pending compressed page count, let's change to use atomic_t for better readability. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: fix compile warningChao Yu2020-09-101-2/+5
| | | | | | | | | | | | | | | | | | | | This patch fixes below compile warning reported by LKP (kernel test robot) cppcheck warnings: (new ones prefixed by >>) >> fs/f2fs/file.c:761:9: warning: Identical condition 'err', second condition is always false [identicalConditionAfterEarlyExit] return err; ^ fs/f2fs/file.c:753:6: note: first condition if (err) ^ fs/f2fs/file.c:761:9: note: second condition return err; Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support 64-bits key in f2fs rb-tree node entryChao Yu2020-09-103-7/+49
| | | | | | | | then, we can add specified entry into rb-tree with 64-bits segment time as key. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: inherit mtime of original block during GCChao Yu2020-09-104-17/+50
| | | | | | | Don't let f2fs inner GC ruins original aging degree of segment. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: record average update time of segmentChao Yu2020-09-101-3/+18
| | | | | | | | | | | | | | | | | Previously, once we update one block in segment, we will update mtime of segment to last time, making aged segment becoming freshest, result in that GC with cost benefit algorithm missing such segment, So this patch changes to record mtime as average block updating time instead of last updating time. It's not needed to reset mtime for prefree segment, as se->valid_blocks is zero, then old se->mtime won't take any weight with below calculation: se->mtime = div_u64(se->mtime * se->valid_blocks + mtime, se->valid_blocks + 1); Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: introduce inmem cursegChao Yu2020-09-108-52/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | Previous implementation of aligned pinfile allocation will: - allocate new segment on cold data log no matter whether last used segment is partially used or not, it makes IOs more random; - force concurrent cold data/GCed IO going into warm data area, it can make a bad effect on hot/cold data separation; In this patch, we introduce a new type of log named 'inmem curseg', the differents from normal curseg is: - it reuses existed segment type (CURSEG_XXX_NODE/DATA); - it only exists in memory, its segno, blkofs, summary will not b persisted into checkpoint area; With this new feature, we can enhance scalability of log, special allocators can be created for purposes: - pure lfs allocator for aligned pinfile allocation or file defragmentation - pure ssr allocator for later feature So that, let's update aligned pinfile allocation to use this new inmem curseg fwk. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: compress: remove unneeded codeChao Yu2020-09-101-4/+0
| | | | | | | | | | | | | | | | - f2fs_write_multi_pages - f2fs_compress_pages - init_compress_ctx - compress_pages - destroy_compress_ctx --- 1 - f2fs_write_compressed_pages - destroy_compress_ctx --- 2 destroy_compress_ctx() in f2fs_write_multi_pages() is redundant, remove it. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: remove duplicated type castingXiaojun Wang2020-09-103-4/+4
| | | | | | | | | | Since DUMMY_WRITTEN_PAGE and ATOMIC_WRITTEN_PAGE have already been converted as unsigned long type, we don't need do type casting again. Signed-off-by: Xiaojun Wang <wangxiaojun11@huawei.com> Reported-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* f2fs: support zone capacity less than zone sizeAravind Ramesh2020-09-107-37/+275
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | NVMe Zoned Namespace devices can have zone-capacity less than zone-size. Zone-capacity indicates the maximum number of sectors that are usable in a zone beginning from the first sector of the zone. This makes the sectors sectors after the zone-capacity till zone-size to be unusable. This patch set tracks zone-size and zone-capacity in zoned devices and calculate the usable blocks per segment and usable segments per section. If zone-capacity is less than zone-size mark only those segments which start before zone-capacity as free segments. All segments at and beyond zone-capacity are treated as permanently used segments. In cases where zone-capacity does not align with segment size the last segment will start before zone-capacity and end beyond the zone-capacity of the zone. For such spanning segments only sectors within the zone-capacity are used. During writes and GC manage the usable segments in a section and usable blocks per segment. Segments which are beyond zone-capacity are never allocated, and do not need to be garbage collected, only the segments which are before zone-capacity needs to garbage collected. For spanning segments based on the number of usable blocks in that segment, write to blocks only up to zone-capacity. Zone-capacity is device specific and cannot be configured by the user. Since NVMe ZNS device zones are sequentially write only, a block device with conventional zones or any normal block device is needed along with the ZNS device for the metadata operations of F2fs. A typical nvme-cli output of a zoned device shows zone start and capacity and write pointer as below: SLBA: 0x0 WP: 0x0 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x20000 WP: 0x20000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ SLBA: 0x40000 WP: 0x40000 Cap: 0x18800 State: EMPTY Type: SEQWRITE_REQ Here zone size is 64MB, capacity is 49MB, WP is at zone start as the zones are in EMPTY state. For each zone, only zone start + 49MB is usable area, any lba/sector after 49MB cannot be read or written to, the drive will fail any attempts to read/write. So, the second zone starts at 64MB and is usable till 113MB (64 + 49) and the range between 113 and 128MB is again unusable. The next zone starts at 128MB, and so on. Signed-off-by: Aravind Ramesh <aravind.ramesh@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* Merge tag 'f2fs-for-5.9-rc5' of ↵Linus Torvalds2020-09-103-4/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs fixes from Jaegeuk Kim: "Small bug fixes for: - SMR drive fix - infinite loop when building free node ids - EOF at DIO read" * tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: Return EOF on unaligned end of file DIO read f2fs: fix indefinite loop scanning for free nid f2fs: Fix type of section block count variables
| * f2fs: Return EOF on unaligned end of file DIO readGabriel Krisman Bertazi2020-09-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Reading past end of file returns EOF for aligned reads but -EINVAL for unaligned reads on f2fs. While documentation is not strict about this corner case, most filesystem returns EOF on this case, like iomap filesystems. This patch consolidates the behavior for f2fs, by making it return EOF(0). it can be verified by a read loop on a file that does a partial read before EOF (A file that doesn't end at an aligned address). The following code fails on an unaligned file on f2fs, but not on btrfs, ext4, and xfs. while (done < total) { ssize_t delta = pread(fd, buf + done, total - done, off + done); if (!delta) break; ... } It is arguable whether filesystems should actually return EOF or -EINVAL, but since iomap filesystems support it, and so does the original DIO code, it seems reasonable to consolidate on that. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: fix indefinite loop scanning for free nidSahitya Tummala2020-09-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the sbi->ckpt->next_free_nid is not NAT block aligned and if there are free nids in that NAT block between the start of the block and next_free_nid, then those free nids will not be scanned in scan_nat_page(). This results into mismatch between nm_i->available_nids and the sum of nm_i->free_nid_count of all NAT blocks scanned. And nm_i->available_nids will always be greater than the sum of free nids in all the blocks. Under this condition, if we use all the currently scanned free nids, then it will loop forever in f2fs_alloc_nid() as nm_i->available_nids is still not zero but nm_i->free_nid_count of that partially scanned NAT block is zero. Fix this to align the nm_i->next_scan_nid to the first nid of the corresponding NAT block. Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * f2fs: Fix type of section block count variablesShin'ichiro Kawasaki2020-09-091-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") added code to count blocks of each section using variables with type 'unsigned short', which has 2 bytes size in many systems. However, the counts can be larger than the 2 bytes range and type conversion results in wrong values. Especially when the f2fs sections have blocks as many as USHRT_MAX + 1, the count is handled as 0. This triggers eternal loop in init_dirty_segmap() at mount system call. Fix this by changing the type of the variables to block_t. Fixes: da52f8ade40b ("f2fs: get the right gc victim section when section has several segments") Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | Merge branch 'linus' of ↵Linus Torvalds2020-09-101-2/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Pull crypto fix from Herbert Xu: "This fixes a regression in padata" * 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: padata: fix possible padata_works_lock deadlock
| * | padata: fix possible padata_works_lock deadlockDaniel Jordan2020-09-041-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | syzbot reports, WARNING: inconsistent lock state 5.9.0-rc2-syzkaller #0 Not tainted -------------------------------- inconsistent {IN-SOFTIRQ-W} -> {SOFTIRQ-ON-W} usage. syz-executor.0/26715 takes: (padata_works_lock){+.?.}-{2:2}, at: padata_do_parallel kernel/padata.c:220 {IN-SOFTIRQ-W} state was registered at: spin_lock include/linux/spinlock.h:354 [inline] padata_do_parallel kernel/padata.c:220 ... __do_softirq kernel/softirq.c:298 ... sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1091 asm_sysvec_apic_timer_interrupt arch/x86/include/asm/idtentry.h:581 Possible unsafe locking scenario: CPU0 ---- lock(padata_works_lock); <Interrupt> lock(padata_works_lock); padata_do_parallel() takes padata_works_lock with softirqs enabled, so a deadlock is possible if, on the same CPU, the lock is acquired in process context and then softirq handling done in an interrupt leads to the same path. Fix by leaving softirqs disabled while do_parallel holds padata_works_lock. Reported-by: syzbot+f4b9f49e38e25eb4ef52@syzkaller.appspotmail.com Fixes: 4611ce2246889 ("padata: allocate work structures for parallel jobs from a pool") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: linux-crypto@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
* | | Merge tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2020-09-093-4/+13
|\ \ \ | |_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull NFS client bugfixes from Trond Myklebust: - Fix an NFS/RDMA resource leak - Fix the error handling during delegation recall - NFSv4.0 needs to return the delegation on a zero-stateid SETATTR - Stop printk reading past end of string * tag 'nfs-for-5.9-2' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: SUNRPC: stop printk reading past end of string NFS: Zero-stateid SETATTR should first return delegation NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recall xprtrdma: Release in-flight MRs on disconnect
| * | SUNRPC: stop printk reading past end of stringJ. Bruce Fields2020-09-051-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since p points at raw xdr data, there's no guarantee that it's NULL terminated, so we should give a length. And probably escape any special characters too. Reported-by: Zhi Li <yieli@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | NFS: Zero-stateid SETATTR should first return delegationChuck Lever2020-09-051-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a write delegation isn't available, the Linux NFS client uses a zero-stateid when performing a SETATTR. NFSv4.0 provides no mechanism for an NFS server to match such a request to a particular client. It recalls all delegations for that file, even delegations held by the client issuing the request. If that client happens to hold a read delegation, the server will recall it immediately, resulting in an NFS4ERR_DELAY/CB_RECALL/ DELEGRETURN sequence. Optimize out this pipeline bubble by having the client return any delegations it may hold on a file before it issues a SETATTR(zero-stateid) on that file. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | NFSv4.1 handle ERR_DELAY error reclaiming locking state on delegation recallOlga Kornievskaia2020-08-271-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A client should be able to handle getting an ERR_DELAY error while doing a LOCK call to reclaim state due to delegation being recalled. This is a transient error that can happen due to server moving its volumes and invalidating its file location cache and upon reference to it during the LOCK call needing to do an expensive lookup (leading to an ERR_DELAY error on a PUTFH). Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| * | Merge tag 'nfs-rdma-for-5.9-1' of ↵Trond Myklebust2020-08-271-0/+2
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.linux-nfs.org/projects/anna/linux-nfs NFSoRDMA Client Bugfix for Linux 5.9 Bugfix: - xprtrdma: Release in-flight MRs on disconnect Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
| | * | xprtrdma: Release in-flight MRs on disconnectChuck Lever2020-08-261-0/+2
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dan Aloni reports that when a server disconnects abruptly, a few memory regions are left DMA mapped. Over time this leak could pin enough I/O resources to slow or even deadlock an NFS/RDMA client. I found that if a transport disconnects before pending Send and FastReg WRs can be posted, the to-be-registered MRs are stranded on the req's rl_registered list and never released -- since they weren't posted, there's no Send completion to DMA unmap them. Reported-by: Dan Aloni <dan@kernelim.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
* | | Merge tag 'linux-kselftest-5.9-rc5' of ↵Linus Torvalds2020-09-082-0/+2
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull kselftest fix from Shuah Khan: "A single fix to timers test to disable timeout setting for tests to run and report accurate results" * tag 'linux-kselftest-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/timers: Turn off timeout setting
| * | | selftests/timers: Turn off timeout settingPo-Hsu Lin2020-08-202-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following 4 tests in timers can take longer than the default 45 seconds that added in commit 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") to run: * nsleep-lat - 2m7.350s * set-timer-lat - 2m0.66s * inconsistency-check - 1m45.074s * raw_skew - 2m0.013s Thus they will be marked as failed with the current 45s setting: not ok 3 selftests: timers: nsleep-lat # TIMEOUT not ok 4 selftests: timers: set-timer-lat # TIMEOUT not ok 6 selftests: timers: inconsistency-check # TIMEOUT not ok 7 selftests: timers: raw_skew # TIMEOUT Disable the timeout setting for timers can make these tests finish properly: ok 3 selftests: timers: nsleep-lat ok 4 selftests: timers: set-timer-lat ok 6 selftests: timers: inconsistency-check ok 7 selftests: timers: raw_skew https://bugs.launchpad.net/bugs/1864626 Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: John Stultz <john.stultz@linaro.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
* | | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2020-09-0815-23/+39
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Eleven fixes, mostly in drivers or minor fixes in driver related infrastructure libraries (target, libfc and libsas). Most of the bugs fixed only show up under rare circumstances, the exception being the endianness problem in qla2xxx which is used as a device on some sparc systems" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: mpt3sas: Don't call disable_irq from IRQ poll handler scsi: megaraid_sas: Don't call disable_irq from process IRQ poll scsi: target: iscsi: Fix hang in iscsit_access_np() when getting tpg->np_login_sem scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATA scsi: target: iscsi: Fix data digest calculation scsi: lpfc: Update lpfc version to 12.8.0.4 scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional events scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discovery scsi: lpfc: Fix setting IRQ affinity with an empty CPU mask scsi: qla2xxx: Fix regression on sparc64 scsi: libfc: Fix for double free() scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abort
| * | | | scsi: mpt3sas: Don't call disable_irq from IRQ poll handlerTomas Henzl2020-09-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disable_irq() might sleep, replace it with disable_irq_nosync(). For synchronisation 'irq_poll_scheduled' is sufficient Fixes: 320e77acb3 scsi: mpt3sas: Irq poll to avoid CPU hard lockups Link: https://lore.kernel.org/r/20200901145026.12174-1-thenzl@redhat.com Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: megaraid_sas: Don't call disable_irq from process IRQ pollTomas Henzl2020-09-031-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disable_irq() might sleep. Replace it with disable_irq_nosync() which is sufficient as irq_poll_scheduled protects against concurrently running complete_cmd_fusion() from megasas_irqpoll() and megasas_isr_fusion(). Link: https://lore.kernel.org/r/20200827165332.8432-1-thenzl@redhat.com Fixes: a6ffd5bf681 scsi: megaraid_sas: Call disable_irq from process IRQ poll Signed-off-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: target: iscsi: Fix hang in iscsit_access_np() when getting ↵Hou Pu2020-09-033-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tpg->np_login_sem The iSCSI target login thread might get stuck with the following stack: cat /proc/`pidof iscsi_np`/stack [<0>] down_interruptible+0x42/0x50 [<0>] iscsit_access_np+0xe3/0x167 [<0>] iscsi_target_locate_portal+0x695/0x8ac [<0>] __iscsi_target_login_thread+0x855/0xb82 [<0>] iscsi_target_login_thread+0x2f/0x5a [<0>] kthread+0xfa/0x130 [<0>] ret_from_fork+0x1f/0x30 This can be reproduced via the following steps: 1. Initiator A tries to log in to iqn1-tpg1 on port 3260. After finishing PDU exchange in the login thread and before the negotiation is finished the the network link goes down. At this point A has not finished login and tpg->np_login_sem is held. 2. Initiator B tries to log in to iqn2-tpg1 on port 3260. After finishing PDU exchange in the login thread the target expects to process remaining login PDUs in workqueue context. 3. Initiator A' tries to log in to iqn1-tpg1 on port 3260 from a new socket. A' will wait for tpg->np_login_sem with np->np_login_timer loaded to wait for at most 15 seconds. The lock is held by A so A' eventually times out. 4. Before A' got timeout initiator B gets negotiation failed and calls iscsi_target_login_drop()->iscsi_target_login_sess_out(). The np->np_login_timer is canceled and initiator A' will hang forever. Because A' is now in the login thread, no new login requests can be serviced. Fix this by moving iscsi_stop_login_thread_timer() out of iscsi_target_login_sess_out(). Also remove iscsi_np parameter from iscsi_target_login_sess_out(). Link: https://lore.kernel.org/r/20200729130343.24976-1-houpu@bytedance.com Cc: stable@vger.kernel.org Reviewed-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Hou Pu <houpu@bytedance.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: libsas: Set data_dir as DMA_NONE if libata marks qc as NODATALuo Jiaxing2020-09-021-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It was discovered that sdparm will fail when attempting to disable write cache on a SATA disk connected via libsas. In the ATA command set the write cache state is controlled through the SET FEATURES operation. This is roughly corresponds to MODE SELECT in SCSI and the latter command is what is used in the SCSI-ATA translation layer. A subtle difference is that a MODE SELECT carries data whereas SET FEATURES is defined as a non-data command in ATA. Set the DMA data direction to DMA_NONE if the requested ATA command is identified as non-data. [mkp: commit desc] Fixes: fa1c1e8f1ece ("[SCSI] Add SATA support to libsas") Link: https://lore.kernel.org/r/1598426666-54544-1-git-send-email-luojiaxing@huawei.com Reviewed-by: John Garry <john.garry@huawei.com> Reviewed-by: Jason Yan <yanaijie@huawei.com> Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: target: iscsi: Fix data digest calculationVarun Prakash2020-09-021-2/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current code does not consider 'page_off' in data digest calculation. To fix this, add a local variable 'first_sg' and set first_sg.offset to sg->offset + page_off. Link: https://lore.kernel.org/r/1598358910-3052-1-git-send-email-varun@chelsio.com Fixes: e48354ce078c ("iscsi-target: Add iSCSI fabric support for target v4.1") Cc: <stable@vger.kernel.org> Reviewed-by: Mike Christie <michael.christie@oralce.com> Signed-off-by: Varun Prakash <varun@chelsio.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: lpfc: Update lpfc version to 12.8.0.4James Smart2020-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update lpfc version to 12.8.0.4 Link: https://lore.kernel.org/r/20200828175332.130300-5-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: lpfc: Extend the RDF FPIN Registration descriptor for additional eventsJames Smart2020-09-012-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver registers for Link Integrity events only. This patch adds registration for the following FPIN types: - Delivery Notifications - Congestion Notification - Peer Congestion Notification Link: https://lore.kernel.org/r/20200828175332.130300-4-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discoveryJames Smart2020-09-011-1/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The driver is unable to successfully login with remote device. During pt2pt login, the driver completes its FLOGI request with the remote device having WWN precedence. The remote device issues its own (delayed) FLOGI after accepting the driver's and, upon transmitting the FLOGI, immediately recognizes it has already processed the driver's FLOGI thus it transitions to sending a PLOGI before waiting for an ACC to its FLOGI. In the driver, the FLOGI is received and an ACC sent, followed by the PLOGI being received and an ACC sent. The issue is that the PLOGI reception occurs before the response from the adapter from the FLOGI ACC is received. Processing of the PLOGI sets state flags to perform the REG_RPI mailbox command and proceed with the rest of discovery on the port. The same completion routine used by both FLOGI and PLOGI is generic in nature. One of the things it does is clear flags, and those flags happen to drive the rest of discovery. So what happened was the PLOGI processing set the flags, the FLOGI ACC completion cleared them, thus when the PLOGI ACC completes it doesn't see the flags and stops. Fix by modifying the generic completion routine to not clear the rest of discovery flag (NLP_ACC_REGLOGIN) unless the completion is also associated with performing a mailbox command as part of its handling. For things such as FLOGI ACC, there isn't a subsequent action to perform with the adapter, thus there is no mailbox cmd ptr. PLOGI ACC though will perform REG_RPI upon completion, thus there is a mailbox cmd ptr. Link: https://lore.kernel.org/r/20200828175332.130300-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: lpfc: Fix setting IRQ affinity with an empty CPU maskJames Smart2020-09-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some systems are reporting the following log message during driver unload or system shutdown: ics_rtas_set_affinity: No online cpus in the mask A prior commit introduced the writing of an empty affinity mask in calls to irq_set_affinity_hint() when disabling interrupts or when there are no remaining online CPUs to service an eq interrupt. At least some ppc64 systems are checking whether affinity masks are empty or not. Do not call irq_set_affinity_hint() with an empty CPU mask. Fixes: dcaa21367938 ("scsi: lpfc: Change default IRQ model on AMD architectures") Link: https://lore.kernel.org/r/20200828175332.130300-2-james.smart@broadcom.com Cc: <stable@vger.kernel.org> # v5.5+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: qla2xxx: Fix regression on sparc64René Rebe2020-09-012-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 98aee70d19a7 ("qla2xxx: Add endianizer to max_payload_size modifier.") in 2014 broke qla2xxx on sparc64, e.g. as in the Sun Blade 1000 / 2000. Unbreak by partial revert to fix endianness in nvram firmware default initialization. Also mark the second frame_payload_size in nvram_t __le16 to avoid new sparse warnings. Link: https://lore.kernel.org/r/20200827.222729.1875148247374704975.rene@exactcode.com Fixes: 98aee70d19a7 ("qla2xxx: Add endianizer to max_payload_size modifier.") Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Acked-by: Arun Easi <aeasi@marvell.com> Signed-off-by: René Rebe <rene@exactcode.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: libfc: Fix for double free()Javed Hasan2020-09-011-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix for '&fp->skb' double free. Link: https://lore.kernel.org/r/20200825093940.19612-1-jhasan@marvell.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Javed Hasan <jhasan@marvell.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * | | | scsi: pm8001: Fix memleak in pm8001_exec_internal_task_abortDinghao Liu2020-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When pm8001_tag_alloc() fails, task should be freed just like it is done in the subsequent error paths. Link: https://lore.kernel.org/r/20200823091453.4782-1-dinghao.liu@zju.edu.cn Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com> Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | | | | Merge tag 'drm-fixes-2020-09-08' of git://anongit.freedesktop.org/drm/drmLinus Torvalds2020-09-0815-64/+411
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "The i915 reverts are going to be a bit of a conflict mess for next, so I decided to dequeue them now, along with some msm fixes for a ring corruption issue, that Rob sent over the weekend. Summary: i915: - revert gpu relocation changes due to regression msm: - fixes for RPTR corruption issue" * tag 'drm-fixes-2020-09-08' of git://anongit.freedesktop.org/drm/drm: Revert "drm/i915/gem: Delete unused code" Revert "drm/i915/gem: Async GPU relocations only" Revert "drm/i915: Remove i915_gem_object_get_dirty_page()" drm/msm: Disable the RPTR shadow drm/msm: Disable preemption on all 5xx targets drm/msm: Enable expanded apriv support for a650 drm/msm: Split the a5xx preemption record
| * | | | | Revert "drm/i915/gem: Delete unused code"Dave Airlie2020-09-081-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These commits caused a regression on Lenovo t520 sandybridge machine belonging to reporter. We are reverting them for 5.10 for other reasons, so just do it for 5.9 as well. This reverts commit 7ac2d2536dfa71c275a74813345779b1e7522c91. Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | | Revert "drm/i915/gem: Async GPU relocations only"Dave Airlie2020-09-082-27/+289
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These commits caused a regression on Lenovo t520 sandybridge machine belonging to reporter. We are reverting them for 5.10 for other reasons, so just do it for 5.9 as well. This reverts commit 9e0f9464e2ab36b864359a59b0e9058fdef0ce47. Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | | Revert "drm/i915: Remove i915_gem_object_get_dirty_page()"Dave Airlie2020-09-082-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | These commits caused a regression on Lenovo t520 sandybridge machine belonging to reporter. We are reverting them for 5.10 for other reasons, so just do it for 5.9 as well. This reverts commit 763fedd6a216f94c2eb98d2f7ca21be3d3806e69. Reported-by: Harald Arnesen <harald@skogtun.org> Signed-off-by: Dave Airlie <airied@redhat.com>
| * | | | | Merge tag 'drm-msm-fixes-2020-09-04' of ↵Dave Airlie2020-09-0811-37/+85
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | https://gitlab.freedesktop.org/drm/msm into drm-fixes A few fixes for a potential RPTR corruption issue. Signed-off-by: Dave Airlie <airlied@redhat.com> From: Rob Clark <robdclark@gmail.com> Link: https://patchwork.freedesktop.org/patch/msgid/ <CAF6AEGvnr6Nhz2J0sjv2G+j7iceVtaDiJDT8T88uW6jiBfOGKQ@mail.gmail.com
| | * | | | | drm/msm: Disable the RPTR shadowJordan Crouse2020-09-046-27/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Disable the RPTR shadow across all targets. It will be selectively re-enabled later for targets that need it. Cc: stable@vger.kernel.org Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
| | * | | | | drm/msm: Disable preemption on all 5xx targetsJordan Crouse2020-09-041-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Temporarily disable preemption on a5xx targets pending some improvements to protect the RPTR shadow from being corrupted. Cc: stable@vger.kernel.org Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
| | * | | | | drm/msm: Enable expanded apriv support for a650Jordan Crouse2020-09-044-4/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a650 supports expanded apriv support that allows us to map critical buffers (ringbuffer and memstore) as as privileged to protect them from corruption. Cc: stable@vger.kernel.org Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>
| | * | | | | drm/msm: Split the a5xx preemption recordJordan Crouse2020-09-042-5/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main a5xx preemption record can be marked as privileged to protect it from user access but the counters storage needs to be remain unprivileged. Split the buffers and mark the critical memory as privileged. Cc: stable@vger.kernel.org Signed-off-by: Jordan Crouse <jcrouse@codeaurora.org> Signed-off-by: Rob Clark <robdclark@chromium.org>