summaryrefslogtreecommitdiffstats
path: root/fs (follow)
Commit message (Collapse)AuthorAgeFilesLines
* CIFS: Fix NULL-pointer dereference in smb2_push_mandatory_locksPavel Shilovsky2019-12-021-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently when the client creates a cifsFileInfo structure for a newly opened file, it allocates a list of byte-range locks with a pointer to the new cfile and attaches this list to the inode's lock list. The latter happens before initializing all other fields, e.g. cfile->tlink. Thus a partially initialized cifsFileInfo structure becomes available to other threads that walk through the inode's lock list. One example of such a thread may be an oplock break worker thread that tries to push all cached byte-range locks. This causes NULL-pointer dereference in smb2_push_mandatory_locks() when accessing cfile->tlink: [598428.945633] BUG: kernel NULL pointer dereference, address: 0000000000000038 ... [598428.945749] Workqueue: cifsoplockd cifs_oplock_break [cifs] [598428.945793] RIP: 0010:smb2_push_mandatory_locks+0xd6/0x5a0 [cifs] ... [598428.945834] Call Trace: [598428.945870] ? cifs_revalidate_mapping+0x45/0x90 [cifs] [598428.945901] cifs_oplock_break+0x13d/0x450 [cifs] [598428.945909] process_one_work+0x1db/0x380 [598428.945914] worker_thread+0x4d/0x400 [598428.945921] kthread+0x104/0x140 [598428.945925] ? process_one_work+0x380/0x380 [598428.945931] ? kthread_park+0x80/0x80 [598428.945937] ret_from_fork+0x35/0x40 Fix this by reordering initialization steps of the cifsFileInfo structure: initialize all the fields first and then add the new byte-range lock list to the inode's lock list. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* Merge tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6Linus Torvalds2019-11-3023-529/+1340
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull cifs updates from Steve French: "Various smb3 fixes (including 12 for stable) and also features (addition of multichannel support)" * tag '5.5-rc-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6: (41 commits) CIFS: fix a white space issue in cifs_get_inode_info() cifs: update internal module version number cifs: Always update signing key of first channel cifs: Fix retrieval of DFS referrals in cifs_mount() cifs: Fix potential softlockups while refreshing DFS cache cifs: Fix lookup of root ses in DFS referral cache cifs: Fix use-after-free bug in cifs_reconnect() cifs: dump channel info in DebugData smb3: dump in_send and num_waiters stats counters by default cifs: try harder to open new channels CIFS: Properly process SMB3 lease breaks cifs: move cifsFileInfo_put logic into a work-queue cifs: try opening channels after mounting CIFS: refactor cifs_get_inode_info() cifs: switch servers depending on binding state cifs: add server param cifs: add multichannel mount options and data structs cifs: sort interface list by speed CIFS: Fix SMB2 oplock break processing cifs: don't use 'pre:' for MODULE_SOFTDEP ...
| * CIFS: fix a white space issue in cifs_get_inode_info()Dan Carpenter via samba-technical2019-11-271-1/+2
| | | | | | | | | | | | | | | | | | We accidentally messed up the indenting on this if statement. Fixes: 16c696a6c300 ("CIFS: refactor cifs_get_inode_info()") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: update internal module version numberSteve French2019-11-251-1/+1
| | | | | | | | | | | | To 2.24 Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Always update signing key of first channelPaulo Alcantara (SUSE)2019-11-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | Update signing key of first channel whenever generating the master sigining/encryption/decryption keys rather than only in cifs_mount(). This also fixes reconnect when re-establishing smb sessions to other servers. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix retrieval of DFS referrals in cifs_mount()Paulo Alcantara (SUSE)2019-11-251-10/+22
| | | | | | | | | | | | | | | | | | | | | | Make sure that DFS referrals are sent to newly resolved root targets as in a multi tier DFS setup. Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Link: https://lkml.kernel.org/r/05aa2995-e85e-0ff4-d003-5bb08bd17a22@canonical.com Cc: stable@vger.kernel.org Tested-by: Matthew Ruffell <matthew.ruffell@canonical.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix potential softlockups while refreshing DFS cachePaulo Alcantara (SUSE)2019-11-251-12/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We used to skip reconnects on all SMB2_IOCTL commands due to SMB3+ FSCTL_VALIDATE_NEGOTIATE_INFO - which made sense since we're still establishing a SMB session. However, when refresh_cache_worker() calls smb2_get_dfs_refer() and we're under reconnect, SMB2_ioctl() will not be able to get a proper status error (e.g. -EHOSTDOWN in case we failed to reconnect) but an -EAGAIN from cifs_send_recv() thus looping forever in refresh_cache_worker(). Fixes: e99c63e4d86d ("SMB3: Fix deadlock in validate negotiate hits reconnect") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Suggested-by: Aurelien Aptel <aaptel@suse.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix lookup of root ses in DFS referral cachePaulo Alcantara (SUSE)2019-11-251-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't care about module aliasing validation in cifs_compose_mount_options(..., is_smb3) when finding the root SMB session of an DFS namespace in order to refresh DFS referral cache. The following issue has been observed when mounting with '-t smb3' and then specifying 'vers=2.0': ... Nov 08 15:27:08 tw kernel: address conversion returned 0 for FS0.WIN.LOCAL Nov 08 15:27:08 tw kernel: [kworke] ==> dns_query((null),FS0.WIN.LOCAL,13,(null)) Nov 08 15:27:08 tw kernel: [kworke] call request_key(,FS0.WIN.LOCAL,) Nov 08 15:27:08 tw kernel: [kworke] ==> dns_resolver_cmp(FS0.WIN.LOCAL,FS0.WIN.LOCAL) Nov 08 15:27:08 tw kernel: [kworke] <== dns_resolver_cmp() = 1 Nov 08 15:27:08 tw kernel: [kworke] <== dns_query() = 13 Nov 08 15:27:08 tw kernel: fs/cifs/dns_resolve.c: dns_resolve_server_name_to_ip: resolved: FS0.WIN.LOCAL to 192.168.30.26 ===> Nov 08 15:27:08 tw kernel: CIFS VFS: vers=2.0 not permitted when mounting with smb3 Nov 08 15:27:08 tw kernel: fs/cifs/dfs_cache.c: CIFS VFS: leaving refresh_tcon (xid = 26) rc = -22 ... Fixes: 5072010ccf05 ("cifs: Fix DFS cache refresher for DFS links") Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Fix use-after-free bug in cifs_reconnect()Paulo Alcantara (SUSE)2019-11-251-11/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Ensure we grab an active reference in cifs superblock while doing failover to prevent automounts (DFS links) of expiring and then destroying the superblock pointer. This patch fixes the following KASAN report: [ 464.301462] BUG: KASAN: use-after-free in cifs_reconnect+0x6ab/0x1350 [ 464.303052] Read of size 8 at addr ffff888155e580d0 by task cifsd/1107 [ 464.304682] CPU: 3 PID: 1107 Comm: cifsd Not tainted 5.4.0-rc4+ #13 [ 464.305552] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.12.1-0-ga5cab58-rebuilt.opensuse.org 04/01/2014 [ 464.307146] Call Trace: [ 464.307875] dump_stack+0x5b/0x90 [ 464.308631] print_address_description.constprop.0+0x16/0x200 [ 464.309478] ? cifs_reconnect+0x6ab/0x1350 [ 464.310253] ? cifs_reconnect+0x6ab/0x1350 [ 464.311040] __kasan_report.cold+0x1a/0x41 [ 464.311811] ? cifs_reconnect+0x6ab/0x1350 [ 464.312563] kasan_report+0xe/0x20 [ 464.313300] cifs_reconnect+0x6ab/0x1350 [ 464.314062] ? extract_hostname.part.0+0x90/0x90 [ 464.314829] ? printk+0xad/0xde [ 464.315525] ? _raw_spin_lock+0x7c/0xd0 [ 464.316252] ? _raw_read_lock_irq+0x40/0x40 [ 464.316961] ? ___ratelimit+0xed/0x182 [ 464.317655] cifs_readv_from_socket+0x289/0x3b0 [ 464.318386] cifs_read_from_socket+0x98/0xd0 [ 464.319078] ? cifs_readv_from_socket+0x3b0/0x3b0 [ 464.319782] ? try_to_wake_up+0x43c/0xa90 [ 464.320463] ? cifs_small_buf_get+0x4b/0x60 [ 464.321173] ? allocate_buffers+0x98/0x1a0 [ 464.321856] cifs_demultiplex_thread+0x218/0x14a0 [ 464.322558] ? cifs_handle_standard+0x270/0x270 [ 464.323237] ? __switch_to_asm+0x40/0x70 [ 464.323893] ? __switch_to_asm+0x34/0x70 [ 464.324554] ? __switch_to_asm+0x40/0x70 [ 464.325226] ? __switch_to_asm+0x40/0x70 [ 464.325863] ? __switch_to_asm+0x34/0x70 [ 464.326505] ? __switch_to_asm+0x40/0x70 [ 464.327161] ? __switch_to_asm+0x34/0x70 [ 464.327784] ? finish_task_switch+0xa1/0x330 [ 464.328414] ? __switch_to+0x363/0x640 [ 464.329044] ? __schedule+0x575/0xaf0 [ 464.329655] ? _raw_spin_lock_irqsave+0x82/0xe0 [ 464.330301] kthread+0x1a3/0x1f0 [ 464.330884] ? cifs_handle_standard+0x270/0x270 [ 464.331624] ? kthread_create_on_node+0xd0/0xd0 [ 464.332347] ret_from_fork+0x35/0x40 [ 464.333577] Allocated by task 1110: [ 464.334381] save_stack+0x1b/0x80 [ 464.335123] __kasan_kmalloc.constprop.0+0xc2/0xd0 [ 464.335848] cifs_smb3_do_mount+0xd4/0xb00 [ 464.336619] legacy_get_tree+0x6b/0xa0 [ 464.337235] vfs_get_tree+0x41/0x110 [ 464.337975] fc_mount+0xa/0x40 [ 464.338557] vfs_kern_mount.part.0+0x6c/0x80 [ 464.339227] cifs_dfs_d_automount+0x336/0xd29 [ 464.339846] follow_managed+0x1b1/0x450 [ 464.340449] lookup_fast+0x231/0x4a0 [ 464.341039] path_openat+0x240/0x1fd0 [ 464.341634] do_filp_open+0x126/0x1c0 [ 464.342277] do_sys_open+0x1eb/0x2c0 [ 464.342957] do_syscall_64+0x5e/0x190 [ 464.343555] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [ 464.344772] Freed by task 0: [ 464.345347] save_stack+0x1b/0x80 [ 464.345966] __kasan_slab_free+0x12c/0x170 [ 464.346576] kfree+0xa6/0x270 [ 464.347211] rcu_core+0x39c/0xc80 [ 464.347800] __do_softirq+0x10d/0x3da [ 464.348919] The buggy address belongs to the object at ffff888155e58000 which belongs to the cache kmalloc-256 of size 256 [ 464.350222] The buggy address is located 208 bytes inside of 256-byte region [ffff888155e58000, ffff888155e58100) [ 464.351575] The buggy address belongs to the page: [ 464.352333] page:ffffea0005579600 refcount:1 mapcount:0 mapping:ffff88815a803400 index:0x0 compound_mapcount: 0 [ 464.353583] flags: 0x200000000010200(slab|head) [ 464.354209] raw: 0200000000010200 ffffea0005576200 0000000400000004 ffff88815a803400 [ 464.355353] raw: 0000000000000000 0000000080100010 00000001ffffffff 0000000000000000 [ 464.356458] page dumped because: kasan: bad access detected [ 464.367005] Memory state around the buggy address: [ 464.367787] ffff888155e57f80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.368877] ffff888155e58000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 464.369967] >ffff888155e58080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb [ 464.371111] ^ [ 464.371775] ffff888155e58100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.372893] ffff888155e58180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 464.373983] ================================================================== Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: dump channel info in DebugDataAurelien Aptel2019-11-251-1/+34
| | | | | | | | | | | | | | | | | | | | | | | | * show server&TCP states for extra channels * mention if an interface has a channel connected to it In this version three of the patch, fixed minor printk format issue pointed out by the kbuild robot. Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: dump in_send and num_waiters stats counters by defaultSteve French2019-11-252-21/+4
| | | | | | | | | | | | | | | | | | Number of requests in_send and the number of waiters on sendRecv are useful counters in various cases, move them from CONFIG_CIFS_STATS2 to be on by default especially with multichannel Signed-off-by: Steve French <stfrench@microsoft.com> Acked-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * cifs: try harder to open new channelsAurelien Aptel2019-11-251-10/+22
| | | | | | | | | | | | | | | | | | Previously we would only loop over the iface list once. This patch tries to loop over multiple times until all channels are opened. It will also try to reuse RSS ifaces. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Properly process SMB3 lease breaksPavel Shilovsky2019-11-257-65/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currenly we doesn't assume that a server may break a lease from RWH to RW which causes us setting a wrong lease state on a file and thus mistakenly flushing data and byte-range locks and purging cached data on the client. This leads to performance degradation because subsequent IOs go directly to the server. Fix this by propagating new lease state and epoch values to the oplock break handler through cifsFileInfo structure and removing the use of cifsInodeInfo flags for that. It allows to avoid some races of several lease/oplock breaks using those flags in parallel. Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: move cifsFileInfo_put logic into a work-queueRonnie Sahlberg2019-11-253-27/+65
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the final part of the cifsFileInfo_put() logic where we need a write lock on lock_sem to be processed in a separate thread that holds no other locks. This is to prevent deadlocks like the one below: > there are 6 processes looping to while trying to down_write > cinode->lock_sem, 5 of them from _cifsFileInfo_put, and one from > cifs_new_fileinfo > > and there are 5 other processes which are blocked, several of them > waiting on either PG_writeback or PG_locked (which are both set), all > for the same page of the file > > 2 inode_lock() (inode->i_rwsem) for the file > 1 wait_on_page_writeback() for the page > 1 down_read(inode->i_rwsem) for the inode of the directory > 1 inode_lock()(inode->i_rwsem) for the inode of the directory > 1 __lock_page > > > so processes are blocked waiting on: > page flags PG_locked and PG_writeback for one specific page > inode->i_rwsem for the directory > inode->i_rwsem for the file > cifsInodeInflock_sem > > > > here are the more gory details (let me know if I need to provide > anything more/better): > > [0 00:48:22.765] [UN] PID: 8863 TASK: ffff8c691547c5c0 CPU: 3 > COMMAND: "reopen_file" > #0 [ffff9965007e3ba8] __schedule at ffffffff9b6e6095 > #1 [ffff9965007e3c38] schedule at ffffffff9b6e64df > #2 [ffff9965007e3c48] rwsem_down_write_slowpath at ffffffff9af283d7 > #3 [ffff9965007e3cb8] legitimize_path at ffffffff9b0f975d > #4 [ffff9965007e3d08] path_openat at ffffffff9b0fe55d > #5 [ffff9965007e3dd8] do_filp_open at ffffffff9b100a33 > #6 [ffff9965007e3ee0] do_sys_open at ffffffff9b0eb2d6 > #7 [ffff9965007e3f38] do_syscall_64 at ffffffff9ae04315 > * (I think legitimize_path is bogus) > > in path_openat > } else { > const char *s = path_init(nd, flags); > while (!(error = link_path_walk(s, nd)) && > (error = do_last(nd, file, op)) > 0) { <<<< > > do_last: > if (open_flag & O_CREAT) > inode_lock(dir->d_inode); <<<< > else > so it's trying to take inode->i_rwsem for the directory > > DENTRY INODE SUPERBLK TYPE PATH > ffff8c68bb8e79c0 ffff8c691158ef20 ffff8c6915bf9000 DIR /mnt/vm1_smb/ > inode.i_rwsem is ffff8c691158efc0 > > <struct rw_semaphore 0xffff8c691158efc0>: > owner: <struct task_struct 0xffff8c6914275d00> (UN - 8856 - > reopen_file), counter: 0x0000000000000003 > waitlist: 2 > 0xffff9965007e3c90 8863 reopen_file UN 0 1:29:22.926 > RWSEM_WAITING_FOR_WRITE > 0xffff996500393e00 9802 ls UN 0 1:17:26.700 > RWSEM_WAITING_FOR_READ > > > the owner of the inode.i_rwsem of the directory is: > > [0 00:00:00.109] [UN] PID: 8856 TASK: ffff8c6914275d00 CPU: 3 > COMMAND: "reopen_file" > #0 [ffff99650065b828] __schedule at ffffffff9b6e6095 > #1 [ffff99650065b8b8] schedule at ffffffff9b6e64df > #2 [ffff99650065b8c8] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff99650065b940] msleep at ffffffff9af573a9 > #4 [ffff99650065b948] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs] > #5 [ffff99650065ba38] cifs_writepage_locked at ffffffffc0a0b8f3 [cifs] > #6 [ffff99650065bab0] cifs_launder_page at ffffffffc0a0bb72 [cifs] > #7 [ffff99650065bb30] invalidate_inode_pages2_range at ffffffff9b04d4bd > #8 [ffff99650065bcb8] cifs_invalidate_mapping at ffffffffc0a11339 [cifs] > #9 [ffff99650065bcd0] cifs_revalidate_mapping at ffffffffc0a1139a [cifs] > #10 [ffff99650065bcf0] cifs_d_revalidate at ffffffffc0a014f6 [cifs] > #11 [ffff99650065bd08] path_openat at ffffffff9b0fe7f7 > #12 [ffff99650065bdd8] do_filp_open at ffffffff9b100a33 > #13 [ffff99650065bee0] do_sys_open at ffffffff9b0eb2d6 > #14 [ffff99650065bf38] do_syscall_64 at ffffffff9ae04315 > > cifs_launder_page is for page 0xffffd1e2c07d2480 > > crash> page.index,mapping,flags 0xffffd1e2c07d2480 > index = 0x8 > mapping = 0xffff8c68f3cd0db0 > flags = 0xfffffc0008095 > > PAGE-FLAG BIT VALUE > PG_locked 0 0000001 > PG_uptodate 2 0000004 > PG_lru 4 0000010 > PG_waiters 7 0000080 > PG_writeback 15 0008000 > > > inode is ffff8c68f3cd0c40 > inode.i_rwsem is ffff8c68f3cd0ce0 > DENTRY INODE SUPERBLK TYPE PATH > ffff8c68a1f1b480 ffff8c68f3cd0c40 ffff8c6915bf9000 REG > /mnt/vm1_smb/testfile.8853 > > > this process holds the inode->i_rwsem for the parent directory, is > laundering a page attached to the inode of the file it's opening, and in > _cifsFileInfo_put is trying to down_write the cifsInodeInflock_sem > for the file itself. > > > <struct rw_semaphore 0xffff8c68f3cd0ce0>: > owner: <struct task_struct 0xffff8c6914272e80> (UN - 8854 - > reopen_file), counter: 0x0000000000000003 > waitlist: 1 > 0xffff9965005dfd80 8855 reopen_file UN 0 1:29:22.912 > RWSEM_WAITING_FOR_WRITE > > this is the inode.i_rwsem for the file > > the owner: > > [0 00:48:22.739] [UN] PID: 8854 TASK: ffff8c6914272e80 CPU: 2 > COMMAND: "reopen_file" > #0 [ffff99650054fb38] __schedule at ffffffff9b6e6095 > #1 [ffff99650054fbc8] schedule at ffffffff9b6e64df > #2 [ffff99650054fbd8] io_schedule at ffffffff9b6e68e2 > #3 [ffff99650054fbe8] __lock_page at ffffffff9b03c56f > #4 [ffff99650054fc80] pagecache_get_page at ffffffff9b03dcdf > #5 [ffff99650054fcc0] grab_cache_page_write_begin at ffffffff9b03ef4c > #6 [ffff99650054fcd0] cifs_write_begin at ffffffffc0a064ec [cifs] > #7 [ffff99650054fd30] generic_perform_write at ffffffff9b03bba4 > #8 [ffff99650054fda8] __generic_file_write_iter at ffffffff9b04060a > #9 [ffff99650054fdf0] cifs_strict_writev.cold.70 at ffffffffc0a4469b [cifs] > #10 [ffff99650054fe48] new_sync_write at ffffffff9b0ec1dd > #11 [ffff99650054fed0] vfs_write at ffffffff9b0eed35 > #12 [ffff99650054ff00] ksys_write at ffffffff9b0eefd9 > #13 [ffff99650054ff38] do_syscall_64 at ffffffff9ae04315 > > the process holds the inode->i_rwsem for the file to which it's writing, > and is trying to __lock_page for the same page as in the other processes > > > the other tasks: > [0 00:00:00.028] [UN] PID: 8859 TASK: ffff8c6915479740 CPU: 2 > COMMAND: "reopen_file" > #0 [ffff9965007b39d8] __schedule at ffffffff9b6e6095 > #1 [ffff9965007b3a68] schedule at ffffffff9b6e64df > #2 [ffff9965007b3a78] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff9965007b3af0] msleep at ffffffff9af573a9 > #4 [ffff9965007b3af8] cifs_new_fileinfo.cold.61 at ffffffffc0a42a07 [cifs] > #5 [ffff9965007b3b78] cifs_open at ffffffffc0a0709d [cifs] > #6 [ffff9965007b3cd8] do_dentry_open at ffffffff9b0e9b7a > #7 [ffff9965007b3d08] path_openat at ffffffff9b0fe34f > #8 [ffff9965007b3dd8] do_filp_open at ffffffff9b100a33 > #9 [ffff9965007b3ee0] do_sys_open at ffffffff9b0eb2d6 > #10 [ffff9965007b3f38] do_syscall_64 at ffffffff9ae04315 > > this is opening the file, and is trying to down_write cinode->lock_sem > > > [0 00:00:00.041] [UN] PID: 8860 TASK: ffff8c691547ae80 CPU: 2 > COMMAND: "reopen_file" > [0 00:00:00.057] [UN] PID: 8861 TASK: ffff8c6915478000 CPU: 3 > COMMAND: "reopen_file" > [0 00:00:00.059] [UN] PID: 8858 TASK: ffff8c6914271740 CPU: 2 > COMMAND: "reopen_file" > [0 00:00:00.109] [UN] PID: 8862 TASK: ffff8c691547dd00 CPU: 6 > COMMAND: "reopen_file" > #0 [ffff9965007c3c78] __schedule at ffffffff9b6e6095 > #1 [ffff9965007c3d08] schedule at ffffffff9b6e64df > #2 [ffff9965007c3d18] schedule_timeout at ffffffff9b6e9f89 > #3 [ffff9965007c3d90] msleep at ffffffff9af573a9 > #4 [ffff9965007c3d98] _cifsFileInfo_put.cold.63 at ffffffffc0a42dd6 [cifs] > #5 [ffff9965007c3e88] cifs_close at ffffffffc0a07aaf [cifs] > #6 [ffff9965007c3ea0] __fput at ffffffff9b0efa6e > #7 [ffff9965007c3ee8] task_work_run at ffffffff9aef1614 > #8 [ffff9965007c3f20] exit_to_usermode_loop at ffffffff9ae03d6f > #9 [ffff9965007c3f38] do_syscall_64 at ffffffff9ae0444c > > closing the file, and trying to down_write cifsi->lock_sem > > > [0 00:48:22.839] [UN] PID: 8857 TASK: ffff8c6914270000 CPU: 7 > COMMAND: "reopen_file" > #0 [ffff9965006a7cc8] __schedule at ffffffff9b6e6095 > #1 [ffff9965006a7d58] schedule at ffffffff9b6e64df > #2 [ffff9965006a7d68] io_schedule at ffffffff9b6e68e2 > #3 [ffff9965006a7d78] wait_on_page_bit at ffffffff9b03cac6 > #4 [ffff9965006a7e10] __filemap_fdatawait_range at ffffffff9b03b028 > #5 [ffff9965006a7ed8] filemap_write_and_wait at ffffffff9b040165 > #6 [ffff9965006a7ef0] cifs_flush at ffffffffc0a0c2fa [cifs] > #7 [ffff9965006a7f10] filp_close at ffffffff9b0e93f1 > #8 [ffff9965006a7f30] __x64_sys_close at ffffffff9b0e9a0e > #9 [ffff9965006a7f38] do_syscall_64 at ffffffff9ae04315 > > in __filemap_fdatawait_range > wait_on_page_writeback(page); > for the same page of the file > > > > [0 00:48:22.718] [UN] PID: 8855 TASK: ffff8c69142745c0 CPU: 7 > COMMAND: "reopen_file" > #0 [ffff9965005dfc98] __schedule at ffffffff9b6e6095 > #1 [ffff9965005dfd28] schedule at ffffffff9b6e64df > #2 [ffff9965005dfd38] rwsem_down_write_slowpath at ffffffff9af283d7 > #3 [ffff9965005dfdf0] cifs_strict_writev at ffffffffc0a0c40a [cifs] > #4 [ffff9965005dfe48] new_sync_write at ffffffff9b0ec1dd > #5 [ffff9965005dfed0] vfs_write at ffffffff9b0eed35 > #6 [ffff9965005dff00] ksys_write at ffffffff9b0eefd9 > #7 [ffff9965005dff38] do_syscall_64 at ffffffff9ae04315 > > inode_lock(inode); > > > and one 'ls' later on, to see whether the rest of the mount is available > (the test file is in the root, so we get blocked up on the directory > ->i_rwsem), so the entire mount is unavailable > > [0 00:36:26.473] [UN] PID: 9802 TASK: ffff8c691436ae80 CPU: 4 > COMMAND: "ls" > #0 [ffff996500393d28] __schedule at ffffffff9b6e6095 > #1 [ffff996500393db8] schedule at ffffffff9b6e64df > #2 [ffff996500393dc8] rwsem_down_read_slowpath at ffffffff9b6e9421 > #3 [ffff996500393e78] down_read_killable at ffffffff9b6e95e2 > #4 [ffff996500393e88] iterate_dir at ffffffff9b103c56 > #5 [ffff996500393ec8] ksys_getdents64 at ffffffff9b104b0c > #6 [ffff996500393f30] __x64_sys_getdents64 at ffffffff9b104bb6 > #7 [ffff996500393f38] do_syscall_64 at ffffffff9ae04315 > > in iterate_dir: > if (shared) > res = down_read_killable(&inode->i_rwsem); <<<< > else > res = down_write_killable(&inode->i_rwsem); > Reported-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: try opening channels after mountingAurelien Aptel2019-11-259-88/+478
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After doing mount() successfully we call cifs_try_adding_channels() which will open as many channels as it can. Channels are closed when the master session is closed. The master connection becomes the first channel. ,-------------> global cifs_tcp_ses_list <-------------------------. | | '- TCP_Server_Info <--> TCP_Server_Info <--> TCP_Server_Info <-' (master con) (chan#1 con) (chan#2 con) | ^ ^ ^ v '--------------------|--------------------' cifs_ses | - chan_count = 3 | - chans[] ---------------------' - smb3signingkey[] (master signing key) Note how channel connections don't have sessions. That's because cifs_ses can only be part of one linked list (list_head are internal to the elements). For signing keys, each channel has its own signing key which must be used only after the channel has been bound. While it's binding it must use the master session signing key. For encryption keys, since channel connections do not have sessions attached we must now find matching session by looping over all sessions in smb2_get_enc_key(). Each channel is opened like a regular server connection but at the session setup request step it must set the SMB2_SESSION_REQ_FLAG_BINDING flag and use the session id to bind to. Finally, while sending in compound_send_recv() for requests that aren't negprot, ses-setup or binding related, use a channel by cycling through the available ones (round-robin). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: refactor cifs_get_inode_info()Aurelien Aptel2019-11-252-137/+205
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make logic of cifs_get_inode() much clearer by moving code to sub functions and adding comments. Document the steps this function does. cifs_get_inode_info() gets and updates a file inode metadata from its file path. * If caller already has raw info data from server they can pass it. * If inode already exists (just need to update) caller can pass it. Step 1: get raw data from server if none was passed Step 2: parse raw data into intermediate internal cifs_fattr struct Step 3: set fattr uniqueid which is later used for inode number. This can sometime be done from raw data Step 4: tweak fattr according to mount options (file_mode, acl to mode bits, uid, gid, etc) Step 5: update or create inode from final fattr struct * add is_smb1_server() helper * add is_inode_cache_good() helper * move SMB1-backupcreds-getinfo-retry to separate func cifs_backup_query_path_info(). * move set-uniqueid code to separate func cifs_set_fattr_ino() * don't clobber uniqueid from backup cred retry * fix some probable corner cases memleaks Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: switch servers depending on binding stateAurelien Aptel2019-11-256-21/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | Currently a lot of the code to initialize a connection & session uses the cifs_ses as input. But depending on if we are opening a new session or a new channel we need to use different server pointers. Add a "binding" flag in cifs_ses and a helper function that returns the server ptr a session should use (only in the sess establishment code path). Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: add server paramAurelien Aptel2019-11-255-17/+22
| | | | | | | | | | | | | | | | | | | | | | As we get down to the transport layer, plenty of functions are passed the session pointer and assume the transport to use is ses->server. Instead we modify those functions to pass (ses, server) so that we can decouple the session from the server. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: add multichannel mount options and data structsAurelien Aptel2019-11-253-2/+56
| | | | | | | | | | | | | | | | | | | | | | | | | | adds: - [no]multichannel to enable/disable multichannel - max_channels=N to control how many channels to create these options are then stored in the volume struct. - store channels and max_channels in cifs_ses Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: sort interface list by speedAurelien Aptel2019-11-251-0/+11
| | | | | | | | | | | | | | | | New channels are going to be opened by walking the list sequentially, so by sorting it we will connect to the fastest interfaces first. Signed-off-by: Aurelien Aptel <aaptel@suse.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Fix SMB2 oplock break processingPavel Shilovsky2019-11-251-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Even when mounting modern protocol version the server may be configured without supporting SMB2.1 leases and the client uses SMB2 oplock to optimize IO performance through local caching. However there is a problem in oplock break handling that leads to missing a break notification on the client who has a file opened. It latter causes big latencies to other clients that are trying to open the same file. The problem reproduces when there are multiple shares from the same server mounted on the client. The processing code tries to match persistent and volatile file ids from the break notification with an open file but it skips all share besides the first one. Fix this by looking up in all shares belonging to the server that issued the oplock break. Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: don't use 'pre:' for MODULE_SOFTDEPRonnie Sahlberg2019-11-251-12/+12
| | | | | | | | | | | | | | | | | | | | | | It can cause to fail with modprobe: FATAL: Module <module> is builtin. RHBZ: 1767094 Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Return -EAGAIN when transport is reconnectingLong Li2019-11-251-2/+5
| | | | | | | | | | | | | | | | | | | | During reconnecting, the transport may have already been destroyed and is in the process being reconnected. In this case, return -EAGAIN to not fail and to retry this I/O. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Only queue work for error recovery on memory registrationLong Li2019-11-251-11/+15
| | | | | | | | | | | | | | | | | | | | | | It's not necessary to queue invalidated memory registration to work queue, as all we need to do is to unmap the SG and make it usable again. This can save CPU cycles in normal data paths as memory registration errors are rare and normally only happens during reconnection. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: add debug messages for closing unmatched openRonnie Sahlberg2019-11-252-10/+34
| | | | | | | | | | | | | | | | Helps distinguish between an interrupted close and a truly unmatched open. Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Do not miss cancelled OPEN responsesPavel Shilovsky2019-11-252-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When an OPEN command is cancelled we mark a mid as cancelled and let the demultiplex thread process it by closing an open handle. The problem is there is a race between a system call thread and the demultiplex thread and there may be a situation when the mid has been already processed before it is set as cancelled. Fix this by processing cancelled requests when mids are being destroyed which means that there is only one thread referencing a particular mid. Also set mids as cancelled unconditionally on their state. Cc: Stable <stable@vger.kernel.org> Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Fix NULL pointer dereference in mid callbackPavel Shilovsky2019-11-253-7/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a race between a system call processing thread and the demultiplex thread when mid->resp_buf becomes NULL and later is being accessed to get credits. It happens when the 1st thread wakes up before a mid callback is called in the 2nd one but the mid state has already been set to MID_RESPONSE_RECEIVED. This causes NULL pointer dereference in mid callback. Fix this by saving credits from the response before we update the mid state and then use this value in the mid callback rather then accessing a response buffer. Cc: Stable <stable@vger.kernel.org> Fixes: ee258d79159afed5 ("CIFS: Move credit processing to mid callbacks for SMB3") Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Close open handle after interrupted closePavel Shilovsky2019-11-253-15/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If Close command is interrupted before sending a request to the server the client ends up leaking an open file handle. This wastes server resources and can potentially block applications that try to remove the file or any directory containing this file. Fix this by putting the close command into a worker queue, so another thread retries it later. Cc: Stable <stable@vger.kernel.org> Tested-by: Frank Sorenson <sorenson@redhat.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Respect O_SYNC and O_DIRECT flags during reconnectPavel Shilovsky2019-11-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the client translates O_SYNC and O_DIRECT flags into corresponding SMB create options when openning a file. The problem is that on reconnect when the file is being re-opened the client doesn't set those flags and it causes a server to reject re-open requests because create options don't match. The latter means that any subsequent system call against that open file fail until a share is re-mounted. Fix this by properly setting SMB create options when re-openning files after reconnects. Fixes: 1013e760d10e6: ("SMB3: Don't ignore O_SYNC/O_DSYNC and O_DIRECT flags") Cc: Stable <stable@vger.kernel.org> Signed-off-by: Pavel Shilovsky <pshilov@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * smb3: remove confusing dmesg when mounting with encryption ("seal")Steve French2019-11-251-8/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The smb2/smb3 message checking code was logging to dmesg when mounting with encryption ("seal") for compounded SMB3 requests. When encrypted the whole frame (including potentially multiple compounds) is read so the length field is longer than in the case of non-encrypted case (where length field will match the the calculated length for the particular SMB3 request in the compound being validated). Avoids the warning on mount (with "seal"): "srv rsp padded more than expected. Length 384 not ..." Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: close the shared root handle on tree disconnectRonnie Sahlberg2019-11-251-0/+2
| | | | | | | | | | Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Return directly after a failed build_path_from_dentry() in ↵Markus Elfring2019-11-251-4/+2
| | | | | | | | | | | | | | | | | | | | cifs_do_create() Return directly after a call of the function "build_path_from_dentry" failed at the beginning. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Use common error handling code in smb2_ioctl_query_info()Markus Elfring2019-11-251-22/+23
| | | | | | | | | | | | | | | | | | | | Move the same error code assignments so that such exception handling can be better reused at the end of this function. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: Use memdup_user() rather than duplicating its implementationMarkus Elfring2019-11-251-9/+4
| | | | | | | | | | | | | | | | | | | | | | Reuse existing functionality from memdup_user() instead of keeping duplicate source code. Generated by: scripts/coccinelle/api/memdup_user.cocci Fixes: f5b05d622a3e99e6a97a189fe500414be802a05c ("cifs: add IOCTL for QUERY_INFO passthrough to userspace") Signed-off-by: Markus Elfring <elfring@users.sourceforge.net> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Return -ECONNABORTED when trasnport is not in connected stateLong Li2019-11-251-1/+1
| | | | | | | | | | | | | | | | The transport should return this error so the upper layer will reconnect. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Add messages on RDMA session destroy and reconnectionLong Li2019-11-251-2/+4
| | | | | | | | | | | | | | | | Log these activities to help production support. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Return -EINVAL when the number of iovs exceeds SMBDIRECT_MAX_SGELong Li2019-11-251-1/+1
| | | | | | | | | | | | | | | | | | | | While it's not friendly to fail user processes that issue more iovs than we support, at least we should return the correct error code so the user process gets a chance to retry with smaller number of iovs. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: smbd: Invalidate and deregister memory registration on re-send for ↵Long Li2019-11-251-2/+18
| | | | | | | | | | | | | | | | | | | | direct I/O On re-send, there might be a reconnect and all prevoius memory registrations need to be invalidated and deregistered. Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: Don't display RDMA transport on reconnectLong Li2019-11-251-0/+5
| | | | | | | | | | | | | | | | | | On reconnect, the transport data structure is NULL and its information is not available. Signed-off-by: Long Li <longli@microsoft.com> Cc: stable@vger.kernel.org Signed-off-by: Steve French <stfrench@microsoft.com>
| * CIFS: remove set but not used variables 'cinode' and 'netfid'YueHaibing2019-11-251-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes gcc '-Wunused-but-set-variable' warning: fs/cifs/file.c: In function 'cifs_flock': fs/cifs/file.c:1704:8: warning: variable 'netfid' set but not used [-Wunused-but-set-variable] fs/cifs/file.c:1702:24: warning: variable 'cinode' set but not used [-Wunused-but-set-variable] Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: add support for flockSteve French2019-11-253-1/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | The flock system call locks the whole file rather than a byte range and so is currently emulated by various other file systems by simply sending a byte range lock for the whole file. Add flock handling for cifs.ko in similar way. xfstest generic/504 passes with this as well Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
| * cifs: remove unused variable 'sid_user'YueHaibing2019-11-251-2/+0
| | | | | | | | | | | | | | | | | | | | fs/cifs/cifsacl.c:43:30: warning: sid_user defined but not used [-Wunused-const-variable=] It is never used, so remove it. Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * cifs: rename a variable in SendReceive()Dan Carpenter2019-11-251-1/+1
| | | | | | | | | | | | | | | | | | Smatch gets confused because we sometimes refer to "server->srv_mutex" and sometimes to "sess->server->srv_mutex". They refer to the same lock so let's just make this consistent. Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | Merge tag 'f2fs-for-5.5' of ↵Linus Torvalds2019-11-3015-106/+411
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we've introduced fairly small number of patches as below. Enhancements: - improve the in-place-update IO flow - allocate segment to guarantee no GC for pinned files Bug fixes: - fix updatetime in lazytime mode - potential memory leak in f2fs_listxattr - record parent inode number in rename2 correctly - fix deadlock in f2fs_gc along with atomic writes - avoid needless data migration in GC" * tag 'f2fs-for-5.5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: f2fs: stop GC when the victim becomes fully valid f2fs: expose main_blkaddr in sysfs f2fs: choose hardlimit when softlimit is larger than hardlimit in f2fs_statfs_project() f2fs: Fix deadlock in f2fs_gc() context during atomic files handling f2fs: show f2fs instance in printk_ratelimited f2fs: fix potential overflow f2fs: fix to update dir's i_pino during cross_rename f2fs: support aligned pinned file f2fs: avoid kernel panic on corruption test f2fs: fix wrong description in document f2fs: cache global IPU bio f2fs: fix to avoid memory leakage in f2fs_listxattr f2fs: check total_segments from devices in raw_super f2fs: update multi-dev metadata in resize_fs f2fs: mark recovery flag correctly in read_raw_super_block() f2fs: fix to update time in lazytime mode
| * | f2fs: stop GC when the victim becomes fully validJaegeuk Kim2019-11-251-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We must stop GC, once the segment becomes fully valid. Otherwise, it can produce another dirty segments by moving valid blocks in the segment partially. Ramon hit no free segment panic sometimes and saw this case happens when validating reliable file pinning feature. Signed-off-by: Ramon Pantin <pantin@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | f2fs: expose main_blkaddr in sysfsJaegeuk Kim2019-11-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Expose in /sys/fs/f2fs/<blockdev>/main_blkaddr the block address where the main area starts. This allows user mode programs to determine: - That pinned files that are made exclusively of fully allocated 2MB segments will never be unpinned by the file system. - Where the main area starts. This is required by programs that want to verify if a file is made exclusively of 2MB f2fs segments, the alignment boundary for segments starts at this address. Testing for 2MB alignment relative to the start of the device is incorrect, because for some filesystems main_blkaddr is not at a 2MB boundary relative to the start of the device. The entry will be used when validating reliable pinning file feature proposed by "f2fs: support aligned pinned file". Signed-off-by: Ramon Pantin <pantin@google.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | f2fs: choose hardlimit when softlimit is larger than hardlimit in ↵Chengguang Xu2019-11-251-6/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | f2fs_statfs_project() Setting softlimit larger than hardlimit seems meaningless for disk quota but currently it is allowed. In this case, there may be a bit of comfusion for users when they run df comamnd to directory which has project quota. For example, we set 20M softlimit and 10M hardlimit of block usage limit for project quota of test_dir(project id 123). [root@hades f2fs]# repquota -P -a *** Report for project quotas on device /dev/nvme0n1p8 Block grace time: 7days; Inode grace time: 7days Block limits File limits Project used soft hard grace used soft hard grace ---------------------------------------------------------------------- 0 -- 4 0 0 1 0 0 123 +- 10248 20480 10240 2 0 0 The result of df command as below: [root@hades f2fs]# df -h /mnt/f2fs/test Filesystem Size Used Avail Use% Mounted on /dev/nvme0n1p8 20M 11M 10M 51% /mnt/f2fs Even though it looks like there is another 10M free space to use, if we write new data to diretory test(inherit project id), the write will fail with errno(-EDQUOT). After this patch, the df result looks like below. [root@hades f2fs]# df -h /mnt/f2fs/test Filesystem Size Used Avail Use% Mounted on /dev/nvme0n1p8 10M 10M 0 100% /mnt/f2fs Signed-off-by: Chengguang Xu <cgxu519@mykernel.net> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | f2fs: Fix deadlock in f2fs_gc() context during atomic files handlingSahitya Tummala2019-11-193-6/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The FS got stuck in the below stack when the storage is almost full/dirty condition (when FG_GC is being done). schedule_timeout io_schedule_timeout congestion_wait f2fs_drop_inmem_pages_all f2fs_gc f2fs_balance_fs __write_node_page f2fs_fsync_node_pages f2fs_do_sync_file f2fs_ioctl The root cause for this issue is there is a potential infinite loop in f2fs_drop_inmem_pages_all() for the case where gc_failure is true and when there an inode whose i_gc_failures[GC_FAILURE_ATOMIC] is not set. Fix this by keeping track of the total atomic files currently opened and using that to exit from this condition. Fix-suggested-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Sahitya Tummala <stummala@codeaurora.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | f2fs: show f2fs instance in printk_ratelimitedChao Yu2019-11-199-27/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As Eric mentioned, bare printk{,_ratelimited} won't show which filesystem instance these message is coming from, this patch tries to show fs instance with sb->s_id field in all places we missed before. Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | f2fs: fix potential overflowChao Yu2019-11-072-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We expect 64-bit calculation result from below statement, however in 32-bit machine, looped left shift operation on pgoff_t type variable may cause overflow issue, fix it by forcing type cast. page->index << PAGE_SHIFT; Fixes: 26de9b117130 ("f2fs: avoid unnecessary updating inode during fsync") Fixes: 0a2aa8fbb969 ("f2fs: refactor __exchange_data_block for speed up") Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>