summaryrefslogtreecommitdiffstats
path: root/fs/gfs2 (follow)
Commit message (Collapse)AuthorAgeFilesLines
* gfs2: Clean up glock demote logicAndreas Gruenbacher2024-07-091-6/+6
| | | | | | | | | | | | The logic for determining when to demote a glock in glock_work_func(), introduced in commit 7cf8dcd3b68a ("GFS2: Automatically adjust glock min hold time"), doesn't make sense: inode glocks have a minimum hold time that delays demotion, while all other glocks are expected to be demoted immediately. Instead of demoting non-inode glocks immediately, glock_work_func() schedules glock work for them to be demoted, however. Get rid of that unnecessary indirection. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Revert "check for no eligible quota changes"Andreas Gruenbacher2024-06-201-20/+0
| | | | | | | | | | | | | Since the previous commit, function gfs2_quota_sync() will not cause the sync generation to creep forward by one every time the function is called; this helps keep things a but more tidy. We also don't care that this function allocates a page of memory every time it is called, so no good reason for keeping qd_changed() anymore, which just duplicates qd_grab_sync(). This reverts commit 06aa6fd31a5f402b055e12ea53bb7b086359d3c8. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Be more careful with the quota sync generationAndreas Gruenbacher2024-06-201-8/+19
| | | | | | | | | | | | | | | | | The quota sync generation is only ever updated under sd_quota_sync_mutex by gfs2_quota_sync(), but its current value is fetched ouside of that mutex, so use WRITE_ONCE() and READ_ONCE() when accessing it without holding that mutex. Pass the current sync generation to do_sync() from its callers to ensure that we're not recording the wrong generation when the syncing is done. Also, make sure that qd->qd_sync_gen only ever moves forward. In gfs2_quota_sync(), only write the new sync generation when we know that there are changes. This eliminates the need for function sd_changed(), which we will remove in the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Get rid of some unnecessary quota lockingAndreas Gruenbacher2024-06-203-28/+27
| | | | | | | | | With the locking the previous patch has introduced for each struct gfs2_quota_data object, sd_quota_mutex has become largely irrelevant. By waiting on the buffer head instead of waiting on the mutex in get_bh(), it becomes completely irrelevant and can be removed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Add some missing quota lockingAndreas Gruenbacher2024-06-121-29/+53
| | | | | | | | | | | | | | | | The quota code is missing some locking between local quota changes and syncing those quota changes to the global quota file (gfs2_quota_sync); in particular, qd->qd_change needs to be kept in sync with the QDF_CHANGE change flag and the number of references held. Use the qd->qd_lockref.lock spinlock for that. With the qd->qd_lockref.lock spinlock held, we can no longer call lockref_get(), so turn qd_hold() into a variant that assumes that the lock is held. This function is really supposed to take an additional reference when one or more references are already held, so check for that instead of checking if the lockref is dead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Fold qd_fish into gfs2_quota_syncAndreas Gruenbacher2024-06-081-48/+29
| | | | | | | | | | | The split between qd_fish() and gfs2_quota_sync() is rather unfortunate as qd_fish() is repeatedly called to scan sdp->sd_quota_list only to find the next object to that needs syncing; if there are multiple objects on the list that need syncing, it makes more sense to grab them all in one go. This is relatively easy to do when qd_fish() is folded into gfs2_quota_sync(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: quota need_sync cleanupAndreas Gruenbacher2024-06-081-14/+12
| | | | | | | | | | | | | | Rename variable 'value' to 'change' as it stores a change in value. Add new 'value' and 'limit' variables for the current value and limit. Only fetch the tuning parameters when we need them. Get rid of unnecessary nesting. No change in functionality. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Fix and clean up function do_qcAndreas Gruenbacher2024-06-081-13/+21
| | | | | | | | | | | | | | | | | | | | | | | Function do_qc() is supposed to be conceptually simple: it alters the current in-memory and on-disk quota change values for a given uid/gid by a given delta. If the on-disk record isn't defined yet, a new record is created. If the on-disk record exists and the resulting change value is zero, there no longer is a need for that record and so the record is deleted. On top of that, some reference counting is involved when creating and deleting records. Currently, instead of doing the above, do_qc() alters the on-disk value and then it sets the in-memory value to the on-disk value. This is incorrect when the on-disk value differs from the in-memory value. The two values are allowed to differ when quota changes are synced to the global quota file. Fix by changing both values by the same amount. In addition, do_qc() currently gets confused when the delta value is 0. It isn't supposed to be called that way, but that assumption isn't mentioned and it makes the code harder to read. Make the code more explicit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Revert "Add quota_change type"Andreas Gruenbacher2024-06-082-15/+10
| | | | | | | | | | | | | | | | Commit 432928c93779 ("gfs2: Add quota_change type") makes the incorrect assertion that function do_qc() should behave differently in the two contexts it is used in, but that isn't actually true. In all cases, do_qc() grabs a "reference" when it starts using a slot in the per-node quota changes file, and it releases that "reference" when no more residual changes remain. Revert that broken commit. There are some remaining issues with function do_qc() which are addressed in the next commit. This reverts commit 432928c9377959684c748a9bc6553ed2d3c2ea4f. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Revert "ignore negated quota changes"Andreas Gruenbacher2024-06-081-11/+0
| | | | | | | | | | | | | | | | Commit 4c6a08125f22 ("gfs2: ignore negated quota changes") skips quota changes with qd_change == 0 instead of writing them back, which leaves behind non-zero qd_change values in the affected slots. The kernel then assumes that those slots are unused, while the qd_change values on disk indicate that they are indeed still in use. The next time the filesystem is mounted, those invalid slots are read in from disk, which will cause inconsistencies. Revert that commit to avoid filesystem corruption. This reverts commit 4c6a08125f2249531ec01783a5f4317d7342add5. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: qd_check_sync cleanupsAndreas Gruenbacher2024-06-081-18/+22
| | | | | | | | | | | Rename qd_check_sync() to qd_grab_sync() and make it return a bool. Turn the sync_gen pointer into a regular u64 and pass in U64_MAX instead of a NULL pointer when sync generation checking isn't needed. Introduce a new qd_ungrab_sync() helper for undoing the effects of qd_grab_sync() if the subsequent bh_get() on the qd object fails. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Revert "introduce qd_bh_get_or_undo"Andreas Gruenbacher2024-06-081-19/+17
| | | | | | | | | | The qd_bh_get_or_undo() helper introduced by that commit doesn't improve the code much, so revert it and clean things up in a more useful way in the next commit. This reverts commit 7dbc6ae60dd7089d8ed42892b6a66c138f0aa7a0. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Check quota consistency on mountAndreas Gruenbacher2024-06-071-6/+31
| | | | | | | | | | | | | In gfs2_quota_init(), make sure that the per-node "quota_change%u" file doesn't contain duplicate uids/gids. Those duplicates would cause us to acquire the glock corresponding to those ids repeatedly, which the glock code doesn't allow. When finding inconsistencies, we wipe them out and ignore them. The resulting quotas will likely be inconsistent, and running quotacheck(1) is advised. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Minor gfs2_quota_init error path cleanupAndreas Gruenbacher2024-06-041-9/+7
| | | | | | | | Add a fail_brelse label and use it where useful. Move variable bh out of the loop to extend its visibility to the new label. No functional change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Get rid of demote_ok checksAndreas Gruenbacher2024-05-293-41/+1
| | | | | | | | | | | | | | | | The demote_ok glock operation is only still used to prevent the inode glocks of the "jindex" and "rindex" directories from getting recycled while they are still referenced by sdp->sd_jindex and sdp->sd_rindex. However, the LRU walking code will no longer recycle glocks which are referenced, so the demote_ok glock operation is obsolete and can be removed. Each of a glock's holders in the gl_holders list is holding a reference on the glock, so when the list of holders isn't empty in demote_ok(), the existing reference count check will already prevent the glock from getting released. This means that demote_ok() is obsolete as well. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* Revert "GFS2: Don't add all glocks to the lru"Andreas Gruenbacher2024-05-293-12/+4
| | | | | | | | | | | | | | | | | | This reverts commit e7ccaf5fe1590667b3fa2f8df5c5ec9ba0dc5b85. Before commit e7ccaf5fe159, every time a resource group glock was dequeued by gfs2_glock_dq(), it was added to the glock LRU list even though the glock was still referenced by the resource group and could never be evicted, anyway. Commit e7ccaf5fe159 added a GLOF_LRU hack to avoid that overhead for resource group glocks, and that hack was since adopted for some other types of glocks as well. We now no longer add glocks to the glock LRU list while they are still referenced. This solves the underlying problem, and obsoletes the GLOF_LRU hack. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> (cherry picked from commit 3e5257c810cba91e274d07f3db5cf013c7c830be)
* gfs2: Revise glock reference counting modelAndreas Gruenbacher2024-05-293-28/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the current glock reference counting model, a bias of one is added to a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED). A glock is removed from the lru_list when it is enqueued, and added back when it is dequeued. This isn't a very appropriate model because most glocks are held for long periods of time (for example, the inode "owns" references to its inode and iopen glocks as long as the inode is cached even when the glock state changes to LM_ST_UNLOCKED), and they can only be freed when they are no longer referenced, anyway. Fix this by getting rid of the refcount bias for locked glocks. That way, we can use lockref_put_or_lock() to efficiently drop all but the last glock reference, and put the glock onto the lru_list when the last reference is dropped. When find_insert_glock() returns a reference to a cached glock, it removes the glock from the lru_list. Dumping the "glocks" and "glstats" debugfs files also takes glock references, but instead of removing the glocks from the lru_list in that case as well, we leave them on the list. This ensures that dumping those files won't perturb the order of the glocks on the lru_list. In addition, when the last reference to an *unlocked* glock is dropped, we immediately free it; this preserves the preexisting behavior. If it later turns out that caching unlocked glocks is useful in some situations, we can change the caching strategy. It is currently unclear if a glock that has no active references can have the GLF_LFLUSH flag set. To make sure that such a glock won't accidentally be evicted due to memory pressure, we add a GLF_LFLUSH check to gfs2_dispose_glock_lru(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Switch to a per-filesystem glock workqueueAndreas Gruenbacher2024-05-293-15/+18
| | | | | | | | Switch to a per-filesystem glock workqueue. Additional workqueues are cheap nowadays, and keeping separate workqueues allows to flush the work of each filesystem without affecting the others. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Report when glocks cannot be freed for a long timeAndreas Gruenbacher2024-05-291-3/+15
| | | | | | | | When glocks cannot be freed for a long time, avoid the "task blocked for more than N seconds" messages and report how many glocks are still outstanding, instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: gfs2_glock_get cleanupAndreas Gruenbacher2024-05-291-20/+13
| | | | | | | Clean up the messy code in gfs2_glock_get(). No change in functionality. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Invert the GLF_INITIAL flagAndreas Gruenbacher2024-05-293-10/+22
| | | | | | | | | | | | | | | | | | | | Invert the meaning of the GLF_INITIAL flag: right now, when GLF_INITIAL is set, a DLM lock exists and we have a valid identifier for it; when GLF_INITIAL is cleared, no DLM lock exists (yet). This is confusing. In addition, it makes more sense to highlight the exceptional case (i.e., no DLM lock exists yet) in glock dumps and trace points than to highlight the common case. To avoid confusion between the "old" and the "new" meaning of the flag, use 'a' instead of 'I' to represent the flag. For improved code consistency, check if the GLF_INITIAL flag is cleared to determine whether a DLM lock exists instead of checking if the lock identifier is non-zero. Document what the flag is used for. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Remove outdated comment in glock_work_funcAndreas Gruenbacher2024-05-291-5/+1
| | | | Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Rename handle_callback to request_demoteAndreas Gruenbacher2024-05-281-10/+10
| | | | | | | | | | | Function handle_callback() is used to request a glock demote. This often happens in response to a conflicting remote locking request and subsequent bast callback from DLM, but there are other reasons for triggering a demote request as well, such as when trying to release a glock in response to memory pressure. To clarify that, rename the function to request_demote(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Rename GLF_FROZEN to GLF_HAVE_FROZEN_REPLYAndreas Gruenbacher2024-05-283-6/+6
| | | | | | | | The GLF_FROZEN flag indicates that a reply to a DLM locking request has been received, but should not be processed at this time. To clarify that meaning, rename the flag to GLF_HAVE_FROZEN_REPLY. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Rename GLF_REPLY_PENDING to GLF_HAVE_REPLYAndreas Gruenbacher2024-05-283-9/+9
| | | | | | | | | | The GLF_REPLY_PENDING flag indicates to glock_work_func() that in response to a locking request, DLM has sent a reply that needs to be processed. A flag with that name could as well indicate that we are waiting on a reply from DLM, however. To disambiguate these two cases, rename the flag to GLF_HAVE_REPLY. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Rename GLF_FREEING to GLF_UNLOCKEDAndreas Gruenbacher2024-05-285-16/+16
| | | | | | | | | Rename the GLF_FREEING flag to GLF_UNLOCKED, and the ->go_free glock operation to ->go_unlocked. This mechanism is used to wait for the underlying DLM lock to be unlocked; being able to free the glock is a consequence of the DLM lock being unlocked. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Remove useless return statement in run_queueAndreas Gruenbacher2024-05-281-1/+0
| | | | | | The return statement at the end of run_queue() is useless. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* gfs2: Remove unnecessary function prototypeAndreas Gruenbacher2024-05-281-1/+0
| | | | | | | Function __gfs2_glock_dq() gets defined before it is used, so there is no need for a separate function declaration. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
* Merge tag 'pull-bd_inode-1' of ↵Linus Torvalds2024-05-212-2/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull bdev bd_inode updates from Al Viro: "Replacement of bdev->bd_inode with sane(r) set of primitives by me and Yu Kuai" * tag 'pull-bd_inode-1' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: RIP ->bd_inode dasd_format(): killing the last remaining user of ->bd_inode nilfs_attach_log_writer(): use ->bd_mapping->host instead of ->bd_inode block/bdev.c: use the knowledge of inode/bdev coallocation gfs2: more obvious initializations of mapping->host fs/buffer.c: massage the remaining users of ->bd_inode to ->bd_mapping blk_ioctl_{discard,zeroout}(): we only want ->bd_inode->i_mapping here... grow_dev_folio(): we only want ->bd_inode->i_mapping there use ->bd_mapping instead of ->bd_inode->i_mapping block_device: add a pointer to struct address_space (page cache of bdev) missing helpers: bdev_unhash(), bdev_drop() block: move two helpers into bdev.c block2mtd: prevent direct access of bd_inode dm-vdo: use bdev_nr_bytes(bdev) instead of i_size_read(bdev->bd_inode) blkdev_write_iter(): saner way to get inode and bdev bcachefs: remove dead function bdev_sectors() ext4: remove block_device_ejected() erofs_buf: store address_space instead of inode erofs: switch erofs_bread() to passing offset instead of block number
| * gfs2: more obvious initializations of mapping->hostAl Viro2024-05-032-2/+2
| | | | | | | | | | | | | | | | what's going on is copying the ->host of bdev's address_space Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/r/20240411145346.2516848-4-viro@zeniv.linux.org.uk Signed-off-by: Christian Brauner <brauner@kernel.org>
* | Merge tag 'gfs2-for-v6.10' of ↵Linus Torvalds2024-05-1518-258/+320
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Properly fix the glock shrinker this time: it broke in commit "gfs2: Make glock lru list scanning safer" and commit "gfs2: fix glock shrinker ref issues" wasn't actually enough to fix it - On unmount, keep glocks around long enough that no more dlm callbacks can occur on them - Some more folio conversion patches from Matthew Wilcox - Lots of other smaller fixes and cleanups * tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (27 commits) gfs2: make timeout values more explicit gfs2: Convert gfs2_aspace_writepage() to use a folio gfs2: Add a migrate_folio operation for journalled files gfs2: Simplify gfs2_read_super gfs2: Convert gfs2_page_mkwrite() to use a folio gfs2: gfs2_freeze_unlock cleanup gfs2: Remove and replace gfs2_glock_queue_work gfs2: do_xmote fixes gfs2: finish_xmote cleanup gfs2: Unlock fewer glocks on unmount gfs2: Fix potential glock use-after-free on unmount gfs2: Remove ill-placed consistency check gfs2: Fix lru_count accounting gfs2: Fix "Make glock lru list scanning safer" Revert "gfs2: fix glock shrinker ref issues" gfs2: Fix "ignore unlock failures after withdraw" gfs2: Get rid of unnecessary test_and_set_bit gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lru gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_async gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw ...
| * gfs2: make timeout values more explicitWolfram Sang2024-05-071-3/+2
| | | | | | | | | | | | | | | | | | | | | | 'timeout' is a vague name for the return value of wait_event_*_timeout because it actually returns the time left. Because the variable is never used later, just drop the return value. Since variable 'timeout' is then only used to carry a fixed timeout value, drop this in favor of a fixed function argument as in the other call to wait_event_timeout() above. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Convert gfs2_aspace_writepage() to use a folioMatthew Wilcox (Oracle)2024-05-031-8/+8
| | | | | | | | | | | | | | | | Convert the incoming struct page to a folio and use it throughout. Saves six calls to compound_head(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Add a migrate_folio operation for journalled filesMatthew Wilcox (Oracle)2024-05-031-2/+2
| | | | | | | | | | | | | | | | | | | | For journalled data, folio migration currently works by writing the folio back, freeing the folio and faulting the new folio back in. We can bypass that by telling the migration code to migrate the buffer_heads attached to our folios. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Simplify gfs2_read_superMatthew Wilcox (Oracle)2024-05-021-33/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use submit_bio_wait() instead of hand-rolling our own synchronous wait. Also allocate the BIO on the stack since we're not deep in the call stack at this point. There's no need to kmap the page, since it isn't allocated from HIGHMEM. Turn the GFP_NOFS allocation into GFP_KERNEL; if the page allocator enters reclaim, we cannot be called as the filesystem has not yet been initialised and so has no pages to reclaim. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Convert gfs2_page_mkwrite() to use a folioMatthew Wilcox (Oracle)2024-04-291-29/+30
| | | | | | | | | | | | | | | | | | | | | | | | Convert the incoming page to a folio and use it throughout, saving several calls to compound_head(). Also use 'pos' for file position rather than the ambiguous 'offset' and convert 'length' to type size_t in case we get some truly ridiculous sized folios in the future. This function should now be large-folio safe, but I may have missed something. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: gfs2_freeze_unlock cleanupAndreas Gruenbacher2024-04-294-11/+11
| | | | | | | | | | | | | | Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh as its argument, so clean up the code by passing in sdp instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Remove and replace gfs2_glock_queue_workAndreas Gruenbacher2024-04-241-20/+15
| | | | | | | | | | | | | | | | There are no more callers of gfs2_glock_queue_work() left, so remove that helper. With that, we can now rename __gfs2_glock_queue_work() back to gfs2_glock_queue_work() to get rid of some unnecessary clutter. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: do_xmote fixesAndreas Gruenbacher2024-04-241-19/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Function do_xmote() is called with the glock spinlock held. Commit 86934198eefa added a 'goto skip_inval' statement at the beginning of the function to further below where the glock spinlock is expected not to be held anymore. Then it added code there that requires the glock spinlock to be held. This doesn't make sense; fix this up by dropping and retaking the spinlock where needed. In addition, when ->lm_lock() returned an error, do_xmote() didn't fail the locking operation, and simply left the glock hanging; fix that as well. (This is a much older error.) Fixes: 86934198eefa ("gfs2: Clear flags when withdraw prevents xmote") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: finish_xmote cleanupAndreas Gruenbacher2024-04-241-8/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, function finish_xmote() takes and releases the glock spinlock. However, all of its callers immediately take that spinlock again, so it makes more sense to take the spin lock before calling finish_xmote() already. With that, thaw_glock() is the only place that sets the GLF_HAVE_REPLY flag outside of the glock spinlock, but it also takes that spinlock immediately thereafter. Change that to set the bit when the spinlock is already held. This allows to switch from test_and_clear_bit() to test_bit() and clear_bit() in glock_work_func(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Unlock fewer glocks on unmountAndreas Gruenbacher2024-04-241-2/+8
| | | | | | | | | | | | | | | | | | | | | | At unmount time, we would generally like to explicitly unlock as few glocks as possible for efficiency. We are already skipping glocks that don't have a lock value block (LVB), but we can also skip glocks which are not held in DLM_LOCK_EX or DLM_LOCK_PW mode (of which gfs2 only uses DLM_LOCK_EX under the name LM_ST_EXCLUSIVE). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
| * gfs2: Fix potential glock use-after-free on unmountAndreas Gruenbacher2024-04-246-16/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released. To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks. As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead. Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
| * gfs2: Remove ill-placed consistency checkAndreas Gruenbacher2024-04-241-1/+0
| | | | | | | | | | | | | | | | | | | | This consistency check was originally added by commit 9287c6452d2b1 ("gfs2: Fix occasional glock use-after-free"). It is ill-placed in gfs2_glock_free() because if it holds there, it must equally hold in __gfs2_glock_put() already. Either way, the check doesn't seem necessary anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Fix lru_count accountingAndreas Gruenbacher2024-04-241-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, gfs2_scan_glock_lru() decrements lru_count when a glock is moved onto the dispose list. When such a glock is then stolen from the dispose list while gfs2_dispose_glock_lru() doesn't hold the lru_lock, lru_count will be decremented again, so the counter will eventually go negative. This bug has existed in one form or another since at least commit 97cc1025b1a91 ("GFS2: Kill two daemons with one patch"). Fix this by only decrementing lru_count when we actually remove a glock and schedule for it to be unlocked and dropped. We also don't need to remove and then re-add glocks when we can just as well move them back onto the lru_list when necessary. In addition, return the number of glocks freed as we should, not the number of glocks moved onto the dispose list. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Fix "Make glock lru list scanning safer"Andreas Gruenbacher2024-04-091-11/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 228804a35caa tried to add a refcount check to gfs2_scan_glock_lru() to make sure that glocks that are still referenced cannot be freed. It failed to account for the bias state_change() adds to the refcount for held glocks, so held glocks are no longer removed from the glock cache, which can lead to out-of-memory problems. Fix that. (The inodes those glocks are associated with do get shrunk and do get pushed out of memory.) In addition, use the same eligibility check in gfs2_scan_glock_lru() and gfs2_dispose_glock_lru(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * Revert "gfs2: fix glock shrinker ref issues"Andreas Gruenbacher2024-04-091-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 62862485a4c3a52029fc30f4bdde9af04afdafc9. Commit 62862485a4c3 tried to fix issues introduced by commit 228804a35caa ("gfs2: Make glock lru list scanning safer"), but like that commit, it failed to account for the bias state_change() adds to the glock reference count for locked glocks. Revert commit 62862485a4c3 so that we can fix commit 228804a35caa properly. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Fix "ignore unlock failures after withdraw"Andreas Gruenbacher2024-04-092-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 3e11e53041502 tries to suppress dlm_lock() lock conversion errors that occur when the lockspace has already been released. It does that by setting and checking the SDF_SKIP_DLM_UNLOCK flag. This conflicts with the intended meaning of the SDF_SKIP_DLM_UNLOCK flag, so check whether the lockspace is still allocated instead. (Given the current DLM API, checking for this kind of error after the fact seems easier that than to make sure that the lockspace is still allocated before calling dlm_lock(). Changing the DLM API so that users maintain the lockspace references themselves would be an option.) Fixes: 3e11e53041502 ("GFS2: ignore unlock failures after withdraw") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Get rid of unnecessary test_and_set_bitAndreas Gruenbacher2024-04-091-1/+2
| | | | | | | | | | | | | | | | The GLF_LOCK flag is protected by the gl->gl_lockref.lock spin lock which is held when entering run_queue(), so we can use test_bit() and set_bit() here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lruAndreas Gruenbacher2024-04-091-2/+1
| | | | | | | | | | | | | | | | In gfs2_dispose_glock_lru(), we want to skip glocks which are in the process of transitioning state (as indicated by the set GLF_LOCK flag), but we we don't need to set that flag for requesting a state transition. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
| * gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_asyncAndreas Gruenbacher2024-04-094-14/+21
| | | | | | | | | | | | | | | | | | | | | | Function gfs2_glock_queue_put() puts a glock reference by enqueuing glock work instead of putting the reference directly. This ensures that the operation won't sleep, but it is costly and really only necessary when putting the final glock reference. Replace it with a new gfs2_glock_put_async() function that only queues glock work when putting the last glock reference. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>