mm/mmu_notifier: add an interval tree notifier

Of the 13 users of mmu_notifiers, 8 of them use only
invalidate_range_start/end() and immediately intersect the
mmu_notifier_range with some kind of internal list of VAs. 4 use an
interval tree (i915_gem, radeon_mn, umem_odp, hfi1). 4 use a linked list
of some kind (scif_dma, vhost, gntdev, hmm).

The remaining 5 either don't use invalidate_range_start() or do some
special thing with it.

It turns out that building a correct scheme with an interval tree is
pretty complicated, particularly if the use case is synchronizing against
another thread doing get_user_pages(). Many of these implementations have
various subtle and difficult-to-fix races.

This approach puts the interval tree as common code at the top of the mmu
notifier call tree and implements a shareable locking scheme.

It includes:
 - An interval tree tracking VA ranges, with per-range callbacks
 - A read/write locking scheme for the interval tree that avoids
   sleeping in the notifier path (for OOM killer)
 - A sequence counter based collision-retry locking scheme to tell
   device page fault that a VA range is being concurrently invalidated.

This is based on various ideas:
 - hmm accumulates invalidated VA ranges and releases them when all
   invalidates are done, via active_invalidate_ranges count.
   This approach avoids having to intersect the interval tree twice (as
   umem_odp does) at the potential cost of a longer device page fault.
 - kvm/umem_odp use a sequence counter to drive the collision retry,
   via invalidate_seq
 - a deferred work todo list on unlock scheme like RTNL, via
   deferred_list. This makes adding/removing interval tree members more
   deterministic
 - seqlock, except this version makes the seqlock idea multi-holder on
   the write side by protecting it with active_invalidate_ranges and a
   spinlock

To minimize MM overhead when only the interval tree is being used, the
entire SRCU and hlist overheads are dropped using some simple branches.
Similarly the interval tree overhead is dropped when in hlist mode.
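
The collision-retry idea described above can be illustrated with a small
userspace model. This is only a sketch under stated assumptions, not the
code this patch adds: it is single-threaded, it tracks one range instead of
an interval tree, it has no spinlock or sleeping, and every toy_* name is
hypothetical. Only the field names invalidate_seq and
active_invalidate_ranges are borrowed from the description. The sequence is
made odd when an invalidation hits a tracked range and is made even again
only when the last in-flight invalidation ends, which is the multi-holder
write side.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy, single-threaded model. Every toy_* name is hypothetical. */
struct toy_mm {
	unsigned long invalidate_seq;          /* odd while a tracked range is being invalidated */
	unsigned int active_invalidate_ranges; /* invalidations currently in flight */
};

struct toy_range {
	unsigned long start, last;    /* inclusive VA range */
	unsigned long invalidate_seq; /* mm sequence seen at the last overlapping invalidation */
};

/* Begin invalidating [start, last]; stamp the range if it overlaps. */
static void toy_inv_start(struct toy_mm *mm, struct toy_range *r,
			  unsigned long start, unsigned long last)
{
	mm->active_invalidate_ranges++;
	if (r->start <= last && start <= r->last) {
		mm->invalidate_seq |= 1;
		r->invalidate_seq = mm->invalidate_seq;
	}
}

/* Finish one invalidation; the last one out makes the sequence even again. */
static void toy_inv_end(struct toy_mm *mm)
{
	if (--mm->active_invalidate_ranges == 0 && (mm->invalidate_seq & 1))
		mm->invalidate_seq++;
}

/* True while an invalidation that hit this range is still in flight. */
static bool toy_range_busy(const struct toy_mm *mm, const struct toy_range *r)
{
	return (mm->invalidate_seq & 1) && r->invalidate_seq == mm->invalidate_seq;
}

/* Fault path: snapshot the range's sequence before doing any work. */
static unsigned long toy_read_begin(const struct toy_range *r)
{
	return r->invalidate_seq;
}

/* Fault path: true if the range was invalidated since the snapshot. */
static bool toy_read_retry(const struct toy_range *r, unsigned long seq)
{
	return r->invalidate_seq != seq;
}

/* Returns the number of retries a simulated device fault needed. */
int toy_demo(void)
{
	struct toy_mm mm = { 0 };
	struct toy_range r = { .start = 0x1000, .last = 0x1fff };
	unsigned long seq;
	int retries = 0;

	seq = toy_read_begin(&r);               /* fault begins */
	toy_inv_start(&mm, &r, 0x1800, 0x27ff); /* invalidation A overlaps the range */
	toy_inv_start(&mm, &r, 0x0, 0xfff);     /* invalidation B does not overlap */
	toy_inv_end(&mm);                       /* one ends, one is still active */
	assert(toy_range_busy(&mm, &r));        /* write side is still held */
	toy_inv_end(&mm);                       /* last one ends, sequence becomes even */
	assert(!toy_range_busy(&mm, &r));

	while (toy_read_retry(&r, seq)) {       /* collision detected */
		retries++;
		seq = toy_read_begin(&r);       /* redo the fault with a fresh snapshot */
	}
	return retries;
}
```

In this model a fault that raced with an overlapping invalidation sees a
changed per-range sequence and retries once, while an invalidation that
does not touch the range leaves the snapshot valid. A real fault path would
also have to wait while the range is busy rather than only check it.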

The overhead from the mandatory spinlock is broadly the same as most
existing users, which already had a lock (or two) of some sort on the
invalidation path.

Link: https://lore.kernel.org/r/20191112202231.3856-3-jgg@ziepe.ca
Acked-by: Christian König <christian.koenig@amd.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
author: Jason Gunthorpe <jgg@mellanox.com> 2019-11-12 21:22:19 +0100
committer: Jason Gunthorpe <jgg@mellanox.com> 2019-11-24 00:56:44 +0100
commit: 99cb252f5e68d72afa3245a4e73d216d295cd335 (patch)
tree: 8e012ca4004ab6de4d13cdb1fb8c240e321ad792 /mm/Kconfig
parent: mm/mmu_notifier: define the header pre-processor parts even if disabled (diff)
download: linux-99cb252f5e68d72afa3245a4e73d216d295cd335.tar.xz
linux-99cb252f5e68d72afa3245a4e73d216d295cd335.zip
1 files changed, 1 insertions, 0 deletions
diff --git a/mm/Kconfig b/mm/Kconfig
index a5dae9a7eb51..d0b5046d9aef 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -284,6 +284,7 @@ config VIRT_TO_BUS
 config MMU_NOTIFIER
 	bool
 	select SRCU
+	select INTERVAL_TREE
 
 config KSM
 	bool "Enable KSM for page merging"