docs: RCU: Convert checklist.txt to ReST

- Add a SPDX header;
- Adjust document title;
- Some whitespace fixes and new line breaks;
- Use the right list markups;
- Add it to RCU/index.rst.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
author: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> 2020-04-21 19:04:02 +0200
committer: Paul E. McKenney <paulmck@kernel.org> 2020-06-29 20:58:10 +0200
commit: 6b05dfacd761c6ace11def4b3b42fc6a7583fec3 (patch)
tree: 5fc7623c686d4f35a8ba45b3640e42fd7203754b /Documentation/RCU/checklist.txt
parent: Linux 5.8-rc3 (diff)
download: linux-6b05dfacd761c6ace11def4b3b42fc6a7583fec3.tar.xz
linux-6b05dfacd761c6ace11def4b3b42fc6a7583fec3.zip
1 files changed, 0 insertions, 458 deletions
diff --git a/Documentation/RCU/checklist.txt b/Documentation/RCU/checklist.txt
deleted file mode 100644
index e98ff261a438..000000000000
--- a/Documentation/RCU/checklist.txt
+++ /dev/null
@@ -1,458 +0,0 @@
-Review Checklist for RCU Patches
-
-
-This document contains a checklist for producing and reviewing patches
-that make use of RCU.  Violating any of the rules listed below will
-result in the same sorts of problems that leaving out a locking primitive
-would cause.  This list is based on experiences reviewing such patches
-over a rather long period of time, but improvements are always welcome!
-
-0.	Is RCU being applied to a read-mostly situation?  If the data
-	structure is updated more than about 10% of the time, then you
-	should strongly consider some other approach, unless detailed
-	performance measurements show that RCU is nonetheless the right
-	tool for the job.  Yes, RCU does reduce read-side overhead by
-	increasing write-side overhead, which is exactly why normal uses
-	of RCU will do much more reading than updating.
-
-	Another exception is where performance is not an issue, and RCU
-	provides a simpler implementation.  An example of this situation
-	is the dynamic NMI code in the Linux 2.6 kernel, at least on
-	architectures where NMIs are rare.
-
-	Yet another exception is where the low real-time latency of RCU's
-	read-side primitives is critically important.
-
-	One final exception is where RCU readers are used to prevent
-	the ABA problem (https://en.wikipedia.org/wiki/ABA_problem)
-	for lockless updates.  This does result in the mildly
-	counter-intuitive situation where rcu_read_lock() and
-	rcu_read_unlock() are used to protect updates; however, this
-	approach provides the same potential simplifications that garbage
-	collectors do.
-
-1.	Does the update code have proper mutual exclusion?
-
-	RCU does allow -readers- to run (almost) naked, but -writers- must
-	still use some sort of mutual exclusion, such as:
-
-	a.	locking,
-	b.	atomic operations, or
-	c.	restricting updates to a single task.
-
-	If you choose #b, be prepared to describe how you have handled
-	memory barriers on weakly ordered machines (pretty much all of
-	them -- even x86 allows later loads to be reordered to precede
-	earlier stores), and be prepared to explain why this added
-	complexity is worthwhile.  If you choose #c, be prepared to
-	explain how this single task does not become a major bottleneck on
-	big multiprocessor machines (for example, if the task is updating
-	information relating to itself that other tasks can read, there
-	by definition can be no bottleneck).  Note that the definition
-	of "large" has changed significantly:  Eight CPUs was "large"
-	in the year 2000, but a hundred CPUs was unremarkable in 2017.
-
-2.	Do the RCU read-side critical sections make proper use of
-	rcu_read_lock() and friends?  These primitives are needed
-	to prevent grace periods from ending prematurely, which
-	could result in data being unceremoniously freed out from
-	under your read-side code, which can greatly increase the
-	actuarial risk of your kernel.
-
-	As a rough rule of thumb, any dereference of an RCU-protected
-	pointer must be covered by rcu_read_lock(), rcu_read_lock_bh(),
-	rcu_read_lock_sched(), or by the appropriate update-side lock.
-	Disabling of preemption can serve as rcu_read_lock_sched(), but
-	is less readable and prevents lockdep from detecting locking issues.
-
-	Letting RCU-protected pointers "leak" out of an RCU read-side
-	critical section is every bit as bad as letting them leak out
-	from under a lock.  Unless, of course, you have arranged some
-	other means of protection, such as a lock or a reference count
-	-before- letting them out of the RCU read-side critical section.
-
-3.	Does the update code tolerate concurrent accesses?
-
-	The whole point of RCU is to permit readers to run without
-	any locks or atomic operations.  This means that readers will
-	be running while updates are in progress.  There are a number
-	of ways to handle this concurrency, depending on the situation:
-
-	a.	Use the RCU variants of the list and hlist update
-		primitives to add, remove, and replace elements on
-		an RCU-protected list.	Alternatively, use the other
-		RCU-protected data structures that have been added to
-		the Linux kernel.
-
-		This is almost always the best approach.
-
-	b.	Proceed as in (a) above, but also maintain per-element
-		locks (that are acquired by both readers and writers)
-		that guard per-element state.  Of course, fields that
-		the readers refrain from accessing can be guarded by
-		some other lock acquired only by updaters, if desired.
-
-		This works quite well, also.
-
-	c.	Make updates appear atomic to readers.	For example,
-		pointer updates to properly aligned fields will
-		appear atomic, as will individual atomic primitives.
-		Sequences of operations performed under a lock will -not-
-		appear to be atomic to RCU readers, nor will sequences
-		of multiple atomic primitives.
-
-		This can work, but is starting to get a bit tricky.
-
-	d.	Carefully order the updates and the reads so that
-		readers see valid data at all phases of the update.
-		This is often more difficult than it sounds, especially
-		given modern CPUs' tendency to reorder memory references.
-		One must usually liberally sprinkle memory barriers
-		(smp_wmb(), smp_rmb(), smp_mb()) through the code,
-		making it difficult to understand and to test.
-
-		It is usually better to group the changing data into
-		a separate structure, so that the change may be made
-		to appear atomic by updating a pointer to reference
-		a new structure containing updated values.
-
-4.	Weakly ordered CPUs pose special challenges.  Almost all CPUs
-	are weakly ordered -- even x86 CPUs allow later loads to be
-	reordered to precede earlier stores.  RCU code must take all of
-	the following measures to prevent memory-corruption problems:
-
-	a.	Readers must maintain proper ordering of their memory
-		accesses.  The rcu_dereference() primitive ensures that
-		the CPU picks up the pointer before it picks up the data
-		that the pointer points to.  This really is necessary
-		on Alpha CPUs.	If you don't believe me, see:
-
-			http://www.openvms.compaq.com/wizard/wiz_2637.html
-
-		The rcu_dereference() primitive is also an excellent
-		documentation aid, letting the person reading the
-		code know exactly which pointers are protected by RCU.
-		Please note that compilers can also reorder code, and
-		they are becoming increasingly aggressive about doing
-		just that.  The rcu_dereference() primitive therefore also
-		prevents destructive compiler optimizations.  However,
-		with a bit of devious creativity, it is possible to
-		mishandle the return value from rcu_dereference().
-		Please see rcu_dereference.txt in this directory for
-		more information.
-
-		The rcu_dereference() primitive is used by the
-		various "_rcu()" list-traversal primitives, such
-		as list_for_each_entry_rcu().  Note that it is
-		perfectly legal (if redundant) for update-side code to
-		use rcu_dereference() and the "_rcu()" list-traversal
-		primitives.  This is particularly useful in code that
-		is common to readers and updaters.  However, lockdep
-		will complain if you invoke rcu_dereference() outside
-		of an RCU read-side critical section.  See lockdep.txt
-		to learn what to do about this.
-
-		Of course, neither rcu_dereference() nor the "_rcu()"
-		list-traversal primitives can substitute for a good
-		concurrency design coordinating among multiple updaters.
-
-	b.	If the list macros are being used, the list_add_tail_rcu()
-		and list_add_rcu() primitives must be used in order
-		to prevent weakly ordered machines from misordering
-		structure initialization and pointer planting.
-		Similarly, if the hlist macros are being used, the
-		hlist_add_head_rcu() primitive is required.
-
-	c.	If the list macros are being used, the list_del_rcu()
-		primitive must be used to keep list_del()'s pointer
-		poisoning from inflicting toxic effects on concurrent
-		readers.  Similarly, if the hlist macros are being used,
-		the hlist_del_rcu() primitive is required.
-
-		The list_replace_rcu() and hlist_replace_rcu() primitives
-		may be used to replace an old structure with a new one
-		in their respective types of RCU-protected lists.
-
-	d.	Rules similar to (4b) and (4c) apply to the "hlist_nulls"
-		type of RCU-protected linked lists.
-
-	e.	Updates must ensure that initialization of a given
-		structure happens before pointers to that structure are
-		publicized.  Use the rcu_assign_pointer() primitive
-		when publicizing a pointer to a structure that can
-		be traversed by an RCU read-side critical section.
-
-5.	If call_rcu() or call_srcu() is used, the callback function may
-	be called from softirq context.  In particular, it cannot block.
-
-6.	Since synchronize_rcu() can block, it cannot be called
-	from any sort of irq context.  The same rule applies
-	for synchronize_srcu(), synchronize_rcu_expedited(), and
-	synchronize_srcu_expedited().
-
-	The expedited forms of these primitives have the same semantics
-	as the non-expedited forms, but expediting is both expensive and
-	(with the exception of synchronize_srcu_expedited()) unfriendly
-	to real-time workloads.  Use of the expedited primitives should
-	be restricted to rare configuration-change operations that would
-	not normally be undertaken while a real-time workload is running.
-	However, real-time workloads can use the rcupdate.rcu_normal kernel
-	boot parameter to completely disable expedited grace periods,
-	though this might have performance implications.
-
-	In particular, if you find yourself invoking one of the expedited
-	primitives repeatedly in a loop, please do everyone a favor:
-	Restructure your code so that it batches the updates, allowing
-	a single non-expedited primitive to cover the entire batch.
-	This will very likely be faster than the loop containing the
-	expedited primitive, and will be much much easier on the rest
-	of the system, especially on real-time workloads running on
-	the rest of the system.
-
-7.	As of v4.20, a given kernel implements only one RCU flavor,
-	which is RCU-sched for PREEMPT=n and RCU-preempt for PREEMPT=y.
-	If the updater uses call_rcu() or synchronize_rcu(),
-	then the corresponding readers may use rcu_read_lock() and
-	rcu_read_unlock(), rcu_read_lock_bh() and rcu_read_unlock_bh(),
-	or any pair of primitives that disables and re-enables preemption,
-	for example, rcu_read_lock_sched() and rcu_read_unlock_sched().
-	If the updater uses synchronize_srcu() or call_srcu(),
-	then the corresponding readers must use srcu_read_lock() and
-	srcu_read_unlock(), and with the same srcu_struct.  The rules for
-	the expedited primitives are the same as for their non-expedited
-	counterparts.  Mixing things up will result in confusion and
-	broken kernels, and has even resulted in an exploitable security
-	issue.
-
-	One exception to this rule: rcu_read_lock() and rcu_read_unlock()
-	may be substituted for rcu_read_lock_bh() and rcu_read_unlock_bh()
-	in cases where local bottom halves are already known to be
-	disabled, for example, in irq or softirq context.  Commenting
-	such cases is a must, of course!  And the jury is still out on
-	whether the increased speed is worth it.
-
-8.	Although synchronize_rcu() is slower than is call_rcu(), it
-	usually results in simpler code.  So, unless update performance is
-	critically important, the updaters cannot block, or the latency of
-	synchronize_rcu() is visible from userspace, synchronize_rcu()
-	should be used in preference to call_rcu().  Furthermore,
-	kfree_rcu() usually results in even simpler code than does
-	synchronize_rcu() without synchronize_rcu()'s multi-millisecond
-	latency.  So please take advantage of kfree_rcu()'s "fire and
-	forget" memory-freeing capabilities where it applies.
-
-	An especially important property of the synchronize_rcu()
-	primitive is that it automatically self-limits: if grace periods
-	are delayed for whatever reason, then the synchronize_rcu()
-	primitive will correspondingly delay updates.  In contrast,
-	code using call_rcu() should explicitly limit update rate in
-	cases where grace periods are delayed, as failing to do so can
-	result in excessive realtime latencies or even OOM conditions.
-
-	Ways of gaining this self-limiting property when using call_rcu()
-	include:
-
-	a.	Keeping a count of the number of data-structure elements
-		used by the RCU-protected data structure, including
-		those waiting for a grace period to elapse.  Enforce a
-		limit on this number, stalling updates as needed to allow
-		previously deferred frees to complete.	Alternatively,
-		limit only the number awaiting deferred free rather than
-		the total number of elements.
-
-		One way to stall the updates is to acquire the update-side
-		mutex.	(Don't try this with a spinlock -- other CPUs
-		spinning on the lock could prevent the grace period
-		from ever ending.)  Another way to stall the updates
-		is for the updates to use a wrapper function around
-		the memory allocator, so that this wrapper function
-		simulates OOM when there is too much memory awaiting an
-		RCU grace period.  There are of course many other
-		variations on this theme.
-
-	b.	Limiting update rate.  For example, if updates occur only
-		once per hour, then no explicit rate limiting is
-		required, unless your system is already badly broken.
-		Older versions of the dcache subsystem take this approach,
-		guarding updates with a global lock, limiting their rate.
-
-	c.	Trusted update -- if updates can only be done manually by
-		superuser or some other trusted user, then it might not
-		be necessary to automatically limit them.  The theory
-		here is that superuser already has lots of ways to crash
-		the machine.
-
-	d.	Periodically invoke synchronize_rcu(), permitting a limited
-		number of updates per grace period.
-
-	The same cautions apply to call_srcu() and kfree_rcu().
-
-	Note that although these primitives do take action to avoid memory
-	exhaustion when any given CPU has too many callbacks, a determined
-	user could still exhaust memory.  This is especially the case
-	if a system with a large number of CPUs has been configured to
-	offload all of its RCU callbacks onto a single CPU, or if the
-	system has relatively little free memory.
-
-9.	All RCU list-traversal primitives, which include
-	rcu_dereference(), list_for_each_entry_rcu(), and
-	list_for_each_safe_rcu(), must be either within an RCU read-side
-	critical section or must be protected by appropriate update-side
-	locks.	RCU read-side critical sections are delimited by
-	rcu_read_lock() and rcu_read_unlock(), or by similar primitives
-	such as rcu_read_lock_bh() and rcu_read_unlock_bh(), in which
-	case the matching rcu_dereference() primitive (in this case,
-	rcu_dereference_bh()) must be used in order to keep lockdep happy.
-
-	The reason that it is permissible to use RCU list-traversal
-	primitives when the update-side lock is held is that doing so
-	can be quite helpful in reducing code bloat when common code is
-	shared between readers and updaters.  Additional primitives
-	are provided for this case, as discussed in lockdep.txt.
-
-10.	Conversely, if you are in an RCU read-side critical section,
-	and you don't hold the appropriate update-side lock, you -must-
-	use the "_rcu()" variants of the list macros.  Failing to do so
-	will break Alpha, cause aggressive compilers to generate bad code,
-	and confuse people trying to read your code.
-
-11.	Any lock acquired by an RCU callback must be acquired elsewhere
-	with softirq disabled, e.g., via spin_lock_irqsave(),
-	spin_lock_bh(), etc.  Failing to disable softirq on a given
-	acquisition of that lock will result in deadlock as soon as
-	the RCU softirq handler happens to run your RCU callback while
-	interrupting that acquisition's critical section.
-
-12.	RCU callbacks can be and are executed in parallel.  In many cases,
-	the callback code is simply a wrapper around kfree(), so that this
-	is not an issue (or, more accurately, to the extent that it is
-	an issue, the memory-allocator locking handles it).  However,
-	if the callbacks do manipulate a shared data structure, they
-	must use whatever locking or other synchronization is required
-	to safely access and/or modify that data structure.
-
-	Do not assume that RCU callbacks will be executed on the same
-	CPU that executed the corresponding call_rcu() or call_srcu().
-	For example, if a given CPU goes offline while having an RCU
-	callback pending, then that RCU callback will execute on some
-	surviving CPU.	(If this were not the case, a self-spawning RCU
-	callback would prevent the victim CPU from ever going offline.)
-	Furthermore, CPUs designated by rcu_nocbs= might well -always-
-	have their RCU callbacks executed on some other CPUs; in fact,
-	for some real-time workloads, this is the whole point of using
-	the rcu_nocbs= kernel boot parameter.
-
-13.	Unlike other forms of RCU, it -is- permissible to block in an
-	SRCU read-side critical section (demarked by srcu_read_lock()
-	and srcu_read_unlock()), hence the "SRCU": "sleepable RCU".
-	Please note that if you don't need to sleep in read-side critical
-	sections, you should be using RCU rather than SRCU, because RCU
-	is almost always faster and easier to use than is SRCU.
-
-	Also unlike other forms of RCU, explicit initialization and
-	cleanup are required either at build time via DEFINE_SRCU()
-	or DEFINE_STATIC_SRCU() or at runtime via init_srcu_struct()
-	and cleanup_srcu_struct().  These last two are passed a
-	"struct srcu_struct" that defines the scope of a given
-	SRCU domain.  Once initialized, the srcu_struct is passed
-	to srcu_read_lock(), srcu_read_unlock(), synchronize_srcu(),
-	synchronize_srcu_expedited(), and call_srcu().	A given
-	synchronize_srcu() waits only for SRCU read-side critical
-	sections governed by srcu_read_lock() and srcu_read_unlock()
-	calls that have been passed the same srcu_struct.  This property
-	is what makes sleeping read-side critical sections tolerable --
-	a given subsystem delays only its own updates, not those of other
-	subsystems using SRCU.	Therefore, SRCU is less likely to OOM the
-	system than RCU would be if RCU's read-side critical sections
-	were permitted to sleep.
-
-	The ability to sleep in read-side critical sections does not
-	come for free.	First, corresponding srcu_read_lock() and
-	srcu_read_unlock() calls must be passed the same srcu_struct.
-	Second, grace-period-detection overhead is amortized only
-	over those updates sharing a given srcu_struct, rather than
-	being globally amortized as it is for other forms of RCU.
-	Therefore, SRCU should be used in preference to rw_semaphore
-	only in extremely read-intensive situations, or in situations
-	requiring SRCU's read-side deadlock immunity or low read-side
-	realtime latency.  You should also consider percpu_rw_semaphore
-	when you need lightweight readers.
-
-	SRCU's expedited primitive (synchronize_srcu_expedited())
-	never sends IPIs to other CPUs, so it is easier on
-	real-time workloads than is synchronize_rcu_expedited().
-
-	Note that rcu_assign_pointer() relates to SRCU just as it does to
-	other forms of RCU, but instead of rcu_dereference() you should
-	use srcu_dereference() in order to avoid lockdep splats.
-
-14.	The whole point of call_rcu(), synchronize_rcu(), and friends
-	is to wait until all pre-existing readers have finished before
-	carrying out some otherwise-destructive operation.  It is
-	therefore critically important to -first- remove any path
-	that readers can follow that could be affected by the
-	destructive operation, and -only- -then- invoke call_rcu(),
-	synchronize_rcu(), or friends.
-
-	Because these primitives only wait for pre-existing readers, it
-	is the caller's responsibility to guarantee that any subsequent
-	readers will execute safely.
-
-15.	The various RCU read-side primitives do -not- necessarily contain
-	memory barriers.  You should therefore plan for the CPU
-	and the compiler to freely reorder code into and out of RCU
-	read-side critical sections.  It is the responsibility of the
-	RCU update-side primitives to deal with this.
-
-	For SRCU readers, you can use smp_mb__after_srcu_read_unlock()
-	immediately after an srcu_read_unlock() to get a full barrier.
-
-16.	Use CONFIG_PROVE_LOCKING, CONFIG_DEBUG_OBJECTS_RCU_HEAD, and the
-	__rcu sparse checks to validate your RCU code.	These can help
-	find problems as follows:
-
-	CONFIG_PROVE_LOCKING: check that accesses to RCU-protected data
-		structures are carried out under the proper RCU
-		read-side critical section, while holding the right
-		combination of locks, or whatever other conditions
-		are appropriate.
-
-	CONFIG_DEBUG_OBJECTS_RCU_HEAD: check that you don't pass the
-		same object to call_rcu() (or friends) before an RCU
-		grace period has elapsed since the last time that you
-		passed that same object to call_rcu() (or friends).
-
-	__rcu sparse checks: tag the pointer to the RCU-protected data
-		structure with __rcu, and sparse will warn you if you
-		access that pointer without the services of one of the
-		variants of rcu_dereference().
-
-	These debugging aids can help you find problems that are
-	otherwise extremely difficult to spot.
-
-17.	If you register a callback using call_rcu() or call_srcu(), and
-	pass in a function defined within a loadable module, then it is
-	necessary to wait for all pending callbacks to be invoked after
-	the last invocation and before unloading that module.  Note that
-	it is absolutely -not- sufficient to wait for a grace period!
-	The current (say) synchronize_rcu() implementation is -not-
-	guaranteed to wait for callbacks registered on other CPUs.
-	Or even on the current CPU if that CPU recently went offline
-	and came back online.
-
-	You instead need to use one of the barrier functions:
-
-	o	call_rcu() -> rcu_barrier()
-	o	call_srcu() -> srcu_barrier()
-
-	However, these barrier functions are absolutely -not- guaranteed
-	to wait for a grace period.  In fact, if there are no call_rcu()
-	callbacks waiting anywhere in the system, rcu_barrier() is within
-	its rights to return immediately.
-
-	So if you need to wait for both an RCU grace period and for
-	all pre-existing call_rcu() callbacks, you will need to execute
-	both rcu_barrier() and synchronize_rcu(), if necessary, using
-	something like workqueues to execute them concurrently.
-
-	See rcubarrier.txt for more information.
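Checklist item 3d recommends grouping data that changes together into a separate structure. An update can then be made to appear atomic by switching a single pointer. The block below is a minimal userspace sketch of that idea, not kernel code. C11 release stores and acquire loads stand in for rcu_assign_pointer() and rcu_dereference(). Acquire is stronger than the ordering rcu_dereference() actually provides, so it is a conservative substitute. The names struct stats, cur_stats, stats_add(), and stats_bytes_per_pkt() are hypothetical. The sketch assumes a single updater, as in option (c) of item 1.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical pair of counters that readers must see as a consistent unit. */
struct stats {
	long pkts;
	long bytes;
};

/* RCU-protected pointer to the current snapshot. */
static _Atomic(struct stats *) cur_stats;

/*
 * Updater: copy the current snapshot, modify the private copy, then publish
 * it with a single pointer store.  Assumes a single updater (or that the
 * caller holds the update-side lock).  Returns 0 on success and stores the
 * old snapshot in *oldp.  The caller must not free *oldp until a grace
 * period has elapsed (in the kernel, e.g. via synchronize_rcu()).
 */
static int stats_add(long pkts, long bytes, struct stats **oldp)
{
	struct stats *oldsnap = atomic_load_explicit(&cur_stats, memory_order_relaxed);
	struct stats *newsnap = malloc(sizeof(*newsnap));

	if (!newsnap)
		return -1;
	newsnap->pkts = (oldsnap ? oldsnap->pkts : 0) + pkts;
	newsnap->bytes = (oldsnap ? oldsnap->bytes : 0) + bytes;

	/* Release store: stand-in for rcu_assign_pointer(). */
	atomic_store_explicit(&cur_stats, newsnap, memory_order_release);
	*oldp = oldsnap;
	return 0;
}

/*
 * Reader: one acquire load (stand-in for rcu_dereference()) yields a
 * snapshot whose two fields always belong to the same update.
 */
static long stats_bytes_per_pkt(void)
{
	struct stats *snap = atomic_load_explicit(&cur_stats, memory_order_acquire);

	return (snap && snap->pkts) ? snap->bytes / snap->pkts : 0;
}
```

Updating pkts and bytes in place, even under a lock, could let a reader observe one new field and one old field. Swapping the pointer rules that out.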
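Items 4b and 4e require that a new element be fully initialized before the pointer that makes it reachable is stored. This is the ordering that list_add_rcu() and rcu_assign_pointer() supply. The block below sketches the same discipline for a singly linked list in userspace. It is not the kernel list API. C11 release/acquire atomics stand in for rcu_assign_pointer() and rcu_dereference(). The names struct node, list_first, list_push(), and list_contains() are hypothetical. Updaters are assumed to be serialized by an update-side lock, as item 1 requires.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical list element; `next` is read by lockless readers. */
struct node {
	int key;
	_Atomic(struct node *) next;
};

/* Hypothetical RCU-protected list head. */
static _Atomic(struct node *) list_first;

/*
 * Updater (caller holds the update-side lock): push a new element at the
 * front.  All fields are initialized before the release store makes the
 * node reachable, mirroring what list_add_rcu() and rcu_assign_pointer()
 * guarantee in the kernel.  Returns 0 on success, -1 on allocation failure.
 */
static int list_push(int key)
{
	struct node *n = malloc(sizeof(*n));

	if (!n)
		return -1;
	n->key = key;
	atomic_init(&n->next,
		    atomic_load_explicit(&list_first, memory_order_relaxed));

	/* Publish: stand-in for rcu_assign_pointer(). */
	atomic_store_explicit(&list_first, n, memory_order_release);
	return 0;
}

/*
 * Reader: every pointer hop uses an acquire load, a conservative stand-in
 * for the rcu_dereference() inside list_for_each_entry_rcu().
 * Returns 1 if key is present, else 0.
 */
static int list_contains(int key)
{
	struct node *n;

	for (n = atomic_load_explicit(&list_first, memory_order_acquire); n;
	     n = atomic_load_explicit(&n->next, memory_order_acquire))
		if (n->key == key)
			return 1;
	return 0;
}
```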
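Items 6 and 14 together suggest a pattern for bulk removal:

- First, unpublish every doomed element so that no new reader can reach it.
- Then wait for one grace period that covers the whole batch.
- Only then free the elements.

The block below sketches this for a small fixed-size table of RCU-protected slots. All names in it are hypothetical: struct entry, table, NSLOTS, table_insert(), table_lookup(), and purge_expired(). C11 atomics stand in for the RCU pointer primitives. wait_for_readers() is an empty placeholder that marks where synchronize_rcu() would go in kernel code. Because this userspace sketch has no concurrent readers, the placeholder does not actually wait for anything.

```c
#include <stdatomic.h>
#include <stdlib.h>

#define NSLOTS 8	/* hypothetical table size */

/* Hypothetical table element. */
struct entry {
	int id;
	int expired;
};

/* Each slot is an RCU-protected pointer. */
static _Atomic(struct entry *) table[NSLOTS];

/*
 * Placeholder for synchronize_rcu().  This sketch has no concurrent
 * readers, so there is nothing to wait for; kernel code would block here
 * until all pre-existing readers have finished.
 */
static void wait_for_readers(void)
{
}

/* Updater (caller holds the update-side lock): publish into an empty slot. */
static int table_insert(int slot, int id, int expired)
{
	struct entry *e;

	if (slot < 0 || slot >= NSLOTS ||
	    atomic_load_explicit(&table[slot], memory_order_relaxed))
		return -1;
	e = malloc(sizeof(*e));
	if (!e)
		return -1;
	e->id = id;
	e->expired = expired;
	atomic_store_explicit(&table[slot], e, memory_order_release);
	return 0;
}

/* Reader: returns 1 if an entry with this id is currently published. */
static int table_lookup(int id)
{
	for (int i = 0; i < NSLOTS; i++) {
		struct entry *e = atomic_load_explicit(&table[i], memory_order_acquire);

		if (e && e->id == id)
			return 1;
	}
	return 0;
}

/*
 * Updater (caller holds the update-side lock): remove and free every
 * expired entry, using one grace period for the whole batch.
 * Returns the number of entries freed.
 */
static int purge_expired(void)
{
	struct entry *doomed[NSLOTS];
	int n = 0;

	/* Step 1: unpublish, so new readers can no longer find the entries. */
	for (int i = 0; i < NSLOTS; i++) {
		struct entry *e = atomic_load_explicit(&table[i], memory_order_relaxed);

		if (e && e->expired) {
			/* Storing NULL needs no ordering (cf. RCU_INIT_POINTER()). */
			atomic_store_explicit(&table[i], NULL, memory_order_relaxed);
			doomed[n++] = e;
		}
	}
	if (n == 0)
		return 0;

	/* Step 2: a single wait covers every entry removed above. */
	wait_for_readers();

	/* Step 3: no reader can still hold a reference; reclaim. */
	for (int i = 0; i < n; i++)
		free(doomed[i]);
	return n;
}
```

Calling wait_for_readers() once per entry would make the batch pay for n grace periods rather than one. Item 6 warns against exactly that pattern when it is applied to the expedited primitives.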
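Item 8a describes how to make call_rcu() users self-limiting: count the elements that are still awaiting a grace period. The block below sketches such a counter in userspace C11 rather than kernel code. MAX_PENDING_FREES, pending_frees, deferred_free_reserve(), and deferred_free_release() are hypothetical names. The updater reserves a slot before passing an object to call_rcu(), and the corresponding RCU callback releases the slot once the object has been freed. When no slot is available, the caller decides what to do, such as stalling on the update-side mutex or reporting a simulated OOM.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical cap on objects removed but not yet freed after a grace period. */
#define MAX_PENDING_FREES 1000

static atomic_long pending_frees;

/*
 * Call before handing an object to call_rcu().  Returns false when too
 * many frees are already outstanding, in which case the caller should
 * stall or fail the update instead of queueing another callback.
 */
static bool deferred_free_reserve(void)
{
	long cur = atomic_load_explicit(&pending_frees, memory_order_relaxed);

	do {
		if (cur >= MAX_PENDING_FREES)
			return false;
	} while (!atomic_compare_exchange_weak_explicit(&pending_frees, &cur,
							cur + 1,
							memory_order_relaxed,
							memory_order_relaxed));
	return true;
}

/* Call from the RCU callback after the object has actually been freed. */
static void deferred_free_release(void)
{
	atomic_fetch_sub_explicit(&pending_frees, 1, memory_order_relaxed);
}
```

Relaxed ordering suffices here because the counter only throttles updates; no data is published through it.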