From dc7a12bdfccd94c31f79e294f16f7549bd411b49 Mon Sep 17 00:00:00 2001 From: Mauro Carvalho Chehab Date: Sun, 14 Apr 2019 15:51:10 -0300 Subject: docs: arm: convert docs to ReST and rename to *.rst Converts ARM the text files to ReST, preparing them to be an architecture book. The conversion is actually: - add blank lines and identation in order to identify paragraphs; - fix tables markups; - add some lists markups; - mark literal blocks; - adjust title markups. At its new index.rst, let's add a :orphan: while this is not linked to the main index.rst file, in order to avoid build warnings. Signed-off-by: Mauro Carvalho Chehab Reviewed-by Corentin Labbe # For sun4i-ss --- Documentation/arm/vlocks.txt | 211 ------------------------------------------- 1 file changed, 211 deletions(-) delete mode 100644 Documentation/arm/vlocks.txt (limited to 'Documentation/arm/vlocks.txt') diff --git a/Documentation/arm/vlocks.txt b/Documentation/arm/vlocks.txt deleted file mode 100644 index 45731672c564..000000000000 --- a/Documentation/arm/vlocks.txt +++ /dev/null @@ -1,211 +0,0 @@ -vlocks for Bare-Metal Mutual Exclusion -====================================== - -Voting Locks, or "vlocks" provide a simple low-level mutual exclusion -mechanism, with reasonable but minimal requirements on the memory -system. - -These are intended to be used to coordinate critical activity among CPUs -which are otherwise non-coherent, in situations where the hardware -provides no other mechanism to support this and ordinary spinlocks -cannot be used. - - -vlocks make use of the atomicity provided by the memory system for -writes to a single memory location. To arbitrate, every CPU "votes for -itself", by storing a unique number to a common memory location. The -final value seen in that memory location when all the votes have been -cast identifies the winner. - -In order to make sure that the election produces an unambiguous result -in finite time, a CPU will only enter the election in the first place if -no winner has been chosen and the election does not appear to have -started yet. - - -Algorithm ---------- - -The easiest way to explain the vlocks algorithm is with some pseudo-code: - - - int currently_voting[NR_CPUS] = { 0, }; - int last_vote = -1; /* no votes yet */ - - bool vlock_trylock(int this_cpu) - { - /* signal our desire to vote */ - currently_voting[this_cpu] = 1; - if (last_vote != -1) { - /* someone already volunteered himself */ - currently_voting[this_cpu] = 0; - return false; /* not ourself */ - } - - /* let's suggest ourself */ - last_vote = this_cpu; - currently_voting[this_cpu] = 0; - - /* then wait until everyone else is done voting */ - for_each_cpu(i) { - while (currently_voting[i] != 0) - /* wait */; - } - - /* result */ - if (last_vote == this_cpu) - return true; /* we won */ - return false; - } - - bool vlock_unlock(void) - { - last_vote = -1; - } - - -The currently_voting[] array provides a way for the CPUs to determine -whether an election is in progress, and plays a role analogous to the -"entering" array in Lamport's bakery algorithm [1]. - -However, once the election has started, the underlying memory system -atomicity is used to pick the winner. This avoids the need for a static -priority rule to act as a tie-breaker, or any counters which could -overflow. - -As long as the last_vote variable is globally visible to all CPUs, it -will contain only one value that won't change once every CPU has cleared -its currently_voting flag. - - -Features and limitations ------------------------- - - * vlocks are not intended to be fair. In the contended case, it is the - _last_ CPU which attempts to get the lock which will be most likely - to win. - - vlocks are therefore best suited to situations where it is necessary - to pick a unique winner, but it does not matter which CPU actually - wins. - - * Like other similar mechanisms, vlocks will not scale well to a large - number of CPUs. - - vlocks can be cascaded in a voting hierarchy to permit better scaling - if necessary, as in the following hypothetical example for 4096 CPUs: - - /* first level: local election */ - my_town = towns[(this_cpu >> 4) & 0xf]; - I_won = vlock_trylock(my_town, this_cpu & 0xf); - if (I_won) { - /* we won the town election, let's go for the state */ - my_state = states[(this_cpu >> 8) & 0xf]; - I_won = vlock_lock(my_state, this_cpu & 0xf)); - if (I_won) { - /* and so on */ - I_won = vlock_lock(the_whole_country, this_cpu & 0xf]; - if (I_won) { - /* ... */ - } - vlock_unlock(the_whole_country); - } - vlock_unlock(my_state); - } - vlock_unlock(my_town); - - -ARM implementation ------------------- - -The current ARM implementation [2] contains some optimisations beyond -the basic algorithm: - - * By packing the members of the currently_voting array close together, - we can read the whole array in one transaction (providing the number - of CPUs potentially contending the lock is small enough). This - reduces the number of round-trips required to external memory. - - In the ARM implementation, this means that we can use a single load - and comparison: - - LDR Rt, [Rn] - CMP Rt, #0 - - ...in place of code equivalent to: - - LDRB Rt, [Rn] - CMP Rt, #0 - LDRBEQ Rt, [Rn, #1] - CMPEQ Rt, #0 - LDRBEQ Rt, [Rn, #2] - CMPEQ Rt, #0 - LDRBEQ Rt, [Rn, #3] - CMPEQ Rt, #0 - - This cuts down on the fast-path latency, as well as potentially - reducing bus contention in contended cases. - - The optimisation relies on the fact that the ARM memory system - guarantees coherency between overlapping memory accesses of - different sizes, similarly to many other architectures. Note that - we do not care which element of currently_voting appears in which - bits of Rt, so there is no need to worry about endianness in this - optimisation. - - If there are too many CPUs to read the currently_voting array in - one transaction then multiple transations are still required. The - implementation uses a simple loop of word-sized loads for this - case. The number of transactions is still fewer than would be - required if bytes were loaded individually. - - - In principle, we could aggregate further by using LDRD or LDM, but - to keep the code simple this was not attempted in the initial - implementation. - - - * vlocks are currently only used to coordinate between CPUs which are - unable to enable their caches yet. This means that the - implementation removes many of the barriers which would be required - when executing the algorithm in cached memory. - - packing of the currently_voting array does not work with cached - memory unless all CPUs contending the lock are cache-coherent, due - to cache writebacks from one CPU clobbering values written by other - CPUs. (Though if all the CPUs are cache-coherent, you should be - probably be using proper spinlocks instead anyway). - - - * The "no votes yet" value used for the last_vote variable is 0 (not - -1 as in the pseudocode). This allows statically-allocated vlocks - to be implicitly initialised to an unlocked state simply by putting - them in .bss. - - An offset is added to each CPU's ID for the purpose of setting this - variable, so that no CPU uses the value 0 for its ID. - - -Colophon --------- - -Originally created and documented by Dave Martin for Linaro Limited, for -use in ARM-based big.LITTLE platforms, with review and input gratefully -received from Nicolas Pitre and Achin Gupta. Thanks to Nicolas for -grabbing most of this text out of the relevant mail thread and writing -up the pseudocode. - -Copyright (C) 2012-2013 Linaro Limited -Distributed under the terms of Version 2 of the GNU General Public -License, as defined in linux/COPYING. - - -References ----------- - -[1] Lamport, L. "A New Solution of Dijkstra's Concurrent Programming - Problem", Communications of the ACM 17, 8 (August 1974), 453-455. - - https://en.wikipedia.org/wiki/Lamport%27s_bakery_algorithm - -[2] linux/arch/arm/common/vlock.S, www.kernel.org. -- cgit v1.2.3