rcu: Make expedited RCU CPU selection avoid unnecessary stores

This commit reworks the first loop in sync_rcu_exp_select_cpus() to avoid doing unnecessary stores to other CPUs' rcu_data structures. This speeds up that first loop by roughly a factor of two on an old x86 system. When the system is mostly idle, this loop incurs a large fraction of the overhead of synchronize_rcu_expedited(). There is less benefit on busy systems, because in that case the overhead of the smp_call_function_single() calls in the second loop dominates. However, it is not unusual to make configuration changes involving RCU grace periods (both expedited and normal) while the system is mostly idle, so this optimization is worth doing.

While we are in the area, this commit also adds parentheses to arguments used by the for_each_leaf_node_possible_cpu() macro.

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
author: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2018-02-01 05:24:15 +0100
committer: Paul E. McKenney <paulmck@linux.vnet.ibm.com> 2018-02-21 01:12:29 +0100
commit: 65963d246147c46aafda2b04523d6dbe6c457e7c
tree: 1378427fb1a730e749d6cde5c8298a49a406e5f3 /kernel/rcu/rcu.h
parent: rcu: Trace expedited GP delays due to transitioning CPUs
download: linux-65963d246147c46aafda2b04523d6dbe6c457e7c.tar.xz linux-65963d246147c46aafda2b04523d6dbe6c457e7c.zip
1 file changed, 13 insertions, 3 deletions
diff --git a/kernel/rcu/rcu.h b/kernel/rcu/rcu.h
index 507a0802c717..1c868bcfd705 100644
--- a/kernel/rcu/rcu.h
+++ b/kernel/rcu/rcu.h
@@ -301,9 +301,19 @@ static inline void rcu_init_levelspread(int *levelspread, const int *levelcnt)
  * Iterate over all possible CPUs in a leaf RCU node.
  */
 #define for_each_leaf_node_possible_cpu(rnp, cpu) \
-	for ((cpu) = cpumask_next(rnp->grplo - 1, cpu_possible_mask); \
-	     cpu <= rnp->grphi; \
-	     cpu = cpumask_next((cpu), cpu_possible_mask))
+	for ((cpu) = cpumask_next((rnp)->grplo - 1, cpu_possible_mask); \
+	     (cpu) <= rnp->grphi; \
+	     (cpu) = cpumask_next((cpu), cpu_possible_mask))
+
+/*
+ * Iterate over all CPUs in a leaf RCU node's specified mask.
+ */
+#define rcu_find_next_bit(rnp, cpu, mask) \
+	((rnp)->grplo + find_next_bit(&(mask), BITS_PER_LONG, (cpu)))
+#define for_each_leaf_node_cpu_mask(rnp, cpu, mask) \
+	for ((cpu) = rcu_find_next_bit((rnp), 0, (mask)); \
+	     (cpu) <= rnp->grphi; \
+	     (cpu) = rcu_find_next_bit((rnp), (cpu) + 1 - (rnp->grplo), (mask)))
 
 /*
  * Wrappers for the rcu_node::lock acquire and release.