summaryrefslogtreecommitdiffstats
path: root/tools/perf/bench/numa.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* perf header: Move is_cpu_online to numa benchIan Rogers2024-11-161-0/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The helper function is only used in the NUMA benchmark as typically online CPUs are determined through perf_cpu_map__new_online_cpus(). Reduce the scope of the function for now. Reviewed-by: James Clark <james.clark@linaro.org> Signed-off-by: Ian Rogers <irogers@google.com> Tested-by: Xu Yang <xu.yang_2@nxp.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix type of loop iterator in do_work, it should be 'long'Andreas Herrmann2023-04-041-1/+1
| | | | | | | | | | | 'j' is of type int and start/end are of type 'long'. Thus 'j' might become negative and cause segfault in access_data(). Fix it by using 'long' for 'j' as well. Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Andreas Herrmann <aherrmann@suse.de> Link: https://lore.kernel.org/r/20230330074202.14052-1-aherrmann@suse.de Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Make quiet mode consistent between toolsJames Clark2022-10-271-4/+5
| | | | | | | | | | | | | | | | | | | | Use the global quiet variable everywhere so that all tools hide warnings in quiet mode and update the documentation to reflect this. 'perf probe' claimed that errors are not printed in quiet mode but I don't see this so remove it from the docs. Signed-off-by: James Clark <james.clark@arm.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Update use of pthread mutex/condIan Rogers2022-10-041-59/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch to the use of mutex wrappers that provide better error checking. Signed-off-by: Ian Rogers <irogers@google.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Truong <alexandre.truong@arm.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andres Freund <andres@anarazel.de> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: André Almeida <andrealmeid@igalia.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Cc: Colin Ian King <colin.king@intel.com> Cc: Dario Petrillo <dario.pk1@gmail.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Dave Marchevsky <davemarchevsky@fb.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Fangrui Song <maskray@google.com> Cc: Hewenliang <hewenliang4@huawei.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jason Wang <wangborong@cdjrlc.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kim Phillips <kim.phillips@amd.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin Liška <mliska@suse.cz> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Pavithra Gurushankar <gpavithrasha@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Monnet <quentin@isovalent.com> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Remi Bernon <rbernon@codeweavers.com> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Tom Rix <trix@redhat.com> Cc: Weiguo Li <liwg06@foxmail.com> Cc: Wenyu Liu <liuwenyu7@huawei.com> Cc: William Cohen <wcohen@redhat.com> Cc: Zechuan Chen <chenzechuan1@huawei.com> Cc: bpf@vger.kernel.org Cc: llvm@lists.linux.dev Cc: yaowenbin <yaowenbin1@huawei.com> Link: https://lore.kernel.org/r/20220826164242.43412-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Address compiler error on s390Thomas Richter2022-05-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The compilation on s390 results in this error: # make DEBUG=y bench/numa.o ... bench/numa.c: In function ‘__bench_numa’: bench/numa.c:1749:81: error: ‘%d’ directive output may be truncated writing between 1 and 11 bytes into a region of size between 10 and 20 [-Werror=format-truncation=] 1749 | snprintf(tname, sizeof(tname), "process%d:thread%d", p, t); ^~ ... bench/numa.c:1749:64: note: directive argument in the range [-2147483647, 2147483646] ... # The maximum length of the %d replacement is 11 characters because of the negative sign. Therefore extend the array by two more characters. Output after: # make DEBUG=y bench/numa.o > /dev/null 2>&1; ll bench/numa.o -rw-r--r-- 1 root root 418320 May 19 09:11 bench/numa.o # Fixes: 3aff8ba0a4c9c919 ("perf bench numa: Avoid possible truncation when using snprintf()") Suggested-by: Namhyung Kim <namhyung@gmail.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20220520081158.2990006-1-tmricht@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix two numa NDEBUG warningsIan Rogers2022-05-091-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | BUG_ON is a no-op if NDEBUG is defined, otherwise it is an assert. Compiling with NDEBUG yields: bench/numa.c: In function ‘bind_to_cpu’: bench/numa.c:314:1: error: control reaches end of non-void function [-Werror=return-type] 314 | } | ^ bench/numa.c: In function ‘bind_to_node’: bench/numa.c:367:1: error: control reaches end of non-void function [-Werror=return-type] 367 | } | ^ Add return statements to cover this case. Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20220428202912.1056444-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix numa bench to fix usage of affinity for machines with #CPUs > 1KAthira Rajeev2022-04-141-33/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'perf bench numa' testcase fails on systems with more than 1K CPUs. Testcase: perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1 Snippet of code: <<>> perf: bench/numa.c:302: bind_to_node: Assertion `!(ret)' failed. Aborted (core dumped) <<>> bind_to_node() uses "sched_getaffinity" to save the original cpumask and this call is returning EINVAL ((invalid argument). This happens because the default mask size in glibc is 1024. To overcome this 1024 CPUs mask size limitation of cpu_set_t, change the mask size using the CPU_*_S macros ie, use CPU_ALLOC to allocate cpumask, CPU_ALLOC_SIZE for size. Apart from fixing this for "orig_mask", apply same logic to "mask" as well which is used to setaffinity so that mask size is large enough to represent number of possible CPU's in the system. sched_getaffinity is used in one more place in perf numa bench. It is in "bind_to_cpu" function. Apply the same logic there also. Though currently no failure is reported from there, it is ideal to change getaffinity to work with such system configurations having CPU's more than default mask size supported by glibc. Also fix "sched_setaffinity" to use mask size which is large enough to represent number of possible CPU's in the system. Fixed all places where "bind_cpumask" which is part of "struct thread_data" is used such that bind_cpumask works in all configuration. Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220412164059.42654-3-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix numa testcase to check if CPU used to bind task is onlineAthira Rajeev2022-04-141-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Perf numa bench test fails with error: Testcase: ./perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk Failure snippet: <<>> Running 'numa/mem' benchmark: # Running main, "perf bench numa numa-mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" perf: bench/numa.c:333: bind_to_cpumask: Assertion `!(ret)' failed. <<>> The Testcases uses CPU's 0 and 8. In function "parse_setup_cpu_list", There is check to see if cpu number is greater than max cpu's possible in the system ie via "if (bind_cpu_0 >= g->p.nr_cpus || bind_cpu_1 >= g->p.nr_cpus) {". But it could happen that system has say 48 CPU's, but only number of online CPU's is 0-7. Other CPU's are offlined. Since "g->p.nr_cpus" is 48, so function will go ahead and set bit for CPU 8 also in cpumask ( td->bind_cpumask). bind_to_cpumask function is called to set affinity using sched_setaffinity and the cpumask. Since the CPU8 is not present, set affinity will fail here with EINVAL. Fix this issue by adding a check to make sure that, CPU's provided in the input argument values are online before proceeding further and skip the test. For this, include new helper function "is_cpu_online" in "tools/perf/util/header.c". Since "BIT(x)" definition will get included from header.h, remove that from bench/numa.c Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com> Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Tested-by: Disha Goel <disgoel@linux.vnet.ibm.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nageswara R Sastry <rnsastry@linux.ibm.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: https://lore.kernel.org/r/20220412164059.42654-2-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix various typos in commentsIngo Molnar2021-03-231-1/+1
| | | | | | | | | | | Fix ~124 single-word typos and a few spelling errors in the perf tooling code, accumulated over the years. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix the condition checks for max number of NUMA nodesAthira Rajeev2021-03-061-13/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In systems having higher node numbers available like node 255, perf numa bench will fail with SIGABORT. <<>> perf: bench/numa.c:1416: init: Assertion `!(g->p.nr_nodes > 64 || g->p.nr_nodes < 0)' failed. Aborted (core dumped) <<>> Snippet from 'numactl -H' below on a powerpc system where the highest node number available is 255: available: 6 nodes (0,8,252-255) node 0 cpus: <cpu-list> node 0 size: 519587 MB node 0 free: 516659 MB node 8 cpus: <cpu-list> node 8 size: 523607 MB node 8 free: 486757 MB node 252 cpus: node 252 size: 0 MB node 252 free: 0 MB node 253 cpus: node 253 size: 0 MB node 253 free: 0 MB node 254 cpus: node 254 size: 0 MB node 254 free: 0 MB node 255 cpus: node 255 size: 0 MB node 255 free: 0 MB node distances: node 0 8 252 253 254 255 Note: <cpu-list> expands to actual cpu list in the original output. These nodes 252-255 are to represent the memory on GPUs and are valid nodes. The perf numa bench init code has a condition check to see if the number of NUMA nodes (nr_nodes) exceeds MAX_NR_NODES. The value of MAX_NR_NODES defined in perf code is 64. And the 'nr_nodes' is the value from numa_max_node() which represents the highest node number available in the system. In some systems where we could have NUMA node 255, this condition check fails and results in SIGABORT. The numa benchmark uses static value of MAX_NR_NODES in the code to represent size of two NUMA node arrays and node bitmask used for setting memory policy. Patch adds a fix to dynamically allocate size for the two arrays and bitmask value based on the node numbers available in the system. With the fix, perf numa benchmark will work with node configuration on any system and thus removes the static MAX_NR_NODES value. Signed-off-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lore.kernel.org/lkml/1614271802-1503-1-git-send-email-atrajeev@linux.vnet.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Use condition variables in numa.Ian Rogers2020-10-141-21/+46
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing approach to synchronization between threads in the numa benchmark is unbalanced mutexes. This synchronization causes thread sanitizer to warn of locks being taken twice on a thread without an unlock, as well as unlocks with no corresponding locks. This change replaces the synchronization with more regular condition variables. While this fixes one class of thread sanitizer warnings, there still remain warnings of data races due to threads reading and writing shared memory without any atomics. Committer testing: Basic run on a non-NUMA machine. # perf bench numa # List of available benchmarks for collection 'numa': mem: Benchmark for NUMA workloads all: Run all NUMA benchmarks # perf bench numa all # Running numa/mem benchmark... # Running main, "perf bench numa numa-mem" # # Running test on: Linux five 5.8.12-200.fc32.x86_64 #1 SMP Mon Sep 28 12:17:31 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux # # Running RAM-bw-local, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk" 20.076 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.073 secs average thread-runtime 0.190 % difference between max/avg runtime 241.828 GB data processed, per thread 241.828 GB data processed, total 0.083 nsecs/byte/thread runtime 12.045 GB/sec/thread speed 12.045 GB/sec total speed # Running RAM-bw-local-NOTHP, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 0 -s 20 -zZq --thp 1 --no-data_rand_walk --thp -1" 20.045 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.014 secs average thread-runtime 0.111 % difference between max/avg runtime 234.304 GB data processed, per thread 234.304 GB data processed, total 0.086 nsecs/byte/thread runtime 11.689 GB/sec/thread speed 11.689 GB/sec total speed # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running RAM-bw-local-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 0x2 -s 20 -zZq --thp 1 --no-data_rand_walk" 20.138 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.121 secs average thread-runtime 0.342 % difference between max/avg runtime 135.961 GB data processed, per thread 271.922 GB data processed, total 0.148 nsecs/byte/thread runtime 6.752 GB/sec/thread speed 13.503 GB/sec total speed # Running RAM-bw-remote-2x, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,2 -M 1x2 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running RAM-bw-cross, "perf bench numa mem -p 2 -t 1 -P 1024 -C 0,8 -M 1,0 -s 20 -zZq --thp 1 --no-data_rand_walk" Test not applicable, system has only 1 nodes. # Running 1x3-convergence, "perf bench numa mem -p 1 -t 3 -P 512 -s 100 -zZ0qcm --thp 1" 0.747 secs latency to NUMA-converge 0.747 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.714 secs average thread-runtime 50.000 % difference between max/avg runtime 3.228 GB data processed, per thread 9.683 GB data processed, total 0.231 nsecs/byte/thread runtime 4.321 GB/sec/thread speed 12.964 GB/sec total speed # Running 1x4-convergence, "perf bench numa mem -p 1 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1.127 secs latency to NUMA-converge 1.127 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.089 secs average thread-runtime 5.624 % difference between max/avg runtime 3.765 GB data processed, per thread 15.062 GB data processed, total 0.299 nsecs/byte/thread runtime 3.342 GB/sec/thread speed 13.368 GB/sec total speed # Running 1x6-convergence, "perf bench numa mem -p 1 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 1.003 secs latency to NUMA-converge 1.003 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.889 secs average thread-runtime 50.000 % difference between max/avg runtime 2.141 GB data processed, per thread 12.847 GB data processed, total 0.469 nsecs/byte/thread runtime 2.134 GB/sec/thread speed 12.805 GB/sec total speed # Running 2x3-convergence, "perf bench numa mem -p 2 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 1.814 secs latency to NUMA-converge 1.814 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.716 secs average thread-runtime 22.440 % difference between max/avg runtime 3.747 GB data processed, per thread 22.483 GB data processed, total 0.484 nsecs/byte/thread runtime 2.065 GB/sec/thread speed 12.393 GB/sec total speed # Running 3x3-convergence, "perf bench numa mem -p 3 -t 3 -P 1020 -s 100 -zZ0qcm --thp 1" 2.065 secs latency to NUMA-converge 2.065 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.947 secs average thread-runtime 25.788 % difference between max/avg runtime 2.855 GB data processed, per thread 25.694 GB data processed, total 0.723 nsecs/byte/thread runtime 1.382 GB/sec/thread speed 12.442 GB/sec total speed # Running 4x4-convergence, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 1.912 secs latency to NUMA-converge 1.912 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.775 secs average thread-runtime 23.852 % difference between max/avg runtime 1.479 GB data processed, per thread 23.668 GB data processed, total 1.293 nsecs/byte/thread runtime 0.774 GB/sec/thread speed 12.378 GB/sec total speed # Running 4x4-convergence-NOTHP, "perf bench numa mem -p 4 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 1.783 secs latency to NUMA-converge 1.783 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.633 secs average thread-runtime 21.960 % difference between max/avg runtime 1.345 GB data processed, per thread 21.517 GB data processed, total 1.326 nsecs/byte/thread runtime 0.754 GB/sec/thread speed 12.067 GB/sec total speed # Running 4x6-convergence, "perf bench numa mem -p 4 -t 6 -P 1020 -s 100 -zZ0qcm --thp 1" 5.396 secs latency to NUMA-converge 5.396 secs slowest (max) thread-runtime 4.000 secs fastest (min) thread-runtime 4.928 secs average thread-runtime 12.937 % difference between max/avg runtime 2.721 GB data processed, per thread 65.306 GB data processed, total 1.983 nsecs/byte/thread runtime 0.504 GB/sec/thread speed 12.102 GB/sec total speed # Running 4x8-convergence, "perf bench numa mem -p 4 -t 8 -P 512 -s 100 -zZ0qcm --thp 1" 3.121 secs latency to NUMA-converge 3.121 secs slowest (max) thread-runtime 2.000 secs fastest (min) thread-runtime 2.836 secs average thread-runtime 17.962 % difference between max/avg runtime 1.194 GB data processed, per thread 38.192 GB data processed, total 2.615 nsecs/byte/thread runtime 0.382 GB/sec/thread speed 12.236 GB/sec total speed # Running 8x4-convergence, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1" 4.302 secs latency to NUMA-converge 4.302 secs slowest (max) thread-runtime 3.000 secs fastest (min) thread-runtime 4.045 secs average thread-runtime 15.133 % difference between max/avg runtime 1.631 GB data processed, per thread 52.178 GB data processed, total 2.638 nsecs/byte/thread runtime 0.379 GB/sec/thread speed 12.128 GB/sec total speed # Running 8x4-convergence-NOTHP, "perf bench numa mem -p 8 -t 4 -P 512 -s 100 -zZ0qcm --thp 1 --thp -1" 4.418 secs latency to NUMA-converge 4.418 secs slowest (max) thread-runtime 3.000 secs fastest (min) thread-runtime 4.104 secs average thread-runtime 16.045 % difference between max/avg runtime 1.664 GB data processed, per thread 53.254 GB data processed, total 2.655 nsecs/byte/thread runtime 0.377 GB/sec/thread speed 12.055 GB/sec total speed # Running 3x1-convergence, "perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.973 secs latency to NUMA-converge 0.973 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.955 secs average thread-runtime 50.000 % difference between max/avg runtime 4.124 GB data processed, per thread 12.372 GB data processed, total 0.236 nsecs/byte/thread runtime 4.238 GB/sec/thread speed 12.715 GB/sec total speed # Running 4x1-convergence, "perf bench numa mem -p 4 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.820 secs latency to NUMA-converge 0.820 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.808 secs average thread-runtime 50.000 % difference between max/avg runtime 2.555 GB data processed, per thread 10.220 GB data processed, total 0.321 nsecs/byte/thread runtime 3.117 GB/sec/thread speed 12.468 GB/sec total speed # Running 8x1-convergence, "perf bench numa mem -p 8 -t 1 -P 512 -s 100 -zZ0qcm --thp 1" 0.667 secs latency to NUMA-converge 0.667 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.607 secs average thread-runtime 50.000 % difference between max/avg runtime 1.009 GB data processed, per thread 8.069 GB data processed, total 0.661 nsecs/byte/thread runtime 1.512 GB/sec/thread speed 12.095 GB/sec total speed # Running 16x1-convergence, "perf bench numa mem -p 16 -t 1 -P 256 -s 100 -zZ0qcm --thp 1" 1.546 secs latency to NUMA-converge 1.546 secs slowest (max) thread-runtime 1.000 secs fastest (min) thread-runtime 1.485 secs average thread-runtime 17.664 % difference between max/avg runtime 1.162 GB data processed, per thread 18.594 GB data processed, total 1.331 nsecs/byte/thread runtime 0.752 GB/sec/thread speed 12.025 GB/sec total speed # Running 32x1-convergence, "perf bench numa mem -p 32 -t 1 -P 128 -s 100 -zZ0qcm --thp 1" 0.812 secs latency to NUMA-converge 0.812 secs slowest (max) thread-runtime 0.000 secs fastest (min) thread-runtime 0.739 secs average thread-runtime 50.000 % difference between max/avg runtime 0.309 GB data processed, per thread 9.874 GB data processed, total 2.630 nsecs/byte/thread runtime 0.380 GB/sec/thread speed 12.166 GB/sec total speed # Running 2x1-bw-process, "perf bench numa mem -p 2 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.044 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.020 secs average thread-runtime 0.109 % difference between max/avg runtime 125.750 GB data processed, per thread 251.501 GB data processed, total 0.159 nsecs/byte/thread runtime 6.274 GB/sec/thread speed 12.548 GB/sec total speed # Running 3x1-bw-process, "perf bench numa mem -p 3 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.148 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.090 secs average thread-runtime 0.367 % difference between max/avg runtime 85.267 GB data processed, per thread 255.800 GB data processed, total 0.236 nsecs/byte/thread runtime 4.232 GB/sec/thread speed 12.696 GB/sec total speed # Running 4x1-bw-process, "perf bench numa mem -p 4 -t 1 -P 1024 -s 20 -zZ0q --thp 1" 20.169 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.100 secs average thread-runtime 0.419 % difference between max/avg runtime 63.144 GB data processed, per thread 252.576 GB data processed, total 0.319 nsecs/byte/thread runtime 3.131 GB/sec/thread speed 12.523 GB/sec total speed # Running 8x1-bw-process, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1" 20.175 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.107 secs average thread-runtime 0.433 % difference between max/avg runtime 31.267 GB data processed, per thread 250.133 GB data processed, total 0.645 nsecs/byte/thread runtime 1.550 GB/sec/thread speed 12.398 GB/sec total speed # Running 8x1-bw-process-NOTHP, "perf bench numa mem -p 8 -t 1 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 20.216 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.113 secs average thread-runtime 0.535 % difference between max/avg runtime 30.998 GB data processed, per thread 247.981 GB data processed, total 0.652 nsecs/byte/thread runtime 1.533 GB/sec/thread speed 12.266 GB/sec total speed # Running 16x1-bw-process, "perf bench numa mem -p 16 -t 1 -P 256 -s 20 -zZ0q --thp 1" 20.234 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.174 secs average thread-runtime 0.577 % difference between max/avg runtime 15.377 GB data processed, per thread 246.039 GB data processed, total 1.316 nsecs/byte/thread runtime 0.760 GB/sec/thread speed 12.160 GB/sec total speed # Running 1x4-bw-thread, "perf bench numa mem -p 1 -t 4 -T 256 -s 20 -zZ0q --thp 1" 20.040 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.028 secs average thread-runtime 0.099 % difference between max/avg runtime 66.832 GB data processed, per thread 267.328 GB data processed, total 0.300 nsecs/byte/thread runtime 3.335 GB/sec/thread speed 13.340 GB/sec total speed # Running 1x8-bw-thread, "perf bench numa mem -p 1 -t 8 -T 256 -s 20 -zZ0q --thp 1" 20.064 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.034 secs average thread-runtime 0.160 % difference between max/avg runtime 32.911 GB data processed, per thread 263.286 GB data processed, total 0.610 nsecs/byte/thread runtime 1.640 GB/sec/thread speed 13.122 GB/sec total speed # Running 1x16-bw-thread, "perf bench numa mem -p 1 -t 16 -T 128 -s 20 -zZ0q --thp 1" 20.092 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.052 secs average thread-runtime 0.230 % difference between max/avg runtime 16.131 GB data processed, per thread 258.088 GB data processed, total 1.246 nsecs/byte/thread runtime 0.803 GB/sec/thread speed 12.845 GB/sec total speed # Running 1x32-bw-thread, "perf bench numa mem -p 1 -t 32 -T 64 -s 20 -zZ0q --thp 1" 20.099 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.063 secs average thread-runtime 0.247 % difference between max/avg runtime 7.962 GB data processed, per thread 254.773 GB data processed, total 2.525 nsecs/byte/thread runtime 0.396 GB/sec/thread speed 12.676 GB/sec total speed # Running 2x3-bw-process, "perf bench numa mem -p 2 -t 3 -P 512 -s 20 -zZ0q --thp 1" 20.150 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.120 secs average thread-runtime 0.372 % difference between max/avg runtime 44.827 GB data processed, per thread 268.960 GB data processed, total 0.450 nsecs/byte/thread runtime 2.225 GB/sec/thread speed 13.348 GB/sec total speed # Running 4x4-bw-process, "perf bench numa mem -p 4 -t 4 -P 512 -s 20 -zZ0q --thp 1" 20.258 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.168 secs average thread-runtime 0.636 % difference between max/avg runtime 17.079 GB data processed, per thread 273.263 GB data processed, total 1.186 nsecs/byte/thread runtime 0.843 GB/sec/thread speed 13.489 GB/sec total speed # Running 4x6-bw-process, "perf bench numa mem -p 4 -t 6 -P 512 -s 20 -zZ0q --thp 1" 20.559 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.382 secs average thread-runtime 1.359 % difference between max/avg runtime 10.758 GB data processed, per thread 258.201 GB data processed, total 1.911 nsecs/byte/thread runtime 0.523 GB/sec/thread speed 12.559 GB/sec total speed # Running 4x8-bw-process, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1" 20.744 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.516 secs average thread-runtime 1.792 % difference between max/avg runtime 8.069 GB data processed, per thread 258.201 GB data processed, total 2.571 nsecs/byte/thread runtime 0.389 GB/sec/thread speed 12.447 GB/sec total speed # Running 4x8-bw-process-NOTHP, "perf bench numa mem -p 4 -t 8 -P 512 -s 20 -zZ0q --thp 1 --thp -1" 20.855 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.561 secs average thread-runtime 2.050 % difference between max/avg runtime 8.069 GB data processed, per thread 258.201 GB data processed, total 2.585 nsecs/byte/thread runtime 0.387 GB/sec/thread speed 12.381 GB/sec total speed # Running 3x3-bw-process, "perf bench numa mem -p 3 -t 3 -P 512 -s 20 -zZ0q --thp 1" 20.134 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.077 secs average thread-runtime 0.333 % difference between max/avg runtime 28.091 GB data processed, per thread 252.822 GB data processed, total 0.717 nsecs/byte/thread runtime 1.395 GB/sec/thread speed 12.557 GB/sec total speed # Running 5x5-bw-process, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 20.588 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.375 secs average thread-runtime 1.427 % difference between max/avg runtime 10.177 GB data processed, per thread 254.436 GB data processed, total 2.023 nsecs/byte/thread runtime 0.494 GB/sec/thread speed 12.359 GB/sec total speed # Running 2x16-bw-process, "perf bench numa mem -p 2 -t 16 -P 512 -s 20 -zZ0q --thp 1" 20.657 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.429 secs average thread-runtime 1.589 % difference between max/avg runtime 8.170 GB data processed, per thread 261.429 GB data processed, total 2.528 nsecs/byte/thread runtime 0.395 GB/sec/thread speed 12.656 GB/sec total speed # Running 1x32-bw-process, "perf bench numa mem -p 1 -t 32 -P 2048 -s 20 -zZ0q --thp 1" 22.981 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 21.996 secs average thread-runtime 6.486 % difference between max/avg runtime 8.863 GB data processed, per thread 283.606 GB data processed, total 2.593 nsecs/byte/thread runtime 0.386 GB/sec/thread speed 12.341 GB/sec total speed # Running numa02-bw, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1" 20.047 secs slowest (max) thread-runtime 19.000 secs fastest (min) thread-runtime 20.026 secs average thread-runtime 2.611 % difference between max/avg runtime 8.441 GB data processed, per thread 270.111 GB data processed, total 2.375 nsecs/byte/thread runtime 0.421 GB/sec/thread speed 13.474 GB/sec total speed # Running numa02-bw-NOTHP, "perf bench numa mem -p 1 -t 32 -T 32 -s 20 -zZ0q --thp 1 --thp -1" 20.088 secs slowest (max) thread-runtime 19.000 secs fastest (min) thread-runtime 20.025 secs average thread-runtime 2.709 % difference between max/avg runtime 8.411 GB data processed, per thread 269.142 GB data processed, total 2.388 nsecs/byte/thread runtime 0.419 GB/sec/thread speed 13.398 GB/sec total speed # Running numa01-bw-thread, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1" 20.293 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.175 secs average thread-runtime 0.721 % difference between max/avg runtime 7.918 GB data processed, per thread 253.374 GB data processed, total 2.563 nsecs/byte/thread runtime 0.390 GB/sec/thread speed 12.486 GB/sec total speed # Running numa01-bw-thread-NOTHP, "perf bench numa mem -p 2 -t 16 -T 192 -s 20 -zZ0q --thp 1 --thp -1" 20.411 secs slowest (max) thread-runtime 20.000 secs fastest (min) thread-runtime 20.226 secs average thread-runtime 1.006 % difference between max/avg runtime 7.931 GB data processed, per thread 253.778 GB data processed, total 2.574 nsecs/byte/thread runtime 0.389 GB/sec/thread speed 12.434 GB/sec total speed # Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20201012161611.366482-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Remove dead code in parse_nodes_opt()Peng Fan2020-08-141-2/+0
| | | | | | | | | | | | | In the function parse_nodes_opt(), the statement "return 0;" is dead code, remove it. Signed-off-by: Peng Fan <fanpeng@loongson.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/1597401894-27549-1-git-send-email-fanpeng@loongson.cn Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Use numa_node_to_cpus() to bind tasks to nodesAlexander Gordeev2020-08-131-17/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | It is currently assumed that each node contains at most nr_cpus/nr_nodes CPUs and nodes' CPU ranges do not overlap. That assumption is generally incorrect as there are archs where a CPU number does not depend on to its node number. This update removes the described assumption by simply calling numa_node_to_cpus() interface and using the returned mask for binding CPUs to nodes. Also, variable types and names made consistent in functions using cpumask. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200813113247.GA2014@oc3871087118.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix cpumask memory leak in node_has_cpus()Alexander Gordeev2020-08-131-4/+9
| | | | | | | | | | | | | | | | | Couple numa_allocate_cpumask() and numa_free_cpumask() functions Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Balamuruhan S <bala24@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lore.kernel.org/lkml/20200813113041.GA1685@oc3871087118.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix benchmark namesAlexander Gordeev2020-08-121-18/+17
| | | | | | | | | | | | | | | | | | | Standard benchmark names let users know the tests specifics. For example "2x1-bw-process" name tells that two processes one thread each are run and the RAM bandwidth is measured. Several benchmarks names do not correspond to their actual running configuration. Fix that and also some whitespace and comment inconsistencies. Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/6b6f2084f132ee8e9203dc7c32f9deb209b87a68.1597004831.git.agordeev@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix number of processes in "2x3-convergence" testAlexander Gordeev2020-08-121-1/+1
| | | | | | | | | | | Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/d949f5f48e17fc816f3beecf8479f1b2480345e4.1597004831.git.agordeev@linux.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove needless builtin.h include directivesArnaldo Carvalho de Melo2019-09-201-1/+0
| | | | | | | | | | | | Now that builtin.h isn't included by any other header, we can check where it is really needed, i.e. we can remove it and be sure that it isn't being obtained indirectly. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-mn7jheex85iw9qo6tlv26hb2@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove perf.h from source files not needing itArnaldo Carvalho de Melo2019-08-291-1/+0
| | | | | | | | | | | With the movement of lots of stuff out of perf.h to other headers we ended up not needing it in lots of places, remove it from those places. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-c718m0sxxwp73lp9d8vpihb4@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix cpu0 bindingJiri Olsa2019-08-011-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Michael reported an issue with perf bench numa failing with binding to cpu0 with '-0' option. # perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd # Running 'numa/mem' benchmark: # Running main, "perf bench numa numa-mem -p 3 -t 1 -P 512 -s 100 -zZcm0 --thp 1 -M 1 -ddd" binding to node 0, mask: 0000000000000001 => -1 perf: bench/numa.c:356: bind_to_memnode: Assertion `!(ret)' failed. Aborted (core dumped) This happens when the cpu0 is not part of node0, which is the benchmark assumption and we can see that's not the case for some powerpc servers. Using correct node for cpu0 binding. Reported-by: Michael Petlan <mpetlan@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20190801142642.28004-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools lib: Adopt zalloc()/zfree() from tools/perfArnaldo Carvalho de Melo2019-07-091-1/+1
| | | | | | | | | | Eroding a bit more the tools/perf/util/util.h hodpodge header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lkml.kernel.org/n/tip-natazosyn9rwjka25tvcnyi0@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Add define for RUSAGE_THREAD if not presentArnaldo Carvalho de Melo2019-05-021-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While cross building perf to the ARC architecture on a fedora 30 host, we were failing with: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function ‘worker_thread’: bench/numa.c:1261:12: error: ‘RUSAGE_THREAD’ undeclared (first use in this function); did you mean ‘SIGEV_THREAD’? getrusage(RUSAGE_THREAD, &rusage); ^~~~~~~~~~~~~ SIGEV_THREAD bench/numa.c:1261:12: note: each undeclared identifier is reported only once for each function it appears in [perfbuilder@60d5802468f6 perf]$ /arc_gnu_2019.03-rc1_prebuilt_uclibc_le_archs_linux_install/bin/arc-linux-gcc --version | head -1 arc-linux-gcc (ARCv2 ISA Linux uClibc toolchain 2019.03-rc1) 8.3.1 20190225 [perfbuilder@60d5802468f6 perf]$ Trying to reproduce a report by Vineet, I noticed that, with just cross-built zlib and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define, check for that and numactl libraries, I ended up with the above failure. So, since RUSAGE_THREAD is available as a define in the system headers, check if it is defined in the 'perf bench numa' sources and define it if not. Now it builds and I have to figure out if the problem reported by Vineet only takes place if we have libelf or some other library available. Cc: Arnd Bergmann <arnd@arndb.de> Cc: Jiri Olsa <jolsa@kernel.org> Cc: linux-snps-arc@lists.infradead.org Cc: Namhyung Kim <namhyung@kernel.org> Cc: Vineet Gupta <Vineet.Gupta1@synopsys.com> Link: https://lkml.kernel.org/n/tip-2wb4r1gir9xrevbpq7qp0amk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* tools/: replace open encodings for NUMA_NO_NODEStephen Rothwell2019-03-061-3/+4
| | | | | | | | | | | | | | | | | | | | | | This replaces all open encodings in tools with NUMA_NO_NODE. Also linux/numa.h is now needed for the perf build. [sfr@canb.auug.org.au: fix for replace open encodings for NUMA_NO_NODE] Link: http://lkml.kernel.org/r/20190108131141.730e9c4f@canb.auug.org.au Link: http://lkml.kernel.org/r/1545127933-10711-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: David Hildenbrand <david@redhat.com> Cc: Doug Ledford <dledford@redhat.com> [drivers/infiniband] Cc: Hans Verkuil <hverkuil@xs4all.nl> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> [ixgbe] Cc: Jens Axboe <axboe@kernel.dk> [mtip32xx] Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Michael Ellerman <mpe@ellerman.id.au> [powerpc] Cc: Vinod Koul <vkoul@kernel.org> [dmaengine.c] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* perf bench: Fix numa report output codeJiri Olsa2018-06-251-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we can hit following assert when running numa bench: $ perf bench numa mem -p 3 -t 1 -P 512 -s 100 -zZ0cm --thp 1 perf: bench/numa.c:1577: __bench_numa: Assertion `!(!(((wait_stat) & 0x7f) == 0))' failed. The assertion is correct, because we hit the SIGFPE in following line: Thread 2.2 "thread 0/0" received signal SIGFPE, Arithmetic exception. [Switching to Thread 0x7fffd28c6700 (LWP 11750)] 0x000.. in worker_thread (__tdata=0x7.. ) at bench/numa.c:1257 1257 td->speed_gbs = bytes_done / (td->runtime_ns / NSEC_PER_SEC) / 1e9; We don't check if the runtime is actually bigger than 1 second, and thus this might end up with zero division within FPU. Adding the check to prevent this. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20180620094036.17278-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix typo in optionsYisheng Xie2018-05-071-1/+1
| | | | | | | | | | 'R' means access the data via reads instead of writes, fix this typo. Signed-off-by: Yisheng Xie <xieyisheng1@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1524644707-11030-1-git-send-email-xieyisheng1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fixup discontiguous/sparse numa nodesSatheesh Rajendran2017-11-281-5/+51
| | | | | | | | | | | | | | Certain systems are designed to have sparse/discontiguous nodes. On such systems, 'perf bench numa' hangs, shows wrong number of nodes and shows values for non-existent nodes. Handle this by only taking nodes that are exposed by kernel to userspace. Signed-off-by: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com> Reviewed-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/1edbcd353c009e109e93d78f2f46381930c340fe.1511368645.git.sathnaga@linux.vnet.ibm.com Signed-off-by: Balamuruhan S <bala24@linux.vnet.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* perf tools: Use __maybe_unused consistentlyArnaldo Carvalho de Melo2017-06-191-1/+1
| | | | | | | | | | | | Instead of defining __unused or redefining __maybe_unused. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-4eleto5pih31jw1q4dypm9pf@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Including missing inttypes.h headerArnaldo Carvalho de Melo2017-04-191-0/+1
| | | | | | | | | | | | Needed to use the PRI[xu](32,64) formatting macros. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-wkbho8kaw24q67dd11q0j39f@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Add include <linux/kernel.h> where ARRAY_SIZE() is usedArnaldo Carvalho de Melo2017-04-191-0/+1
| | | | | | | | | | | | | To pave the way for further cleanups where linux/kernel.h may stop being included in some header. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-qqxan6tfsl6qx3l0v3nwgjvk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Remove unused 'prefix' from builtin functionsArnaldo Carvalho de Melo2017-03-271-1/+1
| | | | | | | | | | | | | | | | | | | | | We got it from the git sources but never used it for anything, with the place where this would be somehow used remaining: static int run_builtin(struct cmd_struct *p, int argc, const char **argv) { prefix = NULL; if (p->option & RUN_SETUP) prefix = NULL; /* setup_perf_directory(); */ Ditch it. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-uw5swz05vol0qpr32c5lpvus@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Add more comment for -c optionJiri Olsa2017-03-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | Adding more commentary for -c/--show_convergence option, to explain how the convergence is defined. Before: -c, --show_convergence show convergence details Now: -c, --show_convergence convergence is reached when each process \ (all its threads) is running on a single NUMA node. Suggested--by: Jiri Hladky <jhladky@redhat.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Hladky <jhladky@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1488732011-27384-1-git-send-email-jolsa@kernel.org [ Rephrased a bit based on a IRC conversation with Jiri ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Make sure dprintf() is not definedArnaldo Carvalho de Melo2017-02-141-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | When building with clang we get this error: bench/numa.c:46:9: error: 'dprintf' macro redefined [-Werror,-Wmacro-redefined] #define dprintf(x...) do { if (g && g->p.show_details >= 1) printf(x); } while (0) ^ /usr/include/bits/stdio2.h:145:12: note: previous definition is here # define dprintf(fd, ...) \ ^ CC /tmp/build/perf/tests/parse-no-sample-id-all.o 1 error generated. So, make sure it is undefined before using that name. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jakub Jelen <jjelen@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-f654o2svtrutamvxt7igwz74@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Avoid possible truncation when using snprintf()Arnaldo Carvalho de Melo2017-02-091-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Addressing this warning from gcc 7: CC /tmp/build/perf/bench/numa.o bench/numa.c: In function '__bench_numa': bench/numa.c:1582:42: error: '%d' directive output may be truncated writing between 1 and 10 bytes into a region of size between 8 and 17 [-Werror=format-truncation=] snprintf(tname, 32, "process%d:thread%d", p, t); ^~ bench/numa.c:1582:25: note: directive argument in the range [0, 2147483647] snprintf(tname, 32, "process%d:thread%d", p, t); ^~~~~~~~~~~~~~~~~~~~ In file included from /usr/include/stdio.h:939:0, from bench/../util/util.h:47, from bench/../builtin.h:4, from bench/numa.c:11: /usr/include/bits/stdio2.h:64:10: note: '__builtin___snprintf_chk' output between 17 and 35 bytes into a destination of size 32 return __builtin___snprintf_chk (__s, __n, __USE_FORTIFY_LEVEL - 1, ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ __bos (__s), __fmt, __va_arg_pack ()); ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ cc1: all warnings being treated as errors Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Petr Holasek <pholasek@redhat.com> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-twa37vsfqcie5gwpqwnjuuz9@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Use NSEC_PER_U?SECArnaldo Carvalho de Melo2016-08-231-26/+27
| | | | | | | | | | | | | Following kernel practices, using linux/time64.h Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-7vnv15263y50qku76p4w5xk6@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Add missing pthread.h include for CPU_*() macrosArnaldo Carvalho de Melo2016-07-121-1/+3
| | | | | | | | | | | Cc: David Ahern <dsahern@gmail.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Wang Nan <wangnan0@huawei.com> Link: http://lkml.kernel.org/n/tip-48qbfv7tqs8n8ey74lbyfjtq@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Fix assertion for nodes bitfieldJakub Jelen2016-03-211-1/+1
| | | | | | | | | | | | | | | | | | | | | Comparing bits and bytes in numa benchmark assertion I hit the issue on two socket Power8 machine presenting its numa nodes as 0,1,16,17 (according to numactl). Therefore I got error (and hang of parent process): perf: bench/numa.c:296: bind_to_memnode: Assertion `!(g->p.nr_nodes > (int)sizeof(nodemask))' failed. This is obviously false positive. We can fit all the 18 nodes into bitfield of 8 bytes (long on 64b architecture). Signed-off-by: Jakub Jelen <jakuje@gmail.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jakub Jelen <jjelen@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: trivial@kernel.org Link: http://lkml.kernel.org/r/1458388687-24421-1-git-send-email-jakuje@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf subcmd: Create subcmd libraryJosh Poimboeuf2015-12-171-1/+1
| | | | | | | | | | | | | | | Move the subcommand-related files from perf to a new library named libsubcmd.a. Since we're moving files anyway, go ahead and rename 'exec_cmd.*' to 'exec-cmd.*' to be consistent with the naming of all the other files. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/c0a838d4c878ab17fee50998811612b2281355c1.1450193761.git.jpoimboe@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Harmonize all the -l/--nr_loops optionsIngo Molnar2015-10-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have three benchmarking subsystems that specify some sort of 'number of loops' parameter - but all of them do it inconsistently: numa: -l/--nr_loops sched messaging: -l/--loops mem memset/memcpy: -i/--iterations Harmonize them to -l/--nr_loops by picking the numa variant - which is also the most likely one to have existing scripting which we don't want to break. Plus improve the parameter help texts to indicate the default value for the nr_loops variable to keep users from guessing ... Also propagate the naming to internal variables. Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1445241870-24854-13-git-send-email-mingo@kernel.org [ Let the harmonisation reach the perf-bench man page as well ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench numa: Share sched_getcpu() __weak def with cloexec.cArnaldo Carvalho de Melo2015-05-181-0/+1
| | | | | | | | | | | | | | | | | | We really should move the sched_getcpu() to some more suitable place, but this one-liner fixes this build problem on ancient distros like RHEL5. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Borislav Petkov <bp@suse.de> Cc: David Ahern <dsahern@gmail.com> Cc: Don Zickus <dzickus@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vinson Lee <vlee@twitter.com> Link: http://lkml.kernel.org/n/tip-5yqg4p11f9uii6yremz3r35v@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'perf/urgent' into perf/core, to resolve conflictsIngo Molnar2015-05-111-2/+10
|\ | | | | | | | | | | | | Conflicts: tools/perf/builtin-kmem.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf bench numa: Fix immediate meeting of convergence conditionPetr Holasek2015-04-271-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes the race in the beginning of benchmark run when some threads hasn't got assigned curr_cpu yet so they don't occur in nodes-of-process stats and benchmark concludes that all remaining threads are converged already. The race can be reproduced with small amount of threads and some bigger amount of shared process memory, e.g. one process, two threads and 5GB of process memory. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-4-git-send-email-pholasek@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * perf bench numa: Fixes of --quiet argumentPetr Holasek2015-04-271-2/+2
| | | | | | | | | | | | | | | | | | | | Corrected description and fixed function of --quiet argument. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-2-git-send-email-pholasek@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* | perf bench numa: Show more stats of particular threads in verbose modePetr Holasek2015-05-041-1/+31
|/ | | | | | | | | | | | | In verbose mode perf bench numa shows also GB/s speed, system and user cpu time for each particular thread. Using of getrusage() can provide much more per process or per thread stats in future. Signed-off-by: Petr Holasek <pholasek@redhat.com> Reviewed-by: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1429198699-25039-3-git-send-email-pholasek@redhat.com [ Rename 'usage' variable to not shadow util.h's usage() ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* Merge branch 'perf-core-for-mingo' into perf/urgentIngo Molnar2014-04-141-0/+4
|\ | | | | | | | | | | | | | | | | Conflicts: tools/perf/bench/numa.c Pull perf fixes from Jiri Olsa. Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * perf bench: Set more defaults in the 'numa' suiteRamkumar Ramachandra2014-04-141-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, $ perf bench numa mem errors out with usage information. To make this more user-friendly, let us provide a minimum set of default values required for a test run. As an added bonus, $ perf bench all now goes all the way to completion. Signed-off-by: Ramkumar Ramachandra <artagnon@gmail.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/1395964219-22173-2-git-send-email-artagnon@gmail.com Signed-off-by: Jiri Olsa <jolsa@redhat.com>
* | perf bench numa: Make no args mean 'run all tests'Arnaldo Carvalho de Melo2014-03-141-0/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we call just: perf bench numa mem it will present the same output as: perf bench numa mem -h i.e. ask for instructions about what to run. While that is kinda ok, using 'run all tests' as the default, i.e. making 'no parms' be equivalent to: perf bench numa mem -a Will allow: perf bench numa all to actually do what is asked: i.e. run all the 'bench' tests, instead of responding to that by asking what to do. That, in turn, allows: perf bench all to actually complete, for the same reasons. And after that, the tests that come after that, and that at some point hit a NULL deref, will run, allowing me to reproduce a recently reported problem. That when you have the needed numa libraries, which wasn't the case for the reporter, making me a bit confused after trying to reproduce his report. So make no parms mean -a. Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Patrick Palka <patrick@parcs.ath.cx> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-x7h0ghx4pef4n0brywg21krk@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf tools: Fix bench/numa.c for 32-bit buildAdrian Hunter2013-10-211-2/+2
| | | | | | | | | | | | | | | | | | | | bench/numa.c: In function 'worker_thread': bench/numa.c:1123:20: error: comparison between signed and unsigned integer expressions [-Werror=sign-compare] bench/numa.c:1171:6: error: format '%lx' expects argument of type 'long unsigned int', but argument 5 has type 'u64' [-Werror=format] cc1: all warnings being treated as errors Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1382099356-4918-13-git-send-email-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf bench: Fix failing assertions in numa benchPetr Holasek2013-10-111-13/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch adds more subtle handling of -C and -N parameters in parse_{cpu,node}_setup_list() functions when there isn't enough NUMA nodes or CPUs present. Instead of assertion and terminating benchmark, partial test is skipped with error message and perf will continue to the next one. Fixed problem can be easily reproduced on machine with only one NUMA node: # Running numa/mem benchmark... # Running main, "perf bench numa mem -a" ... # Running RAM-bw-remote, "perf bench numa mem -p 1 -t 1 -P 1024 -C 0 -M 1 -s perf: bench/numa.c:622: parse_setup_node_list: Assertion `!(bind_node_0 < 0 || bind_node_0 >= g->p.nr_nodes)' failed. Aborted Signed-off-by: Petr Holasek <pholasek@redhat.com> Acked-by: Ingo Molnar <mingo@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Petr Benas <pbenas@redhat.com> Link: http://lkml.kernel.org/r/1380821325-4017-1-git-send-email-pholasek@redhat.com Signed-off-by: Petr Benas <pbenas@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* perf: Add 'perf bench numa mem' NUMA performance measurement suiteIngo Molnar2013-01-301-0/+1731
Add a suite of NUMA performance benchmarks. The goal was simulate the behavior and access patterns of real NUMA workloads, via a wide range of parameters, so this tool goes well beyond simple bzero() measurements that most NUMA micro-benchmarks use: - It processes the data and creates a chain of data dependencies, like a real workload would. Neither the compiler, nor the kernel (via KSM and other optimizations) nor the CPU can eliminate parts of the workload. - It randomizes the initial state and also randomizes the target addresses of the processing - it's not a simple forward scan of addresses. - It provides flexible options to set process, thread and memory relationship information: -G sets "global" memory shared between all test processes, -P sets "process" memory shared by all threads of a process and -T sets "thread" private memory. - There's a NUMA convergence monitoring and convergence latency measurement option via -c and -m. - Micro-sleeps and synchronization can be injected to provoke lock contention and scheduling, via the -u and -S options. This simulates IO and contention. - The -x option instructs the workload to 'perturb' itself artificially every N seconds, by moving to the first and last CPU of the system periodically. This way the stability of convergence equilibrium and the number of steps taken for the scheduler to reach equilibrium again can be measured. - The amount of work can be specified via the -l loop count, and/or via a -s seconds-timeout value. - CPU and node memory binding options, to test hard binding scenarios. THP can be turned on and off via madvise() calls. - Live reporting of convergence progress in an 'at glance' output format. Printing of convergence and deconvergence events. The 'perf bench numa mem -a' option will start an array of about 30 individual tests that will each output such measurements: # Running 5x5-bw-thread, "perf bench numa mem -p 5 -t 5 -P 512 -s 20 -zZ0q --thp 1" 5x5-bw-thread, 20.276, secs, runtime-max/thread 5x5-bw-thread, 20.004, secs, runtime-min/thread 5x5-bw-thread, 20.155, secs, runtime-avg/thread 5x5-bw-thread, 0.671, %, spread-runtime/thread 5x5-bw-thread, 21.153, GB, data/thread 5x5-bw-thread, 528.818, GB, data-total 5x5-bw-thread, 0.959, nsecs, runtime/byte/thread 5x5-bw-thread, 1.043, GB/sec, thread-speed 5x5-bw-thread, 26.081, GB/sec, total-speed See the help text and the code for more details. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>