<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/tools/perf/bench, branch v6.15</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>perf bench sched pipe: fix enforced blocking reads in worker_thread</title>
<updated>2025-03-24T06:20:37+00:00</updated>
<author>
<name>Dirk Gouders</name>
<email>dirk@gouders.net</email>
</author>
<published>2025-03-23T14:01:01+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=99476fa085da764fbed0684e22b831de8cd22512'/>
<id>99476fa085da764fbed0684e22b831de8cd22512</id>
<content type='text'>
The function worker_thread() is written in a way that roughly
doubles the number of expected context switches, because it enforces
blocking reads:

 Performance counter stats for 'perf bench sched pipe':

         2,000,004      context-switches

      11.859548321 seconds time elapsed

       0.674871000 seconds user
       8.076890000 seconds sys

The result of this behavior is that the blocking reads by far dominate
the performance analysis of 'perf bench sched pipe':

Samples: 78K of event 'cycles:P', Event count (approx.): 27964965844
Overhead  Command     Shared Object         Symbol
  25.28%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   8.11%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   2.82%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

From the code, it is unclear whether that behavior is intended, but the
log says that Ingo Molnar at least aims to mimic lmbench's lat_ctx, which
does not handle the pipe ends that way
(https://sourceforge.net/p/lmbench/code/HEAD/tree/trunk/lmbench2/src/lat_ctx.c)

Fix worker_thread() by always first feeding the write ends of the pipes
and then trying to read.
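
A minimal sketch of the write-then-read pattern, with two threads
connected by two pipes. It is not the perf source: struct pp_worker,
pp_worker_fn() and run_pingpong() are hypothetical names, and error
handling is reduced to what keeps the sketch from hanging. Because each
worker feeds its peer before reading, the read is the only call in the
loop that can block.

<![CDATA[
```c
#include <pthread.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-thread state; not perf's struct thread_data. */
struct pp_worker {
	int rfd;   /* read end: tokens from the peer arrive here */
	int wfd;   /* write end: tokens for the peer go here */
	int loops; /* iterations to run */
	int done;  /* iterations completed */
};

static void *pp_worker_fn(void *arg)
{
	struct pp_worker *w = arg;
	int m = 0;

	for (int i = 0; i < w->loops; i++) {
		/* Feed the peer first: the pipe never holds more than a couple
		 * of tokens, so this write does not block. */
		if (write(w->wfd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			break;
		/* Then wait for the peer's token; this is the only blocking call. */
		if (read(w->rfd, &m, sizeof(m)) != (ssize_t)sizeof(m))
			break;
		w->done++;
	}
	return NULL;
}

/* Run two workers connected by two pipes; return total iterations or -1. */
static int run_pingpong(int loops)
{
	int a[2], b[2], ret = 0;
	pthread_t t[2];
	struct pp_worker w[2];

	if (pipe(a))
		return -1;
	if (pipe(b)) {
		close(a[0]);
		close(a[1]);
		return -1;
	}
	/* Worker 0 writes into a and reads from b; worker 1 does the reverse. */
	w[0] = (struct pp_worker){ .rfd = b[0], .wfd = a[1], .loops = loops };
	w[1] = (struct pp_worker){ .rfd = a[0], .wfd = b[1], .loops = loops };

	if (pthread_create(&t[0], NULL, pp_worker_fn, &w[0]) != 0) {
		ret = -1;
	} else {
		if (pthread_create(&t[1], NULL, pp_worker_fn, &w[1]) != 0) {
			/* Worker 1 never ran: closing its write end lets worker 0's
			 * read see EOF instead of blocking forever. */
			close(b[1]);
			b[1] = -1;
			ret = -1;
		} else {
			pthread_join(t[1], NULL);
		}
		pthread_join(t[0], NULL);
	}
	if (ret == 0)
		ret = w[0].done + w[1].done;

	close(a[0]);
	close(a[1]);
	close(b[0]);
	if (b[1] != -1)
		close(b[1]);
	return ret;
}
```
]]>

For example, run_pingpong(1000) completes 1000 iterations per worker and
returns 2000. Older C libraries may need -pthread at link time.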

This roughly halves the context switches and runtime of pure
'perf bench sched pipe':

 Performance counter stats for 'perf bench sched pipe':

         1,005,770      context-switches

       6.033448041 seconds time elapsed

       0.423142000 seconds user
       4.519829000 seconds sys

The blocking reads also no longer dominate the analysis to the extreme
shown above:

Samples: 40K of event 'cycles:P', Event count (approx.): 14309364879
Overhead  Command     Shared Object         Symbol
  12.20%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   9.23%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   3.68%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

Signed-off-by: Dirk Gouders &lt;dirk@gouders.net&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20250323140316.19027-2-dirk@gouders.net
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The function worker_thread() is written in a way that roughly
doubles the number of expected context switches, because it enforces
blocking reads:

 Performance counter stats for 'perf bench sched pipe':

         2,000,004      context-switches

      11.859548321 seconds time elapsed

       0.674871000 seconds user
       8.076890000 seconds sys

The result of this behavior is that the blocking reads by far dominate
the performance analysis of 'perf bench sched pipe':

Samples: 78K of event 'cycles:P', Event count (approx.): 27964965844
Overhead  Command     Shared Object         Symbol
  25.28%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   8.11%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   2.82%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

From the code, it is unclear whether that behavior is intended, but the
log says that Ingo Molnar at least aims to mimic lmbench's lat_ctx, which
does not handle the pipe ends that way
(https://sourceforge.net/p/lmbench/code/HEAD/tree/trunk/lmbench2/src/lat_ctx.c)

Fix worker_thread() by always first feeding the write ends of the pipes
and then trying to read.

This roughly halves the context switches and runtime of pure
'perf bench sched pipe':

 Performance counter stats for 'perf bench sched pipe':

         1,005,770      context-switches

       6.033448041 seconds time elapsed

       0.423142000 seconds user
       4.519829000 seconds sys

The blocking reads also no longer dominate the analysis to the extreme
shown above:

Samples: 40K of event 'cycles:P', Event count (approx.): 14309364879
Overhead  Command     Shared Object         Symbol
  12.20%  sched-pipe  [kernel.kallsyms]     [k] read_hpet
   9.23%  sched-pipe  [kernel.kallsyms]     [k] retbleed_untrain_ret
   3.68%  sched-pipe  [kernel.kallsyms]     [k] pipe_write

Signed-off-by: Dirk Gouders &lt;dirk@gouders.net&gt;
Acked-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20250323140316.19027-2-dirk@gouders.net
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf bench: Fix perf bench syscall loop count</title>
<updated>2025-03-05T17:19:23+00:00</updated>
<author>
<name>Thomas Richter</name>
<email>tmricht@linux.ibm.com</email>
</author>
<published>2025-03-04T09:23:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=957d194163bf983da98bf7ec7e4f86caff8cd0eb'/>
<id>957d194163bf983da98bf7ec7e4f86caff8cd0eb</id>
<content type='text'>
Command 'perf bench syscall fork -l 100000' offers option -l to run for
a specified number of iterations. However, this option is not always
honored: the count is silently limited to 10000 iterations, as can be
seen:

Output before:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 10,000 fork() calls
     Total time: 23.388 [sec]

    2338.809800 usecs/op
            427 ops/sec
 #

When the count is specified explicitly with option -l or --loops, honor
it even when it is higher than 10000:

Output after:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 100,000 fork() calls
     Total time: 716.982 [sec]

    7169.829510 usecs/op
            139 ops/sec
 #

This patch fixes the issue for the basic, execve, fork and getpgid benchmarks.
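
The intended rule can be sketched as a small decision function: an
explicit -l/--loops value always wins, and only the default count is
capped for slow syscalls. This is an illustration under assumptions, not
the perf patch: effective_loops(), DEFAULT_LOOPS and EXPENSIVE_CAP are
hypothetical, and only the 10000 cap is taken from the output above.

<![CDATA[
```c
#include <stdbool.h>

/* Hypothetical values; only the 10000 cap comes from the output above. */
#define DEFAULT_LOOPS 10000000L
#define EXPENSIVE_CAP 10000L

/*
 * Hypothetical helper: choose the iteration count for one syscall benchmark.
 *   requested - value of the loop-count option (the default if not given)
 *   user_set  - true when -l/--loops appeared on the command line
 *   expensive - true for slow syscalls such as fork() and execve()
 */
static long effective_loops(long requested, bool user_set, bool expensive)
{
	if (user_set)
		return requested;	/* an explicit count is always honored */
	if (expensive && requested > EXPENSIVE_CAP)
		return EXPENSIVE_CAP;	/* only the default is capped */
	return requested;
}
```
]]>

The key design point is that the cap is keyed on whether the user chose
the count, not on the count's value.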

Fixes: ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark")
Signed-off-by: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Acked-by: Sumanth Korikkar &lt;sumanthk@linux.ibm.com&gt;
Tested-by: Athira Rajeev &lt;atrajeev@linux.ibm.com&gt;
Cc: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Command 'perf bench syscall fork -l 100000' offers option -l to run for
a specified number of iterations. However, this option is not always
honored: the count is silently limited to 10000 iterations, as can be
seen:

Output before:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 10,000 fork() calls
     Total time: 23.388 [sec]

    2338.809800 usecs/op
            427 ops/sec
 #

When the count is specified explicitly with option -l or --loops, honor
it even when it is higher than 10000:

Output after:
 # perf bench syscall fork -l 100000
 # Running 'syscall/fork' benchmark:
 # Executed 100,000 fork() calls
     Total time: 716.982 [sec]

    7169.829510 usecs/op
            139 ops/sec
 #

This patch fixes the issue for the basic, execve, fork and getpgid benchmarks.

Fixes: ece7f7c0507c ("perf bench syscall: Add fork syscall benchmark")
Signed-off-by: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Acked-by: Sumanth Korikkar &lt;sumanthk@linux.ibm.com&gt;
Tested-by: Athira Rajeev &lt;atrajeev@linux.ibm.com&gt;
Cc: Tiezhu Yang &lt;yangtiezhu@loongson.cn&gt;
Link: https://lore.kernel.org/r/20250304092349.2618082-1-tmricht@linux.ibm.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf bench: Fix undefined behavior in cmpworker()</title>
<updated>2025-01-18T18:14:36+00:00</updated>
<author>
<name>Kuan-Wei Chiu</name>
<email>visitorckw@gmail.com</email>
</author>
<published>2025-01-16T11:08:42+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=62892e77b8a64b9dc0e1da75980aa145347b6820'/>
<id>62892e77b8a64b9dc0e1da75980aa145347b6820</id>
<content type='text'>
The comparison function cmpworker() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:

Symmetry: If x &lt; y, then y &gt; x.
Transitivity: If x &lt; y and y &lt; z, then x &lt; z.

In its current implementation, cmpworker() incorrectly returns 0 when
w1-&gt;tid &lt; w2-&gt;tid, which breaks both symmetry and transitivity. This
violation causes undefined behavior, potentially leading to issues such
as memory corruption in glibc [1].

Fix the issue by returning -1 when w1-&gt;tid &lt; w2-&gt;tid, ensuring
compliance with the C standard and preventing undefined behavior.
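
A comparator with the required properties might look like the sketch
below. struct worker here is a hypothetical stand-in holding only an int
tid; the perf structure has more fields.

<![CDATA[
```c
#include <stdlib.h>

/* Hypothetical minimal worker record; perf's struct worker has more fields. */
struct worker {
	int tid;
};

/* qsort() comparator: negative, zero or positive, consistent in both directions. */
static int cmpworker(const void *p1, const void *p2)
{
	const struct worker *w1 = p1;
	const struct worker *w2 = p2;

	if (w1->tid > w2->tid)
		return 1;
	if (w1->tid < w2->tid)
		return -1;	/* the buggy version returned 0 here */
	return 0;
}

static void sort_workers(struct worker *w, size_t n)
{
	qsort(w, n, sizeof(*w), cmpworker);
}
```
]]>

Writing both comparisons explicitly, rather than returning
w1->tid - w2->tid, also avoids signed overflow for extreme tid values.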

Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 121dd9ea0116 ("perf bench: Add epoll parallel epoll_wait benchmark")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reviewed-by: James Clark &lt;james.clark@linaro.org&gt;
Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The comparison function cmpworker() violates the C standard's
requirements for qsort() comparison functions, which mandate symmetry
and transitivity:

Symmetry: If x &lt; y, then y &gt; x.
Transitivity: If x &lt; y and y &lt; z, then x &lt; z.

In its current implementation, cmpworker() incorrectly returns 0 when
w1-&gt;tid &lt; w2-&gt;tid, which breaks both symmetry and transitivity. This
violation causes undefined behavior, potentially leading to issues such
as memory corruption in glibc [1].

Fix the issue by returning -1 when w1-&gt;tid &lt; w2-&gt;tid, ensuring
compliance with the C standard and preventing undefined behavior.

Link: https://www.qualys.com/2024/01/30/qsort.txt [1]
Fixes: 121dd9ea0116 ("perf bench: Add epoll parallel epoll_wait benchmark")
Cc: stable@vger.kernel.org
Signed-off-by: Kuan-Wei Chiu &lt;visitorckw@gmail.com&gt;
Reviewed-by: James Clark &lt;james.clark@linaro.org&gt;
Link: https://lore.kernel.org/r/20250116110842.4087530-1-visitorckw@gmail.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf bench: Remove reference to cmd_inject</title>
<updated>2024-12-18T19:24:33+00:00</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2024-11-19T01:16:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=df487111bd09616e5f20f32e88c48005d09dc0ec'/>
<id>df487111bd09616e5f20f32e88c48005d09dc0ec</id>
<content type='text'>
Avoid having `perf bench internals inject-build-id` reference the
cmd_inject sub-command, which requires perf-bench to make a backward
reference into the internals of builtins. Replace the reference to
cmd_inject with a call to main. To avoid python.c needing to link with
something that provides main, drop the libperf-bench library from the
python shared object.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Colin Ian King &lt;colin.i.king@gmail.com&gt;
Cc: Dapeng Mi &lt;dapeng1.mi@linux.intel.com&gt;
Cc: Howard Chu &lt;howardchu95@gmail.com&gt;
Cc: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@linaro.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Cc: Veronika Molnarova &lt;vmolnaro@redhat.com&gt;
Cc: Weilin Wang &lt;weilin.wang@intel.com&gt;
Link: https://lore.kernel.org/r/20241119011644.971342-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Avoid having `perf bench internals inject-build-id` reference the
cmd_inject sub-command, which requires perf-bench to make a backward
reference into the internals of builtins. Replace the reference to
cmd_inject with a call to main. To avoid python.c needing to link with
something that provides main, drop the libperf-bench library from the
python shared object.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Andi Kleen &lt;ak@linux.intel.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Colin Ian King &lt;colin.i.king@gmail.com&gt;
Cc: Dapeng Mi &lt;dapeng1.mi@linux.intel.com&gt;
Cc: Howard Chu &lt;howardchu95@gmail.com&gt;
Cc: Ilya Leoshkevich &lt;iii@linux.ibm.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@linaro.org&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Michael Petlan &lt;mpetlan@redhat.com&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Richter &lt;tmricht@linux.ibm.com&gt;
Cc: Veronika Molnarova &lt;vmolnaro@redhat.com&gt;
Cc: Weilin Wang &lt;weilin.wang@intel.com&gt;
Link: https://lore.kernel.org/r/20241119011644.971342-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf header: Move is_cpu_online to numa bench</title>
<updated>2024-11-16T19:36:47+00:00</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2024-11-07T16:20:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c6fafe36bab32b9b24941fe381f0b66751021c25'/>
<id>c6fafe36bab32b9b24941fe381f0b66751021c25</id>
<content type='text'>
The helper function is only used in the NUMA benchmark, as online CPUs
are typically determined through perf_cpu_map__new_online_cpus().

Reduce the scope of the function for now.

Reviewed-by: James Clark &lt;james.clark@linaro.org&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Xu Yang &lt;xu.yang_2@nxp.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexandre Ghiti &lt;alexghiti@rivosinc.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ben Zong-You Xie &lt;ben717@andestech.com&gt;
Cc: Benjamin Gray &lt;bgray@linux.ibm.com&gt;
Cc: Bibo Mao &lt;maobibo@loongson.cn&gt;
Cc: Clément Le Goffic &lt;clement.legoffic@foss.st.com&gt;
Cc: Dima Kogan &lt;dima@secretsauce.net&gt;
Cc: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The helper function is only used in the NUMA benchmark, as online CPUs
are typically determined through perf_cpu_map__new_online_cpus().

Reduce the scope of the function for now.

Reviewed-by: James Clark &lt;james.clark@linaro.org&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Xu Yang &lt;xu.yang_2@nxp.com&gt;
Cc: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Alexandre Ghiti &lt;alexghiti@rivosinc.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Ben Zong-You Xie &lt;ben717@andestech.com&gt;
Cc: Benjamin Gray &lt;bgray@linux.ibm.com&gt;
Cc: Bibo Mao &lt;maobibo@loongson.cn&gt;
Cc: Clément Le Goffic &lt;clement.legoffic@foss.st.com&gt;
Cc: Dima Kogan &lt;dima@secretsauce.net&gt;
Cc: Dr. David Alan Gilbert &lt;linux@treblig.org&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Ravi Bangoria &lt;ravi.bangoria@amd.com&gt;
Cc: Sandipan Das &lt;sandipan.das@amd.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-riscv@lists.infradead.org
Link: https://lore.kernel.org/r/20241107162035.52206-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf tools: sched-pipe bench: add (-n) nonblocking benchmark</title>
<updated>2024-10-22T04:23:01+00:00</updated>
<author>
<name>Brian Geffon</name>
<email>bgeffon@google.com</email>
</author>
<published>2024-10-16T19:00:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3e2d4df574fc6bbd00b422f2f1ce5c1ac251feae'/>
<id>3e2d4df574fc6bbd00b422f2f1ce5c1ac251feae</id>
<content type='text'>
The -n mode will benchmark pipes in a non-blocking mode using
epoll_wait.
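
As a rough illustration of the non-blocking style that -n exercises, the
sketch below reads from a pipe whose read end has O_NONBLOCK set and
sleeps in epoll_wait() whenever read() reports EAGAIN. It does not
reproduce perf's implementation: make_nonblocking_pipe() and
epoll_read_int() are hypothetical helpers, and the code is Linux-specific.

<![CDATA[
```c
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a pipe whose read end is non-blocking. */
static int make_nonblocking_pipe(int fds[2])
{
	if (pipe(fds))
		return -1;
	int flags = fcntl(fds[0], F_GETFL);
	if (flags == -1 || fcntl(fds[0], F_SETFL, flags | O_NONBLOCK) == -1) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	return 0;
}

/*
 * Hypothetical helper: read one int from a non-blocking fd, sleeping in
 * epoll_wait() whenever read() reports EAGAIN. Returns 0 on success,
 * -1 on error or EOF.
 */
static int epoll_read_int(int fd, int *out)
{
	int epfd = epoll_create1(0);
	if (epfd == -1)
		return -1;

	struct epoll_event ev = { .events = EPOLLIN, .data.fd = fd };
	int ret = -1;

	if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == 0) {
		for (;;) {
			ssize_t n = read(fd, out, sizeof(*out));
			if (n == (ssize_t)sizeof(*out)) {
				ret = 0;
				break;
			}
			if (n == -1 && errno == EAGAIN) {
				struct epoll_event got;
				/* Nothing to read yet: block until the fd is readable. */
				if (epoll_wait(epfd, &got, 1, -1) == -1 && errno != EINTR)
					break;
				continue;
			}
			break;	/* EOF, short read or another error */
		}
	}
	close(epfd);
	return ret;
}
```
]]>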

This specific mode was added to demonstrate the broken sync nature
of epoll: https://lore.kernel.org/lkml/20240426-zupfen-jahrzehnt-5be786bcdf04@brauner

Signed-off-by: Brian Geffon &lt;bgeffon@google.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20241016190009.866615-1-bgeffon@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The -n mode will benchmark pipes in a non-blocking mode using
epoll_wait.

This specific mode was added to demonstrate the broken sync nature
of epoll: https://lore.kernel.org/lkml/20240426-zupfen-jahrzehnt-5be786bcdf04@brauner

Signed-off-by: Brian Geffon &lt;bgeffon@google.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: https://lore.kernel.org/r/20241016190009.866615-1-bgeffon@google.com
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf tool: Constify tool pointers</title>
<updated>2024-08-12T21:05:14+00:00</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2024-08-12T20:46:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=30f29bae9142f34e978a4861ed07aa512af21416'/>
<id>30f29bae9142f34e978a4861ed07aa512af21416</id>
<content type='text'>
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to restrict what can be done with a tool.
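
The effect of the change can be sketched with a reduced stand-in for the
tool structure: callees receive a pointer to const, so they can invoke
the callbacks but cannot reassign them. struct tool, deliver() and the
callbacks are hypothetical names, not perf's definitions.

<![CDATA[
```c
/* Hypothetical, heavily reduced stand-in for perf's tool structure. */
struct tool {
	int (*sample)(const struct tool *tool, int value);
	int (*finished)(const struct tool *tool);
};

static int double_sample(const struct tool *tool, int value)
{
	(void)tool;
	return value * 2;
}

static int report_done(const struct tool *tool)
{
	(void)tool;
	return 0;
}

/* Receives the callbacks read-only: it can call them but not reassign them. */
static int deliver(const struct tool *tool, int value)
{
	/* tool->sample = NULL;  would now be rejected by the compiler */
	return tool->sample(tool, value);
}

/* Set up once at initialization; being const, it can be placed in read-only storage. */
static const struct tool doubling_tool = {
	.sample = double_sample,
	.finished = report_done,
};
```
]]>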

Reviewed-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Leo Yan &lt;leo.yan@arm.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ilkka Koskinen &lt;ilkka@os.amperecomputing.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Nick Terrell &lt;terrelln@fb.com&gt;
Cc: Oliver Upton &lt;oliver.upton@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Sun Haiyong &lt;sunhaiyong@loongson.cn&gt;
Cc: Suzuki Poulouse &lt;suzuki.poulose@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yanteng Si &lt;siyanteng@loongson.cn&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The tool pointer (to a struct largely of function pointers) is passed
around but is unchanged except at initialization. Change parameter and
variable types to be const to restrict what can be done with a tool.

Reviewed-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Adrian Hunter &lt;adrian.hunter@intel.com&gt;
Tested-by: Leo Yan &lt;leo.yan@arm.com&gt;
Cc: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Anshuman Khandual &lt;anshuman.khandual@arm.com&gt;
Cc: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Cc: Huacai Chen &lt;chenhuacai@kernel.org&gt;
Cc: Ilkka Koskinen &lt;ilkka@os.amperecomputing.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: James Clark &lt;james.clark@arm.com&gt;
Cc: Jiri Olsa &lt;jolsa@kernel.org&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Cc: Kan Liang &lt;kan.liang@linux.intel.com&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Namhyung Kim &lt;namhyung@kernel.org&gt;
Cc: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Cc: Nick Terrell &lt;terrelln@fb.com&gt;
Cc: Oliver Upton &lt;oliver.upton@linux.dev&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Song Liu &lt;song@kernel.org&gt;
Cc: Sun Haiyong &lt;sunhaiyong@loongson.cn&gt;
Cc: Suzuki Poulouse &lt;suzuki.poulose@arm.com&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Yanteng Si &lt;siyanteng@loongson.cn&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20240812204720.631678-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo &lt;acme@redhat.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>perf bench: Make bench its own library</title>
<updated>2024-06-26T18:07:28+00:00</updated>
<author>
<name>Ian Rogers</name>
<email>irogers@google.com</email>
</author>
<published>2024-06-25T21:41:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=21cc3bc00a68c1f4178feab1f89d1af3cfcfc84f'/>
<id>21cc3bc00a68c1f4178feab1f89d1af3cfcfc84f</id>
<content type='text'>
Make the benchmark code into a library so it may be linked against
things like the python module to avoid compiling code twice.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Cc: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Nick Terrell &lt;terrelln@fb.com&gt;
Cc: Gary Guo &lt;gary@garyguo.net&gt;
Cc: Alex Gaynor &lt;alex.gaynor@gmail.com&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Wedson Almeida Filho &lt;wedsonaf@gmail.com&gt;
Cc: Ze Gao &lt;zegao2021@gmail.com&gt;
Cc: Alice Ryhl &lt;aliceryhl@google.com&gt;
Cc: Andrei Vagin &lt;avagin@google.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Miguel Ojeda &lt;ojeda@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Oliver Upton &lt;oliver.upton@linux.dev&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Benno Lossin &lt;benno.lossin@proton.me&gt;
Cc: Björn Roy Baron &lt;bjorn3_gh@protonmail.com&gt;
Cc: Andreas Hindborg &lt;a.hindborg@samsung.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240625214117.953777-6-irogers@google.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Make the benchmark code into a library so it may be linked against
things like the python module to avoid compiling code twice.

Signed-off-by: Ian Rogers &lt;irogers@google.com&gt;
Reviewed-by: James Clark &lt;james.clark@arm.com&gt;
Cc: Suzuki K Poulose &lt;suzuki.poulose@arm.com&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: Palmer Dabbelt &lt;palmer@dabbelt.com&gt;
Cc: Albert Ou &lt;aou@eecs.berkeley.edu&gt;
Cc: Nick Terrell &lt;terrelln@fb.com&gt;
Cc: Gary Guo &lt;gary@garyguo.net&gt;
Cc: Alex Gaynor &lt;alex.gaynor@gmail.com&gt;
Cc: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Cc: Wedson Almeida Filho &lt;wedsonaf@gmail.com&gt;
Cc: Ze Gao &lt;zegao2021@gmail.com&gt;
Cc: Alice Ryhl &lt;aliceryhl@google.com&gt;
Cc: Andrei Vagin &lt;avagin@google.com&gt;
Cc: Yicong Yang &lt;yangyicong@hisilicon.com&gt;
Cc: Jonathan Cameron &lt;jonathan.cameron@huawei.com&gt;
Cc: Guo Ren &lt;guoren@kernel.org&gt;
Cc: Miguel Ojeda &lt;ojeda@kernel.org&gt;
Cc: Will Deacon &lt;will@kernel.org&gt;
Cc: Mike Leach &lt;mike.leach@linaro.org&gt;
Cc: Leo Yan &lt;leo.yan@linux.dev&gt;
Cc: Oliver Upton &lt;oliver.upton@linux.dev&gt;
Cc: John Garry &lt;john.g.garry@oracle.com&gt;
Cc: Benno Lossin &lt;benno.lossin@proton.me&gt;
Cc: Björn Roy Baron &lt;bjorn3_gh@protonmail.com&gt;
Cc: Andreas Hindborg &lt;a.hindborg@samsung.com&gt;
Cc: Paul Walmsley &lt;paul.walmsley@sifive.com&gt;
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240625214117.953777-6-irogers@google.com
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/perf: Fix timing issue with parallel threads in perf bench wake-up-parallel</title>
<updated>2024-06-14T04:27:49+00:00</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2024-06-07T04:43:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=245b0edf4838874801cd35b0507ed9ec38742e8e'/>
<id>245b0edf4838874801cd35b0507ed9ec38742e8e</id>
<content type='text'>
perf bench futex fails as shown below, and intermittently hangs, when
run on a powerpc system:

./perf bench futex wake-parallel
 Running 'futex/wake-parallel' benchmark:
 Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time.

[Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%)
[Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%)
[Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%)
[Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%)
[Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%)
[Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)

The system where perf bench futex wake-parallel was run has 640
CPUs. Debugging showed this to be a timing issue. The benchmark
creates as many threads as there are CPUs, and each of them issues a
futex_wait. It then sleeps (usleep) for 0.1 seconds before initiating
futex_wake. On configurations with more threads, this sleep is not
long enough for all of them to block. The patch changes the usleep
from 100000 to 200000 microseconds.

With the patch, multiple iterations were run and no further issues
were seen.

Reported-by: Disha Goel &lt;disgoel@linux.vnet.ibm.com&gt;
Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Cc: akanksha@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240607044354.82225-3-atrajeev@linux.vnet.ibm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
perf bench futex fails as shown below, and intermittently hangs, when
run on a powerpc system:

./perf bench futex wake-parallel
 Running 'futex/wake-parallel' benchmark:
 Run summary [PID 88588]: blocking on 640 threads (at [private] futex 0x10464b8c), 640 threads waking up 1 at a time.

[Run 1]: Avg per-thread latency (waking 1/640 threads) in 0.1309 ms (+-53.27%)
[Run 2]: Avg per-thread latency (waking 1/640 threads) in 0.0120 ms (+-31.16%)
[Run 3]: Avg per-thread latency (waking 1/640 threads) in 0.1474 ms (+-92.47%)
[Run 4]: Avg per-thread latency (waking 1/640 threads) in 0.2883 ms (+-67.75%)
[Run 5]: Avg per-thread latency (waking 1/640 threads) in 0.4108 ms (+-39.60%)
[Run 6]: Avg per-thread latency (waking 1/640 threads) in 0.7843 ms (+-78.98%)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)
perf: couldn't wakeup all tasks (0/1)

The system where perf bench futex wake-parallel was run has 640
CPUs. Debugging showed this to be a timing issue. The benchmark
creates as many threads as there are CPUs, and each of them issues a
futex_wait. It then sleeps (usleep) for 0.1 seconds before initiating
futex_wake. On configurations with more threads, this sleep is not
long enough for all of them to block. The patch changes the usleep
from 100000 to 200000 microseconds.

With the patch, multiple iterations were run and no further issues
were seen.

Reported-by: Disha Goel &lt;disgoel@linux.vnet.ibm.com&gt;
Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Cc: akanksha@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240607044354.82225-3-atrajeev@linux.vnet.ibm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>tools/perf: Fix perf bench epoll to enable the run when some CPU's are offline</title>
<updated>2024-06-14T04:27:26+00:00</updated>
<author>
<name>Athira Rajeev</name>
<email>atrajeev@linux.vnet.ibm.com</email>
</author>
<published>2024-06-07T04:43:53+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3638e44542a56e8adc2018df4894eaf31d387c54'/>
<id>3638e44542a56e8adc2018df4894eaf31d387c54</id>
<content type='text'>
perf bench epoll fails as shown below when run on a powerpc
system:

   ./perf bench epoll wait
   Running 'epoll/wait' benchmark:
   Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs.

   perf: pthread_create: No such file or directory

In the setup where this benchmark was run, the difference was that the
partition had 640 CPUs, but not all of them were online: only 80 CPUs
were online. While creating the threads that use epoll_wait, the code
sets their affinity using a cpumask. The cpumask size used is 80,
which is taken from "nrcpus = perf_cpu_map__nr(cpu)". The benchmark
therefore fails when setting affinity for CPU number 80 or higher,
because it attempts to set a bit position that is not allocated in the
cpumask. Fix this by sizing the cpumask by the number of possible CPUs
instead of the number of online CPUs.
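
The sizing rule can be illustrated with a hand-rolled bitmap instead of
glibc's cpu_set_t macros: a mask sized by the online count (80) has no
room for CPU 100, while one sized by the possible count (640) does.
struct cpu_bitmap and its helpers are hypothetical, and unlike a raw bit
operation, bitmap_set() rejects out-of-range CPUs instead of writing past
the allocation.

<![CDATA[
```c
#include <limits.h>
#include <stdlib.h>

#define BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical CPU bitmap with an explicit size, standing in for a cpumask. */
struct cpu_bitmap {
	size_t nbits;
	unsigned long *words;
};

static int bitmap_alloc(struct cpu_bitmap *m, size_t nbits)
{
	size_t nwords = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;

	/* Allocate at least one word so calloc() never sees a zero size. */
	m->words = calloc(nwords ? nwords : 1, sizeof(unsigned long));
	m->nbits = nbits;
	return m->words ? 0 : -1;
}

/* Refuse bits beyond the allocation instead of writing out of bounds. */
static int bitmap_set(struct cpu_bitmap *m, size_t cpu)
{
	if (cpu >= m->nbits)
		return -1;
	m->words[cpu / BITS_PER_LONG] |= 1UL << (cpu % BITS_PER_LONG);
	return 0;
}

static void bitmap_free(struct cpu_bitmap *m)
{
	free(m->words);
	m->words = NULL;
	m->nbits = 0;
}
```
]]>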

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Cc: akanksha@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240607044354.82225-2-atrajeev@linux.vnet.ibm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
perf bench epoll fails as shown below when run on a powerpc
system:

   ./perf bench epoll wait
   Running 'epoll/wait' benchmark:
   Run summary [PID 627653]: 79 threads monitoring on 64 file-descriptors for 8 secs.

   perf: pthread_create: No such file or directory

In the setup where this benchmark was run, the difference was that the
partition had 640 CPUs, but not all of them were online: only 80 CPUs
were online. While creating the threads that use epoll_wait, the code
sets their affinity using a cpumask. The cpumask size used is 80,
which is taken from "nrcpus = perf_cpu_map__nr(cpu)". The benchmark
therefore fails when setting affinity for CPU number 80 or higher,
because it attempts to set a bit position that is not allocated in the
cpumask. Fix this by sizing the cpumask by the number of possible CPUs
instead of the number of online CPUs.

Signed-off-by: Athira Rajeev &lt;atrajeev@linux.vnet.ibm.com&gt;
Reviewed-by: Ian Rogers &lt;irogers@google.com&gt;
Tested-by: Disha Goel &lt;disgoel@linux.ibm.com&gt;
Cc: akanksha@linux.ibm.com
Cc: kjain@linux.ibm.com
Cc: maddy@linux.ibm.com
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Namhyung Kim &lt;namhyung@kernel.org&gt;
Link: https://lore.kernel.org/r/20240607044354.82225-2-atrajeev@linux.vnet.ibm.com
</pre>
</div>
</content>
</entry>
</feed>
