linux.git/tools/perf/util, branch v6.9-rc2

perf annotate: Add comments in the data structures

2024-03-07T04:25:48+00:00

Reviewed-by: Ian Rogers 
Reviewed-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240304230815.1440583-5-namhyung@kernel.org

perf annotate: Remove sym_hist.addr[] array

2024-03-07T04:25:36+00:00

It's not used anymore and the code is converted to use a hash map.  Now
sym_hist has a static size, so no need to have sizeof_sym_hist in the
struct annotated_source.

Reviewed-by: Ian Rogers 
Reviewed-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240304230815.1440583-4-namhyung@kernel.org

perf annotate: Calculate instruction overhead using hashmap

2024-03-07T04:25:20+00:00

Use annotated_source.samples hashmap instead of addr array in the
struct sym_hist.

Reviewed-by: Ian Rogers 
Reviewed-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240304230815.1440583-3-namhyung@kernel.org

perf annotate: Add a hashmap for symbol histogram

2024-03-07T04:24:55+00:00

Currently the symbol histogram uses an array to save per-offset sample
counts, which wastes a lot of memory if the symbol has only a few
samples.  Add a hashmap to save values only for actual samples.

For now, it has duplicate histograms (one in the existing array and
another in the new hash map).  Once all users are converted to the
hash map, the array can be removed.

Reviewed-by: Ian Rogers 
Reviewed-by: Arnaldo Carvalho de Melo 
Tested-by: Arnaldo Carvalho de Melo 
Cc: Andi Kleen 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240304230815.1440583-2-namhyung@kernel.org
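
The memory argument above can be illustrated with a minimal sketch of a
sparse histogram keyed by offset. This is not perf's hashmap API; every
name here (sparse_hist, sample_entry, sparse_hist_add and so on) is
hypothetical, and the open-addressing scheme is only one possible way to
store counts for sampled offsets alone.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sparse histogram: one slot per sampled offset. */
struct sample_entry {
	uint64_t offset;     /* offset from the symbol start */
	uint64_t nr_samples; /* 0 marks an empty slot */
};

struct sparse_hist {
	struct sample_entry *slots;
	size_t cap;  /* always a power of two */
	size_t used; /* number of distinct offsets stored */
};

static int sparse_hist_init(struct sparse_hist *h)
{
	h->cap = 16;
	h->used = 0;
	h->slots = calloc(h->cap, sizeof(*h->slots));
	return h->slots ? 0 : -1;
}

/* Linear probing: return the slot holding offset, or the first empty one. */
static struct sample_entry *find_slot(struct sample_entry *slots, size_t cap,
				      uint64_t offset)
{
	size_t i = (size_t)((offset * 0x9E3779B97F4A7C15ull) >> 32) & (cap - 1);

	while (slots[i].nr_samples != 0 && slots[i].offset != offset)
		i = (i + 1) & (cap - 1);
	return &slots[i];
}

/* Double the table and re-insert every occupied slot. */
static int sparse_hist_grow(struct sparse_hist *h)
{
	size_t new_cap = h->cap * 2;
	struct sample_entry *new_slots = calloc(new_cap, sizeof(*new_slots));

	if (!new_slots)
		return -1;
	for (size_t i = 0; i < h->cap; i++) {
		if (h->slots[i].nr_samples)
			*find_slot(new_slots, new_cap, h->slots[i].offset) = h->slots[i];
	}
	free(h->slots);
	h->slots = new_slots;
	h->cap = new_cap;
	return 0;
}

/* Record one sample at offset; keeps the load factor at or below 1/2. */
static int sparse_hist_add(struct sparse_hist *h, uint64_t offset)
{
	struct sample_entry *e;

	if ((h->used + 1) * 2 > h->cap && sparse_hist_grow(h) < 0)
		return -1;
	e = find_slot(h->slots, h->cap, offset);
	if (e->nr_samples == 0) {
		e->offset = offset;
		h->used++;
	}
	e->nr_samples++;
	return 0;
}

/* Number of samples recorded at offset, 0 if it was never sampled. */
static uint64_t sparse_hist_get(const struct sparse_hist *h, uint64_t offset)
{
	return find_slot(h->slots, h->cap, offset)->nr_samples;
}
```

With this layout a large symbol that is sampled at only two offsets
occupies two slots, whereas a per-offset array needs one element for
every offset in the symbol.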

perf threads: Reduce table size from 256 to 8

2024-03-04T06:52:13+00:00

The threads data structure is an array of hashmaps, previously
rbtrees. The two levels allow for a fixed outer array where access is
guarded by rw_semaphores. Commit 91e467bc568f ("perf machine: Use
hashtable for machine threads") sized the outer table at 256 entries
to avoid future scalability problems. However, this means the threads
struct is sized at 30,720 bytes. As the hashmaps allow O(1) access for
the common find/insert/remove operations, lower the number of entries
to 8. This reduces the size overhead to 960 bytes.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-8-irogers@google.com
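
The size figures above follow from simple arithmetic: 30,720 bytes over
256 entries is 120 bytes per entry, and 8 entries at 120 bytes each is
960 bytes. The sketch below only restates that calculation. The names
are hypothetical, and selecting the outer table entry by a modulo on the
tid is an assumption made for illustration, not something the commit
message states.

```c
#include <stddef.h>

enum { OLD_TABLE_SIZE = 256, NEW_TABLE_SIZE = 8 };

/* Per-entry size implied by the message: 30,720 bytes / 256 entries. */
static size_t entry_bytes(void)
{
	return 30720 / OLD_TABLE_SIZE;
}

/* Total size of an outer table with nr_entries entries. */
static size_t table_bytes(size_t nr_entries)
{
	return nr_entries * entry_bytes();
}

/* Assumed entry selection: a modulo on the tid (illustrative only). */
static unsigned int table_index(int tid, unsigned int nr_entries)
{
	return (unsigned int)tid % nr_entries;
}
```

Since each entry is itself a hashmap with O(1) operations, shrinking the
outer array mainly changes how many locks the tids are spread across.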

perf threads: Switch from rbtree to hashmap

2024-03-04T06:52:04+00:00

The rbtree provides a sorting on entries but this is unused. Switch to
using hashmap for O(1) rather than O(log n) find/insert/remove
complexity.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-7-irogers@google.com

perf threads: Move threads to its own files

2024-03-04T06:51:55+00:00

Move threads out of machine and into its own file.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-6-irogers@google.com

perf machine: Move machine's threads into its own abstraction

2024-03-04T06:51:44+00:00

Move thread_rb_node into the machine.c file. This hides the
implementation of threads from the rest of the code allowing for it to
be refactored.

Locking discipline is tightened up in this change. As the lock is now
encapsulated in threads, the findnew function requires holding it (as
it already did in machine). Rather than do conditionals with locks
based on whether the thread should be created (which could potentially
be error prone with a read lock match with a write unlock), have a
separate threads__find that won't create the thread and only holds the
read lock. This effectively duplicates the findnew logic, with the
existing findnew logic only operating under a write lock assuming
creation is necessary as a previous find failed. The creation may
still fail with the write lock due to another thread. The duplication
is removed in a later patch that delegates the implementation to
hashtable.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-5-irogers@google.com

perf machine: Move fprintf to for_each loop and a callback

2024-03-04T06:51:31+00:00

Avoid exposing the threads data structure by switching to the callback
machine__for_each_thread approach. machine__fprintf is only used in
tests and verbose >3 output, so don't build a list and sort it. Add
machine__threads_nr to be refactored later.

Note, all existing *_fprintf routines ignore fprintf errors.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-4-irogers@google.com
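
The callback approach can be sketched as follows. The container and the
names used here (thread_set, for_each_thread, thread_fprintf_cb) are
hypothetical stand-ins, and the real machine__for_each_thread may differ
in detail. The point is only that the printing code sees one thread at a
time and never touches the container.

```c
#include <stdio.h>

struct thread {
	int tid;
};

/* Hypothetical container; callers never look inside it. */
struct thread_set {
	struct thread *items;
	size_t nr;
};

typedef int (*thread_fn)(struct thread *thread, void *priv);

/* Call fn for each thread, stopping early on a non-zero return. */
static int for_each_thread(struct thread_set *set, thread_fn fn, void *priv)
{
	for (size_t i = 0; i < set->nr; i++) {
		int rc = fn(&set->items[i], priv);

		if (rc != 0)
			return rc;
	}
	return 0;
}

struct fprintf_args {
	FILE *fp;
	size_t printed;
};

/* Callback: print one thread and accumulate the byte count. */
static int thread_fprintf_cb(struct thread *thread, void *priv)
{
	struct fprintf_args *args = priv;
	int n = fprintf(args->fp, "Thread %d\n", thread->tid);

	/* fprintf errors are ignored, as the message notes for *_fprintf. */
	if (n > 0)
		args->printed += (size_t)n;
	return 0;
}

/* Print every thread in set to fp and return the bytes written. */
static size_t thread_set_fprintf(struct thread_set *set, FILE *fp)
{
	struct fprintf_args args = { .fp = fp, .printed = 0 };

	for_each_thread(set, thread_fprintf_cb, &args);
	return args.printed;
}
```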

perf trace: Ignore thread hashing in summary

2024-03-04T06:51:18+00:00

Commit 91e467bc568f ("perf machine: Use hashtable for machine
threads") made the iteration of thread tids unordered. The perf trace
--summary output sorts and prints each hash bucket, rather than all
threads globally. Change this behavior by turning all threads into a
list, sorting the list by number of trace events then by tids, and
finally printing the list. This also allows the rbtree in threads not
to be accessed outside of machine.

Signed-off-by: Ian Rogers 
Acked-by: Namhyung Kim 
Cc: Yang Jihong 
Cc: Oliver Upton 
Signed-off-by: Namhyung Kim 
Link: https://lore.kernel.org/r/20240301053646.1449657-3-irogers@google.com
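
The global ordering described above amounts to a two-key comparison. The
sketch below uses hypothetical names (thread_summary, summary_cmp) and
assumes ascending order for both keys, since the message does not state
the direction.

```c
#include <stdlib.h>

/* Hypothetical per-thread summary record. */
struct thread_summary {
	int tid;
	unsigned long nr_events;
};

/* Order by number of trace events first, then by tid (both ascending). */
static int summary_cmp(const void *a, const void *b)
{
	const struct thread_summary *x = a;
	const struct thread_summary *y = b;

	if (x->nr_events != y->nr_events)
		return x->nr_events < y->nr_events ? -1 : 1;
	return (x->tid > y->tid) - (x->tid < y->tid);
}

/* Sort all threads globally rather than one hash bucket at a time. */
static void sort_summaries(struct thread_summary *v, size_t n)
{
	qsort(v, n, sizeof(*v), summary_cmp);
}
```

Using tid as the tie-breaker makes the output deterministic even though
the underlying hash iteration order is not.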