<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel/trace/trace.c, branch v6.6.2</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>tracing: Have trace_event_file have ref counters</title>
<updated>2023-11-08T10:56:21+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-11-05T15:56:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9034c87d61be8cff989017740a91701ac8195a1d'/>
<id>9034c87d61be8cff989017740a91701ac8195a1d</id>
<content type='text'>
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream

The following can crash the kernel:

 # cd /sys/kernel/tracing
 # echo 'p:sched schedule' &gt; kprobe_events
 # exec 5&gt;&gt;events/kprobes/sched/enable
 # &gt; kprobe_events
 # exec 5&gt;&amp;-

The above commands:

 1. Change directory to the tracefs directory
 2. Create a kprobe event (doesn't matter what one)
 3. Open bash file descriptor 5 on the enable file of the kprobe event
 4. Delete the kprobe event (removes the files too)
 5. Close the bash file descriptor 5

The above causes a crash!

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:tracing_release_file_tr+0xc/0x50

What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.

But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.

To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.

Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave &lt;beaub@linux.microsoft.com&gt;
Tested-by: Beau Belgrave &lt;beaub@linux.microsoft.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bb32500fb9b78215e4ef6ee8b4345c5f5d7eafb4 upstream

The following can crash the kernel:

 # cd /sys/kernel/tracing
 # echo 'p:sched schedule' &gt; kprobe_events
 # exec 5&gt;&gt;events/kprobes/sched/enable
 # &gt; kprobe_events
 # exec 5&gt;&amp;-

The above commands:

 1. Change directory to the tracefs directory
 2. Create a kprobe event (doesn't matter what one)
 3. Open bash file descriptor 5 on the enable file of the kprobe event
 4. Delete the kprobe event (removes the files too)
 5. Close the bash file descriptor 5

The above causes a crash!

 BUG: kernel NULL pointer dereference, address: 0000000000000028
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP PTI
 CPU: 6 PID: 877 Comm: bash Not tainted 6.5.0-rc4-test-00008-g2c6b6b1029d4-dirty #186
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
 RIP: 0010:tracing_release_file_tr+0xc/0x50

What happens here is that the kprobe event creates a trace_event_file
"file" descriptor that represents the file in tracefs to the event. It
maintains state of the event (is it enabled for the given instance?).
Opening the "enable" file gets a reference to the event "file" descriptor
via the open file descriptor. When the kprobe event is deleted, the file is
also deleted from the tracefs system which also frees the event "file"
descriptor.

But as the tracefs file is still opened by user space, it will not be
totally removed until the final dput() is called on it. But this is not
true with the event "file" descriptor that is already freed. If the user
does a write to or simply closes the file descriptor it will reference the
event "file" descriptor that was just freed, causing a use-after-free bug.

To solve this, add a ref count to the event "file" descriptor as well as a
new flag called "FREED". The "file" will not be freed until the last
reference is released. But the FREE flag will be set when the event is
removed to prevent any more modifications to that event from happening,
even if there's still a reference to the event "file" descriptor.

Link: https://lore.kernel.org/linux-trace-kernel/20231031000031.1e705592@gandalf.local.home/
Link: https://lore.kernel.org/linux-trace-kernel/20231031122453.7a48b923@gandalf.local.home

Cc: stable@vger.kernel.org
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Fixes: f5ca233e2e66d ("tracing: Increase trace array ref count on enable and filter files")
Reported-by: Beau Belgrave &lt;beaub@linux.microsoft.com&gt;
Tested-by: Beau Belgrave &lt;beaub@linux.microsoft.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-09-13T18:30:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-13T18:30:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=99214f67784b11e98fb099201461aebe08dec3eb'/>
<id>99214f67784b11e98fb099201461aebe08dec3eb</id>
<content type='text'>
Pull tracing fixes from Steven Rostedt:

 - Add missing LOCKDOWN checks for eventfs callers

   When LOCKDOWN is active for tracing, it causes inconsistent state
   when some functions succeed and others fail.

 - Use dput() to free the top level eventfs descriptor

   There was a race between accesses and freeing it.

 - Fix a long standing bug that eventfs exposed due to changing timings
   by dynamically creating files. That is, If a event file is opened for
   an instance, there's nothing preventing the instance from being
   removed which will make accessing the files cause use-after-free
   bugs.

 - Fix a ring buffer race that happens when iterating over the ring
   buffer while writers are active. Check to make sure not to read the
   event meta data if it's beyond the end of the ring buffer sub buffer.

 - Fix the print trigger that disappeared because the test to create it
   was looking for the event dir field being filled, but now it has the
   "ef" field filled for the eventfs structure.

 - Remove the unused "dir" field from the event structure.

 - Fix the order of the trace_dynamic_info as it had it backwards for
   the offset and len fields for which one was for which endianess.

 - Fix NULL pointer dereference with eventfs_remove_rec()

   If an allocation fails in one of the eventfs_add_*() functions, the
   caller of it in event_subsystem_dir() or event_create_dir() assigns
   the result to the structure. But it's assigning the ERR_PTR and not
   NULL. This was passed to eventfs_remove_rec() which expects either a
   good pointer or a NULL, not ERR_PTR. The fix is to not assign the
   ERR_PTR to the structure, but to keep it NULL on error.

 - Fix list_for_each_rcu() to use list_for_each_srcu() in
   dcache_dir_open_wrapper(). One iteration of the code used RCU but
   because it had to call sleepable code, it had to be changed to use
   SRCU, but one of the iterations was missed.

 - Fix synthetic event print function to use "as_u64" instead of passing
   in a pointer to the union. To fix big/little endian issues, the u64
   that represented several types was turned into a union to define the
   types properly.

* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
  tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
  tracing/synthetic: Print out u64 values properly
  tracing/synthetic: Fix order of struct trace_dynamic_info
  selftests/ftrace: Fix dependencies for some of the synthetic event tests
  tracing: Remove unused trace_event_file dir field
  tracing: Use the new eventfs descriptor for print trigger
  ring-buffer: Do not attempt to read past "commit"
  tracefs/eventfs: Free top level files on removal
  ring-buffer: Avoid softlockup in ring_buffer_resize()
  tracing: Have event inject files inc the trace array ref count
  tracing: Have option files inc the trace array ref count
  tracing: Have current_trace inc the trace array ref count
  tracing: Have tracing_max_latency inc the trace array ref count
  tracing: Increase trace array ref count on enable and filter files
  tracefs/eventfs: Use dput to free the toplevel events directory
  tracefs/eventfs: Add missing lockdown checks
  tracefs: Add missing lockdown check to tracefs_create_dir()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull tracing fixes from Steven Rostedt:

 - Add missing LOCKDOWN checks for eventfs callers

   When LOCKDOWN is active for tracing, it causes inconsistent state
   when some functions succeed and others fail.

 - Use dput() to free the top level eventfs descriptor

   There was a race between accesses and freeing it.

 - Fix a long standing bug that eventfs exposed due to changing timings
   by dynamically creating files. That is, If a event file is opened for
   an instance, there's nothing preventing the instance from being
   removed which will make accessing the files cause use-after-free
   bugs.

 - Fix a ring buffer race that happens when iterating over the ring
   buffer while writers are active. Check to make sure not to read the
   event meta data if it's beyond the end of the ring buffer sub buffer.

 - Fix the print trigger that disappeared because the test to create it
   was looking for the event dir field being filled, but now it has the
   "ef" field filled for the eventfs structure.

 - Remove the unused "dir" field from the event structure.

 - Fix the order of the trace_dynamic_info as it had it backwards for
   the offset and len fields for which one was for which endianess.

 - Fix NULL pointer dereference with eventfs_remove_rec()

   If an allocation fails in one of the eventfs_add_*() functions, the
   caller of it in event_subsystem_dir() or event_create_dir() assigns
   the result to the structure. But it's assigning the ERR_PTR and not
   NULL. This was passed to eventfs_remove_rec() which expects either a
   good pointer or a NULL, not ERR_PTR. The fix is to not assign the
   ERR_PTR to the structure, but to keep it NULL on error.

 - Fix list_for_each_rcu() to use list_for_each_srcu() in
   dcache_dir_open_wrapper(). One iteration of the code used RCU but
   because it had to call sleepable code, it had to be changed to use
   SRCU, but one of the iterations was missed.

 - Fix synthetic event print function to use "as_u64" instead of passing
   in a pointer to the union. To fix big/little endian issues, the u64
   that represented several types was turned into a union to define the
   types properly.

* tag 'trace-v6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  eventfs: Fix the NULL pointer dereference bug in eventfs_remove_rec()
  tracefs/eventfs: Use list_for_each_srcu() in dcache_dir_open_wrapper()
  tracing/synthetic: Print out u64 values properly
  tracing/synthetic: Fix order of struct trace_dynamic_info
  selftests/ftrace: Fix dependencies for some of the synthetic event tests
  tracing: Remove unused trace_event_file dir field
  tracing: Use the new eventfs descriptor for print trigger
  ring-buffer: Do not attempt to read past "commit"
  tracefs/eventfs: Free top level files on removal
  ring-buffer: Avoid softlockup in ring_buffer_resize()
  tracing: Have event inject files inc the trace array ref count
  tracing: Have option files inc the trace array ref count
  tracing: Have current_trace inc the trace array ref count
  tracing: Have tracing_max_latency inc the trace array ref count
  tracing: Increase trace array ref count on enable and filter files
  tracefs/eventfs: Use dput to free the toplevel events directory
  tracefs/eventfs: Add missing lockdown checks
  tracefs: Add missing lockdown check to tracefs_create_dir()
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Use the new eventfs descriptor for print trigger</title>
<updated>2023-09-09T03:13:01+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-09-08T02:19:11+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1ef26d8b2ca5d8715563c951083cf6c385c77d1f'/>
<id>1ef26d8b2ca5d8715563c951083cf6c385c77d1f</id>
<content type='text'>
The check to create the print event "trigger" was using the obsolete "dir"
value of the trace_event_file to determine if it should create the trigger
or not. But that value will now be NULL because it uses the event file
descriptor.

Change it to test the "ef" field of the trace_event_file structure so that
the trace_marker "trigger" file appears again.

Link: https://lkml.kernel.org/r/20230908022001.371815239@goodmis.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ajay Kaher &lt;akaher@vmware.com&gt;
Fixes: 27152bceea1df ("eventfs: Move tracing/events to eventfs")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The check to create the print event "trigger" was using the obsolete "dir"
value of the trace_event_file to determine if it should create the trigger
or not. But that value will now be NULL because it uses the event file
descriptor.

Change it to test the "ef" field of the trace_event_file structure so that
the trace_marker "trigger" file appears again.

Link: https://lkml.kernel.org/r/20230908022001.371815239@goodmis.org

Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Ajay Kaher &lt;akaher@vmware.com&gt;
Fixes: 27152bceea1df ("eventfs: Move tracing/events to eventfs")
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Have option files inc the trace array ref count</title>
<updated>2023-09-07T20:38:54+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-09-07T02:47:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7e2cfbd2d3c86afcd5c26b5c4b1dd251f63c5838'/>
<id>7e2cfbd2d3c86afcd5c26b5c4b1dd251f63c5838</id>
<content type='text'>
The option files update the options for a given trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.

Up the ref count of the trace_array when an option file is opened.

Link: https://lkml.kernel.org/r/20230907024804.086679464@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The option files update the options for a given trace array. For an
instance, if the file is opened and the instance is deleted, reading or
writing to the file will cause a use after free.

Up the ref count of the trace_array when an option file is opened.

Link: https://lkml.kernel.org/r/20230907024804.086679464@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Have current_trace inc the trace array ref count</title>
<updated>2023-09-07T20:38:53+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-09-07T02:47:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9b37febc578b2e1ad76a105aab11d00af5ec3d27'/>
<id>9b37febc578b2e1ad76a105aab11d00af5ec3d27</id>
<content type='text'>
The current_trace updates the trace array tracer. For an instance, if the
file is opened and the instance is deleted, reading or writing to the file
will cause a use after free.

Up the ref count of the trace array when current_trace is opened.

Link: https://lkml.kernel.org/r/20230907024803.877687227@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The current_trace updates the trace array tracer. For an instance, if the
file is opened and the instance is deleted, reading or writing to the file
will cause a use after free.

Up the ref count of the trace array when current_trace is opened.

Link: https://lkml.kernel.org/r/20230907024803.877687227@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Have tracing_max_latency inc the trace array ref count</title>
<updated>2023-09-07T20:38:53+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-09-07T02:47:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7d660c9b2bc95107f90a9f4c4759be85309a6550'/>
<id>7d660c9b2bc95107f90a9f4c4759be85309a6550</id>
<content type='text'>
The tracing_max_latency file points to the trace_array max_latency field.
For an instance, if the file is opened and the instance is deleted,
reading or writing to the file will cause a use after free.

Up the ref count of the trace_array when tracing_max_latency is opened.

Link: https://lkml.kernel.org/r/20230907024803.666889383@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The tracing_max_latency file points to the trace_array max_latency field.
For an instance, if the file is opened and the instance is deleted,
reading or writing to the file will cause a use after free.

Up the ref count of the trace_array when tracing_max_latency is opened.

Link: https://lkml.kernel.org/r/20230907024803.666889383@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Cc: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Increase trace array ref count on enable and filter files</title>
<updated>2023-09-07T20:05:46+00:00</updated>
<author>
<name>Steven Rostedt (Google)</name>
<email>rostedt@goodmis.org</email>
</author>
<published>2023-09-07T02:47:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f5ca233e2e66dc1c249bf07eefa37e34a6c9346a'/>
<id>f5ca233e2e66dc1c249bf07eefa37e34a6c9346a</id>
<content type='text'>
When the trace event enable and filter files are opened, increment the
trace array ref counter, otherwise they can be accessed when the trace
array is being deleted. The ref counter keeps the trace array from being
deleted while those files are opened.

Link: https://lkml.kernel.org/r/20230907024803.456187066@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Reported-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When the trace event enable and filter files are opened, increment the
trace array ref counter, otherwise they can be accessed when the trace
array is being deleted. The ref counter keeps the trace array from being
deleted while those files are opened.

Link: https://lkml.kernel.org/r/20230907024803.456187066@goodmis.org
Link: https://lore.kernel.org/all/1cb3aee2-19af-c472-e265-05176fe9bd84@huawei.com/

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu &lt;mhiramat@kernel.org&gt;
Cc: Mark Rutland &lt;mark.rutland@arm.com&gt;
Cc: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Fixes: 8530dec63e7b4 ("tracing: Add tracing_check_open_get_tr()")
Tested-by: Linux Kernel Functional Testing &lt;lkft@linaro.org&gt;
Tested-by: Naresh Kamboju &lt;naresh.kamboju@linaro.org&gt;
Reported-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace</title>
<updated>2023-09-02T18:10:50+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-09-02T18:10:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=b70100f2e62aeec2087d7690e41f7d6afd445f5a'/>
<id>b70100f2e62aeec2087d7690e41f7d6afd445f5a</id>
<content type='text'>
Pull probes updates from Masami Hiramatsu:

 - kprobes: use struct_size() for variable size kretprobe_instance data
   structure.

 - eprobe: Simplify trace_eprobe list iteration.

 - probe events: Data structure field access support on BTF argument.

     - Update BTF argument support on the functions in the kernel
       loadable modules (only loaded modules are supported).

     - Move generic BTF access function (search function prototype and
       get function parameters) to a separated file.

     - Add a function to search a member of data structure in BTF.

     - Support accessing BTF data structure member from probe args by
       C-like arrow('-&gt;') and dot('.') operators. e.g.
          't sched_switch next=next-&gt;pid vruntime=next-&gt;se.vruntime'

     - Support accessing BTF data structure member from $retval. e.g.
          'f getname_flags%return +0($retval-&gt;name):string'

     - Add string type checking if BTF type info is available. This will
       reject if user specify ":string" type for non "char pointer"
       type.

     - Automatically assume the fprobe event as a function return event
       if $retval is used.

 - selftests/ftrace: Add BTF data field access test cases.

 - Documentation: Update fprobe event example with BTF data field.

* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Update fprobe event example with BTF field
  selftests/ftrace: Add BTF fields access testcases
  tracing/fprobe-event: Assume fprobe is a return event by $retval
  tracing/probes: Add string type check with BTF
  tracing/probes: Support BTF field access from $retval
  tracing/probes: Support BTF based data structure field access
  tracing/probes: Add a function to search a member of a struct/union
  tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
  tracing/probes: Support BTF argument on module functions
  tracing/eprobe: Iterate trace_eprobe directly
  kernel: kprobes: Use struct_size()
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull probes updates from Masami Hiramatsu:

 - kprobes: use struct_size() for variable size kretprobe_instance data
   structure.

 - eprobe: Simplify trace_eprobe list iteration.

 - probe events: Data structure field access support on BTF argument.

     - Update BTF argument support on the functions in the kernel
       loadable modules (only loaded modules are supported).

     - Move generic BTF access function (search function prototype and
       get function parameters) to a separated file.

     - Add a function to search a member of data structure in BTF.

     - Support accessing BTF data structure member from probe args by
       C-like arrow('-&gt;') and dot('.') operators. e.g.
          't sched_switch next=next-&gt;pid vruntime=next-&gt;se.vruntime'

     - Support accessing BTF data structure member from $retval. e.g.
          'f getname_flags%return +0($retval-&gt;name):string'

     - Add string type checking if BTF type info is available. This will
       reject if user specify ":string" type for non "char pointer"
       type.

     - Automatically assume the fprobe event as a function return event
       if $retval is used.

 - selftests/ftrace: Add BTF data field access test cases.

 - Documentation: Update fprobe event example with BTF data field.

* tag 'probes-v6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  Documentation: tracing: Update fprobe event example with BTF field
  selftests/ftrace: Add BTF fields access testcases
  tracing/fprobe-event: Assume fprobe is a return event by $retval
  tracing/probes: Add string type check with BTF
  tracing/probes: Support BTF field access from $retval
  tracing/probes: Support BTF based data structure field access
  tracing/probes: Add a function to search a member of a struct/union
  tracing/probes: Move finding func-proto API and getting func-param API to trace_btf
  tracing/probes: Support BTF argument on module functions
  tracing/eprobe: Iterate trace_eprobe directly
  kernel: kprobes: Use struct_size()
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Zero the pipe cpumask on alloc to avoid spurious -EBUSY</title>
<updated>2023-09-02T01:26:07+00:00</updated>
<author>
<name>Brian Foster</name>
<email>bfoster@redhat.com</email>
</author>
<published>2023-08-31T12:55:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3d07fa1dd19035eb0b13ae6697efd5caa9033e74'/>
<id>3d07fa1dd19035eb0b13ae6697efd5caa9033e74</id>
<content type='text'>
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:

 # cat /sys/kernel/debug/tracing/trace_pipe
 cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy

Zero the allocation of pipe_cpumask to avoid the problem.

Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com

Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The pipe cpumask used to serialize opens between the main and percpu
trace pipes is not zeroed or initialized. This can result in
spurious -EBUSY returns if underlying memory is not fully zeroed.
This has been observed by immediate failure to read the main
trace_pipe file on an otherwise newly booted and idle system:

 # cat /sys/kernel/debug/tracing/trace_pipe
 cat: /sys/kernel/debug/tracing/trace_pipe: Device or resource busy

Zero the allocation of pipe_cpumask to avoid the problem.

Link: https://lore.kernel.org/linux-trace-kernel/20230831125500.986862-1-bfoster@redhat.com

Cc: stable@vger.kernel.org
Fixes: c2489bb7e6be ("tracing: Introduce pipe_cpumask to avoid race on trace_pipes")
Reviewed-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Reviewed-by: Masami Hiramatsu (Google) &lt;mhiramat@kernel.org&gt;
Signed-off-by: Brian Foster &lt;bfoster@redhat.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Fix race issue between cpu buffer write and swap</title>
<updated>2023-09-02T01:00:00+00:00</updated>
<author>
<name>Zheng Yejian</name>
<email>zhengyejian1@huawei.com</email>
</author>
<published>2023-08-31T13:27:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3163f635b20e9e1fb4659e74f47918c9dddfe64e'/>
<id>3163f635b20e9e1fb4659e74f47918c9dddfe64e</id>
<content type='text'>
Warning happened in rb_end_commit() at code:
	if (RB_WARN_ON(cpu_buffer, !local_read(&amp;cpu_buffer-&gt;committing)))

  WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
	rb_commit+0x402/0x4a0
  Call Trace:
   ring_buffer_unlock_commit+0x42/0x250
   trace_buffer_unlock_commit_regs+0x3b/0x250
   trace_event_buffer_commit+0xe5/0x440
   trace_event_buffer_reserve+0x11c/0x150
   trace_event_raw_event_sched_switch+0x23c/0x2c0
   __traceiter_sched_switch+0x59/0x80
   __schedule+0x72b/0x1580
   schedule+0x92/0x120
   worker_thread+0xa0/0x6f0

It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:

  Write on CPU 0             Swap buffer by per_cpu/cpu0/snapshot on CPU 1
  --------                   --------
                             tracing_snapshot_write()
                               [...]

  ring_buffer_lock_reserve()
    cpu_buffer = buffer-&gt;buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
    [...]
    rb_reserve_next_event()
      [...]

                               ring_buffer_swap_cpu()
                                 if (local_read(&amp;cpu_buffer_a-&gt;committing))
                                     goto out_dec;
                                 if (local_read(&amp;cpu_buffer_b-&gt;committing))
                                     goto out_dec;
                                 buffer_a-&gt;buffers[cpu] = cpu_buffer_b;
                                 buffer_b-&gt;buffers[cpu] = cpu_buffer_a;
                                 // 2. cpu_buffer has swapped here.

      rb_start_commit(cpu_buffer);
      if (unlikely(READ_ONCE(cpu_buffer-&gt;buffer)
          != buffer)) { // 3. This check passed due to 'cpu_buffer-&gt;buffer'
        [...]           //    has not changed here.
        return NULL;
      }
                                 cpu_buffer_b-&gt;buffer = buffer_a;
                                 cpu_buffer_a-&gt;buffer = buffer_b;
                                 [...]

      // 4. Reserve event from 'cpu_buffer_a'.

  ring_buffer_unlock_commit()
    [...]
    cpu_buffer = buffer-&gt;buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
    rb_commit(cpu_buffer)
      rb_end_commit()  // 6. WARN for the wrong 'committing' state !!!

Based on above analysis, we can easily reproduce by following testcase:
  ``` bash
  #!/bin/bash

  dmesg -n 7
  sysctl -w kernel.panic_on_warn=1
  TR=/sys/kernel/tracing
  echo 7 &gt; ${TR}/buffer_size_kb
  echo "sched:sched_switch" &gt; ${TR}/set_event
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  ```

To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.

Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com

Cc: &lt;mhiramat@kernel.org&gt;
Fixes: f1affcaaa861 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Warning happened in rb_end_commit() at code:
	if (RB_WARN_ON(cpu_buffer, !local_read(&amp;cpu_buffer-&gt;committing)))

  WARNING: CPU: 0 PID: 139 at kernel/trace/ring_buffer.c:3142
	rb_commit+0x402/0x4a0
  Call Trace:
   ring_buffer_unlock_commit+0x42/0x250
   trace_buffer_unlock_commit_regs+0x3b/0x250
   trace_event_buffer_commit+0xe5/0x440
   trace_event_buffer_reserve+0x11c/0x150
   trace_event_raw_event_sched_switch+0x23c/0x2c0
   __traceiter_sched_switch+0x59/0x80
   __schedule+0x72b/0x1580
   schedule+0x92/0x120
   worker_thread+0xa0/0x6f0

It is because the race between writing event into cpu buffer and swapping
cpu buffer through file per_cpu/cpu0/snapshot:

  Write on CPU 0             Swap buffer by per_cpu/cpu0/snapshot on CPU 1
  --------                   --------
                             tracing_snapshot_write()
                               [...]

  ring_buffer_lock_reserve()
    cpu_buffer = buffer-&gt;buffers[cpu]; // 1. Suppose find 'cpu_buffer_a';
    [...]
    rb_reserve_next_event()
      [...]

                               ring_buffer_swap_cpu()
                                 if (local_read(&amp;cpu_buffer_a-&gt;committing))
                                     goto out_dec;
                                 if (local_read(&amp;cpu_buffer_b-&gt;committing))
                                     goto out_dec;
                                 buffer_a-&gt;buffers[cpu] = cpu_buffer_b;
                                 buffer_b-&gt;buffers[cpu] = cpu_buffer_a;
                                 // 2. cpu_buffer has swapped here.

      rb_start_commit(cpu_buffer);
      if (unlikely(READ_ONCE(cpu_buffer-&gt;buffer)
          != buffer)) { // 3. This check passed due to 'cpu_buffer-&gt;buffer'
        [...]           //    has not changed here.
        return NULL;
      }
                                 cpu_buffer_b-&gt;buffer = buffer_a;
                                 cpu_buffer_a-&gt;buffer = buffer_b;
                                 [...]

      // 4. Reserve event from 'cpu_buffer_a'.

  ring_buffer_unlock_commit()
    [...]
    cpu_buffer = buffer-&gt;buffers[cpu]; // 5. Now find 'cpu_buffer_b' !!!
    rb_commit(cpu_buffer)
      rb_end_commit()  // 6. WARN for the wrong 'committing' state !!!

Based on above analysis, we can easily reproduce by following testcase:
  ``` bash
  #!/bin/bash

  dmesg -n 7
  sysctl -w kernel.panic_on_warn=1
  TR=/sys/kernel/tracing
  echo 7 &gt; ${TR}/buffer_size_kb
  echo "sched:sched_switch" &gt; ${TR}/set_event
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  while [ true ]; do
          echo 1 &gt; ${TR}/per_cpu/cpu0/snapshot
  done &amp;
  ```

To fix it, IIUC, we can use smp_call_function_single() to do the swap on
the target cpu where the buffer is located, so that above race would be
avoided.

Link: https://lore.kernel.org/linux-trace-kernel/20230831132739.4070878-1-zhengyejian1@huawei.com

Cc: &lt;mhiramat@kernel.org&gt;
Fixes: f1affcaaa861 ("tracing: Add snapshot in the per_cpu trace directories")
Signed-off-by: Zheng Yejian &lt;zhengyejian1@huawei.com&gt;
Signed-off-by: Steven Rostedt (Google) &lt;rostedt@goodmis.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
