| Age | Commit message (Collapse) | Author |
|
commit 2840f84f74035e5a535959d5f17269c69fa6edc5 upstream.
The following commands will cause a memory leak:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo schedule > instance/foo/set_ftrace_filter
# rmdir instances/foo
The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.
Found by kmemleak detector.
Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3cec638b3d793b7cacdec5b8072364b41caeb0e1 upstream.
When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).
But because the error would just exit, filter was left hanging and
nothing could free it.
Found by kmemleak detector.
Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation")
Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 83f365554e47997ec68dc4eca3f5dce525cd15c3 upstream.
When reducing ring buffer size, pages are removed by scheduling a work
item on each CPU for the corresponding CPU ring buffer. After the pages
are removed from ring buffer linked list, the pages are free()d in a
tight loop. The loop does not give up CPU until all pages are removed.
In a worst case behavior, when lot of pages are to be freed, it can
cause system stall.
After the pages are removed from the list, the free() can happen while
the work is rescheduled. Call cond_resched() in the loop to prevent the
system hangup.
Link: http://lkml.kernel.org/r/20180907223129.71994-1-vnagarnaik@google.com
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Reported-by: Jason Behmer <jbehmer@google.com>
Signed-off-by: Vaibhav Nagarnaik <vnagarnaik@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 757d9140072054528b13bbe291583d9823cde195 upstream.
Masami Hiramatsu reported:
Current trace-enable attribute in sysfs returns an error
if user writes the same setting value as current one,
e.g.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
bash: echo: write error: Invalid argument
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
bash: echo: write error: Device or resource busy
But this is not a preferred behavior, it should ignore
if new setting is same as current one. This fixes the
problem as below.
# cat /sys/block/sda/trace/enable
0
# echo 0 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
# echo 1 > /sys/block/sda/trace/enable
Link: http://lkml.kernel.org/r/20180816103802.08678002@gandalf.local.home
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: linux-block@vger.kernel.org
Fixes: cd649b8bb830d ("blktrace: remove sysfs_blk_trace_enable_show/store()")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Tested-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 016f8ffc48cb01d1e7701649c728c5d2e737d295 upstream.
While debugging another bug, I was looking at all the synchronize*()
functions being used in kernel/trace, and noticed that trace_uprobes was
using synchronize_sched(), with a comment to synchronize with
{u,ret}_probe_trace_func(). When looking at those functions, the data is
protected with "rcu_read_lock()" and not with "rcu_read_lock_sched()". This
is using the wrong synchronize_*() function.
Link: http://lkml.kernel.org/r/20180809160553.469e1e32@gandalf.local.home
Fixes: 70ed91c6ec7f8 ("tracing/uprobes: Support ftrace_event_file base multibuffer")
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit f143641bfef9a4a60c57af30de26c63057e7e695 upstream.
Currently, when one echo's in 1 into tracing_on, the current tracer's
"start()" function is executed, even if tracing_on was already one. This can
lead to strange side effects. One being that if the hwlat tracer is enabled,
and someone does "echo 1 > tracing_on" into tracing_on, the hwlat tracer's
start() function is called again which will recreate another kernel thread,
and make it unable to remove the old one.
Link: http://lkml.kernel.org/r/1533120354-22923-1-git-send-email-erica.bugden@linutronix.de
Fixes: 2df8f8a6a897e ("tracing: Fix regression with irqsoff tracer and tracing_on file")
Reported-by: Erica Bugden <erica.bugden@linutronix.de>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2519c1bbe38d7acacc9aacba303ca6f97482ed53 upstream.
Commit 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on
enable_trace_kprobe() failure") added an if statement that depends on another
if statement that gcc doesn't see will initialize the "link" variable and
gives the warning:
"warning: 'link' may be used uninitialized in this function"
It is really a false positive, but to quiet the warning, and also to make
sure that it never actually is used uninitialized, initialize the "link"
variable to NULL and add an if (!WARN_ON_ONCE(!link)) where the compiler
thinks it could be used uninitialized.
Fixes: 57ea2a34adf4 ("tracing/kprobes: Fix trace_probe flags on enable_trace_kprobe() failure")
Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 15cc78644d0075e76d59476a4467e7143860f660 upstream.
There was a case that triggered a double free in event_trigger_callback()
due to the called reg() function freeing the trigger_data and then it
getting freed again by the error return by the caller. The solution there
was to up the trigger_data ref count.
Code inspection found that event_enable_trigger_func() has the same issue,
but is not as easy to trigger (requires harder to trigger failures). It
needs to be solved slightly different as it needs more to clean up when the
reg() function fails.
Link: http://lkml.kernel.org/r/20180725124008.7008e586@gandalf.local.home
Fixes: 7862ad1846e99 ("tracing: Add 'enable_event' and 'disable_event' event trigger commands")
Reivewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 57ea2a34adf40f3a6e88409aafcf803b8945619a upstream.
If enable_trace_kprobe fails to enable the probe in enable_k(ret)probe
it returns an error, but does not unset the tp flags it set previously.
This results in a probe being considered enabled and failures like being
unable to remove the probe through kprobe_events file since probes_open()
expects every probe to be disabled.
Link: http://lkml.kernel.org/r/20180725102826.8300-1-asavkov@redhat.com
Link: http://lkml.kernel.org/r/20180725142038.4765-1-asavkov@redhat.com
Cc: Ingo Molnar <mingo@redhat.com>
Fixes: 41a7dd420c57 ("tracing/kprobes: Support ftrace_event_file base multibuffer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Artem Savkov <asavkov@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 73c8d8945505acdcbae137c2e00a1232e0be709f upstream.
Maintain the tracing on/off setting of the ring_buffer when switching
to the trace buffer snapshot.
Taking a snapshot is done by swapping the backup ring buffer
(max_tr_buffer). But since the tracing on/off setting is defined
by the ring buffer, when swapping it, the tracing on/off setting
can also be changed. This causes a strange result like below:
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 0 > tracing_on
/sys/kernel/debug/tracing # cat tracing_on
0
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
1
/sys/kernel/debug/tracing # echo 1 > snapshot
/sys/kernel/debug/tracing # cat tracing_on
0
We don't touch tracing_on, but snapshot changes tracing_on
setting each time. This is an anomaly, because user doesn't know
that each "ring_buffer" stores its own tracing-enable state and
the snapshot is done by swapping ring buffers.
Link: http://lkml.kernel.org/r/153149929558.11274.11730609978254724394.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Shuah Khan <shuah@kernel.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Hiraku Toyooka <hiraku.toyooka@cybertrust.co.jp>
Fixes: debdd57f5145 ("tracing: Make a snapshot feature available from userspace")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
[ Updated commit log and comment in the code ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1863c387259b629e4ebfb255495f67cd06aa229b upstream.
Running the following:
# cd /sys/kernel/debug/tracing
# echo 500000 > buffer_size_kb
[ Or some other number that takes up most of memory ]
# echo snapshot > events/sched/sched_switch/trigger
Triggers the following bug:
------------[ cut here ]------------
kernel BUG at mm/slub.c:296!
invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
CPU: 6 PID: 6878 Comm: bash Not tainted 4.18.0-rc6-test+ #1066
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:kfree+0x16c/0x180
Code: 05 41 0f b6 72 51 5b 5d 41 5c 4c 89 d7 e9 ac b3 f8 ff 48 89 d9 48 89 da 41 b8 01 00 00 00 5b 5d 41 5c 4c 89 d6 e9 f4 f3 ff ff <0f> 0b 0f 0b 48 8b 3d d9 d8 f9 00 e9 c1 fe ff ff 0f 1f 40 00 0f 1f
RSP: 0018:ffffb654436d3d88 EFLAGS: 00010246
RAX: ffff91a9d50f3d80 RBX: ffff91a9d50f3d80 RCX: ffff91a9d50f3d80
RDX: 00000000000006a4 RSI: ffff91a9de5a60e0 RDI: ffff91a9d9803500
RBP: ffffffff8d267c80 R08: 00000000000260e0 R09: ffffffff8c1a56be
R10: fffff0d404543cc0 R11: 0000000000000389 R12: ffffffff8c1a56be
R13: ffff91a9d9930e18 R14: ffff91a98c0c2890 R15: ffffffff8d267d00
FS: 00007f363ea64700(0000) GS:ffff91a9de580000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055c1cacc8e10 CR3: 00000000d9b46003 CR4: 00000000001606e0
Call Trace:
event_trigger_callback+0xee/0x1d0
event_trigger_write+0xfc/0x1a0
__vfs_write+0x33/0x190
? handle_mm_fault+0x115/0x230
? _cond_resched+0x16/0x40
vfs_write+0xb0/0x190
ksys_write+0x52/0xc0
do_syscall_64+0x5a/0x160
entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f363e16ab50
Code: 73 01 c3 48 8b 0d 38 83 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 83 3d 79 db 2c 00 00 75 10 b8 01 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 1e e3 01 00 48 89 04 24
RSP: 002b:00007fff9a4c6378 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007f363e16ab50
RDX: 0000000000000009 RSI: 000055c1cacc8e10 RDI: 0000000000000001
RBP: 000055c1cacc8e10 R08: 00007f363e435740 R09: 00007f363ea64700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000000000000009
R13: 0000000000000001 R14: 00007f363e4345e0 R15: 00007f363e4303c0
Modules linked in: ip6table_filter ip6_tables snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic snd_hda_intel snd_hda_codec snd_hwdep snd_hda_core snd_seq snd_seq_device i915 snd_pcm snd_timer i2c_i801 snd soundcore i2c_algo_bit drm_kms_helper
86_pkg_temp_thermal video kvm_intel kvm irqbypass wmi e1000e
---[ end trace d301afa879ddfa25 ]---
The cause is because the register_snapshot_trigger() call failed to
allocate the snapshot buffer, and then called unregister_trigger()
which freed the data that was passed to it. Then on return to the
function that called register_snapshot_trigger(), as it sees it
failed to register, it frees the trigger_data again and causes
a double free.
By calling event_trigger_init() on the trigger_data (which only ups
the reference counter for it), and then event_trigger_free() afterward,
the trigger_data would not get freed by the registering trigger function
as it would only up and lower the ref count for it. If the register
trigger function fails, then the event_trigger_free() called after it
will free the trigger data normally.
Link: http://lkml.kernel.org/r/20180724191331.738eb819@gandalf.local.home
Cc: stable@vger.kerne.org
Fixes: 93e31ffbf417 ("tracing: Add 'snapshot' event trigger command")
Reported-by: Masami Hiramatsu <mhiramat@kernel.org>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1fe4293f4b8de75824935f8d8e9a99c7fc6873da upstream.
The function_graph tracer does not show the interrupt return marker for the
leaf entry. On leaf entries, we see an unbalanced interrupt marker (the
interrupt was entered, but nevern left).
Before:
1) | SyS_write() {
1) | __fdget_pos() {
1) 0.061 us | __fget_light();
1) 0.289 us | }
1) | vfs_write() {
1) 0.049 us | rw_verify_area();
1) + 15.424 us | __vfs_write();
1) ==========> |
1) 6.003 us | smp_apic_timer_interrupt();
1) 0.055 us | __fsnotify_parent();
1) 0.073 us | fsnotify();
1) + 23.665 us | }
1) + 24.501 us | }
After:
0) | SyS_write() {
0) | __fdget_pos() {
0) 0.052 us | __fget_light();
0) 0.328 us | }
0) | vfs_write() {
0) 0.057 us | rw_verify_area();
0) | __vfs_write() {
0) ==========> |
0) 8.548 us | smp_apic_timer_interrupt();
0) <========== |
0) + 36.507 us | } /* __vfs_write */
0) 0.049 us | __fsnotify_parent();
0) 0.066 us | fsnotify();
0) + 50.064 us | }
0) + 50.952 us | }
Link: http://lkml.kernel.org/r/1517413729-20411-1-git-send-email-changbin.du@intel.com
Fixes: f8b755ac8e0cc ("tracing/function-graph-tracer: Output arrows signal on hardirq call/return")
Signed-off-by: Changbin Du <changbin.du@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16: Propagate return of TRACE_TYPE_PARTIAL_LINE from
print_graph_irq()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 86b389ff22bd6ad8fd3cb98e41cd271886c6d023 upstream.
If a instance has an event trigger enabled when it is freed, it could cause
an access of free memory. Here's the case that crashes:
# cd /sys/kernel/tracing
# mkdir instances/foo
# echo snapshot > instances/foo/events/initcall/initcall_start/trigger
# rmdir instances/foo
Would produce:
general protection fault: 0000 [#1] PREEMPT SMP PTI
Modules linked in: tun bridge ...
CPU: 5 PID: 6203 Comm: rmdir Tainted: G W 4.17.0-rc4-test+ #933
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016
RIP: 0010:clear_event_triggers+0x3b/0x70
RSP: 0018:ffffc90003783de0 EFLAGS: 00010286
RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b2b RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8800c7130ba0
RBP: ffffc90003783e00 R08: ffff8801131993f8 R09: 0000000100230016
R10: ffffc90003783d80 R11: 0000000000000000 R12: ffff8800c7130ba0
R13: ffff8800c7130bd8 R14: ffff8800cc093768 R15: 00000000ffffff9c
FS: 00007f6f4aa86700(0000) GS:ffff88011eb40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f6f4a5aed60 CR3: 00000000cd552001 CR4: 00000000001606e0
Call Trace:
event_trace_del_tracer+0x2a/0xc5
instance_rmdir+0x15c/0x200
tracefs_syscall_rmdir+0x52/0x90
vfs_rmdir+0xdb/0x160
do_rmdir+0x16d/0x1c0
__x64_sys_rmdir+0x17/0x20
do_syscall_64+0x55/0x1a0
entry_SYSCALL_64_after_hwframe+0x49/0xbe
This was due to the call the clears out the triggers when an instance is
being deleted not removing the trigger from the link list.
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit dc432c3d7f9bceb3de6f5b44fb9c657c9810ed6d upstream.
The regex match function regex_match_front() in the tracing filter logic,
was fixed to test just the pattern length from testing the entire test
string. That is, it went from strncmp(str, r->pattern, len) to
strcmp(str, r->pattern, r->len).
The issue is that str is not guaranteed to be nul terminated, and if r->len
is greater than the length of str, it can access more memory than is
allocated.
The solution is to add a simple test if (len < r->len) return 0.
Fixes: 285caad415f45 ("tracing/filters: Fix MATCH_FRONT_ONLY filter matching")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 0c92c7a3c5d416f47b32c5f20a611dfeca5d5f2e upstream.
As Miklos reported and suggested:
This pattern repeats two times in trace_uprobe.c and in
kernel/events/core.c as well:
ret = kern_path(filename, LOOKUP_FOLLOW, &path);
if (ret)
goto fail_address_parse;
inode = igrab(d_inode(path.dentry));
path_put(&path);
And it's wrong. You can only hold a reference to the inode if you
have an active ref to the superblock as well (which is normally
through path.mnt) or holding s_umount.
This way unmounting the containing filesystem while the tracepoint is
active will give you the "VFS: Busy inodes after unmount..." message
and a crash when the inode is finally put.
Solution: store path instead of inode.
This patch fixes two instances in trace_uprobe.c. struct path is added to
struct trace_uprobe to keep the inode and containing mount point
referenced.
Link: http://lkml.kernel.org/r/20180423172135.4050588-1-songliubraving@fb.com
Fixes: f3f096cfedf8 ("tracing: Provide trace events interface for uprobes")
Fixes: 33ea4b24277b ("perf/core: Implement the 'perf_uprobe' PMU")
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Howard McLauchlan <hmclauchlan@fb.com>
Cc: Josef Bacik <jbacik@fb.com>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Miklos Szeredi <mszeredi@redhat.com>
Reported-by: Miklos Szeredi <miklos@szeredi.hu>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16:
- Open-code d_real_inode(), d_is_reg()
- Drop changes in create_local_trace_uprobe()]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 16a8ef2751801346f1f76a18685b2beb63cd170f upstream.
The iput() function tests whether its argument is NULL and then
returns immediately. Thus the test around the call is not needed.
This issue was detected by using the Coccinelle software.
Link: http://lkml.kernel.org/r/5468F875.7080907@users.sourceforge.net
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 6496bb72bf20c1c7e4d6be44dfa663163e709116 upstream.
Previously, `create_trace_uprobe` found the *first* occurence
of the ':' character when parsing `PATH:OFFSET` for a uprobe.
However, if the path contains a ':' character, then the function
would parse the path incorrectly. Even worse, if the path does not
exist, the subsequent call to `kern_path()` would set `ret` to
`ENOENT`, leading to very cryptic errno values in user space.
The fix is to find the *last* occurence of ':'.
How to repro:: The write fails with "No such file or directory", suggesting
incorrectly that the `uprobe_events` file does not exist.
$ mkdir testing && cd testing
$ cp /bin/bash .
$ cp /bin/bash ./bash:with:colon
$ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this works
$ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this doesn't
-bash: echo: write error: No such file or directory
With the patch:
$ echo "p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x6" > /sys/kernel/debug/tracing/uprobe_events # this still works
$ echo "p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x6" >> /sys/kernel/debug/tracing/uprobe_events # this works now too!
$ cat /sys/kernel/debug/tracing/uprobe_events
p:uprobes/p__root_testing_bash_0x6 /root/testing/bash:0x0000000000000006
p:uprobes/p__root_testing_bash_with_colon_0x6 /root/testing/bash:with:colon:0x0000000000000006
Link: http://lkml.kernel.org/r/20170113165834.4081016-1-kennyyu@fb.com
Signed-off-by: Kenny Yu <kennyyu@fb.com>
Reviewed-by: Omar Sandoval <osandov@fb.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5ba8a4a96f6eaa6af88e24c7794f142217aa3b6f upstream.
It's useless. Before:
[tracing]# echo 'p:test /a:0x0' >> uprobe_events
[tracing]# echo 'p:test a:0x0' >> uprobe_events
-bash: echo: write error: No such file or directory
[tracing]# echo 'p:test 1:0x0' >> uprobe_events
-bash: echo: write error: Invalid argument
After:
[tracing]# echo 'p:test 1:0x0' >> uprobe_events
-bash: echo: write error: No such file or directory
Link: http://lkml.kernel.org/r/20160825152110.25663-3-dsafonov@virtuozzo.com
Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 50268a3d266ecfdd6c5873d62b2758d9732fc598 upstream.
Fix string fetch function to terminate with NUL.
It is OK to drop the rest of string.
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: security@kernel.org
Cc: 范龙飞 <long7573@126.com>
Fixes: 5baaa59ef09e ("tracing/probes: Implement 'memory' fetch method for uprobes")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit c5d343b6b7badd1f5fe0873eff2e8d63a193e732 upstream.
In Documentation/trace/kprobetrace.txt, it says
@SYM[+|-offs] : Fetch memory at SYM +|- offs (SYM should be a data symbol)
However, the parser doesn't parse minus offset correctly, since
commit 2fba0c8867af ("tracing/kprobes: Fix probe offset to be
unsigned") drops minus ("-") offset support for kprobe probe
address usage.
This fixes the traceprobe_split_symbol_offset() to parse minus
offset again with checking the offset range, and add a minus
offset check in kprobe probe address usage.
Link: http://lkml.kernel.org/r/152129028983.31874.13419301530285775521.stgit@devbox
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Fixes: 2fba0c8867af ("tracing/kprobes: Fix probe offset to be unsigned")
Acked-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 4397f04575c44e1440ec2e49b6302785c95fd2f8 upstream.
Jing Xia and Chunyan Zhang reported that on failing to allocate part of the
tracing buffer, memory is freed, but the pointers that point to them are not
initialized back to NULL, and later paths may try to free the freed memory
again. Jing and Chunyan fixed one of the locations that does this, but
missed a spot.
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Reported-by: Jing Xia <jing.xia@spreadtrum.com>
Reported-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 24f2aaf952ee0b59f31c3a18b8b36c9e3d3c2cf5 upstream.
Double free of the ring buffer happens when it fails to alloc new
ring buffer instance for max_buffer if TRACER_MAX_TRACE is configured.
The root cause is that the pointer is not set to NULL after the buffer
is freed in allocate_trace_buffers(), and the freeing of the ring
buffer is invoked again later if the pointer is not equal to Null,
as:
instance_mkdir()
|-allocate_trace_buffers()
|-allocate_trace_buffer(tr, &tr->trace_buffer...)
|-allocate_trace_buffer(tr, &tr->max_buffer...)
// allocate fail(-ENOMEM),first free
// and the buffer pointer is not set to null
|-ring_buffer_free(tr->trace_buffer.buffer)
// out_free_tr
|-free_trace_buffers()
|-free_trace_buffer(&tr->trace_buffer);
//if trace_buffer is not null, free again
|-ring_buffer_free(buf->buffer)
|-rb_free_cpu_buffer(buffer->buffers[cpu])
// ring_buffer_per_cpu is null, and
// crash in ring_buffer_per_cpu->pages
Link: http://lkml.kernel.org/r/20171226071253.8968-1-chunyan.zhang@spreadtrum.com
Fixes: 737223fbca3b1 ("tracing: Consolidate buffer allocation code")
Signed-off-by: Jing Xia <jing.xia@spreadtrum.com>
Signed-off-by: Chunyan Zhang <chunyan.zhang@spreadtrum.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 45d8b80c2ac5d21cd1e2954431fb676bc2b1e099 upstream.
Two info bits were added to the "commit" part of the ring buffer data page
when returned to be consumed. This was to inform the user space readers that
events have been missed, and that the count may be stored at the end of the
page.
What wasn't handled, was the splice code that actually called a function to
return the length of the data in order to zero out the rest of the page
before sending it up to user space. These data bits were returned with the
length making the value negative, and that negative value was not checked.
It was compared to PAGE_SIZE, and only used if the size was less than
PAGE_SIZE. Luckily PAGE_SIZE is unsigned long which made the compare an
unsigned compare, meaning the negative size value did not end up causing a
large portion of memory to be randomly zeroed out.
Fixes: 66a8cb95ed040 ("ring-buffer: Add place holder recording of dropped events")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 2967acbb257a6a9bf912f4778b727e00972eac9b upstream.
A previous commit changed the locking around registration/cleanup,
but direct callers of blk_trace_remove() were missed. This means
that if we hit the error path in setup, we will deadlock on
attempting to re-acquire the queue trace mutex.
Fixes: 1f2cac107c59 ("blktrace: fix unlocked access to init/start-stop/teardown")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1f2cac107c591c24b60b115d6050adc213d10fc0 upstream.
sg.c calls into the blktrace functions without holding the proper queue
mutex for doing setup, start/stop, or teardown.
Add internal unlocked variants, and export the ones that do the proper
locking.
Fixes: 6da127ad0918 ("blktrace: Add blktrace ioctls to SCSI generic devices")
Tested-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 5acb3cc2c2e9d3020a4fee43763c6463767f1572 upstream.
The lockdep code had reported the following unsafe locking scenario:
CPU0 CPU1
---- ----
lock(s_active#228);
lock(&bdev->bd_mutex/1);
lock(s_active#228);
lock(&bdev->bd_mutex);
*** DEADLOCK ***
The deadlock may happen when one task (CPU1) is trying to delete a
partition in a block device and another task (CPU0) is accessing
tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
partition.
The s_active isn't an actual lock. It is a reference count (kn->count)
on the sysfs (kernfs) file. Removal of a sysfs file, however, require
a wait until all the references are gone. The reference count is
treated like a rwsem using lockdep instrumentation code.
The fact that a thread is in the sysfs callback method or in the
ioctl call means there is a reference to the opended sysfs or device
file. That should prevent the underlying block structure from being
removed.
Instead of using bd_mutex in the block_device structure, a new
blk_trace_mutex is now added to the request_queue structure to protect
access to the blk_trace structure.
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Waiman Long <longman@redhat.com>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Fix typo in patch subject line, and prune a comment detailing how
the code used to work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 75df6e688ccd517e339a7c422ef7ad73045b18a2 upstream.
When reading data from trace_pipe, tracing_wait_pipe() performs a
check to see if tracing has been turned off after some data was read.
Currently, this check always looks at global trace state, but it
should be checking the trace instance where trace_pipe is located at.
Because of this bug, cat instances/i1/trace_pipe in the following
script will immediately exit instead of waiting for data:
cd /sys/kernel/debug/tracing
echo 0 > tracing_on
mkdir -p instances/i1
echo 1 > instances/i1/tracing_on
echo 1 > instances/i1/events/sched/sched_process_exec/enable
cat instances/i1/trace_pipe
Link: http://lkml.kernel.org/r/20170917102348.1615-1-tahsin@google.com
Fixes: 10246fa35d4f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Tahsin Erdogan <tahsin@google.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.
One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.
Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.
Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com
Cc: <mingo@redhat.com>
Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
Signed-off-by: Bo Yan <byan@nvidia.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.
Currently trace_clock timestamps are applied to both regular and max
buffers only for global trace. For instance trace, trace_clock
timestamps are applied only to regular buffer. But, regular and max
buffers can be swapped, for example, following a snapshot. So, for
instance trace, bad timestamps can be seen following a snapshot.
Let's apply trace_clock timestamps to instance max buffer as well.
Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
Signed-off-by: Baohong Liu <baohong.liu@intel.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 46320a6acc4fb58f04bcf78c4c942cc43b20f986 upstream.
In the second iteration of trace_selftest_ops(), the error goto label is
wrong in the case where trace_selftest_test_global_cnt is off. In the
case of error, it leaks the dynamic ops that was allocated.
Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 86038c5ea81b519a8a1fcfcd5e4599aab0cdd119 upstream.
Both Linus (most recent) and Steve (a while ago) reported that perf
related callbacks have massive stack bloat.
The problem is that software events need a pt_regs in order to
properly report the event location and unwind stack. And because we
could not assume one was present we allocated one on stack and filled
it with minimal bits required for operation.
Now, pt_regs is quite large, so this is undesirable. Furthermore it
turns out that most sites actually have a pt_regs pointer available,
making this even more onerous, as the stack space is pointless waste.
This patch addresses the problem by observing that software events
have well defined nesting semantics, therefore we can use static
per-cpu storage instead of on-stack.
Linus made the further observation that all but the scheduler callers
of perf_sw_event() have a pt_regs available, so we change the regular
perf_sw_event() to require a valid pt_regs (where it used to be
optional) and add perf_sw_event_sched() for the scheduler.
We have a scheduler specific call instead of a more generic _noregs()
like construct because we can assume non-recursion from the scheduler
and thereby simplify the code further (_noregs would have to put the
recursion context call inline in order to assertain which __perf_regs
element to use).
One last note on the implementation of perf_trace_buf_prepare(); we
allow .regs = NULL for those cases where we already have a pt_regs
pointer available and do not need another.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Javi Merino <javi.merino@arm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Petr Mladek <pmladek@suse.cz>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Tom Zanussi <tom.zanussi@linux.intel.com>
Cc: Vaibhav Nagarnaik <vnagarnaik@google.com>
Link: http://lkml.kernel.org/r/20141216115041.GW3337@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 8b0db1a5bdfcee0dbfa89607672598ae203c9045 upstream.
Performing the following task with kmemleak enabled:
# cd /sys/kernel/tracing/events/irq/irq_handler_entry/
# echo 'enable_event:kmem:kmalloc:3 if irq >' > trigger
# echo 'enable_event:kmem:kmalloc:3 if irq > 31' > trigger
# echo scan > /sys/kernel/debug/kmemleak
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800b9290308 (size 32):
comm "bash", pid 1114, jiffies 4294848451 (age 141.139s)
hex dump (first 32 bytes):
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81cef5aa>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81357938>] kmem_cache_alloc_trace+0x158/0x290
[<ffffffff81261c09>] create_filter_start.constprop.28+0x99/0x940
[<ffffffff812639c9>] create_filter+0xa9/0x160
[<ffffffff81263bdc>] create_event_filter+0xc/0x10
[<ffffffff812655e5>] set_trigger_filter+0xe5/0x210
[<ffffffff812660c4>] event_enable_trigger_func+0x324/0x490
[<ffffffff812652e2>] event_trigger_write+0x1a2/0x260
[<ffffffff8138cf87>] __vfs_write+0xd7/0x380
[<ffffffff8138f421>] vfs_write+0x101/0x260
[<ffffffff8139187b>] SyS_write+0xab/0x130
[<ffffffff81cfd501>] entry_SYSCALL_64_fastpath+0x1f/0xbe
[<ffffffffffffffff>] 0xffffffffffffffff
The function create_filter() is passed a 'filterp' pointer that gets
allocated, and if "set_str" is true, it is up to the caller to free it, even
on error. The problem is that the pointer is not freed by create_filter()
when set_str is false. This is a bug, and it is not up to the caller to free
the filter on error if it doesn't care about the string.
Link: http://lkml.kernel.org/r/1502705898-27571-2-git-send-email-chuhu@redhat.com
Fixes: 38b78eb85 ("tracing: Factorize filter creation")
Reported-by: Chunyu Hu <chuhu@redhat.com>
Tested-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit db9108e054700c96322b0f0028546aa4e643cf0b upstream.
Hit the kmemleak when executing instance_rmdir, it forgot releasing
mem of tracing_cpumask. With this fix, the warn does not appear any
more.
unreferenced object 0xffff93a8dfaa7c18 (size 8):
comm "mkdir", pid 1436, jiffies 4294763622 (age 9134.308s)
hex dump (first 8 bytes):
ff ff ff ff ff ff ff ff ........
backtrace:
[<ffffffff88b6567a>] kmemleak_alloc+0x4a/0xa0
[<ffffffff8861ea41>] __kmalloc_node+0xf1/0x280
[<ffffffff88b505d3>] alloc_cpumask_var_node+0x23/0x30
[<ffffffff88b5060e>] alloc_cpumask_var+0xe/0x10
[<ffffffff88571ab0>] instance_mkdir+0x90/0x240
[<ffffffff886e5100>] tracefs_syscall_mkdir+0x40/0x70
[<ffffffff886565c9>] vfs_mkdir+0x109/0x1b0
[<ffffffff8865b1d0>] SyS_mkdir+0xd0/0x100
[<ffffffff88403857>] do_syscall_64+0x67/0x150
[<ffffffff88b710e7>] return_from_SYSCALL_64+0x0/0x6a
[<ffffffffffffffff>] 0xffffffffffffffff
Link: http://lkml.kernel.org/r/1500546969-12594-1-git-send-email-chuhu@redhat.com
Fixes: ccfe9e42e451 ("tracing: Make tracing_cpumask available for all instances")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9e52b32567126fe146f198971364f68d3bc5233f upstream.
Always try to parse an address, since kstrtoul() will safely fail when
given a symbol as input. If that fails (which will be the case for a
symbol), try to parse a symbol instead.
This allows creating a probe such as:
p:probe/vlan_gro_receive 8021q:vlan_gro_receive+0
Which is necessary for this command to work:
perf probe -m 8021q -a vlan_gro_receive
Link: http://lkml.kernel.org/r/fd72d666f45b114e2c5b9cf7e27b91de1ec966f1.1498122881.git.sd@queasysnail.net
Fixes: 413d37d1e ("tracing: Add kprobe-based event tracer")
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16: preserve the check that an addresses isn't used for
a kretprobe]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 30e7d894c1478c88d50ce94ddcdbd7f9763d9cdd upstream.
Enabling the tracer selftest triggers occasionally the warning in
text_poke(), which warns when the to be modified page is not marked
reserved.
The reason is that the tracer selftest installs kprobes on functions marked
__init for testing. These probes are removed after the tests, but that
removal schedules the delayed kprobes_optimizer work, which will do the
actual text poke. If the work is executed after the init text is freed,
then the warning triggers. The bug can be reproduced reliably when the work
delay is increased.
Flush the optimizer work and wait for the optimizing/unoptimizing lists to
become empty before returning from the kprobes tracer selftest. That
ensures that all operations which were queued due to the probes removal
have completed.
Link: http://lkml.kernel.org/r/20170516094802.76a468bb@gandalf.local.home
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Masami Hiramatsu <mhiramat@kernel.org>
Fixes: 6274de498 ("kprobes: Support delayed unoptimizing")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 82cc4fc2e70ec5baeff8f776f2773abc8b2cc0ae upstream.
When two function probes are added to set_ftrace_filter, and then one of
them is removed, the update to the function locations is not performed, and
the record keeping of the function states are corrupted, and causes an
ftrace_bug() to occur.
This is easily reproducable by adding two probes, removing one, and then
adding it back again.
# cd /sys/kernel/debug/tracing
# echo schedule:traceoff > set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
# echo \!do_IRQ:traceoff > /debug/tracing/set_ftrace_filter
# echo do_IRQ:traceoff > set_ftrace_filter
Causes:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1098 at kernel/trace/ftrace.c:2369 ftrace_get_addr_curr+0x143/0x220
Modules linked in: [...]
CPU: 2 PID: 1098 Comm: bash Not tainted 4.10.0-test+ #405
Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v02.05 05/07/2012
Call Trace:
dump_stack+0x68/0x9f
__warn+0x111/0x130
? trace_irq_work_interrupt+0xa0/0xa0
warn_slowpath_null+0x1d/0x20
ftrace_get_addr_curr+0x143/0x220
? __fentry__+0x10/0x10
ftrace_replace_code+0xe3/0x4f0
? ftrace_int3_handler+0x90/0x90
? printk+0x99/0xb5
? 0xffffffff81000000
ftrace_modify_all_code+0x97/0x110
arch_ftrace_update_code+0x10/0x20
ftrace_run_update_code+0x1c/0x60
ftrace_run_modify_code.isra.48.constprop.62+0x8e/0xd0
register_ftrace_function_probe+0x4b6/0x590
? ftrace_startup+0x310/0x310
? debug_lockdep_rcu_enabled.part.4+0x1a/0x30
? update_stack_state+0x88/0x110
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? preempt_count_sub+0x18/0xd0
? mutex_lock_nested+0x104/0x800
? ftrace_regex_write.isra.43.part.44+0x1d3/0x320
? __unwind_start+0x1c0/0x1c0
? _mutex_lock_nest_lock+0x800/0x800
ftrace_trace_probe_callback.isra.3+0xc0/0x130
? func_set_flag+0xe0/0xe0
? __lock_acquire+0x642/0x1790
? __might_fault+0x1e/0x20
? trace_get_user+0x398/0x470
? strcmp+0x35/0x60
ftrace_trace_onoff_callback+0x48/0x70
ftrace_regex_write.isra.43.part.44+0x251/0x320
? match_records+0x420/0x420
ftrace_filter_write+0x2b/0x30
__vfs_write+0xd7/0x330
? do_loop_readv_writev+0x120/0x120
? locks_remove_posix+0x90/0x2f0
? do_lock_file_wait+0x160/0x160
? __lock_is_held+0x93/0x100
? rcu_read_lock_sched_held+0x5c/0xb0
? preempt_count_sub+0x18/0xd0
? __sb_start_write+0x10a/0x230
? vfs_write+0x222/0x240
vfs_write+0xef/0x240
SyS_write+0xab/0x130
? SyS_read+0x130/0x130
? trace_hardirqs_on_caller+0x182/0x280
? trace_hardirqs_on_thunk+0x1a/0x1c
entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x7fe61c157c30
RSP: 002b:00007ffe87890258 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
RAX: ffffffffffffffda RBX: ffffffff8114a410 RCX: 00007fe61c157c30
RDX: 0000000000000010 RSI: 000055814798f5e0 RDI: 0000000000000001
RBP: ffff8800c9027f98 R08: 00007fe61c422740 R09: 00007fe61ca53700
R10: 0000000000000073 R11: 0000000000000246 R12: 0000558147a36400
R13: 00007ffe8788f160 R14: 0000000000000024 R15: 00007ffe8788f15c
? trace_hardirqs_off_caller+0xc0/0x110
---[ end trace 99fa09b3d9869c2c ]---
Bad trampoline accounting at: ffffffff81cc3b00 (do_IRQ+0x0/0x150)
Fixes: 59df055f1991 ("ftrace: trace different functions with a different tracer")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
[bwh: Backported to 3.16:
- Use ftrace_run_update_code() instead of ftrace_run_modify_code(), and
don't define old_hash
- Adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream.
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
But instead it just showed an empty buffer:
># cat snapshot
# tracer: nop
#
# entries-in-buffer/entries-written: 0/0 #P:4
#
# _-----=> irqs-off
# / _----=> need-resched
# | / _---=> hardirq/softirq
# || / _--=> preempt-depth
# ||| / delay
# TASK-PID CPU# |||| TIMESTAMP FUNCTION
# | | | |||| | |
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream.
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.
Also add a check of the return status of alloc_snapshot().
Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream.
In case of error, the function kthread_run() returns ERR_PTR()
and never returns NULL. The NULL test in the return value check
should be replaced with IS_ERR().
Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com
Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest")
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit e09e28671cda63e6308b31798b997639120e2a21 upstream.
Strcpy is inherently not safe, and strlcpy() should be used instead.
__trace_find_cmdline() uses strcpy() because the comms saved must have a
terminating nul character, but it doesn't hurt to add the extra protection
of using strlcpy() instead of strcpy().
Link: http://lkml.kernel.org/r/1493806274-13936-1-git-send-email-amit.pundir@linaro.org
Signed-off-by: Amey Telawane <ameyt@codeaurora.org>
[AmitP: Cherry-picked this commit from CodeAurora kernel/msm-3.10
https://source.codeaurora.org/quic/la/kernel/msm-3.10/commit/?id=2161ae9a70b12cf18ac8e5952a20161ffbccb477]
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
[ Updated change log and removed the "- 1" from len parameter ]
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 794de08a16cf1fc1bf785dc48f66d36218cf6d88 upstream.
Both the wakeup and irqsoff tracers can use the function graph tracer when
the display-graph option is set. The problem is that they ignore the notrace
file, and record the entry of functions that would be ignored by the
function_graph tracer. This causes the trace->depth to be recorded into the
ring buffer. The set_graph_notrace uses a trick by adding a large negative
number to the trace->depth when a graph function is to be ignored.
On trace output, the graph function uses the depth to record a stack of
functions. But since the depth is negative, it accesses the array with a
negative number and causes an out of bounds access that can cause a kernel
oops or corrupt data.
Have the print functions handle cases where a tracer still records functions
even when they are in set_graph_notrace.
Also add warnings if the depth is below zero before accessing the array.
Note, the function graph logic will still prevent the return of these
functions from being recorded, which means that they will be left hanging
without a return. For example:
# echo '*spin*' > set_graph_notrace
# echo 1 > options/display-graph
# echo wakeup > current_tracer
# cat trace
[...]
_raw_spin_lock() {
preempt_count_add() {
do_raw_spin_lock() {
update_rq_clock();
Where it should look like:
_raw_spin_lock() {
preempt_count_add();
do_raw_spin_lock();
}
update_rq_clock();
Cc: Namhyung Kim <namhyung.kim@lge.com>
Fixes: 29ad23b00474 ("ftrace: Add set_graph_notrace filter")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream.
The iter->seq can be reset outside the protection of the mutex. So can
reading of user data. Move the mutex up to the beginning of the function.
Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 70c8217acd4383e069fe1898bbad36ea4fcdbdcc upstream.
If a task uses a non constant string for the format parameter in
trace_printk(), then the trace_printk_fmt variable is set to NULL. This
variable is then saved in the __trace_printk_fmt section.
The function hold_module_trace_bprintk_format() checks to see if duplicate
formats are used by modules, and reuses them if so (saves them to the list
if it is new). But this function calls lookup_format() that does a strcmp()
to the value (which is now NULL) and can cause a kernel oops.
This wasn't an issue till 3debb0a9ddb ("tracing: Fix trace_printk() to print
when not using bprintk()") which added "__used" to the trace_printk_fmt
variable, and before that, the kernel simply optimized it out (no NULL value
was saved).
The fix is simply to handle the NULL pointer in lookup_format() and have the
caller ignore the value if it was NULL.
Link: http://lkml.kernel.org/r/1464769870-18344-1-git-send-email-zhengjun.xing@intel.com
Reported-by: xingzhen <zhengjun.xing@intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Fixes: 3debb0a9ddb ("tracing: Fix trace_printk() to print when not using bprintk()")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 59643d1535eb220668692a5359de22545af579f6 upstream.
If the size passed to ring_buffer_resize() is greater than MAX_LONG - BUF_PAGE_SIZE
then the DIV_ROUND_UP() will return zero.
Here's the details:
# echo 18014398509481980 > /sys/kernel/debug/tracing/buffer_size_kb
tracing_entries_write() processes this and converts kb to bytes.
18014398509481980 << 10 = 18446744073709547520
and this is passed to ring_buffer_resize() as unsigned long size.
size = DIV_ROUND_UP(size, BUF_PAGE_SIZE);
Where DIV_ROUND_UP(a, b) is (a + b - 1)/b
BUF_PAGE_SIZE is 4080 and here
18446744073709547520 + 4080 - 1 = 18446744073709551599
where 18446744073709551599 is still smaller than 2^64
2^64 - 18446744073709551599 = 17
But now 18446744073709551599 / 4080 = 4521260802379792
and size = size * 4080 = 18446744073709551360
This is checked to make sure its still greater than 2 * 4080,
which it is.
Then we convert to the number of buffer pages needed.
nr_page = DIV_ROUND_UP(size, BUF_PAGE_SIZE)
but this time size is 18446744073709551360 and
2^64 - (18446744073709551360 + 4080 - 1) = -3823
Thus it overflows and the resulting number is less than 4080, which makes
3823 / 4080 = 0
an nr_pages is set to this. As we already checked against the minimum that
nr_pages may be, this causes the logic to fail as well, and we crash the
kernel.
There's no reason to have the two DIV_ROUND_UP() (that's just result of
historical code changes), clean up the code and fix this bug.
Fixes: 83f40318dab00 ("ring-buffer: Make removal of ring buffer pages atomic")
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 9b94a8fba501f38368aef6ac1b30e7335252a220 upstream.
The size variable to change the ring buffer in ftrace is a long. The
nr_pages used to update the ring buffer based on the size is int. On 64 bit
machines this can cause an overflow problem.
For example, the following will cause the ring buffer to crash:
# cd /sys/kernel/debug/tracing
# echo 10 > buffer_size_kb
# echo 8556384240 > buffer_size_kb
Then you get the warning of:
WARNING: CPU: 1 PID: 318 at kernel/trace/ring_buffer.c:1527 rb_update_pages+0x22f/0x260
Which is:
RB_WARN_ON(cpu_buffer, nr_removed);
Note each ring buffer page holds 4080 bytes.
This is because:
1) 10 causes the ring buffer to have 3 pages.
(10kb requires 3 * 4080 pages to hold)
2) (2^31 / 2^10 + 1) * 4080 = 8556384240
The value written into buffer_size_kb is shifted by 10 and then passed
to ring_buffer_resize(). 8556384240 * 2^10 = 8761737461760
3) The size passed to ring_buffer_resize() is then divided by BUF_PAGE_SIZE
which is 4080. 8761737461760 / 4080 = 2147484672
4) nr_pages is subtracted from the current nr_pages (3) and we get:
2147484669. This value is saved in a signed integer nr_pages_to_update
5) 2147484669 is greater than 2^31 but smaller than 2^32, a signed int
turns into the value of -2147482627
6) As the value is a negative number, in update_pages_handler() it is
negated and passed to rb_remove_pages() and 2147482627 pages will
be removed, which is much larger than 3 and it causes the warning
because not all the pages asked to be removed were removed.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=118001
Fixes: 7a8e76a3829f1 ("tracing: unified trace buffer")
Reported-by: Hao Qin <QEver.cn@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 854145e0a8e9a05f7366d240e2f99d9c1ca6d6dd upstream.
Currently register functions for events will be called
through the 'reg' field of event class directly without
any check when seting up triggers.
Triggers for events that don't support register through
debug fs (events under events/ftrace are for trace-cmd to
read event format, and most of them don't have a register
function except events/ftrace/functionx) can't be enabled
at all, and an oops will be hit when setting up trigger
for those events, so just not creating them is an easy way
to avoid the oops.
Link: http://lkml.kernel.org/r/1462275274-3911-1-git-send-email-chuhu@redhat.com
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Chunyu Hu <chuhu@redhat.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit 3debb0a9ddb16526de8b456491b7db60114f7b5e upstream.
The trace_printk() code will allocate extra buffers if the compile detects
that a trace_printk() is used. To do this, the format of the trace_printk()
is saved to the __trace_printk_fmt section, and if that section is bigger
than zero, the buffers are allocated (along with a message that this has
happened).
If trace_printk() uses a format that is not a constant, and thus something
not guaranteed to be around when the print happens, the compiler optimizes
the fmt out, as it is not used, and the __trace_printk_fmt section is not
filled. This means the kernel will not allocate the special buffers needed
for the trace_printk() and the trace_printk() will not write anything to the
tracing buffer.
Adding a "__used" to the variable in the __trace_printk_fmt section will
keep it around, even though it is set to NULL. This will keep the string
from being printed in the debugfs/tracing/printk_formats section as it is
not needed.
Reported-by: Vlastimil Babka <vbabka@suse.cz>
Fixes: 07d777fe8c398 "tracing: Add percpu buffers for trace_printk()"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit a29054d9478d0435ab01b7544da4f674ab13f533 upstream.
If tracing contains data and the trace_pipe file is read with sendfile(),
then it can trigger a NULL pointer dereference and various BUG_ON within the
VM code.
There's a patch to fix this in the splice_to_pipe() code, but it's also a
good idea to not let that happen from trace_pipe either.
Link: http://lkml.kernel.org/r/1457641146-9068-1-git-send-email-rabin@rab.in
Reported-by: Rabin Vincent <rabin.vincent@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit cb86e05390debcc084cfdb0a71ed4c5dbbec517d upstream.
Joel Fernandes reported that the function tracing of preempt disabled
sections was not being reported when running either the preemptirqsoff or
preemptoff tracers. This was due to the fact that the function tracer
callback for those tracers checked if irqs were disabled before tracing. But
this fails when we want to trace preempt off locations as well.
Joel explained that he wanted to see funcitons where interrupts are enabled
but preemption was disabled. The expected output he wanted:
<...>-2265 1d.h1 3419us : preempt_count_sub <-irq_exit
<...>-2265 1d..1 3419us : __do_softirq <-irq_exit
<...>-2265 1d..1 3419us : msecs_to_jiffies <-__do_softirq
<...>-2265 1d..1 3420us : irqtime_account_irq <-__do_softirq
<...>-2265 1d..1 3420us : __local_bh_disable_ip <-__do_softirq
<...>-2265 1..s1 3421us : run_timer_softirq <-__do_softirq
<...>-2265 1..s1 3421us : hrtimer_run_pending <-run_timer_softirq
<...>-2265 1..s1 3421us : _raw_spin_lock_irq <-run_timer_softirq
<...>-2265 1d.s1 3422us : preempt_count_add <-_raw_spin_lock_irq
<...>-2265 1d.s2 3422us : _raw_spin_unlock_irq <-run_timer_softirq
<...>-2265 1..s2 3422us : preempt_count_sub <-_raw_spin_unlock_irq
<...>-2265 1..s1 3423us : rcu_bh_qs <-__do_softirq
<...>-2265 1d.s1 3423us : irqtime_account_irq <-__do_softirq
<...>-2265 1d.s1 3423us : __local_bh_enable <-__do_softirq
There's a comment saying that the irq disabled check is because there's a
possible race that tracing_cpu may be set when the function is executed. But
I don't remember that race. For now, I added a check for preemption being
enabled too to not record the function, as there would be no race if that
was the case. I need to re-investigate this, as I'm now thinking that the
tracing_cpu will always be correct. But no harm in keeping the check for
now, except for the slight performance hit.
Link: http://lkml.kernel.org/r/1457770386-88717-1-git-send-email-agnel.joel@gmail.com
Fixes: 5e6d2b9cfa3a "tracing: Use one prologue for the preempt irqs off tracer function tracers"
Cc: stable@vget.kernel.org # 2.6.37+
Reported-by: Joel Fernandes <agnel.joel@gmail.com>
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
|
|
commit d045437a169f899dfb0f6f7ede24cc042543ced9 upstream.
The ftrace:function event is only displayed for parsing the function tracer
data. It is not used to enable function tracing, and does not include an
"enable" file in its event directory.
Originally, this event was kept separate from other events because it did
not have a ->reg parameter. But perf added a "reg" parameter for its use
which caused issues, because it made the event available to functions where
it was not compatible for.
Commit 9b63776fa3ca9 "tracing: Do not enable function event with enable"
added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event
from being enabled by normal trace events. But this commit missed keeping
the function event from being displayed by the "available_events" directory,
which is used to show what events can be enabled by set_event.
One documented way to enable all events is to:
cat available_events > set_event
But because the function event is displayed in the available_events, this
now causes an INVALID error:
cat: write error: Invalid argument
Reported-by: Chunyu Hu <chuhu@redhat.com>
Fixes: 9b63776fa3ca9 "tracing: Do not enable function event with enable"
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
|