linux-stable.git/Documentation/sysctl/kernel.txt, branch v4.9.321

random: fix sysctl documentation nits

2022-06-25T09:45:12+00:00

commit 069c4ea6871c18bd368f27756e0f91ffb524a788 upstream.

A semicolon was missing, and the almost-alphabetical-but-not ordering
was confusing, so regroup these by category instead.

Signed-off-by: Jason A. Donenfeld 
Signed-off-by: Greg Kroah-Hartman

random: remove ifdef'd out interrupt bench

2022-06-25T09:45:08+00:00

commit 95e6060c20a7f5db60163274c5222a725ac118f9 upstream.

With tools like kbench9000 giving more finegrained responses, and this
basically never having been used ever since it was initially added,
let's just get rid of this. There *is* still work to be done on the
interrupt handler, but this really isn't the way it's being developed.

Cc: Theodore Ts'o 
Reviewed-by: Eric Biggers 
Reviewed-by: Dominik Brodowski 
Signed-off-by: Jason A. Donenfeld 
Signed-off-by: Greg Kroah-Hartman

random: always wake up entropy writers after extraction

2022-06-25T09:45:07+00:00

commit 489c7fc44b5740d377e8cfdbf0851036e493af00 upstream.

Now that POOL_BITS == POOL_MIN_BITS, we must unconditionally wake up
entropy writers after every extraction. Therefore there's no point of
write_wakeup_threshold, so we can move it to the dustbin of unused
compatibility sysctls. While we're at it, we can fix a small comparison
where we were waking up after <= min rather than < min.

Cc: Theodore Ts'o 
Suggested-by: Eric Biggers 
Reviewed-by: Eric Biggers 
Reviewed-by: Dominik Brodowski 
Signed-off-by: Jason A. Donenfeld 
Signed-off-by: Greg Kroah-Hartman

bpf: Add kconfig knob for disabling unpriv bpf by default

2022-02-16T11:43:54+00:00

commit 08389d888287c3823f80b0216766b71e17f0aba5 upstream.

Add a kconfig knob which allows for unprivileged bpf to be disabled by default.
If set, the knob sets /proc/sys/kernel/unprivileged_bpf_disabled to value of 2.

This still allows a transition of 2 -> {0,1} through an admin. Similarly,
this also still keeps 1 -> {1} behavior intact, so that once set to permanently
disabled, it cannot be undone aside from a reboot.

We've also added extra2 with max of 2 for the procfs handler, so that an admin
still has a chance to toggle between 0 <-> 2.

Either way, as an additional alternative, applications can make use of CAP_BPF
that we added a while ago.

Signed-off-by: Daniel Borkmann 
Signed-off-by: Alexei Starovoitov 
Link: https://lore.kernel.org/bpf/74ec548079189e4e4dffaeb42b8987bb3c852eee.1620765074.git.daniel@iogearbox.net
[fllinden@amazon.com: backported to 4.9]
Signed-off-by: Frank van der Linden 
Signed-off-by: Greg Kroah-Hartman

printk: add kernel parameter to control writes to /dev/kmsg

2016-08-02T23:35:06+00:00

Add a "printk.devkmsg" kernel command line parameter which controls how
userspace writes into /dev/kmsg.  It has three options:

 * ratelimit - ratelimit logging from userspace.
 * on  - unlimited logging from userspace
 * off - logging from userspace gets ignored

The default setting is to ratelimit the messages written to it.

This changes the kernel default setting of "on" to "ratelimit" and we do
that because we want to keep userspace spamming /dev/kmsg to sane
levels.  This is especially moot when a small kernel log buffer wraps
around and messages get lost.  So the ratelimiting setting should be a
sane setting where kernel messages should have a bit higher chance of
survival from all the spamming.

It additionally does not limit logging to /dev/kmsg while the system is
booting if we haven't disabled it on the command line.

Furthermore, we can control the logging from a lower priority sysctl
interface - kernel.printk_devkmsg.

That interface will succeed only if printk.devkmsg *hasn't* been
supplied on the command line.  If it has, then printk.devkmsg is a
one-time setting which remains for the duration of the system lifetime.
This "locking" of the setting is to prevent userspace from changing the
logging on us through sysctl(2).

This patch is based on previous patches from Linus and Steven.

[bp@suse.de: fixes]
  Link: http://lkml.kernel.org/r/20160719072344.GC25563@nazgul.tnic
Link: http://lkml.kernel.org/r/20160716061745.15795-3-bp@alien8.de
Signed-off-by: Borislav Petkov 
Cc: Dave Young 
Cc: Franck Bui 
Cc: Greg Kroah-Hartman 
Cc: Ingo Molnar 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Uwe Kleine-König 
Signed-off-by: Andrew Morton 
Signed-off-by: Linus Torvalds

rcu: sysctl: Panic on RCU Stall

2016-06-15T23:00:05+00:00

It is not always easy to determine the cause of an RCU stall just by
analysing the RCU stall messages, mainly when the problem is caused
by the indirect starvation of rcu threads. For example, when preempt_rcu
is not awakened due to the starvation of a timer softirq.

We have been hard coding panic() in the RCU stall functions for
some time while testing the kernel-rt. But this is not possible in
some scenarios, like when supporting customers.

This patch implements the sysctl kernel.panic_on_rcu_stall. If
set to 1, the system will panic() when an RCU stall takes place,
enabling the capture of a vmcore. The vmcore provides a way to analyze
all kernel/tasks states, helping out to point to the culprit and the
solution for the stall.

The kernel.panic_on_rcu_stall sysctl is disabled by default.

Changes from v1:
- Fixed a typo in the git log
- The if(sysctl_panic_on_rcu_stall) panic() is in a static function
- Fixed the CONFIG_TINY_RCU compilation issue
- The var sysctl_panic_on_rcu_stall is now __read_mostly

Cc: Jonathan Corbet 
Cc: "Paul E. McKenney" 
Cc: Josh Triplett 
Cc: Steven Rostedt 
Cc: Mathieu Desnoyers 
Cc: Lai Jiangshan 
Acked-by: Christian Borntraeger 
Reviewed-by: Josh Triplett 
Reviewed-by: Arnaldo Carvalho de Melo 
Tested-by: "Luis Claudio R. Goncalves" 
Signed-off-by: Daniel Bristot de Oliveira 
Signed-off-by: Paul E. McKenney

perf core: Separate accounting of contexts and real addresses in a stack trace

2016-05-17T02:11:53+00:00

The perf_sample->ip_callchain->nr value includes all the entries in the
ip_callchain->ip[] array, real addresses and PERF_CONTEXT_{KERNEL,USER,etc},
while what the user expects is that what is in the kernel.perf_event_max_stack
sysctl or in the upcoming per event perf_event_attr.sample_max_stack knob be
honoured in terms of IP addresses in the stack trace.

So allocate a bunch of extra entries for contexts, and do the accounting
via perf_callchain_entry_ctx struct members.

A new sysctl, kernel.perf_event_max_contexts_per_stack is also
introduced for investigating possible bugs in the callchain
implementation by some arch.

Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: Alexei Starovoitov 
Cc: Brendan Gregg 
Cc: David Ahern 
Cc: Frederic Weisbecker 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Masami Hiramatsu 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: Wang Nan 
Cc: Zefan Li 
Link: http://lkml.kernel.org/n/tip-3b4wnqk340c4sg4gwkfdi9yk@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo

Merge branch 'perf/urgent' into perf/core, to pick up fixes

2016-05-11T14:56:38+00:00

Signed-off-by: Ingo Molnar

perf/core: Change the default paranoia level to 2

2016-05-10T00:57:12+00:00

Allowing unprivileged kernel profiling lets any user dump follow kernel
control flow and dump kernel registers.  This most likely allows trivial
kASLR bypassing, and it may allow other mischief as well.  (Off the top
of my head, the PERF_SAMPLE_REGS_INTR output during /dev/urandom reads
could be quite interesting.)

Signed-off-by: Andy Lutomirski 
Acked-by: Kees Cook 
Signed-off-by: Linus Torvalds

perf core: Allow setting up max frame stack depth via sysctl

2016-04-27T13:20:39+00:00

The default remains 127, which is good for most cases, and not even hit
most of the time, but then for some cases, as reported by Brendan, 1024+
deep frames are appearing on the radar for things like groovy, ruby.

And in some workloads putting a _lower_ cap on this may make sense. One
that is per event still needs to be put in place tho.

The new file is:

  # cat /proc/sys/kernel/perf_event_max_stack
  127

Chaging it:

  # echo 256 > /proc/sys/kernel/perf_event_max_stack
  # cat /proc/sys/kernel/perf_event_max_stack
  256

But as soon as there is some event using callchains we get:

  # echo 512 > /proc/sys/kernel/perf_event_max_stack
  -bash: echo: write error: Device or resource busy
  #

Because we only allocate the callchain percpu data structures when there
is a user, which allows for changing the max easily, its just a matter
of having no callchain users at that point.

Reported-and-Tested-by: Brendan Gregg 
Reviewed-by: Frederic Weisbecker 
Acked-by: Alexei Starovoitov 
Acked-by: David Ahern 
Cc: Adrian Hunter 
Cc: Alexander Shishkin 
Cc: He Kuang 
Cc: Jiri Olsa 
Cc: Linus Torvalds 
Cc: Masami Hiramatsu 
Cc: Milian Wolff 
Cc: Namhyung Kim 
Cc: Peter Zijlstra 
Cc: Stephane Eranian 
Cc: Thomas Gleixner 
Cc: Vince Weaver 
Cc: Wang Nan 
Cc: Zefan Li 
Link: http://lkml.kernel.org/r/20160426002928.GB16708@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo