<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/kernel, branch linux-4.7.y</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>timekeeping: Fix __ktime_get_fast_ns() regression</title>
<updated>2016-10-16T15:50:36+00:00</updated>
<author>
<name>John Stultz</name>
<email>john.stultz@linaro.org</email>
</author>
<published>2016-10-05T02:55:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7ea90daac4f3c6a3dc5592e4faf8ba1905050c4c'/>
<id>7ea90daac4f3c6a3dc5592e4faf8ba1905050c4c</id>
<content type='text'>
commit 58bfea9532552d422bde7afa207e1a0f08dffa7d upstream.

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg &lt;bgregg@netflix.com&gt;
Reported-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Tested-and-reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 58bfea9532552d422bde7afa207e1a0f08dffa7d upstream.

In commit 27727df240c7 ("Avoid taking lock in NMI path with
CONFIG_DEBUG_TIMEKEEPING"), I changed the logic to open-code
the timekeeping_get_ns() function, but I forgot to include
the unit conversion from cycles to nanoseconds, breaking the
function's output, which impacts users like perf.

This results in bogus perf timestamps like:
 swapper     0 [000]   253.427536:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426573:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426687:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426800:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.426905:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427022:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427127:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427239:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427346:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   254.427463:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]   255.426572:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Instead of more reasonable expected timestamps like:
 swapper     0 [000]    39.953768:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.064839:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.175956:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.287103:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.398217:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.509324:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.620437:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.731546:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.842654:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    40.953772:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])
 swapper     0 [000]    41.064881:  111111111 cpu-clock:  ffffffff810a0de6 native_safe_halt+0x6 ([kernel.kallsyms])

Add the proper use of timekeeping_delta_to_ns() to convert
the cycle delta to nanoseconds as needed.

Thanks to Brendan and Alexei for finding this quickly after
the v4.8 release. Unfortunately the problematic commit has
landed in some -stable trees so they'll need this fix as
well.

Many apologies for this mistake. I'll be looking to add a
perf-clock sanity test to the kselftest timers tests soon.

Fixes: 27727df240c7 "timekeeping: Avoid taking lock in NMI path with CONFIG_DEBUG_TIMEKEEPING"
Reported-by: Brendan Gregg &lt;bgregg@netflix.com&gt;
Reported-by: Alexei Starovoitov &lt;alexei.starovoitov@gmail.com&gt;
Tested-and-reviewed-by: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: John Stultz &lt;john.stultz@linaro.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Link: http://lkml.kernel.org/r/1475636148-26539-1-git-send-email-john.stultz@linaro.org
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Mathieu Desnoyers &lt;mathieu.desnoyers@efficios.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd</title>
<updated>2016-10-07T13:21:25+00:00</updated>
<author>
<name>Michal Hocko</name>
<email>mhocko@suse.com</email>
</author>
<published>2016-09-01T23:15:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d926cd9f7b907f6ae3fc2c49684f51da3c8f2f72'/>
<id>d926cd9f7b907f6ae3fc2c49684f51da3c8f2f72</id>
<content type='text'>
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: William Preston &lt;wpreston@suse.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Andreas Schwab &lt;schwab@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 735f2770a770156100f534646158cb58cb8b2939 upstream.

Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
exit") has caused a subtle regression in nscd which uses
CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
shared databases, so that the clients are notified when nscd is
restarted.  Now, when nscd uses a non-persistent database, clients that
have it mapped keep thinking the database is being updated by nscd, when
in fact nscd has created a new (anonymous) one (for non-persistent
databases it uses an unlinked file as backend).

The original proposal for the CLONE_CHILD_CLEARTID change claimed
(https://lkml.org/lkml/2006/10/25/233):

: The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
: on behalf of pthread_create() library calls.  This feature is used to
: request that the kernel clear the thread-id in user space (at an address
: provided in the syscall) when the thread disassociates itself from the
: address space, which is done in mm_release().
:
: Unfortunately, when a multi-threaded process incurs a core dump (such as
: from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
: the other threads, which then proceed to clear their user-space tids
: before synchronizing in exit_mm() with the start of core dumping.  This
: misrepresents the state of process's address space at the time of the
: SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
: problems (misleading him/her to conclude that the threads had gone away
: before the fault).
:
: The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
: core dump has been initiated.

The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
seems to have a larger scope than the original patch asked for.  It
seems that limitting the scope of the check to core dumping should work
for SIGSEGV issue describe above.

[Changelog partly based on Andreas' description]
Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
Signed-off-by: Michal Hocko &lt;mhocko@suse.com&gt;
Tested-by: William Preston &lt;wpreston@suse.com&gt;
Acked-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Cc: Roland McGrath &lt;roland@hack.frob.com&gt;
Cc: Andreas Schwab &lt;schwab@suse.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sysctl: handle error writing UINT_MAX to u32 fields</title>
<updated>2016-10-07T13:21:25+00:00</updated>
<author>
<name>Subash Abhinov Kasiviswanathan</name>
<email>subashab@codeaurora.org</email>
</author>
<published>2016-08-25T22:16:51+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=f4cea51e9a3d536e2ca2b74a958f7c0b4ea733c3'/>
<id>f4cea51e9a3d536e2ca2b74a958f7c0b4ea733c3</id>
<content type='text'>
commit e7d316a02f683864a12389f8808570e37fb90aa3 upstream.

We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels.  An entry which we write to here is
xfrm_aevent_rseqth which is u32.

  echo 4294967295  &gt; /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs.  However, this does not apply to unsigned integers.

Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX.  This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax.  Alternatively,
we would need to change the datatype of the entry to 64 bit.

  static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
  {
      i = (unsigned long *) data;   //This cast is causing to read beyond the size of data (u32)
      vleft = table-&gt;maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.

Introduce a new proc handler proc_douintvec.  Individual proc entries
will need to be updated to use the new handler.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Cc: Heinrich Schuchardt &lt;xypron.glpk@gmx.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e7d316a02f683864a12389f8808570e37fb90aa3 upstream.

We have scripts which write to certain fields on 3.18 kernels but this
seems to be failing on 4.4 kernels.  An entry which we write to here is
xfrm_aevent_rseqth which is u32.

  echo 4294967295  &gt; /proc/sys/net/core/xfrm_aevent_rseqth

Commit 230633d109e3 ("kernel/sysctl.c: detect overflows when converting
to int") prevented writing to sysctl entries when integer overflow
occurs.  However, this does not apply to unsigned integers.

Heinrich suggested that we introduce a new option to handle 64 bit
limits and set min as 0 and max as UINT_MAX.  This might not work as it
leads to issues similar to __do_proc_doulongvec_minmax.  Alternatively,
we would need to change the datatype of the entry to 64 bit.

  static int __do_proc_doulongvec_minmax(void *data, struct ctl_table
  {
      i = (unsigned long *) data;   //This cast is causing to read beyond the size of data (u32)
      vleft = table-&gt;maxlen / sizeof(unsigned long); //vleft is 0 because maxlen is sizeof(u32) which is lesser than sizeof(unsigned long) on x86_64.

Introduce a new proc handler proc_douintvec.  Individual proc entries
will need to be updated to use the new handler.

[akpm@linux-foundation.org: coding-style fixes]
Fixes: 230633d109e3 ("kernel/sysctl.c:detect overflows when converting to int")
Link: http://lkml.kernel.org/r/1471479806-5252-1-git-send-email-subashab@codeaurora.org
Signed-off-by: Subash Abhinov Kasiviswanathan &lt;subashab@codeaurora.org&gt;
Cc: Heinrich Schuchardt &lt;xypron.glpk@gmx.de&gt;
Cc: Kees Cook &lt;keescook@chromium.org&gt;
Cc: "David S. Miller" &lt;davem@davemloft.net&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>tracing: Have HIST_TRIGGERS select TRACING</title>
<updated>2016-10-07T13:21:24+00:00</updated>
<author>
<name>Tom Zanussi</name>
<email>tom.zanussi@linux.intel.com</email>
</author>
<published>2016-07-03T13:51:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=42ecc48879c3c4ed9e889f2a9d2e365352a6d197'/>
<id>42ecc48879c3c4ed9e889f2a9d2e365352a6d197</id>
<content type='text'>
commit 7ad8fb61c4abf589596f0a4da34d987471481569 upstream.

The kbuild test robot reported a compile error if HIST_TRIGGERS was
enabled but nothing else that selected TRACING was configured in.

HIST_TRIGGERS should directly select it and not rely on anything else
to do it.

Link: http://lkml.kernel.org/r/57791866.8080505@linux.intel.com

Reported-by: kbuild test robot &lt;fennguang.wu@intel.com&gt;
Fixes: 7ef224d1d0e3a ("tracing: Add 'hist' event trigger command")
Signed-off-by: Tom Zanussi &lt;tom.zanussi@linux.intel.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 7ad8fb61c4abf589596f0a4da34d987471481569 upstream.

The kbuild test robot reported a compile error if HIST_TRIGGERS was
enabled but nothing else that selected TRACING was configured in.

HIST_TRIGGERS should directly select it and not rely on anything else
to do it.

Link: http://lkml.kernel.org/r/57791866.8080505@linux.intel.com

Reported-by: kbuild test robot &lt;fennguang.wu@intel.com&gt;
Fixes: 7ef224d1d0e3a ("tracing: Add 'hist' event trigger command")
Signed-off-by: Tom Zanussi &lt;tom.zanussi@linux.intel.com&gt;
Signed-off-by: Steven Rostedt &lt;rostedt@goodmis.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>printk: fix parsing of "brl=" option</title>
<updated>2016-10-07T13:21:21+00:00</updated>
<author>
<name>Nicolas Iooss</name>
<email>nicolas.iooss_linux@m4x.org</email>
</author>
<published>2016-08-25T22:17:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d8f4420f85adc846d69eb9e34e468e9016147d58'/>
<id>d8f4420f85adc846d69eb9e34e468e9016147d58</id>
<content type='text'>
commit ae6c33ba6e37eea3012fe2640b22400ef3f2d0f3 upstream.

Commit bbeddf52adc1 ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**.  In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.

Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files")
Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss &lt;nicolas.iooss_linux@m4x.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ae6c33ba6e37eea3012fe2640b22400ef3f2d0f3 upstream.

Commit bbeddf52adc1 ("printk: move braille console support into separate
braille.[ch] files") moved the parsing of braille-related options into
_braille_console_setup(), changing the type of variable str from char*
to char**.  In this commit, memcmp(str, "brl,", 4) was correctly updated
to memcmp(*str, "brl,", 4) but not memcmp(str, "brl=", 4).

Update the code to make "brl=" option work again and replace memcmp()
with strncmp() to make the compiler able to detect such an issue.

Fixes: bbeddf52adc1 ("printk: move braille console support into separate braille.[ch] files")
Link: http://lkml.kernel.org/r/20160823165700.28952-1-nicolas.iooss_linux@m4x.org
Signed-off-by: Nicolas Iooss &lt;nicolas.iooss_linux@m4x.org&gt;
Cc: Joe Perches &lt;joe@perches.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>sched/cputime: Fix prev steal time accouting during CPU hotplug</title>
<updated>2016-10-07T13:21:18+00:00</updated>
<author>
<name>Wanpeng Li</name>
<email>wanpeng.li@hotmail.com</email>
</author>
<published>2016-06-13T10:32:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=afca668faa80cbd97ca767d41c2845a175d931c2'/>
<id>afca668faa80cbd97ca767d41c2845a175d931c2</id>
<content type='text'>
commit 3d89e5478bf550a50c99e93adf659369798263b0 upstream.

Commit:

  e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")

... set rq-&gt;prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq-&gt;prev_steal_time.

However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch.  Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq-&gt;prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.

Since the root cause has been fixed, we can just revert commit e9532e69b8d1.

Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 3d89e5478bf550a50c99e93adf659369798263b0 upstream.

Commit:

  e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")

... set rq-&gt;prev_* to 0 after a CPU hotplug comes back, in order to
fix the case where (after CPU hotplug) steal time is smaller than
rq-&gt;prev_steal_time.

However, this should never happen. Steal time was only smaller because of the
KVM-specific bug fixed by the previous patch.  Worse, the previous patch
triggers a bug on CPU hot-unplug/plug operation: because
rq-&gt;prev_steal_time is cleared, all of the CPU's past steal time will be
accounted again on hot-plug.

Since the root cause has been fixed, we can just revert commit e9532e69b8d1.

Signed-off-by: Wanpeng Li &lt;wanpeng.li@hotmail.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Paolo Bonzini &lt;pbonzini@redhat.com&gt;
Cc: Frederic Weisbecker &lt;fweisbec@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Radim Krčmář &lt;rkrcmar@redhat.com&gt;
Cc: Rik van Riel &lt;riel@redhat.com&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>perf/core: Use this_cpu_ptr() when stopping AUX events</title>
<updated>2016-10-07T13:21:18+00:00</updated>
<author>
<name>Will Deacon</name>
<email>will.deacon@arm.com</email>
</author>
<published>2016-08-24T09:07:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6bbeeaf13d27b68303e5e1663139faa144200c8d'/>
<id>6bbeeaf13d27b68303e5e1663139faa144200c8d</id>
<content type='text'>
commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3 upstream.

When tearing down an AUX buf for an event via perf_mmap_close(),
__perf_event_output_stop() is called on the event's CPU to ensure that
trace generation is halted before the process of unmapping and
freeing the buffer pages begins.

The callback is performed via cpu_function_call(), which ensures that it
runs with interrupts disabled and is therefore not preemptible.
Unfortunately, the current code grabs the per-cpu context pointer using
get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair
the call with put_cpu_ptr(), leading to a preempt_count() imbalance and
a BUG when freeing the AUX buffer later on:

  WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120
  Modules linked in:
  [...]
  Call Trace:
   [&lt;ffffffff813379dd&gt;] dump_stack+0x4f/0x72
   [&lt;ffffffff81059ff6&gt;] __warn+0xc6/0xe0
   [&lt;ffffffff8105a0c8&gt;] warn_slowpath_null+0x18/0x20
   [&lt;ffffffff8112761c&gt;] __rb_free_aux+0x10c/0x120
   [&lt;ffffffff81128163&gt;] rb_free_aux+0x13/0x20
   [&lt;ffffffff8112515e&gt;] perf_mmap_close+0x29e/0x2f0
   [&lt;ffffffff8111da30&gt;] ? perf_iterate_ctx+0xe0/0xe0
   [&lt;ffffffff8115f685&gt;] remove_vma+0x25/0x60
   [&lt;ffffffff81161796&gt;] exit_mmap+0x106/0x140
   [&lt;ffffffff8105725c&gt;] mmput+0x1c/0xd0
   [&lt;ffffffff8105cac3&gt;] do_exit+0x253/0xbf0
   [&lt;ffffffff8105e32e&gt;] do_group_exit+0x3e/0xb0
   [&lt;ffffffff81068d49&gt;] get_signal+0x249/0x640
   [&lt;ffffffff8101c273&gt;] do_signal+0x23/0x640
   [&lt;ffffffff81905f42&gt;] ? _raw_write_unlock_irq+0x12/0x30
   [&lt;ffffffff81905f69&gt;] ? _raw_spin_unlock_irq+0x9/0x10
   [&lt;ffffffff81901896&gt;] ? __schedule+0x2c6/0x710
   [&lt;ffffffff810022a4&gt;] exit_to_usermode_loop+0x74/0x90
   [&lt;ffffffff81002a56&gt;] prepare_exit_to_usermode+0x26/0x30
   [&lt;ffffffff81906d1b&gt;] retint_user+0x8/0x10

This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is
already disabled by the caller.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Reviewed-by: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path")
Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 8b6a3fe8fab97716990a3abde1a01fb5a34552a3 upstream.

When tearing down an AUX buf for an event via perf_mmap_close(),
__perf_event_output_stop() is called on the event's CPU to ensure that
trace generation is halted before the process of unmapping and
freeing the buffer pages begins.

The callback is performed via cpu_function_call(), which ensures that it
runs with interrupts disabled and is therefore not preemptible.
Unfortunately, the current code grabs the per-cpu context pointer using
get_cpu_ptr(), which unnecessarily disables preemption and doesn't pair
the call with put_cpu_ptr(), leading to a preempt_count() imbalance and
a BUG when freeing the AUX buffer later on:

  WARNING: CPU: 1 PID: 2249 at kernel/events/ring_buffer.c:539 __rb_free_aux+0x10c/0x120
  Modules linked in:
  [...]
  Call Trace:
   [&lt;ffffffff813379dd&gt;] dump_stack+0x4f/0x72
   [&lt;ffffffff81059ff6&gt;] __warn+0xc6/0xe0
   [&lt;ffffffff8105a0c8&gt;] warn_slowpath_null+0x18/0x20
   [&lt;ffffffff8112761c&gt;] __rb_free_aux+0x10c/0x120
   [&lt;ffffffff81128163&gt;] rb_free_aux+0x13/0x20
   [&lt;ffffffff8112515e&gt;] perf_mmap_close+0x29e/0x2f0
   [&lt;ffffffff8111da30&gt;] ? perf_iterate_ctx+0xe0/0xe0
   [&lt;ffffffff8115f685&gt;] remove_vma+0x25/0x60
   [&lt;ffffffff81161796&gt;] exit_mmap+0x106/0x140
   [&lt;ffffffff8105725c&gt;] mmput+0x1c/0xd0
   [&lt;ffffffff8105cac3&gt;] do_exit+0x253/0xbf0
   [&lt;ffffffff8105e32e&gt;] do_group_exit+0x3e/0xb0
   [&lt;ffffffff81068d49&gt;] get_signal+0x249/0x640
   [&lt;ffffffff8101c273&gt;] do_signal+0x23/0x640
   [&lt;ffffffff81905f42&gt;] ? _raw_write_unlock_irq+0x12/0x30
   [&lt;ffffffff81905f69&gt;] ? _raw_spin_unlock_irq+0x9/0x10
   [&lt;ffffffff81901896&gt;] ? __schedule+0x2c6/0x710
   [&lt;ffffffff810022a4&gt;] exit_to_usermode_loop+0x74/0x90
   [&lt;ffffffff81002a56&gt;] prepare_exit_to_usermode+0x26/0x30
   [&lt;ffffffff81906d1b&gt;] retint_user+0x8/0x10

This patch uses this_cpu_ptr() instead of get_cpu_ptr(), since preemption is
already disabled by the caller.

Signed-off-by: Will Deacon &lt;will.deacon@arm.com&gt;
Reviewed-by: Alexander Shishkin &lt;alexander.shishkin@linux.intel.com&gt;
Cc: Arnaldo Carvalho de Melo &lt;acme@kernel.org&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Vince Weaver &lt;vincent.weaver@maine.edu&gt;
Fixes: 95ff4ca26c49 ("perf/core: Free AUX pages in unmap path")
Link: http://lkml.kernel.org/r/20160824091905.GA16944@arm.com
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>rcuperf: Don't treat gp_exp mis-setting as a WARN</title>
<updated>2016-10-07T13:21:17+00:00</updated>
<author>
<name>Boqun Feng</name>
<email>boqun.feng@gmail.com</email>
</author>
<published>2016-05-25T01:25:33+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=376c711bb7f9049ac447608ebe964068508114af'/>
<id>376c711bb7f9049ac447608ebe964068508114af</id>
<content type='text'>
commit af06d4f74a7d2132c805339bfd5ab771b5706f42 upstream.

0day found a boot warning triggered in rcu_perf_writer() on !SMP kernel:

	WARN_ON(rcu_gp_is_normal() &amp;&amp; gp_exp);

, the root cause of which is trying to measure expedited grace
periods(by setting gp_exp to true by default) when all the grace periods
are normal(TINY RCU only has normal grace periods).

However, such a mis-setting would only result in failing to measure the
performance for a specific kind of grace periods, therefore using a
WARN_ON to check this is a little overkilling. We could handle this
inside rcuperf module via some error messages to tell users about the
mis-settings.

Therefore this patch removes the WARN_ON in rcu_perf_writer() and
handles those checkings in rcu_perf_init() with plain if() code.

Moreover, this patch changes the default value of gp_exp to 1) align
with rcutorture tests and 2) make the default setting work for all RCU
implementations by default.

Suggested-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.com
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit af06d4f74a7d2132c805339bfd5ab771b5706f42 upstream.

0day found a boot warning triggered in rcu_perf_writer() on !SMP kernel:

	WARN_ON(rcu_gp_is_normal() &amp;&amp; gp_exp);

, the root cause of which is trying to measure expedited grace
periods(by setting gp_exp to true by default) when all the grace periods
are normal(TINY RCU only has normal grace periods).

However, such a mis-setting would only result in failing to measure the
performance for a specific kind of grace periods, therefore using a
WARN_ON to check this is a little overkilling. We could handle this
inside rcuperf module via some error messages to tell users about the
mis-settings.

Therefore this patch removes the WARN_ON in rcu_perf_writer() and
handles those checkings in rcu_perf_init() with plain if() code.

Moreover, this patch changes the default value of gp_exp to 1) align
with rcutorture tests and 2) make the default setting work for all RCU
implementations by default.

Suggested-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Boqun Feng &lt;boqun.feng@gmail.com&gt;
Fixes: http://lkml.kernel.org/r/57411b10.mFvG0+AgcrMXGtcj%fengguang.wu@intel.com
Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: fix invalid controller enable rejections with cgroup namespace</title>
<updated>2016-10-07T13:21:16+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2016-09-23T20:55:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fef79fb7f4434f191a506104df71a6a11c51c0db'/>
<id>fef79fb7f4434f191a506104df71a6a11c51c0db</id>
<content type='text'>
commit 9157056da8f8c4a6305f15619e269f164b63a6de upstream.

On the v2 hierarchy, "cgroup.subtree_control" rejects controller
enables if the cgroup has processes in it.  The enforcement of this
logic assumes that the cgroup wouldn't have any css_sets associated
with it if there are no tasks in the cgroup, which is no longer true
since a79a908fd2b0 ("cgroup: introduce cgroup namespaces").

When a cgroup namespace is created, it pins the css_set of the
creating task to use it as the root css_set of the namespace.  This
extra reference stays as long as the namespace is around and makes
"cgroup.subtree_control" think that the namespace root cgroup is not
empty even when it is and thus reject controller enables.

Fix it by making cgroup_subtree_control() walk and test emptiness of
each css_set instead of testing whether the list_head is empty.

While at it, update the comment of cgroup_task_count() to indicate
that the returned value may be higher than the number of tasks, which
has always been true due to temporary references and doesn't break
anything.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Evgeny Vereshchagin &lt;evvers@ya.ru&gt;
Cc: Serge E. Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: Aditya Kali &lt;adityakali@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 9157056da8f8c4a6305f15619e269f164b63a6de upstream.

On the v2 hierarchy, "cgroup.subtree_control" rejects controller
enables if the cgroup has processes in it.  The enforcement of this
logic assumes that the cgroup wouldn't have any css_sets associated
with it if there are no tasks in the cgroup, which is no longer true
since a79a908fd2b0 ("cgroup: introduce cgroup namespaces").

When a cgroup namespace is created, it pins the css_set of the
creating task to use it as the root css_set of the namespace.  This
extra reference stays as long as the namespace is around and makes
"cgroup.subtree_control" think that the namespace root cgroup is not
empty even when it is and thus reject controller enables.

Fix it by making cgroup_subtree_control() walk and test emptiness of
each css_set instead of testing whether the list_head is empty.

While at it, update the comment of cgroup_task_count() to indicate
that the returned value may be higher than the number of tasks, which
has always been true due to temporary references and doesn't break
anything.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-by: Evgeny Vereshchagin &lt;evvers@ya.ru&gt;
Cc: Serge E. Hallyn &lt;serge.hallyn@ubuntu.com&gt;
Cc: Aditya Kali &lt;adityakali@google.com&gt;
Cc: Eric W. Biederman &lt;ebiederm@xmission.com&gt;
Fixes: a79a908fd2b0 ("cgroup: introduce cgroup namespaces")
Link: https://github.com/systemd/systemd/pull/3589#issuecomment-249089541
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>cpuset: handle race between CPU hotplug and cpuset_hotplug_work</title>
<updated>2016-10-07T13:21:16+00:00</updated>
<author>
<name>Joonwoo Park</name>
<email>joonwoop@codeaurora.org</email>
</author>
<published>2016-09-12T04:14:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=0ec4bc23c454f675a1ca3d3155e8ad1bfde4841f'/>
<id>0ec4bc23c454f675a1ca3d3155e8ad1bfde4841f</id>
<content type='text'>
commit 28b89b9e6f7b6c8fef7b3af39828722bca20cfee upstream.

A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 &gt; cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 &gt; cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   &lt;snip&gt;
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [&lt;ffffffc0001389b0&gt;] guarantee_online_cpus+0x2c/0x58
   [&lt;ffffffc00013b208&gt;] cpuset_cpus_allowed+0x4c/0x6c
   [&lt;ffffffc0000d61f0&gt;] sched_setaffinity+0xc0/0x1ac
   [&lt;ffffffc0000d6374&gt;] SyS_sched_setaffinity+0x98/0xac
   [&lt;ffffffc000085cb0&gt;] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.

Signed-off-by: Joonwoo Park &lt;joonwoop@codeaurora.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 28b89b9e6f7b6c8fef7b3af39828722bca20cfee upstream.

A discrepancy between cpu_online_mask and cpuset's effective_cpus
mask is inevitable during hotplug since cpuset defers updating of
effective_cpus mask using a workqueue, during which time nothing
prevents the system from more hotplug operations.  For that reason
guarantee_online_cpus() walks up the cpuset hierarchy until it finds
an intersection under the assumption that top cpuset's effective_cpus
mask intersects with cpu_online_mask even with such a race occurring.

However a sequence of CPU hotplugs can open a time window, during which
none of the effective CPUs in the top cpuset intersect with
cpu_online_mask.

For example when there are 4 possible CPUs 0-3 and only CPU0 is online:

  ========================  ===========================
   cpu_online_mask           top_cpuset.effective_cpus
  ========================  ===========================
   echo 1 &gt; cpu2/online.
   CPU hotplug notifier woke up hotplug work but not yet scheduled.
      [0,2]                     [0]

   echo 0 &gt; cpu0/online.
   The workqueue is still runnable.
      [2]                       [0]
  ========================  ===========================

  Now there is no intersection between cpu_online_mask and
  top_cpuset.effective_cpus.  Thus invoking sys_sched_setaffinity() at
  this moment can cause following:

   Unable to handle kernel NULL pointer dereference at virtual address 000000d0
   ------------[ cut here ]------------
   Kernel BUG at ffffffc0001389b0 [verbose debug info unavailable]
   Internal error: Oops - BUG: 96000005 [#1] PREEMPT SMP
   Modules linked in:
   CPU: 2 PID: 1420 Comm: taskset Tainted: G        W       4.4.8+ #98
   task: ffffffc06a5c4880 ti: ffffffc06e124000 task.ti: ffffffc06e124000
   PC is at guarantee_online_cpus+0x2c/0x58
   LR is at cpuset_cpus_allowed+0x4c/0x6c
   &lt;snip&gt;
   Process taskset (pid: 1420, stack limit = 0xffffffc06e124020)
   Call trace:
   [&lt;ffffffc0001389b0&gt;] guarantee_online_cpus+0x2c/0x58
   [&lt;ffffffc00013b208&gt;] cpuset_cpus_allowed+0x4c/0x6c
   [&lt;ffffffc0000d61f0&gt;] sched_setaffinity+0xc0/0x1ac
   [&lt;ffffffc0000d6374&gt;] SyS_sched_setaffinity+0x98/0xac
   [&lt;ffffffc000085cb0&gt;] el0_svc_naked+0x24/0x28

The top cpuset's effective_cpus are guaranteed to be identical to
cpu_online_mask eventually.  Hence fall back to cpu_online_mask when
there is no intersection between top cpuset's effective_cpus and
cpu_online_mask.

Signed-off-by: Joonwoo Park &lt;joonwoop@codeaurora.org&gt;
Acked-by: Li Zefan &lt;lizefan@huawei.com&gt;
Cc: Tejun Heo &lt;tj@kernel.org&gt;
Cc: cgroups@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
</feed>
