linux-stable.git/kernel, branch v4.17.3

genirq/migration: Avoid out of line call if pending is not set

2018-06-25T23:51:30+00:00

commit d340ebd696f921d3ad01b8c0c29dd38f2ad2bf3e upstream.

The upcoming fix for the -EBUSY return from affinity settings requires to
use the irq_move_irq() functionality even on irq remapped interrupts. To
avoid the out of line call, move the check for the pending bit into an
inline helper.

Preparatory change for the real fix. No functional change.

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Song Liu 
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis 
Cc: Borislav Petkov 
Cc: Tariq Toukan 
Cc: Dou Liyang 
Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de
Signed-off-by: Greg Kroah-Hartman

genirq/affinity: Defer affinity setting if irq chip is busy

2018-06-25T23:51:30+00:00

commit 12f47073a40f6aa75119d8f5df4077b7f334cced upstream.

The case that interrupt affinity setting fails with -EBUSY can be handled
in the kernel completely by using the already available generic pending
infrastructure.

If a irq_chip::set_affinity() fails with -EBUSY, handle it like the
interrupts for which irq_chip::set_affinity() can only be invoked from
interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
set the affinity pending bit. The next raised interrupt for the affected
irq will check the pending bit and try to set the new affinity from the
handler. This avoids that -EBUSY is returned when an affinity change is
requested from user space and the previous change has not been cleaned
up. The new affinity will take effect when the next interrupt is raised
from the device.

Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup")
Signed-off-by: Thomas Gleixner 
Tested-by: Song Liu 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Song Liu 
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis 
Cc: Borislav Petkov 
Cc: Tariq Toukan 
Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de
Signed-off-by: Greg Kroah-Hartman

genirq/generic_pending: Do not lose pending affinity update

2018-06-25T23:51:29+00:00

commit a33a5d2d16cb84bea8d5f5510f3a41aa48b5c467 upstream.

The generic pending interrupt mechanism moves interrupts from the interrupt
handler on the original target CPU to the new destination CPU. This is
required for x86 and ia64 due to the way the interrupt delivery and
acknowledge works if the interrupts are not remapped.

However that update can fail for various reasons. Some of them are valid
reasons to discard the pending update, but the case, when the previous move
has not been fully cleaned up is not a legit reason to fail.

Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
a pending cleanup, and rearm the pending move in the irq dexcriptor so it's
tried again when the next interrupt arrives.

Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race")
Signed-off-by: Thomas Gleixner 
Tested-by: Song Liu 
Cc: Joerg Roedel 
Cc: Peter Zijlstra 
Cc: Song Liu 
Cc: Dmitry Safonov <0x7f454c46@gmail.com>
Cc: stable@vger.kernel.org
Cc: Mike Travis 
Cc: Borislav Petkov 
Cc: Tariq Toukan 
Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de
Signed-off-by: Greg Kroah-Hartman

Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

2018-06-03T16:01:41+00:00

Pull scheduler fixes from Thomas Gleixner:

 - two patches addressing the problem that the scheduler allows under
   certain conditions user space tasks to be scheduled on CPUs which are
   not yet fully booted which causes a few subtle and hard to debug
   issue

 - add a missing runqueue clock update in the deadline scheduler which
   triggers a warning under certain circumstances

 - fix a silly typo in the scheduler header file

* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  sched/headers: Fix typo
  sched/deadline: Fix missing clock update
  sched/core: Require cpu_active() in select_task_rq(), for user tasks
  sched/core: Fix rules for running on online && !active CPUs

sched/headers: Fix typo

2018-05-31T10:27:13+00:00

I cannot spell 'throttling'.

Signed-off-by: Davidlohr Bueso 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Davidlohr Bueso 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20180530224940.17839-1-dave@stgolabs.net
Signed-off-by: Ingo Molnar

sched/deadline: Fix missing clock update

2018-05-31T10:27:13+00:00

A missing clock update is causing the following warning:

 rq->clock_update_flags < RQCF_ACT_SKIP
 WARNING: CPU: 10 PID: 0 at kernel/sched/sched.h:963 inactive_task_timer+0x5d6/0x720
 Call Trace:
  
  __hrtimer_run_queues+0x10f/0x530
  hrtimer_interrupt+0xe5/0x240
  smp_apic_timer_interrupt+0x79/0x2b0
  apic_timer_interrupt+0xf/0x20
  
  do_idle+0x203/0x280
  cpu_startup_entry+0x6f/0x80
  start_secondary+0x1b0/0x200
  secondary_startup_64+0xa5/0xb0
 hardirqs last  enabled at (793919): [] cpuidle_enter_state+0x9e/0x360
 hardirqs last disabled at (793920): [] interrupt_entry+0xce/0xe0
 softirqs last  enabled at (793922): [] irq_enter+0x68/0x70
 softirqs last disabled at (793921): [] irq_enter+0x4d/0x70

This happens because inactive_task_timer() calls sub_running_bw() (if
TASK_DEAD and non_contending) that might trigger a schedutil update,
which might access the clock. Clock is however currently updated only
later in inactive_task_timer() function.

Fix the problem by updating the clock right after task_rq_lock().

Reported-by: kernel test robot 
Signed-off-by: Juri Lelli 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Claudio Scordino 
Cc: Linus Torvalds 
Cc: Luca Abeni 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20180530160809.9074-1-juri.lelli@redhat.com
Signed-off-by: Ingo Molnar

sched/core: Require cpu_active() in select_task_rq(), for user tasks

2018-05-31T10:24:25+00:00

select_task_rq() is used in a few paths to select the CPU upon which a
thread should be run - for example it is used by try_to_wake_up() & by
fork or exec balancing. As-is it allows use of any online CPU that is
present in the task's cpus_allowed mask.

This presents a problem because there is a period whilst CPUs are
brought online where a CPU is marked online, but is not yet fully
initialized - ie. the period where CPUHP_AP_ONLINE_IDLE <= state <
CPUHP_ONLINE. Usually we don't run any user tasks during this window,
but there are corner cases where this can happen. An example observed
is:

  - Some user task A, running on CPU X, forks to create task B.

  - sched_fork() calls __set_task_cpu() with cpu=X, setting task B's
    task_struct::cpu field to X.

  - CPU X is offlined.

  - Task A, currently somewhere between the __set_task_cpu() in
    copy_process() and the call to wake_up_new_task(), is migrated to
    CPU Y by migrate_tasks() when CPU X is offlined.

  - CPU X is onlined, but still in the CPUHP_AP_ONLINE_IDLE state. The
    scheduler is now active on CPU X, but there are no user tasks on
    the runqueue.

  - Task A runs on CPU Y & reaches wake_up_new_task(). This calls
    select_task_rq() with cpu=X, taken from task B's task_struct,
    and select_task_rq() allows CPU X to be returned.

  - Task A enqueues task B on CPU X's runqueue, via activate_task() &
    enqueue_task().

  - CPU X now has a user task on its runqueue before it has reached the
    CPUHP_ONLINE state.

In most cases, the user tasks that schedule on the newly onlined CPU
have no idea that anything went wrong, but one case observed to be
problematic is if the task goes on to invoke the sched_setaffinity
syscall. The newly onlined CPU reaches the CPUHP_AP_ONLINE_IDLE state
before the CPU that brought it online calls stop_machine_unpark(). This
means that for a portion of the window of time between
CPUHP_AP_ONLINE_IDLE & CPUHP_ONLINE the newly onlined CPU's struct
cpu_stopper has its enabled field set to false. If a user thread is
executed on the CPU during this window and it invokes sched_setaffinity
with a CPU mask that does not include the CPU it's running on, then when
__set_cpus_allowed_ptr() calls stop_one_cpu() intending to invoke
migration_cpu_stop() and perform the actual migration away from the CPU
it will simply return -ENOENT rather than calling migration_cpu_stop().
We then return from the sched_setaffinity syscall back to the user task
that is now running on a CPU which it just asked not to run on, and
which is not present in its cpus_allowed mask.

This patch resolves the problem by having select_task_rq() enforce that
user tasks run on CPUs that are active - the same requirement that
select_fallback_rq() already enforces. This should ensure that newly
onlined CPUs reach the CPUHP_AP_ACTIVE state before being able to
schedule user tasks, and also implies that bringup_wait_for_ap() will
have called stop_machine_unpark() which resolves the sched_setaffinity
issue above.

I haven't yet investigated them, but it may be of interest to review
whether any of the actions performed by hotplug states between
CPUHP_AP_ONLINE_IDLE & CPUHP_AP_ACTIVE could have similar unintended
effects on user tasks that might schedule before they are reached, which
might widen the scope of the problem from just affecting the behaviour
of sched_setaffinity.

Signed-off-by: Paul Burton 
Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Link: http://lkml.kernel.org/r/20180526154648.11635-2-paul.burton@mips.com
Signed-off-by: Ingo Molnar

sched/core: Fix rules for running on online && !active CPUs

2018-05-31T10:24:24+00:00

As already enforced by the WARN() in __set_cpus_allowed_ptr(), the rules
for running on an online && !active CPU are stricter than just being a
kthread, you need to be a per-cpu kthread.

If you're not strictly per-CPU, you have better CPUs to run on and
don't need the partially booted one to get your work done.

The exception is to allow smpboot threads to bootstrap the CPU itself
and get kernel 'services' initialized before we allow userspace on it.

Signed-off-by: Peter Zijlstra (Intel) 
Cc: Linus Torvalds 
Cc: Paul E. McKenney 
Cc: Peter Zijlstra 
Cc: Steven Rostedt 
Cc: Tejun Heo 
Cc: Thomas Gleixner 
Fixes: 955dbdf4ce87 ("sched: Allow migrating kthreads into online but inactive CPUs")
Link: http://lkml.kernel.org/r/20170725165821.cejhb7v2s3kecems@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar

Merge tag 'trace-v4.17-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace

2018-05-29T12:28:48+00:00

Pull tracing fixes from Steven Rostedt:
 "While writing selftests for a new feature, I triggered two existing
  bugs that deal with triggers and instances.

   - a generic trigger bug where the triggers are not removed from a
     linked list properly when deleting an instance.

   - a bug specific to snapshots, where the snapshot is done in the top
     level buffer, when it is supposed to snapshot the buffer associated
     to the instance the snapshot trigger exists in"

* tag 'trace-v4.17-rc4-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
  tracing: Make the snapshot trigger work with instances
  tracing: Fix crash when freeing instances with event triggers

tracing: Make the snapshot trigger work with instances

2018-05-28T16:49:32+00:00

The snapshot trigger currently only affects the main ring buffer, even when
it is used by the instances. This can be confusing as the snapshot trigger
is listed in the instance.

 > # cd /sys/kernel/tracing
 > # mkdir instances/foo
 > # echo snapshot > instances/foo/events/syscalls/sys_enter_fchownat/trigger
 > # echo top buffer > trace_marker
 > # echo foo buffer > instances/foo/trace_marker
 > # touch /tmp/bar
 > # chown rostedt /tmp/bar
 > # cat instances/foo/snapshot
 # tracer: nop
 #
 #
 # * Snapshot is freed *
 #
 # Snapshot commands:
 # echo 0 > snapshot : Clears and frees snapshot buffer
 # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
 #                      Takes a snapshot of the main buffer.
 # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
 #                      (Doesn't have to be '2' works with any number that
 #                       is not a '0' or '1')

 > # cat snapshot
 # tracer: nop
 #
 #                              _-----=> irqs-off
 #                             / _----=> need-resched
 #                            | / _---=> hardirq/softirq
 #                            || / _--=> preempt-depth
 #                            ||| /     delay
 #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
 #              | |       |   ||||       |         |
             bash-1189  [000] ....   111.488323: tracing_mark_write: top buffer

Not only did the snapshot occur in the top level buffer, but the instance
snapshot buffer should have been allocated, and it is still free.

Cc: stable@vger.kernel.org
Fixes: 85f2b08268c01 ("tracing: Add basic event trigger framework")
Signed-off-by: Steven Rostedt (VMware)