linux.git/arch/x86/kernel/cpu/resctrl, branch v6.9

x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline

2024-04-03T07:30:01+00:00

Tony encountered this OOPS when the last CPU of a domain goes
offline while running a kernel built with CONFIG_NO_HZ_FULL:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    ...
    RIP: 0010:__find_nth_andnot_bit+0x66/0x110
    ...
    Call Trace:
     <TASK>
     ? __die()
     ? page_fault_oops()
     ? exc_page_fault()
     ? asm_exc_page_fault()
     cpumask_any_housekeeping()
     mbm_setup_overflow_handler()
     resctrl_offline_cpu()
     resctrl_arch_offline_cpu()
     cpuhp_invoke_callback()
     cpuhp_thread_fun()
     smpboot_thread_fn()
     kthread()
     ret_from_fork()
     ret_from_fork_asm()
     </TASK>

The NULL pointer dereference is encountered while searching for another
online CPU in the domain (of which there are none) that can be used to
run the MBM overflow handler.

Because the kernel is configured with CONFIG_NO_HZ_FULL, the search for
another CPU (in its effort to prefer CPUs that aren't marked nohz_full)
consults the mask representing the nohz_full CPUs, tick_nohz_full_mask.
On a kernel with CONFIG_CPUMASK_OFFSTACK=y, tick_nohz_full_mask is not
allocated unless the kernel is booted with the "nohz_full=" parameter.
Because of that, any access to tick_nohz_full_mask needs to be guarded
with tick_nohz_full_enabled().

Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) check with
tick_nohz_full_enabled(). The latter ensures tick_nohz_full_mask can be
accessed safely, and it can be used whether or not the kernel is built
with CONFIG_NO_HZ_FULL.
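The difference between the compile-time and the runtime check can be illustrated with a simplified userspace model. This is a sketch, not the kernel code: cpumasks are modelled as 64-bit words, a NULL `nohz_full_mask` stands in for the unallocated tick_nohz_full_mask under CONFIG_CPUMASK_OFFSTACK=y, and `model_nohz_full_enabled()`, `any_but()`, `pick_housekeeping_cpu()` and `NO_CPU` are hypothetical names. The runtime guard is false whenever the mask was never allocated, so the mask is never dereferenced in the last-CPU case from the trace above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NO_CPU 64u   /* hypothetical "none found" value, like nr_cpu_ids */

/* Stand-in for tick_nohz_full_mask: stays NULL unless "nohz_full=" is
 * given, as with CONFIG_CPUMASK_OFFSTACK=y. */
static const uint64_t *nohz_full_mask;

/* Runtime guard, loosely modelled on tick_nohz_full_enabled(): only true
 * once the mask really exists. A compile-time IS_ENABLED() cannot know. */
static bool model_nohz_full_enabled(void)
{
	return nohz_full_mask != NULL;
}

/* Lowest CPU set in @mask other than @exclude (-1: exclude nothing). */
static unsigned int any_but(uint64_t mask, int exclude)
{
	if (exclude >= 0 && exclude < 64)
		mask &= ~(UINT64_C(1) << exclude);
	for (unsigned int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/* Prefer a CPU that is not nohz_full, in the spirit of
 * cpumask_any_housekeeping(). */
static unsigned int pick_housekeeping_cpu(uint64_t mask, int exclude)
{
	unsigned int cpu = any_but(mask, exclude);

	/* The guard: never look at the nohz_full mask unless it exists. */
	if (!model_nohz_full_enabled())
		return cpu;

	/* A valid CPU that is not nohz_full needs no further work. */
	if (cpu < NO_CPU && !(*nohz_full_mask & (UINT64_C(1) << cpu)))
		return cpu;

	/* Otherwise try the CPUs that are not nohz_full. */
	unsigned int hk = any_but(mask & ~*nohz_full_mask, exclude);
	return hk < NO_CPU ? hk : cpu;
}
```

In the model, removing the guard and calling `pick_housekeeping_cpu(1u << 3, 3)` with no mask allocated would dereference a NULL pointer, which mirrors the oops in __find_nth_andnot_bit().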

[ Use Ingo's suggestion that combines the two NO_HZ checks into one. ]

Fixes: a4846aaf3945 ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow")
Reported-by: Tony Luck 
Signed-off-by: Reinette Chatre 
Signed-off-by: Ingo Molnar 
Reviewed-by: Babu Moger 
Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com
Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/

x86/resctrl: Remove lockdep annotation that triggers false positive

2024-02-22T15:15:38+00:00

get_domain_from_cpu() walks a list of domains to find the one that
contains the specified CPU. This needs to be protected against races
with CPU hotplug when the list is modified. It has recently gained
a lockdep annotation to check this.

The lockdep annotation causes false positives when get_domain_from_cpu()
is called via IPI: the lock is held, but by another process. Remove it.
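The false positive follows from what a lockdep-style assertion checks: that the current task holds the lock. A minimal model with hypothetical names (`model_mutex`, `held_by_current()`, `held_by_anyone()`) shows why that check fails in an IPI handler. The sender holds the lock and waits, while the handler runs in the context of whichever task the target CPU happened to interrupt.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mutex model: owner is a task id, 0 when unlocked. */
struct model_mutex {
	int owner;
};

/* What a lockdep_assert_held()-style check asks: does *this* task hold it? */
static bool held_by_current(const struct model_mutex *m, int current_task)
{
	return m->owner == current_task;
}

/* What the IPI path actually relies on: someone (the sender) holds it. */
static bool held_by_anyone(const struct model_mutex *m)
{
	return m->owner != 0;
}
```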

  [ bp: Refresh it on top of x86/cache. ]

Fixes: fb700810d30b ("x86/resctrl: Separate arch and fs resctrl locks")
Reported-by: Tony Luck 
Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Link: https://lore.kernel.org/all/ZdUSwOM9UUNpw84Y@agluck-desk3

x86/resctrl: Separate arch and fs resctrl locks

2024-02-19T18:28:07+00:00

resctrl has one mutex that is taken by both the architecture-specific code and
the filesystem parts. The two interact via cpuhp, where the architecture code
updates the domain list. Filesystem handlers that walk the domain list should
not run concurrently with the cpuhp callback modifying the list.

Exposing a lock from the filesystem code means the interface is not cleanly
defined, and creates the possibility of cross-architecture lock ordering
headaches. The interaction only exists so that certain filesystem paths are
serialised against CPU hotplug. The CPU hotplug code already has a mechanism to
do this using cpus_read_lock().

MPAM's monitors have an overflow interrupt, so it needs to be possible to walk
the domains list in irq context. RCU is ideal for this, but some paths need to
be able to sleep to allocate memory.

Because resctrl_{on,off}line_cpu() take the rdtgroup_mutex as part of a cpuhp
callback, cpus_read_lock() must always be taken first.
rdtgroup_schemata_write() already does this.

Most of the filesystem code's domain list walkers are currently protected by
the rdtgroup_mutex taken in rdtgroup_kn_lock_live().  The exceptions are
rdt_bit_usage_show() and the mon_config helpers which take the lock directly.

Make the domain list protected by RCU. An architecture-specific lock prevents
concurrent writers. rdt_bit_usage_show() could walk the domain list using RCU,
but to keep all the filesystem operations the same, it is changed to call
cpus_read_lock(). The mon_config helpers send multiple IPIs, so they take
cpus_read_lock() in these cases.

The other filesystem list walkers need to be able to sleep. Add
cpus_read_lock() to rdtgroup_kn_lock_live() so that the cpuhp callbacks can't
be invoked while filesystem operations are in progress.

Add lockdep_assert_cpus_held() in the cases where the rdtgroup_kn_lock_live()
call isn't obvious.

Resctrl's domain online/offline calls now need to take the rdtgroup_mutex
themselves.
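The ordering rule (cpus_read_lock() before rdtgroup_mutex) can be illustrated with a toy lock-order checker. This is not lockdep: the ranks, `take()`/`drop()`, `fs_lock()` and `cpuhp_callback()` are hypothetical, and the cpuhp callback is modelled as taking the hotplug lock itself, whereas in the kernel the cpuhp core already holds it on the callback's behalf.

```c
#include <assert.h>

/* Hypothetical lock ranks: a lower rank must be taken first. */
enum lock_rank { RANK_CPUS_HOTPLUG = 1, RANK_RDTGROUP_MUTEX = 2 };

static unsigned int held;       /* one bit per rank currently held */
static int order_violations;    /* inversions seen so far */

static void take(enum lock_rank r)
{
	/* Inversion if any higher-ranked lock is already held. */
	if (held & ~((1u << (r + 1)) - 1u))
		order_violations++;
	held |= 1u << r;
}

static void drop(enum lock_rank r)
{
	held &= ~(1u << r);
}

/* Filesystem path, shaped like rdtgroup_kn_lock_live() after the change. */
static void fs_lock(void)
{
	take(RANK_CPUS_HOTPLUG);     /* cpus_read_lock() */
	take(RANK_RDTGROUP_MUTEX);   /* mutex_lock(&rdtgroup_mutex) */
}

static void fs_unlock(void)
{
	drop(RANK_RDTGROUP_MUTEX);
	drop(RANK_CPUS_HOTPLUG);
}

/* cpuhp callback: the hotplug lock is held (for write) first, then the
 * callback takes rdtgroup_mutex itself. */
static void cpuhp_callback(void)
{
	take(RANK_CPUS_HOTPLUG);
	take(RANK_RDTGROUP_MUTEX);
	drop(RANK_RDTGROUP_MUTEX);
	drop(RANK_CPUS_HOTPLUG);
}
```

Because both paths take the hotplug lock first, neither can hold rdtgroup_mutex while waiting for the other's hotplug lock, which is the deadlock the rule prevents.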

  [ bp: Fold in a build fix: https://lore.kernel.org/r/87zfvwieli.ffs@tglx ]

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-25-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Move domain helper migration into resctrl_offline_cpu()

2024-02-16T18:18:33+00:00

When a CPU is taken offline the resctrl filesystem code needs to check if it
was the CPU nominated to perform the periodic overflow and limbo work. If so,
another CPU needs to be chosen to do this work.

This is currently done in core.c, mixed in with the code that removes the CPU
from the domain's mask, and potentially free()s the domain.

Move the migration of the overflow and limbo helpers into the filesystem code,
into resctrl_offline_cpu(). As resctrl_offline_cpu() runs before the
architecture code has removed the CPU from the domain mask, the helpers that
choose the new CPU need to be told which CPU is being removed, so they avoid
picking it. This uses the exclude_cpu feature previously added.
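The need for an explicit exclusion can be sketched with a small model of one domain's overflow-handler CPU. All names here (`model_domain`, `any_but()`, `offline_migrate_overflow()`, `NO_CPU`, `PICK_ANY_CPU`) are hypothetical; -1 means "exclude nothing", matching the convention described in the exclude_cpu commit.

```c
#include <assert.h>
#include <stdint.h>

#define NO_CPU 64u          /* hypothetical "none found" value */
#define PICK_ANY_CPU (-1)   /* hypothetical: nothing to exclude */

/* Hypothetical per-domain state. */
struct model_domain {
	uint64_t cpu_mask;           /* still contains the CPU going offline */
	unsigned int mbm_work_cpu;   /* CPU running the overflow handler */
};

/* Lowest CPU in @mask other than @exclude (PICK_ANY_CPU: none excluded). */
static unsigned int any_but(uint64_t mask, int exclude)
{
	if (exclude >= 0 && exclude < 64)
		mask &= ~(UINT64_C(1) << exclude);
	for (unsigned int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_CPU;
}

/* Filesystem-side offline step: runs before the arch code clears @cpu
 * from d->cpu_mask, so @cpu has to be excluded explicitly. */
static void offline_migrate_overflow(struct model_domain *d, unsigned int cpu)
{
	if (d->mbm_work_cpu != cpu)
		return;   /* another CPU owns the work; nothing to move */
	d->mbm_work_cpu = any_but(d->cpu_mask, (int)cpu);
}
```

Without the exclusion, the first migration in a two-CPU domain would pick CPU 0 again, because the departing CPU is still in the mask.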

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-24-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Add CPU offline callback for resctrl work

2024-02-16T18:18:33+00:00

The resctrl architecture-specific code may need to free a domain when a CPU
goes offline; it also needs to reset the CPU's PQR_ASSOC register. Amongst
other things, the resctrl filesystem code needs to clear this CPU from the
cpu_mask of any control and monitor groups.

Currently, this is all done in core.c and called from resctrl_offline_cpu(),
making the split between architecture and filesystem code unclear.

Move the filesystem work to remove the CPU from the control and monitor groups
into a filesystem helper called resctrl_offline_cpu(), and rename the one in
core.c to resctrl_arch_offline_cpu().
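The resulting split can be sketched as two functions: an arch callback that calls into a filesystem helper which clears the CPU from every group. The types and names (`model_group`, `fs_offline_cpu()`, `arch_offline_cpu()`) are hypothetical stand-ins, the arch work is elided, and the call order follows the later commit's note that resctrl_offline_cpu() runs before the arch code removes the CPU from the domain mask.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical group model: a control group with its monitor groups. */
struct model_group {
	uint64_t cpu_mask;                /* CPUs assigned to this group */
	struct model_group *mon_groups;   /* child monitor groups, may be NULL */
	size_t nr_mon_groups;
};

/* Filesystem part: forget @cpu in every control and monitor group. */
static void fs_offline_cpu(struct model_group *ctrl, size_t nr_ctrl,
			   unsigned int cpu)
{
	const uint64_t bit = UINT64_C(1) << cpu;

	for (size_t i = 0; i < nr_ctrl; i++) {
		ctrl[i].cpu_mask &= ~bit;
		for (size_t j = 0; j < ctrl[i].nr_mon_groups; j++)
			ctrl[i].mon_groups[j].cpu_mask &= ~bit;
	}
}

/* Arch part: calls the filesystem helper first; resetting the CPU's
 * PQR_ASSOC register and removing it from (or freeing) its domains
 * would follow, but that arch work is elided here. */
static void arch_offline_cpu(struct model_group *ctrl, size_t nr_ctrl,
			     unsigned int cpu)
{
	fs_offline_cpu(ctrl, nr_ctrl, cpu);
}
```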

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-23-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU

2024-02-16T18:18:33+00:00

When a CPU is taken offline resctrl may need to move the overflow or limbo
handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler() will be
called while the CPU that is going offline is still present in the CPU mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. Passing -1 indicates that no CPU
needs to be excluded.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Babu Moger 
Reviewed-by: Reinette Chatre 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Add CPU online callback for resctrl work

2024-02-16T18:18:33+00:00

The resctrl architecture-specific code may need to create a domain when a CPU
comes online; it also needs to reset the CPU's PQR_ASSOC register. The resctrl
filesystem code needs to update the rdtgroup_default CPU mask when CPUs are
brought online.

Currently, this is all done in one function, resctrl_online_cpu().  It will
need to be split into architecture and filesystem parts before resctrl can be
moved to /fs/.

Pull the rdtgroup_default update work out into a filesystem-specific cpu_online
helper. resctrl_online_cpu() is the obvious name for this, which means the
version in core.c needs renaming.

resctrl_online_cpu() is called by the arch code once it has done the work to
add the new CPU to any domains.
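The online ordering (arch work first, then the filesystem helper) can be sketched as follows. The globals and function names are hypothetical stand-ins for one domain's cpu_mask, rdtgroup_default's cpu_mask and the two callbacks; domain creation and the PQR_ASSOC reset are elided.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for one domain's cpu_mask and for
 * rdtgroup_default's cpu_mask. */
static uint64_t domain_cpus;
static uint64_t default_group_cpus;

/* Filesystem part: a newly onlined CPU starts in the default group. */
static void fs_online_cpu(unsigned int cpu)
{
	default_group_cpus |= UINT64_C(1) << cpu;
}

/* Arch part: add the CPU to its domain (creating the domain if needed)
 * and reset its PQR_ASSOC register (both simplified or elided here),
 * then call into the filesystem code. */
static void arch_online_cpu(unsigned int cpu)
{
	domain_cpus |= UINT64_C(1) << cpu;
	fs_online_cpu(cpu);
}
```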

In future patches, resctrl_online_cpu() will take the rdtgroup_mutex itself.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-21-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Add helpers for system wide mon/alloc capable

2024-02-16T18:18:33+00:00

resctrl reads rdt_alloc_capable or rdt_mon_capable to determine whether any of
the resources support the corresponding features.  resctrl also uses the
static keys that affect the architecture's context-switch code to determine the
same thing.

This forces another architecture to have the same static keys.

As the static key is enabled based on the capable flag, and none of the
filesystem uses of these are in the scheduler path, move the capable flags
behind helpers, and use these in the filesystem code instead of the static key.

After this change, only the architecture code manages and uses the static keys
to ensure __resctrl_sched_in() does not need runtime checks.

This avoids multiple architectures having to define the same static keys.

Cases where the static key implicitly tested if the resctrl filesystem was
mounted all have an explicit check now.
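A sketch of the pattern, with the capable flags kept private to the arch side and read through accessors. The names (`alloc_capable`, `arch_alloc_capable()`, `fs_can_use_alloc()`, and so on) are hypothetical userspace stand-ins, not the kernel's helpers; the explicit `mounted` parameter models the check that the static key used to imply.

```c
#include <assert.h>
#include <stdbool.h>

/* Arch-private flags (hypothetical stand-ins for rdt_alloc_capable and
 * rdt_mon_capable). Only arch code would touch these directly. */
static bool alloc_capable;
static bool mon_capable;

/* Accessors the filesystem code calls instead of testing a static key. */
static bool arch_alloc_capable(void) { return alloc_capable; }
static bool arch_mon_capable(void)   { return mon_capable; }

/* Filesystem-side check: "is resctrl mounted?" is now tested explicitly,
 * since no static key is there to imply it. */
static bool fs_can_use_alloc(bool mounted)
{
	return mounted && arch_alloc_capable();
}
```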

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-20-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Make rdt_enable_key the arch's decision to switch

2024-02-16T18:18:33+00:00

rdt_enable_key is switched when resctrl is mounted. It was also previously used
to prevent a second mount of the filesystem.

Any other architecture that wants to support resctrl has to provide identical
static keys.

Now that there are helpers for enabling and disabling the alloc/mon keys,
resctrl doesn't need to switch this extra key; the arch code can do it.
Use the static-key increment and decrement helpers, and change resctrl to
ensure the calls are balanced.
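The balancing requirement can be seen with counters standing in for static keys. In this sketch (all names hypothetical), the shared enable key is reference counted the way static_branch_inc()/static_branch_dec() count, so it stays on while either allocation or monitoring is enabled and only returns to zero when every enable has been paired with a disable.

```c
#include <assert.h>
#include <stdbool.h>

/* Counters standing in for static keys; a key is "on" while its count
 * is non-zero. */
static int enable_key;            /* shared key: alloc OR mon in use */
static bool alloc_key, mon_key;   /* per-feature keys */

static void arch_enable_alloc(void)  { alloc_key = true;  enable_key++; }
static void arch_disable_alloc(void) { alloc_key = false; enable_key--; }
static void arch_enable_mon(void)    { mon_key = true;    enable_key++; }
static void arch_disable_mon(void)   { mon_key = false;   enable_key--; }
```

An unbalanced sequence, such as disabling monitoring on unmount when it was never enabled at mount, would drop the shared count to zero while allocation is still in use.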

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-19-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)

x86/resctrl: Move alloc/mon static keys into helpers

2024-02-16T18:18:32+00:00

resctrl enables three static keys depending on the features it has enabled.
Another architecture's context-switch code may look different, so any static
keys that control it should be buried behind helpers.

Move the alloc/mon logic into arch-specific helpers as a preparatory step for
making the rdt_enable_key's status something the arch code decides.

This means other architectures don't have to mirror the static keys.

Signed-off-by: James Morse 
Signed-off-by: Borislav Petkov (AMD) 
Reviewed-by: Shaopeng Tan 
Reviewed-by: Reinette Chatre 
Reviewed-by: Babu Moger 
Tested-by: Shaopeng Tan 
Tested-by: Peter Newman 
Tested-by: Babu Moger 
Tested-by: Carl Worth  # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-18-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD)