<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/kernel/cpu/resctrl/internal.h, branch v6.11</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems</title>
<updated>2024-07-02T17:57:51+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-07-02T17:38:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=21b362cc762aabb3e8496d33d7b4538154c95a0b'/>
<id>21b362cc762aabb3e8496d33d7b4538154c95a0b</id>
<content type='text'>
Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.

The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.

See the "Intel Resource Director Technology Architecture Specification"
for additional details.

When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.

The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.

See the "Intel Resource Director Technology Architecture Specification"
for additional details.

When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter</title>
<updated>2024-07-02T17:57:19+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:15+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8c7d3d904b76c45fe2b5dc982fb5090d12a63af'/>
<id>c8c7d3d904b76c45fe2b5dc982fb5090d12a63af</id>
<content type='text'>
mon_event_read() fills out most fields of the struct rmid_read that is passed
via an smp_call*() function to a CPU that is part of the correct domain to
read the monitor counters.

With Sub-NUMA Cluster (SNC) mode there are now two cases to handle:

1) Reading a file that returns a value for a single domain.
   + Choose the CPU to execute from the domain cpu_mask

2) Reading a file that must sum across domains sharing an L3 cache
   instance.
   + Indicate to called code that a sum is needed by passing a NULL
     rdt_mon_domain pointer.
   + Choose the CPU from the L3 shared_cpu_map.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-16-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
mon_event_read() fills out most fields of the struct rmid_read that is passed
via an smp_call*() function to a CPU that is part of the correct domain to
read the monitor counters.

With Sub-NUMA Cluster (SNC) mode there are now two cases to handle:

1) Reading a file that returns a value for a single domain.
   + Choose the CPU to execute from the domain cpu_mask

2) Reading a file that must sum across domains sharing an L3 cache
   instance.
   + Indicate to called code that a sum is needed by passing a NULL
     rdt_mon_domain pointer.
   + Choose the CPU from the L3 shared_cpu_map.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-16-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Allocate a new field in union mon_data_bits</title>
<updated>2024-07-02T17:49:54+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=92b5d0b1189ea9e9f00ae493fc99102fe7f2442f'/>
<id>92b5d0b1189ea9e9f00ae493fc99102fe7f2442f</id>
<content type='text'>
When Sub-NUMA Cluster (SNC) mode is enabled, the legacy monitor reporting files
must report the sum of the data from all of the SNC nodes that share the L3
cache that is referenced by the monitor file.

Resctrl squeezes all the attributes of these files into 32 bits so they can be
stored in the "priv" field of struct kernfs_node.

Currently, only three monitor events are defined by enum resctrl_event_id so
reducing it from 8 bits to 7 bits still provides more than enough space to
represent all the known event types.

But note that this choice was arbitrary.  The "rid" field is also far wider
than needed for the current number of resource id types.  This structure is
purely internal to resctrl, no ABI issues with modifying it. Subsequent changes
may rearrange the allocation of bits between each of the fields as needed.

Give the bit to a new "sum" field that indicates that reading this file must
sum across SNC nodes. This bit also indicates that the domid field is the id of
an L3 cache (instead of a domain id) to find which domains must be summed.

Fix up other issues in the kerneldoc description for mon_data_bits.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-13-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When Sub-NUMA Cluster (SNC) mode is enabled, the legacy monitor reporting files
must report the sum of the data from all of the SNC nodes that share the L3
cache that is referenced by the monitor file.

Resctrl squeezes all the attributes of these files into 32 bits so they can be
stored in the "priv" field of struct kernfs_node.

Currently, only three monitor events are defined by enum resctrl_event_id so
reducing it from 8 bits to 7 bits still provides more than enough space to
represent all the known event types.

But note that this choice was arbitrary.  The "rid" field is also far wider
than needed for the current number of resource id types.  This structure is
purely internal to resctrl, no ABI issues with modifying it. Subsequent changes
may rearrange the allocation of bits between each of the fields as needed.

Give the bit to a new "sum" field that indicates that reading this file must
sum across SNC nodes. This bit also indicates that the domid field is the id of
an L3 cache (instead of a domain id) to find which domains must be summed.

Fix up other issues in the kerneldoc description for mon_data_bits.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-13-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Add a new field to struct rmid_read for summation of domains</title>
<updated>2024-07-02T17:49:54+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:09+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fb1f51f677585f1b1ba17d2390963bbebe7a8cfa'/>
<id>fb1f51f677585f1b1ba17d2390963bbebe7a8cfa</id>
<content type='text'>
When a user reads a monitor file rdtgroup_mondata_show() calls mon_event_read()
to package up all the required details into an rmid_read structure which is
passed across the smp_call*() infrastructure to code that will read data from
hardware and return the value (or error status) in the rmid_read structure.

Sub-NUMA Cluster (SNC) mode adds files with new semantics. These require the
smp_call-ed code to sum event data from all domains that share an L3 cache.

Add a pointer to the L3 "cacheinfo" structure to struct rmid_read for the data
collection routines to use to pick the domains to be summed.

  [ Reinette: the rmid_read structure has become complex enough so document each
    of its fields and provide the kerneldoc documentation for struct rmid_read. ]

Co-developed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-10-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a user reads a monitor file rdtgroup_mondata_show() calls mon_event_read()
to package up all the required details into an rmid_read structure which is
passed across the smp_call*() infrastructure to code that will read data from
hardware and return the value (or error status) in the rmid_read structure.

Sub-NUMA Cluster (SNC) mode adds files with new semantics. These require the
smp_call-ed code to sum event data from all domains that share an L3 cache.

Add a pointer to the L3 "cacheinfo" structure to struct rmid_read for the data
collection routines to use to pick the domains to be summed.

  [ Reinette: the rmid_read structure has become complex enough so document each
    of its fields and provide the kerneldoc documentation for struct rmid_read. ]

Co-developed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-10-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Split the rdt_domain and rdt_hw_domain structures</title>
<updated>2024-07-02T17:49:54+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cae2bcb6a2c691ef7b537ad07e9819a5ed645bcc'/>
<id>cae2bcb6a2c691ef7b537ad07e9819a5ed645bcc</id>
<content type='text'>
The same rdt_domain structure is used for both control and monitor
functions. But this results in wasted memory as some of the fields are
only used by control functions, while most are only used for monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-5-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The same rdt_domain structure is used for both control and monitor
functions. But this results in wasted memory as some of the fields are
only used by control functions, while most are only used for monitor
functions.

Split into separate rdt_ctrl_domain and rdt_mon_domain structures with
just the fields required for control and monitoring respectively.

Similar split of the rdt_hw_domain structure into rdt_hw_ctrl_domain
and rdt_hw_mon_domain.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-5-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Prepare for different scope for control/monitor operations</title>
<updated>2024-07-02T17:49:53+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=cd84f72b6a5c10f79f19fab67b0edfbc4fdbc5b1'/>
<id>cd84f72b6a5c10f79f19fab67b0edfbc4fdbc5b1</id>
<content type='text'>
Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically Intel needs
to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
and NODE scope for cache occupancy and memory bandwidth monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now the domains are
allocated independently it is no longer required to disable both control
and monitor operations if either fail.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-4-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Resctrl assumes that control and monitor operations on a resource are
performed at the same scope.

Prepare for systems that use different scope (specifically Intel needs
to split the RDT_RESOURCE_L3 resource to use L3 scope for cache control
and NODE scope for cache occupancy and memory bandwidth monitoring).

Create separate domain lists for control and monitor operations.

Note that errors during initialization of either control or monitor
functions on a domain would previously result in that domain being
excluded from both control and monitor operations. Now the domains are
allocated independently it is no longer required to disable both control
and monitor operations if either fail.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-4-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Simplify call convention for MSR update functions</title>
<updated>2024-04-24T11:47:00+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-03-08T21:38:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=bd4955d4bc2182ccb660c9c30a4dd7f36feaf943'/>
<id>bd4955d4bc2182ccb660c9c30a4dd7f36feaf943</id>
<content type='text'>
The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:

  (struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)

struct msr_param contains pointers to both struct rdt_resource and struct
rdt_domain, thus only struct msr_param is necessary.

Pass struct msr_param as a single parameter. Clean up formatting and
fix some fir tree declaration ordering.

No functional change.

Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Maciej Wieczor-Retman &lt;maciej.wieczor-retman@intel.com&gt;
Link: https://lore.kernel.org/r/20240308213846.77075-3-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The per-resource MSR update functions cat_wrmsr(), mba_wrmsr_intel(),
and mba_wrmsr_amd() all take three arguments:

  (struct rdt_domain *d, struct msr_param *m, struct rdt_resource *r)

struct msr_param contains pointers to both struct rdt_resource and struct
rdt_domain, thus only struct msr_param is necessary.

Pass struct msr_param as a single parameter. Clean up formatting and
fix some fir tree declaration ordering.

No functional change.

Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Maciej Wieczor-Retman &lt;maciej.wieczor-retman@intel.com&gt;
Link: https://lore.kernel.org/r/20240308213846.77075-3-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Pass domain to target CPU</title>
<updated>2024-04-24T11:41:41+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-03-08T21:38:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e3ca96e479c91d6ee657d3caa5092a6a3a620f9f'/>
<id>e3ca96e479c91d6ee657d3caa5092a6a3a620f9f</id>
<content type='text'>
reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Commit

  c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
  false positive")

removed the incorrect assertions.

Add the target domain to the msr_param structure and call
rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely
assert that the cpus_lock is held since the remaining callers do not use
IPI.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: James Morse &lt;james.morse@arm.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Maciej Wieczor-Retman &lt;maciej.wieczor-retman@intel.com&gt;
Link: https://lore.kernel.org/r/20240308213846.77075-2-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
reset_all_ctrls() and resctrl_arch_update_domains() use on_each_cpu_mask()
to call rdt_ctrl_update() on potentially one CPU from each domain.

But this means rdt_ctrl_update() needs to figure out which domain to
apply changes to. Doing so requires a search of all domains in a resource,
which can only be done safely if cpus_lock is held. Both callers do hold
this lock, but there isn't a way for a function called on another CPU
via IPI to verify this.

Commit

  c0d848fcb09d ("x86/resctrl: Remove lockdep annotation that triggers
  false positive")

removed the incorrect assertions.

Add the target domain to the msr_param structure and call
rdt_ctrl_update() for each domain separately using
smp_call_function_single(). This means that rdt_ctrl_update() doesn't
need to search for the domain and get_domain_from_cpu() can safely
assert that the cpus_lock is held since the remaining callers do not use
IPI.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: James Morse &lt;james.morse@arm.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Maciej Wieczor-Retman &lt;maciej.wieczor-retman@intel.com&gt;
Link: https://lore.kernel.org/r/20240308213846.77075-2-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Fix uninitialized memory read when last CPU of domain goes offline</title>
<updated>2024-04-03T07:30:01+00:00</updated>
<author>
<name>Reinette Chatre</name>
<email>reinette.chatre@intel.com</email>
</author>
<published>2024-04-01T18:16:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c3eeb1ffc6a88af9b002e22be0f70851759be03a'/>
<id>c3eeb1ffc6a88af9b002e22be0f70851759be03a</id>
<content type='text'>
Tony encountered this OOPS when the last CPU of a domain goes
offline while running a kernel built with CONFIG_NO_HZ_FULL:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    ...
    RIP: 0010:__find_nth_andnot_bit+0x66/0x110
    ...
    Call Trace:
     &lt;TASK&gt;
     ? __die()
     ? page_fault_oops()
     ? exc_page_fault()
     ? asm_exc_page_fault()
     cpumask_any_housekeeping()
     mbm_setup_overflow_handler()
     resctrl_offline_cpu()
     resctrl_arch_offline_cpu()
     cpuhp_invoke_callback()
     cpuhp_thread_fun()
     smpboot_thread_fn()
     kthread()
     ret_from_fork()
     ret_from_fork_asm()
     &lt;/TASK&gt;

The NULL pointer dereference is encountered while searching for another
online CPU in the domain (of which there are none) that can be used to
run the MBM overflow handler.

Because the kernel is configured with CONFIG_NO_HZ_FULL the search for
another CPU (in its effort to prefer those CPUs that aren't marked
nohz_full) consults the mask representing the nohz_full CPUs,
tick_nohz_full_mask. On a kernel with CONFIG_CPUMASK_OFFSTACK=y
tick_nohz_full_mask is not allocated unless the kernel is booted with
the "nohz_full=" parameter and because of that any access to
tick_nohz_full_mask needs to be guarded with tick_nohz_full_enabled().

Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) with tick_nohz_full_enabled().
The latter ensures tick_nohz_full_mask can be accessed safely and can be
used whether kernel is built with CONFIG_NO_HZ_FULL enabled or not.

[ Use Ingo's suggestion that combines the two NO_HZ checks into one. ]

Fixes: a4846aaf3945 ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow")
Reported-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com
Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Tony encountered this OOPS when the last CPU of a domain goes
offline while running a kernel built with CONFIG_NO_HZ_FULL:

    BUG: kernel NULL pointer dereference, address: 0000000000000000
    #PF: supervisor read access in kernel mode
    #PF: error_code(0x0000) - not-present page
    PGD 0
    Oops: 0000 [#1] PREEMPT SMP NOPTI
    ...
    RIP: 0010:__find_nth_andnot_bit+0x66/0x110
    ...
    Call Trace:
     &lt;TASK&gt;
     ? __die()
     ? page_fault_oops()
     ? exc_page_fault()
     ? asm_exc_page_fault()
     cpumask_any_housekeeping()
     mbm_setup_overflow_handler()
     resctrl_offline_cpu()
     resctrl_arch_offline_cpu()
     cpuhp_invoke_callback()
     cpuhp_thread_fun()
     smpboot_thread_fn()
     kthread()
     ret_from_fork()
     ret_from_fork_asm()
     &lt;/TASK&gt;

The NULL pointer dereference is encountered while searching for another
online CPU in the domain (of which there are none) that can be used to
run the MBM overflow handler.

Because the kernel is configured with CONFIG_NO_HZ_FULL the search for
another CPU (in its effort to prefer those CPUs that aren't marked
nohz_full) consults the mask representing the nohz_full CPUs,
tick_nohz_full_mask. On a kernel with CONFIG_CPUMASK_OFFSTACK=y
tick_nohz_full_mask is not allocated unless the kernel is booted with
the "nohz_full=" parameter and because of that any access to
tick_nohz_full_mask needs to be guarded with tick_nohz_full_enabled().

Replace the IS_ENABLED(CONFIG_NO_HZ_FULL) with tick_nohz_full_enabled().
The latter ensures tick_nohz_full_mask can be accessed safely and can be
used whether kernel is built with CONFIG_NO_HZ_FULL enabled or not.

[ Use Ingo's suggestion that combines the two NO_HZ checks into one. ]

Fixes: a4846aaf3945 ("x86/resctrl: Add cpumask_any_housekeeping() for limbo/overflow")
Reported-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/ff8dfc8d3dcb04b236d523d1e0de13d2ef585223.1711993956.git.reinette.chatre@intel.com
Closes: https://lore.kernel.org/lkml/ZgIFT5gZgIQ9A9G7@agluck-desk3/
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Allow overflow/limbo handlers to be scheduled on any-but CPU</title>
<updated>2024-02-16T18:18:33+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2024-02-13T18:44:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=978fcca954cb52249babbc14e53de53c88dd6433'/>
<id>978fcca954cb52249babbc14e53de53c88dd6433</id>
<content type='text'>
When a CPU is taken offline resctrl may need to move the overflow or limbo
handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler() will be
called while the CPU that is going offline is still present in the CPU mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need
excluding.

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shaopeng Tan &lt;tan.shaopeng@fujitsu.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Shaopeng Tan &lt;tan.shaopeng@fujitsu.com&gt;
Tested-by: Peter Newman &lt;peternewman@google.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Carl Worth &lt;carl@os.amperecomputing.com&gt; # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When a CPU is taken offline resctrl may need to move the overflow or limbo
handlers to run on a different CPU.

Once the offline callbacks have been split, cqm_setup_limbo_handler() will be
called while the CPU that is going offline is still present in the CPU mask.

Pass the CPU to exclude to cqm_setup_limbo_handler() and
mbm_setup_overflow_handler(). These functions can use a variant of
cpumask_any_but() when selecting the CPU. -1 is used to indicate no CPUs need
excluding.

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shaopeng Tan &lt;tan.shaopeng@fujitsu.com&gt;
Reviewed-by: Babu Moger &lt;babu.moger@amd.com&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Shaopeng Tan &lt;tan.shaopeng@fujitsu.com&gt;
Tested-by: Peter Newman &lt;peternewman@google.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Carl Worth &lt;carl@os.amperecomputing.com&gt; # arm64
Link: https://lore.kernel.org/r/20240213184438.16675-22-james.morse@arm.com
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
</pre>
</div>
</content>
</entry>
</feed>
