<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/arch/x86/kernel/cpu/resctrl/monitor.c, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>x86/resctrl: Compute memory bandwidth for all supported events</title>
<updated>2024-12-10T15:13:48+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-12-06T16:31:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2c272fadb58b590eb973c6c447b039f10631f5f7'/>
<id>2c272fadb58b590eb973c6c447b039f10631f5f7</id>
<content type='text'>
Switching between local and total memory bandwidth events as the input
to the mba_sc feedback loop would be cumbersome and take effect slowly
in the current implementation as the bandwidth is only known after two
consecutive readings of the same event.

Compute the bandwidth for all supported events. This doesn't add
significant overhead and will make changing which event is used
simple.

Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Switching between local and total memory bandwidth events as the input
to the mba_sc feedback loop would be cumbersome and take effect slowly
in the current implementation as the bandwidth is only known after two
consecutive readings of the same event.

Compute the bandwidth for all supported events. This doesn't add
significant overhead and will make changing which event is used
simple.

Suggested-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-5-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Modify update_mba_bw() to use per CTRL_MON group event</title>
<updated>2024-12-10T10:15:19+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-12-06T16:31:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=481d363748b2df881df21569f3697b3c7fcf8fc1'/>
<id>481d363748b2df881df21569f3697b3c7fcf8fc1</id>
<content type='text'>
update_mba_bw() hard codes use of the memory bandwidth local event which
prevents more flexible options from being deployed.

Change this function to use the event specified in the rdtgroup that is
being processed.

Mount time checks for the "mba_MBps" option ensure that local memory
bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
update_mba_bw() hard codes use of the memory bandwidth local event which
prevents more flexible options from being deployed.

Change this function to use the event specified in the rdtgroup that is
being processed.

Mount time checks for the "mba_MBps" option ensure that local memory
bandwidth is enabled. So drop the redundant is_mbm_local_enabled() check.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-4-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Introduce resctrl_file_fflags_init() to initialize fflags</title>
<updated>2024-12-09T20:37:01+00:00</updated>
<author>
<name>Babu Moger</name>
<email>babu.moger@amd.com</email>
</author>
<published>2024-12-06T16:31:41+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2937f9c361f7a8b230cd599e4af5264798bf4ce7'/>
<id>2937f9c361f7a8b230cd599e4af5264798bf4ce7</id>
<content type='text'>
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

  [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at
    run time. ]

Signed-off-by: Babu Moger &lt;babu.moger@amd.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
thread_throttle_mode_init() and mbm_config_rftype_init() both initialize
fflags for resctrl files.

Adding new files will involve adding another function to initialize
the fflags. This can be simplified by adding a new function
resctrl_file_fflags_init() and passing the file name and flags
to be initialized.

Consolidate fflags initialization into resctrl_file_fflags_init() and
remove thread_throttle_mode_init() and mbm_config_rftype_init().

  [ Tony: Drop __init attribute so resctrl_file_fflags_init() can be used at
    run time. ]

Signed-off-by: Babu Moger &lt;babu.moger@amd.com&gt;
Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20241206163148.83828-2-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Support Sub-NUMA cluster mode SNC6</title>
<updated>2024-11-06T09:49:04+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-10-31T22:02:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9bce6e94c4b39b6baa649784d92f908aa9168a45'/>
<id>9bce6e94c4b39b6baa649784d92f908aa9168a45</id>
<content type='text'>
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some
Intel platforms.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Support Sub-NUMA cluster mode with 6 nodes per L3 cache (SNC6) on some
Intel platforms.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20241031220213.17991-1-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2024-07-16T17:53:54+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2024-07-16T17:53:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b84b3381907a3c5c6f1d524185eddc55547068b7'/>
<id>b84b3381907a3c5c6f1d524185eddc55547068b7</id>
<content type='text'>
Pull x86 resource control updates from Borislav Petkov:

 - Enable Sub-NUMA clustering to work with resource control on Intel by
   teaching resctrl to handle scopes due to the clustering which
   partitions the L3 cache into sets. Modify and extend the subsystem to
   handle such scopes properly

* tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Update documentation with Sub-NUMA cluster changes
  x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Make __mon_event_count() handle sum domains
  x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter
  x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Allocate a new field in union mon_data_bits
  x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
  x86/resctrl: Initialize on-stack struct rmid_read instances
  x86/resctrl: Add a new field to struct rmid_read for summation of domains
  x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Prepare for different scope for control/monitor operations
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for new domain scope
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 resource control updates from Borislav Petkov:

 - Enable Sub-NUMA clustering to work with resource control on Intel by
   teaching resctrl to handle scopes due to the clustering which
   partitions the L3 cache into sets. Modify and extend the subsystem to
   handle such scopes properly

* tag 'x86_cache_for_v6.11_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/resctrl: Update documentation with Sub-NUMA cluster changes
  x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Make __mon_event_count() handle sum domains
  x86/resctrl: Fill out rmid_read structure for smp_call*() to read a counter
  x86/resctrl: Handle removing directories in Sub-NUMA Cluster (SNC) mode
  x86/resctrl: Create Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Allocate a new field in union mon_data_bits
  x86/resctrl: Refactor mkdir_mondata_subdir() with a helper function
  x86/resctrl: Initialize on-stack struct rmid_read instances
  x86/resctrl: Add a new field to struct rmid_read for summation of domains
  x86/resctrl: Prepare for new Sub-NUMA Cluster (SNC) monitor files
  x86/resctrl: Block use of mba_MBps mount option on Sub-NUMA Cluster (SNC) systems
  x86/resctrl: Introduce snc_nodes_per_l3_cache
  x86/resctrl: Add node-scope to the options for feature scope
  x86/resctrl: Split the rdt_domain and rdt_hw_domain structures
  x86/resctrl: Prepare for different scope for control/monitor operations
  x86/resctrl: Prepare to split rdt_domain structure
  x86/resctrl: Prepare for new domain scope
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Detect Sub-NUMA Cluster (SNC) mode</title>
<updated>2024-07-02T18:02:11+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=13488150f5e2a9b84a335ae18bee33a918ead85d'/>
<id>13488150f5e2a9b84a335ae18bee33a918ead85d</id>
<content type='text'>
There isn't a simple hardware bit that indicates whether a CPU is running in
Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of CPUs
sharing the L3 cache with CPU0 to the number of CPUs in the same NUMA node as
CPU0.

Add the missing definition of pr_fmt() to monitor.c. This wasn't noticed
before as there are only "can't happen" console messages from this file.

  [ bp: Massage commit message. ]

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-19-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There isn't a simple hardware bit that indicates whether a CPU is running in
Sub-NUMA Cluster (SNC) mode. Infer the state by comparing the number of CPUs
sharing the L3 cache with CPU0 to the number of CPUs in the same NUMA node as
CPU0.

Add the missing definition of pr_fmt() to monitor.c. This wasn't noticed
before as there are only "can't happen" console messages from this file.

  [ bp: Massage commit message. ]

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-19-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Enable shared RMID mode on Sub-NUMA Cluster (SNC) systems</title>
<updated>2024-07-02T17:57:51+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-07-02T17:38:20+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=21b362cc762aabb3e8496d33d7b4538154c95a0b'/>
<id>21b362cc762aabb3e8496d33d7b4538154c95a0b</id>
<content type='text'>
Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.

The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.

See the "Intel Resource Director Technology Architecture Specification"
for additional details.

When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Hardware has two RMID configuration options for SNC systems. The default
mode divides RMID counters between SNC nodes. E.g. with 200 RMIDs and
two SNC nodes per L3 cache RMIDs 0..99 are used on node 0, and 100..199
on node 1. This isn't compatible with Linux resctrl usage. On this
example system a process using RMID 5 would only update monitor counters
while running on SNC node 0.

The other mode is "RMID Sharing Mode". This is enabled by clearing bit
0 of the RMID_SNC_CONFIG (0xCA0) model specific register. In this mode
the number of logical RMIDs is the number of physical RMIDs (from CPUID
leaf 0xF) divided by the number of SNC nodes per L3 cache instance. A
process can use the same RMID across different SNC nodes.

See the "Intel Resource Director Technology Architecture Specification"
for additional details.

When SNC is enabled, update the MSR when a monitor domain is marked
online. Technically this is overkill. It only needs to be done once
per L3 cache instance rather than per SNC domain. But there is no harm
in doing it more than once, and this is not in a critical path.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Link: https://lore.kernel.org/r/20240702173820.90368-3-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Make __mon_event_count() handle sum domains</title>
<updated>2024-07-02T17:57:22+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:16+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9fbb303ec949a376f3cbdf6a2b66ad2212c24ebc'/>
<id>9fbb303ec949a376f3cbdf6a2b66ad2212c24ebc</id>
<content type='text'>
Legacy resctrl monitor files must provide the sum of event values across
all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.

There are now two cases:
1) A specific domain is provided in struct rmid_read
   This is either a non-SNC system, or the request is to read data
   from just one SNC node.
2) Domain pointer is NULL. In this case the cacheinfo field in struct
   rmid_read indicates that all SNC nodes that share that L3 cache
   instance should have the event read and return the sum of all
   values.

Update the CPU sanity check. The existing check that an event is read
from a CPU in the requested domain still applies when reading a single
domain. But when summing across domains a more relaxed check that the
current CPU is in the scope of the L3 cache instance is appropriate
since the MSRs to read events are scoped at L3 cache level.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-17-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Legacy resctrl monitor files must provide the sum of event values across
all Sub-NUMA Cluster (SNC) domains that share an L3 cache instance.

There are now two cases:
1) A specific domain is provided in struct rmid_read
   This is either a non-SNC system, or the request is to read data
   from just one SNC node.
2) Domain pointer is NULL. In this case the cacheinfo field in struct
   rmid_read indicates that all SNC nodes that share that L3 cache
   instance should have the event read and return the sum of all
   values.

Update the CPU sanity check. The existing check that an event is read
from a CPU in the requested domain still applies when reading a single
domain. But when summing across domains a more relaxed check that the
current CPU is in the scope of the L3 cache instance is appropriate
since the MSRs to read events are scoped at L3 cache level.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-17-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Initialize on-stack struct rmid_read instances</title>
<updated>2024-07-02T17:49:54+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:10+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=587edd7069b9e7dc7993d2df9371e7c37a4d2133'/>
<id>587edd7069b9e7dc7993d2df9371e7c37a4d2133</id>
<content type='text'>
New semantics rely on some struct rmid_read members having NULL values to
distinguish between the SNC and non-SNC scenarios.  resctrl can thus no longer
rely on this struct not being initialized properly.

Initialize all on-stack declarations of struct rmid_read:

  rdtgroup_mondata_show()
  mbm_update()
  mkdir_mondata_subdir()

to ensure that garbage values from the stack are not passed down to other
functions.

  [ bp: Massage commit message. ]

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-11-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
New semantics rely on some struct rmid_read members having NULL values to
distinguish between the SNC and non-SNC scenarios.  resctrl can thus no longer
rely on this struct not being initialized properly.

Initialize all on-stack declarations of struct rmid_read:

  rdtgroup_mondata_show()
  mbm_update()
  mkdir_mondata_subdir()

to ensure that garbage values from the stack are not passed down to other
functions.

  [ bp: Massage commit message. ]

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-11-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Introduce snc_nodes_per_l3_cache</title>
<updated>2024-07-02T17:49:54+00:00</updated>
<author>
<name>Tony Luck</name>
<email>tony.luck@intel.com</email>
</author>
<published>2024-06-28T21:56:06+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e13db55b5a0d447dea63cde772c1078405bbbf96'/>
<id>e13db55b5a0d447dea63cde772c1078405bbbf96</id>
<content type='text'>
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This cost may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

RMID sharing mode divides the physical RMIDs evenly between SNC nodes
but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system
with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two
SNC nodes per L3 cache instance would have 100 logical RMIDs available
for Linux to use. A task running on SNC node 0 with RMID 5 would
accumulate LLC occupancy and MBM bandwidth data in physical RMID 5.
Another task using RMID 5, but running on SNC node 1 would accumulate
data in physical RMID 105.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate
how many SNC domains share an L3 cache instance.  Initialize this to
"1". Runtime detection of SNC mode will adjust this value.

Update all places to take appropriate action when SNC mode is enabled:
1) The number of logical RMIDs per L3 cache available for use is the
   number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be divided by the number of SNC
   nodes.
3) Add a function to convert from logical RMID values (assigned to
   tasks and loaded into the IA32_PQR_ASSOC MSR on context switch)
   to physical RMID values to load into IA32_QM_EVTSEL MSR when
   reading counters on each SNC node.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-7-tony.luck@intel.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Intel Sub-NUMA Cluster (SNC) is a feature that subdivides the CPU cores
and memory controllers on a socket into two or more groups. These are
presented to the operating system as NUMA nodes.

This may enable some workloads to have slightly lower latency to memory
as the memory controller(s) in an SNC node are electrically closer to the
CPU cores on that SNC node. This cost may be offset by lower bandwidth
since the memory accesses for each core can only be interleaved between
the memory controllers on the same SNC node.

Resctrl monitoring on an Intel system depends upon attaching RMIDs to tasks
to track L3 cache occupancy and memory bandwidth. There is an MSR that
controls how the RMIDs are shared between SNC nodes.

The default mode divides them numerically. E.g. when there are two SNC
nodes on a socket the lower number half of the RMIDs are given to the
first node, the remainder to the second node. This would be difficult
to use with the Linux resctrl interface as specific RMID values assigned
to resctrl groups are not visible to users.

RMID sharing mode divides the physical RMIDs evenly between SNC nodes
but uses a logical RMID in the IA32_PQR_ASSOC MSR. For example a system
with 200 physical RMIDs (as enumerated by CPUID leaf 0xF) that has two
SNC nodes per L3 cache instance would have 100 logical RMIDs available
for Linux to use. A task running on SNC node 0 with RMID 5 would
accumulate LLC occupancy and MBM bandwidth data in physical RMID 5.
Another task using RMID 5, but running on SNC node 1 would accumulate
data in physical RMID 105.

Even with this renumbering SNC mode requires several changes in resctrl
behavior for correct operation.

Add a static global to arch/x86/kernel/cpu/resctrl/monitor.c to indicate
how many SNC domains share an L3 cache instance.  Initialize this to
"1". Runtime detection of SNC mode will adjust this value.

Update all places to take appropriate action when SNC mode is enabled:
1) The number of logical RMIDs per L3 cache available for use is the
   number of physical RMIDs divided by the number of SNC nodes.
2) Likewise the "mon_scale" value must be divided by the number of SNC
   nodes.
3) Add a function to convert from logical RMID values (assigned to
   tasks and loaded into the IA32_PQR_ASSOC MSR on context switch)
   to physical RMID values to load into IA32_QM_EVTSEL MSR when
   reading counters on each SNC node.

Signed-off-by: Tony Luck &lt;tony.luck@intel.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Link: https://lore.kernel.org/r/20240628215619.76401-7-tony.luck@intel.com
</pre>
</div>
</content>
</entry>
</feed>
