<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/arch/x86/kernel/process_32.c, branch linux-rolling-stable</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>x86/traps: Initialize DR7 by writing its architectural reset value</title>
<updated>2025-06-24T20:15:52+00:00</updated>
<author>
<name>Xin Li (Intel)</name>
<email>xin@zytor.com</email>
</author>
<published>2025-06-20T23:15:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=fa7d0f83c5c4223a01598876352473cb3d3bd4d7'/>
<id>fa7d0f83c5c4223a01598876352473cb3d3bd4d7</id>
<content type='text'>
Initialize DR7 by writing its architectural reset value to always set
bit 10, which is reserved to '1', when "clearing" DR7 so as not to
trigger unanticipated behavior if said bit is ever unreserved, e.g. as
a feature enabling flag with inverted polarity.

Signed-off-by: Xin Li (Intel) &lt;xin@zytor.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: H. Peter Anvin (Intel) &lt;hpa@zytor.com&gt;
Reviewed-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Sean Christopherson &lt;seanjc@google.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Initialize DR7 by writing its architectural reset value to always set
bit 10, which is reserved to '1', when "clearing" DR7 so as not to
trigger unanticipated behavior if said bit is ever unreserved, e.g. as
a feature enabling flag with inverted polarity.

Signed-off-by: Xin Li (Intel) &lt;xin@zytor.com&gt;
Signed-off-by: Dave Hansen &lt;dave.hansen@linux.intel.com&gt;
Reviewed-by: H. Peter Anvin (Intel) &lt;hpa@zytor.com&gt;
Reviewed-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Sean Christopherson &lt;seanjc@google.com&gt;
Tested-by: Sohil Mehta &lt;sohil.mehta@intel.com&gt;
Cc:stable@vger.kernel.org
Link: https://lore.kernel.org/all/20250620231504.2676902-3-xin%40zytor.com
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip</title>
<updated>2025-05-27T16:53:02+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-05-27T16:53:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=664a231d90aa450f9f6f029bee3a94dd08e1aac6'/>
<id>664a231d90aa450f9f6f029bee3a94dd08e1aac6</id>
<content type='text'>
Pull x86 resource control updates from Borislav Petkov:
 "Carve out the resctrl filesystem-related code into fs/resctrl/ so that
  multiple architectures can share the fs API for manipulating their
  respective hw resource control implementation.

  This is the second step in the work towards sharing the resctrl
  filesystem interface, the next one being plugging ARM's MPAM into the
  aforementioned fs API"

* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  MAINTAINERS: Add reviewers for fs/resctrl
  x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
  x86/resctrl: Always initialise rid field in rdt_resources_all[]
  x86/resctrl: Relax some asm #includes
  x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
  x86/resctrl: Squelch whitespace anomalies in resctrl core code
  x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
  x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
  x86/resctrl: Move enum resctrl_event_id to resctrl.h
  x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
  fs/resctrl: Add boiler plate for external resctrl code
  x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
  x86/resctrl: Split trace.h
  x86/resctrl: Expand the width of domid by replacing mon_data_bits
  x86/resctrl: Add end-marker to the resctrl_event_id enum
  x86/resctrl: Move is_mba_sc() out of core.c
  x86/resctrl: Drop __init/__exit on assorted symbols
  x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
  x86/resctrl: Check all domains are offline in resctrl_exit()
  x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull x86 resource control updates from Borislav Petkov:
 "Carve out the resctrl filesystem-related code into fs/resctrl/ so that
  multiple architectures can share the fs API for manipulating their
  respective hw resource control implementation.

  This is the second step in the work towards sharing the resctrl
  filesystem interface, the next one being plugging ARM's MPAM into the
  aforementioned fs API"

* tag 'x86_cache_for_v6.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (25 commits)
  MAINTAINERS: Add reviewers for fs/resctrl
  x86,fs/resctrl: Move the resctrl filesystem code to live in /fs/resctrl
  x86/resctrl: Always initialise rid field in rdt_resources_all[]
  x86/resctrl: Relax some asm #includes
  x86/resctrl: Prefer alloc(sizeof(*foo)) idiom in rdt_init_fs_context()
  x86/resctrl: Squelch whitespace anomalies in resctrl core code
  x86/resctrl: Move pseudo lock prototypes to include/linux/resctrl.h
  x86/resctrl: Fix types in resctrl_arch_mon_ctx_{alloc,free}() stubs
  x86/resctrl: Move enum resctrl_event_id to resctrl.h
  x86/resctrl: Move the filesystem bits to headers visible to fs/resctrl
  fs/resctrl: Add boiler plate for external resctrl code
  x86/resctrl: Add 'resctrl' to the title of the resctrl documentation
  x86/resctrl: Split trace.h
  x86/resctrl: Expand the width of domid by replacing mon_data_bits
  x86/resctrl: Add end-marker to the resctrl_event_id enum
  x86/resctrl: Move is_mba_sc() out of core.c
  x86/resctrl: Drop __init/__exit on assorted symbols
  x86/resctrl: Resctrl_exit() teardown resctrl but leave the mount point
  x86/resctrl: Check all domains are offline in resctrl_exit()
  x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctrl: Rename resctrl_sched_in() to begin with "resctrl_arch_"</title>
<updated>2025-05-15T19:01:00+00:00</updated>
<author>
<name>James Morse</name>
<email>james.morse@arm.com</email>
</author>
<published>2025-05-15T16:58:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7704fb81bc87254e8772f387b62153b135bb1932'/>
<id>7704fb81bc87254e8772f387b62153b135bb1932</id>
<content type='text'>
resctrl_sched_in() loads the architecture specific CPU MSRs with the
CLOSID and RMID values. This function was named before resctrl was
split to have architecture specific code, and generic filesystem code.

This function is obviously architecture specific, but does not begin
with 'resctrl_arch_', making it the odd one out in the functions an
architecture needs to support to enable resctrl.

Rename it for consistency. This is purely cosmetic.

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shaopeng Tan &lt;tan.shaopeng@jp.fujitsu.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Tested-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Tested-by: Carl Worth &lt;carl@os.amperecomputing.com&gt; # arm64
Tested-by: Shaopeng Tan &lt;tan.shaopeng@jp.fujitsu.com&gt;
Tested-by: Peter Newman &lt;peternewman@google.com&gt;
Tested-by: Amit Singh Tomar &lt;amitsinght@marvell.com&gt; # arm64
Tested-by: Shanker Donthineni &lt;sdonthineni@nvidia.com&gt; # arm64
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/20250515165855.31452-7-james.morse@arm.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
resctrl_sched_in() loads the architecture specific CPU MSRs with the
CLOSID and RMID values. This function was named before resctrl was
split to have architecture specific code, and generic filesystem code.

This function is obviously architecture specific, but does not begin
with 'resctrl_arch_', making it the odd one out in the functions an
architecture needs to support to enable resctrl.

Rename it for consistency. This is purely cosmetic.

Signed-off-by: James Morse &lt;james.morse@arm.com&gt;
Signed-off-by: Borislav Petkov (AMD) &lt;bp@alien8.de&gt;
Reviewed-by: Shaopeng Tan &lt;tan.shaopeng@jp.fujitsu.com&gt;
Reviewed-by: Tony Luck &lt;tony.luck@intel.com&gt;
Reviewed-by: Reinette Chatre &lt;reinette.chatre@intel.com&gt;
Reviewed-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Tested-by: Fenghua Yu &lt;fenghuay@nvidia.com&gt;
Tested-by: Carl Worth &lt;carl@os.amperecomputing.com&gt; # arm64
Tested-by: Shaopeng Tan &lt;tan.shaopeng@jp.fujitsu.com&gt;
Tested-by: Peter Newman &lt;peternewman@google.com&gt;
Tested-by: Amit Singh Tomar &lt;amitsinght@marvell.com&gt; # arm64
Tested-by: Shanker Donthineni &lt;sdonthineni@nvidia.com&gt; # arm64
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Link: https://lore.kernel.org/20250515165855.31452-7-james.morse@arm.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/fpu: Simplify the switch_fpu_prepare() + switch_fpu_finish() logic</title>
<updated>2025-05-04T08:29:24+00:00</updated>
<author>
<name>Oleg Nesterov</name>
<email>oleg@redhat.com</email>
</author>
<published>2025-05-03T14:38:30+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=730faa15a069f4025a0f8c2a5244c3067da7ecbe'/>
<id>730faa15a069f4025a0f8c2a5244c3067da7ecbe</id>
<content type='text'>
Now that switch_fpu_finish() doesn't load the FPU state, it makes more
sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and
more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD.
It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD)
until "prev_p" is scheduled again.

There is no worry about the very first context switch, fpu_clone() must
always set TIF_NEED_FPU_LOAD.

Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers
to switch_fpu().

Note that the "PF_KTHREAD | PF_USER_WORKER" check can be removed but
this deserves a separate patch which can change more functions, say,
kernel_fpu_begin_mask().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chang S . Bae &lt;chang.seok.bae@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250503143830.GA8982@redhat.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Now that switch_fpu_finish() doesn't load the FPU state, it makes more
sense to fold it into switch_fpu_prepare() renamed to switch_fpu(), and
more importantly, use the "prev_p" task as a target for TIF_NEED_FPU_LOAD.
It doesn't make any sense to delay set_tsk_thread_flag(TIF_NEED_FPU_LOAD)
until "prev_p" is scheduled again.

There is no worry about the very first context switch, fpu_clone() must
always set TIF_NEED_FPU_LOAD.

Also, shift the test_tsk_thread_flag(TIF_NEED_FPU_LOAD) from the callers
to switch_fpu().

Note that the "PF_KTHREAD | PF_USER_WORKER" check can be removed but
this deserves a separate patch which can change more functions, say,
kernel_fpu_begin_mask().

Signed-off-by: Oleg Nesterov &lt;oleg@redhat.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Chang S . Bae &lt;chang.seok.bae@intel.com&gt;
Cc: H. Peter Anvin &lt;hpa@zytor.com&gt;
Cc: Andy Lutomirski &lt;luto@amacapital.net&gt;
Cc: Brian Gerst &lt;brgerst@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250503143830.GA8982@redhat.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/percpu: Move current_task to percpu hot section</title>
<updated>2025-03-04T19:30:33+00:00</updated>
<author>
<name>Brian Gerst</name>
<email>brgerst@gmail.com</email>
</author>
<published>2025-03-03T16:52:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a1e4cc0155ad577adc3a2c563fc5eec625945ce7'/>
<id>a1e4cc0155ad577adc3a2c563fc5eec625945ce7</id>
<content type='text'>
No functional change.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250303165246.2175811-10-brgerst@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No functional change.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250303165246.2175811-10-brgerst@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/percpu: Move top_of_stack to percpu hot section</title>
<updated>2025-03-04T19:30:33+00:00</updated>
<author>
<name>Brian Gerst</name>
<email>brgerst@gmail.com</email>
</author>
<published>2025-03-03T16:52:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=385f72c83eb609652f02dc9ee415520c23bda629'/>
<id>385f72c83eb609652f02dc9ee415520c23bda629</id>
<content type='text'>
No functional change.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250303165246.2175811-9-brgerst@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
No functional change.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Acked-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Cc: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20250303165246.2175811-9-brgerst@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/arch_prctl: Simplify sys_arch_prctl()</title>
<updated>2025-02-21T21:32:25+00:00</updated>
<author>
<name>Brian Gerst</name>
<email>brgerst@gmail.com</email>
</author>
<published>2025-02-02T20:23:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2df1ad0d25803399056d439425e271802cd243f6'/>
<id>2df1ad0d25803399056d439425e271802cd243f6</id>
<content type='text'>
Use in_ia32_syscall() instead of a compat syscall entry.

No change in functionality intended.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Use in_ia32_syscall() instead of a compat syscall entry.

No change in functionality intended.

Signed-off-by: Brian Gerst &lt;brgerst@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Link: https://lore.kernel.org/r/20250202202323.422113-2-brgerst@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/fpu: Clean up FPU switching in the middle of task switching</title>
<updated>2023-10-20T09:24:22+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-10-18T18:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=24b8a23638cbf92449c353f828b1d309548c78f4'/>
<id>24b8a23638cbf92449c353f828b1d309548c78f4</id>
<content type='text'>
It happens to work, but it's very very wrong, because our 'current'
macro is magic that is supposedly loading a stable value.

It just happens to be not quite stable enough and the compilers
re-load the value enough for this code to work.  But it's wrong.

The whole

        struct fpu *prev_fpu = &amp;prev-&gt;fpu;

thing in __switch_to() is pretty ugly. There's no reason why we
should look at that 'prev_fpu' pointer there, or pass it down.

And it only generates worse code, in how it loads 'current' when
__switch_to() has the right task pointers.

The attached patch not only cleans this up, it actually
generates better code too:

 (a) it removes one push/pop pair at entry/exit because there's one
     less register used (no 'current')

 (b) it removes that pointless load of 'current' because it just uses
     the right argument:

	-       movq    %gs:pcpu_hot(%rip), %r12
	-       testq   $16384, (%r12)
	+       testq   $16384, (%rdi)

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20231018184227.446318-1-ubizjak@gmail.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It happens to work, but it's very very wrong, because our 'current'
macro is magic that is supposedly loading a stable value.

It just happens to be not quite stable enough and the compilers
re-load the value enough for this code to work.  But it's wrong.

The whole

        struct fpu *prev_fpu = &amp;prev-&gt;fpu;

thing in __switch_to() is pretty ugly. There's no reason why we
should look at that 'prev_fpu' pointer there, or pass it down.

And it only generates worse code, in how it loads 'current' when
__switch_to() has the right task pointers.

The attached patch not only cleans this up, it actually
generates better code too:

 (a) it removes one push/pop pair at entry/exit because there's one
     less register used (no 'current')

 (b) it removes that pointless load of 'current' because it just uses
     the right argument:

	-       movq    %gs:pcpu_hot(%rip), %r12
	-       testq   $16384, (%r12)
	+       testq   $16384, (%rdi)

Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Uros Bizjak &lt;ubizjak@gmail.com&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/20231018184227.446318-1-ubizjak@gmail.com
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/resctl: fix scheduler confusion with 'current'</title>
<updated>2023-03-08T19:48:11+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2023-03-07T21:06:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7fef099702527c3b2c5234a2ea6a24411485a13a'/>
<id>7fef099702527c3b2c5234a2ea6a24411485a13a</id>
<content type='text'>
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Stephane Eranian &lt;eranian@google.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The implementation of 'current' on x86 is very intentionally special: it
is a very common thing to look up, and it uses 'this_cpu_read_stable()'
to get the current thread pointer efficiently from per-cpu storage.

And the keyword in there is 'stable': the current thread pointer never
changes as far as a single thread is concerned.  Even if when a thread
is preempted, or moved to another CPU, or even across an explicit call
'schedule()' that thread will still have the same value for 'current'.

It is, after all, the kernel base pointer to thread-local storage.
That's why it's stable to begin with, but it's also why it's important
enough that we have that special 'this_cpu_read_stable()' access for it.

So this is all done very intentionally to allow the compiler to treat
'current' as a value that never visibly changes, so that the compiler
can do CSE and combine multiple different 'current' accesses into one.

However, there is obviously one very special situation when the
currently running thread does actually change: inside the scheduler
itself.

So the scheduler code paths are special, and do not have a 'current'
thread at all.  Instead there are _two_ threads: the previous and the
next thread - typically called 'prev' and 'next' (or prev_p/next_p)
internally.

So this is all actually quite straightforward and simple, and not all
that complicated.

Except for when you then have special code that is run in scheduler
context, that code then has to be aware that 'current' isn't really a
valid thing.  Did you mean 'prev'? Did you mean 'next'?

In fact, even if then look at the code, and you use 'current' after the
new value has been assigned to the percpu variable, we have explicitly
told the compiler that 'current' is magical and always stable.  So the
compiler is quite free to use an older (or newer) value of 'current',
and the actual assignment to the percpu storage is not relevant even if
it might look that way.

Which is exactly what happened in the resctl code, that blithely used
'current' in '__resctrl_sched_in()' when it really wanted the new
process state (as implied by the name: we're scheduling 'into' that new
resctl state).  And clang would end up just using the old thread pointer
value at least in some configurations.

This could have happened with gcc too, and purely depends on random
compiler details.  Clang just seems to have been more aggressive about
moving the read of the per-cpu current_task pointer around.

The fix is trivial: just make the resctl code adhere to the scheduler
rules of using the prev/next thread pointer explicitly, instead of using
'current' in a situation where it just wasn't valid.

That same code is then also used outside of the scheduler context (when
a thread resctl state is explicitly changed), and then we will just pass
in 'current' as that pointer, of course.  There is no ambiguity in that
case.

The fix may be trivial, but noticing and figuring out what went wrong
was not.  The credit for that goes to Stephane Eranian.

Reported-by: Stephane Eranian &lt;eranian@google.com&gt;
Link: https://lore.kernel.org/lkml/20230303231133.1486085-1-eranian@google.com/
Link: https://lore.kernel.org/lkml/alpine.LFD.2.01.0908011214330.3304@localhost.localdomain/
Reviewed-by: Nick Desaulniers &lt;ndesaulniers@google.com&gt;
Tested-by: Tony Luck &lt;tony.luck@intel.com&gt;
Tested-by: Stephane Eranian &lt;eranian@google.com&gt;
Tested-by: Babu Moger &lt;babu.moger@amd.com&gt;
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>x86/percpu: Move current_top_of_stack next to current_task</title>
<updated>2022-10-17T14:41:05+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2022-09-15T11:11:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=c063a217bc0726c2560138229de5673dbb253a02'/>
<id>c063a217bc0726c2560138229de5673dbb253a02</id>
<content type='text'>
Extend the struct pcpu_hot cacheline with current_top_of_stack;
another very frequently used value.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220915111145.493038635@infradead.org
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extend the struct pcpu_hot cacheline with current_top_of_stack;
another very frequently used value.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/20220915111145.493038635@infradead.org
</pre>
</div>
</content>
</entry>
</feed>
