<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/kernel/sched/psi.c, branch v6.8</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>sched/psi: Update poll =&gt; rtpoll in relevant comments</title>
<updated>2023-10-16T11:42:49+00:00</updated>
<author>
<name>Fan Yu</name>
<email>fan.yu9@zte.com.cn</email>
</author>
<published>2023-10-16T11:20:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7b3d8df549390e797f883efa16224fa0dfe35e55'/>
<id>7b3d8df549390e797f883efa16224fa0dfe35e55</id>
<content type='text'>
The PSI trigger code is now making a distinction between privileged and
unprivileged triggers, after the following commit:

 65457b74aa94 ("sched/psi: Rename existing poll members in preparation")

But some comments have not been modified along with the code, so they
need to be updated.

This will help readers better understand the code.

Signed-off-by: Fan Yu &lt;fan.yu9@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310161920399921184@zte.com.cn
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The PSI trigger code is now making a distinction between privileged and
unprivileged triggers, after the following commit:

 65457b74aa94 ("sched/psi: Rename existing poll members in preparation")

But some comments have not been modified along with the code, so they
need to be updated.

This will help readers better understand the code.

Signed-off-by: Fan Yu &lt;fan.yu9@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310161920399921184@zte.com.cn
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: Bail out early from irq time accounting</title>
<updated>2023-10-13T07:56:29+00:00</updated>
<author>
<name>Haifeng Xu</name>
<email>haifeng.xu@shopee.com</email>
</author>
<published>2023-09-26T11:57:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0c2924079f5a83ed715630680e338b3685a0bf7d'/>
<id>0c2924079f5a83ed715630680e338b3685a0bf7d</id>
<content type='text'>
We could bail out early when psi was disabled.

Signed-off-by: Haifeng Xu &lt;haifeng.xu@shopee.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Link: https://lore.kernel.org/r/20230926115722.467833-1-haifeng.xu@shopee.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We could bail out early when psi was disabled.

Signed-off-by: Haifeng Xu &lt;haifeng.xu@shopee.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Chengming Zhou &lt;zhouchengming@bytedance.com&gt;
Link: https://lore.kernel.org/r/20230926115722.467833-1-haifeng.xu@shopee.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: Delete the 'update_total' function parameter from update_triggers()</title>
<updated>2023-10-11T21:08:09+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2023-10-10T08:45:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3657680f38cd7df413d665f2b2f38e9a78130d8b'/>
<id>3657680f38cd7df413d665f2b2f38e9a78130d8b</id>
<content type='text'>
The 'update_total' parameter of update_triggers() is always true after the
previous commit:

  80cc1d1d5ee3 ("sched/psi: Avoid updating PSI triggers and -&gt;rtpoll_total when there are no state changes")

If the 'changed_states &amp; group-&gt;rtpoll_states' condition is true,
'new_stall' in update_triggers() will be true, and then 'update_total'
should also be true.

So update_total is redundant - remove it.

[ mingo: Changelog updates ]

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310101645437859599@zte.com.cn
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The 'update_total' parameter of update_triggers() is always true after the
previous commit:

  80cc1d1d5ee3 ("sched/psi: Avoid updating PSI triggers and -&gt;rtpoll_total when there are no state changes")

If the 'changed_states &amp; group-&gt;rtpoll_states' condition is true,
'new_stall' in update_triggers() will be true, and then 'update_total'
should also be true.

So update_total is redundant - remove it.

[ mingo: Changelog updates ]

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310101645437859599@zte.com.cn
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: Avoid updating PSI triggers and -&gt;rtpoll_total when there are no state changes</title>
<updated>2023-10-11T21:07:50+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2023-10-10T08:41:07+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=80cc1d1d5ee35701daf11725ce06d8a240588973'/>
<id>80cc1d1d5ee35701daf11725ce06d8a240588973</id>
<content type='text'>
When psimon wakes up and there are no state changes for -&gt;rtpoll_states,
it's unnecessary to update triggers and -&gt;rtpoll_total because the pressures
being monitored by the user have not changed.

This will help to slightly reduce unnecessary computations of PSI.

[ mingo: Changelog updates ]

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310101641075436843@zte.com.cn
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When psimon wakes up and there are no state changes for -&gt;rtpoll_states,
it's unnecessary to update triggers and -&gt;rtpoll_total because the pressures
being monitored by the user have not changed.

This will help to slightly reduce unnecessary computations of PSI.

[ mingo: Changelog updates ]

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Cc: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Peter Ziljstra &lt;peterz@infradead.org&gt;
Link: https://lore.kernel.org/r/202310101641075436843@zte.com.cn
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: Change update_triggers() to a 'void' function</title>
<updated>2023-10-09T12:54:50+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2023-10-09T12:24:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e03dc9fa0663bc303383170e961561462ff00c93'/>
<id>e03dc9fa0663bc303383170e961561462ff00c93</id>
<content type='text'>
Update_triggers() always returns now + group-&gt;rtpoll_min_period, and the
return value is only used by psi_rtpoll_work(), so change update_triggers()
to a void function, let group-&gt;rtpoll_next_update = now +
group-&gt;rtpoll_min_period directly.

This will avoid unnecessary function return value passing &amp; simplifies
the function.

[ mingo: Updated changelog ]

Suggested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/202310092024289721617@zte.com.cn
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Update_triggers() always returns now + group-&gt;rtpoll_min_period, and the
return value is only used by psi_rtpoll_work(), so change update_triggers()
to a void function, let group-&gt;rtpoll_next_update = now +
group-&gt;rtpoll_min_period directly.

This will avoid unnecessary function return value passing &amp; simplifies
the function.

[ mingo: Updated changelog ]

Suggested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
Link: https://lore.kernel.org/r/202310092024289721617@zte.com.cn
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'v6.5-rc2' into sched/core, to pick up fixes</title>
<updated>2023-07-19T07:43:25+00:00</updated>
<author>
<name>Ingo Molnar</name>
<email>mingo@kernel.org</email>
</author>
<published>2023-07-19T07:43:25+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=752182b24bf4ffda1c5a8025515d53122d930bd8'/>
<id>752182b24bf4ffda1c5a8025515d53122d930bd8</id>
<content type='text'>
Sync with upstream fixes before applying EEVDF.

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Sync with upstream fixes before applying EEVDF.

Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: make psi_cgroups_enabled static</title>
<updated>2023-07-13T13:21:50+00:00</updated>
<author>
<name>Miaohe Lin</name>
<email>linmiaohe@huawei.com</email>
</author>
<published>2023-05-25T10:34:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=35cd21f6292c6656aaab6066a1fa13cd11ca27f5'/>
<id>35cd21f6292c6656aaab6066a1fa13cd11ca27f5</id>
<content type='text'>
The static key psi_cgroups_enabled is only used inside file psi.c.
Make it static.

Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Link: https://lore.kernel.org/r/20230525103428.49712-1-linmiaohe@huawei.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The static key psi_cgroups_enabled is only used inside file psi.c.
Make it static.

Signed-off-by: Miaohe Lin &lt;linmiaohe@huawei.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Reviewed-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Link: https://lore.kernel.org/r/20230525103428.49712-1-linmiaohe@huawei.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: use kernfs polling functions for PSI trigger polling</title>
<updated>2023-07-10T07:52:30+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2023-06-30T00:56:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=aff037078ecaecf34a7c2afab1341815f90fba5e'/>
<id>aff037078ecaecf34a7c2afab1341815f90fba5e</id>
<content type='text'>
Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:

do_select
  vfs_poll
                           do_rmdir
                             cgroup_rmdir
                               kernfs_drain_open_files
                                 cgroup_file_release
                                   cgroup_pressure_release
                                     psi_trigger_destroy
                                       wake_up_pollfree(&amp;t-&gt;event_wait)
// vfs_poll is unblocked
                                       synchronize_rcu
                                       kfree(t)
  poll_freewait -&gt; UAF access to the trigger's waitqueue head

Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node-&gt;poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.

[1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin &lt;lujialin4@huawei.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Destroying psi trigger in cgroup_file_release causes UAF issues when
a cgroup is removed from under a polling process. This is happening
because cgroup removal causes a call to cgroup_file_release while the
actual file is still alive. Destroying the trigger at this point would
also destroy its waitqueue head and if there is still a polling process
on that file accessing the waitqueue, it will step on the freed pointer:

do_select
  vfs_poll
                           do_rmdir
                             cgroup_rmdir
                               kernfs_drain_open_files
                                 cgroup_file_release
                                   cgroup_pressure_release
                                     psi_trigger_destroy
                                       wake_up_pollfree(&amp;t-&gt;event_wait)
// vfs_poll is unblocked
                                       synchronize_rcu
                                       kfree(t)
  poll_freewait -&gt; UAF access to the trigger's waitqueue head

Patch [1] fixed this issue for epoll() case using wake_up_pollfree(),
however the same issue exists for synchronous poll() case.
The root cause of this issue is that the lifecycles of the psi trigger's
waitqueue and of the file associated with the trigger are different. Fix
this by using kernfs_generic_poll function when polling on cgroup-specific
psi triggers. It internally uses kernfs_open_node-&gt;poll waitqueue head
with its lifecycle tied to the file's lifecycle. This also renders the
fix in [1] obsolete, so revert it.

[1] commit c2dbe32d5db5 ("sched/psi: Fix use-after-free in ep_remove_wait_queue()")

Fixes: 0e94682b73bf ("psi: introduce psi monitor")
Closes: https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com/
Reported-by: Lu Jialin &lt;lujialin4@huawei.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Link: https://lkml.kernel.org/r/20230630005612.1014540-1-surenb@google.com
</pre>
</div>
</content>
</entry>
<entry>
<title>sched/psi: Avoid resetting the min update period when it is unnecessary</title>
<updated>2023-05-20T10:53:16+00:00</updated>
<author>
<name>Yang Yang</name>
<email>yang.yang29@zte.com.cn</email>
</author>
<published>2023-05-14T16:33:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e2a1f85bf9f509afd09b5d3308e3489b65845c28'/>
<id>e2a1f85bf9f509afd09b5d3308e3489b65845c28</id>
<content type='text'>
Psi_group's poll_min_period is determined by the minimum window size of
psi_trigger when creating new triggers. While destroying a psi_trigger,
there is no need to reset poll_min_period if the psi_trigger being
destroyed did not have the minimum window size, since in this condition
poll_min_period will remain the same as before.

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Link: https://lkml.kernel.org/r/20230514163338.834345-1-surenb@google.com
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Psi_group's poll_min_period is determined by the minimum window size of
psi_trigger when creating new triggers. While destroying a psi_trigger,
there is no need to reset poll_min_period if the psi_trigger being
destroyed did not have the minimum window size, since in this condition
poll_min_period will remain the same as before.

Signed-off-by: Yang Yang &lt;yang.yang29@zte.com.cn&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Link: https://lkml.kernel.org/r/20230514163338.834345-1-surenb@google.com
</pre>
</div>
</content>
</entry>
<entry>
<title>psi: remove 500ms min window size limitation for triggers</title>
<updated>2023-05-08T08:58:38+00:00</updated>
<author>
<name>Suren Baghdasaryan</name>
<email>surenb@google.com</email>
</author>
<published>2023-03-03T01:13:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=519fabc7aaba3f0847cf37d5f9a5740c370eb777'/>
<id>519fabc7aaba3f0847cf37d5f9a5740c370eb777</id>
<content type='text'>
Current 500ms min window size for psi triggers limits polling interval
to 50ms to prevent polling threads from using too much cpu bandwidth by
polling too frequently. However the number of cgroups with triggers is
unlimited, so this protection can be defeated by creating multiple
cgroups with psi triggers (triggers in each cgroup are served by a single
"psimon" kernel thread).
Instead of limiting min polling period, which also limits the latency of
psi events, it's better to limit psi trigger creation to authorized users
only, like we do for system-wide psi triggers (/proc/pressure/* files can
be written only by processes with CAP_SYS_RESOURCE capability). This also
makes access rules for cgroup psi files consistent with system-wide ones.
Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
remove the psi window min size limitation.

Suggested-by: Sudarshan Rajagopalan &lt;quic_sudaraja@quicinc.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Current 500ms min window size for psi triggers limits polling interval
to 50ms to prevent polling threads from using too much cpu bandwidth by
polling too frequently. However the number of cgroups with triggers is
unlimited, so this protection can be defeated by creating multiple
cgroups with psi triggers (triggers in each cgroup are served by a single
"psimon" kernel thread).
Instead of limiting min polling period, which also limits the latency of
psi events, it's better to limit psi trigger creation to authorized users
only, like we do for system-wide psi triggers (/proc/pressure/* files can
be written only by processes with CAP_SYS_RESOURCE capability). This also
makes access rules for cgroup psi files consistent with system-wide ones.
Add a CAP_SYS_RESOURCE capability check for cgroup psi file writers and
remove the psi window min size limitation.

Suggested-by: Sudarshan Rajagopalan &lt;quic_sudaraja@quicinc.com&gt;
Signed-off-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Acked-by: Michal Hocko &lt;mhocko@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/all/cover.1676067791.git.quic_sudaraja@quicinc.com/
</pre>
</div>
</content>
</entry>
</feed>
