<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/include/linux/cgroup-defs.h, branch v5.0-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>cpuset: Expose cpuset.cpus.subpartitions with cgroup_debug</title>
<updated>2018-11-08T20:27:32+00:00</updated>
<author>
<name>Waiman Long</name>
<email>longman@redhat.com</email>
</author>
<published>2018-11-08T15:08:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=5cf8114d6e90b3822be5eb6a2faedf99d1c08f77'/>
<id>5cf8114d6e90b3822be5eb6a2faedf99d1c08f77</id>
<content type='text'>
For debugging purpose, it will be useful to expose the content of the
subparts_cpus as a read-only file to see if the code work correctly.
However, subparts_cpus will not be used at all in most use cases. So
adding a new cpuset file that clutters the cgroup directory may not be
desirable.  This is now being done by using the hidden "cgroup_debug"
kernel command line option to expose a new "cpuset.cpus.subpartitions"
file.

That option was originally used by the debug controller to expose
itself when configured into the kernel. This is now extended to set an
internal flag used by cgroup_addrm_files(). A new CFTYPE_DEBUG flag
can now be used to specify that a cgroup file should only be created
when the "cgroup_debug" option is specified.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
For debugging purpose, it will be useful to expose the content of the
subparts_cpus as a read-only file to see if the code work correctly.
However, subparts_cpus will not be used at all in most use cases. So
adding a new cpuset file that clutters the cgroup directory may not be
desirable.  This is now being done by using the hidden "cgroup_debug"
kernel command line option to expose a new "cpuset.cpus.subpartitions"
file.

That option was originally used by the debug controller to expose
itself when configured into the kernel. This is now extended to set an
internal flag used by cgroup_addrm_files(). A new CFTYPE_DEBUG flag
can now be used to specify that a cgroup file should only be created
when the "cgroup_debug" option is specified.

Signed-off-by: Waiman Long &lt;longman@redhat.com&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>psi: cgroup support</title>
<updated>2018-10-26T23:26:32+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2018-10-26T22:06:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2ce7135adc9ad081aa3c49744144376ac74fea60'/>
<id>2ce7135adc9ad081aa3c49744144376ac74fea60</id>
<content type='text'>
On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.

This patch implements pressure stall tracking for cgroups.  In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.

Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Daniel Drake &lt;drake@endlessm.com&gt;
Tested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;jweiner@fb.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Enderborg &lt;peter.enderborg@sony.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
On a system that executes multiple cgrouped jobs and independent
workloads, we don't just care about the health of the overall system, but
also that of individual jobs, so that we can ensure individual job health,
fairness between jobs, or prioritize some jobs over others.

This patch implements pressure stall tracking for cgroups.  In kernels
with CONFIG_PSI=y, cgroup2 groups will have cpu.pressure, memory.pressure,
and io.pressure files that track aggregate pressure stall times for only
the tasks inside the cgroup.

Link: http://lkml.kernel.org/r/20180828172258.3185-10-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Acked-by: Peter Zijlstra (Intel) &lt;peterz@infradead.org&gt;
Tested-by: Daniel Drake &lt;drake@endlessm.com&gt;
Tested-by: Suren Baghdasaryan &lt;surenb@google.com&gt;
Cc: Christopher Lameter &lt;cl@linux.com&gt;
Cc: Ingo Molnar &lt;mingo@redhat.com&gt;
Cc: Johannes Weiner &lt;jweiner@fb.com&gt;
Cc: Mike Galbraith &lt;efault@gmx.de&gt;
Cc: Peter Enderborg &lt;peter.enderborg@sony.com&gt;
Cc: Randy Dunlap &lt;rdunlap@infradead.org&gt;
Cc: Shakeel Butt &lt;shakeelb@google.com&gt;
Cc: Vinayak Menon &lt;vinmenon@codeaurora.org&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Fix dom_cgrp propagation when enabling threaded mode</title>
<updated>2018-10-04T20:28:08+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-10-04T20:28:08+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=479adb89a97b0a33e5a9d702119872cc82ca21aa'/>
<id>479adb89a97b0a33e5a9d702119872cc82ca21aa</id>
<content type='text'>
A cgroup which is already a threaded domain may be converted into a
threaded cgroup if the prerequisite conditions are met.  When this
happens, all threaded descendant should also have their -&gt;dom_cgrp
updated to the new threaded domain cgroup.  Unfortunately, this
propagation was missing leading to the following failure.

  # cd /sys/fs/cgroup/unified
  # cat cgroup.subtree_control    # show that no controllers are enabled

  # mkdir -p mycgrp/a/b/c
  # echo threaded &gt; mycgrp/a/b/cgroup.type

  At this point, the hierarchy looks as follows:

      mycgrp [d]
	  a [dt]
	      b [t]
		  c [inv]

  Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"):

  # echo threaded &gt; mycgrp/a/cgroup.type

  By this point, we now have a hierarchy that looks as follows:

      mycgrp [dt]
	  a [t]
	      b [t]
		  c [inv]

  But, when we try to convert the node "c" from "domain invalid" to
  "threaded", we get ENOTSUP on the write():

  # echo threaded &gt; mycgrp/a/b/c/cgroup.type
  sh: echo: write error: Operation not supported

This patch fixes the problem by

* Moving the opencoded -&gt;dom_cgrp save and restoration in
  cgroup_enable_threaded() into cgroup_{save|restore}_control() so
  that mulitple cgroups can be handled.

* Updating all threaded descendants' -&gt;dom_cgrp to point to the new
  dom_cgrp when enabling threaded mode.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: "Michael Kerrisk (man-pages)" &lt;mtk.manpages@gmail.com&gt;
Reported-by: Amin Jamali &lt;ajamali@pivotal.io&gt;
Reported-by: Joao De Almeida Pereira &lt;jpereira@pivotal.io&gt;
Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com
Fixes: 454000adaa2a ("cgroup: introduce cgroup-&gt;dom_cgrp and threaded css_set handling")
Cc: stable@vger.kernel.org # v4.14+
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
A cgroup which is already a threaded domain may be converted into a
threaded cgroup if the prerequisite conditions are met.  When this
happens, all threaded descendant should also have their -&gt;dom_cgrp
updated to the new threaded domain cgroup.  Unfortunately, this
propagation was missing leading to the following failure.

  # cd /sys/fs/cgroup/unified
  # cat cgroup.subtree_control    # show that no controllers are enabled

  # mkdir -p mycgrp/a/b/c
  # echo threaded &gt; mycgrp/a/b/cgroup.type

  At this point, the hierarchy looks as follows:

      mycgrp [d]
	  a [dt]
	      b [t]
		  c [inv]

  Now let's make node "a" threaded (and thus "mycgrp" s made "domain threaded"):

  # echo threaded &gt; mycgrp/a/cgroup.type

  By this point, we now have a hierarchy that looks as follows:

      mycgrp [dt]
	  a [t]
	      b [t]
		  c [inv]

  But, when we try to convert the node "c" from "domain invalid" to
  "threaded", we get ENOTSUP on the write():

  # echo threaded &gt; mycgrp/a/b/c/cgroup.type
  sh: echo: write error: Operation not supported

This patch fixes the problem by

* Moving the opencoded -&gt;dom_cgrp save and restoration in
  cgroup_enable_threaded() into cgroup_{save|restore}_control() so
  that mulitple cgroups can be handled.

* Updating all threaded descendants' -&gt;dom_cgrp to point to the new
  dom_cgrp when enabling threaded mode.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
Reported-and-tested-by: "Michael Kerrisk (man-pages)" &lt;mtk.manpages@gmail.com&gt;
Reported-by: Amin Jamali &lt;ajamali@pivotal.io&gt;
Reported-by: Joao De Almeida Pereira &lt;jpereira@pivotal.io&gt;
Link: https://lore.kernel.org/r/CAKgNAkhHYCMn74TCNiMJ=ccLd7DcmXSbvw3CbZ1YREeG7iJM5g@mail.gmail.com
Fixes: 454000adaa2a ("cgroup: introduce cgroup-&gt;dom_cgrp and threaded css_set handling")
Cc: stable@vger.kernel.org # v4.14+
</pre>
</div>
</content>
</entry>
<entry>
<title>blkcg: add generic throttling mechanism</title>
<updated>2018-07-09T15:07:54+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>jbacik@fb.com</email>
</author>
<published>2018-07-03T15:14:55+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d09d8df3a29403693d9d20cc34ed101f2c558e2b'/>
<id>d09d8df3a29403693d9d20cc34ed101f2c558e2b</id>
<content type='text'>
Since IO can be issued from literally anywhere it's almost impossible to
do throttling without having some sort of adverse effect somewhere else
in the system because of locking or other dependencies.  The best way to
solve this is to do the throttling when we know we aren't holding any
other kernel resources.  Do this by tracking throttling in a per-blkg
basis, and if we require throttling flag the task that it needs to check
before it returns to user space and possibly sleep there.

This is to address the case where a process is doing work that is
generating IO that can't be throttled, whether that is directly with a
lot of REQ_META IO, or indirectly by allocating so much memory that it
is swamping the disk with REQ_SWAP.  We can't use task_add_work as we
don't want to induce a memory allocation in the IO path, so simply
saving the request queue in the task and flagging it to do the
notify_resume thing achieves the same result without the overhead of a
memory allocation.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Since IO can be issued from literally anywhere it's almost impossible to
do throttling without having some sort of adverse effect somewhere else
in the system because of locking or other dependencies.  The best way to
solve this is to do the throttling when we know we aren't holding any
other kernel resources.  Do this by tracking throttling in a per-blkg
basis, and if we require throttling flag the task that it needs to check
before it returns to user space and possibly sleep there.

This is to address the case where a process is doing work that is
generating IO that can't be throttled, whether that is directly with a
lot of REQ_META IO, or indirectly by allocating so much memory that it
is swamping the disk with REQ_SWAP.  We can't use task_add_work as we
don't want to induce a memory allocation in the IO path, so simply
saving the request queue in the task and flagging it to do the
notify_resume thing achieves the same result without the overhead of a
memory allocation.

Signed-off-by: Josef Bacik &lt;jbacik@fb.com&gt;
Acked-by: Tejun Heo &lt;tj@kernel.org&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Add cgroup_subsys-&gt;css_rstat_flush()</title>
<updated>2018-04-26T21:29:05+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-04-26T21:29:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f53470bab04229e93ff9e4c20338cc08b42b344'/>
<id>8f53470bab04229e93ff9e4c20338cc08b42b344</id>
<content type='text'>
This patch adds cgroup_subsys-&gt;css_rstat_flush().  If a subsystem has
this callback, its csses are linked on cgrp-&gt;css_rstat_list and rstat
will call the function whenever the associated cgroup is flushed.
Flush is also performed when such csses are released so that residual
counts aren't lost.

Combined with the rstat API previous patches factored out, this allows
controllers to plug into rstat to manage their statistics in a
scalable way.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch adds cgroup_subsys-&gt;css_rstat_flush().  If a subsystem has
this callback, its csses are linked on cgrp-&gt;css_rstat_list and rstat
will call the function whenever the associated cgroup is flushed.
Flush is also performed when such csses are released so that residual
counts aren't lost.

Combined with the rstat API previous patches factored out, this allows
controllers to plug into rstat to manage their statistics in a
scalable way.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Distinguish base resource stat implementation from rstat</title>
<updated>2018-04-26T21:29:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-04-26T21:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d4ff749b5e0f1e2d4d69a3e4ea81cdeaeb4904d2'/>
<id>d4ff749b5e0f1e2d4d69a3e4ea81cdeaeb4904d2</id>
<content type='text'>
Base resource stat accounts universial (not specific to any
controller) resource consumptions on top of rstat.  Currently, its
implementation is intermixed with rstat implementation making the code
confusing to follow.

This patch clarifies the distintion by doing the followings.

* Encapsulate base resource stat counters, currently only cputime, in
  struct cgroup_base_stat.

* Move prev_cputime into struct cgroup and initialize it with cgroup.

* Rename the related functions so that they start with cgroup_base_stat.

* Prefix the related variables and field names with b.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Base resource stat accounts universial (not specific to any
controller) resource consumptions on top of rstat.  Currently, its
implementation is intermixed with rstat implementation making the code
confusing to follow.

This patch clarifies the distintion by doing the followings.

* Encapsulate base resource stat counters, currently only cputime, in
  struct cgroup_base_stat.

* Move prev_cputime into struct cgroup and initialize it with cgroup.

* Rename the related functions so that they start with cgroup_base_stat.

* Prefix the related variables and field names with b.

This patch doesn't make any functional changes.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Rename stat to rstat</title>
<updated>2018-04-26T21:29:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-04-26T21:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c58632b3631cb222da41d9dc0dd39e106c1eafd0'/>
<id>c58632b3631cb222da41d9dc0dd39e106c1eafd0</id>
<content type='text'>
stat is too generic a name and ends up causing subtle confusions.
It'll be made generic so that controllers can plug into it, which will
make the problem worse.  Let's rename it to something more specific -
cgroup_rstat for cgroup recursive stat.

This patch does the following renames.  No other changes.

* cpu_stat	-&gt; rstat_cpu
* stat		-&gt; rstat
* ?cstat	-&gt; ?rstatc

Note that the renames are selective.  The unrenamed are the ones which
implement basic resource statistics on top of rstat.  This will be
further cleaned up in the following patches.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
stat is too generic a name and ends up causing subtle confusions.
It'll be made generic so that controllers can plug into it, which will
make the problem worse.  Let's rename it to something more specific -
cgroup_rstat for cgroup recursive stat.

This patch does the following renames.  No other changes.

* cpu_stat	-&gt; rstat_cpu
* stat		-&gt; rstat
* ?cstat	-&gt; ?rstatc

Note that the renames are selective.  The unrenamed are the ones which
implement basic resource statistics on top of rstat.  This will be
further cleaned up in the following patches.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Limit event generation frequency</title>
<updated>2018-04-26T21:29:04+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-04-26T21:29:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b12e35832805bf120819b83d8fdb6e4d227a5ddb'/>
<id>b12e35832805bf120819b83d8fdb6e4d227a5ddb</id>
<content type='text'>
".events" files generate file modified event to notify userland of
possible new events.  Some of the events can be quite bursty
(e.g. memory high event) and generating notification each time is
costly and pointless.

This patch implements a event rate limit mechanism.  If a new
notification is requested before 10ms has passed since the previous
notification, the new notification is delayed till then.

As this only delays from the second notification on in a given close
cluster of notifications, userland reactions to notifications
shouldn't be delayed at all in most cases while avoiding notification
storms.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
".events" files generate file modified event to notify userland of
possible new events.  Some of the events can be quite bursty
(e.g. memory high event) and generating notification each time is
costly and pointless.

This patch implements a event rate limit mechanism.  If a new
notification is requested before 10ms has passed since the previous
notification, the new notification is delayed till then.

As this only delays from the second notification on in a given close
cluster of notifications, userland reactions to notifications
shouldn't be delayed at all in most cases while avoiding notification
storms.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq</title>
<updated>2018-04-04T01:00:13+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2018-04-04T01:00:13+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d92cd810e64aa7cf22b05f0ea1c7d3e8dbae75fe'/>
<id>d92cd810e64aa7cf22b05f0ea1c7d3e8dbae75fe</id>
<content type='text'>
Pull workqueue updates from Tejun Heo:
 "rcu_work addition and a couple trivial changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove the comment about the old manager_arb mutex
  workqueue: fix the comments of nr_idle
  fs/aio: Use rcu_work instead of explicit rcu and work item
  cgroup: Use rcu_work instead of explicit rcu and work item
  RCU, workqueue: Implement rcu_work
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull workqueue updates from Tejun Heo:
 "rcu_work addition and a couple trivial changes"

* 'for-4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: remove the comment about the old manager_arb mutex
  workqueue: fix the comments of nr_idle
  fs/aio: Use rcu_work instead of explicit rcu and work item
  cgroup: Use rcu_work instead of explicit rcu and work item
  RCU, workqueue: Implement rcu_work
</pre>
</div>
</content>
</entry>
<entry>
<title>cgroup: Use rcu_work instead of explicit rcu and work item</title>
<updated>2018-03-19T17:12:03+00:00</updated>
<author>
<name>Tejun Heo</name>
<email>tj@kernel.org</email>
</author>
<published>2018-03-14T19:45:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8f36aaec9c929f2864196b0799203491f6a67dc6'/>
<id>8f36aaec9c929f2864196b0799203491f6a67dc6</id>
<content type='text'>
Workqueue now has rcu_work.  Use it instead of open-coding rcu -&gt; work
item bouncing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Workqueue now has rcu_work.  Use it instead of open-coding rcu -&gt; work
item bouncing.

Signed-off-by: Tejun Heo &lt;tj@kernel.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
