<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include/linux, branch v4.9.78</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>module: Add retpoline tag to VERMAGIC</title>
<updated>2018-01-23T18:57:06+00:00</updated>
<author>
<name>Andi Kleen</name>
<email>ak@linux.intel.com</email>
</author>
<published>2018-01-16T20:52:28+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=eee0cba7b02f21110a443867401b41dc25a60777'/>
<id>eee0cba7b02f21110a443867401b41dc25a60777</id>
<content type='text'>
commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12 upstream.

Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.

It doesn't handle the case when retpoline has been runtime disabled.  Even
in this case the match of the retcompile status will be enforced.  This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 6cfb521ac0d5b97470883ff9b7facae264b7ab12 upstream.

Add a marker for retpoline to the module VERMAGIC. This catches the case
when a non RETPOLINE compiled module gets loaded into a retpoline kernel,
making it insecure.

It doesn't handle the case when retpoline has been runtime disabled.  Even
in this case the match of the retcompile status will be enforced.  This
implies that even with retpoline run time disabled all modules loaded need
to be recompiled.

Signed-off-by: Andi Kleen &lt;ak@linux.intel.com&gt;
Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Acked-by: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Cc: rusty@rustcorp.com.au
Cc: arjan.van.de.ven@intel.com
Cc: jeyu@kernel.org
Cc: torvalds@linux-foundation.org
Link: https://lkml.kernel.org/r/20180116205228.4890-1-andi@firstfloor.org
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>objtool, modules: Discard objtool annotation sections for modules</title>
<updated>2018-01-17T08:38:58+00:00</updated>
<author>
<name>Josh Poimboeuf</name>
<email>jpoimboe@redhat.com</email>
</author>
<published>2017-03-01T18:04:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=35aee626fa6311c3942fedd50c50f6748922e89f'/>
<id>35aee626fa6311c3942fedd50c50f6748922e89f</id>
<content type='text'>
commit e390f9a9689a42f477a6073e2e7df530a4c1b740 upstream.

The '__unreachable' and '__func_stack_frame_non_standard' sections are
only used at compile time.  They're discarded for vmlinux but they
should also be discarded for modules.

Since this is a recurring pattern, prefix the section names with
".discard.".  It's a nice convention and vmlinux.lds.h already discards
such sections.

Also remove the 'a' (allocatable) flag from the __unreachable section
since it doesn't make sense for a discarded section.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Jessica Yu &lt;jeyu@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: d1091c7fa3d5 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/20170301180444.lhd53c5tibc4ns77@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[dwmw2: Remove the unreachable part in backporting since it's not here yet]
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.ku&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit e390f9a9689a42f477a6073e2e7df530a4c1b740 upstream.

The '__unreachable' and '__func_stack_frame_non_standard' sections are
only used at compile time.  They're discarded for vmlinux but they
should also be discarded for modules.

Since this is a recurring pattern, prefix the section names with
".discard.".  It's a nice convention and vmlinux.lds.h already discards
such sections.

Also remove the 'a' (allocatable) flag from the __unreachable section
since it doesn't make sense for a discarded section.

Suggested-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Josh Poimboeuf &lt;jpoimboe@redhat.com&gt;
Cc: Jessica Yu &lt;jeyu@redhat.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Fixes: d1091c7fa3d5 ("objtool: Improve detection of BUG() and other dead ends")
Link: http://lkml.kernel.org/r/20170301180444.lhd53c5tibc4ns77@treble
Signed-off-by: Ingo Molnar &lt;mingo@kernel.org&gt;
[dwmw2: Remove the unreachable part in backporting since it's not here yet]
Signed-off-by: David Woodhouse &lt;dwmw@amazon.co.ku&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sysfs/cpu: Add vulnerability folder</title>
<updated>2018-01-17T08:38:57+00:00</updated>
<author>
<name>Thomas Gleixner</name>
<email>tglx@linutronix.de</email>
</author>
<published>2018-01-07T21:48:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=11ec2df9c02071a7c0a63a1febb53e76cdee56ac'/>
<id>11ec2df9c02071a7c0a63a1febb53e76cdee56ac</id>
<content type='text'>
commit 87590ce6e373d1a5401f6539f0c59ef92dd924a9 upstream.

As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.

Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.

Allow architectures to override the show function.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 87590ce6e373d1a5401f6539f0c59ef92dd924a9 upstream.

As the meltdown/spectre problem affects several CPU architectures, it makes
sense to have common way to express whether a system is affected by a
particular vulnerability or not. If affected the way to express the
mitigation should be common as well.

Create /sys/devices/system/cpu/vulnerabilities folder and files for
meltdown, spectre_v1 and spectre_v2.

Allow architectures to override the show function.

Signed-off-by: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Reviewed-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
Reviewed-by: Konrad Rzeszutek Wilk &lt;konrad.wilk@oracle.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Will Deacon &lt;will.deacon@arm.com&gt;
Cc: Dave Hansen &lt;dave.hansen@intel.com&gt;
Cc: Linus Torvalds &lt;torvalds@linuxfoundation.org&gt;
Cc: Borislav Petkov &lt;bp@alien8.de&gt;
Cc: David Woodhouse &lt;dwmw@amazon.co.uk&gt;
Link: https://lkml.kernel.org/r/20180107214913.096657732@linutronix.de
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>bpf: prevent out-of-bounds speculation</title>
<updated>2018-01-17T08:38:55+00:00</updated>
<author>
<name>Alexei Starovoitov</name>
<email>ast@kernel.org</email>
</author>
<published>2018-01-08T01:33:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=a9bfac14cde2b481eeb0e64fbe15305df66ab32e'/>
<id>a9bfac14cde2b481eeb0e64fbe15305df66ab32e</id>
<content type='text'>
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index &amp; index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index &gt;= max_entries) {
    index &amp;= map-&gt;index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index &gt;= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2-&gt;v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit b2157399cc9898260d6031c5bfe45fe137c1fbe7 upstream.

Under speculation, CPUs may mis-predict branches in bounds checks. Thus,
memory accesses under a bounds check may be speculated even if the
bounds check fails, providing a primitive for building a side channel.

To avoid leaking kernel data round up array-based maps and mask the index
after bounds check, so speculated load with out of bounds index will load
either valid value from the array or zero from the padded area.

Unconditionally mask index for all array types even when max_entries
are not rounded to power of 2 for root user.
When map is created by unpriv user generate a sequence of bpf insns
that includes AND operation to make sure that JITed code includes
the same 'index &amp; index_mask' operation.

If prog_array map is created by unpriv user replace
  bpf_tail_call(ctx, map, index);
with
  if (index &gt;= max_entries) {
    index &amp;= map-&gt;index_mask;
    bpf_tail_call(ctx, map, index);
  }
(along with roundup to power 2) to prevent out-of-bounds speculation.
There is secondary redundant 'if (index &gt;= max_entries)' in the interpreter
and in all JITs, but they can be optimized later if necessary.

Other array-like maps (cpumap, devmap, sockmap, perf_event_array, cgroup_array)
cannot be used by unpriv, so no changes there.

That fixes bpf side of "Variant 1: bounds check bypass (CVE-2017-5753)" on
all architectures with and without JIT.

v2-&gt;v3:
Daniel noticed that attack potentially can be crafted via syscall commands
without loading the program, so add masking to those paths as well.

Signed-off-by: Alexei Starovoitov &lt;ast@kernel.org&gt;
Acked-by: John Fastabend &lt;john.fastabend@gmail.com&gt;
Signed-off-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Cc: Jiri Slaby &lt;jslaby@suse.cz&gt;
[ Backported to 4.9 - gregkh ]
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>net: stmmac: enable EEE in MII, GMII or RGMII only</title>
<updated>2018-01-17T08:38:53+00:00</updated>
<author>
<name>Jerome Brunet</name>
<email>jbrunet@baylibre.com</email>
</author>
<published>2018-01-03T15:46:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6f237183c7cad875e5ddc7c32641094117b036c9'/>
<id>6f237183c7cad875e5ddc7c32641094117b036c9</id>
<content type='text'>
[ Upstream commit 879626e3a52630316d817cbda7cec9a5446d1d82 ]

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6bb64 ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet &lt;jbrunet@baylibre.com&gt;
Tested-by: Arnaud Patard &lt;arnaud.patard@rtp-net.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 879626e3a52630316d817cbda7cec9a5446d1d82 ]

Note in the databook - Section 4.4 - EEE :
" The EEE feature is not supported when the MAC is configured to use the
TBI, RTBI, SMII, RMII or SGMII single PHY interface. Even if the MAC
supports multiple PHY interfaces, you should activate the EEE mode only
when the MAC is operating with GMII, MII, or RGMII interface."

Applying this restriction solves a stability issue observed on Amlogic
gxl platforms operating with RMII interface and the internal PHY.

Fixes: 83bf79b6bb64 ("stmmac: disable at run-time the EEE if not supported")
Signed-off-by: Jerome Brunet &lt;jbrunet@baylibre.com&gt;
Tested-by: Arnaud Patard &lt;arnaud.patard@rtp-net.org&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>sh_eth: fix SH7757 GEther initialization</title>
<updated>2018-01-17T08:38:53+00:00</updated>
<author>
<name>Sergei Shtylyov</name>
<email>sergei.shtylyov@cogentembedded.com</email>
</author>
<published>2018-01-04T18:06:49+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=7f4226ffcba0afe7a352e6fed7f095eb9730584e'/>
<id>7f4226ffcba0afe7a352e6fed7f095eb9730584e</id>
<content type='text'>
[ Upstream commit 5133550296d43236439494aa955bfb765a89f615 ]

Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
'sh_eth' driver can only reset and initialize TSU of the first controller
pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
'struct sh_eth_plat_data', however the platform code still never sets this
flag. I think  that we can infer this information from the 'devno' variable
(set  to 'platform_device::id') and reset/init the Ether controller pair
only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
removed...

Fixes: 150647fb2c31 ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov &lt;sergei.shtylyov@cogentembedded.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
[ Upstream commit 5133550296d43236439494aa955bfb765a89f615 ]

Renesas  SH7757 has 2 Fast and 2 Gigabit Ether controllers, while the
'sh_eth' driver can only reset and initialize TSU of the first controller
pair. Shimoda-san tried to solve that adding the 'needs_init' member to the
'struct sh_eth_plat_data', however the platform code still never sets this
flag. I think  that we can infer this information from the 'devno' variable
(set  to 'platform_device::id') and reset/init the Ether controller pair
only for an even 'devno'; therefore 'sh_eth_plat_data::needs_init' can be
removed...

Fixes: 150647fb2c31 ("net: sh_eth: change the condition of initialization")
Signed-off-by: Sergei Shtylyov &lt;sergei.shtylyov@cogentembedded.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kernel: make groups_sort calling a responsibility group_info allocators</title>
<updated>2018-01-10T08:29:52+00:00</updated>
<author>
<name>Thiago Rafael Becker</name>
<email>thiago.becker@gmail.com</email>
</author>
<published>2017-12-14T23:33:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=79258d9834803518a80b0ed0603c790638f0478b'/>
<id>79258d9834803518a80b0ed0603c790638f0478b</id>
<content type='text'>
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker &lt;thiago.becker@gmail.com&gt;
Reviewed-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Acked-by: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit bdcf0a423ea1c40bbb40e7ee483b50fc8aa3d758 upstream.

In testing, we found that nfsd threads may call set_groups in parallel
for the same entry cached in auth.unix.gid, racing in the call of
groups_sort, corrupting the groups for that entry and leading to
permission denials for the client.

This patch:
 - Make groups_sort globally visible.
 - Move the call to groups_sort to the modifiers of group_info
 - Remove the call to groups_sort from set_groups

Link: http://lkml.kernel.org/r/20171211151420.18655-1-thiago.becker@gmail.com
Signed-off-by: Thiago Rafael Becker &lt;thiago.becker@gmail.com&gt;
Reviewed-by: Matthew Wilcox &lt;mawilcox@microsoft.com&gt;
Reviewed-by: NeilBrown &lt;neilb@suse.com&gt;
Acked-by: "J. Bruce Fields" &lt;bfields@fieldses.org&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Martin Schwidefsky &lt;schwidefsky@de.ibm.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;


</pre>
</div>
</content>
</entry>
<entry>
<title>fscache: Fix the default for fscache_maybe_release_page()</title>
<updated>2018-01-10T08:29:52+00:00</updated>
<author>
<name>David Howells</name>
<email>dhowells@redhat.com</email>
</author>
<published>2018-01-02T10:02:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2b9b2002e05d1aa7d8c930d09d4342a6abacd2dd'/>
<id>2b9b2002e05d1aa7d8c930d09d4342a6abacd2dd</id>
<content type='text'>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.

Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem -&gt;releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses -&gt;invalidatepage() instead.

Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne &lt;marc.dionne@auristor.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Marc Dionne &lt;marc.dionne@auristor.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 98801506552593c9b8ac11021b0cdad12cab4f6b upstream.

Fix the default for fscache_maybe_release_page() for when the cookie isn't
valid or the page isn't cached.  It mustn't return false as that indicates
the page cannot yet be freed.

The problem with the default is that if, say, there's no cache, but a
network filesystem's pages are using up almost all the available memory, a
system can OOM because the filesystem -&gt;releasepage() op will not allow
them to be released as fscache_maybe_release_page() incorrectly prevents
it.

This can be tested by writing a sequence of 512MiB files to an AFS mount.
It does not affect NFS or CIFS because both of those wrap the call in a
check of PG_fscache and it shouldn't bother Ceph as that only has
PG_private set whilst writeback is in progress.  This might be an issue for
9P, however.

Note that the pages aren't entirely stuck.  Removing a file or unmounting
will clear things because that uses -&gt;invalidatepage() instead.

Fixes: 201a15428bd5 ("FS-Cache: Handle pages pending storage that get evicted under OOM conditions")
Reported-by: Marc Dionne &lt;marc.dionne@auristor.com&gt;
Signed-off-by: David Howells &lt;dhowells@redhat.com&gt;
Reviewed-by: Jeff Layton &lt;jlayton@redhat.com&gt;
Acked-by: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Tested-by: Marc Dionne &lt;marc.dionne@auristor.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;

</pre>
</div>
</content>
</entry>
<entry>
<title>KPTI: Rename to PAGE_TABLE_ISOLATION</title>
<updated>2018-01-05T14:46:35+00:00</updated>
<author>
<name>Kees Cook</name>
<email>keescook@chromium.org</email>
</author>
<published>2018-01-03T18:17:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=e71fac01727a9495f08de9db5259089bee311766'/>
<id>e71fac01727a9495f08de9db5259089bee311766</id>
<content type='text'>
This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This renames CONFIG_KAISER to CONFIG_PAGE_TABLE_ISOLATION.

Signed-off-by: Kees Cook &lt;keescook@chromium.org&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kaiser: vmstat show NR_KAISERTABLE as nr_overhead</title>
<updated>2018-01-05T14:46:33+00:00</updated>
<author>
<name>Hugh Dickins</name>
<email>hughd@google.com</email>
</author>
<published>2017-09-10T04:27:32+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=1972bb9d92066fcf7deb0d798e02c88caa45e035'/>
<id>1972bb9d92066fcf7deb0d798e02c88caa45e035</id>
<content type='text'>
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

["nr_overhead" should of course say "nr_kaisertable", if it needs
to stay; but for the moment we are being coy, preferring that when
Joe Blow notices a new line in his /proc/vmstat, he does not get
too curious about what this "kaiser" stuff might be.]

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
The kaiser update made an interesting choice, never to free any shadow
page tables.  Contention on global spinlock was worrying, particularly
with it held across page table scans when freeing.  Something had to be
done: I was going to add refcounting; but simply never to free them is
an appealing choice, minimizing contention without complicating the code
(the more a page table is found already, the less the spinlock is used).

But leaking pages in this way is also a worry: can we get away with it?
At the very least, we need a count to show how bad it actually gets:
in principle, one might end up wasting about 1/256 of memory that way
(1/512 for when direct-mapped pages have to be user-mapped, plus 1/512
for when they are user-mapped from the vmalloc area on another occasion
(but we don't have vmalloc'ed stacks, so only large ldts are vmalloc'ed).

Add per-cpu stat NR_KAISERTABLE: including 256 at startup for the
shared pgd entries, and 1 for each intermediate page table added
thereafter for user-mapping - but leave out the 1 per mm, for its
shadow pgd, because that distracts from the monotonic increase.
Shown in /proc/vmstat as nr_overhead (0 if kaiser not enabled).

In practice, it doesn't look so bad so far: more like 1/12000 after
nine hours of gtests below; and movable pageblock segregation should
tend to cluster the kaiser tables into a subset of the address space
(if not, they will be bad for compaction too).  But production may
tell a different story: keep an eye on this number, and bring back
lighter freeing if it gets out of control (maybe a shrinker).

["nr_overhead" should of course say "nr_kaisertable", if it needs
to stay; but for the moment we are being coy, preferring that when
Joe Blow notices a new line in his /proc/vmstat, he does not get
too curious about what this "kaiser" stuff might be.]

Signed-off-by: Hugh Dickins &lt;hughd@google.com&gt;
Signed-off-by: Greg Kroah-Hartman &lt;gregkh@linuxfoundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
