<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux-stable.git/include, branch v3.16.40</title>
<subtitle>Linux kernel stable tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/'/>
<entry>
<title>tcp: take care of truncations done by sk_filter()</title>
<updated>2017-02-23T03:54:46+00:00</updated>
<author>
<name>Eric Dumazet</name>
<email>edumazet@google.com</email>
</author>
<published>2016-11-10T21:12:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=3d59e6e25fd0cbe700d3f2910291729227dcfd23'/>
<id>3d59e6e25fd0cbe700d3f2910291729227dcfd23</id>
<content type='text'>
commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream.

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)-&gt;end_seq

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Marco Grassi &lt;marco.gra@gmail.com&gt;
Reported-by: Vladis Dronov &lt;vdronov@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit ac6e780070e30e4c35bd395acfe9191e6268bdd3 upstream.

With syzkaller help, Marco Grassi found a bug in TCP stack,
crashing in tcp_collapse()

Root cause is that sk_filter() can truncate the incoming skb,
but TCP stack was not really expecting this to happen.
It probably was expecting a simple DROP or ACCEPT behavior.

We first need to make sure no part of TCP header could be removed.
Then we need to adjust TCP_SKB_CB(skb)-&gt;end_seq

Many thanks to syzkaller team and Marco for giving us a reproducer.

Signed-off-by: Eric Dumazet &lt;edumazet@google.com&gt;
Reported-by: Marco Grassi &lt;marco.gra@gmail.com&gt;
Reported-by: Vladis Dronov &lt;vdronov@redhat.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>dccp: limit sk_filter trim to payload</title>
<updated>2017-02-23T03:54:46+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2016-07-12T22:18:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=49bbb1dc818f09bfb323ce5b36c47306ad9e94ac'/>
<id>49bbb1dc818f09bfb323ce5b36c47306ad9e94ac</id>
<content type='text'>
commit 4f0c40d94461cfd23893a17335b2ab78ecb333c8 upstream.

Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.

A call to sk_filter in-between can cause __skb_pull to wrap skb-&gt;len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.

Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 4f0c40d94461cfd23893a17335b2ab78ecb333c8 upstream.

Dccp verifies packet integrity, including length, at initial rcv in
dccp_invalid_packet, later pulls headers in dccp_enqueue_skb.

A call to sk_filter in-between can cause __skb_pull to wrap skb-&gt;len.
skb_copy_datagram_msg interprets this as a negative value, so
(correctly) fails with EFAULT. The negative length is reported in
ioctl SIOCINQ or possibly in a DCCP_WARN in dccp_close.

Introduce an sk_receive_skb variant that caps how small a filter
program can trim packets, and call this in dccp with the header
length. Excessively trimmed packets are now processed normally and
queued for reception as 0B payloads.

Fixes: 7c657876b63c ("[DCCP]: Initial implementation")
Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rose: limit sk_filter trim to payload</title>
<updated>2017-02-23T03:54:46+00:00</updated>
<author>
<name>Willem de Bruijn</name>
<email>willemb@google.com</email>
</author>
<published>2016-07-12T22:18:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=d0fb92f2ca9b8c3ba10047b5e134cd7d1459cc1c'/>
<id>d0fb92f2ca9b8c3ba10047b5e134cd7d1459cc1c</id>
<content type='text'>
commit f4979fcea7fd36d8e2f556abef86f80e0d5af1ba upstream.

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f4979fcea7fd36d8e2f556abef86f80e0d5af1ba upstream.

Sockets can have a filter program attached that drops or trims
incoming packets based on the filter program return value.

Rose requires data packets to have at least ROSE_MIN_LEN bytes. It
verifies this on arrival in rose_route_frame and unconditionally pulls
the bytes in rose_recvmsg. The filter can trim packets to below this
value in-between, causing pull to fail, leaving the partial header at
the time of skb_copy_datagram_msg.

Place a lower bound on the size to which sk_filter may trim packets
by introducing sk_filter_trim_cap and call this for rose packets.

Signed-off-by: Willem de Bruijn &lt;willemb@google.com&gt;
Acked-by: Daniel Borkmann &lt;daniel@iogearbox.net&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>net: Add __sock_queue_rcv_skb()</title>
<updated>2017-02-23T03:54:46+00:00</updated>
<author>
<name>Ben Hutchings</name>
<email>ben@decadent.org.uk</email>
</author>
<published>2016-12-29T03:06:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=bed7167a188ec74018825232eeee67e2032275f8'/>
<id>bed7167a188ec74018825232eeee67e2032275f8</id>
<content type='text'>
Extraxcted from commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac
"udp: remove headers from UDP packets before queueing".

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Extraxcted from commit e6afc8ace6dd5cef5e812f26c72579da8806f5ac
"udp: remove headers from UDP packets before queueing".

Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>can: raw: raw_setsockopt: limit number of can_filter that can be set</title>
<updated>2017-02-23T03:54:43+00:00</updated>
<author>
<name>Marc Kleine-Budde</name>
<email>mkl@pengutronix.de</email>
</author>
<published>2016-12-05T10:44:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=2dd5177598ac0a78ee5860f3f7cec45b41385469'/>
<id>2dd5177598ac0a78ee5860f3f7cec45b41385469</id>
<content type='text'>
commit 332b05ca7a438f857c61a3c21a88489a21532364 upstream.

This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations &gt; MAX_ORDER
are not prevented resulting in a warning.

Reference: https://lkml.org/lkml/2016/12/2/230

Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 332b05ca7a438f857c61a3c21a88489a21532364 upstream.

This patch adds a check to limit the number of can_filters that can be
set via setsockopt on CAN_RAW sockets. Otherwise allocations &gt; MAX_ORDER
are not prevented resulting in a warning.

Reference: https://lkml.org/lkml/2016/12/2/230

Reported-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Tested-by: Andrey Konovalov &lt;andreyknvl@google.com&gt;
Signed-off-by: Marc Kleine-Budde &lt;mkl@pengutronix.de&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ip6_tunnel: Clear IP6CB in ip6tunnel_xmit()</title>
<updated>2017-02-23T03:54:27+00:00</updated>
<author>
<name>Eli Cooper</name>
<email>elicooper@gmx.com</email>
</author>
<published>2016-11-01T15:45:12+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=9cc906639d6cb4e2b4e4c6634fc56426f5894639'/>
<id>9cc906639d6cb4e2b4e4c6634fc56426f5894639</id>
<content type='text'>
commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream.

skb-&gt;cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)-&gt;frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)-&gt;flags, otherwise it needs to be done earlier.

Signed-off-by: Eli Cooper &lt;elicooper@gmx.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 23f4ffedb7d751c7e298732ba91ca75d224bc1a6 upstream.

skb-&gt;cb may contain data from previous layers. In the observed scenario,
the garbage data were misinterpreted as IP6CB(skb)-&gt;frag_max_size, so
that small packets sent through the tunnel are mistakenly fragmented.

This patch unconditionally clears the control buffer in ip6tunnel_xmit(),
which affects ip6_tunnel, ip6_udp_tunnel and ip6_gre. Currently none of
these tunnels set IP6CB(skb)-&gt;flags, otherwise it needs to be done earlier.

Signed-off-by: Eli Cooper &lt;elicooper@gmx.com&gt;
Signed-off-by: David S. Miller &lt;davem@davemloft.net&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check</title>
<updated>2017-02-23T03:54:23+00:00</updated>
<author>
<name>John W. Linville</name>
<email>linville@tuxdriver.com</email>
</author>
<published>2016-10-25T19:56:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=768071f432eee4448a1c064a2032e9c58830c748'/>
<id>768071f432eee4448a1c064a2032e9c58830c748</id>
<content type='text'>
commit f1d505bb762e30bf316ff5d3b604914649d6aed3 upstream.

Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".

This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.

Found by Coverity, CID 1373930.

Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.

Signed-off-by: John W. Linville &lt;linville@tuxdriver.com&gt;
Cc: Laura Garcia Liebana &lt;nevola@gmail.com&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit f1d505bb762e30bf316ff5d3b604914649d6aed3 upstream.

Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".

This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.

Found by Coverity, CID 1373930.

Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.

Signed-off-by: John W. Linville &lt;linville@tuxdriver.com&gt;
Cc: Laura Garcia Liebana &lt;nevola@gmail.com&gt;
Cc: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
Cc: Dan Carpenter &lt;dan.carpenter@oracle.com&gt;
Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...")
Signed-off-by: Pablo Neira Ayuso &lt;pablo@netfilter.org&gt;
[bwh: Backported to 3.16: adjust context]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>target: Make EXTENDED_COPY 0xe4 failure return COPY TARGET DEVICE NOT REACHABLE</title>
<updated>2017-02-23T03:54:18+00:00</updated>
<author>
<name>Nicholas Bellinger</name>
<email>nab@linux-iscsi.org</email>
</author>
<published>2016-10-09T00:26:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=279a47ee31bc58e0f7965eb884e70feaafee7078'/>
<id>279a47ee31bc58e0f7965eb884e70feaafee7078</id>
<content type='text'>
commit 449a137846c84829a328757cd21fd9ca65c08519 upstream.

This patch addresses a bug where EXTENDED_COPY across multiple LUNs
results in a CHECK_CONDITION when the source + destination are not
located on the same physical node.

ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
NOT REACHABLE to be returned when this occurs, in order to signal
fallback to local copy method.

As described in section 6.3.3 of spc4r22:

  "If it is not possible to complete processing of a segment because the
   copy manager is unable to establish communications with a copy target
   device, because the copy target device does not respond to INQUIRY,
   or because the data returned in response to INQUIRY indicates
   an unsupported logical unit, then the EXTENDED COPY command shall be
   terminated with CHECK CONDITION status, with the sense key set to
   COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
   NOT REACHABLE."

Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.

Reported-by: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Tested-by: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Cc: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Tested-by: Dinesh Israni &lt;ddi@datera.io&gt;
Signed-off-by: Dinesh Israni &lt;ddi@datera.io&gt;
Cc: Dinesh Israni &lt;ddi@datera.io&gt;
Signed-off-by: Nicholas Bellinger &lt;nab@linux-iscsi.org&gt;
[bwh: Backported to 3.16: generate the sense data in
 transport_send_check_condition_and_sense() rather than adding to
 sense_info_table]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 449a137846c84829a328757cd21fd9ca65c08519 upstream.

This patch addresses a bug where EXTENDED_COPY across multiple LUNs
results in a CHECK_CONDITION when the source + destination are not
located on the same physical node.

ESX Host environments expect sense COPY_ABORTED w/ COPY TARGET DEVICE
NOT REACHABLE to be returned when this occurs, in order to signal
fallback to local copy method.

As described in section 6.3.3 of spc4r22:

  "If it is not possible to complete processing of a segment because the
   copy manager is unable to establish communications with a copy target
   device, because the copy target device does not respond to INQUIRY,
   or because the data returned in response to INQUIRY indicates
   an unsupported logical unit, then the EXTENDED COPY command shall be
   terminated with CHECK CONDITION status, with the sense key set to
   COPY ABORTED, and the additional sense code set to COPY TARGET DEVICE
   NOT REACHABLE."

Tested on v4.1.y with ESX v5.5u2+ with BlockCopy across multiple nodes.

Reported-by: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Tested-by: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Cc: Nixon Vincent &lt;nixon.vincent@calsoftinc.com&gt;
Tested-by: Dinesh Israni &lt;ddi@datera.io&gt;
Signed-off-by: Dinesh Israni &lt;ddi@datera.io&gt;
Cc: Dinesh Israni &lt;ddi@datera.io&gt;
Signed-off-by: Nicholas Bellinger &lt;nab@linux-iscsi.org&gt;
[bwh: Backported to 3.16: generate the sense data in
 transport_send_check_condition_and_sense() rather than adding to
 sense_info_table]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>ipc/sem.c: fix complex_count vs. simple op race</title>
<updated>2017-02-23T03:54:12+00:00</updated>
<author>
<name>Manfred Spraul</name>
<email>manfred@colorfullife.com</email>
</author>
<published>2016-10-11T20:54:50+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=accb9f16adbaeb5878974abc7b0e334790e59c11'/>
<id>accb9f16adbaeb5878974abc7b0e334790e59c11</id>
<content type='text'>
commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operations is ongoing (sma-&gt;sem_perm.lock held)
 - a complex operation is sleeping (sma-&gt;complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma-&gt;complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem-&gt;lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&amp;sem-&gt;lock), spin_is_locked(sma-&gt;sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: &lt;felixh@informatik.uni-bremen.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: &lt;1vier1@web.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - We missed out on some earlier memory barrier changes
 - Use set_mb instead of smp_store_mb]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 5864a2fd3088db73d47942370d0f7210a807b9bc upstream.

Commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") introduced a
race:

sem_lock has a fast path that allows parallel simple operations.
There are two reasons why a simple operation cannot run in parallel:
 - a non-simple operations is ongoing (sma-&gt;sem_perm.lock held)
 - a complex operation is sleeping (sma-&gt;complex_count != 0)

As both facts are stored independently, a thread can bypass the current
checks by sleeping in the right positions.  See below for more details
(or kernel bugzilla 105651).

The patch fixes that by creating one variable (complex_mode)
that tracks both reasons why parallel operations are not possible.

The patch also updates stale documentation regarding the locking.

With regards to stable kernels:
The patch is required for all kernels that include the
commit 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()") (3.10?)

The alternative is to revert the patch that introduced the race.

The patch is safe for backporting, i.e. it makes no assumptions
about memory barriers in spin_unlock_wait().

Background:
Here is the race of the current implementation:

Thread A: (simple op)
- does the first "sma-&gt;complex_count == 0" test

Thread B: (complex op)
- does sem_lock(): This includes an array scan. But the scan can't
  find Thread A, because Thread A does not own sem-&gt;lock yet.
- the thread does the operation, increases complex_count,
  drops sem_lock, sleeps

Thread A:
- spin_lock(&amp;sem-&gt;lock), spin_is_locked(sma-&gt;sem_perm.lock)
- sleeps before the complex_count test

Thread C: (complex op)
- does sem_lock (no array scan, complex_count==1)
- wakes up Thread B.
- decrements complex_count

Thread A:
- does the complex_count test

Bug:
Now both thread A and thread C operate on the same array, without
any synchronization.

Fixes: 6d07b68ce16a ("ipc/sem.c: optimize sem_lock()")
Link: http://lkml.kernel.org/r/1469123695-5661-1-git-send-email-manfred@colorfullife.com
Reported-by: &lt;felixh@informatik.uni-bremen.de&gt;
Cc: "H. Peter Anvin" &lt;hpa@zytor.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Cc: Davidlohr Bueso &lt;dave@stgolabs.net&gt;
Cc: Thomas Gleixner &lt;tglx@linutronix.de&gt;
Cc: Ingo Molnar &lt;mingo@elte.hu&gt;
Cc: &lt;1vier1@web.de&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
[bwh: Backported to 3.16:
 - We missed out on some earlier memory barrier changes
 - Use set_mb instead of smp_store_mb]
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>compiler: Allow 1- and 2-byte smp_load_acquire() and smp_store_release()</title>
<updated>2017-02-23T03:54:11+00:00</updated>
<author>
<name>Paul E. McKenney</name>
<email>paulmck@linux.vnet.ibm.com</email>
</author>
<published>2014-09-05T18:14:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux-stable.git/commit/?id=6b454ee19ded0051ac67b90c809d64d8cd72a96b'/>
<id>6b454ee19ded0051ac67b90c809d64d8cd72a96b</id>
<content type='text'>
commit 536fa402221f09633e7c5801b327055ab716a363 upstream.

CPUs without single-byte and double-byte loads and stores place some
"interesting" requirements on concurrent code.  For example (adapted
from Peter Hurley's test code), suppose we have the following structure:

	struct foo {
		spinlock_t lock1;
		spinlock_t lock2;
		char a; /* Protected by lock1. */
		char b; /* Protected by lock2. */
	};
	struct foo *foop;

Of course, it is common (and good) practice to place data protected
by different locks in separate cache lines.  However, if the locks are
rarely acquired (for example, only in rare error cases), and there are
a great many instances of the data structure, then memory footprint can
trump false-sharing concerns, so that it can be better to place them in
the same cache cache line as above.

But if the CPU does not support single-byte loads and stores, a store
to foop-&gt;a will do a non-atomic read-modify-write operation on foop-&gt;b,
which will come as a nasty surprise to someone holding foop-&gt;lock2.  So we
now require CPUs to support single-byte and double-byte loads and stores.
Therefore, this commit adjusts the definition of __native_word() to allow
these sizes to be used by smp_load_acquire() and smp_store_release().

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
commit 536fa402221f09633e7c5801b327055ab716a363 upstream.

CPUs without single-byte and double-byte loads and stores place some
"interesting" requirements on concurrent code.  For example (adapted
from Peter Hurley's test code), suppose we have the following structure:

	struct foo {
		spinlock_t lock1;
		spinlock_t lock2;
		char a; /* Protected by lock1. */
		char b; /* Protected by lock2. */
	};
	struct foo *foop;

Of course, it is common (and good) practice to place data protected
by different locks in separate cache lines.  However, if the locks are
rarely acquired (for example, only in rare error cases), and there are
a great many instances of the data structure, then memory footprint can
trump false-sharing concerns, so that it can be better to place them in
the same cache cache line as above.

But if the CPU does not support single-byte loads and stores, a store
to foop-&gt;a will do a non-atomic read-modify-write operation on foop-&gt;b,
which will come as a nasty surprise to someone holding foop-&gt;lock2.  So we
now require CPUs to support single-byte and double-byte loads and stores.
Therefore, this commit adjusts the definition of __native_word() to allow
these sizes to be used by smp_load_acquire() and smp_store_release().

Signed-off-by: Paul E. McKenney &lt;paulmck@linux.vnet.ibm.com&gt;
Cc: Peter Zijlstra &lt;peterz@infradead.org&gt;
Signed-off-by: Ben Hutchings &lt;ben@decadent.org.uk&gt;
</pre>
</div>
</content>
</entry>
</feed>
