<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/lib, branch v3.10-rc2</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge branch 'for-3.10/drivers' of git://git.kernel.dk/linux-block</title>
<updated>2013-05-08T18:51:05+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-05-08T18:51:05+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ebb37277796269da36a8bc5d72ed1e8e1fb7d34b'/>
<id>ebb37277796269da36a8bc5d72ed1e8e1fb7d34b</id>
<content type='text'>
Pull block driver updates from Jens Axboe:
 "It might look big in volume, but when categorized, not a lot of
  drivers are touched.  The pull request contains:

   - mtip32xx fixes from Micron.

   - A slew of drbd updates, this time in a nicer series.

   - bcache, a flash/ssd caching framework from Kent.

   - Fixes for cciss"

* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
  bcache: Use bd_link_disk_holder()
  bcache: Allocator cleanup/fixes
  cciss: bug fix to prevent cciss from loading in kdump crash kernel
  cciss: add cciss_allow_hpsa module parameter
  drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
  mtip32xx: Workaround for unaligned writes
  bcache: Make sure blocksize isn't smaller than device blocksize
  bcache: Fix merge_bvec_fn usage for when it modifies the bvm
  bcache: Correctly check against BIO_MAX_PAGES
  bcache: Hack around stuff that clones up to bi_max_vecs
  bcache: Set ra_pages based on backing device's ra_pages
  bcache: Take data offset from the bdev superblock.
  mtip32xx: mtip32xx: Disable TRIM support
  mtip32xx: fix a smatch warning
  bcache: Disable broken btree fuzz tester
  bcache: Fix a format string overflow
  bcache: Fix a minor memory leak on device teardown
  bcache: Documentation updates
  bcache: Use WARN_ONCE() instead of __WARN()
  bcache: Add missing #include &lt;linux/prefetch.h&gt;
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block driver updates from Jens Axboe:
 "It might look big in volume, but when categorized, not a lot of
  drivers are touched.  The pull request contains:

   - mtip32xx fixes from Micron.

   - A slew of drbd updates, this time in a nicer series.

   - bcache, a flash/ssd caching framework from Kent.

   - Fixes for cciss"

* 'for-3.10/drivers' of git://git.kernel.dk/linux-block: (66 commits)
  bcache: Use bd_link_disk_holder()
  bcache: Allocator cleanup/fixes
  cciss: bug fix to prevent cciss from loading in kdump crash kernel
  cciss: add cciss_allow_hpsa module parameter
  drivers/block/mg_disk.c: add CONFIG_PM_SLEEP to suspend/resume functions
  mtip32xx: Workaround for unaligned writes
  bcache: Make sure blocksize isn't smaller than device blocksize
  bcache: Fix merge_bvec_fn usage for when it modifies the bvm
  bcache: Correctly check against BIO_MAX_PAGES
  bcache: Hack around stuff that clones up to bi_max_vecs
  bcache: Set ra_pages based on backing device's ra_pages
  bcache: Take data offset from the bdev superblock.
  mtip32xx: mtip32xx: Disable TRIM support
  mtip32xx: fix a smatch warning
  bcache: Disable broken btree fuzz tester
  bcache: Fix a format string overflow
  bcache: Fix a minor memory leak on device teardown
  bcache: Documentation updates
  bcache: Use WARN_ONCE() instead of __WARN()
  bcache: Add missing #include &lt;linux/prefetch.h&gt;
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: check counter to avoid cmpxchg calls</title>
<updated>2013-05-07T23:11:51+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-05-07T22:39:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9607a85b67a9714290a78c1a56630ab1c9fa2c23'/>
<id>9607a85b67a9714290a78c1a56630ab1c9fa2c23</id>
<content type='text'>
This patch tries to reduce the number of cmpxchg calls in the writer
failed path by checking the counter value first before issuing the
instruction.  If -&gt;count is not set to RWSEM_WAITING_BIAS then there is
no point wasting a cmpxchg call.
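
As a rough illustration (not the kernel code), the check-before-cmpxchg
idea can be modelled in user-space Python.  The Counter class,
try_write_lock(), the locked_value argument and the value given to
RWSEM_WAITING_BIAS are all made up for this sketch, and a real cmpxchg
is a hardware atomic rather than a method call:

```python
# Illustrative user-space model only; the kernel code is C and uses
# hardware atomics.  The constant's value here is made up.
RWSEM_WAITING_BIAS = -1


class Counter:
    """Toy stand-in for sem.count that records how many cmpxchg calls run."""

    def __init__(self, value):
        self.value = value
        self.cmpxchg_calls = 0

    def cmpxchg(self, expected, new):
        # Compare-and-swap: store `new` only if the current value equals
        # `expected`; always return the value seen before the attempt.
        self.cmpxchg_calls += 1
        old = self.value
        if old == expected:
            self.value = new
        return old


def try_write_lock(count, locked_value, check_first):
    # With check_first, a plain read filters out attempts that cannot
    # succeed, so cmpxchg is only issued when it has a chance.
    if check_first and count.value != RWSEM_WAITING_BIAS:
        return False
    return count.cmpxchg(RWSEM_WAITING_BIAS, locked_value) == RWSEM_WAITING_BIAS
```

With check_first set, a counter that is not at RWSEM_WAITING_BIAS is
rejected by the plain read and no cmpxchg call is recorded; without it,
the same attempt costs one failed cmpxchg.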

Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem-&gt;wait_lock."

Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8 socket box with 80 cores.

Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:

  | db_size  | clients |  tps-rwsem     |   tps-patch  |
  +----------+---------+----------------+--------------+
  | 160 MB   |       1 |           6906 |         9153 | + 32.5%
  | 160 MB   |       2 |          15931 |        22487 | + 41.1%
  | 160 MB   |       4 |          33021 |        32503 |
  | 160 MB   |       8 |          34626 |        34695 |
  | 160 MB   |      16 |          33098 |        34003 |
  | 160 MB   |      20 |          31343 |        31440 |
  | 160 MB   |      30 |          28961 |        28987 |
  | 160 MB   |      40 |          26902 |        26970 |
  | 160 MB   |      50 |          25760 |        25810 |
  ------------------------------------------------------
  | 1.6 GB   |       1 |           7729 |         7537 |
  | 1.6 GB   |       2 |          19009 |        23508 | + 23.7%
  | 1.6 GB   |       4 |          33185 |        32666 |
  | 1.6 GB   |       8 |          34550 |        34318 |
  | 1.6 GB   |      16 |          33079 |        32689 |
  | 1.6 GB   |      20 |          31494 |        31702 |
  | 1.6 GB   |      30 |          28535 |        28755 |
  | 1.6 GB   |      40 |          27054 |        27017 |
  | 1.6 GB   |      50 |          25591 |        25560 |
  ------------------------------------------------------
  | 7.6 GB   |       1 |           6224 |         7469 | + 20.0%
  | 7.6 GB   |       2 |          13611 |        12778 |
  | 7.6 GB   |       4 |          33108 |        32927 |
  | 7.6 GB   |       8 |          34712 |        34878 |
  | 7.6 GB   |      16 |          32895 |        33003 |
  | 7.6 GB   |      20 |          31689 |        31974 |
  | 7.6 GB   |      30 |          29003 |        28806 |
  | 7.6 GB   |      40 |          26683 |        26976 |
  | 7.6 GB   |      50 |          25925 |        25652 |
  ------------------------------------------------------
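
The figures in the right-hand margin are relative throughput changes
between the two tps columns.  A small sketch of that arithmetic, where
percent_gain() is a name invented here and rounding to one decimal is
an assumption about how the figures were produced:

```python
def percent_gain(before, after):
    # Relative throughput change in percent, rounded to one decimal.
    return round((after - before) / before * 100, 1)


# First 160 MB row of the table: tps-rwsem 6906, tps-patch 9153.
print(percent_gain(6906, 9153))
```

For that first 160 MB row, percent_gain(6906, 9153) gives 32.5.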

The aim7 workloads improved overall on top of Michel's patchset.  For
full graphs of how the rwsem series plus this patch behaves on a large
8 socket machine against a vanilla kernel, see:

  http://stgolabs.net/rwsem-aim7-results.tar.gz

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This patch tries to reduce the number of cmpxchg calls in the writer
failed path by checking the counter value first before issuing the
instruction.  If -&gt;count is not set to RWSEM_WAITING_BIAS then there is
no point wasting a cmpxchg call.

Furthermore, Michel states "I suppose it helps due to the case where
someone else steals the lock while we're trying to acquire
sem-&gt;wait_lock."

Two very different workloads and machines were used to see how this
patch improves throughput: pgbench on a quad-core laptop and aim7 on a
large 8 socket box with 80 cores.

Some results comparing Michel's fast-path write lock stealing
(tps-rwsem) on a quad-core laptop running pgbench:

  | db_size  | clients |  tps-rwsem     |   tps-patch  |
  +----------+---------+----------------+--------------+
  | 160 MB   |       1 |           6906 |         9153 | + 32.5%
  | 160 MB   |       2 |          15931 |        22487 | + 41.1%
  | 160 MB   |       4 |          33021 |        32503 |
  | 160 MB   |       8 |          34626 |        34695 |
  | 160 MB   |      16 |          33098 |        34003 |
  | 160 MB   |      20 |          31343 |        31440 |
  | 160 MB   |      30 |          28961 |        28987 |
  | 160 MB   |      40 |          26902 |        26970 |
  | 160 MB   |      50 |          25760 |        25810 |
  ------------------------------------------------------
  | 1.6 GB   |       1 |           7729 |         7537 |
  | 1.6 GB   |       2 |          19009 |        23508 | + 23.7%
  | 1.6 GB   |       4 |          33185 |        32666 |
  | 1.6 GB   |       8 |          34550 |        34318 |
  | 1.6 GB   |      16 |          33079 |        32689 |
  | 1.6 GB   |      20 |          31494 |        31702 |
  | 1.6 GB   |      30 |          28535 |        28755 |
  | 1.6 GB   |      40 |          27054 |        27017 |
  | 1.6 GB   |      50 |          25591 |        25560 |
  ------------------------------------------------------
  | 7.6 GB   |       1 |           6224 |         7469 | + 20.0%
  | 7.6 GB   |       2 |          13611 |        12778 |
  | 7.6 GB   |       4 |          33108 |        32927 |
  | 7.6 GB   |       8 |          34712 |        34878 |
  | 7.6 GB   |      16 |          32895 |        33003 |
  | 7.6 GB   |      20 |          31689 |        31974 |
  | 7.6 GB   |      30 |          29003 |        28806 |
  | 7.6 GB   |      40 |          26683 |        26976 |
  | 7.6 GB   |      50 |          25925 |        25652 |
  ------------------------------------------------------

The aim7 workloads improved overall on top of Michel's patchset.  For
full graphs of how the rwsem series plus this patch behaves on a large
8 socket machine against a vanilla kernel, see:

  http://stgolabs.net/rwsem-aim7-results.tar.gz

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>kref: minor cleanup</title>
<updated>2013-05-07T23:09:00+00:00</updated>
<author>
<name>Anatol Pomozov</name>
<email>anatol.pomozov@gmail.com</email>
</author>
<published>2013-05-07T22:37:48+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=2d864e41710f1d2ba406fb62018ab0487152e6f2'/>
<id>2d864e41710f1d2ba406fb62018ab0487152e6f2</id>
<content type='text'>
 - make warning smp-safe
 - result of atomic _unless_zero functions should be checked by caller
   to avoid use-after-free error
 - trivial whitespace fix.
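
A minimal sketch of why the result of an _unless_zero style function
has to be checked.  RefCounted, get_unless_zero(), put() and use() are
hypothetical names for this illustration, not the kernel's kref API,
and plain Python integers stand in for atomic counters:

```python
class RefCounted:
    """Toy object with a reference count; not the kernel's struct kref."""

    def __init__(self):
        self.refcount = 1

    def get_unless_zero(self):
        # Take a reference only if the object is still alive.
        # Returns True on success, False if the count already hit zero.
        if self.refcount == 0:
            return False
        self.refcount += 1
        return True

    def put(self):
        # Drop one reference; return True when this was the last one,
        # i.e. when the caller would free the object.
        self.refcount -= 1
        return self.refcount == 0


def use(obj):
    # The caller must check the result before touching the object;
    # ignoring a False return would mean using an object already freed.
    if not obj.get_unless_zero():
        return "gone"
    obj.put()
    return "used"
```

Once the last reference has been dropped, use() reports "gone" instead
of touching the object, which is the behaviour a caller gets only by
checking the return value.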

Link: https://lkml.org/lkml/2013/4/12/391

Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov &lt;anatol.pomozov@gmail.com&gt;
[ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
 - make warning smp-safe
 - result of atomic _unless_zero functions should be checked by caller
   to avoid use-after-free error
 - trivial whitespace fix.

Link: https://lkml.org/lkml/2013/4/12/391

Tested: compile x86, boot machine and run xfstests
Signed-off-by: Anatol Pomozov &lt;anatol.pomozov@gmail.com&gt;
[ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge branch 'rwsem-optimizations'</title>
<updated>2013-05-07T16:22:03+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2013-05-07T16:22:03+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c8de2fa4dc2778ae3605925c127b3deac54b2b3a'/>
<id>c8de2fa4dc2778ae3605925c127b3deac54b2b3a</id>
<content type='text'>
Merge rwsem optimizations from Michel Lespinasse:
 "These patches extend Alex Shi's work (which added write lock stealing
  on the rwsem slow path) in order to provide rwsem write lock stealing
  on the fast path (that is, without taking the rwsem's wait_lock).

  I have unfortunately been unable to push this through -next before due
  to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
  things.  However, this has gotten some attention from Rik van Riel and
  Davidlohr Bueso who both commented that they felt this was ready for
  v3.10, and Ingo Molnar has said that he was OK with me pushing
  directly to you.  So, here goes :)

  Davidlohr got the following test results from pgbench running on a
  quad-core laptop:

    | db_size  | clients |  tps-vanilla   |   tps-rwsem  |
    +----------+---------+----------------+--------------+
    | 160 MB   |       1 |           5803 |         6906 | + 19.0%
    | 160 MB   |       2 |          13092 |        15931 |
    | 160 MB   |       4 |          29412 |        33021 |
    | 160 MB   |       8 |          32448 |        34626 |
    | 160 MB   |      16 |          32758 |        33098 |
    | 160 MB   |      20 |          26940 |        31343 | + 16.3%
    | 160 MB   |      30 |          25147 |        28961 |
    | 160 MB   |      40 |          25484 |        26902 |
    | 160 MB   |      50 |          24528 |        25760 |
    ------------------------------------------------------
    | 1.6 GB   |       1 |           5733 |         7729 | + 34.8%
    | 1.6 GB   |       2 |           9411 |        19009 | + 101.9%
    | 1.6 GB   |       4 |          31818 |        33185 |
    | 1.6 GB   |       8 |          33700 |        34550 |
    | 1.6 GB   |      16 |          32751 |        33079 |
    | 1.6 GB   |      20 |          30919 |        31494 |
    | 1.6 GB   |      30 |          28540 |        28535 |
    | 1.6 GB   |      40 |          26380 |        27054 |
    | 1.6 GB   |      50 |          25241 |        25591 |
    ------------------------------------------------------
    | 7.6 GB   |       1 |           5779 |         6224 |
    | 7.6 GB   |       2 |          10897 |        13611 | + 24.9%
    | 7.6 GB   |       4 |          32683 |        33108 |
    | 7.6 GB   |       8 |          33968 |        34712 |
    | 7.6 GB   |      16 |          32287 |        32895 |
    | 7.6 GB   |      20 |          27770 |        31689 | + 14.1%
    | 7.6 GB   |      30 |          26739 |        29003 |
    | 7.6 GB   |      40 |          24901 |        26683 |
    | 7.6 GB   |      50 |          17115 |        25925 | + 51.5%
    ------------------------------------------------------

  (Davidlohr also has one additional patch which further improves
  throughput, though I will ask him to send it directly to you as I have
  suggested some minor changes)."

* emailed patches from Michel Lespinasse &lt;walken@google.com&gt;:
  rwsem: no need for explicit signed longs
  x86 rwsem: avoid taking slow path when stealing write lock
  rwsem: do not block readers at head of queue if other readers are active
  rwsem: implement support for write lock stealing on the fastpath
  rwsem: simplify __rwsem_do_wake
  rwsem: skip initial trylock in rwsem_down_write_failed
  rwsem: avoid taking wait_lock in rwsem_down_write_failed
  rwsem: use cmpxchg for trying to steal write lock
  rwsem: more agressive lock stealing in rwsem_down_write_failed
  rwsem: simplify rwsem_down_write_failed
  rwsem: simplify rwsem_down_read_failed
  rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
  rwsem: shorter spinlocked section in rwsem_down_failed_common()
  rwsem: make the waiter type an enumeration rather than a bitmask
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Merge rwsem optimizations from Michel Lespinasse:
 "These patches extend Alex Shi's work (which added write lock stealing
  on the rwsem slow path) in order to provide rwsem write lock stealing
  on the fast path (that is, without taking the rwsem's wait_lock).

  I have unfortunately been unable to push this through -next before due
  to Ingo Molnar / David Howells / Peter Zijlstra being busy with other
  things.  However, this has gotten some attention from Rik van Riel and
  Davidlohr Bueso who both commented that they felt this was ready for
  v3.10, and Ingo Molnar has said that he was OK with me pushing
  directly to you.  So, here goes :)

  Davidlohr got the following test results from pgbench running on a
  quad-core laptop:

    | db_size  | clients |  tps-vanilla   |   tps-rwsem  |
    +----------+---------+----------------+--------------+
    | 160 MB   |       1 |           5803 |         6906 | + 19.0%
    | 160 MB   |       2 |          13092 |        15931 |
    | 160 MB   |       4 |          29412 |        33021 |
    | 160 MB   |       8 |          32448 |        34626 |
    | 160 MB   |      16 |          32758 |        33098 |
    | 160 MB   |      20 |          26940 |        31343 | + 16.3%
    | 160 MB   |      30 |          25147 |        28961 |
    | 160 MB   |      40 |          25484 |        26902 |
    | 160 MB   |      50 |          24528 |        25760 |
    ------------------------------------------------------
    | 1.6 GB   |       1 |           5733 |         7729 | + 34.8%
    | 1.6 GB   |       2 |           9411 |        19009 | + 101.9%
    | 1.6 GB   |       4 |          31818 |        33185 |
    | 1.6 GB   |       8 |          33700 |        34550 |
    | 1.6 GB   |      16 |          32751 |        33079 |
    | 1.6 GB   |      20 |          30919 |        31494 |
    | 1.6 GB   |      30 |          28540 |        28535 |
    | 1.6 GB   |      40 |          26380 |        27054 |
    | 1.6 GB   |      50 |          25241 |        25591 |
    ------------------------------------------------------
    | 7.6 GB   |       1 |           5779 |         6224 |
    | 7.6 GB   |       2 |          10897 |        13611 | + 24.9%
    | 7.6 GB   |       4 |          32683 |        33108 |
    | 7.6 GB   |       8 |          33968 |        34712 |
    | 7.6 GB   |      16 |          32287 |        32895 |
    | 7.6 GB   |      20 |          27770 |        31689 | + 14.1%
    | 7.6 GB   |      30 |          26739 |        29003 |
    | 7.6 GB   |      40 |          24901 |        26683 |
    | 7.6 GB   |      50 |          17115 |        25925 | + 51.5%
    ------------------------------------------------------

  (Davidlohr also has one additional patch which further improves
  throughput, though I will ask him to send it directly to you as I have
  suggested some minor changes)."

* emailed patches from Michel Lespinasse &lt;walken@google.com&gt;:
  rwsem: no need for explicit signed longs
  x86 rwsem: avoid taking slow path when stealing write lock
  rwsem: do not block readers at head of queue if other readers are active
  rwsem: implement support for write lock stealing on the fastpath
  rwsem: simplify __rwsem_do_wake
  rwsem: skip initial trylock in rwsem_down_write_failed
  rwsem: avoid taking wait_lock in rwsem_down_write_failed
  rwsem: use cmpxchg for trying to steal write lock
  rwsem: more agressive lock stealing in rwsem_down_write_failed
  rwsem: simplify rwsem_down_write_failed
  rwsem: simplify rwsem_down_read_failed
  rwsem: move rwsem_down_failed_common code into rwsem_down_{read,write}_failed
  rwsem: shorter spinlocked section in rwsem_down_failed_common()
  rwsem: make the waiter type an enumeration rather than a bitmask
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: no need for explicit signed longs</title>
<updated>2013-05-07T14:20:17+00:00</updated>
<author>
<name>Davidlohr Bueso</name>
<email>davidlohr.bueso@hp.com</email>
</author>
<published>2013-05-07T13:46:02+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=b5f541810ea9fb98d93c0ee0e00e07a22874856f'/>
<id>b5f541810ea9fb98d93c0ee0e00e07a22874856f</id>
<content type='text'>
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Change explicit "signed long" declarations into plain "long" as suggested
by Peter Hurley.

Signed-off-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Reviewed-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: do not block readers at head of queue if other readers are active</title>
<updated>2013-05-07T14:20:17+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2013-05-07T13:46:00+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=25c39325968bbcebe6cd2a1991228c9dfb48d655'/>
<id>25c39325968bbcebe6cd2a1991228c9dfb48d655</id>
<content type='text'>
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.

In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.

Thanks to Peter Hurley for noticing this possible race.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This change fixes a race condition where a reader might determine it
needs to block, but by the time it acquires the wait_lock the rwsem has
active readers and no queued waiters.

In this situation the reader can run in parallel with the existing
active readers; it does not need to block until the active readers
complete.

Thanks to Peter Hurley for noticing this possible race.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: implement support for write lock stealing on the fastpath</title>
<updated>2013-05-07T14:20:16+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2013-05-07T13:45:59+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=fe6e674c6187d4f452a679ced7e95262bd517936'/>
<id>fe6e674c6187d4f452a679ced7e95262bd517936</id>
<content type='text'>
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers.  But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue.  This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting.  To that end, we grant the first reader lock
before counting how many more readers are queued.

We also require some adjustments to the wake_type semantics.

RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock.  This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.

Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available.  We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers.  So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When we decide to wake up readers, we must first grant them as many read
locks as necessary, and then actually wake up all these readers.  But in
order to know how many read shares to grant, we must first count the
readers at the head of the queue.  This might take a while if there are
many readers, and we want to be protected against a writer stealing the
lock while we're counting.  To that end, we grant the first reader lock
before counting how many more readers are queued.

We also require some adjustments to the wake_type semantics.

RWSEM_WAKE_NO_ACTIVE used to mean that we had found the count to be
RWSEM_WAITING_BIAS, in which case the rwsem was known to be free as
nobody could steal it while we hold the wait_lock.  This doesn't make
sense once we implement fastpath write lock stealing, so we now use
RWSEM_WAKE_ANY in that case.

Similarly, when rwsem_down_write_failed found that a read lock was
active, it would use RWSEM_WAKE_READ_OWNED which signalled that new
readers could be woken without checking first that the rwsem was
available.  We can't do that anymore since the existing readers might
release their read locks, and a writer could steal the lock before we
wake up additional readers.  So, we have to use a new RWSEM_WAKE_READERS
value to indicate we only want to wake readers, but we don't currently
hold any read lock.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: simplify __rwsem_do_wake</title>
<updated>2013-05-07T14:20:16+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2013-05-07T13:45:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=8cf5322ce69afea1fab6a6270db24d057d664798'/>
<id>8cf5322ce69afea1fab6a6270db24d057d664798</id>
<content type='text'>
This is mostly for cleanup value:

- We don't need several gotos to handle the case where the first
  waiter is a writer. Two simple tests will do (and generate very
  similar code).

- In the remainder of the function, we know the first waiter is a reader,
  so we don't have to double check that. We can use do..while loops
  to iterate over the readers to wake (generates slightly better code).

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is mostly for cleanup value:

- We don't need several gotos to handle the case where the first
  waiter is a writer. Two simple tests will do (and generate very
  similar code).

- In the remainder of the function, we know the first waiter is a reader,
  so we don't have to double check that. We can use do..while loops
  to iterate over the readers to wake (generates slightly better code).

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: skip initial trylock in rwsem_down_write_failed</title>
<updated>2013-05-07T14:20:16+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2013-05-07T13:45:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9b0fc9c09f1b262b7fe697eba6b05095d78850e5'/>
<id>9b0fc9c09f1b262b7fe697eba6b05095d78850e5</id>
<content type='text'>
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We can skip the initial trylock in rwsem_down_write_failed() if there
are known active lockers already, thus saving one likely-to-fail
cmpxchg.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>rwsem: avoid taking wait_lock in rwsem_down_write_failed</title>
<updated>2013-05-07T14:20:16+00:00</updated>
<author>
<name>Michel Lespinasse</name>
<email>walken@google.com</email>
</author>
<published>2013-05-07T13:45:56+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a7d2c573ae7fad1b2c877d1a1342fa5bb0d6478c'/>
<id>a7d2c573ae7fad1b2c877d1a1342fa5bb0d6478c</id>
<content type='text'>
In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e.  the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In rwsem_down_write_failed(), if there are active locks after we wake up
(i.e.  the lock got stolen from us), skip taking the wait_lock and go
back to sleep immediately.

Signed-off-by: Michel Lespinasse &lt;walken@google.com&gt;
Reviewed-by: Peter Hurley &lt;peter@hurleysoftware.com&gt;
Acked-by: Davidlohr Bueso &lt;davidlohr.bueso@hp.com&gt;
Acked-by: Rik van Riel &lt;riel@redhat.com&gt;
Signed-off-by: Linus Torvalds &lt;torvalds@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
</feed>
