<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/fs/btrfs/compression.c, branch v6.1</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>fs: fix leaked psi pressure state</title>
<updated>2022-11-08T23:57:25+00:00</updated>
<author>
<name>Johannes Weiner</name>
<email>hannes@cmpxchg.org</email>
</author>
<published>2022-11-03T21:34:31+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=82e60d00b753bb5cfecce22b8e952436b14d02a3'/>
<id>82e60d00b753bb5cfecce22b8e952436b14d02a3</id>
<content type='text'>
When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
faulty.  A pressure state, once entered, is never left.  This results in
incorrectly elevated pressure, which triggers OOM kills.

pflags record the *previous* memstall state when we enter a new one.  The
code tried to initialize pflags to 1, and then optimize the leave call
when we either didn't enter a memstall, or were already inside a nested
stall.  However, there can be multiple PageWorkingset pages in the bio, at
which point it's that path itself that enters repeatedly and overwrites
pflags.  This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

erofs has the same problem, fix that up too.  And move the memstall exit
past submit_bio() to restore submit accounting originally added by
b8e24a9300b0 ("block: annotate refault stalls from IO submission").

Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@cmpxchg.org
Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Thorsten Leemhuis &lt;linux@leemhuis.info&gt;
Tested-by: Thorsten Leemhuis &lt;linux@leemhuis.info&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Gao Xiang &lt;xiang@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
When psi annotations were added to to btrfs compression reads, the psi
state tracking over add_ra_bio_pages and btrfs_submit_compressed_read was
faulty.  A pressure state, once entered, is never left.  This results in
incorrectly elevated pressure, which triggers OOM kills.

pflags record the *previous* memstall state when we enter a new one.  The
code tried to initialize pflags to 1, and then optimize the leave call
when we either didn't enter a memstall, or were already inside a nested
stall.  However, there can be multiple PageWorkingset pages in the bio, at
which point it's that path itself that enters repeatedly and overwrites
pflags.  This causes us to miss the exit.

Enter the stall only once if needed, then unwind correctly.

erofs has the same problem, fix that up too.  And move the memstall exit
past submit_bio() to restore submit accounting originally added by
b8e24a9300b0 ("block: annotate refault stalls from IO submission").

Link: https://lkml.kernel.org/r/Y2UHRqthNUwuIQGS@cmpxchg.org
Fixes: 4088a47e78f9 ("btrfs: add manual PSI accounting for compressed reads")
Fixes: 99486c511f68 ("erofs: add manual PSI accounting for the compressed address space")
Fixes: 118f3663fbc6 ("block: remove PSI accounting from the bio layer")
Link: https://lore.kernel.org/r/d20a0a85-e415-cf78-27f9-77dd7a94bc8d@leemhuis.info/
Signed-off-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Reported-by: Thorsten Leemhuis &lt;linux@leemhuis.info&gt;
Tested-by: Thorsten Leemhuis &lt;linux@leemhuis.info&gt;
Cc: Chao Yu &lt;chao@kernel.org&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: Christoph Hellwig &lt;hch@lst.de&gt;
Cc: David Sterba &lt;dsterba@suse.com&gt;
Cc: Gao Xiang &lt;xiang@kernel.org&gt;
Cc: Jens Axboe &lt;axboe@kernel.dk&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Suren Baghdasaryan &lt;surenb@google.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm</title>
<updated>2022-10-11T00:53:04+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-11T00:53:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=27bc50fc90647bbf7b734c3fc306a5e61350da53'/>
<id>27bc50fc90647bbf7b734c3fc306a5e61350da53</id>
<content type='text'>
Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP &amp; KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock-&gt;vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull MM updates from Andrew Morton:

 - Yu Zhao's Multi-Gen LRU patches are here. They've been under test in
   linux-next for a couple of months without, to my knowledge, any
   negative reports (or any positive ones, come to that).

 - Also the Maple Tree from Liam Howlett. An overlapping range-based
   tree for vmas. It it apparently slightly more efficient in its own
   right, but is mainly targeted at enabling work to reduce mmap_lock
   contention.

   Liam has identified a number of other tree users in the kernel which
   could be beneficially onverted to mapletrees.

   Yu Zhao has identified a hard-to-hit but "easy to fix" lockdep splat
   at [1]. This has yet to be addressed due to Liam's unfortunately
   timed vacation. He is now back and we'll get this fixed up.

 - Dmitry Vyukov introduces KMSAN: the Kernel Memory Sanitizer. It uses
   clang-generated instrumentation to detect used-unintialized bugs down
   to the single bit level.

   KMSAN keeps finding bugs. New ones, as well as the legacy ones.

 - Yang Shi adds a userspace mechanism (madvise) to induce a collapse of
   memory into THPs.

 - Zach O'Keefe has expanded Yang Shi's madvise(MADV_COLLAPSE) to
   support file/shmem-backed pages.

 - userfaultfd updates from Axel Rasmussen

 - zsmalloc cleanups from Alexey Romanov

 - cleanups from Miaohe Lin: vmscan, hugetlb_cgroup, hugetlb and
   memory-failure

 - Huang Ying adds enhancements to NUMA balancing memory tiering mode's
   page promotion, with a new way of detecting hot pages.

 - memcg updates from Shakeel Butt: charging optimizations and reduced
   memory consumption.

 - memcg cleanups from Kairui Song.

 - memcg fixes and cleanups from Johannes Weiner.

 - Vishal Moola provides more folio conversions

 - Zhang Yi removed ll_rw_block() :(

 - migration enhancements from Peter Xu

 - migration error-path bugfixes from Huang Ying

 - Aneesh Kumar added ability for a device driver to alter the memory
   tiering promotion paths. For optimizations by PMEM drivers, DRM
   drivers, etc.

 - vma merging improvements from Jakub Matěn.

 - NUMA hinting cleanups from David Hildenbrand.

 - xu xin added aditional userspace visibility into KSM merging
   activity.

 - THP &amp; KSM code consolidation from Qi Zheng.

 - more folio work from Matthew Wilcox.

 - KASAN updates from Andrey Konovalov.

 - DAMON cleanups from Kaixu Xia.

 - DAMON work from SeongJae Park: fixes, cleanups.

 - hugetlb sysfs cleanups from Muchun Song.

 - Mike Kravetz fixes locking issues in hugetlbfs and in hugetlb core.

Link: https://lkml.kernel.org/r/CAOUHufZabH85CeUN-MEMgL8gJGzJEWUrkiM58JkTbBhh-jew0Q@mail.gmail.com [1]

* tag 'mm-stable-2022-10-08' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (555 commits)
  hugetlb: allocate vma lock for all sharable vmas
  hugetlb: take hugetlb vma_lock when clearing vma_lock-&gt;vma pointer
  hugetlb: fix vma lock handling during split vma and range unmapping
  mglru: mm/vmscan.c: fix imprecise comments
  mm/mglru: don't sync disk for each aging cycle
  mm: memcontrol: drop dead CONFIG_MEMCG_SWAP config symbol
  mm: memcontrol: use do_memsw_account() in a few more places
  mm: memcontrol: deprecate swapaccounting=0 mode
  mm: memcontrol: don't allocate cgroup swap arrays when memcg is disabled
  mm/secretmem: remove reduntant return value
  mm/hugetlb: add available_huge_pages() func
  mm: remove unused inline functions from include/linux/mm_inline.h
  selftests/vm: add selftest for MADV_COLLAPSE of uffd-minor memory
  selftests/vm: add file/shmem MADV_COLLAPSE selftest for cleared pmd
  selftests/vm: add thp collapse shmem testing
  selftests/vm: add thp collapse file and tmpfs testing
  selftests/vm: modularize thp collapse memory operations
  selftests/vm: dedup THP helpers
  mm/khugepaged: add tracepoint to hpage_collapse_scan_file()
  mm/madvise: add file and shmem support to MADV_COLLAPSE
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux</title>
<updated>2022-10-07T16:19:14+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-10-07T16:19:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=513389809e138ae903b6ef43c1d5d2ffaf4dca17'/>
<id>513389809e138ae903b6ef43c1d5d2ffaf4dca17</id>
<content type='text'>
Pull block updates from Jens Axboe:

 - NVMe pull requests via Christoph:
      - handle number of queue changes in the TCP and RDMA drivers
        (Daniel Wagner)
      - allow changing the number of queues in nvmet (Daniel Wagner)
      - also consider host_iface when checking ip options (Daniel
        Wagner)
      - don't map pages which can't come from HIGHMEM (Fabio M. De
        Francesco)
      - avoid unnecessary flush bios in nvmet (Guixin Liu)
      - shrink and better pack the nvme_iod structure (Keith Busch)
      - add comment for unaligned "fake" nqn (Linjun Bao)
      - print actual source IP address through sysfs "address" attr
        (Martin Belanger)
      - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang)
      - handle effects after freeing the request (Keith Busch)
      - copy firmware_rev on each init (Keith Busch)
      - restrict management ioctls to admin (Keith Busch)
      - ensure subsystem reset is single threaded (Keith Busch)
      - report the actual number of tagset maps in nvme-pci (Keith
        Busch)
      - small fabrics authentication fixups (Christoph Hellwig)
      - add common code for tagset allocation and freeing (Christoph
        Hellwig)
      - stop using the request_queue in nvmet (Christoph Hellwig)
      - set min_align_mask before calculating max_hw_sectors (Rishabh
        Bhatnagar)
      - send a rediscover uevent when a persistent discovery controller
        reconnects (Sagi Grimberg)
      - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi)

 - MD pull request via Song:
      - Various raid5 fix and clean up, by Logan Gunthorpe and David
        Sloan.
      - Raid10 performance optimization, by Yu Kuai.

 - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu)

 - IO scheduler switching quisce fix (Keith)

 - s390/dasd block driver updates (Stefan)

 - support for recovery for the ublk driver (ZiyangZhang)

 - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph)

 - blk-mq and null_blk map fixes (Bart)

 - various bcache fixes (Coly, Jilin, Jules)

 - nbd signal hang fix (Shigeru)

 - block writeback throttling fix (Yu)

 - optimize the passthrough mapping handling (me)

 - prepare block cgroups to being gendisk based (Christoph)

 - get rid of an old PSI hack in the block layer, moving it to the
   callers instead where it belongs (Christoph)

 - blk-throttle fixes and cleanups (Yu)

 - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj,
   Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming,
   Miaohe, Bart, Coly, Gaosheng

* tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits)
  sbitmap: fix lockup while swapping
  block: add rationale for not using blk_mq_plug() when applicable
  block: adapt blk_mq_plug() to not plug for writes that require a zone lock
  s390/dasd: use blk_mq_alloc_disk
  blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep
  nvmet: don't look at the request_queue in nvmet_bdev_set_limits
  nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all
  blk-mq: use quiesced elevator switch when reinitializing queues
  block: replace blk_queue_nowait with bdev_nowait
  nvme: remove nvme_ctrl_init_connect_q
  nvme-loop: use the tagset alloc/free helpers
  nvme-loop: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-loop: initialize sqsize later
  nvme-fc: use the tagset alloc/free helpers
  nvme-fc: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-fc: keep ctrl-&gt;sqsize in sync with opts-&gt;queue_size
  nvme-rdma: use the tagset alloc/free helpers
  nvme-rdma: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-tcp: use the tagset alloc/free helpers
  nvme-tcp: store the generic nvme_ctrl in set-&gt;driver_data
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull block updates from Jens Axboe:

 - NVMe pull requests via Christoph:
      - handle number of queue changes in the TCP and RDMA drivers
        (Daniel Wagner)
      - allow changing the number of queues in nvmet (Daniel Wagner)
      - also consider host_iface when checking ip options (Daniel
        Wagner)
      - don't map pages which can't come from HIGHMEM (Fabio M. De
        Francesco)
      - avoid unnecessary flush bios in nvmet (Guixin Liu)
      - shrink and better pack the nvme_iod structure (Keith Busch)
      - add comment for unaligned "fake" nqn (Linjun Bao)
      - print actual source IP address through sysfs "address" attr
        (Martin Belanger)
      - various cleanups (Jackie Liu, Wolfram Sang, Genjian Zhang)
      - handle effects after freeing the request (Keith Busch)
      - copy firmware_rev on each init (Keith Busch)
      - restrict management ioctls to admin (Keith Busch)
      - ensure subsystem reset is single threaded (Keith Busch)
      - report the actual number of tagset maps in nvme-pci (Keith
        Busch)
      - small fabrics authentication fixups (Christoph Hellwig)
      - add common code for tagset allocation and freeing (Christoph
        Hellwig)
      - stop using the request_queue in nvmet (Christoph Hellwig)
      - set min_align_mask before calculating max_hw_sectors (Rishabh
        Bhatnagar)
      - send a rediscover uevent when a persistent discovery controller
        reconnects (Sagi Grimberg)
      - misc nvmet-tcp fixes (Varun Prakash, zhenwei pi)

 - MD pull request via Song:
      - Various raid5 fix and clean up, by Logan Gunthorpe and David
        Sloan.
      - Raid10 performance optimization, by Yu Kuai.

 - sbitmap wakeup hang fixes (Hugh, Keith, Jan, Yu)

 - IO scheduler switching quisce fix (Keith)

 - s390/dasd block driver updates (Stefan)

 - support for recovery for the ublk driver (ZiyangZhang)

 - rnbd drivers fixes and updates (Guoqing, Santosh, ye, Christoph)

 - blk-mq and null_blk map fixes (Bart)

 - various bcache fixes (Coly, Jilin, Jules)

 - nbd signal hang fix (Shigeru)

 - block writeback throttling fix (Yu)

 - optimize the passthrough mapping handling (me)

 - prepare block cgroups to being gendisk based (Christoph)

 - get rid of an old PSI hack in the block layer, moving it to the
   callers instead where it belongs (Christoph)

 - blk-throttle fixes and cleanups (Yu)

 - misc fixes and cleanups (Liu Shixin, Liu Song, Miaohe, Pankaj,
   Ping-Xiang, Wolfram, Saurabh, Li Jinlin, Li Lei, Lin, Li zeming,
   Miaohe, Bart, Coly, Gaosheng

* tag 'for-6.1/block-2022-10-03' of git://git.kernel.dk/linux: (162 commits)
  sbitmap: fix lockup while swapping
  block: add rationale for not using blk_mq_plug() when applicable
  block: adapt blk_mq_plug() to not plug for writes that require a zone lock
  s390/dasd: use blk_mq_alloc_disk
  blk-cgroup: don't update the blkg lookup hint in blkg_conf_prep
  nvmet: don't look at the request_queue in nvmet_bdev_set_limits
  nvmet: don't look at the request_queue in nvmet_bdev_zone_mgmt_emulate_all
  blk-mq: use quiesced elevator switch when reinitializing queues
  block: replace blk_queue_nowait with bdev_nowait
  nvme: remove nvme_ctrl_init_connect_q
  nvme-loop: use the tagset alloc/free helpers
  nvme-loop: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-loop: initialize sqsize later
  nvme-fc: use the tagset alloc/free helpers
  nvme-fc: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-fc: keep ctrl-&gt;sqsize in sync with opts-&gt;queue_size
  nvme-rdma: use the tagset alloc/free helpers
  nvme-rdma: store the generic nvme_ctrl in set-&gt;driver_data
  nvme-tcp: use the tagset alloc/free helpers
  nvme-tcp: store the generic nvme_ctrl in set-&gt;driver_data
  ...
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: unify the lock/unlock extent variants</title>
<updated>2022-09-26T10:28:05+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2022-09-09T21:53:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=570eb97bace8743f45c6830b64c7a0889d0915dd'/>
<id>570eb97bace8743f45c6830b64c7a0889d0915dd</id>
<content type='text'>
We have two variants of lock/unlock extent, one set that takes a cached
state, another that does not.  This is slightly annoying, and generally
speaking there are only a few places where we don't have a cached state.
Simplify this by making lock_extent/unlock_extent the only variant and
make it take a cached state, then convert all the callers appropriately.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We have two variants of lock/unlock extent, one set that takes a cached
state, another that does not.  This is slightly annoying, and generally
speaking there are only a few places where we don't have a cached state.
Simplify this by making lock_extent/unlock_extent the only variant and
make it take a cached state, then convert all the callers appropriately.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: rename clean_io_failure and remove extraneous args</title>
<updated>2022-09-26T10:28:02+00:00</updated>
<author>
<name>Josef Bacik</name>
<email>josef@toxicpanda.com</email>
</author>
<published>2022-09-09T21:53:14+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0d0a762c419a98dcf40db20b7f2681d7b7191d46'/>
<id>0d0a762c419a98dcf40db20b7f2681d7b7191d46</id>
<content type='text'>
This is exported, so rename it to btrfs_clean_io_failure.  Additionally
we are passing in the io tree's and such from the inode, so instead of
doing all that simply pass in the inode itself and get all the
components we need directly inside of btrfs_clean_io_failure.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
This is exported, so rename it to btrfs_clean_io_failure.  Additionally
we are passing in the io tree's and such from the inode, so instead of
doing all that simply pass in the inode itself and get all the
components we need directly inside of btrfs_clean_io_failure.

Signed-off-by: Josef Bacik &lt;josef@toxicpanda.com&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: give struct btrfs_bio a real end_io handler</title>
<updated>2022-09-26T10:27:59+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-08-06T08:03:26+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=917f32a235017055104332487cb749c5f119c2ae'/>
<id>917f32a235017055104332487cb749c5f119c2ae</id>
<content type='text'>
Currently btrfs_bio end I/O handling is a bit of a mess.  The bi_end_io
handler and bi_private pointer of the embedded struct bio are both used
to handle the completion of the high-level btrfs_bio and for the I/O
completion for the low-level device that the embedded bio ends up being
sent to.

To support this bi_end_io and bi_private are saved into the
btrfs_io_context structure and then restored after the bio sent to the
underlying device has completed the actual I/O.

Untangle this by adding an end I/O handler and private data to struct
btrfs_bio for the high-level btrfs_bio based completions, and leave the
actual bio bi_end_io handler and bi_private pointer entirely to the
low-level device I/O.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Tested-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Tested-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Currently btrfs_bio end I/O handling is a bit of a mess.  The bi_end_io
handler and bi_private pointer of the embedded struct bio are both used
to handle the completion of the high-level btrfs_bio and for the I/O
completion for the low-level device that the embedded bio ends up being
sent to.

To support this bi_end_io and bi_private are saved into the
btrfs_io_context structure and then restored after the bio sent to the
underlying device has completed the actual I/O.

Untangle this by adding an end I/O handler and private data to struct
btrfs_bio for the high-level btrfs_bio based completions, and leave the
actual bio bi_end_io handler and bi_private pointer entirely to the
low-level device I/O.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Tested-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Tested-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: pass the operation to btrfs_bio_alloc</title>
<updated>2022-09-26T10:27:58+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-08-06T08:03:22+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6b42f5e3439d64a725e022ef1711dd00981e68a9'/>
<id>6b42f5e3439d64a725e022ef1711dd00981e68a9</id>
<content type='text'>
Pass the operation to btrfs_bio_alloc, matching what bio_alloc_bioset
set does.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Tested-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Tested-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pass the operation to btrfs_bio_alloc, matching what bio_alloc_bioset
set does.

Reviewed-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Reviewed-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Reviewed-by: Anand Jain &lt;anand.jain@oracle.com&gt;
Tested-by: Nikolay Borisov &lt;nborisov@suse.com&gt;
Tested-by: Johannes Thumshirn &lt;johannes.thumshirn@wdc.com&gt;
Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Reviewed-by: David Sterba &lt;dsterba@suse.com&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: add manual PSI accounting for compressed reads</title>
<updated>2022-09-20T14:24:38+00:00</updated>
<author>
<name>Christoph Hellwig</name>
<email>hch@lst.de</email>
</author>
<published>2022-09-15T09:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=4088a47e78f95a5fea683cf67e0be006b13831fd'/>
<id>4088a47e78f95a5fea683cf67e0be006b13831fd</id>
<content type='text'>
btrfs compressed reads try to always read the entire compressed chunk,
even if only a subset is requested.  Currently this is covered by the
magic PSI accounting underneath submit_bio, but that is about to go
away. Instead add manual psi_memstall_{enter,leave} annotations.

Note that for readahead this really should be using readahead_expand,
but the additionals reads are also done for plain -&gt;read_folio where
readahead_expand can't work, so this overall logic is left as-is for
now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: David Sterba &lt;dsterba@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/r/20220915094200.139713-4-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
btrfs compressed reads try to always read the entire compressed chunk,
even if only a subset is requested.  Currently this is covered by the
magic PSI accounting underneath submit_bio, but that is about to go
away. Instead add manual psi_memstall_{enter,leave} annotations.

Note that for readahead this really should be using readahead_expand,
but the additionals reads are also done for plain -&gt;read_folio where
readahead_expand can't work, so this overall logic is left as-is for
now.

Signed-off-by: Christoph Hellwig &lt;hch@lst.de&gt;
Acked-by: David Sterba &lt;dsterba@suse.com&gt;
Acked-by: Johannes Weiner &lt;hannes@cmpxchg.org&gt;
Link: https://lore.kernel.org/r/20220915094200.139713-4-hch@lst.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>btrfs: convert end_compressed_writeback() to use filemap_get_folios()</title>
<updated>2022-09-12T03:26:03+00:00</updated>
<author>
<name>Vishal Moola (Oracle)</name>
<email>vishal.moola@gmail.com</email>
</author>
<published>2022-08-24T00:40:19+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=a75b81c3f63b3f467d5f54942c661bb79fe07505'/>
<id>a75b81c3f63b3f467d5f54942c661bb79fe07505</id>
<content type='text'>
Converted function to use folios throughout.  This is in preparation for
the removal of find_get_pages_contig().  Now also supports large folios.

Since we may receive more than nr_pages pages, nr_pages may underflow. 
Since nr_pages &gt; 0 is equivalent to index &lt;= end_index, we replaced it
with this check instead.

Also this function does not care about the pages being contiguous so we
can just use filemap_get_folios() to be more efficient.

Link: https://lkml.kernel.org/r/20220824004023.77310-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) &lt;vishal.moola@gmail.com&gt;
Acked-by: David Sterba &lt;dsterba@suse.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: David Sterba &lt;dsterb@suse.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Converted function to use folios throughout.  This is in preparation for
the removal of find_get_pages_contig().  Now also supports large folios.

Since we may receive more than nr_pages pages, nr_pages may underflow. 
Since nr_pages &gt; 0 is equivalent to index &lt;= end_index, we replaced it
with this check instead.

Also this function does not care about the pages being contiguous so we
can just use filemap_get_folios() to be more efficient.

Link: https://lkml.kernel.org/r/20220824004023.77310-4-vishal.moola@gmail.com
Signed-off-by: Vishal Moola (Oracle) &lt;vishal.moola@gmail.com&gt;
Acked-by: David Sterba &lt;dsterba@suse.com&gt;
Cc: Al Viro &lt;viro@zeniv.linux.org.uk&gt;
Cc: Chris Mason &lt;clm@fb.com&gt;
Cc: David Sterba &lt;dsterb@suse.com&gt;
Cc: Josef Bacik &lt;josef@toxicpanda.com&gt;
Cc: Matthew Wilcox &lt;willy@infradead.org&gt;
Cc: Ryusuke Konishi &lt;konishi.ryusuke@gmail.com&gt;
Signed-off-by: Andrew Morton &lt;akpm@linux-foundation.org&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>Merge tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2022-08-03T21:54:52+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2022-08-03T21:54:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=353767e4aaeb7bc818273dfacbb01dd36a9db47a'/>
<id>353767e4aaeb7bc818273dfacbb01dd36a9db47a</id>
<content type='text'>
Pull btrfs updates from David Sterba:
 "This brings some long awaited changes, the send protocol bump,
  otherwise lots of small improvements and fixes. The main core part is
  reworking bio handling, cleaning up the submission and endio and
  improving error handling.

  There are some changes outside of btrfs adding helpers or updating
  API, listed at the end of the changelog.

  Features:

   - sysfs:
      - export chunk size, in debug mode add tunable for setting its size
      - show zoned among features (was only in debug mode)
      - show commit stats (number, last/max/total duration)

   - send protocol updated to 2
      - new commands:
         - ability write larger data chunks than 64K
         - send raw compressed extents (uses the encoded data ioctls),
           ie. no decompression on send side, no compression needed on
           receive side if supported
         - send 'otime' (inode creation time) among other timestamps
         - send file attributes (a.k.a file flags and xflags)
      - this is first version bump, backward compatibility on send and
        receive side is provided
      - there are still some known and wanted commands that will be
        implemented in the near future, another version bump will be
        needed, however we want to minimize that to avoid causing
        usability issues

   - print checksum type and implementation at mount time

   - don't print some messages at mount (mentioned as people asked about
     it), we want to print messages namely for new features so let's
     make some space for that
      - big metadata - this has been supported for a long time and is
        not a feature that's worth mentioning
      - skinny metadata - same reason, set by default by mkfs

  Performance improvements:

   - reduced amount of reserved metadata for delayed items
      - when inserted items can be batched into one leaf
      - when deleting batched directory index items
      - when deleting delayed items used for deletion
      - overall improved count of files/sec, decreased subvolume lock
        contention

   - metadata item access bounds checker micro-optimized, with a few
     percent of improved runtime for metadata-heavy operations

   - increase direct io limit for read to 256 sectors, improved
     throughput by 3x on sample workload

  Notable fixes:

   - raid56
      - reduce parity writes, skip sectors of stripe when there are no
        data updates
      - restore reading from on-disk data instead of using stripe cache,
        this reduces chances to damage correct data due to RMW cycle

   - refuse to replay log with unknown incompat read-only feature bit
     set

   - zoned
      - fix page locking when COW fails in the middle of allocation
      - improved tracking of active zones, ZNS drives may limit the
        number and there are ENOSPC errors due to that limit and not
        actual lack of space
      - adjust maximum extent size for zone append so it does not cause
        late ENOSPC due to underreservation

   - mirror reading error messages show the mirror number

   - don't fallback to buffered IO for NOWAIT direct IO writes, we don't
     have the NOWAIT semantics for buffered io yet

   - send, fix sending link commands for existing file paths when there
     are deleted and created hardlinks for same files

   - repair all mirrors for profiles with more than 1 copy (raid1c34)

   - fix repair of compressed extents, unify where error detection and
     repair happen

  Core changes:

   - bio completion cleanups
      - don't double defer compression bios
      - simplify endio workqueues
      - add more data to btrfs_bio to avoid allocation for read requests
      - rework bio error handling so it's same what block layer does,
        the submission works and errors are consumed in endio
      - when asynchronous bio offload fails fall back to synchronous
        checksum calculation to avoid errors under writeback or memory
        pressure

   - new trace points
      - raid56 events
      - ordered extent operations

   - super block log_root_transid deprecated (never used)

   - mixed_backref and big_metadata sysfs feature files removed, they've
     been default for sufficiently long time, there are no known users
     and mixed_backref could be confused with mixed_groups

  Non-btrfs changes, API updates:

   - minor highmem API update to cover const arguments

   - switch all kmap/kmap_atomic to kmap_local

   - remove redundant flush_dcache_page()

   - address_space_operations::writepage callback removed

   - add bdev_max_segments() helper"

* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
  btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
  btrfs: fix repair of compressed extents
  btrfs: remove the start argument to check_data_csum and export
  btrfs: pass a btrfs_bio to btrfs_repair_one_sector
  btrfs: simplify the pending I/O counting in struct compressed_bio
  btrfs: repair all known bad mirrors
  btrfs: merge btrfs_dev_stat_print_on_error with its only caller
  btrfs: join running log transaction when logging new name
  btrfs: simplify error handling in btrfs_lookup_dentry
  btrfs: send: always use the rbtree based inode ref management infrastructure
  btrfs: send: fix sending link commands for existing file paths
  btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
  btrfs: zoned: wait until zone is finished when allocation didn't progress
  btrfs: zoned: write out partially allocated region
  btrfs: zoned: activate necessary block group
  btrfs: zoned: activate metadata block group on flush_space
  btrfs: zoned: disable metadata overcommit for zoned
  btrfs: zoned: introduce space_info-&gt;active_total_bytes
  btrfs: zoned: finish least available block group on data bg allocation
  btrfs: let can_allocate_chunk return error
  ...
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs updates from David Sterba:
 "This brings some long awaited changes, the send protocol bump,
  otherwise lots of small improvements and fixes. The main core part is
  reworking bio handling, cleaning up the submission and endio and
  improving error handling.

  There are some changes outside of btrfs adding helpers or updating
  API, listed at the end of the changelog.

  Features:

   - sysfs:
      - export chunk size, in debug mode add tunable for setting its size
      - show zoned among features (was only in debug mode)
      - show commit stats (number, last/max/total duration)

   - send protocol updated to 2
      - new commands:
         - ability write larger data chunks than 64K
         - send raw compressed extents (uses the encoded data ioctls),
           ie. no decompression on send side, no compression needed on
           receive side if supported
         - send 'otime' (inode creation time) among other timestamps
         - send file attributes (a.k.a file flags and xflags)
      - this is first version bump, backward compatibility on send and
        receive side is provided
      - there are still some known and wanted commands that will be
        implemented in the near future, another version bump will be
        needed, however we want to minimize that to avoid causing
        usability issues

   - print checksum type and implementation at mount time

   - don't print some messages at mount (mentioned as people asked about
     it), we want to print messages namely for new features so let's
     make some space for that
      - big metadata - this has been supported for a long time and is
        not a feature that's worth mentioning
      - skinny metadata - same reason, set by default by mkfs

  Performance improvements:

   - reduced amount of reserved metadata for delayed items
      - when inserted items can be batched into one leaf
      - when deleting batched directory index items
      - when deleting delayed items used for deletion
      - overall improved count of files/sec, decreased subvolume lock
        contention

   - metadata item access bounds checker micro-optimized, with a few
     percent of improved runtime for metadata-heavy operations

   - increase direct io limit for read to 256 sectors, improved
     throughput by 3x on sample workload

  Notable fixes:

   - raid56
      - reduce parity writes, skip sectors of stripe when there are no
        data updates
      - restore reading from on-disk data instead of using stripe cache,
        this reduces chances to damage correct data due to RMW cycle

   - refuse to replay log with unknown incompat read-only feature bit
     set

   - zoned
      - fix page locking when COW fails in the middle of allocation
      - improved tracking of active zones, ZNS drives may limit the
        number and there are ENOSPC errors due to that limit and not
        actual lack of space
      - adjust maximum extent size for zone append so it does not cause
        late ENOSPC due to underreservation

   - mirror reading error messages show the mirror number

   - don't fallback to buffered IO for NOWAIT direct IO writes, we don't
     have the NOWAIT semantics for buffered io yet

   - send, fix sending link commands for existing file paths when there
     are deleted and created hardlinks for same files

   - repair all mirrors for profiles with more than 1 copy (raid1c34)

   - fix repair of compressed extents, unify where error detection and
     repair happen

  Core changes:

   - bio completion cleanups
      - don't double defer compression bios
      - simplify endio workqueues
      - add more data to btrfs_bio to avoid allocation for read requests
      - rework bio error handling so it's same what block layer does,
        the submission works and errors are consumed in endio
      - when asynchronous bio offload fails fall back to synchronous
        checksum calculation to avoid errors under writeback or memory
        pressure

   - new trace points
      - raid56 events
      - ordered extent operations

   - super block log_root_transid deprecated (never used)

   - mixed_backref and big_metadata sysfs feature files removed, they've
     been default for sufficiently long time, there are no known users
     and mixed_backref could be confused with mixed_groups

  Non-btrfs changes, API updates:

   - minor highmem API update to cover const arguments

   - switch all kmap/kmap_atomic to kmap_local

   - remove redundant flush_dcache_page()

   - address_space_operations::writepage callback removed

   - add bdev_max_segments() helper"

* tag 'for-5.20-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: (163 commits)
  btrfs: don't call btrfs_page_set_checked in finish_compressed_bio_read
  btrfs: fix repair of compressed extents
  btrfs: remove the start argument to check_data_csum and export
  btrfs: pass a btrfs_bio to btrfs_repair_one_sector
  btrfs: simplify the pending I/O counting in struct compressed_bio
  btrfs: repair all known bad mirrors
  btrfs: merge btrfs_dev_stat_print_on_error with its only caller
  btrfs: join running log transaction when logging new name
  btrfs: simplify error handling in btrfs_lookup_dentry
  btrfs: send: always use the rbtree based inode ref management infrastructure
  btrfs: send: fix sending link commands for existing file paths
  btrfs: send: introduce recorded_ref_alloc and recorded_ref_free
  btrfs: zoned: wait until zone is finished when allocation didn't progress
  btrfs: zoned: write out partially allocated region
  btrfs: zoned: activate necessary block group
  btrfs: zoned: activate metadata block group on flush_space
  btrfs: zoned: disable metadata overcommit for zoned
  btrfs: zoned: introduce space_info-&gt;active_total_bytes
  btrfs: zoned: finish least available block group on data bg allocation
  btrfs: let can_allocate_chunk return error
  ...
</pre>
</div>
</content>
</entry>
</feed>
