<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/io_uring/opdef.c, branch v6.14</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>Merge tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux</title>
<updated>2025-01-09T18:16:45+00:00</updated>
<author>
<name>Linus Torvalds</name>
<email>torvalds@linux-foundation.org</email>
</author>
<published>2025-01-09T18:16:45+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=643e2e259c2b25a2af0ae4c23c6e16586d9fd19c'/>
<id>643e2e259c2b25a2af0ae4c23c6e16586d9fd19c</id>
<content type='text'>
Pull btrfs fixes from David Sterba:
 "A few more fixes.

  Besides the one-liners in Btrfs there's fix to the io_uring and
  encoded read integration (added in this development cycle). The update
  to io_uring provides more space for the ongoing command that is then
  used in Btrfs to handle some cases.

   - io_uring and encoded read:
       - provide stable storage for io_uring command data
       - make a copy of encoded read ioctl call, reuse that in case the
         call would block and will be called again

   - properly initialize zlib context for hardware compression on s390

   - fix max extent size calculation on filesystems with non-zoned
     devices

   - fix crash in scrub on crafted image due to invalid extent tree"

* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
  btrfs: zoned: calculate max_extent_size properly on non-zoned setup
  btrfs: avoid NULL pointer dereference if no valid extent tree
  btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
  io_uring: add io_uring_cmd_get_async_data helper
  io_uring/cmd: add per-op data to struct io_uring_cmd_data
  io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Pull btrfs fixes from David Sterba:
 "A few more fixes.

  Besides the one-liners in Btrfs there's fix to the io_uring and
  encoded read integration (added in this development cycle). The update
  to io_uring provides more space for the ongoing command that is then
  used in Btrfs to handle some cases.

   - io_uring and encoded read:
       - provide stable storage for io_uring command data
       - make a copy of encoded read ioctl call, reuse that in case the
         call would block and will be called again

   - properly initialize zlib context for hardware compression on s390

   - fix max extent size calculation on filesystems with non-zoned
     devices

   - fix crash in scrub on crafted image due to invalid extent tree"

* tag 'for-6.13-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zlib: fix avail_in bytes for s390 zlib HW compression path
  btrfs: zoned: calculate max_extent_size properly on non-zoned setup
  btrfs: avoid NULL pointer dereference if no valid extent tree
  btrfs: don't read from userspace twice in btrfs_uring_encoded_read()
  io_uring: add io_uring_cmd_get_async_data helper
  io_uring/cmd: add per-op data to struct io_uring_cmd_data
  io_uring/cmd: rename struct uring_cache to io_uring_cmd_data
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/cmd: rename struct uring_cache to io_uring_cmd_data</title>
<updated>2025-01-06T12:51:05+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2025-01-03T15:02:23+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=dadf03cfd4eaa09f1d0e8b2521de1e11d3e3bec1'/>
<id>dadf03cfd4eaa09f1d0e8b2521de1e11d3e3bec1</id>
<content type='text'>
In preparation for making this more generically available for
-&gt;uring_cmd() usage that needs stable command data, rename it and move
it to io_uring/cmd.h instead.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for making this more generically available for
-&gt;uring_cmd() usage that needs stable command data, rename it and move
it to io_uring/cmd.h instead.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: David Sterba &lt;dsterba@suse.com&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/splice: open code 2nd direct file assignment</title>
<updated>2024-10-29T19:43:28+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-10-28T14:03:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=743fb58a35cde8fe27b07ee5a985ae76563845e3'/>
<id>743fb58a35cde8fe27b07ee5a985ae76563845e3</id>
<content type='text'>
In preparation for not pinning the whole registered file table, open
code the second potential direct file assignment. This will be handled
by appropriate helpers in the future, for now just do it manually.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for not pinning the whole registered file table, open
code the second potential direct file assignment. This will be handled
by appropriate helpers in the future, for now just do it manually.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: Fix probe of disabled operations</title>
<updated>2024-06-19T14:58:00+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-19T02:06:18+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=3e05b222382ec67dce7358d50b6006e91d028d8b'/>
<id>3e05b222382ec67dce7358d50b6006e91d028d8b</id>
<content type='text'>
io_probe checks io_issue_def-&gt;not_supported, but we never really set
that field, as we mark non-supported functions through a specific -&gt;prep
handler.  This means we end up returning IO_URING_OP_SUPPORTED, even for
disabled operations.  Fix it by just checking the prep handler itself.

Fixes: 66f4af93da57 ("io_uring: add support for probing opcodes")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240619020620.5301-2-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
io_probe checks io_issue_def-&gt;not_supported, but we never really set
that field, as we mark non-supported functions through a specific -&gt;prep
handler.  This means we end up returning IO_URING_OP_SUPPORTED, even for
disabled operations.  Fix it by just checking the prep handler itself.

Fixes: 66f4af93da57 ("io_uring: add support for probing opcodes")
Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240619020620.5301-2-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: Introduce IORING_OP_LISTEN</title>
<updated>2024-06-19T13:57:21+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-14T16:30:47+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ff140cc8628abfb1755691d16cfa8788d8820ef7'/>
<id>ff140cc8628abfb1755691d16cfa8788d8820ef7</id>
<content type='text'>
IORING_OP_LISTEN provides the semantic of listen(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-4-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
IORING_OP_LISTEN provides the semantic of listen(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-4-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: Introduce IORING_OP_BIND</title>
<updated>2024-06-19T13:57:21+00:00</updated>
<author>
<name>Gabriel Krisman Bertazi</name>
<email>krisman@suse.de</email>
</author>
<published>2024-06-14T16:30:46+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=7481fd93fa0a851740e26026485f56a1305454ce'/>
<id>7481fd93fa0a851740e26026485f56a1305454ce</id>
<content type='text'>
IORING_OP_BIND provides the semantic of bind(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-3-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
IORING_OP_BIND provides the semantic of bind(2) via io_uring.  While
this is an essentially synchronous system call, the main point is to
enable a network path to execute fully with io_uring registered and
descriptorless files.

Signed-off-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Link: https://lore.kernel.org/r/20240614163047.31581-3-krisman@suse.de
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/rw: Free iovec before cleaning async data</title>
<updated>2024-05-30T14:33:01+00:00</updated>
<author>
<name>Breno Leitao</name>
<email>leitao@debian.org</email>
</author>
<published>2024-05-30T14:23:39+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e112311615a24e1618a591c73506571dc304eb8d'/>
<id>e112311615a24e1618a591c73506571dc304eb8d</id>
<content type='text'>
kmemleak shows that there is a memory leak in io_uring read operation,
where a buffer is allocated at iovec import, but never de-allocated.

The memory is allocated at io_async_rw-&gt;free_iovec, but, then
io_async_rw is kfreed, taking the allocated memory with it. I saw this
happening when the read operation fails with -11 (EAGAIN).

This is the kmemleak splat.

    unreferenced object 0xffff8881da591c00 (size 256):
...
      backtrace (crc 7a15bdee):
	[&lt;00000000256f2de4&gt;] __kmalloc+0x2d6/0x410
	[&lt;000000007a9f5fc7&gt;] iovec_from_user.part.0+0xc6/0x160
	[&lt;00000000cecdf83a&gt;] __import_iovec+0x50/0x220
	[&lt;00000000d1d586a2&gt;] __io_import_iovec+0x13d/0x220
	[&lt;0000000054ee9bd2&gt;] io_prep_rw+0x186/0x340
	[&lt;00000000a9c0372d&gt;] io_prep_rwv+0x31/0x120
	[&lt;000000001d1170b9&gt;] io_prep_readv+0xe/0x30
	[&lt;0000000070b8eb67&gt;] io_submit_sqes+0x1bd/0x780
	[&lt;00000000812496d4&gt;] __do_sys_io_uring_enter+0x3ed/0x5b0
	[&lt;0000000081499602&gt;] do_syscall_64+0x5d/0x170
	[&lt;00000000de1c5a4d&gt;] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This occurs because the async data cleanup functions are not set for
read/write operations. As a result, the potentially allocated iovec in
the rw async data is not freed before the async data is released,
leading to a memory leak.

With this following patch, kmemleak does not show the leaked memory
anymore, and all liburing tests pass.

Fixes: a9165b83c193 ("io_uring/rw: always setup io_async_rw for read/write requests")
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Link: https://lore.kernel.org/r/20240530142340.1248216-1-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
kmemleak shows that there is a memory leak in io_uring read operation,
where a buffer is allocated at iovec import, but never de-allocated.

The memory is allocated at io_async_rw-&gt;free_iovec, but, then
io_async_rw is kfreed, taking the allocated memory with it. I saw this
happening when the read operation fails with -11 (EAGAIN).

This is the kmemleak splat.

    unreferenced object 0xffff8881da591c00 (size 256):
...
      backtrace (crc 7a15bdee):
	[&lt;00000000256f2de4&gt;] __kmalloc+0x2d6/0x410
	[&lt;000000007a9f5fc7&gt;] iovec_from_user.part.0+0xc6/0x160
	[&lt;00000000cecdf83a&gt;] __import_iovec+0x50/0x220
	[&lt;00000000d1d586a2&gt;] __io_import_iovec+0x13d/0x220
	[&lt;0000000054ee9bd2&gt;] io_prep_rw+0x186/0x340
	[&lt;00000000a9c0372d&gt;] io_prep_rwv+0x31/0x120
	[&lt;000000001d1170b9&gt;] io_prep_readv+0xe/0x30
	[&lt;0000000070b8eb67&gt;] io_submit_sqes+0x1bd/0x780
	[&lt;00000000812496d4&gt;] __do_sys_io_uring_enter+0x3ed/0x5b0
	[&lt;0000000081499602&gt;] do_syscall_64+0x5d/0x170
	[&lt;00000000de1c5a4d&gt;] entry_SYSCALL_64_after_hwframe+0x76/0x7e

This occurs because the async data cleanup functions are not set for
read/write operations. As a result, the potentially allocated iovec in
the rw async data is not freed before the async data is released,
leading to a memory leak.

With this following patch, kmemleak does not show the leaked memory
anymore, and all liburing tests pass.

Fixes: a9165b83c193 ("io_uring/rw: always setup io_async_rw for read/write requests")
Signed-off-by: Breno Leitao &lt;leitao@debian.org&gt;
Link: https://lore.kernel.org/r/20240530142340.1248216-1-leitao@debian.org
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/net: add provided buffer support for IORING_OP_SEND</title>
<updated>2024-04-22T17:25:56+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-02-19T17:46:44+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=ac5f71a3d9d7eb540f6bf7e794eb4a3e4c3f11dd'/>
<id>ac5f71a3d9d7eb540f6bf7e794eb4a3e4c3f11dd</id>
<content type='text'>
It's pretty trivial to wire up provided buffer support for the send
side, just like how it's done the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's pretty trivial to wire up provided buffer support for the send
side, just like how it's done the receive side. This enables setting up
a buffer ring that an application can use to push pending sends to,
and then have a send pick a buffer from that ring.

One of the challenges with async IO and networking sends is that you
can get into reordering conditions if you have more than one inflight
at the same time. Consider the following scenario where everything is
fine:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, completes successfully, posts CQE
5) sendB is issued, completes successfully, posts CQE

All is fine. Requests are always issued in-order, and both complete
inline as most sends do.

However, if we're flooding socket1 with sends, the following could
also result from the same sequence:

1) App queues sendA for socket1
2) App queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, completes successfully, posts CQE
7) sendA is retried, completes successfully, posts CQE

Now we've sent sendB before sendA, which can make things unhappy. If
both sendA and sendB had been using provided buffers, then it would look
as follows instead:

1) App queues dataA for sendA, queues sendA for socket1
2) App queues dataB for sendB queues sendB for socket1
3) App does io_uring_submit()
4) sendA is issued, socket1 is full, poll is armed for retry
5) Space frees up in socket1, this triggers sendA retry via task_work
6) sendB is issued, picks first buffer (dataA), completes successfully,
   posts CQE (which says "I sent dataA")
7) sendA is retried, picks first buffer (dataB), completes successfully,
   posts CQE (which says "I sent dataB")

Now we've sent the data in order, and everybody is happy.

It's worth noting that this also opens the door for supporting multishot
sends, as provided buffers would be a prerequisite for that. Those can
trigger either when new buffers are added to the outgoing ring, or (if
stalled due to lack of space) when space frees up in the socket.

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: drop -&gt;prep_async()</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-19T02:48:38+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=e10677a8f6980dbae2e866b8320d90bae07e87ee'/>
<id>e10677a8f6980dbae2e866b8320d90bae07e87ee</id>
<content type='text'>
It's now unused, drop the code related to it. This includes the
io_issue_defs-&gt;manual alloc field.

While in there, and since -&gt;async_size is now being used a bit more
frequently and in the issue path, move it to io_issue_defs[].

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It's now unused, drop the code related to it. This includes the
io_issue_defs-&gt;manual alloc field.

While in there, and since -&gt;async_size is now being used a bit more
frequently and in the issue path, move it to io_issue_defs[].

Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/uring_cmd: switch to always allocating async data</title>
<updated>2024-04-15T14:10:25+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2024-03-19T02:41:58+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d10f19dff56eac5ae44dc270336b18071a8bd51c'/>
<id>d10f19dff56eac5ae44dc270336b18071a8bd51c</id>
<content type='text'>
Basic conversion ensuring async_data is allocated off the prep path. Adds
a basic alloc cache as well, as passthrough IO can be quite high in rate.

Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Basic conversion ensuring async_data is allocated off the prep path. Adds
a basic alloc cache as well, as passthrough IO can be quite high in rate.

Tested-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Reviewed-by: Anuj Gupta &lt;anuj20.g@samsung.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
