<feed xmlns='http://www.w3.org/2005/Atom'>
<title>linux.git/io_uring/timeout.c, branch v7.1-rc3</title>
<subtitle>Linux kernel source tree</subtitle>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/'/>
<entry>
<title>io_uring/timeout: honour caller's time namespace for IORING_TIMEOUT_ABS</title>
<updated>2026-05-06T10:58:56+00:00</updated>
<author>
<name>Maoyi Xie</name>
<email>maoyixie.tju@gmail.com</email>
</author>
<published>2026-05-04T15:37:54+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=9cc6bac1bebf8310d2950d1411a91479e86d69a1'/>
<id>9cc6bac1bebf8310d2950d1411a91479e86d69a1</id>
<content type='text'>
io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.

A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:

  kernel/time/posix-timers.c    -- timer_settime(TIMER_ABSTIME)
  kernel/time/posix-stubs.c     -- clock_nanosleep(TIMER_ABSTIME)
  kernel/time/alarmtimer.c      -- alarm_timer_nsleep(TIMER_ABSTIME)
  fs/timerfd.c                  -- timerfd_settime(TFD_TIMER_ABSTIME)

io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after &lt;1ms instead of
the expected ~1s.

Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.

SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current-&gt;nsproxy-&gt;time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.

Suggested-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Maoyi Xie &lt;maoyi.xie@ntu.edu.sg&gt;
Link: https://patch.msgid.link/20260504153755.1293932-2-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
io_uring's IORING_OP_TIMEOUT and IORING_OP_LINK_TIMEOUT accept a
timespec from the caller via io_parse_user_time(). With
IORING_TIMEOUT_ABS, the timestamp is an absolute deadline on the
selected clock. The clock is CLOCK_MONOTONIC by default.
CLOCK_BOOTTIME and CLOCK_REALTIME are also selectable.

A submitter inside a CLONE_NEWTIME time namespace observes
CLOCK_MONOTONIC and CLOCK_BOOTTIME shifted by the namespace's
offsets relative to the host. Every other ABS timer interface in
the kernel converts the caller's absolute time to host view via
timens_ktime_to_host() before arming an hrtimer:

  kernel/time/posix-timers.c    -- timer_settime(TIMER_ABSTIME)
  kernel/time/posix-stubs.c     -- clock_nanosleep(TIMER_ABSTIME)
  kernel/time/alarmtimer.c      -- alarm_timer_nsleep(TIMER_ABSTIME)
  fs/timerfd.c                  -- timerfd_settime(TFD_TIMER_ABSTIME)

io_parse_user_time() does not. As a result, an absolute timeout
submitted from within a time namespace is interpreted in host
view. That is generally a different point in time. It may already
be in the past, causing the timer to fire immediately, or far in
the future, causing the timer not to fire when expected.

Reproducer: in unshare --user --time, with a -10s monotonic
offset, submit IORING_OP_TIMEOUT with IORING_TIMEOUT_ABS and
deadline = now + 1s. The CQE is delivered after &lt;1ms instead of
the expected ~1s.

Apply timens_ktime_to_host() to the parsed time when
IORING_TIMEOUT_ABS is set. Split the existing clock id resolver
in io_timeout_get_clock() into a flags only helper
io_flags_to_clock(), so io_parse_user_time() can resolve the
clock without a struct io_timeout_data.

timens_ktime_to_host() is a no-op for clocks not affected by time
namespaces, e.g. CLOCK_REALTIME. It is also a no-op for callers
in the initial time namespace. The fast path is unchanged.

SQPOLL is also covered. The SQPOLL kernel thread is created via
create_io_thread() with CLONE_THREAD and no CLONE_NEW* flag.
copy_namespaces() therefore shares the submitter's nsproxy by
reference. Inside the SQPOLL kthread, current-&gt;nsproxy-&gt;time_ns
is the submitter's time_ns. timens_ktime_to_host() resolves
correctly.

Suggested-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Suggested-by: Jens Axboe &lt;axboe@kernel.dk&gt;
Signed-off-by: Maoyi Xie &lt;maoyi.xie@ntu.edu.sg&gt;
Link: https://patch.msgid.link/20260504153755.1293932-2-maoyi.xie@ntu.edu.sg
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: use 'ctx' consistently</title>
<updated>2026-04-02T13:08:40+00:00</updated>
<author>
<name>Yang Xiuwei</name>
<email>yangxiuwei@kylinos.cn</email>
</author>
<published>2026-04-02T01:49:52+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f847bf6d29304087f94ef4b4a8646f69d96945f9'/>
<id>f847bf6d29304087f94ef4b4a8646f69d96945f9</id>
<content type='text'>
There's already a local ctx variable, yet cq_timeouts accounting uses
req-&gt;ctx. Use ctx consistently.

Signed-off-by: Yang Xiuwei &lt;yangxiuwei@kylinos.cn&gt;
Link: https://patch.msgid.link/20260402014952.260414-1-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There's already a local ctx variable, yet cq_timeouts accounting uses
req-&gt;ctx. Use ctx consistently.

Signed-off-by: Yang Xiuwei &lt;yangxiuwei@kylinos.cn&gt;
Link: https://patch.msgid.link/20260402014952.260414-1-yangxiuwei@kylinos.cn
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: switch struct io_ring_ctx internal bitfields to flags</title>
<updated>2026-03-16T21:32:59+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-03-14T14:41:04+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=f1a424e21c15993db0f9594cda17ef5d516ab3e9'/>
<id>f1a424e21c15993db0f9594cda17ef5d516ab3e9</id>
<content type='text'>
Bitfields cannot be set and checked atomically, and this makes it more
clear that these are indeed in shared storage and must be checked and
set in a sane fashion. This is in preparation for annotating a few of
the known racy, but harmless, flags checking.

No intended functional changes in this patch.

Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Bitfields cannot be set and checked atomically, and this makes it more
clear that these are indeed in shared storage and must be checked and
set in a sane fashion. This is in preparation for annotating a few of
the known racy, but harmless, flags checking.

No intended functional changes in this patch.

Reviewed-by: Gabriel Krisman Bertazi &lt;krisman@suse.de&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: immediate timeout arg</title>
<updated>2026-03-09T13:21:54+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2026-03-02T13:10:37+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=d8345a21902af5d754f2c2aadf877de989e3cac3'/>
<id>d8345a21902af5d754f2c2aadf877de989e3cac3</id>
<content type='text'>
One the things the user has always keep in mind is that any user
pointers they put into an SQE is not going to be read by the kernel
until submission happens, and the user has to ensure the pointee stays
alive until then. For example, snippet below will lead to UAF of the on
stack variable ts. Instead of passing the timeout value as a pointer
allow to store it immediately in the SQE. The user has to set a new flag
called IORING_TIMEOUT_IMMEDIATE_ARG, in which case sqe-&gt;addr for timeout
or sqe-&gt;addr2 for timeout update requests will be interpreted as a time
value in nanosecods.

void prep_timeout(struct io_uring_sqe *sqe) {
    struct __kernel_timespec ts = {...};
    prep_timeout(sqe, &amp;ts);
}

void submit() {
    sqe = get_sqe();
    prep_timeout(sqe);
    io_uring_submit();
}

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
One the things the user has always keep in mind is that any user
pointers they put into an SQE is not going to be read by the kernel
until submission happens, and the user has to ensure the pointee stays
alive until then. For example, snippet below will lead to UAF of the on
stack variable ts. Instead of passing the timeout value as a pointer
allow to store it immediately in the SQE. The user has to set a new flag
called IORING_TIMEOUT_IMMEDIATE_ARG, in which case sqe-&gt;addr for timeout
or sqe-&gt;addr2 for timeout update requests will be interpreted as a time
value in nanosecods.

void prep_timeout(struct io_uring_sqe *sqe) {
    struct __kernel_timespec ts = {...};
    prep_timeout(sqe, &amp;ts);
}

void submit() {
    sqe = get_sqe();
    prep_timeout(sqe);
    io_uring_submit();
}

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: migrate reqs from ts64 to ktime</title>
<updated>2026-03-09T13:21:54+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2026-03-02T13:10:36+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=0e78aa188cbddc6311ff24938b1c8d3850b35708'/>
<id>0e78aa188cbddc6311ff24938b1c8d3850b35708</id>
<content type='text'>
It'll be more convenient for next patches to keep ktime in requests
instead of timespec64. Convert everything to ktime right after user
argument parsing at request prep time.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
It'll be more convenient for next patches to keep ktime in requests
instead of timespec64. Convert everything to ktime right after user
argument parsing at request prep time.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: add helper for parsing user time</title>
<updated>2026-03-09T13:21:54+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2026-03-02T13:10:35+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=6e3f5943a49b1593921fd340ff1dffba851c2afd'/>
<id>6e3f5943a49b1593921fd340ff1dffba851c2afd</id>
<content type='text'>
There is some duplication for timespec checks that can be deduplicated
with a new function, and it'll be extended in next patches.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
There is some duplication for timespec checks that can be deduplicated
with a new function, and it'll be extended in next patches.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: check unused sqe fields</title>
<updated>2026-03-09T13:21:54+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2026-03-02T13:10:34+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=484ae637a3e3d909718de7c07afd3bb34b6b8504'/>
<id>484ae637a3e3d909718de7c07afd3bb34b6b8504</id>
<content type='text'>
Zero check unused SQE fields addr3 and pad2 for timeout and timeout
update requests. They're not needed now, but could be used sometime
in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
Zero check unused SQE fields addr3 and pad2 for timeout and timeout
update requests. They're not needed now, but could be used sometime
in the future.

Cc: stable@vger.kernel.org
Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: READ_ONCE sqe-&gt;addr</title>
<updated>2026-02-25T15:36:05+00:00</updated>
<author>
<name>Pavel Begunkov</name>
<email>asml.silence@gmail.com</email>
</author>
<published>2026-02-25T10:35:57+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=85f6c439a69afe4fa8a688512e586971e97e273a'/>
<id>85f6c439a69afe4fa8a688512e586971e97e273a</id>
<content type='text'>
We should use READ_ONCE when reading from a SQE, make sure timeout gets
a stable timespec address.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
We should use READ_ONCE when reading from a SQE, make sure timeout gets
a stable timespec address.

Signed-off-by: Pavel Begunkov &lt;asml.silence@gmail.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring/timeout: annotate data race in io_flush_timeouts()</title>
<updated>2026-01-20T16:54:17+00:00</updated>
<author>
<name>Jens Axboe</name>
<email>axboe@kernel.dk</email>
</author>
<published>2026-01-20T16:53:43+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=42b12cb5fd4554679bac06bbdd05dc8b643bcc42'/>
<id>42b12cb5fd4554679bac06bbdd05dc8b643bcc42</id>
<content type='text'>
syzbot correctly reports this as a KCSAN race, as ctx-&gt;cached_cq_tail
should be read under -&gt;uring_lock. This isn't immediately feasible in
io_flush_timeouts(), but as long as we read a stable value, that should
be good enough. If two io-wq threads compete on this value, then they
will both end up calling io_flush_timeouts() and at least one of them
will see the correct value.

Reported-by: syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
syzbot correctly reports this as a KCSAN race, as ctx-&gt;cached_cq_tail
should be read under -&gt;uring_lock. This isn't immediately feasible in
io_flush_timeouts(), but as long as we read a stable value, that should
be good enough. If two io-wq threads compete on this value, then they
will both end up calling io_flush_timeouts() and at least one of them
will see the correct value.

Reported-by: syzbot+6c48db7d94402407301e@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
<entry>
<title>io_uring: add wrapper type for io_req_tw_func_t arg</title>
<updated>2025-11-03T15:31:26+00:00</updated>
<author>
<name>Caleb Sander Mateos</name>
<email>csander@purestorage.com</email>
</author>
<published>2025-10-31T20:34:29+00:00</published>
<link rel='alternate' type='text/html' href='https://git.tavy.me/linux.git/commit/?id=c33e779aba6804778c1440192a8033a145ba588d'/>
<id>c33e779aba6804778c1440192a8033a145ba588d</id>
<content type='text'>
In preparation for uring_cmd implementations to implement functions
with the io_req_tw_func_t signature, introduce a wrapper struct
io_tw_req to hide the struct io_kiocb * argument. The intention is for
only the io_uring core to access the inner struct io_kiocb *. uring_cmd
implementations should instead call a helper from io_uring/cmd.h to
convert struct io_tw_req to struct io_uring_cmd *.

Signed-off-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</content>
<content type='xhtml'>
<div xmlns='http://www.w3.org/1999/xhtml'>
<pre>
In preparation for uring_cmd implementations to implement functions
with the io_req_tw_func_t signature, introduce a wrapper struct
io_tw_req to hide the struct io_kiocb * argument. The intention is for
only the io_uring core to access the inner struct io_kiocb *. uring_cmd
implementations should instead call a helper from io_uring/cmd.h to
convert struct io_tw_req to struct io_uring_cmd *.

Signed-off-by: Caleb Sander Mateos &lt;csander@purestorage.com&gt;
Signed-off-by: Jens Axboe &lt;axboe@kernel.dk&gt;
</pre>
</div>
</content>
</entry>
</feed>
