linux-stable.git/io_uring/poll.c, branch linux-6.13.y

io_uring/net: don't retry connect operation on EPOLLERR

2025-02-17T10:36:52+00:00

commit 8c8492ca64e79c6e0f433e8c9d2bcbd039ef83d0 upstream.

If a socket is shutdown before the connection completes, POLLERR is set
in the poll mask. However, connect ignores this as it doesn't know, and
attempts the connection again. This may lead to a bogus -ETIMEDOUT
result, where it should have noticed the POLLERR and just returned
-ECONNRESET instead.

Have the poll logic check for whether or not POLLERR is set in the mask,
and if so, mark the request as failed. Then connect can appropriately
fail the request rather than retry it.

Reported-by: Sergey Galas 
Cc: stable@vger.kernel.org
Link: https://github.com/axboe/liburing/discussions/1335
Fixes: 3fb1bd688172 ("io_uring/net: handle -EINPROGRESS correct for IORING_OP_CONNECT")
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

io_uring: fix multishots with selected buffers

2025-02-17T10:36:52+00:00

commit d63b0e8a628e62ca85a0f7915230186bb92f8bb4 upstream.

We do io_kbuf_recycle() when arming a poll but every iteration of a
multishot can grab more buffers, which is why we need to flush the kbuf
ring state before continuing with waiting.

Cc: stable@vger.kernel.org
Fixes: b3fdea6ecb55c ("io_uring: multishot recv")
Reported-by: Muhammad Ramdhan 
Reported-by: Bing-Jhong Billy Jheng 
Reported-by: Jacob Soo 
Signed-off-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/1bfc9990fe435f1fc6152ca9efeba5eb3e68339c.1738025570.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

io_uring: move struct io_kiocb from task_struct to io_uring_task

2024-11-06T20:55:38+00:00

Rather than store the task_struct itself in struct io_kiocb, store
the io_uring specific task_struct. The life times are the same in terms
of io_uring, and this avoids doing some dereferences through the
task_struct. For the hot path of putting local task references, we can
deref req->tctx instead, which we'll need anyway in that function
regardless of whether it's local or remote references.

This is mostly straight forward, except the original task PF_EXITING
check needs a bit of tweaking. task_work is _always_ run from the
originating task, except in the fallback case, where it's run from a
kernel thread. Replace the potentially racy (in case of fallback work)
checks for req->task->flags with current->flags. It's either the still
the original task, in which case PF_EXITING will be sane, or it has
PF_KTHREAD set, in which case it's fallback work. Both cases should
prevent moving forward with the given request.

Signed-off-by: Jens Axboe

io_uring: move cancelations to be io_uring_task based

2024-11-06T20:55:38+00:00

Right now the task_struct pointer is used as the key to match a task,
but in preparation for some io_kiocb changes, move it to using struct
io_uring_task instead. No functional changes intended in this patch.

Signed-off-by: Jens Axboe

io_uring/poll: get rid of per-hashtable bucket locks

2024-10-29T19:43:27+00:00

Any access to the table is protected by ctx->uring_lock now anyway, the
per-bucket locking doesn't buy us anything.

Signed-off-by: Jens Axboe

io_uring/poll: get rid of io_poll_tw_hash_eject()

2024-10-29T19:43:27+00:00

It serves no purposes anymore, all it does is delete the hash list
entry. task_work always has the ring locked.

Signed-off-by: Jens Axboe

io_uring/poll: get rid of unlocked cancel hash

2024-10-29T19:43:27+00:00

io_uring maintains two hash lists of inflight requests:

1) ctx->cancel_table_locked. This is used when the caller has the
   ctx->uring_lock held already. This is only an issue side parameter,
   as removal or task_work will always have it held.

2) ctx->cancel_table. This is used when the issuer does NOT have the
   ctx->uring_lock held, and relies on the table spinlocks for access.

However, it's pretty trivial to simply grab the lock in the one spot
where we care about it, for insertion. With that, we can kill the
unlocked table (and get rid of the _locked postfix for the other one).

Signed-off-by: Jens Axboe

io_uring/poll: remove 'ctx' argument from io_poll_req_delete()

2024-10-29T19:43:26+00:00

It's always req->ctx being used anyway, having this as a separate
argument (that is then not even used) just makes it more confusing.

Signed-off-by: Jens Axboe

io_uring: keep multishot request NAPI timeout current

2024-07-30T12:18:58+00:00

This refresh statement was originally present in the original patch:
https://lore.kernel.org/netdev/20221121191437.996297-2-shr@devkernel.io/

It has been removed with no explanation in v6:
https://lore.kernel.org/netdev/20230201222254.744422-2-shr@devkernel.io/

It is important to make the refresh for multishot requests, because if no
new requests using the same NAPI device are added to the ring, the entry
will become stale and be removed silently. The unsuspecting user will
not know that their ring had busy polling for only 60 seconds before
being pruned.

Signed-off-by: Olivier Langlois 
Reviewed-by: Pavel Begunkov 
Fixes: 8d0c12a80cdeb ("io-uring: add napi busy poll support")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/0fe61a019ec61e5708cd117cb42ed0dab95e1617.1722294646.git.olivier@trillion01.com
Signed-off-by: Jens Axboe

io_uring/alloc_cache: switch to array based caching

2024-04-15T14:10:25+00:00

Currently lists are being used to manage this, but best practice is
usually to have these in an array instead as that it cheaper to manage.

Outside of that detail, games are also played with KASAN as the list
is inside the cached entry itself.

Finally, all users of this need a struct io_cache_entry embedded in
their struct, which is union'ized with something else in there that
isn't used across the free -> realloc cycle.

Get rid of all of that, and simply have it be an array. This will not
change the memory used, as we're just trading an 8-byte member entry
for the per-elem array size.

This reduces the overhead of the recycled allocations, and it reduces
the amount of code code needed to support recycling to about half of
what it currently is.

Signed-off-by: Jens Axboe