linux-stable.git/fs/io_uring.c, branch linux-5.14.y

io_uring: kill fasync

2021-10-17T08:44:51+00:00

[ Upstream commit 3f008385d46d3cea4a097d2615cd485f2184ba26 ]

We have never supported fasync properly, it would only fire when there
is something polling io_uring making it useless. The original support came
in through the initial io_uring merge for 5.1. Since it's broken and
nobody has reported it, get rid of the fasync bits.

Signed-off-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/2f7ca3d344d406d34fa6713824198915c41cea86.1633080236.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: allow conditional reschedule for intensive iterators

2021-10-09T13:02:41+00:00

[ Upstream commit 8bab4c09f24ec8d4a7a78ab343620f89d3a24804 ]

If we have a lot of threads and rings, the tctx list can get quite big.
This is especially true if we keep creating new threads and rings.
Likewise for the provided buffers list. Be nice and insert a conditional
reschedule point while iterating the nodes for deletion.

Link: https://lore.kernel.org/io-uring/00000000000064b6b405ccb41113@google.com/
Reported-by: syzbot+111d2a03f51f5ae73775@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: don't punt files update to io-wq unconditionally

2021-09-30T08:13:03+00:00

[ Upstream commit cdb31c29d397a8076d81fd1458d091c647ef94ba ]

There's no reason to punt it unconditionally, we just need to ensure that
the submit lock grabbing is conditional.

Fixes: 05f3fb3c5397 ("io_uring: avoid ring quiesce for fixed file set unregister and update")
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: put provided buffer meta data under memcg accounting

2021-09-30T08:13:03+00:00

[ Upstream commit 9990da93d2bf9892c2c14c958bef050d4e461a1a ]

For each provided buffer, we allocate a struct io_buffer to hold the
data associated with it. As a large number of buffers can be provided,
account that data with memcg.

Fixes: ddf0322db79c ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: fix missing set of EPOLLONESHOT for CQ ring overflow

2021-09-30T08:13:03+00:00

[ Upstream commit a62682f92eedb41c1cd8290fa875a4b85624fb9a ]

We should set EPOLLONESHOT if cqring_fill_event() returns false since
io_poll_add() decides to put req or not by it.

Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu 
Link: https://lore.kernel.org/r/20210922101238.7177-3-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: fix race between poll completion and cancel_hash insertion

2021-09-30T08:13:03+00:00

[ Upstream commit bd99c71bd14072ce2920f6d0c2fe43df072c653c ]

If poll arming and poll completion runs in parallel, there maybe races.
For instance, run io_poll_add in iowq and io_poll_task_func in original
context, then:

  iowq                                      original context
  io_poll_add
    vfs_poll
     (interruption happens
      tw queued to original
      context)                              io_poll_task_func
                                              generate cqe
                                              del from cancel_hash[]
    if !poll.done
      insert to cancel_hash[]

The entry left in cancel_hash[], similar case for fast poll.
Fix it by set poll.done = true when del from cancel_hash[].

Fixes: 5082620fb2ca ("io_uring: terminate multishot poll for CQ ring overflow")
Signed-off-by: Hao Xu 
Link: https://lore.kernel.org/r/20210922101238.7177-2-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: fix off-by-one in BUILD_BUG_ON check of __REQ_F_LAST_BIT

2021-09-26T12:10:25+00:00

[ Upstream commit 32c2d33e0b7c4ea53284d5d9435dd022b582c8cf ]

Build check of __REQ_F_LAST_BIT should be larger than, not equal or larger
than. It's perfectly valid to have __REQ_F_LAST_BIT be 32, as that means
that the last valid bit is 31 which does fit in the type.

Signed-off-by: Hao Xu 
Link: https://lore.kernel.org/r/20210907032243.114190-1-haoxu@linux.alibaba.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: retry in case of short read on block device

2021-09-22T10:39:30+00:00

[ Upstream commit 7db304375e11741e5940f9bc549155035bfb4dc1 ]

In case of buffered reading from block device, when short read happens,
we should retry to read more, otherwise the IO will be completed
partially, for example, the following fio expects to read 2MB, but it
can only read 1M or less bytes:

    fio --name=onessd --filename=/dev/nvme0n1 --filesize=2M \
	--rw=randread --bs=2M --direct=0 --overwrite=0 --numjobs=1 \
	--iodepth=1 --time_based=0 --runtime=2 --ioengine=io_uring \
	--registerfiles --fixedbufs --gtod_reduce=1 --group_reporting

Fix the issue by allowing short read retry for block device, which sets
FMODE_BUF_RASYNC really.

Fixes: 9a173346bd9e ("io_uring: fix short read retries for non-reg files")
Cc: Pavel Begunkov 
Signed-off-by: Ming Lei 
Reviewed-by: Pavel Begunkov 
Link: https://lore.kernel.org/r/20210821150751.1290434-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe 
Signed-off-by: Sasha Levin

io_uring: allow retry for O_NONBLOCK if async is supported

2021-09-22T10:39:17+00:00

commit 5d329e1286b0a040264e239b80257c937f6e685f upstream.

A common complaint is that using O_NONBLOCK files with io_uring can be a
bit of a pain. Be a bit nicer and allow normal retry IFF the file does
support async behavior. This makes it possible to use io_uring more
reliably with O_NONBLOCK files, for use cases where it either isn't
possible or feasible to modify the file flags.

Cc: stable@vger.kernel.org
Reported-and-tested-by: Dan Melnic 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman

io_uring: ensure symmetry in handling iter types in loop_rw_iter()

2021-09-22T10:39:14+00:00

commit 16c8d2df7ec0eed31b7d3b61cb13206a7fb930cc upstream.

When setting up the next segment, we check what type the iter is and
handle it accordingly. However, when incrementing and processed amount
we do not, and both iter advance and addr/len are adjusted, regardless
of type. Split the increment side just like we do on the setup side.

Fixes: 4017eb91a9e7 ("io_uring: make loop_rw_iter() use original user supplied pointers")
Cc: stable@vger.kernel.org
Reported-by: Valentina Palmiotti 
Reviewed-by: Pavel Begunkov 
Signed-off-by: Jens Axboe 
Signed-off-by: Greg Kroah-Hartman