summaryrefslogtreecommitdiffstats
path: root/io_uring/register.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
* io_uring: clean up a type in io_uring_register_get_file()Dan Carpenter2024-09-161-1/+1
| | | | | | | | | | | | | Originally "fd" was unsigned int but it was changed to int when we pulled this code into a separate function in commit 0b6d253e084a ("io_uring/register: provide helper to get io_ring_ctx from 'fd'"). This doesn't really cause a runtime problem because the call to array_index_nospec() will clamp negative fds to 0 and nothing else uses the negative values. Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Link: https://lore.kernel.org/r/6f6cb630-079f-4fdf-bf95-1082e0a3fc6e@stanley.mountain Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: rename "copy buffers" to "clone buffers"Jens Axboe2024-09-141-2/+2
| | | | | | | | | | | | | A recent commit added support for copying registered buffers from one ring to another. But that term is a bit confusing, as no copying of buffer data is done here. What is being done is simply cloning the buffer registrations from one ring to another. Rename it while we still can, so that it's more descriptive. No functional changes in this patch. Fixes: 7cc2a6eadcd7 ("io_uring: add IORING_REGISTER_COPY_BUFFERS method") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add IORING_REGISTER_COPY_BUFFERS methodJens Axboe2024-09-121-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Buffers can get registered with io_uring, which allows to skip the repeated pin_pages, unpin/unref pages for each O_DIRECT operation. This reduces the overhead of O_DIRECT IO. However, registrering buffers can take some time. Normally this isn't an issue as it's done at initialization time (and hence less critical), but for cases where rings can be created and destroyed as part of an IO thread pool, registering the same buffers for multiple rings become a more time sensitive proposition. As an example, let's say an application has an IO memory pool of 500G. Initial registration takes: Got 500 huge pages (each 1024MB) Registered 500 pages in 409 msec or about 0.4 seconds. If we go higher to 900 1GB huge pages being registered: Registered 900 pages in 738 msec which is, as expected, a fully linear scaling. Rather than have each ring pin/map/register the same buffer pool, provide an io_uring_register(2) opcode to simply duplicate the buffers that are registered with another ring. Adding the same 900GB of registered buffers to the target ring can then be accomplished in: Copied 900 pages in 17 usec While timing differs a bit, this provides around a 25,000-40,000x speedup for this use case. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/register: provide helper to get io_ring_ctx from 'fd'Jens Axboe2024-09-121-21/+33
| | | | | | | | | | | | | Can be done in one of two ways: 1) Regular file descriptor, just fget() 2) Registered ring, index our own table for that In preparation for adding another register use of needing to get a ctx from a file descriptor, abstract out this helper and use it in the main register syscall as well. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: user registered clockid for wait timeoutsPavel Begunkov2024-08-251-0/+31
| | | | | | | | | | | | Add a new registration opcode IORING_REGISTER_CLOCK, which allows the user to select which clock id it wants to use with CQ waiting timeouts. It only allows a subset of all posix clocks and currently supports CLOCK_MONOTONIC and CLOCK_BOOTTIME. Suggested-by: Lewis Baker <lewissbaker@gmail.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/98f2bc8a3c36cdf8f0e6a275245e81e903459703.1723039801.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: Allocate only necessary memory in io_probeGabriel Krisman Bertazi2024-06-191-4/+3
| | | | | | | | | | We write at most IORING_OP_LAST entries in the probe buffer, so we don't need to allocate temporary space for more than that. As a side effect, we no longer can overflow "size". Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240619020620.5301-3-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: Fix probe of disabled operationsGabriel Krisman Bertazi2024-06-191-1/+1
| | | | | | | | | | | | io_probe checks io_issue_def->not_supported, but we never really set that field, as we mark non-supported functions through a specific ->prep handler. This means we end up returning IO_URING_OP_SUPPORTED, even for disabled operations. Fix it by just checking the prep handler itself. Fixes: 66f4af93da57 ("io_uring: add support for probing opcodes") Signed-off-by: Gabriel Krisman Bertazi <krisman@suse.de> Link: https://lore.kernel.org/r/20240619020620.5301-2-krisman@suse.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/eventfd: move eventfd handling to separate fileJens Axboe2024-06-161-55/+1
| | | | | | | | This is pretty nicely abstracted already, but let's move it to a separate file rather than have it in the main io_uring file. With that, we can also move the io_ev_fd struct and enum out of global scope. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/eventfd: move to more idiomatic RCU free usageJens Axboe2024-06-161-3/+3
| | | | | | | | | | | In some ways, it just "happens to work" currently with using the ops field for both the free and signaling bit. But it depends on ordering of operations in terms of freeing and signaling. Clean it up and use the usual refs == 0 under RCU read side lock to determine if the ev_fd is still valid, and use the reference to gate the freeing as well. Fixes: 21a091b970cd ("io_uring: signal registered eventfd to process deferred task work") Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix possible deadlock in io_register_iowq_max_workers()Hagar Hemdan2024-06-041-0/+4
| | | | | | | | | | | | | | | | | | | The io_register_iowq_max_workers() function calls io_put_sq_data(), which acquires the sqd->lock without releasing the uring_lock. Similar to the commit 009ad9f0c6ee ("io_uring: drop ctx->uring_lock before acquiring sqd->lock"), this can lead to a potential deadlock situation. To resolve this issue, the uring_lock is released before calling io_put_sq_data(), and then it is re-acquired after the function call. This change ensures that the locks are acquired in the correct order, preventing the possibility of a deadlock. Suggested-by: Maximilian Heyne <mheyne@amazon.de> Signed-off-by: Hagar Hemdan <hagarhem@amazon.com> Link: https://lore.kernel.org/r/20240604130527.3597-1-hagarhem@amazon.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: fix warnings on shadow variablesJens Axboe2024-04-151-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a few of those: io_uring/fdinfo.c:170:16: warning: declaration shadows a local variable [-Wshadow] 170 | struct file *f = io_file_from_index(&ctx->file_table, i); | ^ io_uring/fdinfo.c:53:67: note: previous declaration is here 53 | __cold void io_uring_show_fdinfo(struct seq_file *m, struct file *f) | ^ io_uring/cancel.c:187:25: warning: declaration shadows a local variable [-Wshadow] 187 | struct io_uring_task *tctx = node->task->io_uring; | ^ io_uring/cancel.c:166:31: note: previous declaration is here 166 | struct io_uring_task *tctx, | ^ io_uring/register.c:371:25: warning: declaration shadows a local variable [-Wshadow] 371 | struct io_uring_task *tctx = node->task->io_uring; | ^ io_uring/register.c:312:24: note: previous declaration is here 312 | struct io_uring_task *tctx = NULL; | ^ and a simple cleanup gets rid of them. For the fdinfo case, make a distinction between the file being passed in (for the ring), and the registered files we iterate. For the other two cases, just get rid of shadowed variable, there's no reason to have a new one. Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring: add register/unregister napi functionStefan Roesch2024-02-091-0/+13
| | | | | | | | | | | | This adds an api to register and unregister the napi for io-uring. If the arg value is specified when unregistering, the current napi setting for the busy poll timeout is copied into the user structure. If this is not required, NULL can be passed as the arg value. Signed-off-by: Stefan Roesch <shr@devkernel.io> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230608163839.2891748-7-shr@devkernel.io Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/register: guard compat syscall with CONFIG_COMPATJens Axboe2024-01-171-3/+5
| | | | | | | | | | | | | | | | | | Add compat.h include to avoid a potential build issue: io_uring/register.c:281:6: error: call to undeclared function 'in_compat_syscall'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration] if (in_compat_syscall()) { ^ 1 warning generated. io_uring/register.c:282:9: error: call to undeclared function 'compat_get_bitmap'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration] ret = compat_get_bitmap(cpumask_bits(new_mask), ^ Fixes: c43203154d8a ("io_uring/register: move io_uring_register(2) related code to register.c") Reported-by: Manu Bretelle <chantra@meta.com> Reviewed-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/kbuf: add method for returning provided buffer ring headJens Axboe2023-12-211-0/+6
| | | | | | | | | | | | | | | | | | The tail of the provided ring buffer is shared between the kernel and the application, but the head is private to the kernel as the application doesn't need to see it. However, this also prevents the application from knowing how many buffers the kernel has consumed. Usually this is fine, as the information is inherently racy in that the kernel could be consuming buffers continually, but for cleanup purposes it may be relevant to know how many buffers are still left in the ring. Add IORING_REGISTER_PBUF_STATUS which will return status for a given provided buffer ring. Right now it just returns the head, but space is reserved for more information later in, if needed. Link: https://github.com/axboe/liburing/discussions/1020 Signed-off-by: Jens Axboe <axboe@kernel.dk>
* io_uring/register: move io_uring_register(2) related code to register.cJens Axboe2023-12-191-0/+599
Most of this code is basically self contained, move it out of the core io_uring file to bring a bit more separation to the registration related bits. This moves another ~10% of the code into register.c. Signed-off-by: Jens Axboe <axboe@kernel.dk>