Commit message (Author, Age, Files, Lines)
* process_vm_rw_pages(): pass accurate amount of bytes (Al Viro, 2014-04-02, 1 file, -8/+14)
|   ... makes passing the amount of pages unnecessary
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* process_vm_access: take get_user_pages/put_pages one level up (Al Viro, 2014-04-02, 1 file, -58/+39)
|   ... and trim the fuck out of process_vm_rw_pages() argument list.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* process_vm_access: switch to copy_page_to_iter/iov_iter_copy_from_user (Al Viro, 2014-04-02, 1 file, -68/+23)
|   ... rather than open-coding those.  As a side benefit, we get much
|   saner loop calling those; we can just feed entire pages, instead of
|   the "copy would span the iovec boundary, let's do it in two loop
|   iterations" mess.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
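
The "feed entire pages" point above can be illustrated with a small userspace model. This is only a sketch in Python, not kernel code: the class name `IovIter` and its method `copy_to_iter` are hypothetical stand-ins chosen for this example, and they do not reproduce the kernel's `iov_iter` API. The idea shown is that once the cursor itself handles segment boundaries, the caller's loop no longer needs to split a copy in two.

```python
class IovIter:
    """Toy cursor over several destination buffers (hypothetical model)."""

    def __init__(self, segments):
        self.segments = segments  # list of bytearray, like an iovec array
        self.seg = 0              # index of the current segment
        self.off = 0              # offset inside the current segment

    def remaining(self):
        """Bytes of room left from the cursor to the end of the last segment."""
        if self.seg >= len(self.segments):
            return 0
        tail = sum(len(s) for s in self.segments[self.seg + 1:])
        return len(self.segments[self.seg]) - self.off + tail

    def copy_to_iter(self, data):
        """Copy as much of data as fits, crossing segment boundaries as
        needed, and advance the cursor. Returns the number of bytes copied."""
        copied = 0
        while copied < len(data) and self.seg < len(self.segments):
            seg = self.segments[self.seg]
            room = len(seg) - self.off
            if room == 0:
                # Current segment is full: move on to the next one.
                self.seg += 1
                self.off = 0
                continue
            n = min(room, len(data) - copied)
            seg[self.off:self.off + n] = data[copied:copied + n]
            self.off += n
            copied += n
        return copied


# The caller simply hands over whole "pages"; boundaries are not its problem.
segments = [bytearray(3), bytearray(5), bytearray(2)]
it = IovIter(segments)
for page in (b"abcd", b"efgh", b"ijkl"):
    it.copy_to_iter(page)
```

With 10 bytes of room and 12 bytes offered, the last call copies only 2 bytes; a short return value is how this model reports that the destination is exhausted.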
* process_vm_access: switch to iov_iter (Al Viro, 2014-04-02, 1 file, -34/+28)
|   instead of keeping its pieces in separate variables and passing
|   pointers to all of them...
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* untangling process_vm_..., part 4 (Al Viro, 2014-04-02, 1 file, -16/+13)
|   instead of passing vector size (by value) and index (by reference),
|   pass the number of elements remaining.  That's all we care about in
|   these functions by that point.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* untangling process_vm_..., part 3 (Al Viro, 2014-04-02, 1 file, -4/+3)
|   lift iov one more level out - from process_vm_rw_single_vec to
|   process_vm_rw_core().  Same story as with the previous commit.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* untangling process_vm_..., part 2 (Al Viro, 2014-04-02, 1 file, -3/+5)
|   move iov to caller's stack frame; the value we assign to it on the
|   next call of process_vm_rw_pages() is equal to the value it had when
|   we last left process_vm_rw_pages().
|
|   drop lvec argument of process_vm_rw_pages() - it's not used anymore.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* untangling process_vm_..., part 1 (Al Viro, 2014-04-02, 1 file, -5/+9)
|   we want to massage it into using iov_iter.  This one is an
|   equivalent transformation - just introduce a local variable
|   mirroring lvec + *lvec_current.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* read_code(): go through vfs_read() instead of calling the method directly (Al Viro, 2014-04-02, 1 file, -1/+1)
|   ... and don't skimp on sanity checks.  It's *not* a hot path, TYVM
|   (a couple of calls per a.out execve(), for pity's sake) and headers
|   of a random a.out binary are not to be trusted.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold cifs_iovec_read() into its (only) caller (Al Viro, 2014-04-02, 1 file, -18/+9)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* cifs_iovec_read: keep iov_iter between the calls of cifs_readdata_to_iov() (Al Viro, 2014-04-02, 1 file, -45/+17)
|   ... we are doing them on adjacent parts of file, so what happens is
|   that each subsequent call works to rebuild the iov_iter to the exact
|   state it had been abandoned in by the previous one.  Just keep it
|   through the entire cifs_iovec_read().  And use copy_page_to_iter()
|   instead of doing kmap/copy_to_user/kunmap manually...
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch vmsplice_to_user() to copy_page_to_iter() (Al Viro, 2014-04-02, 1 file, -89/+21)
|   I've switched the sanity checks on iovec to rw_copy_check_uvector();
|   we might need to do a local analog, if any behaviour differences are
|   not actually bugfixes here...
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch pipe_read() to copy_page_to_iter() (Al Viro, 2014-04-02, 1 file, -71/+8)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* cifs_iovec_read(): resubmit shouldn't restart the loop (Al Viro, 2014-04-02, 1 file, -8/+8)
|   ... by that point the request we'd just resent is at the head of the
|   list anyway.  Just return to the beginning of the loop body...
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* introduce copy_page_to_iter, kill loop over iovec in generic_file_aio_read() (Al Viro, 2014-04-02, 4 files, -144/+138)
|   generic_file_aio_read() was looping over the target iovec, with a
|   loop over (source) pages nested inside that.  Just set an iov_iter
|   up and pass *that* to do_generic_file_aio_read(), with
|   copy_page_to_iter() doing all the work of mapping and copying a page
|   to the iovec and advancing the iov_iter.
|
|   Switch shmem_file_aio_read() to the same and kill file_read_actor(),
|   while we are at it.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* iov_iter: Move iov_iter to uio.h (Kent Overstreet, 2014-04-02, 2 files, -32/+50)
|   Signed-off-by: Kent Overstreet <kmo@daterainc.com>
* do_shmem_file_read(): call file_read_actor() directly (Al Viro, 2014-04-02, 1 file, -3/+3)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* callers of iov_copy_from_user_atomic() don't need pagefault_disable() (Al Viro, 2014-04-02, 3 files, -10/+0)
|   ... it does that itself (via kmap_atomic())
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch ->is_partially_uptodate() to saner arguments (Al Viro, 2014-04-02, 6 files, -9/+9)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* pipe: kill ->map() and ->unmap() (Al Viro, 2014-04-02, 7 files, -100/+29)
|   all pipe_buffer_operations have the same instances of those...
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fuse/dev: use atomic maps (Al Viro, 2014-04-02, 1 file, -5/+5)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* VFS: Make delayed_free() call free_vfsmnt() (David Howells, 2014-04-02, 1 file, -12/+8)
|   Make delayed_free() call free_vfsmnt() so that we don't have two
|   functions doing the same job.  This requires the calls to
|   mnt_free_id() in free_vfsmnt() to be moved into the callers of that
|   function.
|
|   Signed-off-by: David Howells <dhowells@redhat.com>
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mn10300: kmap_atomic() returns void *, not unsigned long... (Al Viro, 2014-04-02, 1 file, -2/+2)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* cifs: ->rename() without ->lookup() makes no sense (Al Viro, 2014-04-02, 1 file, -1/+0)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of pointless checks for NULL ->i_op (Al Viro, 2014-04-02, 5 files, -7/+5)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ntfs: don't put NULL into ->i_op/->i_fop (Al Viro, 2014-04-02, 1 file, -2/+0)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: readlink_copy() (Al Viro, 2014-04-02, 5 files, -47/+12)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* lustre: generic_readlink() is just fine there, TYVM... (Al Viro, 2014-04-02, 1 file, -22/+1)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of files_defer_init() (Al Viro, 2014-04-02, 3 files, -10/+4)
|   the only thing it's doing these days is calculating the upper limit
|   for the fs.nr_open sysctl, and that can be done statically
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* namei.c: move EXPORT_SYMBOL to corresponding definitions (Al Viro, 2014-04-02, 1 file, -28/+27)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get_write_access() is inlined, exporting it is pointless (Al Viro, 2014-04-02, 1 file, -1/+0)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* tidy do_dentry_open() up a bit (Al Viro, 2014-04-02, 1 file, -12/+10)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* mark struct file that had write access grabbed by open() (Al Viro, 2014-04-02, 4 files, -41/+11)
|   new flag in ->f_mode - FMODE_WRITER.  Set by do_dentry_open() when
|   it has grabbed write access, checked by __fput() to decide whether
|   it wants to drop the sucker.  This lets us stop bothering with
|   mnt_clone_write() in alloc_file(), and needs fewer special_file()
|   checks.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* fold __get_file_write_access() into its only caller (Al Viro, 2014-04-02, 1 file, -19/+6)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of DEBUG_WRITECOUNT (Al Viro, 2014-04-02, 10 files, -78/+0)
|   it only makes control flow in __fput() and friends more convoluted.
|
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* don't bother with {get,put}_write_access() on non-regular files (Al Viro, 2014-04-02, 2 files, -21/+9)
|   it's pointless and actually leads to wrong behaviour in at least one
|   moderately convoluted case (pipe(), close one end, try to get to
|   the other via /proc/*/fd and run into ETXTBSY).
|
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* ncpfs: switch to sockfd_lookup()/sockfd_put() (Al Viro, 2014-04-02, 2 files, -40/+12)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch nbd to sockfd_lookup/sockfd_put (Al Viro, 2014-04-02, 2 files, -31/+20)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* vhost: don't open-code sockfd_put() (Al Viro, 2014-04-02, 1 file, -7/+7)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* usbip: don't open-code sockfd_lookup/sockfd_put (Al Viro, 2014-04-02, 5 files, -35/+9)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* reduce m_start() cost... (Al Viro, 2014-04-02, 3 files, -4/+23)
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* smarter propagate_mnt() (Al Viro, 2014-04-02, 4 files, -82/+133)
|   The current mainline has copies propagated to *all* nodes, then
|   tears down the copies we made for nodes that do not contain
|   counterparts of the desired mountpoint.  That sets the right
|   propagation graph for the copies (at teardown time we move the
|   slaves of removed node to a surviving peer or directly to master),
|   but we end up paying a fairly steep price in useless allocations.
|   It's fairly easy to create a situation where N calls of mount(2)
|   create exactly N bindings, with O(N^2) vfsmounts allocated and freed
|   in process.
|
|   Fortunately, it is possible to avoid those allocations/freeings.
|   The trick is to create copies in the right order and find which one
|   would've eventually become a master with the current algorithm.  It
|   turns out to be possible in O(nodes getting propagation) time and
|   with no extra allocations at all.
|
|   One part is that we need to make sure that eventual master will be
|   created before its slaves, so we need to walk the propagation tree
|   in a different order - by peer groups.  And iterate through the
|   peers before dealing with the next group.
|
|   Another thing is finding the (earlier) copy that will be a master of
|   one we are about to create; to do that we are (temporarily) marking
|   the masters of mountpoints we are attaching the copies to.
|
|   Either we are in a peer of the last mountpoint we'd dealt with, or
|   we have the following situation: we are attaching to mountpoint M,
|   the last copy S_0 had been attached to M_0 and there are sequences
|   S_0...S_n, M_0...M_n such that S_{i+1} is a master of S_{i}, S_{i}
|   is mounted on M_{i} and we need to create a slave of the first S_{k}
|   such that M is getting propagation from M_{k}.  It means that the
|   master of M_{k} will be among the sequence of masters of M.  On the
|   other hand, the nearest marked node in that sequence will either be
|   the master of M_{k} or the master of M_{k-1} (the latter - in case
|   M_{k-1} is a slave of something M gets propagation from, but in a
|   wrong peer group).
|
|   So we go through the sequence of masters of M until we find a marked
|   one (P).  Let N be the one before it.  Then we go through the
|   sequence of masters of S_0 until we find one (say, S) mounted on a
|   node D that has P as master and check if D is a peer of N.  If it
|   is, S will be the master of new copy, if not - the master of S will
|   be.
|
|   That's it for the hard part; the rest is fairly simple.  Iterator is
|   in next_group(), handling of one prospective mountpoint is
|   propagate_one().
|
|   It seems to survive all tests and gives a noticeably better
|   performance than the current mainline for setups that are seriously
|   using shared subtrees.
|
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch mnt_hash to hlist (Al Viro, 2014-03-31, 4 files, -50/+61)
|   fixes RCU bug - walking through hlist is safe in face of element
|   moves, since it's self-terminating.  Cyclic lists are not - if we
|   end up jumping to another hash chain, we'll loop infinitely without
|   ever hitting the original list head.
|
|   [fix for dumb braino folded]
|
|   Spotted by: Max Kellermann <mk@cm4all.com>
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
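
The termination argument in this commit message can be made concrete with a toy model. The following is a hedged Python sketch, not the kernel's list code: `Node`, `walk_from` and the two scenario functions are hypothetical names invented for this example, and the step bound exists only so the "infinite" walk can be observed safely. It shows a reader that is standing on an element when that element is moved to another chain.

```python
class Node:
    """Singly linked element; 'next' is None or another Node."""

    def __init__(self, name):
        self.name = name
        self.next = None


def walk_from(node, stop, max_steps=100):
    """Follow next pointers until reaching 'stop'.

    Returns the number of steps taken, or None if the walk is still
    going after max_steps (our stand-in for 'loops forever')."""
    steps = 0
    while node is not stop:
        if steps == max_steps:
            return None
        node = node.next
        steps += 1
    return steps


def cyclic_walk_after_move():
    # Two circular chains, each terminated by returning to its own head.
    head_a, a1, a2 = Node("head_a"), Node("a1"), Node("a2")
    head_b, b1 = Node("head_b"), Node("b1")
    head_a.next, a1.next, a2.next = a1, a2, head_a
    head_b.next, b1.next = b1, head_b
    # A reader is at a1 when a writer moves a1 to the front of chain B.
    head_a.next = a2
    a1.next = b1
    head_b.next = a1
    # The reader keeps looking for head_a, which chain B never reaches.
    return walk_from(a1, stop=head_a)


def hlist_walk_after_move():
    # Two None-terminated chains.
    a1, a2, b1 = Node("a1"), Node("a2"), Node("b1")
    a1.next, a2.next, b1.next = a2, None, None
    # Same move: a1 now sits in front of b1 on the other chain.
    a1.next = b1
    # The reader ends at None no matter which chain it wandered into.
    return walk_from(a1, stop=None)
```

In this model the reader on the None-terminated chain may still visit elements of the wrong chain, so a real lookup would have to notice that and retry; the sketch only demonstrates the termination property the message refers to.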
* don't bother with propagate_mnt() unless the target is shared (Al Viro, 2014-03-31, 1 file, -10/+7)
|   If the dest_mnt is not shared, propagate_mnt() does nothing - there
|   are no mounts to propagate to and thus no copies to create.  Might
|   as well not bother calling it in that case.
|
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* keep shadowed vfsmounts together (Al Viro, 2014-03-31, 1 file, -9/+23)
|   preparation for switching mnt_hash to hlist
|
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* resizable namespace.c hashes (Al Viro, 2014-03-31, 2 files, -24/+59)
|   * switch allocation to alloc_large_system_hash()
|   * make sizes overridable by boot parameters (mhash_entries=,
|     mphash_entries=)
|   * switch mountpoint_hashtable from list_head to hlist_head
|
|   Cc: stable@vger.kernel.org
|   Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (Linus Torvalds, 2014-03-29, 1 file, -1/+2)
|\
| |   Pull timer fix from Ingo Molnar:
| |    "A late breaking fix from John.  (The bug fixed has a hard lockup
| |     potential, but that was not observed, warnings were)"
| |
| |   * 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
| |     time: Revert to calling clock_was_set_delayed() while in irq context
| * time: Revert to calling clock_was_set_delayed() while in irq context (John Stultz, 2014-03-28, 1 file, -1/+2)
| |   In commit 47a1b796306356f35 ("tick/timekeeping: Call
| |   update_wall_time outside the jiffies lock"), we moved to calling
| |   clock_was_set() due to the fact that we were no longer holding the
| |   timekeeping or jiffies lock.
| |
| |   However, there is still the problem that clock_was_set() triggers
| |   an IPI, which cannot be done from the timer's hard irq context, and
| |   will generate WARN_ON warnings.
| |
| |   Apparently in my earlier testing, I'm guessing I didn't bump the
| |   dmesg log level, so I somehow missed the WARN_ONs.
| |
| |   Thus we need to revert back to calling clock_was_set_delayed().
| |
| |   Signed-off-by: John Stultz <john.stultz@linaro.org>
| |   Cc: Linus Torvalds <torvalds@linux-foundation.org>
| |   Link: http://lkml.kernel.org/r/1395963049-11923-1-git-send-email-john.stultz@linaro.org
| |   Signed-off-by: Ingo Molnar <mingo@kernel.org>
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client (Linus Torvalds, 2014-03-29, 1 file, -1/+0)
|\ \
| | |   Pull Ceph fix from Sage Weil:
| | |    "This drops a bad assert that a few users have been hitting but
| | |     we've only recently been able to track down"
| | |
| | |   * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
| | |     rbd: drop an unsafe assertion
| * | rbd: drop an unsafe assertion (Alex Elder, 2014-03-29, 1 file, -1/+0)
| | |   Olivier Bonvalet reported having repeated crashes due to a failed
| | |   assertion he was hitting in rbd_img_obj_callback():
| | |
| | |       Assertion failure in rbd_img_obj_callback() at line 2165:
| | |           rbd_assert(which >= img_request->next_completion);
| | |
| | |   With a lot of help from Olivier with reproducing the problem we
| | |   were able to determine the object and image requests had already
| | |   been completed (and often freed) at the point the assertion failed.
| | |
| | |   There was a great deal of discussion on the ceph-devel mailing list
| | |   about this.  The problem only arose when there were two (or more)
| | |   object requests in an image request, and the problem was always
| | |   seen when the second request was being completed.
| | |
| | |   The problem is due to a race in the window between setting the
| | |   "done" flag on an object request and checking the image request's
| | |   next completion value.  When the first object request completes, it
| | |   checks to see if its successor request is marked "done", and if so,
| | |   that request is also completed.  In the process, the image
| | |   request's next_completion value is updated to reflect that both the
| | |   first and second requests are completed.  By the time the second
| | |   request is able to check the next_completion value, it has been set
| | |   to a value *greater* than its own "which" value, which caused an
| | |   assertion to fail.
| | |
| | |   Fix this problem by skipping over any completion processing unless
| | |   the completing object request is the next one expected.  Test only
| | |   for inequality (not >=), and eliminate the bad assertion.
| | |
| | |   Tested-by: Olivier Bonvalet <ob@daevel.fr>
| | |   Signed-off-by: Alex Elder <elder@linaro.org>
| | |   Reviewed-by: Sage Weil <sage@inktank.com>
| | |   Reviewed-by: Ilya Dryomov <ilya.dryomov@inktank.com>
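
The interleaving described in this message can be replayed deterministically in a small model. This is a simplified Python sketch based only on the description above, not the rbd driver's code: `ImageRequest`, `mark_done` and `process_completion` are hypothetical names, and real concurrency is replaced by calling the steps in the problematic order.

```python
class ImageRequest:
    """Toy image request tracking a row of object requests."""

    def __init__(self, count):
        self.done = [False] * count   # per-object "done" flags
        self.next_completion = 0      # index of the next request to complete
        self.completed = []           # order in which requests were completed


def mark_done(img, which):
    """First half of a completion callback: set the "done" flag."""
    img.done[which] = True


def process_completion(img, which):
    """Second half, written the way the fix is described: do nothing
    unless this request is exactly the next one expected."""
    if which != img.next_completion:
        return
    # Complete this request and any successors already marked done.
    while (img.next_completion < len(img.done)
           and img.done[img.next_completion]):
        img.completed.append(img.next_completion)
        img.next_completion += 1


# Replay the race with two object requests.
img = ImageRequest(2)
mark_done(img, 1)            # request 1 sets its flag, then is delayed
mark_done(img, 0)
process_completion(img, 0)   # request 0 completes itself and request 1

# What the removed assertion would have evaluated for request 1 now:
old_assertion_holds = 1 >= img.next_completion

process_completion(img, 1)   # with the inequality test this is a no-op
```

After the replay `old_assertion_holds` is false, because `next_completion` has already moved past request 1, while the inequality test simply returns and each request is completed exactly once.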