Commit message  Author  Age  Files  Lines
* Merge branch 'bcache-for-3.15' of git://evilpiepirate.org/~kent/linux-bcache into for-3.15/driversJens Axboe2014-03-1818-784/+664
|\ | | | | | | | | | | | | | | | | Kent writes: Jens, here's the bcache changes for 3.15. Lots of bugfixes, and some refactoring and cleanups.
| * bcache: remove nested function usageJohn Sheu2014-03-182-72/+76
| | | | | | | | | | | | | | | | | | | | | | Uninlined nested functions can cause crashes when using ftrace, as they don't follow the normal calling convention and confuse the ftrace function graph tracer as it examines the stack. Also, nested functions are supported as a gcc extension, but may fail on other compilers (e.g. llvm). Signed-off-by: John Sheu <john.sheu@gmail.com>
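The refactoring described above can be pictured with a small user-space sketch. It is not taken from the bcache patch; `sum_ctx`, `add_if_above`, `for_each` and `sum_above` are hypothetical names. A gcc nested function would capture `threshold` and `total` from the enclosing scope; the portable form moves the callback to file scope and passes that state through an explicit context pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: holds what a nested function would have captured. */
struct sum_ctx {
	long threshold;
	long total;
};

/* Ordinary file-scope callback; its state arrives through `data`. */
static void add_if_above(long value, void *data)
{
	struct sum_ctx *ctx = data;

	if (value > ctx->threshold)
		ctx->total += value;
}

/* Generic iterator that calls fn on every element of v. */
static void for_each(const long *v, size_t n,
		     void (*fn)(long, void *), void *data)
{
	for (size_t i = 0; i < n; i++)
		fn(v[i], data);
}

/* Sum the elements of v that are strictly greater than threshold. */
long sum_above(const long *v, size_t n, long threshold)
{
	struct sum_ctx ctx = { .threshold = threshold, .total = 0 };

	assert(v != NULL || n == 0);
	for_each(v, n, add_if_above, &ctx);
	return ctx.total;
}
```

Because `add_if_above` is a normal function with the standard calling convention, this form should build with compilers that lack the nested-function extension.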
| * bcache: Kill bucket->gc_genKent Overstreet2014-03-184-11/+9
| | | | | | | | | | | | | | | | gc_gen was a temporary used to recalculate last_gc, but since we only need bucket->last_gc when gc isn't running (gc_mark_valid = 1), we can just update last_gc directly. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Kill unused freelistKent Overstreet2014-03-186-129/+112
| | | | | | | | | | | | | | | | This was originally added as an optimization that for various reasons isn't needed anymore, but it does add a lot of nasty corner cases (and it was responsible for some recently fixed bugs). Just get rid of it now. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Rework btree cache reserve handlingKent Overstreet2014-03-186-139/+145
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This changes the bucket allocation reserves to use _real_ reserves - separate freelists - instead of watermarks, which if nothing else makes the current code saner to reason about and is going to be important in the future when we add support for multiple btrees. It also adds btree_check_reserve(), which checks (and locks) the reserves for both bucket allocation and memory allocation for btree nodes; the old code just kinda sorta assumed that since (e.g. for btree node splits) it had the root locked and that meant no other threads could try to make use of the same reserve; this technically should have been ok for memory allocation (we should always have a reserve for memory allocation (the btree node cache is used as a reserve and we preallocate it)), but multiple btrees will mean that locking the root won't be sufficient anymore, and for the bucket allocation reserve it was technically possible for the old code to deadlock. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Kill btree_io_wqKent Overstreet2014-03-183-24/+2
| | | | | | | | | | | | | | | | With the locking rework in the last patch, this shouldn't be needed anymore - btree_node_write_work() only takes b->write_lock which is never held for very long. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: btree locking reworkKent Overstreet2014-03-184-52/+133
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new lock, b->write_lock, which is required to actually modify - or write - a btree node; this lock is only held for short durations. This means we can write out a btree node without taking b->lock, which _is_ held for long durations - solving a deadlock when btree_flush_write() (from the journalling code) is called with a btree node locked. Right now this just occurs in bch_btree_set_root(), but with an upcoming journalling rework is going to happen a lot more. This also turns b->lock into more of a read/intent lock instead of a read/write lock - but not completely, since it still blocks readers. May turn it into a real intent lock at some point in the future. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a race when freeing btree nodesKent Overstreet2014-03-181-33/+20
| | | | | | | | | | | | | | | | | | | | | | | | This isn't a bulletproof fix; btree_node_free() -> bch_bucket_free() puts the bucket on the unused freelist, where it can be reused right away without any ordering requirements. It would be better to wait on at least a journal write to go down before reusing the bucket. bch_btree_set_root() does this, and inserting into non leaf nodes is completely synchronous so we should be ok, but future patches are just going to get rid of the unused freelist - it was needed in the past for various reasons but shouldn't be anymore. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Add a real GC_MARK_RECLAIMABLEKent Overstreet2014-03-184-14/+21
| | | | | | | | | | | | | | This means the garbage collection code can better check for data and metadata pointers to the same buckets. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Add bch_keylist_init_single()Kent Overstreet2014-03-182-4/+7
| | | | | | | | | | | | | | This will potentially save us an allocation when we've got inode/dirent bkeys that don't fit in the keylist's inline keys. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Improve priority_statsKent Overstreet2014-03-181-6/+20
| | | | | | | | | | | | Break down data into clean data/dirty data/metadata. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Better alloc tracepointsKent Overstreet2014-03-183-19/+46
| | | | | | | | | | | | | | Change the invalidate tracepoint to indicate how much data we're invalidating, and change the alloc tracepoints to indicate what offset they're for. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Kill dead cgroup codeKent Overstreet2014-03-185-202/+0
| | | | | | | | | | | | This hasn't been used or even enabled in ages. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: stop moving_gc marking buckets that can't be moved.Nicholas Swenson2014-03-181-1/+4
| | | | | | | | Signed-off-by: Nicholas Swenson <nks@daterainc.com>
| * bcache: Fix moving_pred()Kent Overstreet2014-03-181-5/+3
| | | | | | | | | | | | Avoid a potential null pointer deref (e.g. from check keys for cache misses) Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix moving_gc deadlocking with a foreground writeNicholas Swenson2014-03-185-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Deadlock happened because a foreground write slept, waiting for a bucket to be allocated. Normally the gc would mark buckets available for invalidation. But the moving_gc was stuck waiting for outstanding writes to complete. These writes used the bcache_wq, the same queue foreground writes used. This fix gives moving_gc its own work queue, so it can still finish moving even if foreground writes are stuck waiting for allocation. It also makes work queue a parameter to the data_insert path, so moving_gc can use its workqueue for writes. Signed-off-by: Nicholas Swenson <nks@daterainc.com> Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix discard granularityKent Overstreet2014-03-181-0/+1
| | | | | | | | | | | | blk_stack_limits() doesn't like a discard granularity of 0. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix another bug recovering from unclean shutdownKent Overstreet2014-03-183-65/+36
| | | | | | | | | | | | | | | | | | | | | | The on disk bucket gens are allowed to be out of date, when we reuse buckets that didn't have any live data in them. To deal with this, the initial gc has to update the bucket gen when we find a pointer gen newer than the bucket's gen. Unfortunately we weren't doing this for pointers in the journal that we're about to replay. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a bug recovering from unclean shutdownKent Overstreet2014-03-181-2/+2
| | | | | | | | | | | | | | The code to fixup incorrect bucket prios incorrectly did not skip btree node freeing keys Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a journalling reclaim after recovery bugKent Overstreet2014-03-181-2/+8
| | | | | | | | | | | | | | | | On recovery we weren't correctly keeping track of what journal buckets had open journal entries, thus it was possible for them to be overwritten until we'd written all new journal entries. Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a null ptr deref in journal replayKent Overstreet2014-03-181-1/+5
| | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a lockdep splat in an error pathKent Overstreet2014-03-181-3/+5
| | | | | | | | Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix a shutdown bugKent Overstreet2014-02-263-2/+12
| | | | | | | | | | | | Shutdown wasn't cancelling/waiting on journal_write_work() Signed-off-by: Kent Overstreet <kmo@daterainc.com>
| * bcache: Fix flash_dev_cache_miss() for real this timeKent Overstreet2014-02-261-14/+5
| | | | | | | | | | | | | | The code was using sectors to count the number of sectors it was zeroing... but then it passed it to bio_advance()... after it had been set to 0. Amusing... Signed-off-by: Kent Overstreet <kmo@daterainc.com>
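The bug pattern described above, reusing a loop counter after it has been counted down to zero, can be shown with a user-space sketch. This is not the bcache code; `zero_prefix` and the 512-byte chunk size are assumptions chosen for illustration, and the return value stands in for the amount later handed to bio_advance().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Zero the first `bytes` bytes of buf in 512-byte chunks and return how
 * far the caller should advance afterwards.
 */
size_t zero_prefix(unsigned char *buf, size_t bytes)
{
	size_t remaining = bytes;	/* working copy; counts down to 0 */

	assert(buf != NULL || bytes == 0);

	while (remaining) {
		size_t chunk = remaining < 512 ? remaining : 512;

		memset(buf + (bytes - remaining), 0, chunk);
		remaining -= chunk;
	}

	/*
	 * Returning `remaining` here would reproduce the bug: it is
	 * always 0 once the loop ends. The untouched total is returned.
	 */
	return bytes;
}
```

Keeping the original count in its own variable and decrementing only a copy is one simple way to avoid the mistake.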
| * bcache: Fix another compiler warning on m68kKent Overstreet2014-02-181-2/+2
| | | | | | | | | | | | | | Use a bigger hammer this time Signed-off-by: Kent Overstreet <kmo@daterainc.com> Cc: linux-stable <stable@vger.kernel.org>
* | mtip32xx: mtip_async_complete() bug fixesSam Bradshaw2014-03-132-39/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes 2 issues in the fast completion path: 1) Possible double completions / double dma_unmap_sg() calls due to lack of atomicity in the check and subsequent dereference of the upper layer callback function. Fixed with cmpxchg before unmap and callback. 2) Regression in unaligned IO constraining workaround for p420m devices. Fixed by checking if IO is unaligned and using proper semaphore if so. Signed-off-by: Sam Bradshaw <sbradshaw@micron.com> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
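Issue 1 above is a check-then-use race: two completion paths can both see a non-NULL callback and both run it. A minimal user-space sketch of the cmpxchg idea, using C11 atomics instead of the kernel primitive, might look like this. `struct cmd`, `complete_once` and `count_done` are hypothetical names and do not come from the mtip32xx driver.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef void (*done_fn)(void *data, int status);

struct cmd {
	_Atomic(done_fn) done;	/* upper layer callback; NULL once claimed */
	void *data;
};

/* Demonstration callback: counts how often it has been invoked. */
void count_done(void *data, int status)
{
	(void)status;
	(*(int *)data)++;
}

/*
 * Claim the callback with a compare-and-swap so that only one caller can
 * run it. Returns 1 if this call ran the callback, 0 if it was already
 * claimed by someone else.
 */
int complete_once(struct cmd *c, int status)
{
	done_fn fn;

	assert(c != NULL);

	fn = atomic_load(&c->done);
	if (fn == NULL)
		return 0;

	/* Swap fn -> NULL; fails if another path claimed it first. */
	if (!atomic_compare_exchange_strong(&c->done, &fn, (done_fn)0))
		return 0;

	/* A driver would unmap its DMA segments here, then call back. */
	fn(c->data, status);
	return 1;
}
```

Only the caller whose compare-and-swap succeeds proceeds to the unmap and callback step, so a second completion of the same command becomes a no-op.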
* | mtip32xx: Unmap the DMA segments before completing the IO requestFelipe Franciosi2014-03-131-12/+12
| | | | | | | | | | | | | | | | | | If the buffers are unmapped after completing a request, then stale data might be in the request. Signed-off-by: Felipe Franciosi <felipe@paradoxo.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | mtip32xx: Set queue bounce limitFelipe Franciosi2014-03-131-0/+1
| | | | | | | | | | | | | | | | | | We need to set the queue bounce limit during the device initialization to prevent excessive bouncing on 32 bit architectures. Signed-off-by: Felipe Franciosi <felipe@paradoxo.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | nvme: Use pci_enable_msi_range() and pci_enable_msix_range()Alexander Gordeev2014-03-131-24/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() or pci_enable_msi_exact() and pci_enable_msix_range() or pci_enable_msix_exact() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Keith Busch <keith.busch@intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-nvme@lists.infradead.org Cc: linux-pci@vger.kernel.org Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | cciss: Fallback to MSI rather than to INTx if MSI-X failedAlexander Gordeev2014-03-131-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the driver falls back to INTx mode when MSI-X initialization failed. This is a suboptimal behaviour for chips that also support MSI. This update changes that behaviour and falls back to MSI mode in case MSI-X mode initialization failed. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Mike Miller <mike.miller@hp.com> Cc: iss_storagedev@hp.com Cc: Jens Axboe <axboe@kernel.dk> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | swim3: fix interruptible_sleep_on raceArnd Bergmann2014-03-131-7/+11
| | | | | | | | | | | | | | | | | | | | | | | | interruptible_sleep_on is racy and going away. This replaces the one caller in the swim3 driver with the equivalent race-free wait_event_interruptible call. Since we're here already, this also fixes the case where we get interrupted from atomic context, which used to just spin in the loop. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
* | ataflop: fix sleep_on racesArnd Bergmann2014-03-131-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sleep_on() is inherently racy, and has been deprecated for a long time. This fixes two instances in the atari floppy driver: * fdc_wait/fdc_busy becomes an open-coded mutex. We cannot use the regular mutex since it gets released in interrupt context. The open-coded version using wait_event() and cmpxchg() is equivalent to the existing code but does the checks atomically, and we can now safely check the condition with irqs enabled. * format_wait becomes a completion, which is the natural structure here. The format ioctl waits for the background task to either complete or abort. This does not attempt to fix the preexisting bug of calling schedule with local interrupts disabled. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Michael Schmitz <schmitz@biophys.uni-duesseldorf.de> Signed-off-by: Jens Axboe <axboe@fb.com>
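The open-coded mutex mentioned in the first item above can be approximated in user space as follows. This is only a sketch under stated assumptions: C11 atomics stand in for the kernel's cmpxchg(), sched_yield() stands in for sleeping in wait_event(), and the names `fdc_busy_flag`, `fdc_try_claim`, `fdc_claim` and `fdc_release` are hypothetical rather than taken from the ataflop driver.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>

/* 0 = controller free, 1 = controller claimed. */
static atomic_int fdc_busy_flag;

/* Atomically test for "free" and mark "claimed" in one step. */
int fdc_try_claim(void)
{
	int expected = 0;

	return atomic_compare_exchange_strong(&fdc_busy_flag, &expected, 1);
}

/* Block until the claim succeeds. */
void fdc_claim(void)
{
	/* Kernel code would sleep in wait_event() instead of yielding. */
	while (!fdc_try_claim())
		sched_yield();
}

/* Release the claim; may be called from a different context than claim. */
void fdc_release(void)
{
	/* Releasing something that is not held would be a caller bug. */
	assert(atomic_load(&fdc_busy_flag) == 1);
	atomic_store(&fdc_busy_flag, 0);
	/* Kernel code would wake_up() any waiters here. */
}
```

The point of the pattern is that the test and the claim happen in a single atomic operation, so there is no window between checking the flag and setting it.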
* | DAC960: remove sleep_on usageArnd Bergmann2014-03-131-18/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sleep_on and its variants are going away. The use of sleep_on() in DAC960_V2_ExecuteUserCommand seems to be bogus because by the time we get there, the command has completed already and we just enter the timeout. Based on this interpretation, I concluded that we can replace it with a simple msleep(1000) and rearrange the code around it slightly. The interruptible_sleep_on_timeout in DAC960_gam_ioctl seems equivalent to the race-free version using wait_event_interruptible_timeout. I left the driver to return -EINTR rather than -ERESTARTSYS to preserve the timeout behavior. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Jens Axboe <axboe@fb.com>
* | mtip32xx: Use pci_enable_msi() instead of pci_enable_msi_range()Alexander Gordeev2014-03-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit "mtip32xx: Use pci_enable_msix_range() instead of pci_enable_msix()" was unnecessary, since pci_enable_msi() function is not deprecated and is still preferable for enabling the single MSI mode. This update reverts usage of pci_enable_msi() function. Besides, the changelog for that commit was bogus, since mtip32xx driver uses MSI interrupt, not MSI-X. Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | skd: Use pci_enable_msix_range() instead of pci_enable_msix()Alexander Gordeev2014-02-221-26/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As a result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() and pci_enable_msix_range() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | skd: Use unified access to skdev->msix_entries throughout the codeAlexander Gordeev2014-02-221-2/+1
| | | | | | | | | | | | | | | | | | Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | skd: Fix incomplete cleanup of MSI-X interruptAlexander Gordeev2014-02-221-24/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When enabling MSI-X interrupts fails due to lack of memory the call to pci_disable_msix() is missed and the device is left with MSI-X interrupts enabled while the driver assumes otherwise. This update fixes the described misbehaviour and cleans up the code of skd_release_msix() function. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | skd: Fix out of array boundary accessAlexander Gordeev2014-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When enabling MSI-X, interrupts are requested for SKD_MAX_MSIX_COUNT entries in skdev->msix_entries array, while the number of actually allocated entries is skdev->msix_count. This might lead to an out of boundary access in case number of allocated entries is less than SKD_MAX_MSIX_COUNT. This update fixes the described misbehaviour. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Kyungmin Park <kyungmin.park@samsung.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
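The out-of-bounds access described above comes from looping to the compile-time maximum instead of the number of entries that were actually allocated. A user-space sketch of the corrected loop bound, assuming hypothetical names (`MAX_MSIX_COUNT`, `struct msix_dev`, `msix_dev_init`, `request_irqs`) that are not from the skd driver:

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_MSIX_COUNT 13	/* hypothetical stand-in for SKD_MAX_MSIX_COUNT */

struct msix_dev {
	size_t msix_count;	/* entries actually allocated */
	int *entries;		/* msix_count slots; 1 once an IRQ is requested */
};

/* Allocate `granted` entries; returns 0 on success, -1 on failure. */
int msix_dev_init(struct msix_dev *dev, size_t granted)
{
	assert(granted > 0 && granted <= MAX_MSIX_COUNT);

	dev->entries = calloc(granted, sizeof(*dev->entries));
	if (!dev->entries)
		return -1;
	dev->msix_count = granted;
	return 0;
}

/* Request an IRQ per allocated entry; returns how many were requested. */
size_t request_irqs(struct msix_dev *dev)
{
	/*
	 * Bound the loop by msix_count. Using MAX_MSIX_COUNT here would
	 * write past the end of entries[] whenever fewer were granted.
	 */
	for (size_t i = 0; i < dev->msix_count; i++)
		dev->entries[i] = 1;

	return dev->msix_count;
}
```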
* | mtip32xx: Use pci_enable_msix_range() instead of pci_enable_msix()Alexander Gordeev2014-02-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As a result of the deprecation of the MSI-X/MSI enablement functions pci_enable_msix() and pci_enable_msi_block(), all drivers using these two interfaces need to be updated to use the new pci_enable_msi_range() and pci_enable_msix_range() interfaces. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | mtip32xx: Remove superfluous call to pci_disable_msi()Alexander Gordeev2014-02-221-1/+3
| | | | | | | | | | | | | | | | | | | | | | There is no need to call pci_disable_msi() in case the previous call to pci_enable_msi() failed Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Asai Thambi S P <asamymuthupa@micron.com> Cc: linux-pci@vger.kernel.org Signed-off-by: Jens Axboe <axboe@fb.com>
* | drbd: Fix future possible NULL pointer dereferenceAndreas Gruenbacher2014-02-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now every resource has exactly one connection. But we are preparing for dynamic connections. I.e. in the future there can be resources without connections. However smatch points this out as 'variable dereferenced before check', which is correct. This issue was introduced in drbd: get_one_status(): Iterate over resource->devices instead of connection->peer_devices Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* | drbd: Add drbd_thread->resource and make drbd_thread->connection optionalAndreas Gruenbacher2014-02-172-19/+31
| | | | | | | | | | | | | | | | | | In the drbd_thread "infrastructure" functions, only use the resource instead of the connection. Make the connection field of drbd_thread optional. This will allow introducing threads which are not associated with a connection. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: Use the right peer deviceAndreas Gruenbacher2014-02-171-31/+38
| | | | | | | | | | | | | | in w_e_ (peer request) callbacks and in peer request I/O completion handlers Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: Remove unused parameter of wire_flags_to_bio()Andreas Gruenbacher2014-02-171-2/+2
| | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: Get rid of first_peer_device() in handle_write_conflicts()Andreas Gruenbacher2014-02-171-5/+3
| | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: In the worker thread, process drbd_work instead of drbd_device_work itemsAndreas Gruenbacher2014-02-171-7/+7
| | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: Turn w_make_ov_request and make_resync_request into "normal" functionsAndreas Gruenbacher2014-02-171-9/+6
| | | | | | | | | | | | | | These functions are not used as drbd_work callbacks. Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: Make w_make_resync_request() staticAndreas Gruenbacher2014-02-172-3/+2
| | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: struct drbd_peer_request: Use drbd_work instead of drbd_device_workAndreas Gruenbacher2014-02-173-75/+69
| | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
* | drbd: struct after_conn_state_chg_work: Use drbd_work instead of drbd_device_workAndreas Gruenbacher2014-02-171-4/+4
| | | | | | | | | | | | | | Signed-off-by: Andreas Gruenbacher <agruen@linbit.com> Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>