Commit message | Author | Age | Files | Lines
* NVMe: Add the nvme thread to the wait queue before waking it up | Matthew Wilcox | 2011-11-04 | 1 | -0/+2
  If the I/O was not completed by a single NVMe command, we add the bio to the congestion list and wake up the kthread to resubmit it. But the kthread calls remove_wait_queue() unconditionally, which will oops if it's not on the wait queue. So add the kthread to the wait queue before waking it up.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Return real error from nvme_create_queue | Matthew Wilcox | 2011-11-04 | 1 | -4/+4
  nvme_setup_io_queues() was assuming that a NULL return from nvme_create_queue() was an out-of-memory error. That's not necessarily true; the adapter might return -EIO, for example. Change the calling convention to return an ERR_PTR on failure instead of NULL.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
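The ERR_PTR convention encodes a negative errno in the returned pointer itself, so a caller can tell *which* error occurred instead of assuming that NULL means -ENOMEM. Below is a userspace sketch that mirrors the semantics of the kernel's `<linux/err.h>` helpers (errno values up to 4095 live in the top page of the address space). `create_queue()` is a hypothetical stand-in for `nvme_create_queue()`, not the driver's code.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Userspace re-creation of the kernel's ERR_PTR helpers: the highest
 * MAX_ERRNO addresses are never valid pointers, so they can carry an errno. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)(intptr_t)error; }
static inline long PTR_ERR(const void *ptr) { return (long)(intptr_t)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct queue { int qid; };

/* Hypothetical stand-in for nvme_create_queue(): returns a queue or an
 * ERR_PTR describing why creation failed, never NULL. */
struct queue *create_queue(int qid, int adapter_error)
{
	struct queue *q;

	assert(qid > 0);
	if (adapter_error)
		return ERR_PTR(-EIO);		/* device rejected the command */
	q = malloc(sizeof(*q));
	if (!q)
		return ERR_PTR(-ENOMEM);	/* genuine allocation failure */
	q->qid = qid;
	return q;
}
```

A caller then propagates the real error with `if (IS_ERR(q)) return PTR_ERR(q);` rather than hard-coding -ENOMEM.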
* NVMe: Version 0.6 | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add a few calling convention notes | Matthew Wilcox | 2011-11-04 | 1 | -1/+10
  For the benefit of reviewers, add comments to a few functions describing their calling context.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Handle failures from memory allocations in nvme_setup_prps | Matthew Wilcox | 2011-11-04 | 1 | -15/+41
  If any of the memory allocations in nvme_setup_prps fail, handle it by modifying the passed-in data length to reflect the number of bytes we are actually able to send. Also allow the caller to specify the GFP flags they need; for user-initiated commands, we can use GFP_KERNEL allocations.
  The various callers are updated to handle this possibility; the main I/O path is already prepared for this possibility (as it may happen due to nvme_map_bio being unable to map all the segments of the I/O). The other callers return -ENOMEM instead of doing partial I/Os.
  Reported-by: Andi Kleen <andi@firstfloor.org>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Use an IDA to allocate minor numbers | Matthew Wilcox | 2011-11-04 | 1 | -4/+34
  The current approach of using the namespace ID as the minor number doesn't work when there are multiple adapters in the machine. Rather than statically partitioning the number of namespaces between adapters, dynamically allocate minor numbers to namespaces as they are detected.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
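An IDA hands out the smallest unused integer ID and lets released IDs be reused. The sketch below imitates that behaviour in userspace with a fixed-size bitmap. The names `minor_alloc()`/`minor_free()` and the limit of 64 minors are illustrative assumptions. This is not the kernel's IDA API.

```c
#include <assert.h>

#define MAX_MINORS 64	/* assumed limit for this sketch */

/* One bit per minor number: a set bit means the minor is in use. */
static unsigned long long minor_bitmap;

/* Return the lowest free minor, or -1 if every minor is taken. */
int minor_alloc(void)
{
	for (int i = 0; i < MAX_MINORS; i++) {
		if (!(minor_bitmap & (1ULL << i))) {
			minor_bitmap |= 1ULL << i;
			return i;
		}
	}
	return -1;
}

/* Return a minor to the pool so a later namespace can reuse it. */
void minor_free(int minor)
{
	assert(minor >= 0 && minor < MAX_MINORS);
	minor_bitmap &= ~(1ULL << minor);
}
```

Because minors are assigned as namespaces are discovered, namespace 1 on two different adapters receives two distinct minors instead of colliding.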
* NVMe: Add include of delay.h for msleep | Matthew Wilcox | 2011-11-04 | 1 | -0/+1
  Previously it was being implicitly included through some other header file.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add support for timing out I/Os | Matthew Wilcox | 2011-11-04 | 1 | -6/+31
  In the kthread, walk the list of outstanding I/Os and check they've not hit the timeout.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
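Checking for timed-out I/Os means comparing each outstanding command's recorded deadline with the current tick count. The sketch below uses a wraparound-safe comparison written in the style of the kernel's `time_after()` macro. `struct cmd_info` and the flat array walk are simplified assumptions standing in for the driver's per-queue command bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

/* Same idea as the kernel's time_after(a, b): true if a is later than b,
 * even after the unsigned tick counter has wrapped around. */
#define time_after(a, b) ((long)((b) - (a)) < 0)

struct cmd_info {
	unsigned long timeout;	/* tick at which the command expires */
	int active;		/* nonzero while the command is outstanding */
};

/* Deactivate and count outstanding commands whose deadline has passed. */
int expire_commands(struct cmd_info *cmds, size_t n, unsigned long now)
{
	int expired = 0;

	for (size_t i = 0; i < n; i++) {
		if (!cmds[i].active)
			continue;
		if (time_after(now, cmds[i].timeout)) {
			cmds[i].active = 0;	/* a driver would cancel the cmdid here */
			expired++;
		}
	}
	return expired;
}
```

The signed-difference trick matters because a deadline recorded just before the counter wraps must not be mistaken for one far in the past.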
* NVMe: Rename cancel_cmdid_data to cancel_cmdid | Matthew Wilcox | 2011-11-04 | 1 | -2/+5
  The trailing '_data' on the end was annoying and inconsistent. Also, make it actually return the data since this is needed for timing out commands.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Fix bug in error handling | Matthew Wilcox | 2011-11-04 | 1 | -2/+2
  When an I/O completed with an error, we would call bio_endio twice (once with -EIO and once with 0). Found by inspection.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Time out initialisation after a few seconds | Matthew Wilcox | 2011-11-04 | 2 | -0/+12
  The device reports (in its capability register) how long it will take to initialise. If that time elapses before the ready bit becomes set, conclude the device is broken and refuse to initialise it. Log a nice error message so the user knows why we did nothing.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
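In the NVMe 1.0 specification, the Timeout field (CAP.TO) occupies bits 31:24 of the capability register and counts in 500 ms units. The sketch below derives the wait budget from CAP and polls CSTS.RDY within it. The register accessor is a hypothetical callback, and `struct fake_ctrl` is a simulated device used only so the loop can be exercised. Time is simulated by counting polls, whereas a driver would `msleep()` between reads.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CSTS_RDY 0x1u

/* CAP.TO lives in bits 31:24 and counts in 500 ms units (NVMe 1.0). */
static unsigned int cap_timeout_ms(uint64_t cap)
{
	return (unsigned int)((cap >> 24) & 0xff) * 500;
}

/* Hypothetical register accessor: returns the controller's CSTS value. */
typedef uint32_t (*read_csts_fn)(void *ctx);

/* Simulated controller that becomes ready after a fixed number of reads. */
struct fake_ctrl { unsigned int reads_until_ready; };

uint32_t fake_read_csts(void *ctx)
{
	struct fake_ctrl *c = ctx;

	if (c->reads_until_ready == 0)
		return CSTS_RDY;
	c->reads_until_ready--;
	return 0;
}

/* Poll every poll_ms until CSTS.RDY is set or the CAP.TO budget is spent.
 * Returns 0 when ready, -1 (after logging) when the device looks broken. */
int wait_ready(uint64_t cap, read_csts_fn read_csts, void *ctx,
	       unsigned int poll_ms)
{
	unsigned int limit = cap_timeout_ms(cap);

	assert(poll_ms > 0);
	for (unsigned int elapsed = 0; elapsed <= limit; elapsed += poll_ms) {
		if (read_csts(ctx) & CSTS_RDY)
			return 0;
	}
	fprintf(stderr, "Device not ready; aborting initialisation\n");
	return -1;
}
```

For example, a CAP value with TO = 10 gives a 5000 ms budget.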
* NVMe: Fix warning in free_irq | Matthew Wilcox | 2011-11-04 | 1 | -1/+3
  We need to clear the affinity mask before calling free_irq().
  Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Correct the Controller Configuration settings | Matthew Wilcox | 2011-11-04 | 2 | -4/+7
  The arbitration field was extended by one bit, shifting the shutdown notification bits by one. Also, the SQ/CQ entry size was made configurable for future extensions.
  Reported-by: Paul Luse <paul.e.luse@intel.com>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
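For reference, the NVMe 1.0 Controller Configuration (CC) register places EN in bit 0, CSS in bits 6:4, MPS in bits 10:7, the 3-bit arbitration field AMS in bits 13:11, SHN in bits 15:14, IOSQES in bits 19:16 and IOCQES in bits 23:20. The macros below sketch that layout. Their names are illustrative, loosely modelled on the driver's `NVME_CC_*` style. `make_cc()` composes CC for 64-byte submission entries and 16-byte completion entries.

```c
#include <assert.h>
#include <stdint.h>

#define CC_ENABLE	(1u << 0)
#define CC_CSS_NVM	(0u << 4)	/* NVM command set */
#define CC_MPS_SHIFT	7		/* memory page size = 2^(12 + MPS) */
#define CC_ARB_RR	(0u << 11)	/* AMS, bits 13:11: round robin */
#define CC_SHN_NONE	(0u << 14)	/* SHN, bits 15:14 */
#define CC_SHN_NORMAL	(1u << 14)
#define CC_SHN_ABRUPT	(2u << 14)
#define CC_IOSQES	(6u << 16)	/* SQ entry = 2^6 = 64 bytes */
#define CC_IOCQES	(4u << 20)	/* CQ entry = 2^4 = 16 bytes */

/* Build CC for a host page size of 2^page_shift bytes (12 = 4 KiB). */
uint32_t make_cc(unsigned int page_shift)
{
	assert(page_shift >= 12 && page_shift <= 27);	/* MPS is 4 bits */
	return CC_ENABLE | CC_CSS_NVM | ((page_shift - 12) << CC_MPS_SHIFT) |
	       CC_ARB_RR | CC_SHN_NONE | CC_IOSQES | CC_IOCQES;
}
```

With 4 KiB pages this yields 0x00460001. A normal shutdown request sets 0x4000 (bit 14), which is exactly where a 2-bit arbitration field would have placed the value one bit lower.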
* NVMe: Version 0.5 | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Change the definition of nvme_user_io | Matthew Wilcox | 2011-11-04 | 2 | -15/+20
  The read and write commands don't define a 'result', so there's no need to copy it back to userspace.
  Remove the ability of the ioctl to submit commands to a different namespace; it's just asking for trouble, and the use case I have in mind will be addressed through a different ioctl in the future. That removes the need for both the block_shift and nsid arguments.
  Check that the opcode is one of 'read' or 'write'. More opcodes may be added later, but they will need a different structure definition.
  The nblocks field is redefined to be 0-based. This allows the user to request the full 65536 blocks.
  Don't byteswap the reftag, apptag and appmask. Martin Petersen tells me these are calculated in big-endian and are transmitted to the device in big-endian.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
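With a 16-bit, 0-based count, the stored value is one less than the number of blocks. The field therefore covers 1 to 65536 blocks instead of 0 to 65535. A minimal sketch of the conversion follows. The helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Encode a block count (1..65536) into a 0-based 16-bit field. */
uint16_t nblocks_encode(uint32_t count)
{
	assert(count >= 1 && count <= 65536);
	return (uint16_t)(count - 1);
}

/* Decode the 0-based field back into a block count. */
uint32_t nblocks_decode(uint16_t field)
{
	return (uint32_t)field + 1;
}
```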
* NVMe: Correct the definitions of two ioctls | Matthew Wilcox | 2011-11-04 | 1 | -2/+2
  NVME_IOCTL_SUBMIT_IO has a struct nvme_user_io, not a struct nvme_rw_command as a parameter, and NVME_IOCTL_DOWNLOAD_FW is a Write, not a Read.
  Reported-by: Arnd Bergmann <arnd@arndb.de>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add compat_ioctl | Matthew Wilcox | 2011-11-04 | 1 | -0/+1
  Make ioctls work for 32-bit applications on 64-bit kernels. The structures are defined to be the same for both 32- and 64-bit applications, so we can use the same handler for both.
  Reported-by: Arnd Bergmann <arnd@arndb.de>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Simplify queue lookup | Matthew Wilcox | 2011-11-04 | 1 | -6/+6
  Fill in all the num_possible_cpus() entries with duplicate pointers. This reduces the complexity of the frequently-called get_nvmeq(), as well as avoiding a bug in it when there are fewer queues than CPUs.
  Reported-by: Shane Michael Matthews <shane.matthews@intel.com>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
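Once every per-CPU slot is filled, the hot-path lookup becomes a plain array index with no range check. The sketch below assumes a round-robin assignment of CPUs to I/O queues. The commit does not say which queue the extra slots point to, and the structure and function names here are illustrative.

```c
#include <assert.h>

#define MAX_CPUS 8

struct io_queue { int qid; };

struct dev {
	struct io_queue queues[MAX_CPUS];	/* storage for the real I/O queues */
	struct io_queue *per_cpu[MAX_CPUS];	/* one pointer per possible CPU */
};

/* Point every possible CPU at some I/O queue, sharing queues when there
 * are fewer queues than CPUs (round-robin is an assumption). */
void map_queues(struct dev *dev, int nr_cpus, int nr_io_queues)
{
	assert(nr_cpus <= MAX_CPUS);
	assert(nr_io_queues >= 1 && nr_io_queues <= nr_cpus);
	for (int i = 0; i < nr_io_queues; i++)
		dev->queues[i].qid = i + 1;	/* qid 0 is the admin queue */
	for (int cpu = 0; cpu < nr_cpus; cpu++)
		dev->per_cpu[cpu] = &dev->queues[cpu % nr_io_queues];
}

/* Hot path: no bounds juggling, just an index into a fully populated table. */
struct io_queue *get_queue(struct dev *dev, int cpu)
{
	return dev->per_cpu[cpu];
}
```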
* NVMe: Remove the kthread from the wait queue | Matthew Wilcox | 2011-11-04 | 1 | -0/+3
  Once there are no more bios on the congestion list, we can stop waking up the nvme kthread every time a completion happens.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Fix off-by-one when filling in PRP lists | Matthew Wilcox | 2011-11-04 | 1 | -3/+4
  If the last element in the PRP list fits on the end of the page, there's no need to allocate an extra page to put that single element in. It can fit on the end of the page.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
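A minimal sketch of the page-counting logic, assuming 4 KiB list pages holding 512 eight-byte PRP entries. On every page except the final one, the last slot holds a pointer to the next page. The `n > 1` test is what avoids chaining to a new page when only the final entry remains. `prp_list_pages()` is a hypothetical helper, not a driver function.

```c
#include <assert.h>

#define PAGE_SIZE	4096
#define PRPS_PER_PAGE	(PAGE_SIZE / 8)	/* 512 eight-byte PRP entries */

/* Simulate filling a PRP list of n entries and return the pages used. */
int prp_list_pages(int n)
{
	int pages = 1;
	int slots = PRPS_PER_PAGE;	/* free slots left on the current page */

	assert(n >= 1);
	while (n > 0) {
		if (slots == 1 && n > 1) {
			/* last slot becomes a chain pointer to a fresh page */
			pages++;
			slots = PRPS_PER_PAGE;
		}
		slots--;	/* store one PRP entry */
		n--;
	}
	return pages;
}
```

So 512 entries fit on one page, and 513 entries need a second page (511 data entries plus a chain pointer on the first page).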
* NVMe: Fix interpretation of 'Number of Namespaces' field | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  The spec says this is a 0's based value. We don't need to handle the maximal value because it's reserved to mean "every namespace".
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Remove outdated comments | Matthew Wilcox | 2011-11-04 | 2 | -2/+0
  The head can never overrun the tail since we won't allocate enough command IDs to let that happen. The status codes are in sync with the spec.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Fix comment formatting | Matthew Wilcox | 2011-11-04 | 1 | -2/+4
  Reported-by: Randy Dunlap <rdunlap@xenotime.net>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Convert comments to kernel-doc notation | Matthew Wilcox | 2011-11-04 | 1 | -5/+5
  Reported-by: Randy Dunlap <rdunlap@xenotime.net>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update admin opcodes to match the 1.0RC spec | Krzysztof Wierzbicki | 2011-11-04 | 1 | -7/+7
  Signed-off-by: Krzysztof Wierzbicki <krzysztof.wierzbicki@intel.com>
  Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Version 0.4 | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Reduce maximum queue depth by 1 | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  The spec says we're not allowed to completely fill the submission queue. Solve this by reducing the number of allocatable cmdids by 1.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
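The reason a queue can never be completely full: when a circular queue is described only by head and tail indices, head == tail has to mean "empty". The queue therefore counts as full when advancing the tail would make it equal to the head, so at most depth - 1 entries can be outstanding. A minimal sketch follows. The function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* head == tail means empty, so "full" is reached one slot early. */
int sq_full(uint16_t head, uint16_t tail, uint16_t depth)
{
	assert(depth >= 2 && head < depth && tail < depth);
	return (uint16_t)((tail + 1) % depth) == head;
}

/* Number of command IDs that can safely be handed out. */
unsigned int usable_slots(uint16_t depth)
{
	return depth - 1u;
}
```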
* NVMe: Fix discontiguous accesses | Matthew Wilcox | 2011-11-04 | 1 | -0/+2
  When we submit subsequent portions of the I/O, we need to access the updated block, not start reading again from the original position. This was showing up as miscompares in the XFS randholes testcase.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Handle bios that contain non-virtually contiguous addresses | Matthew Wilcox | 2011-11-04 | 1 | -9/+29
  NVMe scatterlists must be virtually contiguous, like almost all I/Os. However, when the filesystem lays out files with a hole, it can be that adjacent LBAs map to non-adjacent virtual addresses. Handle this by submitting one NVMe command at a time for each virtually discontiguous range.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
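One possible way to express "virtually contiguous" in PRP terms: every segment after the first must start at page offset 0, and every segment before the last must end exactly on a page boundary. The sketch below counts how many commands a list of segments would need under that rule, starting a new command at each gap. It assumes 4 KiB pages, and `struct seg` is a hypothetical stand-in for a bio_vec. The split rule is an illustration, not the driver's exact test.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u

struct seg {
	unsigned int offset;	/* byte offset within its page */
	unsigned int len;	/* length in bytes */
};

/* Nonzero if next cannot follow prev in the same command's PRP layout. */
static int virt_gap(const struct seg *prev, const struct seg *next)
{
	return ((prev->offset + prev->len) % PAGE_SIZE) != 0 || next->offset != 0;
}

/* Count the NVMe commands needed: one more at every virtual gap. */
int commands_needed(const struct seg *segs, size_t n)
{
	int cmds = n ? 1 : 0;

	for (size_t i = 1; i < n; i++)
		if (virt_gap(&segs[i - 1], &segs[i]))
			cmds++;
	return cmds;
}
```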
* NVMe: Implement Flush | Matthew Wilcox | 2011-11-04 | 1 | -0/+40
  Linux implements Flush as a bit in the bio. That means there may also be data associated with the flush; if so, the flush should be sent before the data. To avoid completing the bio twice, I add CMD_CTX_FLUSH to indicate the completion routine should do nothing.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Mark CMD_CTX_CANCELLED as being unlikely | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Correct SQ doorbell semantics | Matthew Wilcox | 2011-11-04 | 1 | -2/+2
  The value written to the doorbell needs to be the first free index in the queue, not the most recently used index in the queue.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
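A sketch of the corrected doorbell ordering: copy the command into the slot at the current tail, advance the tail (wrapping at the queue depth), and write the *new* tail, the first free index, to the doorbell. In this userspace model, the `doorbell` field stands in for the MMIO register a driver would `writel()`. The queue-full check is omitted for brevity, and all names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQ_DEPTH 4

struct command { uint8_t bytes[64]; };	/* NVMe SQ entries are 64 bytes */

struct sq {
	struct command entries[SQ_DEPTH];
	uint16_t tail;		/* first free slot */
	uint32_t doorbell;	/* stands in for the MMIO doorbell register */
};

/* Queue one command and ring the doorbell with the next free index. */
void submit_cmd(struct sq *q, const struct command *cmd)
{
	assert(q->tail < SQ_DEPTH);
	memcpy(&q->entries[q->tail], cmd, sizeof(*cmd));
	if (++q->tail == SQ_DEPTH)
		q->tail = 0;
	q->doorbell = q->tail;	/* not the slot we just filled */
}
```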
* NVMe: Let the kthread take care of devices earlier | Matthew Wilcox | 2011-11-04 | 1 | -4/+10
  If interrupts are misconfigured, the kthread will be needed to process admin queue completions.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Rename nr_queues to nr_io_queues | Matthew Wilcox | 2011-11-04 | 1 | -11/+12
  I got confused about whether this included the admin queue or not, and had to resort to reading the spec. It doesn't include the admin queue, so make that clear in the name.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Remove setting of 'flags' in rw command | Matthew Wilcox | 2011-11-04 | 1 | -1/+0
  This was the data transfer bit until spec rev 0.92.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Release 0.3 | Matthew Wilcox | 2011-11-04 | 1 | -1/+1
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Add a kthread to handle the congestion list | Matthew Wilcox | 2011-11-04 | 1 | -19/+67
  Instead of trying to resubmit I/Os in the I/O completion path (in interrupt context), wake up a kthread which will resubmit I/O from user context. This allows mke2fs to run to completion.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Handle failures differently in nvme_submit_bio_queue() | Matthew Wilcox | 2011-11-04 | 1 | -19/+17
  Return -EBUSY if the queue is full or -ENOMEM if we failed to allocate memory (or map a scatterlist). Also use GFP_ATOMIC to allocate the nvme_bio and move the locking to the callers of nvme_submit_bio_queue().
  In nvme_make_request(), don't permit an I/O to jump the queue: if the congestion list already has an entry, just add to the tail, rather than trying to submit.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Update BAR structure to match the current spec | Matthew Wilcox | 2011-11-04 | 1 | -2/+4
  Add two reserved registers in the middle of the BAR to match the 1.0 spec plus ECN 0002.
  Also rename IMC and ISC to INTMC and INTSC to conform with the spec. We still don't need to use them :-)
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Handle physical merging of bvec entries | Matthew Wilcox | 2011-11-04 | 1 | -9/+15
  In order to not overrun the sg array, we have to merge physically contiguous pages into a single sg entry.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
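Why merging is needed: the sg array is sized for the bio's count of physical segments, so physically adjacent bvecs must share one entry or the driver writes past the end of the array. Below is a sketch of the in-place merge. `struct phys_seg` is a hypothetical stand-in for bio_vec/scatterlist, and any maximum-segment-size limit is ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct phys_seg {
	uint64_t addr;	/* physical address */
	uint32_t len;	/* length in bytes */
};

/* Collapse physically adjacent segments in place and return the new count. */
size_t merge_segments(struct phys_seg *segs, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0 && segs[out - 1].addr + segs[out - 1].len == segs[i].addr)
			segs[out - 1].len += segs[i].len;	/* extend previous entry */
		else
			segs[out++] = segs[i];			/* start a new entry */
	}
	return out;
}
```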
* NVMe: Check for DMA mapping failure | Matthew Wilcox | 2011-11-04 | 1 | -1/+7
  If dma_map_sg returns 0 (failure), we need to fail the I/O.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Pass the nvme_dev to nvme_free_prps and nvme_setup_prps | Matthew Wilcox | 2011-11-04 | 1 | -13/+11
  We were passing the nvme_queue to access the q_dmadev for the dma_alloc_coherent calls, but since we moved to the dma pool API, we really only need the nvme_dev.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Optimise memory usage for I/Os between 4k and 128k | Matthew Wilcox | 2011-11-04 | 1 | -8/+23
  Add a second memory pool for smaller I/Os. We can pack 16 of these on a single page instead of using an entire page for each one.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
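The arithmetic behind the second pool works as follows. With 4 KiB pages, packing 16 lists per page makes each list 4096 / 16 = 256 bytes. That holds 256 / 8 = 32 eight-byte PRP entries, and 32 entries of 4 KiB each describe 128 KiB. Below is a sketch of the pool choice. The enum names and the exact threshold test are assumptions for illustration.

```c
#include <assert.h>

#define PAGE_SIZE	4096
#define SMALL_LIST	256	/* bytes per small-pool PRP list (4096 / 16) */
#define PRP_ENTRY	8	/* bytes per PRP entry */

enum prp_pool { POOL_SMALL, POOL_PAGE };

/* Pick the allocation pool for a PRP list of nprps entries. */
enum prp_pool choose_pool(int nprps)
{
	assert(nprps >= 1);
	return nprps <= SMALL_LIST / PRP_ENTRY ? POOL_SMALL : POOL_PAGE;
}
```

In the kernel, the two pools would be created with `dma_pool_create()`, one with PAGE_SIZE blocks and one with 256-byte blocks.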
* NVMe: Switch to use DMA Pool API | Matthew Wilcox | 2011-11-04 | 1 | -7/+32
  Calling dma_free_coherent from interrupt context causes warnings. Using the DMA pools delays freeing until pool destruction, so avoids the problem.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Rename nvme_req_info to nvme_bio | Matthew Wilcox | 2011-11-04 | 1 | -24/+24
  There are too many things called 'info' in this driver. This data structure is auxiliary information for a struct bio, so call it nvme_bio, or nbio when used as a variable.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Initial PRP List support | Shane Michael Matthews | 2011-11-04 | 1 | -13/+91
  Add a pointer to the nvme_req_info to hold a new data structure (nvme_prps) which contains a list of the pages allocated to this particular request for holding PRP list entries. nvme_setup_prps() now returns this pointer.
  To allocate and free the memory used for PRP lists, we need a struct device, so we need to pass the nvme_queue pointer to many functions which did not previously need it.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Advance the sg pointer when filling in an sg list | Matthew Wilcox | 2011-11-04 | 1 | -0/+1
  For multipage BIOs, we were always using sg[0] instead of advancing through the list. Oops :-)
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Renumber the special context values | Matthew Wilcox | 2011-11-04 | 1 | -3/+3
  If POISON_POINTER_DELTA isn't defined, ensure they're in page 0, which should never be mapped.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Handle the congestion list a little better | Matthew Wilcox | 2011-11-04 | 1 | -0/+17
  In the bio completion handler, check for bios on the congestion list for this NVM queue. Also, lock the congestion list in the make_request function as the queue may end up being shared between multiple CPUs.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
* NVMe: Record the timeout for each command | Matthew Wilcox | 2011-11-04 | 1 | -17/+32
  In addition to recording the completion data for each command, record the anticipated completion time. Choose a timeout of 5 seconds for normal I/Os and 60 seconds for admin I/Os.
  Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
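Recording a deadline alongside the completion data can be sketched as storing `now + timeout` at submission time, with 5 seconds for I/O and 60 seconds for admin commands as the commit states. `HZ` is defined locally as an assumed tick rate. `struct cmd_info` and `record_cmd()` are hypothetical names for this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define HZ		100		/* assumed tick rate for this sketch */
#define IO_TIMEOUT	(5 * HZ)	/* 5 seconds for normal I/O */
#define ADMIN_TIMEOUT	(60 * HZ)	/* 60 seconds for admin commands */

struct cmd_info {
	void *ctx;		/* completion data for the command */
	unsigned long timeout;	/* tick by which it should complete */
};

/* Record completion data and the anticipated completion time. */
void record_cmd(struct cmd_info *info, void *ctx, int is_admin,
		unsigned long now)
{
	assert(info != NULL);
	info->ctx = ctx;
	info->timeout = now + (is_admin ? ADMIN_TIMEOUT : IO_TIMEOUT);
}
```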