| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enabling failfast should let request fail immediately if either an
error occurred or the device gets disconnected.
For disconnected devices new requests are not fetches from the block
queue and therefore failfast is not triggered.
Fix by letting the DASD driver fetch requests for disconnected devices
with failfast active.
Signed-off-by: Stefan Haberland <stefan.haberland@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Return value from kzalloc is not tested and
using a null pointer would lead to crash.
Even if this should not happen at this moment, we
may let the system decide if there is a better choice.
Signed-off-by: Pierre Morel <pmorel@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Constifies hmcdrv_ftp_ops structures in s390's char
driver since they are not modified after their
initialization.
Detected and found using Coccinelle.
Suggested-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Aya Mahfouz <mahfouz.saif.elyazal@gmail.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Add NULL test on call to kzalloc.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@@
expression x;
identifier fld;
@@
* x = kzalloc(...);
... when != x == NULL
x->fld
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adding tracepoints to inline functions adds tracepoint invocation code
for each instance where the function is inlined. The resulting increase
in kernel code size can have negative impact:
- More cache misses in instruction cache
- Reduced amount of DMA-capable memory
Therefore change all functions implementing I/O instructions from inline
to normal functions.
Bloat-o-meter summary after change (using performance_defconfig):
add/remove: 24/2 grow/shrink: 4/39 up/down: 2205/-4858 (-2653)
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Add tracepoints to interrupt handler and core inline assemblies used by
the s390 common I/O layer. These tracepoints can be used to monitor and
validate hardware and hypervisor requests and responses.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Reviewed-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Reviewed-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the current semi-arbitrary distribution of inline assemblies:
- Inline assemblies used by CIO go into ioasm.h
- Data definitions used by inline assemblies go into cio.h
Beyond cleaning up the current structure this is also required for
use of tracepoints in inline assemblies introduced by a follow-on
patch.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The numeric representation of the xsch instruction was incorrectly
specified, resulting in reserved fields of the instruction opcode
potentially being set to a non-zero value. While this doesn't currently
cause any problem, a future architecture might make use of these fields
so that the current specification could result in an exception or
unwanted side-effects.
Fix this by using the xsch instruction code for which support in
binutils was added in 2003.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
There is no longer a need to maintain two versions of the same inline
assembly - one with exception handling, and one without - so get rid of
the duplicates and adjust names accordingly. This applies to stsch_err
and msch_err which are now renamed to stsch and msch respectively,
while the original msch function is removed.
Signed-off-by: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Provide a user space interface and an enhancement to the sclp device driver
which allows to send an 'Open for Business' event from the operating system
to the Support Element. The 'Open for Business' event is used to signal the
Support Element that the operating system (or an application running on top
of it) is up and running.
Signed-off-by: Jochen Schweflinghaus <schwefel@de.ibm.com>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit c05ffc4f2b20 ("[S390] sclp: sysfs interface for
SCLP cpi"), which was made 2008 the user can specify a system
and sysplex name through the /sys/firmware/cpi interface. In
addition to sysplex and system name, the user can also override
the system type and system version.
Because the syfs interface is easier to use and allows the
settings to be updated, the sclp_cpi module becomes obsolete
and can be removed.
Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
The debug_unregister() function performs also input parameter validation.
Thus the test around the calls is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce save_area_alloc(), save_area_boot_cpu(), save_area_add_regs()
and save_area_add_vxrs to deal with storing the CPU state in case of a
system dump. Remove struct save_area and save_area_ext, and create a new
struct save_area as a local definition to arch/s390/kernel/crash_dump.c.
Copy each individual field from the hardware status area to the save area,
storing the minimum of required data.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
To collect the CPU registers of the crashed system allocated a single
page with memblock_alloc_base and use it as a copy buffer. Replace the
stop-and-store-status sigp with a store-status-at-address sigp in
smp_save_dump_cpus() and smp_store_status(). In both cases the target
CPU is already stopped and store-status-at-address avoids the detour
via the absolute zero page.
For kexec simplify s390_reset_system and call store_status() before
the prefix register of the boot CPU has been set to zero. Use STPX
to store the prefix register and remove dump_prefix_page.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
| |
Replace the SAVE_AREA_BASE offset calculations in reipl.S with the
assembler constant for the location of each register status area.
Use __LC_FPREGS_SAVE_AREA instead of SAVE_AREA_BASE in the three
remaining code locations and remove the definition of SAVE_AREA_BASE.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Replace the three part copy logic int memcpy_hsa with a single loop
around sclp_sdias_copy with appropriate offset and size calculations,
and inline memcpy_hsa into memcpy_hsa_user and memcpy_hsa_kernel.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
| |
Introduce two copy functions for the memory of the dumped system,
copy_oldmem_kernel() to copy to the virtual kernel address space
and copy_oldmem_user() to copy to user space.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
| |
New versions of the SCSI dumper use the /dev/vmcore interface instead
of zcore mem. Remove the outdated interface.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The /sys/kernel/debug/zcore/mem interface delivers the memory of the
old system with the CPU registers stored to the assigned locations in
each prefix page.
For the vector registers the prefix page of each CPU has an address of
a 1024 byte save area at 0x11b0. But the /sys/kernel/debug/zcore/mem
interface fails copy the vector registers saved at boot of the zfcpdump
kernel into the dump image.
Copy the saved vector registers of a CPU to the outout buffer if the
memory area that is read via /sys/kernel/debug/zcore/mem intersects
with the vector register save area of this CPU.
Acked-by: Michael Holzheu <holzheu@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
| |
The extended save area for the boot CPU has been allocated by
smp_save_dump_cpus() with memblock_alloc() and may not be freed
with kfree().
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the AP queue depth of requests was reached additional requests
have been ignored. These request are stuck in the request queue.
The AP queue handling now push the next waiting request into the
queue after fetching a previous serviced and finished reply.
Signed-off-by: Ingo Tuchscherer <ingo.tuchscherer@linux.vnet.ibm.com>
Acked-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Acked-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
|
|
|
|
|
|
|
|
|
|
| |
When we fail various metadata related operations in nvme_queue_rq we
need to unmap the data SGL.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We received a bug report recently when DDW (64-bit direct DMA on Power)
is not enabled for NVMe devices. In that case, we fall back to 32-bit
DMA via the IOMMU, which is always done via 4K TCEs (Translation Control
Entries).
The NVMe device driver, though, assumes that the DMA alignment for the
PRP entries will match the device's page size, and that the DMA aligment
matches the kernel's page aligment. On Power, the the IOMMU page size,
as mentioned above, can be 4K, while the device can have a page size of
8K, while the kernel has a page size of 64K. This eventually trips the
BUG_ON in nvme_setup_prps(), as we have a 'dma_len' that is a multiple
of 4K but not 8K (e.g., 0xF000).
In this particular case of page sizes, we clearly want to use the
IOMMU's page size in the driver. And generally, the NVMe driver in this
function should be using the IOMMU's page size for the default device
page size, rather than the kernel's page size. There is not currently an
API to obtain the IOMMU's page size across all architectures and in the
interest of a stop-gap fix to this functional issue, default the NVMe
device page size to 4K, with the intent of adding such an API and
implementation across all architectures in the next merge window.
With the functionally equivalent v3 of this patch, our hardware test
exerciser survives when using 32-bit DMA; without the patch, the kernel
will BUG within a few minutes.
Signed-off-by: Nishanth Aravamudan <nacc at linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
"Two fixes for 4.4-rc1's DM ioctl changes that introduced the potential
for infinite recursion on ioctl (with DM multipath).
And four stable fixes:
- A DM thin-provisioning fix to restore 'error_if_no_space' setting
when a thin-pool is made writable again (after having been out of
space).
- A DM thin-provisioning fix to properly advertise discard support
for thin volumes that are stacked on a thin-pool whose underlying
data device doesn't support discards.
- A DM ioctl fix to allow ctrl-c to break out of an ioctl retry loop
when DM multipath is configured to 'queue_if_no_path'.
- A DM crypt fix for a possible hang on dm-crypt device removal"
* tag 'dm-4.4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: fix regression in advertised discard limits
dm crypt: fix a possible hang due to race condition on exit
dm mpath: fix infinite recursion in ioctl when no paths and !queue_if_no_path
dm: do not reuse dm_blk_ioctl block_device input as local variable
dm: fix ioctl retry termination with signal
dm thin: restore requested 'error_if_no_space' setting on OODS to WRITE transition
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When establishing a thin device's discard limits we cannot rely on the
underlying thin-pool device's discard capabilities (which are inherited
from the thin-pool's underlying data device) given that DM thin devices
must provide discard support even when the thin-pool's underlying data
device doesn't support discards.
Users were exposed to this thin device discard limits regression if
their thin-pool's underlying data device does _not_ support discards.
This regression caused all upper-layers that called the
blkdev_issue_discard() interface to not be able to issue discards to
thin devices (because discard_granularity was 0). This regression
wasn't caught earlier because the device-mapper-test-suite's extensive
'thin-provisioning' discard tests are only ever performed against
thin-pool's with data devices that support discards.
Fix is to have thin_io_hints() test the pool's 'discard_enabled' feature
rather than inferring whether or not a thin device's discard support
should be enabled by looking at the thin-pool's discard_granularity.
Fixes: 216076705 ("dm thin: disable discard support for thin devices if pool's is disabled")
Reported-by: Mike Gerber <mike@sprachgewalt.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org # 4.1+
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A kernel thread executes __set_current_state(TASK_INTERRUPTIBLE),
__add_wait_queue, spin_unlock_irq and then tests kthread_should_stop().
It is possible that the processor reorders memory accesses so that
kthread_should_stop() is executed before __set_current_state(). If such
reordering happens, there is a possible race on thread termination:
CPU 0:
calls kthread_should_stop()
it tests KTHREAD_SHOULD_STOP bit, returns false
CPU 1:
calls kthread_stop(cc->write_thread)
sets the KTHREAD_SHOULD_STOP bit
calls wake_up_process on the kernel thread, that sets the thread
state to TASK_RUNNING
CPU 0:
sets __set_current_state(TASK_INTERRUPTIBLE)
spin_unlock_irq(&cc->write_thread_wait.lock)
schedule() - and the process is stuck and never terminates, because the
state is TASK_INTERRUPTIBLE and wake_up_process on CPU 1 already
terminated
Fix this race condition by using a new flag DM_CRYPT_EXIT_THREAD to
signal that the kernel thread should exit. The flag is set and tested
while holding cc->write_thread_wait.lock, so there is no possibility of
racy access to the flag.
Also, remove the unnecessary set_task_state(current, TASK_RUNNING)
following the schedule() call. When the process was woken up, its state
was already set to TASK_RUNNING. Other kernel code also doesn't set the
state to TASK_RUNNING following schedule() (for example,
do_wait_for_common in completion.c doesn't do it).
Fixes: dc2676210c42 ("dm crypt: offload writes to thread")
Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
Cc: stable@vger.kernel.org # v4.0+
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In multipath_prepare_ioctl(),
- pgpath is a path selected from available paths
- m->queue_io is true if we cannot send a request immediately to
paths, either because:
* there is no available path
* the path group needs activation (pg_init)
- pg_init is not started
- pg_init is still running
- m->queue_if_no_path is true if the device is configured to queue
I/O if there are no available paths
If !pgpath && !m->queue_if_no_path, the handler should return -EIO.
However in the course of refactoring the condition check has broken
and returns success in that case. Since bdev points to the dm device
itself, dm_blk_ioctl() calls __blk_dev_driver_ioctl() for itself and
recurses until crash.
You could reproduce the problem like this:
# dmsetup create mp --table '0 1024 multipath 0 0 0 0'
# sg_inq /dev/mapper/mp
<crash>
[ 172.648615] BUG: unable to handle kernel paging request at fffffffc81b10268
[ 172.662843] PGD 19dd067 PUD 0
[ 172.666269] Thread overran stack, or stack corrupted
[ 172.671808] Oops: 0000 [#1] SMP
...
Fix the condition check with some clarifications.
Fixes: e56f81e0b01e ("dm: refactor ioctl handling")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
(Ab)using the @bdev passed to dm_blk_ioctl() opens the potential for
targets' .prepare_ioctl to fail if they go on to check the bdev for
!NULL.
Fixes: e56f81e0b01e ("dm: refactor ioctl handling")
Reported-by: Junichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
dm-mpath retries ioctl, when no path is readily available and the device
is configured to queue I/O in such a case. If you want to stop the retry
before multipathd decides to turn off queueing mode, you could send
signal for the process to exit from the loop.
However the check of fatal signal has not carried along when commit
6c182cd88d17 ("dm mpath: fix ioctl deadlock when no paths") moved the
loop from dm-mpath to dm core. As a result, we can't terminate such
a process in the retry loop.
Easy reproducer of the situation is:
# dmsetup create mp --table '0 1024 multipath 0 0 0 0'
# dmsetup message mp 0 'queue_if_no_path'
# sg_inq /dev/mapper/mp
then you should be able to terminate sg_inq by pressing Ctrl+C.
Fixes: 6c182cd88d17 ("dm mpath: fix ioctl deadlock when no paths")
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mikulas Patocka <mpatocka@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Cc: stable@vger.kernel.org
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
transition
A thin-pool that is in out-of-data-space (OODS) mode may transition back
to write mode -- without the admin adding more space to the thin-pool --
if/when blocks are released (either by deleting thin devices or
discarding provisioned blocks).
But as part of the thin-pool's earlier transition to out-of-data-space
mode the thin-pool may have set the 'error_if_no_space' flag to true if
the no_space_timeout expires without more space having been made
available. That implementation detail, of changing the pool's
error_if_no_space setting, needs to be reset back to the default that
the user specified when the thin-pool's table was loaded.
Otherwise we'll drop the user requested behaviour on the floor when this
out-of-data-space to write mode transition occurs.
Reported-by: Vivek Goyal <vgoyal@redhat.com>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Joe Thornber <ejt@redhat.com>
Fixes: 2c43fd26e4 ("dm thin: fix missing out-of-data-space to write mode transition if blocks are released")
Cc: stable@vger.kernel.org
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull block layer fixes from Jens Axboe:
"A round of fixes/updates for the current series.
This looks a little bigger than it is, but that's mainly because we
pushed the lightnvm enabled null_blk change out of the merge window so
it could be updated a bit. The rest of the volume is also mostly
lightnvm. In particular:
- Lightnvm. Various fixes, additions, updates from Matias and
Javier, as well as from Wenwei Tao.
- NVMe:
- Fix for potential arithmetic overflow from Keith.
- Also from Keith, ensure that we reap pending completions from
a completion queue before deleting it. Fixes kernel crashes
when resetting a device with IO pending.
- Various little lightnvm related tweaks from Matias.
- Fixup flushes to go through the IO scheduler, for the cases where a
flush is not required. Fixes a case in CFQ where we would be
idling and not see this request, hence not break the idling. From
Jan Kara.
- Use list_{first,prev,next} in elevator.c for cleaner code. From
Gelian Tang.
- Fix for a warning trigger on btrfs and raid on single queue blk-mq
devices, where we would flush plug callbacks with preemption
disabled. From me.
- A mac partition validation fix from Kees Cook.
- Two merge fixes from Ming, marked stable. A third part is adding a
new warning so we'll notice this quicker in the future, if we screw
up the accounting.
- Cleanup of thread name/creation in mtip32xx from Rasmus Villemoes"
* 'for-linus' of git://git.kernel.dk/linux-block: (32 commits)
blk-merge: warn if figured out segment number is bigger than nr_phys_segments
blk-merge: fix blk_bio_segment_split
block: fix segment split
blk-mq: fix calling unplug callbacks with preempt disabled
mac: validate mac_partition is within sector
mtip32xx: use formatting capability of kthread_create_on_node
NVMe: reap completion entries when deleting queue
lightnvm: add free and bad lun info to show luns
lightnvm: keep track of block counts
nvme: lightnvm: use admin queues for admin cmds
lightnvm: missing free on init error
lightnvm: wrong return value and redundant free
null_blk: do not del gendisk with lightnvm
null_blk: use device addressing mode
null_blk: use ppa_cache pool
NVMe: Fix possible arithmetic overflow for max segments
blk-flush: Queue through IO scheduler when flush not required
null_blk: register as a LightNVM device
elevator: use list_{first,prev,next}_entry
lightnvm: cleanup queue before target removal
...
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
kthread_create_on_node takes format+args, so there's no need to do the
pretty-printing in advance. Moreover, "mtip_svc_thd_99" (including its
'\0') only just fits in 16 bytes, so if index could ever go above 99
we'd have a stack buffer overflow.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Make sure that there are no unprocesssed entries on a completion
queue before deleting it, and check for validity of the CQ
door bell before writing completions to it.
This fixes problems with doing a sysfs reset of the device while
it's handling IO.
Tested-by: Jon Derrick <jonathan.derrick@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add free block, used block, and bad block information to the show debug
interface. This information is used to debug how targets track blocks.
Also, change debug function name to make it more generic.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Maintain number of in use blocks, free blocks, and bad blocks in a per
lun basis. This allows the upper layers to get information about the
state of each lun.
Also, account for blocks reserved to the device on the free block count.
nr_free_blocks matches now the actual number of blocks on the free list
when the device is booted.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
According to the Open-Channel SSD Specification, the NVMe-NVM admin
commands use vendor specific opcodes of NVMe, so use the NVMe admin
queue to dispatch these commands.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Updated by me to include set bad block table as well and also use
the admin queue for l2p len calculation.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If either max_phys_sect is out of bound, the nvm_dev structure is not
freed.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The return value should be non-zero under error conditions.
Remove nvme_free(dev) to avoid free dev more than once.
Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The gendisk structure has not been initialized when using lightnvm.
Make sure to not delete it upon exit. Also make sure that we use the
appropriate disk_name at unregistration.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The linear addressing mode was removed in 7386af2. Make null_blk instead
expose the ppa format geometry and support the generic addressing mode.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Instead of using a page pool, we can save memory by only allocating room
for 64 entries for the ppa command. Introduce a ppa_cache to allocate only
the required memory for the ppa list.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | | |
Reported-by: Paul Grabinar <paul.grabinar@ranbarg.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add support for registering as a LightNVM device. This allows us to
evaluate the performance of the LightNVM subsystem.
In /drivers/Makefile, LightNVM is moved above block device drivers
to make sure that the LightNVM media managers have been initialized
before drivers under /drivers/block are initialized.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Fix by Jens Axboe to remove unneeded slab cache and the following
memory leak.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This prevents outstanding IOs to be sent for completion to target after
the target has been removed. The flow is now: stop new IOs > cleanup
queue > remove target.
Signed-off-by: Javier Gonzalez <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The specification was updated the remove the double word just after
number of configuration groups and capabilities. Update the identify
structure to reflect it.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ppa format was not copied from the NVMe specific ppa format to the
lightnvm specific ppa format. This led to the ppa format not being
communicated to the layers above.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The linear and device specific address modes can be replaced with a
simple offset and bit length conversion that is generic across all
devices.
This both simplifies the specification and removes the special case for
qemu nvme, that previously relied on the linear address mapping.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Both the nvm_register and nvm_init does a kfree(dev) on error. Make sure
to only free it once.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We register with nvm_devices when there registration can still fail.
Move the final registration at the end of the nvm_register function
to make sure we are fully registered when added to the nvm_devices list.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Only NAND flash with SLC and MLC is supported. Make sure to not try to
initialize TLC memory or other non-volatile memory types.
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|