| Commit message (Collapse) | Author | Age | Files | Lines |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We have a fundamental issue that fabric commands use the admin_q.
The reason is, that admin-connect, register reads and writes and
admin commands cannot be guaranteed ordering while we are running
controller resets.
For example, when we reset a controller we perform:
1. disable the controller
2. teardown the admin queue
3. re-establish the admin queue
4. enable the controller
In order to perform (3), we need to unquiesce the admin queue, however
we may have some admin commands that are already pending on the
quiesced admin_q and will immediate execute when we unquiesce it before
we execute (4). The host must not send admin commands to the controller
before enabling the controller.
To fix this, we have the fabric commands (admin connect and property
get/set, but not I/O queue connect) use a separate fabrics_q and make
sure to quiesce the admin_q before we disable the controller, and
unquiesce it only after we enable the controller.
This fixes the error prints from nvmet in a controller reset storm test:
kernel: nvmet: got cmd 6 while CC.EN == 0 on qid = 0
Which indicate that the host is sending an admin command when the
controller is not enabled.
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Another issue with the Apple T2 based 2018 controllers seem to be
that they blow up (and shut the machine down) if there's a tag
collision between the IO queue and the Admin queue.
My suspicion is that they use our tags for their internal tracking
and don't mix them with the queue id. They also seem to not like
when tags go beyond the IO queue depth, ie 128 tags.
This adds a quirk that marks tags 0..31 of the IO queue reserved
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Based on reverse engineering and original patch by
Paul Pawlowski <paul@mrarm.io>
This adds support for Apple weird implementation of NVME in their
2018 or later machines. It accounts for the twice-as-big SQ entries
for the IO queues, and the fact that only interrupt vector 0 appears
to function properly.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The size of a submission queue element should always be 6 (64 bytes)
by spec.
However some controllers such as Apple's are not properly implementing
the standard and require a different size.
This provides the ground work for the subsequent quirks for these
controllers.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
| |
This will make it easier to handle variable queue entry sizes
later. No functional change.
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
| |
When native multipathing is enabled we cannot enable blktrace for
the underlying paths, so any completion is never traced.
Signed-off-by: Hannes Reinecke <hare@suse.com>
[fixed-up by Mikhail for non-multipath-build]
Signed-off-by: Mikhail Skorzhinskii <mskorzhinskiy@solarflare.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
ANA log parsing invokes nvme_update_ana_state() per ANA group desc.
This updates the state of namespaces with nsids in desc->nsids[].
Both ctrl->namespaces list and desc->nsids[] array are sorted by nsid.
Hence nvme_update_ana_state() performs a single walk over ctrl->namespaces:
- if current namespace matches the current desc->nsids[n],
this namespace is updated, and n is incremented.
- the process stops when it encounters the end of either
ctrl->namespaces end or desc->nsids[]
In case desc->nsids[n] does not match any of ctrl->namespaces,
the remaining nsids following desc->nsids[n] will not be updated.
Such situation was considered abnormal and generated WARN_ON_ONCE.
However ANA log MAY contain nsids not (yet) found in ctrl->namespaces.
For example, lets consider the following scenario:
- nvme0 exposes namespaces with nsids = [2, 3] to the host
- a new namespace nsid = 1 is added dynamically
- also, a ANA topology change is triggered
- NS_CHANGED aen is generated and triggers scan_work
- before scan_work discovers nsid=1 and creates a namespace, a NOTICE_ANA
aen was issues and ana_work receives ANA log with nsids=[1, 2, 3]
Result: ana_work fails to update ANA state on existing namespaces [2, 3]
Solution:
Change the way nvme_update_ana_state() namespace list walk
checks the current namespace against desc->nsids[n] as follows:
a) ns->head->ns_id < desc->nsids[n]: keep walking ctrl->namespaces.
b) ns->head->ns_id == desc->nsids[n]: match, update the namespace
c) ns->head->ns_id >= desc->nsids[n]: skip to desc->nsids[n+1]
This enables correct operation in the scenario described above.
This also allows ANA log to contain nsids currently invisible
to the host, i.e. inactive nsids.
Signed-off-by: Anton Eidelman <anton@lightbitslabs.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
| |
Set the outgoing packets type of service (TOS) according to the
receiving TOS.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
TOS provide clients the ability to segregate traffic flows for
different type of data.
One of the TOS usage is bandwidth management which allows setting bandwidth
limits for QoS classes, e.g. 80% bandwidth to controllers at QoS class A
and 20% to controllers at QoS class B.
usage examples:
nvme connect --tos=0 --transport=tcp --traddr=10.0.1.1 --nqn=test-nvme
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
| |
This patch doesn't change any functionality.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For RDMA transports, TOS is an extension of IB QoS to provide clients
the ability to segregate traffic flows for different type of data.
RDMA CM abstract it for ULPs using rdma_set_service_type().
Internally, each traffic flow is represented by a connection with all of
its independent resources like that of a normal connection, and is
differentiated by service type. In other words, there can be multiple qp
connections between an IP pair and each supports a unique service type.
One of the TOS usage is bandwidth management which allows setting bandwidth
limits for QoS classes, e.g. 80% bandwidth to controllers at QoS class A
and 20% to controllers at QoS class B.
Note: In addition to the TOS configuration, QOS must be configured on the
relevant HCA on the target (send RDMA commands) and initiator to effect
the traffic.
usage examples:
nvme connect --tos=0 --transport=rdma --traddr=10.0.1.1 --nqn=test-nvme
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
| |
TOS is user-defined and needs to be configured via nvme-cli.
It must be set before initiating any traffic and once set the TOS
cannot be changed.
Signed-off-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
| |
when we uninit a command in error flow we also need to
free an iovec if it was allocated.
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
| |
We must only call sgl_free for sgl that we actually
allocated.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
| |
Four different fields are in CDWs of Get LBA Status command which means
it would be great if we can see in detail when tracing in target side
also.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
| |
Four different fields are in CDWs of Get LBA Status command which means
it would be great if we can see in detail when tracing.
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In nvme spec 1.3 there is a definition for data write/read counters
from SMART log, (See section 5.14.1.2):
This value is reported in thousands (i.e., a value of 1
corresponds to 1000 units of 512 bytes read) and is rounded up.
However, in nvme target where value is reported with actual units,
but not thousands of units as the spec requires.
Signed-off-by: Tom Wu <tomwu@mellanox.com>
Reviewed-by: Israel Rukshin <israelr@mellanox.com>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Simple polling support via socket busy_poll interface.
Although we do not shutdown interrupts but simply hammer
the socket poll, we can sometimes find completions faster
than the normal interrupt driven RX path.
We add per queue nr_cqe counter that resets every time
RX path is invoked such that .poll callback can return it
to stay consistent with the semantics.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tcp host module is now taking those APIs from crypto ahash:
(1) crypto_ahash_final()
(2) crypto_ahash_digest()
(3) crypto_alloc_ahash()
nvme-tcp should depends on CRYPTO_CRC32C.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Jens Axboe <axboe@fb.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Minwoo Im <minwoo.im.dev@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
| |
All seem to call it with ctrl->cap so no need to pass it
at all.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
| |
nvme_enable_ctrl reads the cap register right after, so
no need to do that locally in the transport driver. Have
sqsize setting in nvme_init_identify.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
| |
Align with what the rest of the transports are doing.
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
| |
No need to use a stack cap variable.
Reviewed-by: Minwoo Im <minwoo.im.dev@gmail.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using socket specific read_sock() calls instead of directly calling
tcp_read_sock() helps lld module registered handlers if any, to be called
from nvme-tcp host.
This patch therefore replaces the tcp_read_sock() with socket specific
prot_ops.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
| |
Can return directly in the switch statement
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Often limits can be changed by admin. When discussing such things
it helps if you can provide "self-sustained" facts. Also
sometimes the admin thinks he changed a limit, but it did not
take effect for some reason or he changed the wrong thing.
V3: Only pr_warn when Faulty is 0.
V2: Add read_errors value to pr_warn.
Signed-off-by: Nigel Croxon <ncroxon@redhat.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Until revalidate_disk() has completed, the size of a new md array will
appear to be zero.
So we shouldn't report, through array_state, that the array is active
until that time.
udev rules check array_state to see if the array is ready. As soon as
it appear to be zero, fsck can be run. If it find the size to be
zero, it will fail.
So add a new flag to provide an interlock between do_md_run() and
array_state_show(). This flag is set while do_md_run() is active and
it prevents array_state_show() from reporting that the array is
active.
Before do_md_run() is called, ->pers will be NULL so array is
definitely not active.
After do_md_run() is called, revalidate_disk() will have run and the
array will be completely ready.
We also move various sysfs_notify*() calls out of md_run() into
do_md_run() after MD_NOT_READY is cleared. This ensure the
information is ready before the notification is sent.
Prior to v4.12, array_state_show() was called with the
mddev->reconfig_mutex held, which provided exclusion with do_md_run().
Note that MD_NOT_READY cleared twice. This is deliberate to cover
both success and error paths with minimal noise.
Fixes: b7b17c9b67e5 ("md: remove mddev_lock() from md_attr_show()")
Cc: stable@vger.kernel.org (v4.12++)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since commit 4ad23a976413 ("MD: use per-cpu counter for
writes_pending"), set_in_sync() is substantially more expensive: it
can wait for a full RCU grace period which can be 10s of milliseconds.
So we should only call it when the cost is justified.
md_check_recovery() currently calls set_in_sync() every time it finds
anything to do (on non-external active arrays). For an array
performing resync or recovery, this will be quite often.
Each call will introduce a delay to the md thread, which can noticeable
affect IO submission latency.
In md_check_recovery() we only need to call set_in_sync() if
'safemode' was non-zero at entry, meaning that there has been not
recent IO. So we save this "safemode was nonzero" state, and only
call set_in_sync() if it was non-zero.
This measurably reduces mean and maximum IO submission latency during
resync/recovery.
Reported-and-tested-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Fixes: 4ad23a976413 ("MD: use per-cpu counter for writes_pending")
Cc: stable@vger.kernel.org (v4.12+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
| |
You can't magically mark a function inline and expect that to work.
Fixes: fceb5d1b19cb ("null_blk: create a helper for zoned devices")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
This patch creates a helper function for handling the request
completion in the null_handle_cmd().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch creates a helper function for handling zoned block device
operations.
This patch also restructured the code for null_blk_zoned.c and uses the
pattern to return blk_status_t and catch the error in the function
null_handle_cmd() into cmd->error variable instead of setting it up in
the deeper layer just like the way it is done for flush, badblocks and
memory backed case in the null_handle_cmd(). We also move
null_handle_zoned() to the null_blk_zoned.c to keep the zoned code
separate.
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch creates a helper for handling requests when null_blk is
memory backed in the null_handle_cmd(). Although the helper is very
simple right now, it makes the code flow consistent with the rest of
code in the null_handle_cmd() and provides a uniform code structure
for future code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
This patch creates a helper for handling badblocks code in the
null_handle_cmd().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
This patch creates a helper for handling throttling code in the
null_handle_cmd().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a preparation patch which moves the duplicate code for sectors
and nr_sectors calculations for bio vs request mode into their
respective callers (null_queue_bio(), null_qeueue_req()). Now the core
function only deals with the respective actions and commands instead of
having to calculte the bio vs req operations and different sector
related variables. We also move the flush command handling at the top
which significantly simplifies the rest of the code.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a bug added in 4.10 with commit:
commit 9561a7ade0c205bc2ee035a2ac880478dcc1a024
Author: Josef Bacik <jbacik@fb.com>
Date: Tue Nov 22 14:04:40 2016 -0500
nbd: add multi-connection support
that limited the number of devices to 256. Before the patch we could
create 1000s of devices, but the patch switched us from using our
own thread to using a work queue which has a default limit of 256
active works.
The problem is that our recv_work function sits in a loop until
disconnection but only handles IO for one connection. The work is
started when the connection is started/restarted, but if we end up
creating 257 or more connections, the queue_work call just queues
connection257+'s recv_work and that waits for connection 1 - 256's
recv_work to be disconnected and that work instance completing.
Instead of reverting back to kthreads, this has us allocate a
workqueue_struct per device, so we can block in the work.
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This fixes a regression added in 4.9 with commit:
commit 0eadf37afc2500e1162c9040ec26a705b9af8d47
Author: Josef Bacik <jbacik@fb.com>
Date: Thu Sep 8 12:33:40 2016 -0700
nbd: allow block mq to deal with timeouts
where before the patch userspace would set the timeout to 0 to disable
it. With the above patch, a zero timeout tells the block layer to use
the default value of 30 seconds. For setups where commands can take a
long time or experience transient issues like network disruptions this
then results in IO errors being sent to the application.
To fix this, the patch still uses the common block layer timeout
framework, but if zero is set, nbd just logs a message and then resets
the timer when it expires.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix bug added with the patch:
commit 8f3ea35929a0806ad1397db99a89ffee0140822a
Author: Josef Bacik <josef@toxicpanda.com>
Date: Mon Jul 16 12:11:35 2018 -0400
nbd: handle unexpected replies better
where if the timeout handler runs when the completion path is and we fail
to grab the mutex in the timeout handler we will leave a config reference
and cannot free the config later.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
| |
This adds a helper function to convert a block req op to a nbd cmd type.
It will be used in the last patch to log the type in the timeout
handler.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
Add a helper to set the cmd timeout. It does not really do a lot now,
but will be more useful in the next patches.
Reviewed-by: Josef Bacik <josef@toxicpanda.com>
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
| |
As sparse points out, these two copy_from_user() should actually be
copy_to_user().
Fixes: 229b53c9bf4e ("take floppy compat ioctls to sodding floppy.c")
Cc: stable@vger.kernel.org
Acked-by: Alexander Popov <alex.popov@linux.com>
Reviewed-by: Mukesh Ojha <mojha@codeaurora.org>
Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
| |
A previous commit correctly removed set-but-not-read variables, but
this left two new variables now unused. Kill them.
Fixes: ba6f7da99aaf ("lightnvm: remove set but not used variables 'data_len' and 'rq_len'")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Enabling Direct I/O with loop devices helps reducing memory usage by
avoiding double caching. 32 bit applications running on 64 bits systems
are currently not able to request direct I/O because is missing from the
lo_compat_ioctl.
This patch fixes the compatibility issue mentioned above by exporting
LOOP_SET_DIRECT_IO as additional lo_compat_ioctl() entry.
The input argument for this ioctl is a single long converted to a 1-bit
boolean, so compatibility is preserved.
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Alessio Balsini <balsini@android.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
drivers/lightnvm/pblk-read.c: In function pblk_submit_read_gc:
drivers/lightnvm/pblk-read.c:423:6: warning: variable data_len set but not used [-Wunused-but-set-variable]
drivers/lightnvm/pblk-recovery.c: In function pblk_recov_scan_oob:
drivers/lightnvm/pblk-recovery.c:368:15: warning: variable rq_len set but not used [-Wunused-but-set-variable]
They are not used since commit 48e5da725581 ("lightnvm:
move metadata mapping to lower level driver")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
|
|
|
|
|
| |
It's just code clean-up.
Signed-off-by: Hou Tao <houtao1@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
can't work
When add one disk to array, the md_reap_sync_thread is responsible
to activate the spare and set In_sync flag for the new member in
spare_active().
But if raid1 has one member disk A, and disk B is added to the array.
Then we offline A before all the datas are synchronized from A to B,
obviously B doesn't have the latest data as A, but B is still marked
with In_sync flag.
So let's not call spare_active under the condition, otherwise B is
still showed with 'U' state which is not correct.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When a disk is added to array, the following path is called in mdadm.
Manage_subdevs -> sysfs_freeze_array
-> Manage_add
-> sysfs_set_str(&info, NULL, "sync_action","idle")
Then from kernel side, Manage_add invokes the path (add_new_disk ->
validate_super = super_1_validate) to set In_sync flag.
Since In_sync means "device is in_sync with rest of array", and the new
added disk need to resync thread to help the synchronization of data.
And md_reap_sync_thread would call spare_active to set In_sync for the
new added disk finally. So don't set In_sync if array is in frozen.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When the 'last' device in a RAID1 or RAID10 reports an error,
we do not mark it as failed. This would serve little purpose
as there is no risk of losing data beyond that which is obviously
lost (as there is with RAID5), and there could be other sectors
on the device which are readable, and only readable from this device.
This in general this maximises access to data.
However the current implementation also stops an admin from removing
the last device by direct action. This is rarely useful, but in many
case is not harmful and can make automation easier by removing special
cases.
Also, if an attempt to write metadata fails the device must be marked
as faulty, else an infinite loop will result, attempting to update
the metadata on all non-faulty devices.
So add 'fail_last_dev' member to 'struct mddev', then we can bypasses
the 'last disk' checks for RAID1 and RAID10, and control the behavior
per array by change sysfs node.
Signed-off-by: NeilBrown <neilb@suse.de>
[add sysfs node for fail_last_dev by Guoqing]
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
| |
Instead of linear approach to calculate power of 10, use generic int_pow()
which does it better.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|
|
|
|
|
|
|
|
| |
Just like raid1, we do not queue write error bio to retry write
and acknowlege badblocks, when the device is faulty.
Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Song Liu <songliubraving@fb.com>
|