summaryrefslogtreecommitdiffstats
path: root/drivers/scsi/sd.h (follow)
Commit message (Collapse)AuthorAgeFilesLines
* sd: convert to the atomic queue limits APIChristoph Hellwig2024-06-141-2/+4
| | | | | | | | | | | | | | Assign all queue limits through a local queue_limits variable and queue_limits_commit_update so that we can't race updating them from multiple places, and freeze the queue when updating them so that in-progress I/O submissions don't see half-updated limits. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: John Garry <john.g.garry@oracle.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240531074837.1648501-12-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: sd: Use the block layer zone append emulationDamien Le Moal2024-04-171-19/+0
| | | | | | | | | | | | | | | | | | | | Set the request queue of a TYPE_ZBC device as needing zone append emulation by setting the device queue max_zone_append_sectors limit to 0. This enables the block layer generic implementation provided by zone write plugging. With this, the sd driver will never see a REQ_OP_ZONE_APPEND request and the zone append emulation code implemented in sd_zbc.c can be removed. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20240408014128.205141-14-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: sd: Translate data lifetime informationBart Van Assche2024-02-271-1/+3
| | | | | | | | | | | | | | | | Recently T10 standardized SBC constrained streams. This mechanism allows to pass data lifetime information to SCSI devices in the group number field. Add support for translating write hint information into a permanent stream number in the sd driver. Use WRITE(10) instead of WRITE(6) if data lifetime information is present because the WRITE(6) command does not have a GROUP NUMBER field. Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-12-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: core: Query the Block Limits Extension VPD pageBart Van Assche2024-02-271-0/+1
| | | | | | | | | | | | | Parse the Reduced Stream Control Supported (RSCS) bit from the block limits extension VPD page. The RSCS bit is defined in SBC-5 r05 (https://www.t10.org/cgi-bin/ac.pl?t=f&f=sbc5r05.pdf). Reviewed-by: Avri Altman <avri.altman@wdc.com> Reviewed-by: Daejun Park <daejun7.park@samsung.com> Cc: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240130214911.1863909-10-bvanassche@acm.org Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Do not issue commands to suspended disks on shutdownDamien Le Moal2023-09-281-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | If an error occurs when resuming a host adapter before the devices attached to the adapter are resumed, the adapter low level driver may remove the scsi host, resulting in a call to sd_remove() for the disks of the host. This in turn results in a call to sd_shutdown() which will issue a synchronize cache command and a start stop unit command to spindown the disk. sd_shutdown() issues the commands only if the device is not already runtime suspended but does not check the power state for system-wide suspend/resume. That is, the commands may be issued with the device in a suspended state, which causes PM resume to hang, forcing a reset of the machine to recover. Fix this by tracking the suspended state of a disk by introducing the suspended boolean field in the scsi_disk structure. This flag is set to true when the disk is suspended is sd_suspend_common() and resumed with sd_resume(). When suspended is true, sd_shutdown() is not executed from sd_remove(). Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Revert "Rework asynchronous resume support"Bart Van Assche2022-08-231-5/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although commit 88f1669019bd ("scsi: sd: Rework asynchronous resume support") eliminates a delay for some ATA disks after resume, it causes resume of ATA disks to fail on other setups. See also: * "Resume process hangs for 5-6 seconds starting sometime in 5.16" (https://bugzilla.kernel.org/show_bug.cgi?id=215880). * Geert's regression report (https://lore.kernel.org/linux-scsi/alpine.DEB.2.22.394.2207191125130.1006766@ramsan.of.borg/). This is what I understand about this issue: * During resume, ata_port_pm_resume() starts the SCSI error handler. This changes the SCSI host state into SHOST_RECOVERY and causes scsi_queue_rq() to return BLK_STS_RESOURCE. * sd_resume() calls sd_start_stop_device() for ATA devices. That function in turn calls sd_submit_start() which tries to submit a START STOP UNIT command. That command can only be submitted after the SCSI error handler has changed the SCSI host state back to SHOST_RUNNING. * The SCSI error handler runs on its own thread and calls schedule_work(&(ap->scsi_rescan_task)). That causes ata_scsi_dev_rescan() to be called from the context of a kernel workqueue. That call hangs in blk_mq_get_tag(). I'm not sure why - maybe because all available tags have been allocated by sd_submit_start() calls (this is a guess). Link: https://lore.kernel.org/r/20220816172638.538734-1-bvanassche@acm.org Fixes: 88f1669019bd ("scsi: sd: Rework asynchronous resume support") Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: gzhqyz@gmail.com Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Reported-by: gzhqyz@gmail.com Reported-and-tested-by: Vlastimil Babka <vbabka@suse.cz> Tested-by: John Garry <john.garry@huawei.com> Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Rework asynchronous resume supportBart Van Assche2022-07-071-0/+5
| | | | | | | | | | | | | | | | | | | | | | | For some technologies, e.g. an ATA bus, resuming can take multiple seconds. Waiting for resume to finish can cause a very noticeable delay. Hence this commit that restores the behavior from before "scsi: core: pm: Rely on the device driver core for async power management" for most SCSI devices. This commit introduces a behavior change: if the START command fails, do not consider this as a SCSI disk resume failure. Link: https://bugzilla.kernel.org/show_bug.cgi?id=215880 Link: https://lore.kernel.org/r/20220630195703.10155-3-bvanassche@acm.org Fixes: a19a93e4c6a9 ("scsi: core: pm: Rely on the device driver core for async power management") Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: John Garry <john.garry@huawei.com> Cc: ericspero@icloud.com Cc: jason600.groome@gmail.com Tested-by: jason600.groome@gmail.com Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd_zbc: Prevent zone information memory leakDamien Le Moal2022-06-021-2/+2
| | | | | | | | | | | | | | | | | | | | | Make sure to always free a scsi disk zone information, even for regular disks. This ensures that there is no memory leak, even in the case of a zoned disk changing type to a regular disk (e.g. with a reformat using the FORMAT WITH PRESET command or other vendor proprietary command). To do this, rename sd_zbc_clear_zone_info() to sd_zbc_free_zone_info() and remove sd_zbc_release_disk(). A call to sd_zbc_free_zone_info() is added to sd_zbc_read_zones() for drives for which sd_is_zoned() returns false. Furthermore, sd_zbc_free_zone_info() code make s sure that the sdkp rev_mutex is never used while not being initialized by gating the cleanup code with a a check on the zone_wp_update_buf field as it is never NULL when rev_mutex has been initialized. Link: https://lore.kernel.org/r/20220601062544.905141-3-damien.lemoal@opensource.wdc.com Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Optimal I/O size should be a multiple of reported granularityMartin K. Petersen2022-05-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Commit a83da8a4509d ("scsi: sd: Optimal I/O size should be a multiple of physical block size") validated the reported optimal I/O size against the physical block size to overcome problems with devices reporting nonsensical transfer sizes. However, some devices claim conformity to older SCSI versions that predate the physical block size being reported. Other devices do not report a physical block size at all. We need to be able to validate the optimal I/O size on those devices as well. Many devices report an OPTIMAL TRANSFER LENGTH GRANULARITY in the same VPD page as the OPTIMAL TRANSFER LENGTH. Use this value to validate the optimal I/O size. Also check that the reported granularity is a multiple of the physical block size, if supported. Link: https://lore.kernel.org/r/33fb522e-4f61-1b76-914f-c9e6a3553c9b@gmail.com Link: https://lore.kernel.org/r/20220302053559.32147-9-martin.petersen@oracle.com Reported-by: Bernhard Sulzer <micraft.b@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: sd_zbc: Hide gap zonesDamien Le Moal2022-04-261-0/+5
| | | | | | | | | | | | | | | | | | | ZBC-2 allows host-managed disks to report gap zones. This allow zoned disks to report an offset between data zone starts that is a power of two even if the number of logical blocks with data per zone is not a power of two. Another new feature in ZBC-2 is support for constant zone starting LBA offsets. For zoned disks that report a constant zone starting LBA offset, hide the gap zones from the block layer. Report the offset between data zone starts as zone size and report the number of logical blocks with data per zone as the zone capacity. Link: https://lore.kernel.org/r/20220421183023.3462291-7-bvanassche@acm.org Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> [ bvanassche: Reworked this patch ] Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: sd_zbc: Introduce struct zoned_disk_infoBart Van Assche2022-04-261-4/+18
| | | | | | | | | | | | | | Deriving the meaning of the nr_zones, rev_nr_zones, zone_blocks and rev_zone_blocks member variables requires careful analysis of the source code. Make the meaning of these member variables easier to understand by introducing struct zoned_disk_info. Link: https://lore.kernel.org/r/20220421183023.3462291-5-bvanassche@acm.org Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: sd_zbc: Improve source code documentationBart Van Assche2022-04-261-3/+2
| | | | | | | | | | | | Add several kernel-doc headers. Declare input arrays const. Specify the array size in function declarations. Link: https://lore.kernel.org/r/20220421183023.3462291-2-bvanassche@acm.org Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* sd: rename the scsi_disk.dev fieldChristoph Hellwig2022-03-091-2/+7
| | | | | | | | | | | | | dev is very hard to grep for. Give the field a more descriptive name and documents its purpose. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220308055200.735835-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: don't use disk->private_data to find the scsi_driverChristoph Hellwig2022-03-091-2/+1
| | | | | | | | | | | | | | Requiring every ULP to have the scsi_drive as first member of the private data is rather fragile and not necessary anyway. Just use the driver hanging off the SCSI device instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20220308055200.735835-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: sd: add concurrent positioning ranges supportDamien Le Moal2021-10-271-0/+1
| | | | | | | | | | | | | | | | | | | | | | | Add the sd_read_cpr() function to the sd scsi disk driver to discover if a device has multiple concurrent positioning ranges (i.e. multiple actuators on an HDD). The existence of VPD page B9h indicates if a device has multiple concurrent positioning ranges. The page content describes each range supported by the device. sd_read_cpr() is called from sd_revalidate_disk() and uses the block layer functions disk_alloc_independent_access_ranges() and disk_set_independent_access_ranges() to represent the set of actuators of the device as independent access ranges. The format of the Concurrent Positioning Ranges VPD page B9h is defined in section 6.6.6 of SBC-5. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-3-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2020-10-151-0/+1
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "The usual driver updates (ufs, qla2xxx, tcmu, ibmvfc, lpfc, smartpqi, hisi_sas, qedi, qedf, mpt3sas) and minor bug fixes. There are only three core changes: adding sense codes, cleaning up noretry and adding an option for limitless retries" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (226 commits) scsi: hisi_sas: Recover PHY state according to the status before reset scsi: hisi_sas: Filter out new PHY up events during suspend scsi: hisi_sas: Add device link between SCSI devices and hisi_hba scsi: hisi_sas: Add check for methods _PS0 and _PR0 scsi: hisi_sas: Add controller runtime PM support for v3 hw scsi: hisi_sas: Switch to new framework to support suspend and resume scsi: hisi_sas: Use hisi_hba->cq_nvecs for calling calling synchronize_irq() scsi: qedf: Remove redundant assignment to variable 'rc' scsi: lpfc: Remove unneeded variable 'status' in lpfc_fcp_cpu_map_store() scsi: snic: Convert to use DEFINE_SEQ_ATTRIBUTE macro scsi: qla4xxx: Delete unneeded variable 'status' in qla4xxx_process_ddb_changed scsi: sun_esp: Use module_platform_driver to simplify the code scsi: sun3x_esp: Use module_platform_driver to simplify the code scsi: sni_53c710: Use module_platform_driver to simplify the code scsi: qlogicpti: Use module_platform_driver to simplify the code scsi: mac_esp: Use module_platform_driver to simplify the code scsi: jazz_esp: Use module_platform_driver to simplify the code scsi: mvumi: Fix error return in mvumi_io_attach() scsi: lpfc: Drop nodelist reference on error in lpfc_gen_req() scsi: be2iscsi: Fix a theoretical leak in beiscsi_create_eqs() ...
| * scsi: sd: Allow user to configure command retriesMike Christie2020-10-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some iSCSI targets went with the traditional "export N ports" approach and then allowed the initiator to multipath over them. Other targets went the opposite direction and export a single port, and then software on the target side performs load balancing and failover to other targets via an iSCSI specific feature or IP takover. The problem for the 2nd type of config is we quickly run out of our five retries and get I/O errors. In these setups we want to reduce resource use on the initiator side so we only wanted the one session and no dm-multipath. To handle traditional multipath operations like failover we do IP takover on the target side. So we would have an iSCSI target running on node1. Some monitoring software decides it's dead or the node is overloaded so it starts the iSCSI target on node2. The problem is for the failover case where we might have the equivalent of a dm-multipath temporary all paths down, or we just have to try more than 5 nodes before finding a good one. To handle this type of issue allow the user to configure the disk cmd retries from -1 to the current max of 5. -1 means infinite retries and should be used for setups where some other setting is going to control when to fail. For example iSCSI has the replacement/recovery timeout and fc (some users have used FC with NPIV and done something similar as IP takover) has dev_loss_tmo/fast_io_fail which will eventually expire and fail I/O. Link: https://lore.kernel.org/r/1601566554-26752-3-git-send-email-michael.christie@oracle.com Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Mike Christie <michael.christie@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: sd: sd_zbc: Fix ZBC disk initializationDamien Le Moal2020-09-161-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make sure to call sd_zbc_init_disk() when the sdkp->zoned field is known, that is, once sd_read_block_characteristics() is executed in sd_revalidate_disk(), so that host-aware disks also get initialized. To do so, move sd_zbc_init_disk() call in sd_zbc_revalidate_zones() and make sure to execute it for all zoned disks, including for host-aware disks used as regular disks as these disk zoned model may be changed back to BLK_ZONED_HA when partitions are deleted. Link: https://lore.kernel.org/r/20200915073347.832424-3-damien.lemoal@wdc.com Fixes: 5795eb443060 ("scsi: sd_zbc: emulate ZONE_APPEND commands") Cc: <stable@vger.kernel.org> # v5.8+ Reported-by: Borislav Petkov <bp@alien8.de> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: sd: sd_zbc: Fix handling of host-aware ZBC disksDamien Le Moal2020-09-161-1/+1
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | When CONFIG_BLK_DEV_ZONED is disabled, allow using host-aware ZBC disks as regular disks. In this case, ensure that command completion is correctly executed by changing sd_zbc_complete() to return good_bytes instead of 0 and causing a hang during device probe (endless retries). When CONFIG_BLK_DEV_ZONED is enabled and a host-aware disk is detected to have partitions, it will be used as a regular disk. In this case, make sure to not do anything in sd_zbc_revalidate_zones() as that triggers warnings. Since all these different cases result in subtle settings of the disk queue zoned model, introduce the block layer helper function blk_queue_set_zoned() to generically implement setting up the effective zoned model according to the disk type, the presence of partitions on the disk and CONFIG_BLK_DEV_ZONED configuration. Link: https://lore.kernel.org/r/20200915073347.832424-2-damien.lemoal@wdc.com Fixes: b72053072c0b ("block: allow partitions on host aware zone devices") Cc: <stable@vger.kernel.org> Reported-by: Borislav Petkov <bp@alien8.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd_zbc: Improve zone revalidationDamien Le Moal2020-08-051-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, for zoned disks, since blk_revalidate_disk_zones() requires the disk capacity to be set already to operate correctly, zones revalidation can only be done on the second revalidate scan once the gendisk capacity is set at the end of the first scan. As a result, if zone revalidation fails, there is no second chance to recover from the failure and the disk capacity is changed to 0, with the disk left unusable. This can be improved by shuffling around code, specifically, by moving the call to sd_zbc_revalidate_zones() from sd_zbc_read_zones() to the end of sd_revalidate_disk(), after set_capacity_revalidate_and_notify() is called to set the gendisk capacity. With this change, if sd_zbc_revalidate_zones() fails on the first scan, the second scan will call it again to recover, if possible. Using the new struct scsi_disk fields rev_nr_zones and rev_zone_blocks, sd_zbc_revalidate_zones() does actual work only if it detects a change with the disk zone configuration. This means that for a successful zones revalidation on the first scan, the second scan will not cause another heavy full check. While at it, remove the unecesary "extern" declaration of sd_zbc_read_zones(). Link: https://lore.kernel.org/r/20200731054928.668547-1-damien.lemoal@wdc.com Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd_zbc: Remove unused inline functionsYueHaibing2020-07-151-6/+0
| | | | | | | | | | These are no longer used and can be removed. Link: https://lore.kernel.org/r/20200715025523.34620-1-yuehaibing@huawei.com Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd_zbc: emulate ZONE_APPEND commandsJohannes Thumshirn2020-05-131-5/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Emulate ZONE_APPEND for SCSI disks using a regular WRITE(16) command with a start LBA set to the target zone write pointer position. In order to always know the write pointer position of a sequential write zone, the write pointer of all zones is tracked using an array of 32bits zone write pointer offset attached to the scsi disk structure. Each entry of the array indicate a zone write pointer position relative to the zone start sector. The write pointer offsets are maintained in sync with the device as follows: 1) the write pointer offset of a zone is reset to 0 when a REQ_OP_ZONE_RESET command completes. 2) the write pointer offset of a zone is set to the zone size when a REQ_OP_ZONE_FINISH command completes. 3) the write pointer offset of a zone is incremented by the number of 512B sectors written when a write, write same or a zone append command completes. 4) the write pointer offset of all zones is reset to 0 when a REQ_OP_ZONE_RESET_ALL command completes. Since the block layer does not write lock zones for zone append commands, to ensure a sequential ordering of the regular write commands used for the emulation, the target zone of a zone append command is locked when the function sd_zbc_prepare_zone_append() is called from sd_setup_read_write_cmnd(). If the zone write lock cannot be obtained (e.g. a zone append is in-flight or a regular write has already locked the zone), the zone append command dispatching is delayed by returning BLK_STS_ZONE_RESOURCE. To avoid the need for write locking all zones for REQ_OP_ZONE_RESET_ALL requests, use a spinlock to protect accesses and modifications of the zone write pointer offsets. This spinlock is initialized from sd_probe() using the new function sd_zbc_init(). Co-developed-by: Damien Le Moal <Damien.LeMoal@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2019-12-081-0/+3
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull more SCSI updates from James Bottomley: "Eleven patches, all in drivers (no core changes) that are either minor cleanups or small fixes. They were late arriving, but still safe for -rc1" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: MAINTAINERS: Add the linux-scsi mailing list to the ISCSI entry scsi: megaraid_sas: Make poll_aen_lock static scsi: sd_zbc: Improve report zones error printout scsi: qla2xxx: Fix qla2x00_request_irqs() for MSI scsi: qla2xxx: unregister ports after GPN_FT failure scsi: qla2xxx: fix rports not being mark as lost in sync fabric scan scsi: pm80xx: Remove unused include of linux/version.h scsi: pm80xx: fix logic to break out of loop when register value is 2 or 3 scsi: scsi_transport_sas: Fix memory leak when removing devices scsi: lpfc: size cpu map by last cpu id set scsi: ibmvscsi_tgt: Remove unneeded variable rc
| * scsi: sd_zbc: Improve report zones error printoutDamien Le Moal2019-11-271-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | In the case of a report zones command failure, instead of simply printing the host_byte and driver_byte values returned, print a message that is more human readable and useful, adding sense codes too. To do so, use the already defined sd_print_sense_hdr() and sd_print_result() functions by moving the declaration of these functions into sd.h. Link: https://lore.kernel.org/r/20191125070518.951717-1-damien.lemoal@wdc.com Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | block: rework zone reportingChristoph Hellwig2019-11-131-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Avoid the need to allocate a potentially large array of struct blk_zone in the block layer by switching the ->report_zones method interface to a callback model. Now the caller simply supplies a callback that is executed on each reported zone, and private data for it. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* | scsi: sd_zbc: add zone open, close, and finish supportAjay Joshi2019-11-071-3/+5
|/ | | | | | | | | | | | | | | | | Implement REQ_OP_ZONE_OPEN, REQ_OP_ZONE_CLOSE and REQ_OP_ZONE_FINISH support to allow explicit control of zone states. Contains contributions from Matias Bjorling, Hans Holmberg, Keith Busch and Damien Le Moal. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Ajay Joshi <ajay.joshi@wdc.com> Signed-off-by: Matias Bjorling <matias.bjorling@wdc.com> Signed-off-by: Hans Holmberg <hans.holmberg@wdc.com> Signed-off-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: implement REQ_OP_ZONE_RESET_ALLChaitanya Kulkarni2019-08-051-2/+3
| | | | | | | | | | | | | This patch implements the zone reset all operation for sd_zbc.c. We add a new boolean parameter for the sd_zbc_setup_reset_cmd() to indicate REQ_OP_ZONE_RESET_ALL command setup. Along with that we add support in the completion path for the zone reset all. Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Kill gfp_t argument of blkdev_report_zones()Damien Le Moal2019-07-121-2/+1
| | | | | | | | | | | | | Only GFP_KERNEL and GFP_NOIO are used with blkdev_report_zones(). In preparation of using vmalloc() for large report buffer and zone array allocations used by this function, remove its "gfp_t gfp_mask" argument and rely on the caller context to use memalloc_noio_save/restore() where necessary (block layer zone revalidation and dm-zoned I/O error path). Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: sd: Fix typo in sd_first_printk()Dietmar Hahn2019-02-131-1/+1
| | | | | | | | | | | | Commit b2bff6ceb61a9 ("[SCSI] sd: Quiesce mode sense error messages") added the macro sd_first_printk(). The macro takes "sdsk" as argument but dereferences "sdkp". This hasn't caused any real issues since all callers of sd_first_printk() have an sdkp. But fix the typo. [mkp: Turned this into a real patch and tweaked commit description] Signed-off-by: Dietmar Hahn <dietmar.hahn@ts.fujitsu.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Make protection lookup tables static and relocate functionsJohn Garry2019-01-091-62/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the protection lookup tables in sd_prot_flag_mask() and sd_prot_op() are declared as non-static. As such, they will be rebuilt for each respective function call. Optimise by making them static. This saves ~100B object code for sd.c: Before: text data bss dec hex filename 25403 1024 16 26443 674b drivers/scsi/sd.o After: text data bss dec hex filename 25299 1024 16 26339 66e3 drivers/scsi/sd.o In addition, since those same functions are declared in sd.h, but each are only referenced in sd.c, relocate them to that same c file. The inline specifier is dropped also, since gcc should be able to make the decision to inline. Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: return blk_status_t from scsi_init_io and ->init_commandChristoph Hellwig2018-11-101-3/+3
| | | | | | | | | | Replace the old BLKPREP_* values with the BLK_STS_ ones that they are converted to later anyway. Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: Introduce blk_revalidate_disk_zones()Damien Le Moal2018-10-251-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Drivers exposing zoned block devices have to initialize and maintain correctness (i.e. revalidate) of the device zone bitmaps attached to the device request queue (seq_zones_bitmap and seq_zones_wlock). To simplify coding this, introduce a generic helper function blk_revalidate_disk_zones() suitable for most (and likely all) cases. This new function always update the seq_zones_bitmap and seq_zones_wlock bitmaps as well as the queue nr_zones field when called for a disk using a request based queue. For a disk using a BIO based queue, only the number of zones is updated since these queues do not have schedulers and so do not need the zone bitmaps. With this change, the zone bitmap initialization code in sd_zbc.c can be replaced with a call to this function in sd_zbc_read_zones(), which is called from the disk revalidate block operation method. A call to blk_revalidate_disk_zones() is also added to the null_blk driver for devices created with the zoned mode enabled. Finally, to ensure that zoned devices created with dm-linear or dm-flakey expose the correct number of zones through sysfs, a call to blk_revalidate_disk_zones() is added to dm_table_set_restrictions(). The zone bitmaps allocated and initialized with blk_revalidate_disk_zones() are freed automatically from __blk_release_queue() using the block internal function blk_queue_free_zone_bitmaps(). Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: add a report_zones methodChristoph Hellwig2018-10-251-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Dispatching a report zones command through the request queue is a major pain due to the command reply payload rewriting necessary. Given that blkdev_report_zones() is executing everything synchronously, implement report zones as a block device file operation instead, allowing major simplification of the code in many places. sd, null-blk, dm-linear and dm-flakey being the only block device drivers supporting exposing zoned block devices, these drivers are modified to provide the device side implementation of the report_zones() block device file operation. For device mappers, a new report_zones() target type operation is defined so that the upper block layer calls blkdev_report_zones() can be propagated down to the underlying devices of the dm targets. Implementation for this new operation is added to the dm-linear and dm-flakey targets. Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Christoph Hellwig <hch@lst.de> [Damien] * Changed method block_device argument to gendisk * Various bug fixes and improvements * Added support for null_blk, dm-linear and dm-flakey. Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: move dif_prepare/dif_complete functions to block layerMax Gurtovoy2018-07-301-9/+0
| | | | | | | | | | | | | Currently these functions are implemented in the scsi layer, but their actual place should be the block layer since T10-PI is a general data integrity feature that is used in the nvme protocol as well. Also, use the tuple size from the integrity profile since it may vary between integrity types. Suggested-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
* scsi: sd_zbc: Change the type of the ZBC fields into u32Bart Van Assche2018-04-191-6/+6
| | | | | | | | | | | | This patch does not change any functionality but makes it clear that it is on purpose that these fields are 32 bits wide. Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.com> Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: sd: Remove zone write lockingDamien Le Moal2018-01-091-11/+0
| | | | | | | | | | | The block layer now handles zone write locking. [mkp: removed SCMD_ZONE_WRITE_LOCK reference in scsi_debugfs] Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* License cleanup: add SPDX GPL-2.0 license identifier to files with no licenseGreg Kroah-Hartman2017-11-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many source files in the tree are missing licensing information, which makes it harder for compliance tools to determine the correct license. By default all files without license information are under the default license of the kernel, which is GPL version 2. Update the files which contain no license information with the 'GPL-2.0' SPDX license identifier. The SPDX identifier is a legally binding shorthand, which can be used instead of the full boiler plate text. This patch is based on work done by Thomas Gleixner and Kate Stewart and Philippe Ombredanne. How this work was done: Patches were generated and checked against linux-4.14-rc6 for a subset of the use cases: - file had no licensing information it it. - file was a */uapi/* one with no licensing information in it, - file was a */uapi/* one with existing licensing information, Further patches will be generated in subsequent months to fix up cases where non-standard license headers were used, and references to license had to be inferred by heuristics based on keywords. The analysis to determine which SPDX License Identifier to be applied to a file was done in a spreadsheet of side by side results from of the output of two independent scanners (ScanCode & Windriver) producing SPDX tag:value files created by Philippe Ombredanne. Philippe prepared the base worksheet, and did an initial spot review of a few 1000 files. The 4.13 kernel was the starting point of the analysis with 60,537 files assessed. Kate Stewart did a file by file comparison of the scanner results in the spreadsheet to determine which SPDX license identifier(s) to be applied to the file. She confirmed any determination that was not immediately clear with lawyers working with the Linux Foundation. Criteria used to select files for SPDX license identifier tagging was: - Files considered eligible had to be source code files. - Make and config files were included as candidates if they contained >5 lines of source - File already had some variant of a license header in it (even if <5 lines). All documentation files were explicitly excluded. The following heuristics were used to determine which SPDX license identifiers to apply. - when both scanners couldn't find any license traces, file was considered to have no license information in it, and the top level COPYING file license applied. For non */uapi/* files that summary was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 11139 and resulted in the first patch in this series. If that file was a */uapi/* path one, it was "GPL-2.0 WITH Linux-syscall-note" otherwise it was "GPL-2.0". Results of that was: SPDX license identifier # files ---------------------------------------------------|------- GPL-2.0 WITH Linux-syscall-note 930 and resulted in the second patch in this series. - if a file had some form of licensing information in it, and was one of the */uapi/* ones, it was denoted with the Linux-syscall-note if any GPL family license was found in the file or had no licensing in it (per prior point). Results summary: SPDX license identifier # files ---------------------------------------------------|------ GPL-2.0 WITH Linux-syscall-note 270 GPL-2.0+ WITH Linux-syscall-note 169 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-2-Clause) 21 ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) 17 LGPL-2.1+ WITH Linux-syscall-note 15 GPL-1.0+ WITH Linux-syscall-note 14 ((GPL-2.0+ WITH Linux-syscall-note) OR BSD-3-Clause) 5 LGPL-2.0+ WITH Linux-syscall-note 4 LGPL-2.1 WITH Linux-syscall-note 3 ((GPL-2.0 WITH Linux-syscall-note) OR MIT) 3 ((GPL-2.0 WITH Linux-syscall-note) AND MIT) 1 and that resulted in the third patch in this series. - when the two scanners agreed on the detected license(s), that became the concluded license(s). - when there was disagreement between the two scanners (one detected a license but the other didn't, or they both detected different licenses) a manual inspection of the file occurred. - In most cases a manual inspection of the information in the file resulted in a clear resolution of the license that should apply (and which scanner probably needed to revisit its heuristics). - When it was not immediately clear, the license identifier was confirmed with lawyers working with the Linux Foundation. - If there was any question as to the appropriate license identifier, the file was flagged for further research and to be revisited later in time. In total, over 70 hours of logged manual review was done on the spreadsheet to determine the SPDX license identifiers to apply to the source files by Kate, Philippe, Thomas and, in some cases, confirmation by lawyers working with the Linux Foundation. Kate also obtained a third independent scan of the 4.13 code base from FOSSology, and compared selected files where the other two scanners disagreed against that SPDX file, to see if there was new insights. The Windriver scanner is based on an older version of FOSSology in part, so they are related. Thomas did random spot checks in about 500 files from the spreadsheets for the uapi headers and agreed with SPDX license identifier in the files he inspected. For the non-uapi files Thomas did random spot checks in about 15000 files. In initial set of patches against 4.14-rc6, 3 files were found to have copy/paste license identifier errors, and have been fixed to reflect the correct identifier. Additionally Philippe spent 10 hours this week doing a detailed manual inspection and review of the 12,461 patched files from the initial patch version early this week with: - a full scancode scan run, collecting the matched texts, detected license ids and scores - reviewing anything where there was a license detected (about 500+ files) to ensure that the applied SPDX license was correct - reviewing anything where there was no detection but the patch license was not GPL-2.0 WITH Linux-syscall-note to ensure that the applied SPDX license was correct This produced a worksheet with 20 files needing minor correction. This worksheet was then exported into 3 different .csv files for the different types of files to be modified. These .csv files were then reviewed by Greg. Thomas wrote a script to parse the csv files and add the proper SPDX tag to the file, in the format that the file expected. This script was further refined by Greg based on the output to detect more types of files automatically and to distinguish between header and source .c files (which need different comment types.) Finally Greg ran the script using the .csv files to generate the patches. Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Philippe Ombredanne <pombredanne@nexb.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* sd: add support for TCG OPAL self encrypting disksChristoph Hellwig2017-06-291-0/+2
| | | | | | | | | | | | Just wire up the generic TCG OPAL infrastructure to the SCSI disk driver and the Security In/Out commands. Note that I don't know of any actual SCSI disks that do support TCG OPAL, but this is required to support ATA disks through libata. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds2017-05-041-4/+10
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull SCSI updates from James Bottomley: "This update includes the usual round of major driver updates (hisi_sas, ufs, fnic, cxlflash, be2iscsi, ipr, stex). There's also the usual amount of cosmetic and spelling stuff" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (155 commits) scsi: qla4xxx: fix spelling mistake: "Tempalate" -> "Template" scsi: stex: make S6flag static scsi: mac_esp: fix to pass correct device identity to free_irq() scsi: aacraid: pci_alloc_consistent() failures on ARM64 scsi: ufs: make ufshcd_get_lists_status() register operation obvious scsi: ufs: use MASK_EE_STATUS scsi: mac_esp: Replace bogus memory barrier with spinlock scsi: fcoe: make fcoe_e_d_tov and fcoe_r_a_tov static scsi: sd_zbc: Do not write lock zones for reset scsi: sd_zbc: Remove superfluous assignments scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmnd scsi: Improve scsi_get_sense_info_fld scsi: sd: Cleanup sd_done sense data handling scsi: sd: Improve sd_completed_bytes scsi: sd: Fix function descriptions scsi: mpt3sas: remove redundant wmb scsi: mpt: Move scsi_remove_host() out of mptscsih_remove_host() scsi: sg: reset 'res_in_use' after unlinking reserved array scsi: mvumi: remove code handling zero scsi_sg_count(scmd) case scsi: fusion: fix spelling mistake: "Persistancy" -> "Persistency" ...
| * scsi: sd: sd_zbc: Rename sd_zbc_setup_write_cmndDamien Le Moal2017-04-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Rename sd_zbc_setup_write_cmnd() to sd_zbc_write_lock_zone() to be clear about what the function actually does. To be consistent, also rename sd_zbc_cancel_write_cmnd() to sd_zbc_write_unlock_zone(). No functional change is introduced by this patch. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bart Van Assche <Bart.VanAssche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: sd: Improve sd_completed_bytesDamien Le Moal2017-04-251-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Re-shuffle the code to be more efficient by not initializing variables upfront (i.e. do it only when necessary). Also replace the do_div calls with calls to sectors_to_logical(). No functional change is introduced by this patch. [mkp: bytes_to_logical()] Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
| * scsi: scsi_error: count medium access timeout only once per EH runHannes Reinecke2017-04-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current medium access timeout counter will be increased for each command, so if there are enough failed commands we'll hit the medium access timeout for even a single device failure and the following kernel message is displayed: sd H:C:T:L: [sdXY] Medium access timeout failure. Offlining disk! Fix this by making the timeout per EH run, ie the counter will only be increased once per device and EH run. Fixes: 18a4d0a ("[SCSI] Handle disk devices which can not process medium access commands") Cc: Ewan Milne <emilne@redhat.com> Cc: Lawrence Obermann <loberman@redhat.com> Cc: Benjamin Block <bblock@linux.vnet.ibm.com> Cc: Steffen Maier <maier@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* | scsi: sd: Separate zeroout and discard command choicesMartin K. Petersen2017-04-081-0/+8
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that zeroout and discards are distinct operations we need to separate the policy of choosing the appropriate command. Create a zeroing_mode which can be one of: write: Zeroout assist not present, use regular WRITE writesame: Allow WRITE SAME(10/16) with a zeroed payload writesame_16_unmap: Allow WRITE SAME(16) with UNMAP writesame_10_unmap: Allow WRITE SAME(10) with UNMAP The last two are conditional on the device being thin provisioned with LBPRZ=1 and LBPWS=1 or LBPWS10=1 respectively. Whether to set the UNMAP bit or not depends on the REQ_NOUNMAP flag. And if none of the _unmap variants are supported, regular WRITE SAME will be used if the device supports it. The zeroout_mode is exported in sysfs and the detected mode for a given device can be overridden using the string constants above. With this change in place we can now issue WRITE SAME(16) with UNMAP set for block zeroing applications that require hard guarantees and logical_block_size granularity. And at the same time use the UNMAP command with the device's preferred granulary and alignment for discard operations. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* sd: Implement support for ZBC devicesHannes Reinecke2016-10-191-0/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement ZBC support functions to setup zoned disks, both host-managed and host-aware models. Only zoned disks that satisfy the following conditions are supported: 1) All zones are the same size, with the exception of an eventual last smaller runt zone. 2) For host-managed disks, reads are unrestricted (reads are not failed due to zone or write pointer alignement constraints). Zoned disks that do not satisfy these 2 conditions are setup with a capacity of 0 to prevent their use. The function sd_zbc_read_zones, called from sd_revalidate_disk, checks that the device satisfies the above two constraints. This function may also change the disk capacity previously set by sd_read_capacity for devices reporting only the capacity of conventional zones at the beginning of the LBA range (i.e. devices reporting rc_basis set to 0). The capacity message output was moved out of sd_read_capacity into a new function sd_print_capacity to include this eventual capacity change by sd_zbc_read_zones. This new function also includes a call to sd_zbc_print_zones to display the number of zones and zone size of the device. Signed-off-by: Hannes Reinecke <hare@suse.de> [Damien: * Removed zone cache support * Removed mapping of discard to reset write pointer command * Modified sd_zbc_read_zones to include checks that the device satisfies the kernel constraints * Implemeted REPORT ZONES setup and post-processing based on code from Shaun Tancheff <shaun.tancheff@seagate.com> * Removed confusing use of 512B sector units in functions interface] Signed-off-by: Damien Le Moal <damien.lemoal@hgst.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Shaun Tancheff <shaun.tancheff@seagate.com> Tested-by: Shaun Tancheff <shaun.tancheff@seagate.com> Acked-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Jens Axboe <axboe@fb.com>
* scsi: sd: Move DIF protection types to t10-pi.hChristoph Hellwig2016-09-151-21/+0
| | | | | | | | | | | These should go together with the rest of the T10 protection information defintions. [mkp: s/T10_DIF/T10_PI/] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: scsi_debug: Use struct t10_pi_tuple instead of struct sd_dif_tupleChristoph Hellwig2016-09-151-9/+0
| | | | | | | | | And remove the declaration of the latter in sd.h as scsi_debug was the only user. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* sd: Fix rw_max for devices that report an optimal xfer sizeMartin K. Petersen2016-06-021-0/+5
| | | | | | | | | | | | | | | For historic reasons, io_opt is in bytes and max_sectors in block layer sectors. This interface inconsistency is error prone and should be fixed. But for 4.4--4.7 let's make the unit difference explicit via a wrapper function. Fixes: d0eb20a863ba ("sd: Optimal I/O size is in bytes, not sectors") Cc: stable@vger.kernel.org # 4.4+ Reported-by: Fam Zheng <famz@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Andrew Patterson <andrew.patterson@hpe.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* sd: Fix excessive capacity printing on devices with blocks bigger than 512 bytesMartin K. Petersen2016-04-011-1/+6
| | | | | | | | | | | | | | | | | | | | | | During revalidate we check whether device capacity has changed before we decide whether to output disk information or not. The check for old capacity failed to take into account that we scaled sdkp->capacity based on the reported logical block size. And therefore the capacity test would always fail for devices with sectors bigger than 512 bytes and we would print several copies of the same discovery information. Avoid scaling sdkp->capacity and instead adjust the value on the fly when setting the block device capacity and generating fake C/H/S geometry. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: <stable@vger.kernel.org> Reported-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Hannes Reinicke <hare@suse.de> Reviewed-by: Ewan Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* block/sd: Fix device-imposed transfer length limitsMartin K. Petersen2015-11-261-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 4f258a46346c ("sd: Fix maximum I/O size for BLOCK_PC requests") had the unfortunate side-effect of removing an implicit clamp to BLK_DEF_MAX_SECTORS for REQ_TYPE_FS requests in the block layer code. This caused problems for some SMR drives. Debugging this issue revealed a few problems with the existing infrastructure since the block layer didn't know how to deal with device-imposed limits, only limits set by the I/O controller. - Introduce a new queue limit, max_dev_sectors, which is used by the ULD to signal the maximum sectors for a REQ_TYPE_FS request. - Ensure that max_dev_sectors is correctly stacked and taken into account when overriding max_sectors through sysfs. - Rework sd_read_block_limits() so it saves the max_xfer and opt_xfer values for later processing. - In sd_revalidate() set the queue's max_dev_sectors based on the MAXIMUM TRANSFER LENGTH value in the Block Limits VPD. If this value is not reported, fall back to a cap based on the CDB TRANSFER LENGTH field size. - In sd_revalidate(), use OPTIMAL TRANSFER LENGTH from the Block Limits VPD--if reported and sane--to signal the preferred device transfer size for FS requests. Otherwise use BLK_DEF_MAX_SECTORS. - blk_limits_max_hw_sectors() is no longer used and can be removed. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=93581 Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: sweeneygj@gmx.com Tested-by: Arzeets <anatol.pomozov@gmail.com> Tested-by: David Eisner <david.eisner@oriel.oxon.org> Tested-by: Mario Kicherer <dev@kicherer.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
* scsi: introduce sdev_prefix_printk()Hannes Reinecke2014-11-121-3/+3
| | | | | | | | | | Like scmd_printk(), but the device name is passed in as a string. Can be used by eg ULDs which do not have access to the scsi_cmnd structure. Signed-off-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Robert Elliott <elliott@hp.com> Signed-off-by: Christoph Hellwig <hch@lst.de>