| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Pull block driver changes from Jens Axboe:
"Now that the core bits are in, here's the pull request for the driver
related changes for 3.16. Nothing out of the ordinary here, mostly
business as usual. There are a few pulls of for-3.16/core into this
branch, which were done when the blk-mq was modified after the
mtip32xx conversion was put in.
The pull request contains:
- skd and cciss converted to use pci_enable_msix_exact(). From
Alexander Gordeev.
- A few mtip32xx fixes from Asai @ Micron.
- The conversion of mtip32xx from make_request_fn to blk-mq, and a
later small fix for that conversion on quiescing for non-queued IO.
From me.
- A fix for bsg to use an exported function to check whether this
driver is request based or not. Needed updating for blk-mq, which
is request based, but does not have a request_fn hook. From me.
- Small floppy bug fix from Jiri.
- A series of cleanups for the cdrom uniform layer from Joe Perches.
Gets rid of various old ugly macros, making the code conform more
to the modern coding style.
- A series of patches for drbd from the drbd crew (Lars Ellenberg and
Philipp Reisner).
- A use-after-free fix for null_blk from Ming Lei.
- Also from Ming Lei is a performance patch for virtio-blk, which can
net us a 3x win on kvm platforms where world notification is
expensive.
- Ming Lei also fixed a stall issue in virtio-blk, due to a race
between queue start/stop and resource limits.
- A small batch of fixes for xen-blk{back,front} from Olaf Hering and
Valentin Priescu"
* 'for-3.16/drivers' of git://git.kernel.dk/linux-block: (54 commits)
block: virtio_blk: don't hold spin lock during world switch
xen-blkback: defer freeing blkif to avoid blocking xenwatch
xen blkif.h: fix comment typo in discard-alignment
xen/blkback: disable discard feature if requested by toolstack
xen-blkfront: remove type check from blkfront_setup_discard
floppy: do not corrupt bio.bi_flags when reading block 0
mtip32xx: move error handling to service thread
virtio_blk: fix race between start and stop queue
mtip32xx: stop block hardware queues before quiescing IO
mtip32xx: blk_mq_init_queue() returns an ERR_PTR
mtip32xx: convert to use blk-mq
cdrom: Remove unnecessary prototype for cdrom_get_disc_info
cdrom: Remove unnecessary prototype for cdrom_mrw_exit
cdrom: Remove cdrom_count_tracks prototype
cdrom: Remove cdrom_get_next_writeable prototype
cdrom: Remove cdrom_get_last_written prototype
cdrom: Move mmc_ioctls above cdrom_ioctl to remove unnecessary prototype
cdrom: Remove unnecessary sanitize_format prototype
cdrom: Remove unnecessary check_for_audio_disc prototype
cdrom: Remove prototype for open_for_data
...
|
| |\
| | |
| | |
| | |
| | |
| | |
| | | |
Pulled in for the blk_mq_tag_to_rq() change, which impacts
mtip32xx.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip into for-3.16/drivers
Konrad writes:
Please git pull the following branch:
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip.git stable/for-jens-3.16
which has a bunch of fixes to the Xen block frontend and backend driver
and a new parameter for Xen backend driver - an override (set by the toolstack)
whether to expose the discard support (if disk of course supports it) or not.
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add the missing 'n' to discard-alignment
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Pull in core changes (again), since we got rid of the alloc/free
hctx mq_ops hooks and mtip32xx then needed updating again.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |\ \ \ \
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
mtip32xx uses blk_mq_alloc_reserved_request(), so pull in the
core changes so we have a properly merged end result.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
In case a connection transitions into C_TIMEOUT within the timer
function (request_timer_fn()) we need to make sure that the receiver
thread (potentially running on a different CPU) sees the updated
cstate later on.
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
...instead directly assigning to q->limits.discard_zeroes_data
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
bsg currently checks ->request_fn to check whether a queue can
handle struct request. But with blk-mq, we don't have a request_fn
yet are request based. Add a queue_is_rq_based() helper and use
that in bsg, I'm guessing this is not the last place we need to
update for this. Besides, it better explains what is being
checked.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci into next
Pull PCI changes from Bjorn Helgaas:
"Enumeration
- Notify driver before and after device reset (Keith Busch)
- Use reset notification in NVMe (Keith Busch)
NUMA
- Warn if we have to guess host bridge node information (Myron Stowe)
- Work around AMD Fam15h BIOSes that fail to provide _PXM (Suravee
Suthikulpanit)
- Clean up and mark early_root_info_init() as deprecated (Suravee
Suthikulpanit)
Driver binding
- Add "driver_override" for force specific binding (Alex Williamson)
- Fail "new_id" addition for devices we already know about (Bandan
Das)
Resource management
- Support BAR sizes up to 8GB (Nikhil Rao, Alan Cox)
- Don't move IORESOURCE_PCI_FIXED resources (Bjorn Helgaas)
- Mark SBx00 HPET BAR as IORESOURCE_PCI_FIXED (Bjorn Helgaas)
- Fail safely if we can't handle BARs larger than 4GB (Bjorn Helgaas)
- Reject BAR above 4GB if dma_addr_t is too small (Bjorn Helgaas)
- Don't convert BAR address to resource if dma_addr_t is too small
(Bjorn Helgaas)
- Don't set BAR to zero if dma_addr_t is too small (Bjorn Helgaas)
- Don't print anything while decoding is disabled (Bjorn Helgaas)
- Don't add disabled subtractive decode bus resources (Bjorn Helgaas)
- Add resource allocation comments (Bjorn Helgaas)
- Restrict 64-bit prefetchable bridge windows to 64-bit resources
(Yinghai Lu)
- Assign i82875p_edac PCI resources before adding device (Yinghai Lu)
PCI device hotplug
- Remove unnecessary "dev->bus" test (Bjorn Helgaas)
- Use PCI_EXP_SLTCAP_PSN define (Bjorn Helgaas)
- Fix rphahp endianess issues (Laurent Dufour)
- Acknowledge spurious "cmd completed" event (Rajat Jain)
- Allow hotplug service drivers to operate in polling mode (Rajat Jain)
- Fix cpqphp possible NULL dereference (Rickard Strandqvist)
MSI
- Replace pci_enable_msi_block() by pci_enable_msi_exact()
(Alexander Gordeev)
- Replace pci_enable_msix() by pci_enable_msix_exact() (Alexander Gordeev)
- Simplify populate_msi_sysfs() (Jan Beulich)
Virtualization
- Add Intel Patsburg (X79) root port ACS quirk (Alex Williamson)
- Mark RTL8110SC INTx masking as broken (Alex Williamson)
Generic host bridge driver
- Add generic PCI host controller driver (Will Deacon)
Freescale i.MX6
- Use new clock names (Lucas Stach)
- Drop old IRQ mapping (Lucas Stach)
- Remove optional (and unused) IRQs (Lucas Stach)
- Add support for MSI (Lucas Stach)
- Fix imx6_add_pcie_port() section mismatch warning (Sachin Kamat)
Renesas R-Car
- Add gen2 device tree support (Ben Dooks)
- Use new OF interrupt mapping when possible (Lucas Stach)
- Add PCIe driver (Phil Edworthy)
- Add PCIe MSI support (Phil Edworthy)
- Add PCIe device tree bindings (Phil Edworthy)
Samsung Exynos
- Remove unnecessary OOM messages (Jingoo Han)
- Fix add_pcie_port() section mismatch warning (Sachin Kamat)
Synopsys DesignWare
- Make MSI ISR shared IRQ aware (Lucas Stach)
Miscellaneous
- Check for broken config space aliasing (Alex Williamson)
- Update email address (Ben Hutchings)
- Fix Broadcom CNB20LE unintended sign extension (Bjorn Helgaas)
- Fix incorrect vgaarb conditional in WARN_ON() (Bjorn Helgaas)
- Remove unnecessary __ref annotations (Bjorn Helgaas)
- Add arch/x86/kernel/quirks.c to MAINTAINERS PCI file patterns
(Bjorn Helgaas)
- Fix use of uninitialized MPS value (Bjorn Helgaas)
- Tidy x86/gart messages (Bjorn Helgaas)
- Fix return value from pci_user_{read,write}_config_*() (Gavin Shan)
- Turn pcibios_penalize_isa_irq() into a weak function (Hanjun Guo)
- Remove unused serial device IDs (Jean Delvare)
- Use designated initialization in PCI_VDEVICE (Mark Rustad)
- Fix powerpc NULL dereference in pci_root_buses traversal (Mike Qiu)
- Configure MPS on ARM (Murali Karicheri)
- Remove unnecessary includes of <linux/init.h> (Paul Gortmaker)
- Move Open Firmware devspec attribute to PCI common code (Sebastian Ott)
- Use pdev->dev.groups for attribute creation on s390 (Sebastian Ott)
- Remove pcibios_add_platform_entries() (Sebastian Ott)
- Add new ID for Intel GPU "spurious interrupt" quirk (Thomas Jarosch)
- Rename pci_is_bridge() to pci_has_subordinate() (Yijing Wang)
- Add and use new pci_is_bridge() interface (Yijing Wang)
- Make pci_bus_add_device() void (Yijing Wang)
DMA API
- Clarify physical/bus address distinction in docs (Bjorn Helgaas)
- Fix typos in docs (Emilio López)
- Update dma_pool_create ()and dma_pool_alloc() descriptions (Gioh Kim)
- Change dma_declare_coherent_memory() CPU address to phys_addr_t
(Bjorn Helgaas)
- Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
(Bjorn Helgaas)"
* tag 'pci-v3.16-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: (92 commits)
MAINTAINERS: Add generic PCI host controller driver
PCI: generic: Add generic PCI host controller driver
PCI: imx6: Add support for MSI
PCI: designware: Make MSI ISR shared IRQ aware
PCI: imx6: Remove optional (and unused) IRQs
PCI: imx6: Drop old IRQ mapping
PCI: imx6: Use new clock names
i82875p_edac: Assign PCI resources before adding device
ARM/PCI: Call pcie_bus_configure_settings() to set MPS
PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
PCI: Make pci_bus_add_device() void
PCI: exynos: Fix add_pcie_port() section mismatch warning
PCI: Introduce new device binding path using pci_dev.driver_override
PCI: rcar: Add gen2 device tree support
PCI: cpqphp: Fix possible null pointer dereference
PCI: rcar: Add R-Car PCIe device tree bindings
PCI: rcar: Add MSI support for PCIe
PCI: rcar: Add Renesas R-Car PCIe driver
PCI: Fix return value from pci_user_{read,write}_config_*()
PCI: exynos: Remove unnecessary OOM messages
...
|
| | \ \ \ \ \ | |
| | \ \ \ \ \ | |
| |\ \ \ \ \ \ \
| | | |_|_|_|/ /
| | |/| | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
'pci/misc' into next
* pci/host-exynos:
PCI: exynos: Fix add_pcie_port() section mismatch warning
* pci/host-imx6:
PCI: imx6: Add support for MSI
PCI: designware: Make MSI ISR shared IRQ aware
PCI: imx6: Remove optional (and unused) IRQs
PCI: imx6: Drop old IRQ mapping
PCI: imx6: Use new clock names
PCI: imx6: Fix imx6_add_pcie_port() section mismatch warning
* pci/resource:
i82875p_edac: Assign PCI resources before adding device
* pci/misc:
ARM/PCI: Call pcie_bus_configure_settings() to set MPS
PCI: Make pci_bus_add_device() void
Conflicts:
drivers/edac/i82875p_edac.c
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
pci_bus_add_device() always returns 0, so there's no point in returning
anything at all. Make it a void function and remove the tests of the
return value from the callers.
[bhelgaas: changelog, remove unused "err" from i82875p_setup_overfl_dev()]
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
| |\ \ \ \ \ \ \
| | | |/ / / / /
| | |/| | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
* pci/misc:
PCI: Fix return value from pci_user_{read,write}_config_*()
PCI: Turn pcibios_penalize_isa_irq() into a weak function
PCI: Test for std config alias when testing extended config space
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
The PCI user-space config accessors pci_user_{read,write}_config_*() return
negative error numbers, which were introduced by commit 34e3207205ef
("PCI: handle positive error codes"). That patch converted all positive
error numbers from platform-specific PCI config accessors to -EINVAL, which
means the callers don't know anything about the specific cause of the
failure.
The patch fixes the issue by converting the positive PCIBIOS_* error values
to generic negative error numbers with pcibios_err_to_errno().
[bhelgaas: changelog]
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Thelen <gthelen@google.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
pcibios_penalize_isa_irq() is only implemented by x86 now, and legacy ISA
is not used by some architectures. Make pcibios_penalize_isa_irq() a
__weak function to simplify the code. This removes the need for new
platforms to add stub implementations of pcibios_penalize_isa_irq().
[bhelgaas: changelog, comments]
Signed-off-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
|
| | | | | | | | | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| |\ \ \ \ \ \ \ \ \
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | |
| | | | | | | | | | | |
into next
* pci/hotplug:
PCI: cpqphp: Fix possible null pointer dereference
NVMe: Implement PCIe reset notification callback
PCI: Notify driver before and after device reset
* pci/pci_is_bridge:
pcmcia: Use pci_is_bridge() to simplify code
PCI: pciehp: Use pci_is_bridge() to simplify code
PCI: acpiphp: Use pci_is_bridge() to simplify code
PCI: cpcihp: Use pci_is_bridge() to simplify code
PCI: shpchp: Use pci_is_bridge() to simplify code
PCI: rpaphp: Use pci_is_bridge() to simplify code
sparc/PCI: Use pci_is_bridge() to simplify code
powerpc/PCI: Use pci_is_bridge() to simplify code
ia64/PCI: Use pci_is_bridge() to simplify code
x86/PCI: Use pci_is_bridge() to simplify code
PCI: Use pci_is_bridge() to simplify code
PCI: Add new pci_is_bridge() interface
PCI: Rename pci_is_bridge() to pci_has_subordinate()
* pci/virtualization:
PCI: Introduce new device binding path using pci_dev.driver_override
Conflicts:
drivers/pci/pci-sysfs.c
|
| | | | | |/ / / / /
| | | | |/| | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
The driver_override field allows us to specify the driver for a device
rather than relying on the driver to provide a positive match of the
device. This shortcuts the existing process of looking up the vendor and
device ID, adding them to the driver new_id, binding the device, then
removing the ID, but it also provides a couple advantages.
First, the above existing process allows the driver to bind to any device
matching the new_id for the window where it's enabled. This is often not
desired, such as the case of trying to bind a single device to a meta
driver like pci-stub or vfio-pci. Using driver_override we can do this
deterministically using:
echo pci-stub > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Previously we could not invoke drivers_probe after adding a device to
new_id for a driver as we get non-deterministic behavior whether the driver
we intend or the standard driver will claim the device. Now it becomes a
deterministic process, only the driver matching driver_override will probe
the device.
To return the device to the standard driver, we simply clear the
driver_override and reprobe the device:
echo > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe
Another advantage to this approach is that we can specify a driver override
to force a specific binding or prevent any binding. For instance when an
IOMMU group is exposed to userspace through VFIO we require that all
devices within that group are owned by VFIO. However, devices can be
hot-added into an IOMMU group, in which case we want to prevent the device
from binding to any driver (override driver = "none") or perhaps have it
automatically bind to vfio-pci. With driver_override it's a simple matter
for this field to be set internally when the device is first discovered to
prevent driver matches.
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Alexander Graf <agraf@suse.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| | | |/ / / / / /
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Add a helper function to check a device's header type for PCI bridge or
CardBus bridge.
Requires: 326c1cdae741 PCI: Rename pci_is_bridge() to pci_has_subordinate()
Signed-off-by: Yijing Wang <wangyijing@huawei.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
| | |/ / / / / /
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
Notify a PCI device driver when its device's access is about to be disabled
for an impending reset attempt, then after the attempt completes and device
access is restored. The notification is via the pci_error_handlers
interface.
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
| | | | | | | | | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| | \ \ \ \ \ \ | |
| |\ \ \ \ \ \ \ \ \ \
| | | | |_|/ / / / / /
| | | |/| | / / / / /
| | | | | |/ / / / /
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | |
| | | | | | | | | | |
'pci/resource' into next
* dma-api:
iommu/exynos: Remove unnecessary "&" from function pointers
DMA-API: Update dma_pool_create ()and dma_pool_alloc() descriptions
DMA-API: Fix duplicated word in DMA-API-HOWTO.txt
DMA-API: Capitalize "CPU" consistently
sh/PCI: Pass GAPSPCI_DMA_BASE CPU & bus address to dma_declare_coherent_memory()
DMA-API: Change dma_declare_coherent_memory() CPU address to phys_addr_t
DMA-API: Clarify physical/bus address distinction
* pci/virtualization:
PCI: Mark RTL8110SC INTx masking as broken
* pci/msi:
PCI/MSI: Remove pci_enable_msi_block()
* pci/misc:
PCI: Remove pcibios_add_platform_entries()
s390/pci: use pdev->dev.groups for attribute creation
PCI: Move Open Firmware devspec attribute to PCI common code
* pci/resource:
PCI: Add resource allocation comments
PCI: Simplify __pci_assign_resource() coding style
PCI: Change pbus_size_mem() return values to be more conventional
PCI: Restrict 64-bit prefetchable bridge windows to 64-bit resources
PCI: Support BAR sizes up to 8GB
resources: Clarify sanity check message
PCI: Don't add disabled subtractive decode bus resources
PCI: Don't print anything while decoding is disabled
PCI: Don't set BAR to zero if dma_addr_t is too small
PCI: Don't convert BAR address to resource if dma_addr_t is too small
PCI: Reject BAR above 4GB if dma_addr_t is too small
PCI: Fail safely if we can't handle BARs larger than 4GB
x86/gart: Tidy messages and add bridge device info
x86/gart: Replace printk() with pr_info()
x86/PCI: Move pcibios_assign_resources() annotation to definition
x86/PCI: Mark ATI SBx00 HPET BAR as IORESOURCE_PCI_FIXED
x86/PCI: Don't try to move IORESOURCE_PCI_FIXED resources
x86/PCI: Fix Broadcom CNB20LE unintended sign extension
|
| | |_|_|/ / / / /
| |/| | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | |
| | | | | | | | | |
Remove pcibios_add_platform_entries(). Architecture-specific attributes
can be achieved by setting pdev->dev.groups.
Link: https://lkml.kernel.org/r/alpine.LFD.2.11.1404141101500.1529@denkbrett
Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
| | | |/ / / / /
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
There are no users of pci_enable_msi_block() function left. Obsolete it in
favor of pci_enable_msi_range() and pci_enable_msi_exact() functions.
Previously, we called arch_setup_msi_irqs() once, requesting the same
vector count we passed to arch_msi_check_device(). Now we may call it
several times: if it returns failure, we may retry and request fewer
vectors.
We don't keep track of the vector count we initially passed to
arch_msi_check_device(). We only keep track of the number of vectors
successfully set up by arch_setup_msi_irqs(), and this is what we use to
clean things up when disabling MSI. Therefore, we assume that
arch_msi_check_device() does nothing that will have to be cleaned up later.
[bhelgaas: changelog]
Signed-off-by: Alexander Gordeev <agordeev@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | |
| | | | | | | | |
dma_declare_coherent_memory() takes two addresses for a region of memory: a
"bus_addr" and a "device_addr". I think the intent is that "bus_addr" is
the physical address a *CPU* would use to access the region, and
"device_addr" is the bus address the *device* would use to address the
region.
Rename "bus_addr" to "phys_addr" and change its type to phys_addr_t.
Most callers already supply a phys_addr_t for this argument. The others
supply a 32-bit integer (a constant, unsigned int, or __u32) and need no
change.
Use "unsigned long", not phys_addr_t, to hold PFNs.
No functional change (this could theoretically fix a truncation in a config
with 32-bit dma_addr_t and 64-bit phys_addr_t, but I don't think there are
any such cases involving this code).
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: James Bottomley <jbottomley@Parallels.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
|
| | |/ / / / /
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The DMA-API documentation sometimes refers to "physical addresses" when it
really means "bus addresses." Sometimes these are identical, but they may
be different if the bridge leading to the bus performs address translation.
Update the documentation to use "bus address" when appropriate.
Also, consistently capitalize "DMA", use parens with function names, use
dev_printk() in examples, and reword a few sections for clarity.
No functional change; documentation changes only.
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: James Bottomley <jbottomley@Parallels.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
By using designated initialization in PCI_VDEVICE, like other similar
macros, many "missing initializer" warnings that appear when compiling with
W=2 can be silenced.
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Tested-by: Kavindya Deegala <kavindya.s.deegala@intel.com>
Signed-off-by: Mark Rustad <mark.d.rustad@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
|
| |/ / / / /
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
These IDs are no longer referenced since kernel 3.1 so I suppose we can
remove them from pci_ids.h.
Signed-off-by: Jean Delvare <jdelvare@suse.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|\ \ \ \ \ \
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Pull block core updates from Jens Axboe:
"It's a big(ish) round this time, lots of development effort has gone
into blk-mq in the last 3 months. Generally we're heading to where
3.16 will be a feature complete and performant blk-mq. scsi-mq is
progressing nicely and will hopefully be in 3.17. A nvme port is in
progress, and the Micron pci-e flash driver, mtip32xx, is converted
and will be sent in with the driver pull request for 3.16.
This pull request contains:
- Lots of prep and support patches for scsi-mq have been integrated.
All from Christoph.
- API and code cleanups for blk-mq from Christoph.
- Lots of good corner case and error handling cleanup fixes for
blk-mq from Ming Lei.
- A flew of blk-mq updates from me:
* Provide strict mappings so that the driver can rely on the CPU
to queue mapping. This enables optimizations in the driver.
* Provided a bitmap tagging instead of percpu_ida, which never
really worked well for blk-mq. percpu_ida relies on the fact
that we have a lot more tags available than we really need, it
fails miserably for cases where we exhaust (or are close to
exhausting) the tag space.
* Provide sane support for shared tag maps, as utilized by scsi-mq
* Various fixes for IO timeouts.
* API cleanups, and lots of perf tweaks and optimizations.
- Remove 'buffer' from struct request. This is ancient code, from
when requests were always virtually mapped. Kill it, to reclaim
some space in struct request. From me.
- Remove 'magic' from blk_plug. Since we store these on the stack
and since we've never caught any actual bugs with this, lets just
get rid of it. From me.
- Only call part_in_flight() once for IO completion, as includes two
atomic reads. Hopefully we'll get a better implementation soon, as
the part IO stats are now one of the more expensive parts of doing
IO on blk-mq. From me.
- File migration of block code from {mm,fs}/ to block/. This
includes bio.c, bio-integrity.c, bounce.c, and ioprio.c. From me,
from a discussion on lkml.
That should describe the meat of the pull request. Also has various
little fixes and cleanups from Dave Jones, Shaohua Li, Duan Jiong,
Fengguang Wu, Fabian Frederick, Randy Dunlap, Robert Elliott, and Sam
Bradshaw"
* 'for-3.16/core' of git://git.kernel.dk/linux-block: (100 commits)
blk-mq: push IPI or local end_io decision to __blk_mq_complete_request()
blk-mq: remember to start timeout handler for direct queue
block: ensure that the timer is always added
blk-mq: blk_mq_unregister_hctx() can be static
blk-mq: make the sysfs mq/ layout reflect current mappings
blk-mq: blk_mq_tag_to_rq should handle flush request
block: remove dead code in scsi_ioctl:blk_verify_command
blk-mq: request initialization optimizations
block: add queue flag for disabling SG merging
block: remove 'magic' from struct blk_plug
blk-mq: remove alloc_hctx and free_hctx methods
blk-mq: add file comments and update copyright notices
blk-mq: remove blk_mq_alloc_request_pinned
blk-mq: do not use blk_mq_alloc_request_pinned in blk_mq_map_request
blk-mq: remove blk_mq_wait_for_tags
blk-mq: initialize request in __blk_mq_alloc_request
blk-mq: merge blk_mq_alloc_reserved_request into blk_mq_alloc_request
blk-mq: add helper to insert requests from irq context
blk-mq: remove stale comment for blk_mq_complete_request()
blk-mq: allow non-softirq completions
...
|
| | |_|_|_|/
| |/| | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Currently blk-mq registers all the hardware queues in sysfs,
regardless of whether it uses them (e.g. they have CPU mappings)
or not. The unused hardware queues lack the cpux/ directories,
and the other sysfs entries (like active, pending, etc) are all
zeroes.
Change this so that sysfs correctly reflects the current mappings
of the hardware queues.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
flush request is special, which borrows the tag from the parent
request. Hence blk_mq_tag_to_rq needs special handling to return
the flush request from the tag.
Signed-off-by: Shaohua Li <shli@fusionio.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
If devices are not SG starved, we waste a lot of time potentially
collapsing SG segments. Enough that 1.5% of the CPU time goes
to this, at only 400K IOPS. Add a queue flag, QUEUE_FLAG_NO_SG_MERGE,
which just returns the number of vectors in a bio instead of looping
over all segments and checking for collapsible ones.
Add a BLK_MQ_F_SG_MERGE flag so that drivers can opt-in on the sg
merging, if they so desire.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |_|_|/
| |/| | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
I don't think we've ever caught any bugs with this, and there's the
list poisoning for the plug lists to catch uninitialized cases.
So remove the magic member and save 8 bytes in the struct.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
There is no need for drivers to control hardware context allocation
now that we do the context to node mapping in common code.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Instead of having two almost identical copies of the same code just let
the callers pass in the reserved flag directly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Both the cache flush state machine and the SCSI midlayer want to submit
requests from irq context, and the current per-request requeue_work
unfortunately causes corruption due to sharing with the csd field for
flushes. Replace them with a per-request_queue list of requests to
be requeued.
Based on an earlier test by Ming Lei.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Ming Lei <tom.leiming@gmail.com>
Tested-by: Ming Lei <tom.leiming@gmail.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
It works for both IPI and local completions as of commit
95f096849932.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Right now we export two ways of completing a request:
1) blk_mq_complete_request(). This uses an IPI (if needed) and
completes through q->softirq_done_fn(). It also works with
timeouts.
2) blk_mq_end_io(). This completes inline, and ignores any timeout
state of the request.
Let blk_mq_complete_request() handle non-softirq_done_fn completions
as well, by just completing inline. If a driver has enough completion
ports to place completions correctly, it need not define a
mq_ops->complete() and we can avoid an indirect function call by
doing the completion inline.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Drivers currently have to figure this out on their own, and they
are missing information to do it properly. The ones that did
attempt to do it, do it wrong.
So just pass in the suggested node directly to the alloc
function.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Export the blk-mq in-flight tag iterator for driver consumption.
This is particularly useful in exception paths or SRSI where
in-flight IOs need to be cancelled and/or reissued. The NVMe driver
conversion will use this.
Signed-off-by: Sam Bradshaw <sbradshaw@micron.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Prepare this for the next patch which adds more smarts in the
plugging logic, so that we can save some memory.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
For request_fn based devices, the block layer exports a 'nr_requests'
file through sysfs to allow adjusting of queue depth on the fly.
Currently this returns -EINVAL for blk-mq, since it's not wired up.
Wire this up for blk-mq, so that it now also always dynamic
adjustments of the allowed queue depth for any given block device
managed by blk-mq.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Signed-off-by: Jens Axboe <axboe@fb.com>
Conflicts:
block/blk-mq-tag.c
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This adds support for active queue tracking, meaning that the
blk-mq tagging maintains a count of active users of a tag set.
This allows us to maintain a notion of fairness between users,
so that we can distribute the tag depth evenly without starving
some users while allowing others to try unfair deep queues.
If sharing of a tag set is detected, each hardware queue will
track the depth of its own queue. And if this exceeds the total
depth divided by the number of active queues, the user is actively
throttled down.
The active queue count is done lazily to avoid bouncing that data
between submitter and completer. Each hardware queue gets marked
active when it allocates its first tag, and gets marked inactive
when 1) the last tag is cleared, and 2) the queue timeout grace
period has passed.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| |/ / /
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Each hardware queue has a bitmap of software queues with pending
requests. When new IO is queued on a software queue, the bit is
set, and when IO is pruned on a hardware queue run, the bit is
cleared. This causes a lot of traffic. Switch this from the regular
BITS_PER_LONG bitmap to a sparser layout, similarly to what was
done for blk-mq tagging.
20% performance increase was observed for single threaded IO, and
about 15% performanc increase on multiple threads driving the
same device.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
blk-mq currently uses percpu_ida for tag allocation. But that only
works well if the ratio between tag space and number of CPUs is
sufficiently high. For most devices and systems, that is not the
case. The end result if that we either only utilize the tag space
partially, or we end up attempting to fully exhaust it and run
into lots of lock contention with stealing between CPUs. This is
not optimal.
This new tagging scheme is a hybrid bitmap allocator. It uses
two tricks to both be SMP friendly and allow full exhaustion
of the space:
1) We cache the last allocated (or freed) tag on a per blk-mq
software context basis. This allows us to limit the space
we have to search. The key element here is not caching it
in the shared tag structure, otherwise we end up dirtying
more shared cache lines on each allocate/free operation.
2) The tag space is split into cache line sized groups, and
each context will start off randomly in that space. Even up
to full utilization of the space, this divides the tag users
efficiently into cache line groups, avoiding dirtying the same
one both between allocators and between allocator and freeer.
This scheme shows drastically better behaviour, both on small
tag spaces but on large ones as well. It has been tested extensively
to show better performance for all the cases blk-mq cares about.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This allows us to avoid a non-atomic memset over ->atomic_flags as well
as killing lots of duplicate initializations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Right now we just pick the first CPU in the mask, but that can
easily overload that one. Add some basic batching and round-robin
all the entries in the mask instead.
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The blk-mq code is using it's own version of the I/O completion affinity
tunables, which causes a few issues:
- the rq_affinity sysfs file doesn't work for blk-mq devices, even if it
still is present, thus breaking existing tuning setups.
- the rq_affinity = 1 mode, which is the defauly for legacy request based
drivers isn't implemented at all.
- blk-mq drivers don't implement any completion affinity with the default
flag settings.
This patches removes the blk-mq ipi_redirect flag and sysfs file, as well
as the internal BLK_MQ_F_SHOULD_IPI flag and replaces it with code that
respects the queue-wide rq_affinity flags and also implements the
rq_affinity = 1 mode.
This means I/O completion affinity can now only be tuned block-queue wide
instead of per context, which seems more sensible to me anyway.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
bs is no longer used in biovec_create_pool since 9f060e2231ca96 ("block:
Convert integrity to bvec_alloc_bs()")
Signed-off-by: Fabian Frederick <fabf@skynet.be>
Cc: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This allows to mirror the blk-mq code flow for more a more readable I/O
completion handler in SCSI.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We will use this work_struct to requeue scsi commands from the
completion handler as well, so give it a more generic name.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
|