summaryrefslogtreecommitdiffstats
path: root/mm/migrate.c (unfollow)
Commit message (Collapse)AuthorFilesLines
2013-05-30xfs: don't emit v5 superblock warnings on writeDave Chinner1-7/+11
We write the superblock every 30s or so which results in the verifier being called. Right now that results in this output every 30s: XFS (vda): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled! Use of these features in this kernel is at your own risk! And spamming the logs. We don't need to check for whether we support v5 superblocks or whether there are feature bits we don't support set as these are only relevant when we first mount the filesytem. i.e. on superblock read. Hence for the write verification we can just skip all the checks (and hence verbose output) altogether. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: rework remote attr CRCsDave Chinner4-158/+247
Note: this changes the on-disk remote attribute format. I assert that this is OK to do as CRCs are marked experimental and the first kernel it is included in has not yet reached release yet. Further, the userspace utilities are still evolving and so anyone using this stuff right now is a developer or tester using volatile filesystems for testing this feature. Hence changing the format right now to save longer term pain is the right thing to do. The fundamental change is to move from a header per extent in the attribute to a header per filesytem block in the attribute. This means there are more header blocks and the parsing of the attribute data is slightly more complex, but it has the advantage that we always know the size of the attribute on disk based on the length of the data it contains. This is where the header-per-extent method has problems. We don't know the size of the attribute on disk without first knowing how many extents are used to hold it. And we can't tell from a mapping lookup, either, because remote attributes can be allocated contiguously with other attribute blocks and so there is no obvious way of determining the actual size of the atribute on disk short of walking and mapping buffers. The problem with this approach is that if we map a buffer incorrectly (e.g. we make the last buffer for the attribute data too long), we then get buffer cache lookup failure when we map it correctly. i.e. we get a size mismatch on lookup. This is not necessarily fatal, but it's a cache coherency problem that can lead to returning the wrong data to userspace or writing the wrong data to disk. And debug kernels will assert fail if this occurs. I found lots of niggly little problems trying to fix this issue on a 4k block size filesystem, finally getting it to pass with lots of fixes. The thing is, 1024 byte filesystems still failed, and it was getting really complex handling all the corner cases that were showing up. And there were clearly more that I hadn't found yet. It is complex, fragile code, and if we don't fix it now, it will be complex, fragile code forever more. Hence the simple fix is to add a header to each filesystem block. This gives us the same relationship between the attribute data length and the number of blocks on disk as we have without CRCs - it's a linear mapping and doesn't require us to guess anything. It is simple to implement, too - the remote block count calculated at lookup time can be used by the remote attribute set/get/remove code without modification for both CRC and non-CRC filesystems. The world becomes sane again. Because the copy-in and copy-out now need to iterate over each filesystem block, I moved them into helper functions so we separate the block mapping and buffer manupulations from the attribute data and CRC header manipulations. The code becomes much clearer as a result, and it is a lot easier to understand and debug. It also appears to be much more robust - once it worked on 4k block size filesystems, it has worked without failure on 1k block size filesystems, too. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: fully initialise temp leaf in xfs_attr3_leaf_compactDave Chinner1-16/+26
xfs_attr3_leaf_compact() uses a temporary buffer for compacting the the entries in a leaf. It copies the the original buffer into the temporary buffer, then zeros the original buffer completely. It then copies the entries back into the original buffer. However, the original buffer has not been correctly initialised, and so the movement of the entries goes horribly wrong. Make sure the zeroed destination buffer is fully initialised, and once we've set up the destination incore header appropriately, write is back to the buffer before starting to move entries around. While debugging this, the _d/_s prefixes weren't sufficient to remind me what buffer was what, so rename then all _src/_dst. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: fully initialise temp leaf in xfs_attr3_leaf_unbalanceDave Chinner1-3/+13
xfs_attr3_leaf_unbalance() uses a temporary buffer for recombining the entries in two leaves when the destination leaf requires compaction. The temporary buffer ends up being copied back over the original destination buffer, so the header in the temporary buffer needs to contain all the information that is in the destination buffer. To make sure the temporary buffer is fully initialised, once we've set up the temporary incore header appropriately, write is back to the temporary buffer before starting to move entries around. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: correctly map remote attr buffers during removalDave Chinner1-9/+18
If we don't map the buffers correctly (same as for get/set operations) then the incore buffer lookup will fail. If a block number matches but a length is wrong, then debug kernels will ASSERT fail in _xfs_buf_find() due to the length mismatch. Ensure that we map the buffers correctly by basing the length of the buffer on the attribute data length rather than the remote block count. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: remote attribute tail zeroing does too muchDave Chinner1-19/+18
When an attribute data does not fill then entire remote block, we zero the remaining part of the buffer. This, however, needs to take into account that the buffer has a header, and so the offset where zeroing starts and the length of zeroing need to take this into account. Otherwise we end up with zeros over the end of the attribute value when CRCs are enabled. While there, make sure we only ask to map an extent that covers the remaining range of the attribute, rather than asking every time for the full length of remote data. If the remote attribute blocks are contiguous with other parts of the attribute tree, it will map those blocks as well and we can potentially zero them incorrectly. We can also get buffer size mistmatches when trying to read or remove the remote attribute, and this can lead to not finding the correct buffer when looking it up in cache. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-24xfs: remote attribute read too shortDave Chinner1-6/+9
Reading a maximally size remote attribute fails when CRCs are enabled with this verification error: XFS (vdb): remote attribute header does not match required off/len/owner) There are two reasons for this, the first being that the length of the buffer being read is determined from the args->rmtblkcnt which doesn't take into account CRC headers. Hence the mapped length ends up being too short and so we need to calculate it directly from the value length. The second is that the byte count of valid data within a buffer is capped by the length of the data and so doesn't take into account that the buffer might be longer due to headers. Hence we need to calculate the data space in the buffer first before calculating the actual byte count of data. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-21xfs: remote attribute allocation may be contiguousDave Chinner1-1/+5
When CRCs are enabled, there may be multiple allocations made if the headers cause a length overflow. This, however, does not mean that the number of headers required increases, as the second and subsequent extents may be contiguous with the previous extent. Hence when we map the extents to write the attribute data, we may end up with less extents than allocations made. Hence the assertion that we consume the number of headers we calculated in the allocation loop is incorrect and needs to be removed. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-21xfs: avoid nesting transactions in xfs_qm_scall_setqlim()Dave Chinner1-17/+23
Lockdep reports: ============================================= [ INFO: possible recursive locking detected ] 3.9.0+ #3 Not tainted --------------------------------------------- setquota/28368 is trying to acquire lock: (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50 but task is already holding lock: (sb_internal){++++.?}, at: [<c11e8846>] xfs_trans_alloc+0x26/0x50 from xfs_qm_scall_setqlim()->xfs_dqread() when a dquot needs to be allocated. xfs_qm_scall_setqlim() is starting a transaction and then not passing it into xfs_qm_dqet() and so it starts it's own transaction when allocating the dquot. Splat! Fix this by not allocating the dquot in xfs_qm_scall_setqlim() inside the setqlim transaction. This requires getting the dquot first (and allocating it if necessary) then dropping and relocking the dquot before joining it to the setqlim transaction. Reported-by: Michael L. Semon <mlsemon35@gmail.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-21xfs: remote attribute lookups require the value lengthDave Chinner1-1/+2
When reading a remote attribute, to correctly calculate the length of the data buffer for CRC enable filesystems, we need to know the length of the attribute data. We get this information when we look up the attribute, but we don't store it in the args structure along with the other remote attr information we get from the lookup. Add this information to the args structure so we can use it appropriately. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: xfs_attr_shortform_allfit() does not handle attr3 format.Dave Chinner1-11/+13
xfstests generic/117 fails with: XFS: Assertion failed: leaf->hdr.info.magic == cpu_to_be16(XFS_ATTR_LEAF_MAGIC) indicating a function that does not handle the attr3 format correctly. Fix it. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: xfs_da3_node_read_verify() doesn't handle XFS_ATTR3_LEAF_MAGICDave Chinner1-0/+1
Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: fix missing KM_NOFS tags to keep lockdep happyDave Chinner4-5/+7
There are several places where we use KM_SLEEP allocation contexts and use the fact that they are called from transaction context to add KM_NOFS where appropriate. Unfortunately, there are several places where the code makes this assumption but can be called from outside transaction context but with filesystem locks held. These places need explicit KM_NOFS annotations to avoid lockdep complaining about reclaim contexts. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Ben Myers <bpm@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: Don't reference the EFI after it is freedDave Chinner1-2/+3
Checking the EFI for whether it is being released from recovery after we've already released the known active reference is a mistake worthy of a brown paper bag. Fix the (now) obvious use after free that it can cause. Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: fix rounding in xfs_free_file_spaceDave Chinner1-2/+2
The offset passed into xfs_free_file_space() needs to be rounded down to a certain size, but the rounding mask is built by a 32 bit variable. Hence the mask will always mask off the upper 32 bits of the offset and lead to incorrect writeback and invalidation ranges. This is not actually exposed as a bug because we writeback and invalidate from the rounded offset to the end of the file, and hence the offset we are actually punching a hole out of will always be covered by the code. This needs fixing, however, if we ever want to use exact ranges for writeback/invalidation here... Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: fix sub-page blocksize data integrity writesDave Chinner1-0/+19
FSX on 512 byte block size filesystems has been failing for some time with corrupted data. The fault dates back to the change in the writeback data integrity algorithm that uses a mark-and-sweep approach to avoid data writeback livelocks. Unfortunately, a side effect of this mark-and-sweep approach is that each page will only be written once for a data integrity sync, and there is a condition in writeback in XFS where a page may require two writeback attempts to be fully written. As a result of the high level change, we now only get a partial page writeback during the integrity sync because the first pass through writeback clears the mark left on the page index to tell writeback that the page needs writeback.... The cause is writing a partial page in the clustering code. This can happen when a mapping boundary falls in the middle of a page - we end up writing back the first part of the page that the mapping covers, but then never revisit the page to have the remainder mapped and written. The fix is simple - if the mapping boundary falls inside a page, then simple abort clustering without touching the page. This means that the next ->writepage entry that write_cache_pages() will make is the page we aborted on, and xfs_vm_writepage() will map all sections of the page correctly. This behaviour is also optimal for non-data integrity writes, as it results in contiguous sequential writeback of the file rather than missing small holes and having to write them a "random" writes in a future pass. With this fix, all the fsx tests in xfstests now pass on a 512 byte block size filesystem on a 4k page machine. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Brian Foster <bfoster@redhat.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-20xfs: Avoid pathological backwards allocationJan Kara1-6/+18
Writing a large file using direct IO in 16 MB chunks sometimes results in a pathological allocation pattern where 16 MB chunks of large free extent are allocated to a file in a reversed order. So extents of a file look for example as: ext logical physical expected length flags 0 0 13 4550656 1 4550656 188136807 4550668 12562432 2 17113088 200699240 200699238 622592 3 17735680 182046055 201321831 4096 4 17739776 182041959 182050150 4096 5 17743872 182037863 182046054 4096 6 17747968 182033767 182041958 4096 7 17752064 182029671 182037862 4096 ... 6757 45400064 154381644 154389835 4096 6758 45404160 154377548 154385739 4096 6759 45408256 252951571 154381643 73728 eof This happens because XFS_ALLOCTYPE_THIS_BNO allocation fails (the last extent in the file cannot be further extended) so we fall back to XFS_ALLOCTYPE_NEAR_BNO allocation which picks end of a large free extent as the best place to continue the file. Since the chunk at the end of the free extent again cannot be further extended, this behavior repeats until the whole free extent is consumed in a reversed order. For data allocations this backward allocation isn't beneficial so make xfs_alloc_compute_diff() pick start of a free extent instead of its end for them. That avoids the backward allocation pattern. See thread at http://oss.sgi.com/archives/xfs/2013-03/msg00144.html for more details about the reproduction case and why this solution was chosen. Based on idea by Dave Chinner <dchinner@redhat.com>. CC: Dave Chinner <dchinner@redhat.com> Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Mark Tinguely <tinguely@sgi.com> Signed-off-by: Ben Myers <bpm@sgi.com>
2013-05-12Linux 3.10-rc1v3.10-rc1Linus Torvalds1-2/+2
2013-05-10[SCSI] qla2xxx: Update firmware link in Kconfig file.Chad Dupuis1-1/+3
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com> Signed-off-by: Chad Dupuis <chad.dupuis@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] iscsi class, qla4xxx: fix sess/conn refcounting when find fns are usedMike Christie3-55/+49
This fixes a bug where the iscsi class/driver did not do a put_device when a sess/conn device was found. This also simplifies the interface by not having to pass in some arguments that were duplicated and did not need to be exported. Reported-by: Zhao Hongjiang <zhaohongjiang@huawei.com> Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Acked-by: Vikas Chaudhary <vikas.chaudhary@qlogic.com> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] sas: unify the pointlessly separated enums sas_dev_type and ↵James Bottomley24-165/+160
sas_device_type These enums have been separate since the dawn of SAS, mainly because the latter is a procotol only enum and the former includes additional state for libsas. The dichotomy causes endless confusion about which one you should use where and leads to pointless warnings like this: drivers/scsi/mvsas/mv_sas.c: In function 'mvs_update_phyinfo': drivers/scsi/mvsas/mv_sas.c:1162:34: warning: comparison between 'enum sas_device_type' and 'enum sas_dev_type' [-Wenum-compare] Fix by eliminating one of them. The one kept is effectively the sas.h one, but call it sas_device_type and make sure the enums are all properly namespaced with the SAS_ prefix. Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: thermal, sas controller config and error handling updateSakthivel K6-17/+249
Modified thermal configuration to happen after interrupt registration Added SAS controller configuration during initialization Added error handling logic to handle I_T_Nexus errors and variants [jejb: fix up tabs and spaces issues] Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: NCQ error handling changesSakthivel K5-28/+544
Handled NCQ errors in the low level driver as the FW is not providing the faulty tag for NCQ errors for libsas to recover. [jejb: fix checkpatch issues] Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: WWN Modification for PM8081/88/89 controllersSakthivel K1-7/+36
Individual WWN read operations based on controller. PM8081 - Read WWN from Flash VPD. PM8088/89 - Read WWN from EEPROM. PM8001 - Read WWN from NVM. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Changed module name and debug messages updateSakthivel K3-13/+20
Changed name in driver to pm80xx. Updated debug messages. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Firmware flash memory free fix, with addition of new memory ↵Sakthivel K4-27/+14
region for it Performing pci_free_consistent in tasklet had result in a core dump. So allocated a new memory region for it. Fix for passing proper address and operation in firmware flash update. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: SPC new firmware changes for device id 0x8081 aloneSakthivel K2-3/+30
Additional bar shift for new SPC firmware, applicable to device id 0x8081 only. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Added SPCv/ve specific hardware functionalities and relevant ↵Sakthivel K8-18/+5287
changes in common files Implementation of SPCv/ve specific hardware functionality and macros. Changing common functionalities wrt SPCv/ve operations. Conditional checks for SPC specific operations. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: MSI-X implementation for using 64 interruptsSakthivel K2-30/+113
Implementation of interrupt handlers and tasklets to support upto 64 interrupt for the device. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Updated common functions common for SPC and SPCv/veSakthivel K3-105/+168
Update of function prototype for common function to SPC and SPCv/ve. Multiple queues implementation for IO. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Multiple inbound/outbound queue configurationSakthivel K3-60/+98
Memory allocation and configuration of multiple inbound and outbound queues. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] pm80xx: Added SPCv/ve specific ids, variables and modify for SPCSakthivel K5-121/+320
Updated pci id table with device, vendor, subdevice and subvendor ids for 8081, 8088, 8089 SAS/SATA controllers. Added SPCv/ve related macros. Updated macros, hba info structure and other structures for SPCv/ve. Update of structure and variable names for SPC hardware functionalities. Signed-off-by: Sakthivel K <Sakthivel.SaravananKamalRaju@pmcs.com> Signed-off-by: Anand Kumar S <AnandKumar.Santhanam@pmcs.com> Acked-by: Jack Wang <jack_wang@usish.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] lpfc: fix up Kconfig dependenciesJames Bottomley1-0/+2
lpfc uses the generic checksum as well as the T10DIF one from the lib/ directory, so make sure they're selected. Reported-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10[SCSI] Handle MLQUEUE busy response in scsi_send_eh_cmndHannes Reinecke1-10/+27
scsi_send_eh_cmnd() is calling queuecommand() directly, so it needs to check the return value here. The only valid return codes for queuecommand() are 'busy' states, so we need to wait for a bit to allow the LLDD to recover. Based on an earlier patch from Wen Xiong. [jejb: fix confusion between msec and jiffies values and other issues] [bvanassche: correct stall_for interval] Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com> Cc: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Hannes Reinecke <hare@suse.de> Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-05-10dm cache: set config valueJoe Thornber1-28/+31
Share configuration option processing code between the dm cache ctr and message functions. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm cache: move config fnsAlasdair G Kergon1-17/+17
Move process_config_option() in dm-cache-target.c to make the next patch more readable. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm thin: generate event when metadata threshold passedJoe Thornber3-0/+58
Generate a dm event when the amount of remaining thin pool metadata space falls below a certain level. The threshold is taken to be a quarter of the size of the metadata device with a minimum threshold of 4MB. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent metadata: add space map threshold callbackJoe Thornber1-1/+76
Add a threshold callback to dm persistent data space maps. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: add threshold callback to space mapJoe Thornber3-3/+29
Add a threshold callback function to the persistent data space map interface for a subsequent patch to use. dm-thin and dm-cache are interested in knowing when they're getting low on metadata or data blocks. This patch introduces a new method for registering a callback against a threshold. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm thin: detect metadata device resizingJoe Thornber3-3/+64
Allow the dm thin pool metadata device to be extended. Whenever a pool is resumed, detect whether the size of the metadata device has increased, and if so, extend the metadata to use the new space. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: support space map resizingJoe Thornber1-6/+32
Support extending a dm persistent data metadata space map. The extend itself is implemented by switching back to the boostrap allocator and pointing to the new space. The extra bitmap indexes are then allocated from the new space, and finally we switch back to the proper space map ops and tweak the reference counts. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm thin: open dev read only when possibleJoe Thornber1-11/+14
If a thin pool is created in read-only-metadata mode then only open the metadata device read-only. Previously it was always opened with FMODE_READ | FMODE_WRITE. (Note that dm_get_device() still allows read-only dm devices to be used read-write at the moment: If I create a read-only linear device for the metadata, via dmsetup load --readonly, then I can still create a rw pool out of it.) Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm thin: refactor data dev resizeJoe Thornber2-33/+64
Refactor device size functions in preparation for similar metadata device resizing functions. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm cache: replace memcpy with struct assignmentJoe Thornber2-3/+3
Use struct assignment rather than memcpy in dm cache. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm cache: fix typos in commentsJoe Thornber1-3/+4
Fix up some typos in dm-cache comments. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm cache policy: fix description of lookup fnAlasdair G Kergon1-2/+2
Correct the documented requirement on the return code from dm cache policy lookup functions stated in the policy module header file. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm: document iterate_devicesAlasdair G Kergon1-0/+15
Document iterate_devices in device-mapper.h. Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm persistent data: fix error message typosJoe Thornber1-4/+4
Fix some typos in dm-space-map-metadata.c error messages. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm cache: tune migration throttlingJoe Thornber1-1/+2
Tune the dm cache migration throttling. i) Issue a tick every second, just in case there's no i/o going through. ii) Drop the migration threshold right down to something suitable for background work. Signed-off-by: Joe Thornber <ejt@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
2013-05-10dm mpath: enable WRITE SAME supportMike Snitzer1-0/+1
Enable WRITE SAME support in dm multipath. As far as multipath is concerned it is just another write request. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Tested-by: Bharata B Rao <bharata.rao@gmail.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>