path: root/fs/xfs
Commit message  Author  Age  Files  Lines
* Merge tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  Linus Torvalds  2017-11-23  3  -7/+13
|\
| |     Pull xfs fixes from Darrick Wong:
| |
| |      - Fix a memory leak in the new in-core extent map
| |      - Refactor the xfs_dev_t conversions for easier xfsprogs porting
| |
| |     * tag 'xfs-4.15-merge-3' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
| |       xfs: abstract out dev_t conversions
| |       xfs: fix memory leak in xfs_iext_free_last_leaf
| * xfs: abstract out dev_t conversions  Christoph Hellwig  2017-11-21  2  -6/+12
| |     And move them to xfs_linux.h so that xfsprogs can stub them out more easily.
| |
| |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * xfs: fix memory leak in xfs_iext_free_last_leaf  Shu Wang  2017-11-21  1  -1/+1
| |     found the issue by kmemleak.
| |     unreferenced object 0xffff8800674611c0 (size 16):
| |       xfs_iext_insert+0x82a/0xa90 [xfs]
| |       xfs_bmap_add_extent_hole_delay+0x1e5/0x5b0 [xfs]
| |       xfs_bmapi_reserve_delalloc+0x483/0x530 [xfs]
| |       xfs_file_iomap_begin+0xac8/0xd40 [xfs]
| |       iomap_apply+0xb8/0x1b0
| |       iomap_file_buffered_write+0xac/0xe0
| |       xfs_file_buffered_aio_write+0x198/0x420 [xfs]
| |       xfs_file_write_iter+0x23f/0x2a0 [xfs]
| |       __vfs_write+0x23e/0x340
| |       vfs_write+0xe9/0x240
| |       SyS_write+0xa1/0x120
| |       do_syscall_64+0xda/0x260
| |
| |     Signed-off-by: Shu Wang <shuwang@redhat.com>
| |     Reviewed-by: Christoph Hellwig <hch@lst.de>
| |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
* | Merge tag 'xfs-4.15-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  Linus Torvalds  2017-11-17  3  -3/+4
|\|
| |     Pull xfs fixes from Darrick Wong:
| |      "A couple more patches to fix a locking bug and some inconsistent type usage in some of the new code:
| |
| |       - Fix a forgotten rcu read unlock
| |       - Fix some inconsistent integer type usage"
| |
| |     * tag 'xfs-4.15-merge-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
| |       xfs: fix type usage
| |       xfs: fix forgotten rcu read unlock when skipping inode reclaim
| * xfs: fix type usage  Darrick J. Wong  2017-11-16  2  -3/+3
| |     Be consistent about using uint32_t/uint8_t instead of u32/u8. This is more so that we don't have to maintain /those/ types in xfsprogs.
| |
| |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| |     Reviewed-by: Eric Sandeen <sandeen@redhat.com>
| * xfs: fix forgotten rcu read unlock when skipping inode reclaim  Darrick J. Wong  2017-11-16  1  -0/+1
| |     In commit f2e9ad21 ("xfs: check for race with xfs_reclaim_inode"), we skip an inode if we're racing with freeing the inode via xfs_reclaim_inode, but we forgot to release the rcu read lock when dumping the inode, with the result that we exit to userspace with a lock held. Don't do that; generic/320 with a 1k block size fails this very occasionally.
| |
| |     ================================================
| |     WARNING: lock held when returning to user space!
| |     4.14.0-rc6-djwong #4 Tainted: G W
| |     ------------------------------------------------
| |     rm/30466 is leaving the kernel with locks still held!
| |     1 lock held by rm/30466:
| |      #0: (rcu_read_lock){....}, at: [<ffffffffa01364d3>] xfs_ifree_cluster.isra.17+0x2c3/0x6f0 [xfs]
| |     ------------[ cut here ]------------
| |     WARNING: CPU: 1 PID: 30466 at kernel/rcu/tree_plugin.h:329 rcu_note_context_switch+0x71/0x700
| |     Modules linked in: deadline_iosched dm_snapshot dm_bufio ext4 mbcache jbd2 dm_flakey xfs libcrc32c dax_pmem device_dax nd_pmem sch_fq_codel af_packet [last unloaded: scsi_debug]
| |     CPU: 1 PID: 30466 Comm: rm Tainted: G W 4.14.0-rc6-djwong #4
| |     Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.10.2-1ubuntu1djwong0 04/01/2014
| |     task: ffff880037680000 task.stack: ffffc90001064000
| |     RIP: 0010:rcu_note_context_switch+0x71/0x700
| |     RSP: 0000:ffffc90001067e50 EFLAGS: 00010002
| |     RAX: 0000000000000001 RBX: ffff880037680000 RCX: ffff88003e73d200
| |     RDX: 0000000000000002 RSI: ffffffff819e53e9 RDI: ffffffff819f4375
| |     RBP: 0000000000000000 R08: 0000000000000000 R09: ffff880062c900d0
| |     R10: 0000000000000000 R11: 0000000000000000 R12: ffff880037680000
| |     R13: 0000000000000000 R14: ffffc90001067eb8 R15: ffff880037680690
| |     FS: 00007fa3b8ce8700(0000) GS:ffff88003ec00000(0000) knlGS:0000000000000000
| |     CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| |     CR2: 00007f69bf77c000 CR3: 000000002450a000 CR4: 00000000000006e0
| |     Call Trace:
| |      __schedule+0xb8/0xb10
| |      schedule+0x40/0x90
| |      exit_to_usermode_loop+0x6b/0xa0
| |      prepare_exit_to_usermode+0x7a/0x90
| |      retint_user+0x8/0x20
| |     RIP: 0033:0x7fa3b87fda87
| |     RSP: 002b:00007ffe41206568 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff02
| |     RAX: 0000000000000000 RBX: 00000000010e88c0 RCX: 00007fa3b87fda87
| |     RDX: 0000000000000000 RSI: 00000000010e89c8 RDI: 0000000000000005
| |     RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000000000
| |     R10: 000000000000015e R11: 0000000000000246 R12: 00000000010c8060
| |     R13: 00007ffe41206690 R14: 0000000000000000 R15: 0000000000000000
| |     ---[ end trace e88f83bf0cfbd07d ]---
| |
| |     Fixes: f2e9ad212def50bcf4c098c6288779dd97fff0f0
| |     Cc: Omar Sandoval <osandov@fb.com>
| |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| |     Reviewed-by: Christoph Hellwig <hch@lst.de>
| |     Reviewed-by: Omar Sandoval <osandov@fb.com>
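The bug class described in the commit above (a lock taken at the top of a loop body but not dropped on an early `continue`) can be modeled in a few lines. This is only a toy sketch in Python, not the kernel code: `ReadLock`, `scan_cluster` and the `release_on_skip` switch are hypothetical names invented for illustration, and a plain counter stands in for the RCU read-side critical section.

```python
class ReadLock:
    """Toy stand-in for a read-side critical section (hypothetical)."""

    def __init__(self):
        self.depth = 0

    def lock(self):
        self.depth += 1

    def unlock(self):
        assert self.depth > 0, "unlock without matching lock"
        self.depth -= 1


def scan_cluster(inodes, rcu, release_on_skip=True):
    """Visit (inode_number, being_reclaimed) pairs under the read lock."""
    visited = []
    for ino, being_reclaimed in inodes:
        rcu.lock()
        if being_reclaimed:
            # The skip path must drop the lock as well.
            # Passing release_on_skip=False models the forgotten unlock.
            if release_on_skip:
                rcu.unlock()
            continue
        visited.append(ino)
        rcu.unlock()
    return visited
```

With `release_on_skip=False` the counter is still nonzero after the loop, which is the toy equivalent of the "lock held when returning to user space" warning quoted in the commit message.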
* | Merge tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm  Linus Torvalds  2017-11-17  3  -28/+23
|\ \
| | |     Pull libnvdimm and dax updates from Dan Williams:
| | |      "Save for a few late fixes, all of these commits have shipped in -next releases since before the merge window opened, and 0day has given a build success notification.
| | |
| | |       The ext4 touches came from Jan, and the xfs touches have Darrick's reviewed-by. An xfstest for the MAP_SYNC feature has been through a few rounds of review and is on track to be merged.
| | |
| | |       - Introduce MAP_SYNC and MAP_SHARED_VALIDATE, a mechanism to enable 'userspace flush' of persistent memory updates via filesystem-dax mappings. It arranges for any filesystem metadata updates that may be required to satisfy a write fault to also be flushed ("on disk") before the kernel returns to userspace from the fault handler. Effectively every write-fault that dirties metadata completes an fsync() before returning from the fault handler. The new MAP_SHARED_VALIDATE mapping type guarantees that the MAP_SYNC flag is validated as supported by the filesystem's ->mmap() file operation.
| | |
| | |       - Add support for the standard ACPI 6.2 label access methods that replace the NVDIMM_FAMILY_INTEL (vendor specific) label methods. This enables interoperability with environments that only implement the standardized methods.
| | |
| | |       - Add support for the ACPI 6.2 NVDIMM media error injection methods.
| | |
| | |       - Add support for the NVDIMM_FAMILY_INTEL v1.6 DIMM commands for latch last shutdown status, firmware update, SMART error injection, and SMART alarm threshold control.
| | |
| | |       - Cleanup physical address information disclosures to be root-only.
| | |
| | |       - Fix revalidation of the DIMM "locked label area" status to support dynamic unlock of the label area.
| | |
| | |       - Expand unit test infrastructure to mock the ACPI 6.2 Translate SPA (system-physical-address) command and error injection commands.
| | |
| | |       Acknowledgements that came after the commits were pushed to -next:
| | |
| | |       - 957ac8c421ad ("dax: fix PMD faults on zero-length files"):
| | |         Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| | |
| | |       - a39e596baa07 ("xfs: support for synchronous DAX faults") and 7b565c9f965b ("xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()")
| | |         Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>"
| | |
| | |     * tag 'libnvdimm-for-4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm: (49 commits)
| | |       acpi, nfit: add 'Enable Latch System Shutdown Status' command support
| | |       dax: fix general protection fault in dax_alloc_inode
| | |       dax: fix PMD faults on zero-length files
| | |       dax: stop requiring a live device for dax_flush()
| | |       brd: remove dax support
| | |       dax: quiet bdev_dax_supported()
| | |       fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core
| | |       tools/testing/nvdimm: unit test clear-error commands
| | |       acpi, nfit: validate commands against the device type
| | |       tools/testing/nvdimm: stricter bounds checking for error injection commands
| | |       xfs: support for synchronous DAX faults
| | |       xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()
| | |       ext4: Support for synchronous DAX faults
| | |       ext4: Simplify error handling in ext4_dax_huge_fault()
| | |       dax: Implement dax_finish_sync_fault()
| | |       dax, iomap: Add support for synchronous faults
| | |       mm: Define MAP_SYNC and VM_SYNC flags
| | |       dax: Allow tuning whether dax_insert_mapping_entry() dirties entry
| | |       dax: Allow dax_iomap_fault() to return pfn
| | |       dax: Fix comment describing dax_iomap_fault()
| | |       ...
| * | fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core  Dan Williams  2017-11-14  1  -2/+2
| | |     While reviewing whether MAP_SYNC should strengthen its current guarantee of syncing writes from the initiating process to also include third-party readers observing dirty metadata, Dave pointed out that the check of IOMAP_WRITE is misplaced. The policy of what to do with IOMAP_F_DIRTY should be separated from the generic filesystem mechanism of reporting dirty metadata. Move this policy to the fs-dax core to simplify the per-filesystem iomap handlers, and further centralize code that implements the MAP_SYNC policy. This otherwise should not change behavior; it just makes it easier to change behavior in the future.
| | |
| | |     Reviewed-by: Jan Kara <jack@suse.cz>
| | |     Reviewed-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
| | |     Reported-by: Dave Chinner <david@fromorbit.com>
| | |     Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | xfs: support for synchronous DAX faults  Christoph Hellwig  2017-11-03  2  -1/+19
| | |     Return IOMAP_F_DIRTY from xfs_file_iomap_begin() when asked to prepare blocks for writing and the inode is pinned, and has dirty fields other than the timestamps. In __xfs_filemap_fault() we then detect this case and call dax_finish_sync_fault() to make sure all metadata is committed, and to insert the page table entry.
| | |
| | |     Note that this will also dirty the corresponding radix tree entry, which is what we want: fsync(2) will still provide data integrity guarantees for applications not using userspace flushing. And applications using userspace flushing can avoid calling fsync(2) and thus avoid the performance overhead.
| | |
| | |     [JK: Added VM_SYNC flag handling]
| | |
| | |     Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Signed-off-by: Jan Kara <jack@suse.cz>
| | |     Signed-off-by: Dan Williams <dan.j.williams@intel.com>
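The condition in the commit message above (writing, inode pinned, and something other than timestamps dirty) reads naturally as a three-way predicate. The following is a rough Python model of that predicate only. The bit values, the `LOG_TIMESTAMP`/`LOG_CORE` names and the `iomap_flags` helper are all assumptions made up for this sketch and do not come from the kernel headers.

```python
# Hypothetical bit values, chosen only for this sketch.
IOMAP_WRITE = 0x1
IOMAP_F_DIRTY = 0x2
LOG_TIMESTAMP = 0x4  # stand-in for "timestamps are dirty"
LOG_CORE = 0x8       # stand-in for any other dirty inode field


def iomap_flags(request_flags, pin_count, dirty_fields):
    """Return IOMAP_F_DIRTY when a write fault must commit metadata first."""
    writing = bool(request_flags & IOMAP_WRITE)
    pinned = pin_count > 0
    # Mask out the timestamp bit: timestamp-only updates do not count.
    non_timestamp_dirty = bool(dirty_fields & ~LOG_TIMESTAMP)
    if writing and pinned and non_timestamp_dirty:
        return IOMAP_F_DIRTY
    return 0
```

Note that the later commit "fs, dax: unify IOMAP_F_DIRTY read vs write handling policy in the dax core" moves the write check out of the filesystem handler, so this sketch reflects only the behavior described in this one message.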
| * | xfs: Implement xfs_filemap_pfn_mkwrite() using __xfs_filemap_fault()  Jan Kara  2017-11-03  2  -27/+4
| | |     xfs_filemap_pfn_mkwrite() duplicates a lot of __xfs_filemap_fault(). It will also need to handle flushing for synchronous page faults. So just make that function use __xfs_filemap_fault().
| | |
| | |     Signed-off-by: Jan Kara <jack@suse.cz>
| | |     Signed-off-by: Dan Williams <dan.j.williams@intel.com>
| * | dax: Allow dax_iomap_fault() to return pfn  Jan Kara  2017-11-03  1  -2/+2
| | |     For synchronous page faults, dax_iomap_fault() will need to return the PFN, which will then need to be inserted into the page tables after fsync() completes. Add the necessary parameter to dax_iomap_fault().
| | |
| | |     Reviewed-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
| | |     Signed-off-by: Jan Kara <jack@suse.cz>
| | |     Signed-off-by: Dan Williams <dan.j.williams@intel.com>
* | | slab, slub, slob: add slab_flags_t  Alexey Dobriyan  2017-11-16  1  -1/+1
| |/
|/|
| |     Add sparse-checked slab_flags_t for struct kmem_cache::flags (SLAB_POISON, etc).
| |
| |     SLAB is bloated temporarily by switching to "unsigned long", but only temporarily.
| |
| |     Link: http://lkml.kernel.org/r/20171021100225.GA22428@avx2
| |     Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| |     Acked-by: Pekka Enberg <penberg@kernel.org>
| |     Cc: Christoph Lameter <cl@linux.com>
| |     Cc: David Rientjes <rientjes@google.com>
| |     Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
| |     Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| |     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Merge tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  Linus Torvalds  2017-11-14  87  -4011/+10972
|\ \
| | |     Pull xfs updates from Darrick Wong:
| | |      "xfs: great scads of new stuff for 4.15.
| | |
| | |       This merge cycle, we're making some substantive changes to XFS. The in-core extent mappings have been refactored to use proper iterators and a btree to handle heavily fragmented files without needing high-order memory allocations; some important log recovery bug fixes; and the first part of the online fsck functionality. (The online fsck feature is disabled by default and more pieces of it will be coming in future release cycles.)
| | |
| | |       This giant pile of patches has been run through a full xfstests run over the weekend and through a quick xfstests run against this morning's master, with no major failures reported.
| | |
| | |       New in this version:
| | |
| | |       - Refactor the incore extent map manipulations to use a cursor instead of directly modifying extent data.
| | |
| | |       - Refactor the incore extent map cursor to use an in-memory btree instead of a single high-order allocation. This eliminates a major source of complaints about insufficient memory when opening a heavily fragmented file into a system whose memory is also heavily fragmented.
| | |
| | |       - Fix a longstanding bug where deleting a file with a complex extended attribute btree incorrectly handled memory pointers, which could lead to memory corruption.
| | |
| | |       - Improve metadata validation to eliminate crashing problems found while fuzzing xfs.
| | |
| | |       - Move the error injection tag definitions into libxfs to be shared with userspace components.
| | |
| | |       - Fix some log recovery bugs where we'd underflow log block position vector and incorrectly fail log recovery.
| | |
| | |       - Drain the buffer lru after log recovery to force recovered buffers back through the verifiers after mount. On a v4 filesystem the log never attaches verifiers during log replay (v5 does), so we could end up with buffers marked verified but without having ever been verified.
| | |
| | |       - Fix various other bugs.
| | |
| | |       - Introduce the first part of a new online fsck tool. The new fsck tool will be able to iterate every piece of metadata in the filesystem to look for obvious errors and corruptions. In the next release cycle the checking will be extended to cross-reference with the other fs metadata, so this feature should only be used by the developers in the meantime"
| | |
| | |     * tag 'xfs-4.15-merge-1' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux: (131 commits)
| | |       xfs: on failed mount, force-reclaim inodes after unmounting quota controls
| | |       xfs: check the uniqueness of the AGFL entries
| | |       xfs: remove u_int* type usage
| | |       xfs: handle zero entries case in xfs_iext_rebalance_leaf
| | |       xfs: add comments documenting the rebalance algorithm
| | |       xfs: trivial indentation fixup for xfs_iext_remove_node
| | |       xfs: remove a superflous assignment in xfs_iext_remove_node
| | |       xfs: add some comments to xfs_iext_insert/xfs_iext_insert_node
| | |       xfs: fix number of records handling in xfs_iext_split_leaf
| | |       fs/xfs: Remove NULL check before kmem_cache_destroy
| | |       xfs: only check da node header padding on v5 filesystems
| | |       xfs: fix btree scrub deref check
| | |       xfs: fix uninitialized return values in scrub code
| | |       xfs: pass inode number to xfs_scrub_ino_set_{preen,warning}
| | |       xfs: refactor the directory data block bestfree checks
| | |       xfs: mark xlog_verify_dest_ptr STATIC
| | |       xfs: mark xlog_recover_check_summary STATIC
| | |       xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static
| | |       xfs: remove unreachable error injection code in xfs_qm_dqget
| | |       xfs: remove unused debug counts for xfs_lock_inodes
| | |       ...
| * | xfs: on failed mount, force-reclaim inodes after unmounting quota controls  Darrick J. Wong  2017-11-10  1  -2/+13
| | |     When mounting fails, we must force-reclaim inodes (and disable delayed reclaim) /after/ the realtime and quota control have let go of the realtime and quota inodes. Without this, we corrupt the timer list and cause other weird problems.
| | |
| | |     Found by xfs/376 fuzzing u3.bmbt[0].lastoff on an rmap filesystem to force a bogus post-eof extent reclaim that causes the fs to go down.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: check the uniqueness of the AGFL entries  Darrick J. Wong  2017-11-10  1  -2/+61
| | |     Make sure we don't list a block twice in the agfl by copying the contents of the AGFL to an array, sorting it, and looking for duplicates. We can easily check that the number of agfl entries we see actually matches the flcount, so do that too.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
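The copy, sort and scan-for-duplicates approach described in the commit above can be sketched in Python as follows. This is an illustrative model only: `check_agfl` is a hypothetical helper, the entries are plain integers standing in for block numbers, and the real scrubber reports problems through its own mechanisms rather than returning strings.

```python
def check_agfl(entries, flcount):
    """Return a list of problems found in a snapshot of free-list entries."""
    problems = []
    # The number of entries seen should match the recorded count.
    if len(entries) != flcount:
        problems.append(
            "count mismatch: saw %d, expected %d" % (len(entries), flcount)
        )
    # sorted() works on a copy, so the caller's list is left untouched.
    ordered = sorted(entries)
    # After sorting, any duplicate block numbers sit next to each other.
    for prev, cur in zip(ordered, ordered[1:]):
        if prev == cur:
            problems.append("duplicate block %d" % cur)
    return problems
```

Sorting first turns the duplicate search into a single pass over adjacent pairs, at the cost of one temporary copy of the list.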
| * | xfs: remove u_int* type usage  Darrick J. Wong  2017-11-10  4  -7/+7
| | |     Use the uint* types instead of the u_int* types. This will (hopefully) pair with an xfsprogs cleanup.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: handle zero entries case in xfs_iext_rebalance_leaf  Christoph Hellwig  2017-11-09  1  -7/+17
| | |     And also rename fill to nr_entries to match the rest of the code.
| | |
| | |     Reported-by: Brian Foster <bfoster@redhat.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: add comments documenting the rebalance algorithm  Christoph Hellwig  2017-11-09  1  -0/+24
| | |     Reported-by: Brian Foster <bfoster@redhat.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: trivial indentation fixup for xfs_iext_remove_node  Christoph Hellwig  2017-11-09  1  -2/+1
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove a superflous assignment in xfs_iext_remove_node  Christoph Hellwig  2017-11-09  1  -1/+0
| | |     Reported-by: Brian Foster <bfoster@redhat.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: add some comments to xfs_iext_insert/xfs_iext_insert_node  Christoph Hellwig  2017-11-09  1  -0/+8
| | |     Reported-by: Brian Foster <bfoster@redhat.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: fix number of records handling in xfs_iext_split_leaf  Christoph Hellwig  2017-11-09  1  -4/+1
| | |     Fix to check the correct value, and remove a duplicate handling of the uneven record number split algorithm.
| | |
| | |     Reported-by: Brian Foster <bfoster@redhat.com>
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | fs/xfs: Remove NULL check before kmem_cache_destroy  Tim Hansen  2017-11-09  1  -2/+1
| | |     kmem_cache_destroy already checks for null values.
| | |
| | |     Signed-off-by: Tim Hansen <devtimhansen@gmail.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: only check da node header padding on v5 filesystems  Darrick J. Wong  2017-11-09  1  -1/+2
| | |     It turns out that we only started zeroing a new da btree node's block header on v5 filesystems. Prior to that, we just wouldn't set anything at all, which means that the pad field never got set and would retain whatever happened to be in memory. Therefore, we can only check the pad for zeroness on v5 filesystems.
| | |
| | |     shared/006 on a v4 filesystem exposes this scrub bug.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: fix btree scrub deref check  Darrick J. Wong  2017-11-09  1  -1/+1
| | |     The btree scrubber has some custom code to retrieve and check a btree block via xfs_btree_lookup_get_block. This function will either return an error code (verifiers failed) or a *pblock will be untouched (bad pointer). Since we previously set *pblock to NULL, we need to check *pblock, not pblock, to trigger the early bailout.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: fix uninitialized return values in scrub code  Darrick J. Wong  2017-11-09  2  -3/+3
| | |     Fix smatch complaints about uninitialized return codes.
| | |
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: pass inode number to xfs_scrub_ino_set_{preen,warning}  Darrick J. Wong  2017-11-09  4  -9/+12
| | |     There are two ways to scrub an inode -- calling xfs_iget and checking the raw inode core, or by loading the inode cluster buffer and checking the on-disk contents directly. The second method is only useful if _iget fails the verifiers; when this is the case, sc->ip is NULL and calling the tracepoint will cause a system crash. Therefore, pass the raw inode number directly into the _preen and _warning functions.
| | |
| | |     Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
| * | xfs: refactor the directory data block bestfree checks  Darrick J. Wong  2017-11-09  1  -15/+5
| | |     In a directory data block, the zeroth bestfree item must point to the longest free space. Therefore, when we check the bestfree block's records against the data blocks, we only need to compare with bf[0] and don't need the loop.
| | |
| | |     The weird loop was most probably the result of an earlier refactoring gone bad.
| | |
| | |     Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Reviewed-by: Dave Chinner <dchinner@redhat.com>
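The reasoning in the commit above is that an invariant on the table (entry zero is the longest) makes a scan over every entry redundant. A tiny Python illustration, with hypothetical function names and bare integer lengths standing in for the bestfree records:

```python
def longest_by_scan(bf):
    """Find the longest free length by looking at every entry."""
    return max(bf, default=0)


def longest_by_invariant(bf):
    """Rely on the invariant that entry 0 holds the longest free length."""
    return bf[0] if bf else 0


def free_record_matches(recorded_best, bf):
    """Compare a recorded best-free length against the data block's table."""
    return recorded_best == longest_by_invariant(bf)
```

The two lookups agree only while the invariant holds. If entry zero is not the longest, the table itself is corrupt, which is a separate condition to check and report.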
| * | xfs: mark xlog_verify_dest_ptr STATIC  Christoph Hellwig  2017-11-06  1  -1/+1
| | |     We already did it in the forward declaration, but not for the function body itself.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: mark xlog_recover_check_summary STATIC  Christoph Hellwig  2017-11-06  1  -1/+1
| | |     We already did it in the forward declaration, but not for the function body itself.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: mark xfs_btree_check_lblock and xfs_btree_check_ptr static  Christoph Hellwig  2017-11-06  1  -2/+2
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove unreachable error injection code in xfs_qm_dqget  Christoph Hellwig  2017-11-06  1  -17/+0
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove unused debug counts for xfs_lock_inodes  Christoph Hellwig  2017-11-06  1  -21/+0
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: mark xfs_errortag_ktype static  Christoph Hellwig  2017-11-06  1  -1/+1
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: trivial sparse fixes for the new scrub code  Christoph Hellwig  2017-11-06  5  -6/+6
| | |     [darrick: fix broken initializer in xfs_scrub_xattr]
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: always define STATIC to static noinline  Christoph Hellwig  2017-11-06  2  -13/+2
| | |     Ever since we added the noinline tag there is no good reason to define away the static for debug builds - we'll get just as good debug information with or without it, so don't mess up sparse and other checkers due to it.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: move xfs_bmbt_irec and xfs_exntst_t to xfs_types.h  Christoph Hellwig  2017-11-06  2  -18/+12
| | |     Neither defines an on-disk format, so move them out of xfs_format.h.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: pass struct xfs_bmbt_irec to xfs_bmbt_validate_extent  Christoph Hellwig  2017-11-06  3  -7/+7
| | |     This removed an unaligned load per extent, as well as the manual poking into the on-disk extent format.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove the nr_extents argument to xfs_iext_remove  Christoph Hellwig  2017-11-06  3  -35/+23
| | |     We only have two places that remove 2 extents at the same time, so unroll the loop there.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove the nr_extents argument to xfs_iext_insert  Christoph Hellwig  2017-11-06  4  -40/+26
| | |     We only have two places that insert 2 extents at the same time, so unroll the loop there.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: use a b+tree for the in-core extent list  Christoph Hellwig  2017-11-06  13  -1259/+1093
| | |     Replace the current linear list and the indirection array for the in-core extent list with a b+tree to avoid the need for larger memory allocations for the indirection array when lots of extents are present. The current extent list implementation leads to heavy pressure on the memory allocator when modifying files with a high extent count, and can lead to high latencies because of that.
| | |
| | |     The replacement is a b+tree with a few quirks. The leaf nodes directly store the extent record in two u64 values. The encoding is a little bit different from the existing in-core extent records so that the start offset and length which are required for lookups can be retrieved with simple mask operations. The inner nodes store a 64-bit key containing the start offset in the first half of the node, and the pointers to the next lower level in the second half. In either case we walk the node from the beginning to the end and do a linear search, as that is more efficient for the low number of cache lines touched during a search (2 for the inner nodes, 4 for the leaf nodes) than a binary search. We store termination markers (zero length for the leaf nodes, an otherwise impossible high bit for the inner nodes) to terminate the key list / records instead of storing a count to use the available cache lines as efficiently as possible.
| | |
| | |     One quirk of the algorithm is that while we normally split a node half and half like usual btree implementations we just spill over entries added at the very end of the list to a new node on its own. This means we get a 100% fill grade for the common cases of bulk insertion when reading an inode into memory, and when only sequentially appending to a file. The downside is a slightly higher chance of splits on the first random insertions.
| | |
| | |     Both insert and removal manually recurse into the lower levels, but the bulk deletion of the whole tree is still implemented as a recursive function call, although one limited by the overall depth and with very little stack usage in every iteration.
| | |
| | |     For the first few extents we dynamically grow the list from a single extent to the next powers of two until we have a first full leaf block, and only then build the actual tree.
| | |
| | |     The code started out based on the generic lib/btree.c code from Joern Engel based on earlier work from Peter Zijlstra, but has since been rewritten beyond recognition.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
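The split quirk described in the commit above (spill an entry added at the very end of a full node into a fresh node, otherwise split half and half) can be demonstrated with a toy model. The sketch below is not the kernel implementation: it keeps only a flat, sorted list of leaves with no inner nodes, uses bare integers as keys, assumes keys are distinct, and picks a hypothetical `LEAF_CAPACITY` of 4 purely to keep the demo small.

```python
LEAF_CAPACITY = 4  # hypothetical; chosen small to keep the demo readable


def insert(leaves, key):
    """Insert key into a sorted list of sorted leaves, splitting when full."""
    if not leaves:
        leaves.append([key])
        return
    # Linear search for the last leaf whose first key is <= key.
    idx = 0
    for i, leaf in enumerate(leaves):
        if leaf[0] <= key:
            idx = i
    leaf = leaves[idx]
    if len(leaf) < LEAF_CAPACITY:
        leaf.append(key)
        leaf.sort()
        return
    if key > leaf[-1]:
        # Insert position is the very end of a full leaf:
        # spill the new entry into a leaf of its own.
        leaves.insert(idx + 1, [key])
        return
    # Otherwise split the full leaf half and half, then insert.
    mid = LEAF_CAPACITY // 2
    left, right = leaf[:mid], leaf[mid:]
    target = right if key >= right[0] else left
    target.append(key)
    target.sort()
    leaves[idx:idx + 1] = [left, right]
```

Inserting keys in ascending order leaves every leaf except the last one completely full, which is the "100% fill grade" case the message mentions. A random insert into a full leaf falls back to the ordinary half-and-half split.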
| * | xfs: allow unaligned extent records in xfs_bmbt_disk_set_all  Christoph Hellwig  2017-11-06  1  -4/+4
| | |     To make life a little simpler make xfs_bmbt_set_all unaligned access aware so that we can use it directly on the destination buffer.
| | |
| | |     Signed-off-by: Christoph Hellwig <hch@lst.de>
| | |     Reviewed-by: Brian Foster <bfoster@redhat.com>
| | |     Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
| | |     Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: remove support for inlining data/extents into the inode fork  Christoph Hellwig  2017-11-06  3  -198/+13
| | | Supporting a small bit of data inside the inode fork blows up the fork size a lot: removing the 32 bytes of inline data halves the effective size of the inode fork (and it still has a lot of unused padding left), and the cost of a single kmalloc doesn't show up compared to the cost of reading an inode or creating one. It also simplifies the fork management code a lot. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: simplify xfs_reflink_convert_cow  Christoph Hellwig  2017-11-06  3  -20/+18
| | | Instead of looking up extents to convert and calling xfs_bmapi_write on each of them, just let xfs_bmapi_write handle the full range. To make this robust, add a new XFS_BMAPI_CONVERT_ONLY flag that only converts ranges and never allocates blocks. [darrick: shorten the stringified CONVERT_ONLY trace flag] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: iterate backwards in xfs_reflink_cancel_cow_blocks  Christoph Hellwig  2017-11-06  1  -4/+12
| | | Match the iteration order for extent deletion in the truncate and reflink I/O completion path. This also happens to make implementing the new incore extent list a lot easier. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
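The message does not spell out why a backwards walk is easier. One general property, shown here on a plain array with hypothetical `sk_` names, is that removing the current element while walking from the end only shifts entries that have already been visited, so the loop index never needs fixing up. This is an illustration of that property only, not the kernel function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Remove the element at @idx from an array of @n entries; returns n - 1. */
static int sk_remove_at(uint64_t *a, int n, int idx)
{
	memmove(&a[idx], &a[idx + 1], (size_t)(n - idx - 1) * sizeof(*a));
	return n - 1;
}

/*
 * Drop every entry whose start offset is at or beyond @cutoff, walking
 * from the last entry to the first.  Entries shifted down by a removal
 * sit at higher indices than the ones still to be visited.
 */
static int sk_cancel_from(uint64_t *starts, int n, uint64_t cutoff)
{
	for (int i = n - 1; i >= 0; i--) {
		if (starts[i] >= cutoff)
			n = sk_remove_at(starts, n, i);
	}
	return n;
}
```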
| * | xfs: introduce the xfs_iext_cursor abstraction  Christoph Hellwig  2017-11-06  13  -337/+407
| | | Add a new xfs_iext_cursor structure to hide the direct extent map index manipulations. In addition to the existing lookup/get/insert/remove and update routines, new primitives to get the first and last extent cursor, as well as moving up and down by one extent, are provided. Also new are convenience helpers to increment/decrement the cursor and retrieve the new extent, as well as to peek into the previous/next extent without updating the cursor, and last but not least a macro to iterate over all extents in a fork. [darrick: rename for_each_iext to for_each_xfs_iext] Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
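The shape of such a cursor interface can be sketched over a plain array. Everything below is hypothetical (the `sk_` names, the struct layouts, the array-backed fork) and only mirrors the operations the message lists: first/last, step up/down, get, peek without moving, and an iteration macro. The point of the abstraction is that callers never touch the index directly, so the backing store can later change without touching them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sk_extent {
	uint64_t startoff;
	uint64_t len;
};

/* Hypothetical fork backed by a flat array. */
struct sk_fork {
	struct sk_extent *ext;
	int count;
};

/* Opaque to callers; only the helpers below look inside. */
struct sk_cursor {
	int pos;
};

static void sk_first(const struct sk_fork *f, struct sk_cursor *c)
{
	(void)f;
	c->pos = 0;
}

static void sk_last(const struct sk_fork *f, struct sk_cursor *c)
{
	c->pos = f->count - 1;
}

static void sk_next(struct sk_cursor *c) { c->pos++; }
static void sk_prev(struct sk_cursor *c) { c->pos--; }

/* Copy out the extent at the cursor; false if the cursor is out of range. */
static bool sk_get(const struct sk_fork *f, const struct sk_cursor *c,
		   struct sk_extent *out)
{
	if (c->pos < 0 || c->pos >= f->count)
		return false;
	*out = f->ext[c->pos];
	return true;
}

/* Look at the following extent without moving the caller's cursor. */
static bool sk_peek_next(const struct sk_fork *f, const struct sk_cursor *c,
			 struct sk_extent *out)
{
	struct sk_cursor tmp = *c;

	sk_next(&tmp);
	return sk_get(f, &tmp, out);
}

/* Iterate over all extents in a fork. */
#define sk_for_each_extent(f, c, out) \
	for (sk_first((f), (c)); sk_get((f), (c), (out)); sk_next((c)))

static uint64_t sk_total_len(const struct sk_fork *f)
{
	struct sk_cursor cur;
	struct sk_extent ext;
	uint64_t total = 0;

	sk_for_each_extent(f, &cur, &ext)
		total += ext.len;
	return total;
}
```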
| * | xfs: iterate over extents in xfs_bmap_extents_to_btree  Christoph Hellwig  2017-11-06  1  -12/+8
| | | This actually makes the function very slightly less efficient for now as we detour through the expanded irect format between the in-core extent format and the on-disk one instead of just endian swapping them. But with the incore extent btree the in-core one will use a different format and the representation will be entirely hidden. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: iterate over extents in xfs_iextents_copy  Christoph Hellwig  2017-11-06  1  -40/+13
| | | This actually makes the function very slightly less efficient for now as we detour through the expanded irect format between the in-core extent format and the on-disk one instead of just endian swapping them. But with the incore extent btree the in-core one will use a different format and the representation will be entirely hidden. It also happens to make the function a whole lot more readable. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: pass an on-disk extent to xfs_bmbt_validate_extent  Christoph Hellwig  2017-11-06  3  -10/+9
| | | This prepares for getting rid of the current in-memory extent format. At the end of the series we will change the calling convention again to pass the xfs_bmbt_irec structure once it is available everywhere. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
| * | xfs: treat idx as a cursor in xfs_bmap_collapse_extents  Christoph Hellwig  2017-11-06  1  -11/+6
| | | Stop poking before and after the index and just increment or decrement it while doing our operations on it to prepare for a new extent list implementation. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>