From 219db95bbe796277c739cca17fe134f6c2ce6676 Mon Sep 17 00:00:00 2001 From: Ayush Ranjan Date: Thu, 22 Aug 2019 23:18:33 -0400 Subject: ext4: documentation fixes This commit aims to fix the following issues in ext4 documentation: - Flexible block group docs said that the aim was to group block metadata together instead of block group metadata. - The documentation consistly uses "location" instead of "block number". It is easy to confuse location to be an absolute offset on disk. Added a line to clarify all location values are in terms of block numbers. - Dirent2 docs said that the rec_len field is shortened instead of the name_len field. - Typo in bg_checksum description. - Inode size is 160 bytes now, and hence i_extra_isize is now 32. - Cluster size formula was incorrect, it did not include the +10 to s_log_cluster_size value. - Typo: there were two s_wtime_hi in the superblock struct. - Superblock struct was outdated, added the new fields which were part of s_reserved earlier. - Multiple mount protection seems to be implemented in fs/ext4/mmp.c. Signed-off-by: Ayush Ranjan Signed-off-by: Theodore Ts'o Reviewed-by: Andreas Dilger --- Documentation/filesystems/ext4/blockgroup.rst | 10 +++++----- Documentation/filesystems/ext4/blocks.rst | 4 +++- Documentation/filesystems/ext4/directory.rst | 2 +- Documentation/filesystems/ext4/group_descr.rst | 9 ++++++--- Documentation/filesystems/ext4/inodes.rst | 4 ++-- Documentation/filesystems/ext4/super.rst | 20 ++++++++++++++------ 6 files changed, 31 insertions(+), 18 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/ext4/blockgroup.rst b/Documentation/filesystems/ext4/blockgroup.rst index baf888e4c06a..3da156633339 100644 --- a/Documentation/filesystems/ext4/blockgroup.rst +++ b/Documentation/filesystems/ext4/blockgroup.rst @@ -71,11 +71,11 @@ if the flex\_bg size is 4, then group 0 will contain (in order) the superblock, group descriptors, data block bitmaps for groups 0-3, inode bitmaps for groups 0-3, inode tables for groups 0-3, and the remaining space in group 0 is for file data. The effect of this is to group the -block metadata close together for faster loading, and to enable large -files to be continuous on disk. Backup copies of the superblock and -group descriptors are always at the beginning of block groups, even if -flex\_bg is enabled. The number of block groups that make up a flex\_bg -is given by 2 ^ ``sb.s_log_groups_per_flex``. +block group metadata close together for faster loading, and to enable +large files to be continuous on disk. Backup copies of the superblock +and group descriptors are always at the beginning of block groups, even +if flex\_bg is enabled. The number of block groups that make up a +flex\_bg is given by 2 ^ ``sb.s_log_groups_per_flex``. Meta Block Groups ----------------- diff --git a/Documentation/filesystems/ext4/blocks.rst b/Documentation/filesystems/ext4/blocks.rst index 73d4dc0f7bda..bd722ecd92d6 100644 --- a/Documentation/filesystems/ext4/blocks.rst +++ b/Documentation/filesystems/ext4/blocks.rst @@ -10,7 +10,9 @@ block groups. Block size is specified at mkfs time and typically is 4KiB. You may experience mounting problems if block size is greater than page size (i.e. 64KiB blocks on a i386 which only has 4KiB memory pages). By default a filesystem can contain 2^32 blocks; if the '64bit' -feature is enabled, then a filesystem can have 2^64 blocks. +feature is enabled, then a filesystem can have 2^64 blocks. The location +of structures is stored in terms of the block number the structure lives +in and not the absolute offset on disk. For 32-bit filesystems, limits are as follows: diff --git a/Documentation/filesystems/ext4/directory.rst b/Documentation/filesystems/ext4/directory.rst index 614034e24669..073940cc64ed 100644 --- a/Documentation/filesystems/ext4/directory.rst +++ b/Documentation/filesystems/ext4/directory.rst @@ -59,7 +59,7 @@ is at most 263 bytes long, though on disk you'll need to reference - File name. Since file names cannot be longer than 255 bytes, the new directory -entry format shortens the rec\_len field and uses the space for a file +entry format shortens the name\_len field and uses the space for a file type flag, probably to avoid having to load every inode during directory tree traversal. This format is ``ext4_dir_entry_2``, which is at most 263 bytes long, though on disk you'll need to reference diff --git a/Documentation/filesystems/ext4/group_descr.rst b/Documentation/filesystems/ext4/group_descr.rst index 0f783ed88592..7ba6114e7f5c 100644 --- a/Documentation/filesystems/ext4/group_descr.rst +++ b/Documentation/filesystems/ext4/group_descr.rst @@ -99,9 +99,12 @@ The block group descriptor is laid out in ``struct ext4_group_desc``. * - 0x1E - \_\_le16 - bg\_checksum - - Group descriptor checksum; crc16(sb\_uuid+group+desc) if the - RO\_COMPAT\_GDT\_CSUM feature is set, or crc32c(sb\_uuid+group\_desc) & - 0xFFFF if the RO\_COMPAT\_METADATA\_CSUM feature is set. + - Group descriptor checksum; crc16(sb\_uuid+group\_num+bg\_desc) if the + RO\_COMPAT\_GDT\_CSUM feature is set, or + crc32c(sb\_uuid+group\_num+bg\_desc) & 0xFFFF if the + RO\_COMPAT\_METADATA\_CSUM feature is set. The bg\_checksum + field in bg\_desc is skipped when calculating crc16 checksum, + and set to zero if crc32c checksum is used. * - - - diff --git a/Documentation/filesystems/ext4/inodes.rst b/Documentation/filesystems/ext4/inodes.rst index 6bd35e506b6f..34f62928cebc 100644 --- a/Documentation/filesystems/ext4/inodes.rst +++ b/Documentation/filesystems/ext4/inodes.rst @@ -470,8 +470,8 @@ inode, which allows struct ext4\_inode to grow for a new kernel without having to upgrade all of the on-disk inodes. Access to fields beyond EXT2\_GOOD\_OLD\_INODE\_SIZE should be verified to be within ``i_extra_isize``. By default, ext4 inode records are 256 bytes, and (as -of October 2013) the inode structure is 156 bytes -(``i_extra_isize = 28``). The extra space between the end of the inode +of August 2019) the inode structure is 160 bytes +(``i_extra_isize = 32``). The extra space between the end of the inode structure and the end of the inode record can be used to store extended attributes. Each inode record can be as large as the filesystem block size, though this is not terribly efficient. diff --git a/Documentation/filesystems/ext4/super.rst b/Documentation/filesystems/ext4/super.rst index 04ff079a2acf..48b6c78fc38e 100644 --- a/Documentation/filesystems/ext4/super.rst +++ b/Documentation/filesystems/ext4/super.rst @@ -58,7 +58,7 @@ The ext4 superblock is laid out as follows in * - 0x1C - \_\_le32 - s\_log\_cluster\_size - - Cluster size is (2 ^ s\_log\_cluster\_size) blocks if bigalloc is + - Cluster size is 2 ^ (10 + s\_log\_cluster\_size) blocks if bigalloc is enabled. Otherwise s\_log\_cluster\_size must equal s\_log\_block\_size. * - 0x20 - \_\_le32 @@ -447,7 +447,7 @@ The ext4 superblock is laid out as follows in - Upper 8 bits of the s_wtime field. * - 0x275 - \_\_u8 - - s\_wtime_hi + - s\_mtime_hi - Upper 8 bits of the s_mtime field. * - 0x276 - \_\_u8 @@ -466,12 +466,20 @@ The ext4 superblock is laid out as follows in - s\_last_error_time_hi - Upper 8 bits of the s_last_error_time_hi field. * - 0x27A - - \_\_u8[2] - - s\_pad + - \_\_u8 + - s\_pad[2] - Zero padding. * - 0x27C + - \_\_le16 + - s\_encoding + - Filename charset encoding. + * - 0x27E + - \_\_le16 + - s\_encoding_flags + - Filename charset encoding flags. + * - 0x280 - \_\_le32 - - s\_reserved[96] + - s\_reserved[95] - Padding to the end of the block. * - 0x3FC - \_\_le32 @@ -617,7 +625,7 @@ following: * - 0x80 - Enable a filesystem size of 2^64 blocks (INCOMPAT\_64BIT). * - 0x100 - - Multiple mount protection. Not implemented (INCOMPAT\_MMP). + - Multiple mount protection (INCOMPAT\_MMP). * - 0x200 - Flexible block groups. See the earlier discussion of this feature (INCOMPAT\_FLEX\_BG). -- cgit v1.2.3 From e85526404ca70fa49f3d13b3898b067392522b24 Mon Sep 17 00:00:00 2001 From: Ayush Ranjan Date: Sat, 7 Sep 2019 11:56:47 -0400 Subject: ext4: add missing bigalloc documentation. There was a broken link for bigalloc. The page https://ext4.wiki.kernel.org/index.php/Bigalloc was not migrated into the current documentation sources. This patch adds the contents of that missing page into the section for Bigalloc itself. Signed-off-by: Ayush Ranjan Link: https://lore.kernel.org/r/20190831154419.GA30357@fa19-cs241-404.cs.illinois.edu Signed-off-by: Theodore Ts'o --- Documentation/filesystems/ext4/bigalloc.rst | 32 ++++++++++++++++++++--------- 1 file changed, 22 insertions(+), 10 deletions(-) (limited to 'Documentation/filesystems') diff --git a/Documentation/filesystems/ext4/bigalloc.rst b/Documentation/filesystems/ext4/bigalloc.rst index c6d88557553c..72075aa608e4 100644 --- a/Documentation/filesystems/ext4/bigalloc.rst +++ b/Documentation/filesystems/ext4/bigalloc.rst @@ -9,14 +9,26 @@ ext4 code is not prepared to handle the case where the block size exceeds the page size. However, for a filesystem of mostly huge files, it is desirable to be able to allocate disk blocks in units of multiple blocks to reduce both fragmentation and metadata overhead. The -`bigalloc `__ feature provides exactly this ability. The -administrator can set a block cluster size at mkfs time (which is stored -in the s\_log\_cluster\_size field in the superblock); from then on, the -block bitmaps track clusters, not individual blocks. This means that -block groups can be several gigabytes in size (instead of just 128MiB); -however, the minimum allocation unit becomes a cluster, not a block, -even for directories. TaoBao had a patchset to extend the “use units of -clusters instead of blocks” to the extent tree, though it is not clear -where those patches went-- they eventually morphed into “extent tree v2” -but that code has not landed as of May 2015. +bigalloc feature provides exactly this ability. + +The bigalloc feature (EXT4_FEATURE_RO_COMPAT_BIGALLOC) changes ext4 to +use clustered allocation, so that each bit in the ext4 block allocation +bitmap addresses a power of two number of blocks. For example, if the +file system is mainly going to be storing large files in the 4-32 +megabyte range, it might make sense to set a cluster size of 1 megabyte. +This means that each bit in the block allocation bitmap now addresses +256 4k blocks. This shrinks the total size of the block allocation +bitmaps for a 2T file system from 64 megabytes to 256 kilobytes. It also +means that a block group addresses 32 gigabytes instead of 128 megabytes, +also shrinking the amount of file system overhead for metadata. + +The administrator can set a block cluster size at mkfs time (which is +stored in the s\_log\_cluster\_size field in the superblock); from then +on, the block bitmaps track clusters, not individual blocks. This means +that block groups can be several gigabytes in size (instead of just +128MiB); however, the minimum allocation unit becomes a cluster, not a +block, even for directories. TaoBao had a patchset to extend the “use +units of clusters instead of blocks” to the extent tree, though it is +not clear where those patches went-- they eventually morphed into +“extent tree v2” but that code has not landed as of May 2015. -- cgit v1.2.3