diff options
Diffstat (limited to 'Documentation/filesystems')
-rw-r--r-- | Documentation/filesystems/fiemap.txt | 3 | ||||
-rw-r--r-- | Documentation/filesystems/inotify.txt | 197 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/nfs41-server.txt | 23 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/pnfs-block-server.txt | 37 | ||||
-rw-r--r-- | Documentation/filesystems/nfs/pnfs.txt | 13 | ||||
-rw-r--r-- | Documentation/filesystems/ocfs2.txt | 4 | ||||
-rw-r--r-- | Documentation/filesystems/proc.txt | 25 | ||||
-rw-r--r-- | Documentation/filesystems/seq_file.txt | 12 | ||||
-rw-r--r-- | Documentation/filesystems/xfs.txt | 22 |
9 files changed, 101 insertions, 235 deletions
diff --git a/Documentation/filesystems/fiemap.txt b/Documentation/filesystems/fiemap.txt index 1b805a0efbb0..f6d9c99103a4 100644 --- a/Documentation/filesystems/fiemap.txt +++ b/Documentation/filesystems/fiemap.txt @@ -196,7 +196,8 @@ struct fiemap_extent_info { }; It is intended that the file system should not need to access any of this -structure directly. +structure directly. Filesystem handlers should be tolerant to signals and return +EINTR once fatal signal received. Flag checking should be done at the beginning of the ->fiemap callback via the diff --git a/Documentation/filesystems/inotify.txt b/Documentation/filesystems/inotify.txt index cfd02712b83e..51f61db787fb 100644 --- a/Documentation/filesystems/inotify.txt +++ b/Documentation/filesystems/inotify.txt @@ -4,201 +4,10 @@ Document started 15 Mar 2005 by Robert Love <rml@novell.com> +Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com> + --Deleted obsoleted interface, just refer to manpages for user interface. - -(i) User Interface - -Inotify is controlled by a set of three system calls and normal file I/O on a -returned file descriptor. - -First step in using inotify is to initialise an inotify instance: - - int fd = inotify_init (); - -Each instance is associated with a unique, ordered queue. - -Change events are managed by "watches". A watch is an (object,mask) pair where -the object is a file or directory and the mask is a bit mask of one or more -inotify events that the application wishes to receive. See <linux/inotify.h> -for valid events. A watch is referenced by a watch descriptor, or wd. - -Watches are added via a path to the file. - -Watches on a directory will return events on any files inside of the directory. - -Adding a watch is simple: - - int wd = inotify_add_watch (fd, path, mask); - -Where "fd" is the return value from inotify_init(), path is the path to the -object to watch, and mask is the watch mask (see <linux/inotify.h>). - -You can update an existing watch in the same manner, by passing in a new mask. - -An existing watch is removed via - - int ret = inotify_rm_watch (fd, wd); - -Events are provided in the form of an inotify_event structure that is read(2) -from a given inotify instance. The filename is of dynamic length and follows -the struct. It is of size len. The filename is padded with null bytes to -ensure proper alignment. This padding is reflected in len. - -You can slurp multiple events by passing a large buffer, for example - - size_t len = read (fd, buf, BUF_LEN); - -Where "buf" is a pointer to an array of "inotify_event" structures at least -BUF_LEN bytes in size. The above example will return as many events as are -available and fit in BUF_LEN. - -Each inotify instance fd is also select()- and poll()-able. - -You can find the size of the current event queue via the standard FIONREAD -ioctl on the fd returned by inotify_init(). - -All watches are destroyed and cleaned up on close. - - -(ii) - -Prototypes: - - int inotify_init (void); - int inotify_add_watch (int fd, const char *path, __u32 mask); - int inotify_rm_watch (int fd, __u32 mask); - - -(iii) Kernel Interface - -Inotify's kernel API consists a set of functions for managing watches and an -event callback. - -To use the kernel API, you must first initialize an inotify instance with a set -of inotify_operations. You are given an opaque inotify_handle, which you use -for any further calls to inotify. - - struct inotify_handle *ih = inotify_init(my_event_handler); - -You must provide a function for processing events and a function for destroying -the inotify watch. - - void handle_event(struct inotify_watch *watch, u32 wd, u32 mask, - u32 cookie, const char *name, struct inode *inode) - - watch - the pointer to the inotify_watch that triggered this call - wd - the watch descriptor - mask - describes the event that occurred - cookie - an identifier for synchronizing events - name - the dentry name for affected files in a directory-based event - inode - the affected inode in a directory-based event - - void destroy_watch(struct inotify_watch *watch) - -You may add watches by providing a pre-allocated and initialized inotify_watch -structure and specifying the inode to watch along with an inotify event mask. -You must pin the inode during the call. You will likely wish to embed the -inotify_watch structure in a structure of your own which contains other -information about the watch. Once you add an inotify watch, it is immediately -subject to removal depending on filesystem events. You must grab a reference if -you depend on the watch hanging around after the call. - - inotify_init_watch(&my_watch->iwatch); - inotify_get_watch(&my_watch->iwatch); // optional - s32 wd = inotify_add_watch(ih, &my_watch->iwatch, inode, mask); - inotify_put_watch(&my_watch->iwatch); // optional - -You may use the watch descriptor (wd) or the address of the inotify_watch for -other inotify operations. You must not directly read or manipulate data in the -inotify_watch. Additionally, you must not call inotify_add_watch() more than -once for a given inotify_watch structure, unless you have first called either -inotify_rm_watch() or inotify_rm_wd(). - -To determine if you have already registered a watch for a given inode, you may -call inotify_find_watch(), which gives you both the wd and the watch pointer for -the inotify_watch, or an error if the watch does not exist. - - wd = inotify_find_watch(ih, inode, &watchp); - -You may use container_of() on the watch pointer to access your own data -associated with a given watch. When an existing watch is found, -inotify_find_watch() bumps the refcount before releasing its locks. You must -put that reference with: - - put_inotify_watch(watchp); - -Call inotify_find_update_watch() to update the event mask for an existing watch. -inotify_find_update_watch() returns the wd of the updated watch, or an error if -the watch does not exist. - - wd = inotify_find_update_watch(ih, inode, mask); - -An existing watch may be removed by calling either inotify_rm_watch() or -inotify_rm_wd(). - - int ret = inotify_rm_watch(ih, &my_watch->iwatch); - int ret = inotify_rm_wd(ih, wd); - -A watch may be removed while executing your event handler with the following: - - inotify_remove_watch_locked(ih, iwatch); - -Call inotify_destroy() to remove all watches from your inotify instance and -release it. If there are no outstanding references, inotify_destroy() will call -your destroy_watch op for each watch. - - inotify_destroy(ih); - -When inotify removes a watch, it sends an IN_IGNORED event to your callback. -You may use this event as an indication to free the watch memory. Note that -inotify may remove a watch due to filesystem events, as well as by your request. -If you use IN_ONESHOT, inotify will remove the watch after the first event, at -which point you may call the final inotify_put_watch. - -(iv) Kernel Interface Prototypes - - struct inotify_handle *inotify_init(struct inotify_operations *ops); - - inotify_init_watch(struct inotify_watch *watch); - - s32 inotify_add_watch(struct inotify_handle *ih, - struct inotify_watch *watch, - struct inode *inode, u32 mask); - - s32 inotify_find_watch(struct inotify_handle *ih, struct inode *inode, - struct inotify_watch **watchp); - - s32 inotify_find_update_watch(struct inotify_handle *ih, - struct inode *inode, u32 mask); - - int inotify_rm_wd(struct inotify_handle *ih, u32 wd); - - int inotify_rm_watch(struct inotify_handle *ih, - struct inotify_watch *watch); - - void inotify_remove_watch_locked(struct inotify_handle *ih, - struct inotify_watch *watch); - - void inotify_destroy(struct inotify_handle *ih); - - void get_inotify_watch(struct inotify_watch *watch); - void put_inotify_watch(struct inotify_watch *watch); - - -(v) Internal Kernel Implementation - -Each inotify instance is represented by an inotify_handle structure. -Inotify's userspace consumers also have an inotify_device which is -associated with the inotify_handle, and on which events are queued. - -Each watch is associated with an inotify_watch structure. Watches are chained -off of each associated inotify_handle and each associated inode. - -See fs/notify/inotify/inotify_fsnotify.c and fs/notify/inotify/inotify_user.c -for the locking and lifetime rules. - - -(vi) Rationale +(i) Rationale Q: What is the design decision behind not tying the watch to the open fd of the watched object? diff --git a/Documentation/filesystems/nfs/nfs41-server.txt b/Documentation/filesystems/nfs/nfs41-server.txt index c49cd7e796e7..682a59fabe3f 100644 --- a/Documentation/filesystems/nfs/nfs41-server.txt +++ b/Documentation/filesystems/nfs/nfs41-server.txt @@ -24,11 +24,6 @@ focuses on the mandatory-to-implement NFSv4.1 Sessions, providing "exactly once" semantics and better control and throttling of the resources allocated for each client. -Other NFSv4.1 features, Parallel NFS operations in particular, -are still under development out of tree. -See http://wiki.linux-nfs.org/wiki/index.php/PNFS_prototype_design -for more information. - The table below, taken from the NFSv4.1 document, lists the operations that are mandatory to implement (REQ), optional (OPT), and NFSv4.0 operations that are required not to implement (MNI) @@ -43,9 +38,7 @@ The OPTIONAL features identified and their abbreviations are as follows: The following abbreviations indicate the linux server implementation status. I Implemented NFSv4.1 operations. NS Not Supported. - NS* unimplemented optional feature. - P pNFS features implemented out of tree. - PNS pNFS features that are not supported yet (out of tree). + NS* Unimplemented optional feature. Operations @@ -70,13 +63,13 @@ I | DESTROY_SESSION | REQ | | Section 18.37 | I | EXCHANGE_ID | REQ | | Section 18.35 | I | FREE_STATEID | REQ | | Section 18.38 | | GETATTR | REQ | | Section 18.7 | -P | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | -P | GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | +I | GETDEVICEINFO | OPT | pNFS (REQ) | Section 18.40 | +NS*| GETDEVICELIST | OPT | pNFS (OPT) | Section 18.41 | | GETFH | REQ | | Section 18.8 | NS*| GET_DIR_DELEGATION | OPT | DDELG (REQ) | Section 18.39 | -P | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | -P | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | -P | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | +I | LAYOUTCOMMIT | OPT | pNFS (REQ) | Section 18.42 | +I | LAYOUTGET | OPT | pNFS (REQ) | Section 18.43 | +I | LAYOUTRETURN | OPT | pNFS (REQ) | Section 18.44 | | LINK | OPT | | Section 18.9 | | LOCK | REQ | | Section 18.10 | | LOCKT | REQ | | Section 18.11 | @@ -122,9 +115,9 @@ Callback Operations | | MNI | or OPT) | | +-------------------------+-----------+-------------+---------------+ | CB_GETATTR | OPT | FDELG (REQ) | Section 20.1 | -P | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | +I | CB_LAYOUTRECALL | OPT | pNFS (REQ) | Section 20.3 | NS*| CB_NOTIFY | OPT | DDELG (REQ) | Section 20.4 | -P | CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | +NS*| CB_NOTIFY_DEVICEID | OPT | pNFS (OPT) | Section 20.12 | NS*| CB_NOTIFY_LOCK | OPT | | Section 20.11 | NS*| CB_PUSH_DELEG | OPT | FDELG (OPT) | Section 20.5 | | CB_RECALL | OPT | FDELG, | Section 20.2 | diff --git a/Documentation/filesystems/nfs/pnfs-block-server.txt b/Documentation/filesystems/nfs/pnfs-block-server.txt new file mode 100644 index 000000000000..2143673cf154 --- /dev/null +++ b/Documentation/filesystems/nfs/pnfs-block-server.txt @@ -0,0 +1,37 @@ +pNFS block layout server user guide + +The Linux NFS server now supports the pNFS block layout extension. In this +case the NFS server acts as Metadata Server (MDS) for pNFS, which in addition +to handling all the metadata access to the NFS export also hands out layouts +to the clients to directly access the underlying block devices that are +shared with the client. + +To use pNFS block layouts with with the Linux NFS server the exported file +system needs to support the pNFS block layouts (currently just XFS), and the +file system must sit on shared storage (typically iSCSI) that is accessible +to the clients in addition to the MDS. As of now the file system needs to +sit directly on the exported volume, striping or concatenation of +volumes on the MDS and clients is not supported yet. + +On the server, pNFS block volume support is automatically if the file system +support it. On the client make sure the kernel has the CONFIG_PNFS_BLOCK +option enabled, the blkmapd daemon from nfs-utils is running, and the +file system is mounted using the NFSv4.1 protocol version (mount -o vers=4.1). + +If the nfsd server needs to fence a non-responding client it calls +/sbin/nfsd-recall-failed with the first argument set to the IP address of +the client, and the second argument set to the device node without the /dev +prefix for the file system to be fenced. Below is an example file that shows +how to translate the device into a serial number from SCSI EVPD 0x80: + +cat > /sbin/nfsd-recall-failed << EOF +#!/bin/sh + +CLIENT="$1" +DEV="/dev/$2" +EVPD=`sg_inq --page=0x80 ${DEV} | \ + grep "Unit serial number:" | \ + awk -F ': ' '{print $2}'` + +echo "fencing client ${CLIENT} serial ${EVPD}" >> /var/log/pnfsd-fence.log +EOF diff --git a/Documentation/filesystems/nfs/pnfs.txt b/Documentation/filesystems/nfs/pnfs.txt index adc81a35fe2d..44a9f2493a88 100644 --- a/Documentation/filesystems/nfs/pnfs.txt +++ b/Documentation/filesystems/nfs/pnfs.txt @@ -57,15 +57,16 @@ bit is set, preventing any new lsegs from being added. layout drivers -------------- -PNFS utilizes what is called layout drivers. The STD defines 3 basic -layout types: "files" "objects" and "blocks". For each of these types -there is a layout-driver with a common function-vectors table which -are called by the nfs-client pnfs-core to implement the different layout -types. +PNFS utilizes what is called layout drivers. The STD defines 4 basic +layout types: "files", "objects", "blocks", and "flexfiles". For each +of these types there is a layout-driver with a common function-vectors +table which are called by the nfs-client pnfs-core to implement the +different layout types. -Files-layout-driver code is in: fs/nfs/nfs4filelayout.c && nfs4filelayoutdev.c +Files-layout-driver code is in: fs/nfs/filelayout/.. directory Objects-layout-deriver code is in: fs/nfs/objlayout/.. directory Blocks-layout-deriver code is in: fs/nfs/blocklayout/.. directory +Flexfiles-layout-driver code is in: fs/nfs/flexfilelayout/.. directory objects-layout setup -------------------- diff --git a/Documentation/filesystems/ocfs2.txt b/Documentation/filesystems/ocfs2.txt index 7618a287aa41..28f8c08201e2 100644 --- a/Documentation/filesystems/ocfs2.txt +++ b/Documentation/filesystems/ocfs2.txt @@ -100,3 +100,7 @@ coherency=full (*) Disallow concurrent O_DIRECT writes, cluster inode coherency=buffered Allow concurrent O_DIRECT writes without EX lock among nodes, which gains high performance at risk of getting stale data on other nodes. +journal_async_commit Commit block can be written to disk without waiting + for descriptor blocks. If enabled older kernels cannot + mount the device. This will enable 'journal_checksum' + internally. diff --git a/Documentation/filesystems/proc.txt b/Documentation/filesystems/proc.txt index aae9dd13c91f..cf8fc2f0b34b 100644 --- a/Documentation/filesystems/proc.txt +++ b/Documentation/filesystems/proc.txt @@ -28,7 +28,7 @@ Table of Contents 1.6 Parallel port info in /proc/parport 1.7 TTY info in /proc/tty 1.8 Miscellaneous kernel statistics in /proc/stat - 1.9 Ext4 file system parameters + 1.9 Ext4 file system parameters 2 Modifying System Parameters @@ -42,6 +42,7 @@ Table of Contents 3.6 /proc/<pid>/comm & /proc/<pid>/task/<tid>/comm 3.7 /proc/<pid>/task/<tid>/children - Information about task children 3.8 /proc/<pid>/fdinfo/<fd> - Information about opened file + 3.9 /proc/<pid>/map_files - Information about memory mapped files 4 Configuring procfs 4.1 Mount options @@ -1763,6 +1764,28 @@ pair provide additional information particular to the objects they represent. with TIMER_ABSTIME option which will be shown in 'settime flags', but 'it_value' still exhibits timer's remaining time. +3.9 /proc/<pid>/map_files - Information about memory mapped files +--------------------------------------------------------------------- +This directory contains symbolic links which represent memory mapped files +the process is maintaining. Example output: + + | lr-------- 1 root root 64 Jan 27 11:24 333c600000-333c620000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c81f000-333c820000 -> /usr/lib64/ld-2.18.so + | lr-------- 1 root root 64 Jan 27 11:24 333c820000-333c821000 -> /usr/lib64/ld-2.18.so + | ... + | lr-------- 1 root root 64 Jan 27 11:24 35d0421000-35d0422000 -> /usr/lib64/libselinux.so.1 + | lr-------- 1 root root 64 Jan 27 11:24 400000-41a000 -> /usr/bin/ls + +The name of a link represents the virtual memory bounds of a mapping, i.e. +vm_area_struct::vm_start-vm_area_struct::vm_end. + +The main purpose of the map_files is to retrieve a set of memory mapped +files in a fast way instead of parsing /proc/<pid>/maps or +/proc/<pid>/smaps, both of which contain many more records. At the same +time one can open(2) mappings from the listings of two processes and +comparing their inode numbers to figure out which anonymous memory areas +are actually shared. + ------------------------------------------------------------------------------ Configuring procfs ------------------------------------------------------------------------------ diff --git a/Documentation/filesystems/seq_file.txt b/Documentation/filesystems/seq_file.txt index b797ed38de46..9de4303201e1 100644 --- a/Documentation/filesystems/seq_file.txt +++ b/Documentation/filesystems/seq_file.txt @@ -194,16 +194,16 @@ which is in the string esc will be represented in octal form in the output. There are also a pair of functions for printing filenames: - int seq_path(struct seq_file *m, struct path *path, char *esc); - int seq_path_root(struct seq_file *m, struct path *path, - struct path *root, char *esc) + int seq_path(struct seq_file *m, const struct path *path, + const char *esc); + int seq_path_root(struct seq_file *m, const struct path *path, + const struct path *root, const char *esc) Here, path indicates the file of interest, and esc is a set of characters which should be escaped in the output. A call to seq_path() will output the path relative to the current process's filesystem root. If a different -root is desired, it can be used with seq_path_root(). Note that, if it -turns out that path cannot be reached from root, the value of root will be -changed in seq_file_root() to a root which *does* work. +root is desired, it can be used with seq_path_root(). If it turns out that +path cannot be reached from root, seq_path_root() returns SEQ_SKIP. A function producing complicated output may want to check bool seq_has_overflowed(struct seq_file *m); diff --git a/Documentation/filesystems/xfs.txt b/Documentation/filesystems/xfs.txt index 5be51fd888bd..0bfafe108357 100644 --- a/Documentation/filesystems/xfs.txt +++ b/Documentation/filesystems/xfs.txt @@ -287,9 +287,9 @@ The following sysctls are available for the XFS filesystem: XFS_ERRLEVEL_LOW: 1 XFS_ERRLEVEL_HIGH: 5 - fs.xfs.panic_mask (Min: 0 Default: 0 Max: 127) + fs.xfs.panic_mask (Min: 0 Default: 0 Max: 255) Causes certain error conditions to call BUG(). Value is a bitmask; - AND together the tags which represent errors which should cause panics: + OR together the tags which represent errors which should cause panics: XFS_NO_PTAG 0 XFS_PTAG_IFLUSH 0x00000001 @@ -299,6 +299,7 @@ The following sysctls are available for the XFS filesystem: XFS_PTAG_SHUTDOWN_CORRUPT 0x00000010 XFS_PTAG_SHUTDOWN_IOERROR 0x00000020 XFS_PTAG_SHUTDOWN_LOGERROR 0x00000040 + XFS_PTAG_FSBLOCK_ZERO 0x00000080 This option is intended for debugging only. @@ -348,16 +349,13 @@ The following sysctls are available for the XFS filesystem: Deprecated Sysctls ================== - fs.xfs.xfsbufd_centisecs (Min: 50 Default: 100 Max: 3000) - Dirty metadata is now tracked by the log subsystem and - flushing is driven by log space and idling demands. The - xfsbufd no longer exists, so this syctl does nothing. +None at present. - Due for removal in 3.14. - fs.xfs.age_buffer_centisecs (Min: 100 Default: 1500 Max: 720000) - Dirty metadata is now tracked by the log subsystem and - flushing is driven by log space and idling demands. The - xfsbufd no longer exists, so this syctl does nothing. +Removed Sysctls +=============== - Due for removal in 3.14. + Name Removed + ---- ------- + fs.xfs.xfsbufd_centisec v3.20 + fs.xfs.age_buffer_centisecs v3.20 |