path: root/fs
Commit message  Author  Age  Files  Lines
* ocfs2: reserve inline space for extended attribute  Tiger Yang  2008-10-14  4  -9/+62
    Add the structures and helper functions we want for handling inline extended attributes. We also update the inline-data handlers so that they function properly when inline data and inline attributes share an inode block.
    Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: Add extent tree operation for xattr value btrees  Tao Ma  2008-10-14  13  -64/+569
    Add some thin wrappers around ocfs2_insert_extent() for each of the 3 different btree types: ocfs2_dinode_insert_extent(), ocfs2_xattr_value_insert_extent() and ocfs2_xattr_tree_insert_extent(). The last is for the xattr index btree, which will be used in a followup patch. All the old callers in file.c etc. will call ocfs2_dinode_insert_extent(), while the other two handle the xattr cases. Initialization of the extent tree is also handled by these functions.
    When storing an xattr value that is too large to keep inline, we allocate some clusters for it, and ocfs2_extent_list and ocfs2_extent_rec are used here as well. In order to re-use the b-tree operation code, a new parameter named "private" is added to ocfs2_extent_tree; it indicates the root of the ocfs2_extent_list. The reason is that we can no longer deduce the root from the buffer_head: it may be in an inode, an ocfs2_xattr_block or, even worse, anywhere in an ocfs2_xattr_bucket.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: Add helper function in uptodate.c for removing xattr clusters  Tao Ma  2008-10-13  2  -6/+29
    The old uptodate code only handles removing one buffer_head from an ocfs2 inode's buffer cache. With xattr clusters, we may need to remove multiple buffer_heads at a time.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: Add the basic xattr disk layout in ocfs2_fs.h  Tao Ma  2008-10-13  1  -0/+118
    Ocfs2 uses a very flexible structure for storing extended attributes on disk. A small number of attributes is stored directly in the inode block, up to 256 bytes' worth. If that fills up, attributes are also stored in an external block, linked to from the inode block. That block can in turn expand to a btree, capable of storing large numbers of attributes.
    Individual attribute values are stored inline if they're small enough (currently about 80 bytes, though this can be changed), and otherwise are expanded to a btree. The theoretical limit to the size of an individual attribute is about the same as an inode, though the kernel's upper bound on the size of an attribute's data is far smaller.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
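The placement rules described in this commit message can be sketched in a few lines of C. This is only an illustration under stated assumptions: the two limits are copied from the prose (256 bytes of inline attribute space, roughly 80 bytes per inline value), and every `sketch_` name is hypothetical rather than taken from ocfs2_fs.h.

```c
#include <stddef.h>

/* Hypothetical limits taken from the prose above, not from ocfs2_fs.h. */
#define SKETCH_INODE_XATTR_BYTES 256 /* inline attribute area in the inode block */
#define SKETCH_INLINE_VALUE_MAX   80 /* "about 80 bytes" per the commit message */

enum sketch_value_store { SKETCH_VALUE_INLINE, SKETCH_VALUE_BTREE };
enum sketch_entry_store { SKETCH_ENTRY_INODE, SKETCH_ENTRY_EXTERNAL };

/* Small values stay next to the entry; larger ones get their own btree. */
enum sketch_value_store sketch_value_placement(size_t value_len)
{
	return value_len <= SKETCH_INLINE_VALUE_MAX ? SKETCH_VALUE_INLINE
						    : SKETCH_VALUE_BTREE;
}

/* An entry goes into the inode block while the inline area still has room. */
enum sketch_entry_store sketch_entry_placement(size_t used, size_t entry_len)
{
	if (used <= SKETCH_INODE_XATTR_BYTES &&
	    entry_len <= SKETCH_INODE_XATTR_BYTES - used)
		return SKETCH_ENTRY_INODE;
	return SKETCH_ENTRY_EXTERNAL;
}
```

For example, `sketch_value_placement(81)` selects the btree path, while `sketch_entry_placement(200, 56)` still fits in the inode block.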
* ocfs2: Make high level btree extend code generic  Tao Ma  2008-10-13  7  -135/+176
    Factor out the non-inode specifics of ocfs2_do_extend_allocation() into a more generic function, ocfs2_do_cluster_allocation(). ocfs2_do_extend_allocation() now calls ocfs2_do_cluster_allocation(), but the latter can be used for other btree types as well.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: Abstract ocfs2_extent_tree in b-tree operations.  Tao Ma  2008-10-13  8  -288/+456
    In the old extent tree operations, we assumed that the ocfs2_extent_list in ocfs2_dinode was the tree root. As xattrs will also use ocfs2_extent_list to store a large value for an xattr entry, we refactor the tree operations so that the xattr code can use them directly.
    The refactoring includes 4 steps:
    1. Abstract set/get of last_eb_blk and update_clusters, since they may be stored in different locations for dinode and xattr.
    2. Add a new structure named ocfs2_extent_tree to indicate the extent tree the operation will work on.
    3. Remove all uses of fe_bh and di; use root_bh and root_el in the extent tree instead. So now every fe_bh is replaced with et->root_bh, and el with root_el accordingly.
    4. Make ocfs2_lock_allocators generic. Until now it could only be used for file extend allocation, but the whole function is useful when we want to store large EAs.
    Note: This patch doesn't touch ocfs2_commit_truncate() since it is not used for anything other than truncating inode data btrees.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
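The first two refactoring steps amount to hiding the root's field layout behind a small table of accessors. The following userspace sketch shows that general pattern; it is not the kernel's actual ocfs2_extent_tree definition, and all `sketch_` types, fields and functions are hypothetical.

```c
#include <stdint.h>

struct sketch_extent_tree;

/* Per-root accessors (step 1 of the refactoring). */
struct sketch_et_ops {
	void (*set_last_eb_blk)(struct sketch_extent_tree *et, uint64_t blkno);
	uint64_t (*get_last_eb_blk)(struct sketch_extent_tree *et);
	void (*update_clusters)(struct sketch_extent_tree *et, uint32_t added);
};

/* One handle describing the tree being operated on (step 2). */
struct sketch_extent_tree {
	const struct sketch_et_ops *ops;
	void *root; /* stands in for root_bh/root_el */
};

/* Two hypothetical roots that keep the same data in different places. */
struct sketch_dinode { uint32_t clusters; uint64_t last_eb_blk; };
struct sketch_xattr_value { uint64_t last_eb_blk; uint32_t clusters; };

static void di_set_last(struct sketch_extent_tree *et, uint64_t blkno)
{ ((struct sketch_dinode *)et->root)->last_eb_blk = blkno; }
static uint64_t di_get_last(struct sketch_extent_tree *et)
{ return ((struct sketch_dinode *)et->root)->last_eb_blk; }
static void di_update(struct sketch_extent_tree *et, uint32_t added)
{ ((struct sketch_dinode *)et->root)->clusters += added; }

static void xv_set_last(struct sketch_extent_tree *et, uint64_t blkno)
{ ((struct sketch_xattr_value *)et->root)->last_eb_blk = blkno; }
static uint64_t xv_get_last(struct sketch_extent_tree *et)
{ return ((struct sketch_xattr_value *)et->root)->last_eb_blk; }
static void xv_update(struct sketch_extent_tree *et, uint32_t added)
{ ((struct sketch_xattr_value *)et->root)->clusters += added; }

const struct sketch_et_ops sketch_dinode_ops = { di_set_last, di_get_last, di_update };
const struct sketch_et_ops sketch_xattr_value_ops = { xv_set_last, xv_get_last, xv_update };

/* Generic code only talks to the handle, never to a concrete root type. */
void sketch_extend(struct sketch_extent_tree *et, uint64_t new_last, uint32_t added)
{
	et->ops->set_last_eb_blk(et, new_last);
	et->ops->update_clusters(et, added);
}
```

A caller would build a `struct sketch_extent_tree` with either ops table and pass it to `sketch_extend()`; the generic function needs no knowledge of which kind of root it was given.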
* ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode.  Tao Ma  2008-10-13  8  -20/+40
    ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and ocfs2_reserve_new_metadata() are all useful for extent tree operations. But they are all limited to an inode btree because they use a struct ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list (the part of an ocfs2_dinode they actually use) so that the xattr btree code can use these functions.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: Modify ocfs2_num_free_extents for future xattr usage.  Tao Ma  2008-10-13  6  -11/+15
    ocfs2_num_free_extents() is used to find the number of free extent records in an inode btree. Hence, it takes an "ocfs2_dinode" parameter. We want to use this for extended attribute trees in the future, so genericize the interface to take a buffer head. A future patch will allow that buffer_head to contain any structure rooting an ocfs2 btree.
    Signed-off-by: Tao Ma <tao.ma@oracle.com>
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: track local alloc state via debugfs  Mark Fasheh  2008-10-13  2  -0/+92
    A per-mount debugfs file, "local_alloc", is created which, when read, exposes live state of the node's local alloc file. Performance impact is minimal: only a bit of memory overhead per mount point. Still, the code is hidden behind CONFIG_OCFS2_FS_STATS. This feature will help us debug local alloc performance problems on a live system.
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: throttle back local alloc when low on disk space  Mark Fasheh  2008-10-13  6  -31/+230
    Ocfs2's local allocator disables itself for the duration of a mount point when it has trouble allocating a large enough area from the primary bitmap. That can cause performance problems, especially for disks which were only temporarily full or fragmented. This patch allows the allocator to shrink its window first, before being disabled. Later, it can also be re-enabled so that any performance drop is minimized.
    To do this, we allow the value of osb->local_alloc_bits to be shrunk when needed. The default value is recorded in a mostly read-only variable so that we can re-initialize when required.
    Locking had to be updated so that we could protect changes to local_alloc_bits. Mostly this involves protecting various local alloc values with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which is used when the local allocator has shrunk, but is not disabled. If the available space dips below 1 megabyte, the local alloc file is disabled. In either case, local alloc is re-enabled 30 seconds after the event, or when an appropriate amount of bits is seen in the primary bitmap.
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
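The enabled / throttled / disabled behaviour described above can be pictured as a small state machine. The sketch below is a userspace illustration only: the halving step and the `min_bits` floor (standing in for the 1 megabyte limit) are assumptions, the 30-second timer and the locking are left out, and all `sketch_` names are hypothetical.

```c
enum sketch_la_state {
	SKETCH_LA_ENABLED,
	SKETCH_LA_THROTTLED, /* window shrunk, allocator still in use */
	SKETCH_LA_DISABLED
};

struct sketch_la {
	enum sketch_la_state state;
	unsigned int default_bits; /* recorded once, used to re-initialize */
	unsigned int bits;         /* current window size */
	unsigned int min_bits;     /* assumed floor standing in for 1 megabyte */
};

/* Called when a window of the current size could not be allocated. */
void sketch_la_alloc_failed(struct sketch_la *la)
{
	unsigned int halved = la->bits / 2; /* halving is an assumption */

	if (la->state == SKETCH_LA_DISABLED)
		return;
	if (halved < la->min_bits) {
		la->state = SKETCH_LA_DISABLED;
		return;
	}
	la->bits = halved;
	la->state = SKETCH_LA_THROTTLED;
}

/* Called after the timeout, or once enough free bits are seen again. */
void sketch_la_reenable(struct sketch_la *la)
{
	la->bits = la->default_bits;
	la->state = SKETCH_LA_ENABLED;
}
```

Starting from a 2048-bit window with a 512-bit floor, two failures leave the allocator throttled at 512 bits, a third disables it, and `sketch_la_reenable()` restores the default window.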
* ocfs2: Track local alloc bits internally  Mark Fasheh  2008-10-13  3  -26/+26
    Do this instead of tracking absolute local alloc size. This avoids needless re-calculation of bits from bytes in localalloc.c. Additionally, the value is now in a more natural unit for internal file system bitmap work.
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* ocfs2: POSIX file locks support  Mark Fasheh  2008-10-13  8  -2/+154
    This is actually pretty easy since fs/dlm already handles the bulk of the work. The Ocfs2 userspace cluster stack module already uses fs/dlm as the underlying lock manager, so I only had to add the right calls.
    Cluster-aware POSIX locks ("plocks") can be turned off by the same means as UNIX locks: mount with 'noflocks', or create a local-only Ocfs2 volume. Internally, the file system uses two sets of file_operations, depending on whether cluster-aware plocks are required. This turns out to be easier than implementing local-only versions of ->lock.
    Signed-off-by: Mark Fasheh <mfasheh@suse.com>
* vfs: Use const for kernel parser table  Steven Whitehouse  2008-10-13  29  -34/+34
    This is a much better version of a previous patch to make the parser tables constant. Rather than changing the typedef, we put the "const" in all the various places where it's required, allowing the __initconst exception for nfsroot which was the cause of the previous trouble.
    This was posted for review some time ago and I believe it's been in -mm since then.
    Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
    Cc: Alexander Viro <aviro@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc  Linus Torvalds  2008-10-13  10  -51/+25
|\
    * 'proc' of git://git.kernel.org/pub/scm/linux/kernel/git/adobriyan/proc:
      proc: remove kernel.maps_protect
      proc: remove now unneeded ADDBUF macro
      [PATCH] proc: show personality via /proc/pid/personality
      [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()
      proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig
      proc: make grab_header() static
      proc: remove unused get_dma_list()
      proc: remove dummy vmcore_open()
      proc: proc_sys_root tweak
      proc: fix return value of proc_reg_open() in "too late" case
    Fixed up trivial conflict in removed file arch/sparc/include/asm/dma_32.h
| * proc: remove kernel.maps_protect  Alexey Dobriyan  2008-10-10  4  -25/+1
      After commit 831830b5a2b5d413407adf380ef62fe17d6fcbf2, aka "restrict reading from /proc/<pid>/maps to those who share ->mm or can ptrace", the sysctl stopped being relevant because that commit moved security checks from ->show time to ->start time (mm_for_maps()).
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Acked-by: Kees Cook <kees.cook@canonical.com>
| * proc: remove now unneeded ADDBUF macro  Alexey Dobriyan  2008-10-10  1  -5/+0
      After local seq_file conversion it was forgotten.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * [PATCH] proc: show personality via /proc/pid/personality  Kees Cook  2008-10-10  1  -0/+9
      Make process personality flags visible in /proc. Since a process's personality is potentially sensitive (e.g. READ_IMPLIES_EXEC), make this file only readable by the process owner.
      Signed-off-by: Kees Cook <kees.cook@canonical.com>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * [PATCH] signal, procfs: some lock_task_sighand() users do not need rcu_read_lock()  Lai Jiangshan  2008-10-10  2  -10/+1
      lock_task_sighand() makes sure task->sighand is protected, so we do not need rcu_read_lock(). [ exec() will get task->sighand->siglock before changing task->sighand! ]
      Code using rcu_read_lock() _just_ to protect lock_task_sighand() appears only in procfs (and some code in procfs uses lock_task_sighand() without such redundant protection).
      Other subsystems may put lock_task_sighand() inside an rcu_read_lock() critical region, but there rcu_read_lock() is used to protect "for_each_process()", "find_task_by_vpid()" etc., not lock_task_sighand().
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      [ok from Oleg]
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: move PROC_PAGE_MONITOR to fs/proc/Kconfig  Alexey Dobriyan  2008-10-10  1  -0/+10
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: make grab_header() static  Adrian Bunk  2008-10-10  1  -1/+1
      Signed-off-by: Adrian Bunk <bunk@kernel.org>
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: remove unused get_dma_list()  Alexey Dobriyan  2008-10-10  1  -1/+0
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: remove dummy vmcore_open()  Alexey Dobriyan  2008-10-10  1  -6/+0
      Empty ->open is equivalent to always succeeding ->open.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: proc_sys_root tweak  Alexey Dobriyan  2008-10-10  1  -2/+2
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
| * proc: fix return value of proc_reg_open() in "too late" case  Alexey Dobriyan  2008-10-10  1  -1/+1
      If ->open() wasn't called, returning 0 is misleading and, theoretically, oopsable:
      1) remove_proc_entry clears ->proc_fops, drops lock,
      2) ->open "succeeds",
      3) ->release oopses, because it assumes ->open was called (single_release()).
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
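The hazard in the three steps above is that a wrapper reports success for an ->open that never ran. A minimal userspace sketch of the safer shape follows; the `sketch_` types are hypothetical stand-ins for the procfs structures, and the choice of -EINVAL as the error code is an assumption, since the commit message does not name the value returned.

```c
#include <errno.h>
#include <stddef.h>

struct sketch_file { int opened; };

struct sketch_fops {
	int (*open)(struct sketch_file *f);
	int (*release)(struct sketch_file *f);
};

/* Stand-in for a proc entry whose fops are cleared on removal. */
struct sketch_entry { const struct sketch_fops *fops; };

int sketch_reg_open(struct sketch_entry *e, struct sketch_file *f)
{
	/* "Too late" case: the entry was removed, so ->open is never called.
	 * Returning 0 here would let ->release run on an unopened file. */
	if (e->fops == NULL)
		return -EINVAL; /* error code is an assumption */

	/* An absent ->open is equivalent to one that always succeeds. */
	if (e->fops->open == NULL)
		return 0;

	return e->fops->open(f);
}
```

With this shape, a caller that sees a nonzero return never goes on to call ->release, which is the property the fix relies on.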
* | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6  Linus Torvalds  2008-10-13  1  -2/+1
|\ \
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6: (24 commits)
      integrity: special fs magic
      As pointed out by Jonathan Corbet, the timer must be deleted before
      ERROR: code indent should use tabs where possible
      The tpm_dev_release function is only called for platform devices, not pnp
      Protect tpm_chip_list when traversing it.
      Renames num_open to is_open, as only one process can open the file at a time.
      Remove the BKL calls from the TPM driver, which were added in the overall
      netlabel: Add configuration support for local labeling
      cipso: Add support for native local labeling and fixup mapping names
      netlabel: Changes to the NetLabel security attributes to allow LSMs to pass full contexts
      selinux: Cache NetLabel secattrs in the socket's security struct
      selinux: Set socket NetLabel based on connection endpoint
      netlabel: Add functionality to set the security attributes of a packet
      netlabel: Add network address selectors to the NetLabel/LSM domain mapping
      netlabel: Add a generic way to create ordered linked lists of network addrs
      netlabel: Replace protocol/NetLabel linking with reference counts
      smack: Fix missing calls to netlbl_skbuff_err()
      selinux: Fix missing calls to netlbl_skbuff_err()
      selinux: Fix a problem in security_netlbl_sid_to_secattr()
      selinux: Better local/forward check in selinux_ip_postroute()
      ...
| * | integrity: special fs magic  Mimi Zohar  2008-10-13  1  -2/+1
      Discussion on the mailing list questioned the use of these magic values in userspace, concluding these values are already exported to userspace via statfs and their correct/incorrect usage is left up to the userspace application.
      - Move special fs magic number definitions to magic.h
      - Add magic.h include
      Signed-off-by: Mimi Zohar <zohar@us.ibm.com>
      Reviewed-by: James Morris <jmorris@namei.org>
      Signed-off-by: James Morris <jmorris@namei.org>
* | | Merge git://git.infradead.org/users/dwmw2/random-2.6  Linus Torvalds  2008-10-13  2  -4/+4
|\ \ \
    * git://git.infradead.org/users/dwmw2/random-2.6:
      Fix autoloading of MacBook Pro backlight driver.
      Automatic MODULE_ALIAS() for DMI match tables.
      Remove asm/a.out.h files for all architectures without a.out support.
      Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT
      Remove redundant CONFIG_ARCH_SUPPORTS_AOUT
      S390: Update comments about why we don't use <asm-generic/statfs.h>
      SPARC: Use <asm-generic/statfs.h>
      PowerPC: Use <asm-generic/statfs.h>
      PARISC: Use <asm-generic/statfs.h>
      x86_64: Use <asm-generic/statfs.h>
      IA64: Use <asm-generic/statfs.h>
      ARM: Use <asm-generic/statfs.h>
      Make <asm-generic/statfs.h> suitable for 64-bit platforms.
      Define and use PCI_DEVICE_ID_MARVELL_88ALP01_CCIC for CAFÉ camera driver
      [MTD] [NAND] Define and use PCI_DEVICE_ID_MARVELL_88ALP01_NAND for CAFÉ
      Use PCI_DEVICE_ID_88ALP01 for CAFÉ chip, rather than PCI_DEVICE_ID_CAFE.
      EFS: Don't set f_fsid in statfs().
| * \ \ Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6  David Woodhouse  2008-10-13  118  -4574/+4835
| |\ \ \
      Conflicts: include/asm-x86/statfs.h
| * | | | Introduce HAVE_AOUT symbol to remove hard-coded arch list for BINFMT_AOUT  David Woodhouse  2008-09-06  1  -1/+4
      HAVE_AOUT doesn't quite do the same thing as the recently removed ARCH_SUPPORTS_AOUT config option. That was set even on platforms where binfmt_aout isn't supported, although it's not entirely clear why.
      So it's best just to introduce a new symbol, handled consistently with other similar HAVE_xxx symbols; with a simple 'select' in the arch Kconfig.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | Remove redundant CONFIG_ARCH_SUPPORTS_AOUT  David Woodhouse  2008-09-06  1  -2/+1
      We don't need this any more; arguably we never really did.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
| * | | | EFS: Don't set f_fsid in statfs().  David Woodhouse  2008-09-03  1  -2/+0
      We don't have any suitable value to put in f_fsid. Using EFS_MAGIC really isn't a good idea, because all EFS file systems will have the same f_fsid then.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
* | | | | Simplify devpts_pty_kill  Sukadev Bhattiprolu  2008-10-13  1  -17/+12
    When creating a new pty, save the pty's inode in the tty->driver_data. Use this inode in pty_kill() to identify the devpts instance. Since we now have the inode for the pty, we can skip get_node() lookup and remove the unused get_node().
    TODO:
    - check if the mutex_lock is needed in pty_kill().
    Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Simplify devpts_pty_new()  Sukadev Bhattiprolu  2008-10-13  1  -3/+8
    devpts_pty_new() is called when setting up a new pty and will not have an existing dentry or inode for the pty. So don't bother looking for an existing dentry; just create a new one.
    Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Simplify devpts_get_tty()  Sukadev Bhattiprolu  2008-10-13  1  -12/+5
    As pointed out by H. Peter Anvin, since the inode for the pty is known, we don't need to look it up.
    Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | Add an instance parameter devpts interfaces  Sukadev Bhattiprolu  2008-10-13  1  -5/+6
    Pass-in 'inode' or 'tty' parameter to devpts interfaces. With multiple devpts instances, these parameters will be used in subsequent patches to identify the instance of devpts mounted. The parameters also help simplify devpts implementation.
    Changelog[v3]:
    - minor changes due to merge with ttydev updates
    - rename parameters to emphasize they are ptmx or pts inodes
    - pass-in tty_struct * to devpts_pty_kill() (this will help cleanup the get_node() call in a subsequent patch)
    Signed-off-by: Sukadev Bhattiprolu <sukadev@us.ibm.com>
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | tty: Redo current tty locking  Alan Cox  2008-10-13  1  -2/+0
    Currently it is sometimes locked by the tty mutex and sometimes by the sighand lock. The latter is in fact correct, and now that we can hand back referenced objects we can fix this up without problems around sleeping functions.
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | tty: the vhangup syscall is racy  Alan Cox  2008-10-13  1  -2/+1
    We now have the infrastructure to sort this out, but rather than teaching the syscall tty lock rules we move the hard work into a tty helper.
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | tty: Make get_current_tty use a kref  Alan Cox  2008-10-13  1  -3/+3
| |/ / /
|/| | |
    We now return a kref covered tty reference. That ensures the tty structure doesn't go away when you have a return from get_current_tty. This is not enough to protect you from most of the resources being freed behind your back - yet.
    [Updated to include fixes for SELinux problems found by Andrew Morton and an s390 leak found while debugging the former]
    Signed-off-by: Alan Cox <alan@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
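Handing back a counted reference means the caller, not the lookup function, decides when the object may go away. The sketch below illustrates that contract with a plain integer counter; the kernel's kref is atomic and has its own API, and the `sketch_` names here are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for a tty structure with an embedded reference count. */
struct sketch_tty {
	int refcount; /* the kernel's kref is atomic; this plain int is not */
	int freed;    /* set instead of actually freeing, to keep the sketch simple */
};

struct sketch_tty *sketch_tty_get(struct sketch_tty *tty)
{
	if (tty != NULL)
		tty->refcount++;
	return tty;
}

void sketch_tty_put(struct sketch_tty *tty)
{
	if (tty != NULL && --tty->refcount == 0)
		tty->freed = 1;
}

/* The lookup hands back a counted reference; the caller must put it. */
struct sketch_tty *sketch_get_current_tty(struct sketch_tty *current_tty)
{
	return sketch_tty_get(current_tty);
}
```

Every successful `sketch_get_current_tty()` has to be paired with a `sketch_tty_put()`; a missing put is the kind of leak the bracketed note in the commit message mentions for s390.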
* | | | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4  Linus Torvalds  2008-10-13  8  -31/+122
|\ \ \ \
| |_|/ /
|/| | |
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
      ext4: fix kconfig typo and extra whitespace
      ext4: fix build failure without procfs
      ext4: add an option to control error handling on file data
      jbd2: don't dirty original metadata buffer on abort
      ext4: add checks for errors from jbd2
      jbd2: fix error handling for checkpoint io
      jbd2: abort when failed to log metadata buffers
| * | | ext4: fix kconfig typo and extra whitespace  Jan Engelhardt  2008-10-12  1  -2/+2
      Signed-off-by: Jan Engelhardt <jengelh@medozas.de>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: fix build failure without procfs  Alexander Beregalov  2008-10-12  1  -0/+2
      fs/ext4/super.c: In function 'ext4_fill_super':
      fs/ext4/super.c:2226: error: 'ext4_ui_proc_fops' undeclared (first use in this function)
      fs/ext4/super.c:2226: error: (Each undeclared identifier is reported only once
      fs/ext4/super.c:2226: error: for each function it appears in.)
      Signed-off-by: Alexander Beregalov <a.beregalov@gmail.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: add an option to control error handling on file data  Hidehiro Kawai  2008-10-11  3  -0/+20
      If the journal doesn't abort when it gets an IO error in file data blocks, the file data corruption will spread silently. Because most applications and commands do buffered writes without fsync(), they don't notice the IO error. That is scary for mission critical systems. On the other hand, if the journal aborts whenever it gets an IO error in file data blocks, the system will easily become inoperable. So this patch introduces a filesystem option to determine whether it aborts the journal or just calls printk() when it gets an IO error in file data.
      If you mount an ext4 fs with the data_err=abort option, it aborts on a file data write error. If you mount it with data_err=ignore, it doesn't abort and just calls printk(). data_err=ignore is the default.
      Here is the corresponding patch of the ext3 version: http://kerneltrap.org/mailarchive/linux-kernel/2008/9/9/3239374
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
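The policy described here is a two-valued option that defaults to "ignore". A small userspace sketch of that decision follows; it is not ext4's option parser, and the `sketch_` names are hypothetical. Only the two option strings come from the commit message.

```c
#include <string.h>

enum sketch_data_err {
	SKETCH_DATA_ERR_IGNORE, /* default: log the error and carry on */
	SKETCH_DATA_ERR_ABORT   /* abort the journal on a file data write error */
};

/* Returns 0 and sets *mode for a recognised option string, -1 otherwise. */
int sketch_parse_data_err(const char *opt, enum sketch_data_err *mode)
{
	if (strcmp(opt, "data_err=abort") == 0) {
		*mode = SKETCH_DATA_ERR_ABORT;
		return 0;
	}
	if (strcmp(opt, "data_err=ignore") == 0) {
		*mode = SKETCH_DATA_ERR_IGNORE;
		return 0;
	}
	return -1;
}

/* Returns 1 if an IO error in file data should abort the journal. */
int sketch_should_abort_on_data_error(enum sketch_data_err mode)
{
	return mode == SKETCH_DATA_ERR_ABORT;
}
```

A caller would start with `SKETCH_DATA_ERR_IGNORE`, apply any option found on the mount command line, and consult `sketch_should_abort_on_data_error()` when a data block write fails.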
| * | | jbd2: don't dirty original metadata buffer on abort  Hidehiro Kawai  2008-10-11  1  -1/+4
      Currently, original metadata buffers are dirtied when they are unfiled, whether the journal has aborted or not. Eventually these buffers will be written back to the filesystem by pdflush. This means some metadata buffers are written to the filesystem without journaling if the journal aborts. So if a journal abort and a system crash happen at the same time, the filesystem can end up in an inconsistent state. Additionally, replaying journaled metadata can partly overwrite the latest metadata on the filesystem, because if the journal gets aborted, journaled metadata is preserved and replayed during the next mount so as not to lose uncheckpointed metadata. This would also break the consistency of the filesystem.
      This patch prevents original metadata buffers from being dirtied on abort by clearing the BH_JBDDirty flag from those buffers. Thus, no metadata buffers are written to the filesystem without journaling.
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | ext4: add checks for errors from jbd2  Hidehiro Kawai  2008-10-11  2  -8/+27
      If the journal has aborted due to a checkpointing failure, we have to keep the contents of the journal space. Otherwise, the filesystem will lose uncheckpointed metadata completely and become inconsistent. To avoid this, we need to keep the needs_recovery flag if checkpointing has failed.
      With this patch, ext4_put_super() detects a checkpointing failure from the return value of journal_destroy(), then it invokes ext4_abort() to make the filesystem read only and keep the needs_recovery flag. Errors from jbd2_journal_flush() are also handled by this patch in some places.
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | jbd2: fix error handling for checkpoint io  Hidehiro Kawai  2008-10-11  3  -20/+64
      When a checkpointing IO fails, current JBD2 code doesn't check the error and continues journaling. This means the latest metadata can be lost from both the journal and the filesystem.
      This patch leaves the failed metadata blocks in the journal space and aborts journaling in the case of jbd2_log_do_checkpoint(). To achieve this, we need to do:
      1. don't remove the failed buffer from the checkpoint list in the case of __try_to_free_cp_buf(), because it may be released or overwritten by a later transaction
      2. jbd2_log_do_checkpoint() is the last chance, remove the failed buffer from the checkpoint list and abort the journal
      3. when checkpointing fails, don't update the journal super block, to prevent the journaled contents from being cleaned. For safety, don't update j_tail and j_tail_sequence either
      4. when checkpointing fails, notify this error to the ext4 layer so that ext4 doesn't clear the needs_recovery flag, otherwise the journaled contents are ignored and cleaned in the recovery phase
      5. if the recovery fails, keep the needs_recovery flag
      6. prevent jbd2_cleanup_journal_tail() from being called between __jbd2_journal_drop_transaction() and jbd2_journal_abort() (a possible race issue between jbd2_log_do_checkpoint()s called by jbd2_journal_flush() and __jbd2_log_wait_for_space())
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
| * | | jbd2: abort when failed to log metadata buffers  Hidehiro Kawai  2008-10-12  1  -0/+3
      If we failed to write metadata buffers to the journal space but succeeded in writing the commit record, stale data can be written back to the filesystem as metadata in the recovery phase.
      To avoid this, when we fail to write out metadata buffers, abort the journal before writing the commit record.
      We can also avoid this kind of corruption by using the journal checksum feature, because it can detect invalid metadata blocks in the journal and prevent them from being replayed. So we don't need to care about asynchronous commit record writeout with a checksum.
      Signed-off-by: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6  Linus Torvalds  2008-10-12  1  -2/+0
|\ \ \ \
    * git://git.kernel.org/pub/scm/linux/kernel/git/sfrench/cifs-2.6:
      [CIFS] cifs: remove pointless lock and unlock of GlobalMid_Lock in header_assemble
| * | | | [CIFS] cifs: remove pointless lock and unlock of GlobalMid_Lock in header_assemble  Jeff Layton  2008-10-12  1  -2/+0
| |/ / /
      We lock GlobalMid_Lock in header_assemble and then immediately unlock it again without doing anything. Not sure what this was intended to do, but remove it.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Steve French <sfrench@us.ibm.com>
* / / / provide generic_block_fiemap() only with BLOCK=y  Adrian Bunk  2008-10-12  1  -0/+4
|/ / /
    This fixes the following compile error with CONFIG_BLOCK=n caused by commit 68c9d702bb72f367f3b148963ec6cf5e07ff7f65 ("generic block based fiemap implementation"):
      CC fs/ioctl.o
      fs/ioctl.c: In function 'generic_block_fiemap':
      fs/ioctl.c:249: error: storage size of 'tmp' isn't known
      fs/ioctl.c:272: error: invalid application of 'sizeof' to incomplete type 'struct buffer_head'
      fs/ioctl.c:280: error: implicit declaration of function 'buffer_mapped'
      fs/ioctl.c:249: warning: unused variable 'tmp'
      make[2]: *** [fs/ioctl.o] Error 1
    Signed-off-by: Adrian Bunk <bunk@kernel.org>
    Acked-by: Josef Bacik <jbacik@redhat.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4  Linus Torvalds  2008-10-11  41  -2475/+2209
|\ \ \
    * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (43 commits)
      ext4: Rename ext4dev to ext4
      ext4: Avoid double dirtying of super block in ext4_put_super()
      Update ext4 MAINTAINERS file
      Hook ext4 to the vfs fiemap interface.
      generic block based fiemap implementation
      ocfs2: fiemap support
      vfs: vfs-level fiemap interface
      ext4: fix xattr deadlock
      jbd2: Fix buffer head leak when writing the commit block
      ext4: Add debugging markers that can be used by systemtap
      jbd2: abort instead of waiting for nonexistent transaction
      ext4: fix initialization of UNINIT bitmap blocks
      ext4: Remove old legacy block allocator
      ext4: Use readahead when reading an inode from the inode table
      ext4: Improve the documentation for ext4's /proc tunables
      ext4: Combine proc file handling into a single set of functions
      ext4: move /proc setup and teardown out of mballoc.c
      ext4: Don't use 'struct dentry' for internal lookups
      ext4/jbd2: Avoid WARN() messages when failing to write to the superblock
      ext4: use percpu data structures for lg_prealloc_list
      ...