summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* bonding: remap muticast addresses without using dev_close() and dev_open()Moni Shoua2009-09-1510-6/+82
| | | | | | | | | | | | | | | | | | | | This patch fixes commit e36b9d16c6a6d0f59803b3ef04ff3c22c3844c10. The approach there is to call dev_close()/dev_open() whenever the device type is changed in order to remap the device IP multicast addresses to HW multicast addresses. This approach suffers from 2 drawbacks: *. It assumes tha the device is UP when calling dev_close(), or otherwise dev_close() has no affect. It is worth to mention that initscripts (Redhat) and sysconfig (Suse) doesn't act the same in this matter. *. dev_close() has other side affects, like deleting entries from the routing table, which might be unnecessary. The fix here is to directly remap the IP multicast addresses to HW multicast addresses for a bonding device that changes its type, and nothing else. Reported-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Signed-off-by: Moni Shoua <monis@voltaire.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* can: fix NOHZ local_softirq_pending 08 warningOliver Hartkopp2009-09-152-2/+4
| | | | | | | | | | | | | | | | | | | | When using nanosleep() in an userspace application we get a ratelimit warning NOHZ: local_softirq_pending 08 for 10 times. The echo of CAN frames is done from process context and softirq context only. Therefore the usage of netif_rx() was wrong (for years). This patch replaces netif_rx() with netif_rx_ni() which has to be used from process/softirq context. It also adds a missing comment that can_send() must no be used from hardirq context. Signed-off-by: Oliver Hartkopp <oliver@hartkopp.net> Signed-off-by: Urs Thuermann <urs@isnogud.escape.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: fix ssthresh u16 leftoverIlpo Järvinen2009-09-156-7/+15
| | | | | | | | | | | | | | It was once upon time so that snd_sthresh was a 16-bit quantity. ...That has not been true for long period of time. I run across some ancient compares which still seem to trust such legacy. Put all that magic into a single place, I hopefully found all of them. Compile tested, though linking of allyesconfig is ridiculous nowadays it seems. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi> Signed-off-by: David S. Miller <davem@davemloft.net>
* pkt_sched: Fix qdisc_graft WRT ingress qdiscJarek Poplawski2009-09-151-5/+10
| | | | | | | | After the recent mq change using ingress qdisc overwrites dev->qdisc; there is also a wrong old qdisc pointer passed to notify_and_destroy. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* igb: do not allow phy sw reset code to make calls to null pointersAlexander Duyck2009-09-151-1/+4
| | | | | | | | | | | In the case of fiber and serdes adapters we were seeing issues with ethtool -t causing kernel panics due to null function pointers. To prevent this we need to exit out of the phy reset code in the event that we do not have a valid phy. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* igb: reset sgmii phy at start of initAlexander Duyck2009-09-154-109/+95
| | | | | | | | | | | Our SGMII phy code was incomplete in that it was not actually placing the phy in SGMII mode and as a result the PHY was not able to establish a link when connected to a non serdes link partner. This patch updates the code to combine the SGMII/serdes PCS init and to add the necessary reset. Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ixgbe: Create separate media type for CX4 adaptersPeter P Waskiewicz Jr2009-09-153-3/+6
| | | | | | | | | | | | | | Currently the media type detection for CX4 adapters lumps them into a type of fiber. This causes some strange fallout when firmware verification is done on the NIC, and certain fiber NIC rules get enforced incorrectly. This patch introduces a new media type for CX4, and puts both 82598 and 82599 CX4 adapters into this bucket. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ixgbe: Add support for 82599-based CX4 adaptersPeter P Waskiewicz Jr2009-09-153-0/+6
| | | | | | | | | This patch adds support for CX4 adapters based on 82599. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ixgbe: Properly disable packet split per-ring when globally disabledPeter P Waskiewicz Jr2009-09-151-0/+2
| | | | | | | | | | | | | | | | The packet split feature was recently moved out of the adapter-wide flags feature field and into a per-Rx ring feature field. In the process, packet split isn't properly disabled in the Rx ring if the adapter has it globally disabled, followed by a device reset. This won't impact the driver today, since it's always in packet split mode. However, this will prevent any pitfalls if someone disables packet split on the adapter in the future and doesn't disable it in each ring. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: Don Skidmore <donald.c.skidmore@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* cdc-phonet: remove noisy debug statementRémi Denis-Courmont2009-09-151-1/+0
| | | | | | | From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Phonet: Netlink event for autoconfigured addressesRémi Denis-Courmont2009-09-151-1/+8
| | | | | | | From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: constify remaining proto_opsAlexey Dobriyan2009-09-155-8/+8
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: constify struct inet6_protocolAlexey Dobriyan2009-09-1517-33/+29
| | | | | Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: constify struct net_protocolAlexey Dobriyan2009-09-1514-32/+27
| | | | | | | Remove long removed "inet_protocol_base" declaration. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netdev: smc91x: drop Blackfin cruftMichael Hennerich2009-09-151-28/+0
| | | | | | | | | Now that all Blackfin boards are using the board resources, we don't need to keep the arch/board specific crap in the driver header. Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* net-next-2.6 [PATCH 1/1] dccp: ccids whitespace-cleanup / CodingStyleGerrit Renker2009-09-1511-67/+48
| | | | | | | | | | | | | | | No code change, cosmetical changes only: * whitespace cleanup via scripts/cleanfile, * remove self-references to filename at top of files, * fix coding style (extraneous brackets), * fix documentation style (kernel-doc-nano-HOWTO). Thanks are due to Ivo Augusto Calado who raised these issues by submitting good-quality patches. Signed-off-by: Gerrit Renker <gerrit@erg.abdn.ac.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* genetlink: fix netns vs. netlink table lockingJohannes Berg2009-09-153-23/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | Since my commits introducing netns awareness into genetlink we can get this problem: BUG: scheduling while atomic: modprobe/1178/0x00000002 2 locks held by modprobe/1178: #0: (genl_mutex){+.+.+.}, at: [<ffffffff8135ee1a>] genl_register_mc_grou #1: (rcu_read_lock){.+.+..}, at: [<ffffffff8135eeb5>] genl_register_mc_g Pid: 1178, comm: modprobe Not tainted 2.6.31-rc8-wl-34789-g95cb731-dirty # Call Trace: [<ffffffff8103e285>] __schedule_bug+0x85/0x90 [<ffffffff81403138>] schedule+0x108/0x588 [<ffffffff8135b131>] netlink_table_grab+0xa1/0xf0 [<ffffffff8135c3a7>] netlink_change_ngroups+0x47/0x100 [<ffffffff8135ef0f>] genl_register_mc_group+0x12f/0x290 because I overlooked that netlink_table_grab() will schedule, thinking it was just the rwlock. However, in the contention case, that isn't actually true. Fix this by letting the code grab the netlink table lock first and then the RCU for netns protection. Signed-off-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
* Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESSMark Smith2009-09-151-1/+1
| | | | | | | | | | | | Have atalk_route_packet() return NET_RX_SUCCESS not NET_XMIT_SUCCESS atalk_route_packet() returns NET_RX_DROP if it's call to aarp_send_ddp() returns NET_XMIT_DROP. If aarp_send_ddp() returns anything else atalk_route_packet() should return NET_RX_SUCCESS, not NET_XMIT_SUCCESS. Signed-off-by: Mark Smith <markzzzsmith@yahoo.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'osync_cleanup' of ↵Linus Torvalds2009-09-1418-380/+204
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6 * 'osync_cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6: fsync: wait for data writeout completion before calling ->fsync vfs: Remove generic_osync_inode() and sync_page_range{_nolock}() fat: Opencode sync_page_range_nolock() pohmelfs: Use new syncing helper xfs: Convert sync_page_range() to simple filemap_write_and_wait_range() ocfs2: Update syncing after splicing to match generic version ntfs: Use new syncing helpers and update comments ext4: Remove syncing logic from ext4_file_write ext3: Remove syncing logic from ext3_file_write ext2: Update comment about generic_osync_inode vfs: Introduce new helpers for syncing after writing to O_SYNC file or IS_SYNC inode vfs: Rename generic_file_aio_write_nolock ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolock pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolock vfs: Remove syncing from generic_file_direct_write() and generic_file_buffered_write() vfs: Export __generic_file_aio_write() and add some comments vfs: Introduce filemap_fdatawait_range
| * fsync: wait for data writeout completion before calling ->fsyncChristoph Hellwig2009-09-141-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currenly vfs_fsync(_range) first calls filemap_fdatawrite to write out the data, the calls into ->fsync to write out the metadata and then finally calls filemap_fdatawait to wait for the data I/O to complete. What sounds like a clever micro-optimization actually is nast trap for many filesystems. For many modern filesystems i_size or other inode information is only updated on I/O completion and we need to wait for I/O to finish before we can write out the metadata. For old fashionen filesystems that instanciate blocks during the actual write and also update the metadata at that point it opens up a large window were we could expose uninitialized blocks after a crash. While a few filesystems that need it already wait for the I/O to finish inside their ->fsync methods it is rather suboptimal as it is done under the i_mutex and also always for the whole file instead of just a part as we could do for O_SYNC handling. Here is a small audit of all fsync instances in the tree: - spufs_mfc_fsync: - ps3flash_fsync: - vol_cdev_fsync: - printer_fsync: - fb_deferred_io_fsync: - bad_file_fsync: - simple_sync_file: don't care - filesystems/drivers do't use the page cache or are purely in-memory. - simple_fsync: - file_fsync: - affs_file_fsync: - fat_file_fsync: - jfs_fsync: - ubifs_fsync: - reiserfs_dir_fsync: - reiserfs_sync_file: never touch pagecache themselves. We need to wait before if we do not want to expose stale data after an allocation. - afs_fsync: - fuse_fsync_common: do the waiting writeback itself in awkward ways, would benefit from proper semantics - block_fsync: Does a filemap_write_and_wait on the block device inode. Because we now have f_mapping that is the same inode we call it on in vfs_fsync. So just removing it and letting the VFS do the work in one go would be an improvement. - btrfs_sync_file: - cifs_fsync: - xfs_file_fsync: need the wait first and currently do it themselves. would benefit from doing it outside i_mutex. - coda_fsync: - ecryptfs_fsync: - exofs_file_fsync: - shm_fsync: only passes the fsync through to the lower layer - ext3_sync_file: doesn't seem to care, comments are confusing. - ext4_sync_file: would need the wait to work correctly for delalloc mode with late i_size updates. Otherwise the ext3 comment applies. currently implemens it's own writeback and wait in an odd way, could benefit from doing it properly. - gfs2_fsync: not needed for journaled data mode, but probably harmless there. Currently writes back data asynchronously itself. Needs some major audit. - hostfs_fsync: just calls fsync/datasync on the host FD. Without the wait before data might not even be inflight yet if we're unlucky. - hpfs_file_fsync: - ncp_fsync: no-ops. Dangerous before and after. - jffs2_fsync: just calls jffs2_flush_wbuf_gc, not sure how this relates to data. - nfs_fsync_dir: just increments stats, claims all directory operations are synchronous - nfs_file_fsync: only writes out data??? Looks very odd. - nilfs_sync_file: looks like it expects all data done, but not sure from the code - ntfs_dir_fsync: - ntfs_file_fsync: appear to do their own data writeback. Very convoluted code. - ocfs2_sync_file: does it's own data writeback, but no wait. probably needs the wait. - smb_fsync: according to a comment expects all pages written already, probably needs the wait before. This patch only changes vfs_fsync_range, removal of the wait in the methods that have it is left to the filesystem maintainers. Note that most filesystems really do need an audit for their fsync methods given the gems found in this very brief audit. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Remove generic_osync_inode() and sync_page_range{_nolock}()Jan Kara2009-09-144-127/+0
| | | | | | | | | | | | Remove these three functions since nobody uses them anymore. Signed-off-by: Jan Kara <jack@suse.cz>
| * fat: Opencode sync_page_range_nolock()Jan Kara2009-09-142-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | fat_cont_expand() is the only user of sync_page_range_nolock(). It's also the only user of generic_osync_inode() which does not have a file open. So opencode needed actions for FAT so that we can convert generic_osync_inode() to a standard syncing path. Update a comment about generic_osync_inode(). CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Jan Kara <jack@suse.cz>
| * pohmelfs: Use new syncing helperJan Kara2009-09-141-2/+2
| | | | | | | | | | | | | | Use new generic_write_sync() helper instead of sync_page_range(). Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Jan Kara <jack@suse.cz>
| * xfs: Convert sync_page_range() to simple filemap_write_and_wait_range()Jan Kara2009-09-141-1/+2
| | | | | | | | | | | | | | | | | | | | | | Christoph Hellwig says that it is enough for XFS to call filemap_write_and_wait_range() instead of sync_page_range() because we do all the metadata syncing when forcing the log. CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * ocfs2: Update syncing after splicing to match generic versionJan Kara2009-09-141-21/+6
| | | | | | | | | | | | | | | | | | | | | | | | Update ocfs2 specific splicing code to use generic syncing helper. The sync now does not happen under rw_lock because generic_write_sync() acquires i_mutex which ranks above rw_lock. That should not matter because standard fsync path does not hold it either. Acked-by: Joel Becker <Joel.Becker@oracle.com> Acked-by: Mark Fasheh <mfasheh@suse.com> CC: ocfs2-devel@oss.oracle.com Signed-off-by: Jan Kara <jack@suse.cz>
| * ntfs: Use new syncing helpers and update commentsJan Kara2009-09-142-19/+10
| | | | | | | | | | | | | | | | | | | | Use new syncing helpers in .write and .aio_write functions. Also remove superfluous syncing in ntfs_file_buffered_write() and update comments about generic_osync_inode(). CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net Signed-off-by: Jan Kara <jack@suse.cz>
| * ext4: Remove syncing logic from ext4_file_writeJan Kara2009-09-141-51/+2
| | | | | | | | | | | | | | | | | | The syncing is now properly handled by generic_file_aio_write() so no special ext4 code is needed. CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Signed-off-by: Jan Kara <jack@suse.cz>
| * ext3: Remove syncing logic from ext3_file_writeJan Kara2009-09-141-60/+1
| | | | | | | | | | | | | | | | Syncing is now properly done by generic_file_aio_write() so no special logic is needed in ext3. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
| * ext2: Update comment about generic_osync_inodeJan Kara2009-09-141-1/+1
| | | | | | | | | | | | | | We rely on generic_write_sync() now. CC: linux-ext4@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Introduce new helpers for syncing after writing to O_SYNC file or ↵Jan Kara2009-09-144-30/+61
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | IS_SYNC inode Introduce new function for generic inode syncing (vfs_fsync_range) and use it from fsync() path. Introduce also new helper for syncing after a sync write (generic_write_sync) using the generic function. Use these new helpers for syncing from generic VFS functions. This makes O_SYNC writes to block devices acquire i_mutex for syncing. If we really care about this, we can make block_fsync() drop the i_mutex and reacquire it before it returns. CC: Evgeniy Polyakov <zbr@ioremap.net> CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com CC: Anton Altaparmakov <aia21@cantab.net> CC: linux-ntfs-dev@lists.sourceforge.net CC: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> CC: linux-ext4@vger.kernel.org CC: tytso@mit.edu Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Rename generic_file_aio_write_nolockChristoph Hellwig2009-09-144-43/+33
| | | | | | | | | | | | | | | | | | | | generic_file_aio_write_nolock() is now used only by block devices and raw character device. Filesystems should use __generic_file_aio_write() in case generic_file_aio_write() doesn't suit them. So rename the function to blkdev_aio_write() and move it to fs/blockdev.c. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * ocfs2: Use __generic_file_aio_write instead of generic_file_aio_write_nolockJan Kara2009-09-141-10/+12
| | | | | | | | | | | | | | | | | | | | | | Use the new helper. We have to submit data pages ourselves in case of O_SYNC write because __generic_file_aio_write does not do it for us. OCFS2 developpers might think about moving the sync out of i_mutex which seems to be easily possible but that's out of scope of this patch. CC: ocfs2-devel@oss.oracle.com Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Jan Kara <jack@suse.cz>
| * pohmelfs: Use __generic_file_aio_write instead of generic_file_aio_write_nolockJan Kara2009-09-141-1/+1
| | | | | | | | | | | | | | | | Use new helper __generic_file_aio_write(). Since the fs takes care of syncing by itself afterwards, there are no more changes needed. CC: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Remove syncing from generic_file_direct_write() and ↵Jan Kara2009-09-141-29/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | generic_file_buffered_write() generic_file_direct_write() and generic_file_buffered_write() called generic_osync_inode() if it was called on O_SYNC file or IS_SYNC inode. But this is superfluous since generic_file_aio_write() does the syncing as well. Also XFS and OCFS2 which call these functions directly handle syncing themselves. So let's have a single place where syncing happens: generic_file_aio_write(). We slightly change the behavior by syncing only the range of file to which the write happened for buffered writes but that should be all that is required. CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> CC: Felix Blyakher <felixb@sgi.com> CC: xfs@oss.sgi.com Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Export __generic_file_aio_write() and add some commentsJan Kara2009-09-142-7/+52
| | | | | | | | | | | | | | | | | | | | | | | | Rename __generic_file_aio_write_nolock() to __generic_file_aio_write(), add comments to write helpers explaining how they should be used and export __generic_file_aio_write() since it will be used by some filesystems. CC: ocfs2-devel@oss.oracle.com CC: Joel Becker <joel.becker@oracle.com> Acked-by: Evgeniy Polyakov <zbr@ioremap.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
| * vfs: Introduce filemap_fdatawait_rangeJan Kara2009-09-142-0/+22
| | | | | | | | | | | | | | | | This simple helper saves some filesystems conversion from byte offset to page numbers and also makes the fdata* interface more complete. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz>
* | Merge branch 'master' of ↵Linus Torvalds2009-09-1421-806/+678
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw * 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/steve/gfs2-2.6-nmw: GFS2: Whitespace fixes GFS2: Remove unused sysfs file GFS2: Be extra careful about deallocating inodes GFS2: Remove no_formal_ino generating code GFS2: Rename eattr.[ch] as xattr.[ch] GFS2: Clean up of extended attribute support GFS2: Add explanation of extended attr on-disk format GFS2: Add "-o errors=panic|withdraw" mount options GFS2: jumping to wrong label? GFS2: free disk inode which is deleted by remote node -V2 GFS2: Add a document explaining GFS2's uevents GFS2: Add sysfs link to device GFS2: Replace assertion with proper error handling GFS2: Improve error handling in inode allocation GFS2: Add some more info to uevents GFS2: Add online uevent to GFS2
| * | GFS2: Whitespace fixesSteven Whitehouse2009-09-143-4/+4
| | | | | | | | | | | | | | | Reported-by: Daniel Walker <dwalker@fifo99.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Remove unused sysfs fileSteven Whitehouse2009-09-093-14/+1
| | | | | | | | | | | | | | | | | | | | | | | | The /sys/fs/gfs2/<fsname>/lock_module/id file has been unused for some time now, so we can remove it. We still accept the mount option though, as userspace still sends that. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Be extra careful about deallocating inodesSteven Whitehouse2009-09-084-35/+51
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is a potential race in the inode deallocation code if two nodes try to deallocate the same inode at the same time. Most of the issue is solved by the iopen locking. There is still a small window which is not covered by the iopen lock. This patches fixes that and also makes the deallocation code more robust in the face of any errors in the rgrp bitmaps, or erroneous iopen callbacks from other nodes. This does introduce one extra disk read, but that is generally not an issue since its the same block that must be written to later in the deallocation process. The total disk accesses therefore stay the same, Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Remove no_formal_ino generating codeSteven Whitehouse2009-08-275-190/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The inum structure used throughout GFS2 has two fields. One no_addr is the disk block number of the inode in question and is used everywhere as the inode number. The other, no_formal_ino, is used only as the generation number for NFS. Historically the no_formal_ino field was set using a complicated system of one global and one per-node file containing inode numbers in order to ensure that each no_formal_ino was unique. Also this code made no provision for what would happen when eventually the (64 bit) numbers ran out. Now I know that is pretty unlikely to happen given the large space of numbers, but it is possible nevertheless. The only guarantee required for no_formal_ino is that, for any single inode, the same number doesn't get reused too quickly. We already have a generation number which is kept in the inode and initialised from a counter in the resource group (almost no overhead, since we have to touch the resource group anyway in order to allocate an inode in the first place). Aside from ensuring that we never use the value 0 in the no_formal_ino field, we can use that counter directly. As a result of that change, we lose about 200 lines of code and also gain about 10 creates/sec on the postmark benchmark (on my test machine). Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Rename eattr.[ch] as xattr.[ch]Steven Whitehouse2009-08-267-6/+6
| | | | | | | | | | | | | | | | | | | | | Use the more conventional name for the extended attribute support code. Update all the places which care. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Clean up of extended attribute supportSteven Whitehouse2009-08-2611-526/+333
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This has been on my list for some time. We need to change the way in which we handle extended attributes to allow faster file creation times (by reducing the number of transactions required) and the extended attribute code is the main obstacle to this. In addition to that, the VFS provides a way to demultiplex the xattr calls which we ought to be using, rather than rolling our own. This patch changes the GFS2 code to use that VFS feature and as a result the code shrinks by a couple of hundred lines or so, and becomes easier to read. I'm planning on doing further clean up work in this area, but this patch is a good start. The cleaned up code also uses the more usual "xattr" shorthand, I plan to eliminate the use of "eattr" eventually and in the mean time it serves as a flag as to which bits of the code have been updated. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Add explanation of extended attr on-disk formatSteven Whitehouse2009-08-251-0/+22
| | | | | | | | | | | | | | | | | | | | | Some useful info regarding the on-disk representation of GFS2 extended attributes. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Add "-o errors=panic|withdraw" mount optionsBob Peterson2009-08-244-14/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds "-o errors=panic" and "-o errors=withdraw" to the gfs2 mount options. The "errors=withdraw" option is today's current behaviour, meaning to withdraw from the file system if a non-serious gfs2 error occurs. The new "errors=panic" option tells gfs2 to force a kernel panic if a non-serious gfs2 file system error occurs. This may be useful, for example, where fabric-level fencing is used that has no way to reboot (such as fence_scsi). Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: jumping to wrong label?Roel Kluin2009-08-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | Also a gfs2_glock_dq() is required here. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: free disk inode which is deleted by remote node -V2Wengang Wang2009-08-181-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | this patch is for the same problem that Benjamin Marzinski fixes at commit b94a170e96dc416828af9d350ae2e34b70ae7347 quotation of the original problem: ---cut here--- When a file is deleted from a gfs2 filesystem on one node, a dcache entry for it may still exist on other nodes in the cluster. If this happens, gfs2 will be unable to free this file on disk. Because of this, it's possible to have a gfs2 filesystem with no files on it and no free space. With this patch, when a node receives a callback notifying it that the file is being deleted on another node, it schedules a new workqueue thread to remove the file's dcache entry. ---end cut--- after applying Benjamin's patch, I think there is still a case in which the disk inode remains even when "no space" is hit. the case is that when running d_prune_aliases() against the inode, there are one or more dentries(aliases) which have reference count number > 0. in this case the dentries won't be pruned. and even later, the reference count becomes to 0, the dentries can still be cached in memory. unfortunately, no callback come again, things come back to the state before the callback runs. thus the on disk inode remains there until in memoryinode is removed for some other reason(shrinking inode cache or unmount the volume..). this patch is to remove those dentries when their reference count becomes to 0 and the inode is deleted by remote node. for implementation, gfs2_dentry_delete() is added as dentry_operations.d_delete. the function returns true when the inode is deleted by remote node. in dput(), gfs2_dentry_delete() is called and since it returns true, the dentry is unhashed from dcache and then removed. when all dentries are removed, the in memory inode get removed so that the on disk inode is freed. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Add a document explaining GFS2's ueventsSteven Whitehouse2009-08-171-0/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | This will be essential reading for anybody who wants to understand how GFS2 interacts with the userland gfs_controld, and the details of recovery. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com>
| * | GFS2: Add sysfs link to deviceSteven Whitehouse2009-08-171-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | This adds a link from the per-gfs2 sb sysfs directory to the block device upon which the filesystem is mounted. The link is called "device", strangely enough :-) Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
| * | GFS2: Replace assertion with proper error handlingSteven Whitehouse2009-08-171-1/+3
| | | | | | | | | | | | | | | | | | | | | One fewer assert, one more place we can recover gracefully if there is an error. Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>