summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* core: Fix user return notifier on fork()Avi Kivity2009-11-292-0/+9
| | | | | | | | | | | | | | | | | fork() clones all thread_info flags, including TIF_USER_RETURN_NOTIFY; if the new task is first scheduled on a cpu which doesn't have user return notifiers set, this causes user return notifiers to trigger without any way of clearing itself. This is easy to trigger with a forky workload on the host in parallel with kvm, resulting in a cpu in an endless loop on the verge of returning to userspace. Fix by dropping the TIF_USER_RETURN_NOTIFY immediately after fork. Signed-off-by: Avi Kivity <avi@redhat.com> LKML-Reference: <1259505288-16559-1-git-send-email-avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: Fix user return notifier put_cpu_var() invocationStephen Rothwell2009-11-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Today's linux-next build (x86_64 allmodconfig) failed like this: kernel/user-return-notifier.c: In function 'fire_user_return_notifiers': kernel/user-return-notifier.c:45: error: expected expression before ')' token Introduced by commit 7c68af6e32c73992bad24107311f3433c89016e2 ("core, x86: Add user return notifiers") from the tip and kvm trees but revealed by commit e0fdb0e050eae331046385643618f12452aa7e73 ("percpu: add __percpu for sparse") from the percpu tree. Before that percpu tree commit, "put_cpu_var()" would compile without error (even though it really needs a parameter). Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Avi Kivity <avi@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Christoph Lameter <cl@linux-foundation.org> LKML-Reference: <20091102161722.eea4358d.sfr@canb.auug.org.au> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* x86: Fix user return notifier buildAvi Kivity2009-10-251-0/+1
| | | | | | | | | When CONFIG_USER_RETURN_NOTIFIER is set, we need to link kernel/user-return-notifier.o. Signed-off-by: Avi Kivity <avi@redhat.com> LKML-Reference: <1256473485-23109-1-git-send-email-avi@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* core, x86: Add user return notifiersAvi Kivity2009-10-017-2/+109
| | | | | | | | | | | | | | Add a general per-cpu notifier that is called whenever the kernel is about to return to userspace. The notifier uses a thread_info flag and existing checks, so there is no impact on user return or context switch fast paths. This will be used initially to speed up KVM task switching by lazily updating MSRs. Signed-off-by: Avi Kivity <avi@redhat.com> LKML-Reference: <1253342422-13811-1-git-send-email-avi@redhat.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2009-10-0170-166/+166
|\ | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: ax25: Fix possible oops in ax25_make_new net: restore tx timestamping for accelerated vlans Phonet: fix mutex imbalance sit: fix off-by-one in ipip6_tunnel_get_prl net: Fix sock_wfree() race net: Make setsockopt() optlen be unsigned.
| * ax25: Fix possible oops in ax25_make_newJarek Poplawski2009-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | In ax25_make_new, if kmemdup of digipeat returns an error, there would be an oops in sk_free while calling sk_destruct, because sk_protinfo is NULL at the moment; move sk->sk_destruct initialization after this. BTW of reported-by: Bernard Pidoux F6BVP <f6bvp@free.fr> Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: restore tx timestamping for accelerated vlansEric Dumazet2009-10-011-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 9b22ea560957de1484e6b3e8538f7eef202e3596 ( net: fix packet socket delivery in rx irq handler ) We lost rx timestamping of packets received on accelerated vlans. Effect is that tcpdump on real dev can show strange timings, since it gets rx timestamps too late (ie at skb dequeueing time, not at skb queueing time) 14:47:26.986871 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 1 14:47:26.986786 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 1 14:47:27.986888 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 2 14:47:27.986781 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 2 14:47:28.986896 IP 192.168.20.110 > 192.168.20.141: icmp 64: echo request seq 3 14:47:28.986780 IP 192.168.20.141 > 192.168.20.110: icmp 64: echo reply seq 3 Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * Phonet: fix mutex imbalanceRémi Denis-Courmont2009-10-011-1/+0
| | | | | | | | | | | | | | | | | | From: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> port_mutex was unlocked twice. Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * sit: fix off-by-one in ipip6_tunnel_get_prlSascha Hlusiak2009-10-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | When requesting all prl entries (kprl.addr == INADDR_ANY) and there are more prl entries than there is space passed from userspace, the existing code would always copy cmax+1 entries, which is more than can be handled. This patch makes the kernel copy only exactly cmax entries. Signed-off-by: Sascha Hlusiak <contact@saschahlusiak.de> Acked-By: Fred L. Templin <Fred.L.Templin@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Fix sock_wfree() raceEric Dumazet2009-10-011-7/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 2b85a34e911bf483c27cfdd124aeb1605145dc80 (net: No more expensive sock_hold()/sock_put() on each tx) opens a window in sock_wfree() where another cpu might free the socket we are working on. A fix is to call sk->sk_write_space(sk) while still holding a reference on sk. Reported-by: Jike Song <albcamus@gmail.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: Make setsockopt() optlen be unsigned.David S. Miller2009-10-0167-153/+149
| | | | | | | | | | | | | | | | | | | | | | | | This provides safety against negative optlen at the type level instead of depending upon (sometimes non-trivial) checks against this sprinkled all over the the place, in each and every implementation. Based upon work done by Arjan van de Ven and feedback from Linus Torvalds. Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'sched-fixes-for-linus' of ↵Linus Torvalds2009-10-015-16/+85
|\ \ | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: sched_clock: Fix atomicity/continuity bug by using cmpxchg64() x86: Provide an alternative() based cmpxchg64()
| * | sched_clock: Fix atomicity/continuity bug by using cmpxchg64()Eric Dumazet2009-09-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit def0a9b2573 (sched_clock: Make it NMI safe) assumed cmpxchg() of 64bit values was available on X86_32. That is not so - and causes some subtle scheduler misbehavior due to incorrect timestamps off to up by ~4 seconds. Two symptoms are known right now: - interactivity problems seen by Arjan: up to 600 msecs latencies instead of the expected 20-40 msecs. These latencies are very visible on the desktop. - incorrect CPU stats: occasionally too high percentages in 'top', and crazy CPU usage stats. Reported-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: John Stultz <johnstul@us.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090930170754.0886ff2e@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | x86: Provide an alternative() based cmpxchg64()Arjan van de Ven2009-09-304-14/+83
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cmpxchg64() today generates, to quote Linus, "barf bag" code. cmpxchg64() is about to get used in the scheduler to fix a bug there, but it's a prerequisite that cmpxchg64() first be made non-sucking. This patch turns cmpxchg64() into an efficient implementation that uses the alternative() mechanism to just use the raw instruction on all modern systems. Note: the fallback is NOT smp safe, just like the current fallback is not SMP safe. (Interested parties with i486 based SMP systems are welcome to submit fix patches for that.) Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> [ fixed asm constraint bug ] Fixed-by: Eric Dumazet <eric.dumazet@gmail.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: John Stultz <johnstul@us.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <20090930170754.0886ff2e@infradead.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | Merge branch 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linusLinus Torvalds2009-09-3030-114/+1852
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'upstream' of git://ftp.linux-mips.org/pub/scm/upstream-linus: MIPS: Avoid spurious make includecheck message MIPS: VPE: Get rid of BKL. MIPS: VPE: Fix build after the credential changes a while ago. MIPS: Excite: Get rid of BKL. MIPS: Sibyte: Get rid of BKL. MIPS: BCM63xx: Add PCMCIA & Cardbus support. MIPS: MSP71xx: request_irq() failure ignored in msp_pcibios_config_access() MIPS: Decrease size of au1xxx_dbdma_pm_regs[][] MIPS: SMP: Inline arch_send_call_function_{single_ipi,ipi_mask} MIPS: SMP: Fix build. MIPS: MIPSxx SC: Avoid destructive invalidation on partial L2 cachelines. MIPS: Sibyte: Fix compilation error. MIPS: BCM1480: Re-apply patch lost due to bad resolution of merge conflict. MIPS: BCM63xx: Add serial driver for bcm63xx integrated UART. MIPS: Loongson2: Fix typo "enalbe" -> "enable" MIPS: SMTC: Remove duplicate structure field initialization MIPS: Remove duplicated #include MIPS: BCM63xx: Remove duplicated #include
| * | | MIPS: Avoid spurious make includecheck messageRalf Baechle2009-09-301-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arch/mips/include/asm/unaligned.h: linux/unaligned/generic.h is included more than once. Entirely legitimate but just noise. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: VPE: Get rid of BKL.Ralf Baechle2009-09-302-42/+50
| | | | | | | | | | | | | | | | Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: VPE: Fix build after the credential changes a while ago.Ralf Baechle2009-09-301-10/+23
| | | | | | | | | | | | | | | | Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Excite: Get rid of BKL.Ralf Baechle2009-09-301-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | It's not obvious what good it was supposed to do here anyway. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Sibyte: Get rid of BKL.Ralf Baechle2009-09-301-18/+15
| | | | | | | | | | | | | | | | Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: BCM63xx: Add PCMCIA & Cardbus support.Maxime Bizon2009-09-308-1/+763
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Reviewed-by: Wolfram Sang <w.sang@pengutronix.de> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: MSP71xx: request_irq() failure ignored in msp_pcibios_config_access()Roel Kluin2009-09-301-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Produce an error if request_irq() fails. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: "Ithamar R. Adema" <ithamar.adema@team-embedded.nl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Decrease size of au1xxx_dbdma_pm_regs[][]Roel Kluin2009-09-301-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are 16 individual channels (NUM_DBDMA_CHANS) to save/restore plus the global ddma block config (the +1). The last register in a channel can be skipped since it's read-only (at offset 0x18). Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Cc: Manuel Lauss <manuel.lauss@googlemail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: SMP: Inline arch_send_call_function_{single_ipi,ipi_mask}Ralf Baechle2009-09-302-15/+13
| | | | | | | | | | | | | | | | Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: SMP: Fix build.Ralf Baechle2009-09-302-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 48a048fed82a8e5fdd8618574f6d3de1a0d67a50 Author: Rusty Russell <rusty@rustcorp.com.au> Date: Thu Sep 24 09:34:44 2009 -0600 apparently only passed the "looks good" level of QA ;-) Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: MIPSxx SC: Avoid destructive invalidation on partial L2 cachelines.Kevin Cernekee2009-09-301-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This extends commit a8ca8b64e3fdfec17679cba0ca5ce6e3ffed092d to cover MIPSxx-style board cache code. Signed-off-by: Kevin Cernekee <cernekee@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Sibyte: Fix compilation error.Mark Mason2009-09-301-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Build error introduced by d4f587c67fc39e0030ddd718675e252e208da4d7. Signed-off-by: Mark Mason <mmason@upwardaccess.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: BCM1480: Re-apply patch lost due to bad resolution of merge conflict.Ralf Baechle2009-09-301-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch 14275ccdb1e4b487cca745aba994699c426a31ee and d5dedd4507d307eb3f35f21b6e16f336fdc0d82a are conflicting and the conflict was resolved badly in merge 92241940be501f798cb21db344bbb3d1ec3c4f1c resulting in the BCM1480 changes of 14275ccdb1e4b487cca745aba994699c426a31ee getting lost. Sort out the damage. Reported and initial patch by Mark Mason <mmason@upwardaccess.com>. Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: BCM63xx: Add serial driver for bcm63xx integrated UART.Maxime Bizon2009-09-308-1/+964
| | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Maxime Bizon <mbizon@freebox.fr> Acked-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Loongson2: Fix typo "enalbe" -> "enable"Uwe Kleine-König2009-09-301-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: Yanhua <yanh@lemote.com> Cc: Robert Richter <robert.richter@amd.com> Acked-by: Wu Zhangjin <wuzj@lemote.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: SMTC: Remove duplicate structure field initializationJulia Lawall2009-09-301-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The definition of the irq_ipi structure has two initializations of the flags field. This combines them. [Ralf: The issue was originally introduced by commit be4894196d79455f420dd7bb78be7dc73bec115c (linux-mips.org) rsp. 033890b084adfa367c544864451d7730552ce8bf (kernel.org). The original intention of the code was to initialize .flags with both flags ored together. The broken C code as actually implemented will be compiled by an equally broken gcc to use only the last initialization, that is IRQF_PERCPU which means this turned into an SMTC bug for 2.6.23 and newer.] The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r@ identifier I, s, fld; position p0,p; expression E; @@ struct I s =@p0 { ... .fld@p = E, ...}; @s@ identifier I, s, r.fld; position r.p0,p; expression E; @@ struct I s =@p0 { ... .fld@p = E, ...}; @script:python@ p0 << r.p0; fld << r.fld; ps << s.p; pr << r.p; @@ if int(ps[0].line)!=int(pr[0].line) or int(ps[0].column)!=int(pr[0].column): cocci.print_main(fld,p0) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: Remove duplicated #includeHuang Weiyi2009-09-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated #include in arch/mips/kernel/smp.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
| * | | MIPS: BCM63xx: Remove duplicated #includeHuang Weiyi2009-09-301-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove duplicated #include in arch/mips/bcm63xx/boards/board_bcm963xx.c. Signed-off-by: Huang Weiyi <weiyi.huang@gmail.com> Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2009-09-302-0/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ryusuke/nilfs2: nilfs2: fix missing initialization of i_dir_start_lookup member nilfs2: fix missing zero-fill initialization of btree node cache
| * | | | nilfs2: fix missing initialization of i_dir_start_lookup memberRyusuke Konishi2009-09-291-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The i_dir_start_lookup field in nilfs_inode_info objects should be cleared when the objects are allocated, but the the initialization was missing in case of reading from disk. This adds the initialization. Since the variable just gives a start page on directory lookups, the bug was nonfatal until now. Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
| * | | | nilfs2: fix missing zero-fill initialization of btree node cacheRyusuke Konishi2009-09-291-0/+1
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This will fix file system corruption which infrequently happens after mount. The problem was reported from users with the title "[NILFS users] Fail to mount NILFS." (Message-ID: <200908211918.34720.yuri@itinteg.net>), and so forth. I've also experienced the corruption multiple times on kernel 2.6.30 and 2.6.31. The problem turned out to be caused due to discordance between mapping->nrpages of a btree node cache and the actual number of pages hung on the cache; if the mapping->nrpages becomes zero even as it has pages, truncate_inode_pages() returns without doing anything. Usually this is harmless except it may cause page leak, but garbage collection fairly infrequently sees a stale page remained in the btree node cache of DAT (i.e. disk address translation file of nilfs), and induces the corruption. I identified a missing initialization in btree node caches was the root cause. This corrects the bug. I've tested this for kernel 2.6.30 and 2.6.31. Reported-by: Yuri Chislov <yuri@itinteg.net> Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp> Cc: stable <stable@kernel.org>
* | | | Merge branch 'for_linus' of ↵Linus Torvalds2009-09-3020-751/+1362
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: Fix time encoding with extra epoch bits ext4: Add a stub for mpage_da_data in the trace header jbd2: Use tracepoints for history file ext4: Use tracepoints for mb_history trace file ext4, jbd2: Drop unneeded printks at mount and unmount time ext4: Handle nested ext4_journal_start/stop calls without a journal ext4: Make sure ext4_dirty_inode() updates the inode in no journal mode ext4: Avoid updating the inode table bh twice in no journal mode ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes first ext4: async direct IO for holes and fallocate support ext4: Use end_io callback to avoid direct I/O fallback to buffered I/O ext4: Split uninitialized extents for direct I/O ext4: release reserved quota when block reservation for delalloc retry ext4: Adjust ext4_da_writepages() to write out larger contiguous chunks ext4: Fix hueristic which avoids group preallocation for closed files ext4: Use ext4_msg() for ext4_da_writepage() errors ext4: Update documentation about quota mount options
| * | | | ext4: Fix time encoding with extra epoch bitsTheodore Ts'o2009-09-301-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "Looking at ext4.h, I think the setting of extra time fields forgets to mask the epoch bits so the epoch part overwrites nsec part. The second change is only for coherency (2 -> EXT4_EPOCH_BITS)." Thanks to Damien Guibouret for pointing out this problem. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Add a stub for mpage_da_data in the trace headerJosh Stone2009-09-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The tracepoint ext4_da_write_pages has a struct mpage_da_data* parameter, but that struct is only defined in fs/ext4/ext4.h. This patch adds a forward declaration for that struct, so this tracepoint header can still be used by tools like SystemTap. This is a continuation of the fix in commit 3661d286. http://sourceware.org/bugzilla/show_bug.cgi?id=10703 Signed-off-by: Josh Stone <jistone@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | jbd2: Use tracepoints for history fileTheodore Ts'o2009-09-305-228/+130
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The /proc/fs/jbd2/<dev>/history was maintained manually; by using tracepoints, we can get all of the existing functionality of the /proc file plus extra capabilities thanks to the ftrace infrastructure. We save memory as a bonus. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Use tracepoints for mb_history trace fileTheodore Ts'o2009-09-306-348/+182
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The /proc/fs/ext4/<dev>/mb_history was maintained manually, and had a number of problems: it required a largish amount of memory to be allocated for each ext4 filesystem, and the s_mb_history_lock introduced a CPU contention problem. By ripping out the mb_history code and replacing it with ftrace tracepoints, and we get more functionality: timestamps, event filtering, the ability to correlate mballoc history with other ext4 tracepoints, etc. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4, jbd2: Drop unneeded printks at mount and unmount timeTheodore Ts'o2009-09-295-22/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a number of kernel printk's which are printed when an ext4 filesystem is mounted and unmounted. Disable them to economize space in the system logs. In addition, disabling the mballoc stats by default saves a number of unneeded atomic operations for every block allocation or deallocation. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Handle nested ext4_journal_start/stop calls without a journalCurt Wohlgemuth2009-09-293-13/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch fixes a problem with handling nested calls to ext4_journal_start/ext4_journal_stop, when there is no journal present. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Make sure ext4_dirty_inode() updates the inode in no journal modeCurt Wohlgemuth2009-09-291-15/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch a problem that ext4_dirty_inode() was not calling ext4_mark_inode_dirty() if the current_handle is not valid, which it is the case in no journal mode. It also removes a test for non-matching transaction which can never happen. Signed-off-by: Curt Wohlgemuth <curtw@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Avoid updating the inode table bh twice in no journal modeFrank Mayhar2009-09-291-21/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a cleanup of commit 91ac6f4. Since ext4_mark_inode_dirty() has already called ext4_mark_iloc_dirty(), which in turn calls ext4_do_update_inode(), it's not necessary to have ext4_write_inode() call ext4_do_update_inode() in no journal mode. Indeed, it would be duplicated work. Reviewed-by: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Frank Mayhar <fmayhar@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: EXT4_IOC_MOVE_EXT: Check for different original and donor inodes firstTheodore Ts'o2009-09-281-8/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the check to make sure the original and donor inodes are different earlier, to avoid a potential deadlock by trying to lock the same inode twice. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: async direct IO for holes and fallocate supportMingming Cao2009-09-285-41/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For async direct IO that covers holes or fallocate, the end_io callback function now queued the convertion work on workqueue but don't flush the work rightaway as it might take too long to afford. But when fsync is called after all the data is completed, user expects the metadata also being updated before fsync returns. Thus we need to flush the conversion work when fsync() is called. This patch keep track of a listed of completed async direct io that has a work queued on workqueue. When fsync() is called, it will go through the list and do the conversion. Signed-off-by: Mingming Cao <cmm@us.ibm.com>
| * | | | ext4: Use end_io callback to avoid direct I/O fallback to buffered I/OMingming Cao2009-09-283-1/+210
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently the DIO VFS code passes create = 0 when writing to the middle of file. It does this to avoid block allocation for holes, so as not to expose stale data out when there is a parallel buffered read (which does not hold the i_mutex lock). Direct I/O writes into holes falls back to buffered IO for this reason. Since preallocated extents are treated as holes when doing a get_block() look up (buffer is not mapped), direct IO over fallocate also falls back to buffered IO. Thus ext4 actually silently falls back to buffered IO in above two cases, which is undesirable. To fix this, this patch creates unitialized extents when a direct I/O write into holes in sparse files, and registering an end_io callback which converts the uninitialized extent to an initialized extent after the I/O is completed. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: Split uninitialized extents for direct I/OMingming Cao2009-09-286-42/+419
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When writing into an unitialized extent via direct I/O, and the direct I/O doesn't exactly cover the unitialized extent, split the extent into uninitialized and initialized extents before submitting the I/O. This avoids needing to deal with an ENOSPC error in the end_io callback that gets used for direct I/O. When the IO is complete, the written extent will be marked as initialized. Singed-Off-By: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | | ext4: release reserved quota when block reservation for delalloc retryMingming Cao2009-09-281-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ext4_da_reserve_space() can reserve quota blocks multiple times if ext4_claim_free_blocks() fail and we retry the allocation. We should release the quota reservation before restarting. Bug found by Jan Kara. Signed-off-by: Mingming Cao <cmm@us.ibm.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>