Commit message · Author · Age · Files · Lines
* TCP: tcp_hybla: Fix integer overflow in slow start increment · Daniele Lacamera · 2010-06-02 · 1 · -2/+2
    For large values of rtt, the 2^rho operation may overflow u32. Clamp the increment to 2^16.

    Signed-off-by: Daniele Lacamera <root@danielinux.net>
    Signed-off-by: David S. Miller <davem@davemloft.net>
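Below is a minimal C sketch of the clamp. It assumes a hypothetical helper, hybla_pow2(), and is not the tcp_hybla source. A large rho overflows a u32, and in C a shift of a 32-bit value by 32 or more bits is also undefined behaviour. The sketch therefore caps the exponent at 16 before shifting.

```c
#include <stdint.h>

/* Hypothetical helper (not the tcp_hybla source): compute 2^rho for the
 * slow-start increment, capping the exponent at 16 so the result is at
 * most 2^16. Without the cap, a large rho would overflow a u32, and a
 * shift by 32 or more bits would be undefined behaviour. */
static uint32_t hybla_pow2(uint32_t rho)
{
    uint32_t exp = rho < 16u ? rho : 16u;

    return (uint32_t)1 << exp;
}
```

Any rho of 16 or more yields exactly 65536, so later arithmetic on the increment starts from a bounded value.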
* act_nat: fix the wrong checksum when addr isn't in old_addr/mask · Changli Gao · 2010-06-02 · 1 · -0/+4
    For TCP and UDP packets, when addr isn't in old_addr/mask we don't do SNAT or DNAT, so we should not update the layer 4 checksum either.

    Signed-off-by: Changli Gao <xiaosuo@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
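The rule can be sketched in C as follows. This is an illustration, not the act_nat code:

- The helper names (csum16(), csum_update16(), nat_rewrite()) are hypothetical.
- Values are 16-bit words in host order.
- The incremental update follows RFC 1624, HC' = ~(~HC + ~m + m').

The checksum is touched only on the branch that actually rewrites the address.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of n 16-bit words, folded and inverted. */
static uint16_t csum16(const uint16_t *words, size_t n)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += words[i];
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one changed word. */
static uint16_t csum_update16(uint16_t check, uint16_t old_w, uint16_t new_w)
{
    uint32_t sum = (uint16_t)~check;

    sum += (uint16_t)~old_w;
    sum += new_w;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Hypothetical NAT step: rewrite *addr only if it lies in old_addr/mask,
 * and patch the checksum only in that case. Returns 1 if rewritten. */
static int nat_rewrite(uint32_t *addr, uint16_t *check,
                       uint32_t old_addr, uint32_t new_addr, uint32_t mask)
{
    uint32_t next;

    if ((*addr & mask) != (old_addr & mask))
        return 0;       /* no SNAT/DNAT: leave the L4 checksum alone */

    next = (new_addr & mask) | (*addr & ~mask);
    *check = csum_update16(*check, (uint16_t)(*addr >> 16), (uint16_t)(next >> 16));
    *check = csum_update16(*check, (uint16_t)(*addr & 0xffff), (uint16_t)(next & 0xffff));
    *addr = next;
    return 1;
}
```

After a rewrite, recomputing the checksum over the new words gives the same value as the incremental update. A non-matching address leaves both the address and the checksum untouched.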
* net/fec: fix pm to survive to suspend/resume · Eric Bénard · 2010-06-02 · 1 · -8/+8
    * In the current driver, calling fec_stop and fec_enet_init does not leave a working network interface after resume (an ifconfig down followed by an ifconfig up is required to recover the interface).
    * Using fec_enet_close and fec_enet_open instead solves this problem and also handles the case where the link changed between suspend and resume.
    * This patch also disables the clock at suspend and re-enables it at resume.

    Signed-off-by: Eric Bénard <eric@eukrea.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* korina: count RX DMA OVR as rx_fifo_error · Phil Sutter · 2010-06-02 · 1 · -1/+1
    This way, RX DMA overruns (actually caused by an overrun of the 512-byte input FIFO) show up in ifconfig output. The rx_fifo_errors counter is otherwise unused.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* korina: use netdev_alloc_skb_ip_align() here, too · Phil Sutter · 2010-06-02 · 1 · -2/+1
    This patch completes commit 89d71a66c40d629e3b1285def543ab1425558cd5, which apparently missed this spot.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* korina: fix deadlock on RX FIFO overrun · Phil Sutter · 2010-06-02 · 1 · -14/+13
    By calling korina_restart(), the IRQ handler tries to disable the interrupt it is currently serving. This leads to a deadlock, since disable_irq() waits for any running IRQ handlers to finish before returning.

    This patch addresses the issue by turning korina_restart() into a workqueue task, which is then scheduled when needed.

    The deadlock is easily reproduced by, for example, using GNU netcat to send large amounts of UDP data to the host running this driver.

    Note that the same problem (and fix) applies to TX FIFO underruns, but apparently these are harder to trigger.

    Signed-off-by: Phil Sutter <phil@nwl.cc>
    Signed-off-by: David S. Miller <davem@davemloft.net>
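The deferral pattern can be modelled in single-threaded C. This is a hypothetical sketch, not the kernel workqueue API. The handler only marks the restart as pending. A separate worker step runs it later, in a context where waiting for the handler is safe.

```c
#include <stdbool.h>

/* Hypothetical deferred-work item: the handler only marks it pending;
 * a worker runs the function later, outside the handler. */
struct deferred_work {
    bool pending;
    void (*fn)(void *ctx);
    void *ctx;
};

static void schedule_deferred(struct deferred_work *w)
{
    w->pending = true;          /* scheduling twice still means one run */
}

/* Worker step: runs the item if pending; returns whether it ran. */
static bool run_deferred(struct deferred_work *w)
{
    if (!w->pending)
        return false;
    w->pending = false;
    w->fn(w->ctx);
    return true;
}

struct nic {
    int restarts;
    struct deferred_work restart_work;
};

static void nic_restart(void *ctx)
{
    struct nic *n = ctx;

    n->restarts++;              /* stands in for the real reset sequence */
}

/* Interrupt path: must not wait for itself, so it only defers. */
static void nic_irq_fifo_overrun(struct nic *n)
{
    schedule_deferred(&n->restart_work);
}
```

Scheduling twice before the worker runs still yields a single restart. This mirrors how schedule_work() ignores an already-queued item.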
* net: fix conflict between null_or_orig and null_or_bond · John Fastabend · 2010-06-02 · 1 · -4/+4
    If an skb is received on an inactive bond slave and does not meet the special cases checked for by skb_bond_should_drop, it should only be delivered to exact matches, as the comment in netif_receive_skb() says. However, because null_or_bond could also be NULL, this is not always true.

    This patch renames null_or_bond to orig_or_bond and initializes it to orig_dev. This keeps the intent of null_or_bond (passing frames received on VLAN interfaces stacked on bonding interfaces) without invalidating the statement for null_or_orig.

    Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
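Below is a simplified C model of the match test applied to each protocol handler. It makes two assumptions:

- A NULL handler device means a wildcard handler.
- As in the 2.6.35 receive path, null_or_orig is set to orig_dev on an inactive slave to request exact matches only.

handler_matches() and the stripped-down struct net_device are hypothetical.

```c
#include <stddef.h>

/* Stripped-down stand-in for the kernel structure. */
struct net_device {
    const char *name;
};

/* Hypothetical, simplified model of the per-handler match test.
 * handler_dev == NULL marks a wildcard handler that wants packets
 * from every device. */
static int handler_matches(const struct net_device *handler_dev,
                           const struct net_device *null_or_orig,
                           const struct net_device *skb_dev,
                           const struct net_device *orig_dev,
                           const struct net_device *orig_or_bond)
{
    return handler_dev == null_or_orig ||
           handler_dev == skb_dev ||
           handler_dev == orig_dev ||
           handler_dev == orig_or_bond;
}
```

Passing NULL as the last argument reproduces the old null_or_bond behaviour, where a wildcard handler slipped through. Initializing the value to orig_dev closes that hole.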
* net: init_vlan should not copy slave or master flags · John Fastabend · 2010-06-02 · 1 · -1/+2
    The VLAN device should not copy the slave or master flags from the real device. It is not in the bond until it is added, nor is it a master.

    Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
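Below is a userspace sketch of the masking. Its flag values are copied from Linux's <linux/if.h>, and vlan_inherit_flags() is hypothetical. This log does not show the full set of bits init_vlan clears, so the sketch drops only the two bits named in the subject.

```c
/* Flag values as defined in Linux's <linux/if.h>. */
#define IFF_BROADCAST 0x2
#define IFF_RUNNING   0x40
#define IFF_MASTER    0x400
#define IFF_SLAVE     0x800
#define IFF_MULTICAST 0x1000

/* Hypothetical helper: inherit the real device's flags, but never the
 * bonding roles. A new VLAN device is neither a master nor (until it
 * is enslaved) a slave. */
static unsigned int vlan_inherit_flags(unsigned int real_flags)
{
    return real_flags & ~(unsigned int)(IFF_MASTER | IFF_SLAVE);
}
```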
* enic: bug fix: make the set/get netlink VF_PORT support symmetrical · Scott Feldman · 2010-06-02 · 2 · -103/+104
    To make get/set of netlink VF_PORT truly symmetrical, we need to keep track of which items are set and return only those items on get. Previously, the driver did not differentiate between, for example, setting an attr with a NULL string and not setting the attr at all. We only want to return the NULL string if the attr was actually set with a NULL string; otherwise, the attr is not returned.

    Signed-off-by: Scott Feldman <scofeldm@cisco.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
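One way to get that symmetry is to record, per attribute, whether it was set at all. The sketch below is hypothetical; the attribute names and struct are not the enic driver's. A bitmask separates "set to NULL" from "never set", and get reports only attributes whose bit is present.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical attribute set for one VF port. */
enum port_attr { ATTR_PROFILE, ATTR_INSTANCE_UUID, ATTR_HOST_UUID, ATTR_MAX };

struct port_state {
    unsigned int set_mask;          /* bit i => attribute i was set */
    const char *value[ATTR_MAX];    /* NULL is a legal, set value */
};

static void port_set(struct port_state *ps, enum port_attr a, const char *v)
{
    ps->value[a] = v;
    ps->set_mask |= 1u << a;
}

/* Returns 1 and stores the value if the attribute should be reported
 * on get; returns 0 if it was never set and must be omitted. */
static int port_get(const struct port_state *ps, enum port_attr a,
                    const char **out)
{
    if (!(ps->set_mask & (1u << a)))
        return 0;
    *out = ps->value[a];
    return 1;
}
```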
* bnx2: Fix hang during rmmod bnx2. · Michael Chan · 2010-06-02 · 1 · -1/+13
    The regression is caused by:

        commit 4327ba435a56ada13eedf3eb332e583c7a0586a9
        bnx2: Fix netpoll crash.

    If ->open() and ->close() are called multiple times, the same napi structs will be added to dev->napi_list multiple times, corrupting dev->napi_list. This causes free_netdev() to hang during rmmod.

    We fix this by calling netif_napi_del() during ->close().

    Also, bnx2_init_napi() must not be in the __devinit section, since it is called by ->open().

    Signed-off-by: Michael Chan <mchan@broadcom.com>
    Signed-off-by: Benjamin Li <benli@broadcom.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
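Why a double add hangs can be shown with a small circular doubly linked list. It is written in the style of the kernel's list_head but re-implemented here, not taken from <linux/list.h>, and all names are hypothetical. Adding the same node twice makes it point to itself, so a traversal never returns to the head. Deleting the node on close, as the fix does with netif_napi_del(), keeps the list sane.

```c
/* Minimal circular doubly linked list in the style of list_head. */
struct list_node {
    struct list_node *next, *prev;
};

static void list_init(struct list_node *h)
{
    h->next = h->prev = h;
}

static void list_add_tail_node(struct list_node *n, struct list_node *h)
{
    n->prev = h->prev;
    n->next = h;
    h->prev->next = n;
    h->prev = n;
}

static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->next = n->prev = n;
}

/* Counts entries, giving up after `limit` steps so a corrupted (cyclic)
 * list returns -1 instead of hanging the caller. */
static int list_count(const struct list_node *h, int limit)
{
    int n = 0;

    for (const struct list_node *p = h->next; p != h; p = p->next)
        if (++n > limit)
            return -1;
    return n;
}
```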
* xfrm: force a dst reference in __xfrm_route_forward() · Eric Dumazet · 2010-06-02 · 1 · -0/+1
    Packets going through __xfrm_route_forward() have a non-refcounted dst entry, since we enabled a noref forwarding path. xfrm_lookup() might incorrectly release this dst entry.

    It's a bit late to make invasive changes in xfrm_lookup(), so let's force a refcount in this path.

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/isdn/hardware/mISDN: Use GFP_ATOMIC when a lock is held · Julia Lawall · 2010-06-01 · 1 · -2/+2
    The function inittiger is only called from nj_init_card, where a lock is held.

    The semantic patch that makes this change is as follows (http://coccinelle.lip6.fr/):

        // <smpl>
        @gfp exists@
        identifier fn;
        position p;
        @@

        fn(...) {
        ... when != spin_unlock_irqrestore
            when any
          GFP_KERNEL@p
         ... when any
        }

        @locked@
        identifier gfp.fn;
        @@

        spin_lock_irqsave(...)
        ...  when != spin_unlock_irqrestore
        fn(...)

        @depends on locked@
        position gfp.p;
        @@

        - GFP_KERNEL@p
        + GFP_ATOMIC
        // </smpl>

    Signed-off-by: Julia Lawall <julia@diku.dk>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ksz884x: Add missing validate_addr hook · Denis Kirjanov · 2010-06-01 · 1 · -0/+1
    Add missing validate_addr hook

    Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* ksz884x: convert to netdev_tx_t · Denis Kirjanov · 2010-06-01 · 1 · -1/+1
    Convert TX hook to netdev_tx_t type

    Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* virtio-net: pass gfp to add_buf · Michael S. Tsirkin · 2010-06-01 · 1 · -4/+4
    virtio-net bounces buffer allocations off to a thread if it can't allocate buffers from the atomic pool. However, if posting those buffers still requires atomic allocations, this is unlikely to succeed.

    Fix this by passing in the proper gfp_t parameter.

    Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
    Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* be2net: convert hdr.timeout in be_cmd_loopback_test() to le32 · Sathya Perla · 2010-06-01 · 1 · -1/+1
    The current code fails on ppc because hdr.timeout is not converted to le32.

    Signed-off-by: Sathya Perla <sathyap@serverengines.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
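Below is a portable stand-in for cpu_to_le32(). It assumes, as the fix implies, that the adapter reads this header field as little-endian; put_le32() is a hypothetical helper. On a little-endian host the stored bytes match the in-memory value anyway, which is why the missing conversion only showed up on big-endian ppc.

```c
#include <stdint.h>

/* Hypothetical helper: store v in little-endian byte order regardless
 * of host endianness (what cpu_to_le32() achieves for a field that is
 * then copied to the device). */
static void put_le32(uint8_t out[4], uint32_t v)
{
    out[0] = (uint8_t)(v & 0xff);
    out[1] = (uint8_t)((v >> 8) & 0xff);
    out[2] = (uint8_t)((v >> 16) & 0xff);
    out[3] = (uint8_t)(v >> 24);
}
```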
* can: mpc5xxx_can.c: Fix build failure · Anatolij Gustschin · 2010-06-01 · 1 · -5/+5
    Fixes a build error caused by the OF device_node pointer being moved into struct device.

    Signed-off-by: Anatolij Gustschin <agust@denx.de>
    Cc: Wolfgang Grandegger <wg@grandegger.com>
    Cc: Grant Likely <grant.likely@secretlab.ca>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* net/ipv4/tcp_input.c: fix compilation breakage when FASTRETRANS_DEBUG > 1 · Joe Perches · 2010-06-01 · 1 · -2/+2
    Commit c720c7e8383aff1cb219bddf474ed89d850336e3 missed these.

    Signed-off-by: Joe Perches <joe@perches.com>
    Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-2.6 · David S. Miller · 2010-06-01 · 4 · -17/+6
|\
| * netfilter: xtables: stackptr should be percpu · Eric Dumazet · 2010-05-31 · 4 · -13/+6
    commit f3c5c1bfd4 (netfilter: xtables: make ip_tables reentrant) introduced a performance regression, because the stackptr array is shared by all CPUs, causing cache line ping-pong (16 CPUs share a 64-byte cache line).

    Fix this using alloc_percpu().

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Acked-By: Jan Engelhardt <jengelh@medozas.de>
    Signed-off-by: Patrick McHardy <kaber@trash.net>
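The layout difference can be sketched with C11 _Alignas, assuming 64-byte cache lines and a hypothetical 16-CPU array as in the text. This only illustrates the false-sharing arithmetic. The actual fix uses alloc_percpu(), which gives each CPU its own copy in that CPU's per-CPU area.

```c
#include <stdint.h>

#define CACHE_LINE   64  /* assumption: 64-byte cache lines */
#define NR_CPUS_DEMO 16  /* hypothetical CPU count */

/* Shared layout: 16 x 4-byte slots = 64 bytes, i.e. one cache line
 * when aligned, so every CPU's write invalidates the line for all. */
struct shared_stackptr {
    uint32_t ptr[NR_CPUS_DEMO];
};

/* Padded layout: each slot is aligned to, and therefore fills, a whole
 * cache line, so CPUs no longer write to the same line. */
struct padded_slot {
    _Alignas(CACHE_LINE) uint32_t ptr;
};

struct padded_stackptr {
    struct padded_slot slot[NR_CPUS_DEMO];
};
```

The padding trades memory (1 KiB instead of 64 bytes here) for the absence of cross-CPU invalidations. Per-CPU allocation avoids that waste while giving the same separation.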
| * netfilter: don't xt_jumpstack_alloc twice in xt_register_table · Xiaotian Feng · 2010-05-31 · 1 · -4/+0
    In xt_register_table, xt_jumpstack_alloc is called first, and later xt_replace_table is used. But xt_replace_table calls xt_jumpstack_alloc again, so the memory allocated by the first xt_jumpstack_alloc is leaked.

    We can simply remove the first xt_jumpstack_alloc, because there are no users of newinfo between xt_jumpstack_alloc and xt_replace_table.

    Signed-off-by: Xiaotian Feng <dfeng@redhat.com>
    Cc: Patrick McHardy <kaber@trash.net>
    Cc: "David S. Miller" <davem@davemloft.net>
    Cc: Jan Engelhardt <jengelh@medozas.de>
    Cc: Andrew Morton <akpm@linux-foundation.org>
    Cc: Rusty Russell <rusty@rustcorp.com.au>
    Cc: Alexey Dobriyan <adobriyan@gmail.com>
    Acked-By: Jan Engelhardt <jengelh@medozas.de>
    Signed-off-by: Patrick McHardy <kaber@trash.net>
* | net: sock_queue_err_skb() dont mess with sk_forward_alloc · Eric Dumazet · 2010-06-01 · 4 · -24/+33
    Correct sk_forward_alloc handling for the error queue would need a backlog of frames that the softirq handler could not deliver because the socket is owned by a user thread, or an extension of backlog processing so that it can process both normal and error packets.

    Another possibility is not to use a memory charge for the error queue; this is what this patch implements.

    Note: this reverts commit 29030374 (net: fix sk_forward_alloc corruptions), since we don't need to lock the socket anymore.

    Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | greth: Fix build after OF device conversions. · David S. Miller · 2010-05-31 · 1 · -6/+5
    Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of /home/davem/src/GIT/linux-2.6/ · David S. Miller · 2010-05-31 · 3919 · -121402/+289099
|\|
| * Linux 2.6.35-rc1 (tag: v2.6.35-rc1) · Linus Torvalds · 2010-05-30 · 1 · -2/+2
    .. and thus endeth the merge window.
| * Merge branch 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6 · Linus Torvalds · 2010-05-30 · 2 · -29/+15
| |\
    * 'slub/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/penberg/slab-2.6:
      SLUB: Allow full duplication of kmalloc array for 390
      slub: move kmem_cache_node into it's own cacheline
| | * SLUB: Allow full duplication of kmalloc array for 390 · Christoph Lameter · 2010-05-30 · 1 · -1/+1
    Commit 756dee75872a2a764b478e18076360b8a4ec9045 ("SLUB: Get rid of dynamic DMA kmalloc cache allocation") makes S390 run out of kmalloc caches. Increase the number of kmalloc caches to a safe size.

    Cc: <stable@kernel.org> [ .33 and .34 ]
    Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Tested-by: Heiko Carstens <heiko.carstens@de.ibm.com>
    Signed-off-by: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
| | * slub: move kmem_cache_node into it's own cacheline · Alexander Duyck · 2010-05-24 · 2 · -28/+14
    This patch is meant to improve the performance of SLUB by moving the local kmem_cache_node lock into its own cacheline, separate from kmem_cache. This is accomplished by simply removing the local_node when NUMA is enabled.

    On my system with 2 nodes I saw around a 5% performance increase, with hackbench times dropping from 6.2 seconds to 5.9 seconds on average. I suspect the performance gain would increase as the number of nodes increases, but I do not currently have the data to back that up.

    Bugzilla-Reference: http://bugzilla.kernel.org/show_bug.cgi?id=15713
    Cc: <stable@kernel.org>
    Reported-by: Alex Shi <alex.shi@intel.com>
    Tested-by: Alex Shi <alex.shi@intel.com>
    Acked-by: Yanmin Zhang <yanmin_zhang@linux.intel.com>
    Acked-by: Christoph Lameter <cl@linux-foundation.org>
    Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
    Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
| * | Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip · Linus Torvalds · 2010-05-30 · 1 · -0/+7
| |\ \
    * 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      mutex: Fix optimistic spinning vs. BKL
| | * | mutex: Fix optimistic spinning vs. BKL · Tony Breeds · 2010-05-19 · 1 · -0/+7
    Currently, we can hit a nasty case with optimistic spinning on mutexes:

        CPU A tries to take a mutex while holding the BKL
        CPU B tries to take the BKL while holding the mutex

    This looks like an AB-BA scenario, but in practice it is allowed and happens due to the auto-release-on-schedule() nature of the BKL.

    In that case, the optimistic spinning code can get us into a situation where, instead of going to sleep, A will spin waiting for B, who is spinning waiting for A, and the only way out of that loop is the need_resched() test in mutex_spin_on_owner().

    This patch fixes it by completely disabling spinning if we own the BKL. This adds one more detail to the extensive list of reasons why it's a bad idea for kernel code to be holding the BKL.

    Signed-off-by: Tony Breeds <tony@bakeyournoodle.com>
    Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
    Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    Cc: <stable@kernel.org>
    LKML-Reference: <20100519054636.GC12389@ozlabs.org>
    [ added an unlikely() attribute to the branch ]
    Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip · Linus Torvalds · 2010-05-30 · 8 · -16/+45
| |\ \ \
    * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
      perf tui: Fix last use_browser problem related to .perfconfig
      perf symbols: Add the build id cache to the vmlinux path
      perf tui: Reset use_browser if stdout is not a tty
      ring-buffer: Move zeroing out excess in page to ring buffer code
      ring-buffer: Reset "real_end" when page is filled
| | * \ \ Merge branch 'tip/perf/core' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into perf/urgent · Ingo Molnar · 2010-05-29 · 2 · -8/+17
| | |\ \ \
| | | * | | ring-buffer: Move zeroing out excess in page to ring buffer code · Steven Rostedt · 2010-05-25 · 2 · -8/+9
    Currently the trace splice code zeroes out the excess bytes in the page before sending it off to userspace.

    This is to make sure userspace does not get anything it should not when reading the pages, because the excess data was never initialized to zero before writing (for performance reasons).

    But the splice code has no business doing this work; it should be done by the ring buffer. With the latest changes for recording lost events, the splice code gets it wrong anyway.

    Move the zeroing out of excess bytes into the ring buffer code.

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
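Below is a minimal sketch of the zeroing step. It assumes 4 KiB pages and a hypothetical zero_page_excess() helper that receives the number of committed bytes; it is not the ring buffer's actual function.

```c
#include <stddef.h>
#include <string.h>

#define DEMO_PAGE_SIZE 4096  /* assumption: 4 KiB pages */

/* Hypothetical helper: clear every byte after the committed data so
 * stale, never-initialized contents are not handed to userspace. */
static void zero_page_excess(unsigned char *page, size_t commit)
{
    if (commit < DEMO_PAGE_SIZE)
        memset(page + commit, 0, DEMO_PAGE_SIZE - commit);
}
```

Doing this in the code that knows where the committed data ends avoids the mistake described above, where the splice code had to guess that boundary.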
| | | * | | ring-buffer: Reset "real_end" when page is filled · Steven Rostedt · 2010-05-25 · 1 · -0/+8
    The code to store the "lost events" requires knowing the real end of the page. Since the 'commit' includes the padding at the end of a page, a "real_end" variable was used to keep track of the end, not including the padding.

    If events were lost, the reader can place the count of events in the padded area if there is enough room.

    The bug this patch fixes is that when we fill the page we do not reset the real_end variable, and if the writer had wrapped a few times, the real_end would be incorrect.

    This patch simply resets the real_end if the page was filled.

    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | * | | | perf tui: Fix last use_browser problem related to .perfconfig · Arnaldo Carvalho de Melo · 2010-05-27 · 1 · -1/+1
    When we moved to using ~/.perfconfig to set the value of use_browser, it changed from a boolean to an int so that the convention used for use_pager was followed. That convention is:

        -1: unspecified; this is what use_{browser,pager} is initialized to
         0: don't use the browser (i.e. the TUI), either because it was explicitly set to 0/off/false in ~/.perfconfig ([tui] cmd =), or because we're redirecting stdout to a file or piping it to some other command (!isatty())
         1: use the TUI

    Some code was not properly audited and continued testing it as a boolean; this seems to be the last one.

    Reported-by: Frédéric Weisbecker <fweisbec@gmail.com>
    Tested-by: Frédéric Weisbecker <fweisbec@gmail.com>
    Cc: Frédéric Weisbecker <fweisbec@gmail.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Tom Zanussi <tzanussi@gmail.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
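The pitfall is easy to reproduce in C. Because -1 is non-zero, any code that still tests the value as a boolean treats "unspecified" as "use the TUI". The sketch below uses hypothetical names and resolves the tri-state once, before any caller tests it. Treating an unspecified value on a terminal as "on" is an assumption of the sketch, not something this log states.

```c
#include <stdbool.h>

/* Tri-state convention for use_browser / use_pager. */
enum { UI_UNSPECIFIED = -1, UI_OFF = 0, UI_ON = 1 };

/* The bug pattern: a boolean test treats -1 ("unspecified") as true. */
static bool tested_as_boolean(int use_browser)
{
    return use_browser;
}

/* Hypothetical resolution step: collapse the tri-state to 0 or 1 once,
 * honouring the !isatty() rule, before anything tests it. */
static int resolve_use_browser(int configured, bool stdout_is_tty)
{
    if (!stdout_is_tty)
        return UI_OFF;          /* redirected to a file or piped */
    if (configured == UI_UNSPECIFIED)
        return UI_ON;           /* assumed default for this sketch */
    return configured == UI_ON ? UI_ON : UI_OFF;
}
```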
| | * | | | perf symbols: Add the build id cache to the vmlinux path · Arnaldo Carvalho de Melo · 2010-05-26 · 3 · -6/+25
    This way, if the kernel DSO has a build id (because record inserted it in the build id table in the perf.data header, or because a BUILD_ID event was inserted in the stream), we first look in the build id cache ($HOME/.debug/).

    If we find it there, we try to use it, allowing offline annotation in addition to 'perf report'.

    Reported-by: Stephane Eranian <eranian@google.com>
    Cc: Frédéric Weisbecker <fweisbec@gmail.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Tom Zanussi <tzanussi@gmail.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| | * | | | perf tui: Reset use_browser if stdout is not a tty · Arnaldo Carvalho de Melo · 2010-05-26 · 2 · -1/+2
| | |/ / /
    The newt initialization routines weren't being called because the output was a file (perf annotate > /tmp/bla), but use_browser was still 1 because ~/.perfconfig had it set to 'on', so the newt routines later segfaulted.

    Cc: Frédéric Weisbecker <fweisbec@gmail.com>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Paul Mackerras <paulus@samba.org>
    Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
    Cc: Stephane Eranian <eranian@google.com>
    Cc: Tom Zanussi <tzanussi@gmail.com>
    LKML-Reference: <new-submission>
    Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
| * | | | ia64: revert __node_random addition · Linus Torvalds · 2010-05-30 · 1 · -17/+0
    This partially reverts commit 4ec37de89d8c758ee8115e0e64b3f994910789ee ("[IA64] Fix build breakage"), since the commit that made it necessary was reverted earlier (see commit 35926ff5fba8, 'Revert "cpusets: randomize node rotor used in cpuset_mem_spread_node()"').

    Even if we ever re-introduce this, there is no reason to make __node_random an architecture-specific function.

    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| * | | | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · Linus Torvalds · 2010-05-30 · 7 · -82/+500
| |\ \ \ \
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
      mm: export generic_pipe_buf_*() to modules
      fuse: support splice() reading from fuse device
      fuse: allow splice to move pages
      mm: export remove_from_page_cache() to modules
      mm: export lru_cache_add_*() to modules
      fuse: support splice() writing to fuse device
      fuse: get page reference for readpages
      fuse: use get_user_pages_fast()
      fuse: remove unneeded variable
| | * | | | mm: export generic_pipe_buf_*() to modules · Miklos Szeredi · 2010-05-26 · 1 · -0/+6
    This is needed by the fuse device code, which wants to create pipe buffers.

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: support splice() reading from fuse device · Miklos Szeredi · 2010-05-25 · 1 · -41/+187
    Allow a userspace filesystem implementation to use splice() to read from the fuse device.

    The userspace filesystem can now transfer data coming from a WRITE request to an arbitrary file descriptor (regular file, block device or socket) without having to go through a userspace buffer.

    The semantics of using splice() to read messages are:

     1)  with a single splice() call, move the whole message from the fuse device to a temporary pipe
     2)  read the header from the pipe and determine the message type
     3a) if the message is a WRITE, splice the data from the pipe to the destination
     3b) otherwise, read the rest of the message into a userspace buffer

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: allow splice to move pages · Miklos Szeredi · 2010-05-25 · 3 · -15/+167
    When splicing buffers to the fuse device with SPLICE_F_MOVE, try to move pages from the pipe buffer into the page cache. This allows populating the fuse filesystem's cache without ever touching the page contents, i.e. a zero-copy read capability.

    The following steps are performed when trying to move a page into the page cache:

     - buf->ops->confirm() to make sure the new page is uptodate
     - buf->ops->steal() to try to remove the new page from its previous place
     - remove_from_page_cache() on the old page
     - add_to_page_cache_locked() on the new page

    If any of the above steps fails (non-fatally), the code falls back to copying the page. In particular, ->steal() will fail if there are external references (other than the page cache and the pipe buffer) to the page.

    Also, since remove_from_page_cache() + add_to_page_cache_locked() are non-atomic, it is possible that the page cache is repopulated in between the two, in which case add_to_page_cache_locked() will fail. This could be fixed by creating a new atomic replace_page_cache_page() function.

    fuse_readpages_end() needed to be reworked so that it works even if page->mapping is NULL for some or all pages, which can happen if add_to_page_cache_locked() failed.

    A number of sanity checks were added to make sure the stolen pages don't have weird flags set, etc. These could be moved into the generic splice/steal code.

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | mm: export remove_from_page_cache() to modules · Miklos Szeredi · 2010-05-25 · 1 · -0/+1
    This is needed to enable moving pages into the page cache in fuse with splice(..., SPLICE_F_MOVE).

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | mm: export lru_cache_add_*() to modules · Miklos Szeredi · 2010-05-25 · 1 · -0/+1
    This is needed to enable moving pages into the page cache in fuse with splice(..., SPLICE_F_MOVE).

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: support splice() writing to fuse device · Miklos Szeredi · 2010-05-25 · 2 · -32/+148
    Allow a userspace filesystem implementation to use splice() to write to the fuse device. The semantics of using splice() are:

     1) buffer the message header and data in a temporary pipe
     2) with a *single* splice() call, move the message from the temporary pipe to the fuse device

    The READ reply message has the most interesting use for this, since the data from an arbitrary file descriptor (which could be a regular file, a block device or a socket) can now be transferred into the fuse device without having to go through a userspace buffer. It will also allow zero-copy moving of pages.

    One caveat is that the protocol on the fuse device requires the length of the whole message to be written into the header. But the length of the data transferred into the temporary pipe may not be known in advance. The current library implementation works around this by using vmsplice to write the header and modifying the header after splicing the data into the pipe (error handling omitted):

        struct fuse_out_header out;

        iov.iov_base = &out;
        iov.iov_len = sizeof(struct fuse_out_header);
        vmsplice(pip[1], &iov, 1, 0);
        len = splice(input_fd, input_offset, pip[1], NULL, len, 0);
        /* retrospectively modify the header: */
        out.len = len + sizeof(struct fuse_out_header);
        splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags);

    This works because vmsplice only saves a pointer to the data; it does not copy the data itself.

    Since pipes are currently limited to 16 pages and messages need to be spliced atomically, the length of the data is limited to 15 pages (or 60kB for 4k pages).

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: get page reference for readpages · Miklos Szeredi · 2010-05-25 · 1 · -0/+2
    Acquire a page ref on pages in ->readpages() and release them when the read has finished. Not acquiring a reference didn't seem to cause any trouble, since the page is locked and will not be kicked out of the page cache during the read.

    However, the following patches will want to remove the page from the cache, so a separate ref is needed. Making the reference in req->pages explicit also makes the code easier to understand.

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: use get_user_pages_fast() · Miklos Szeredi · 2010-05-25 · 2 · -8/+2
    Replace uses of get_user_pages() with get_user_pages_fast(). It looks nicer and should be faster in most cases.

    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| | * | | | fuse: remove unneeded variable · Dan Carpenter · 2010-05-25 · 1 · -2/+2
| | | |_|/
| | |/| | |
    "map" isn't needed anymore after commit 0bd87182d3ab18 ("fuse: fix kunmap in fuse_ioctl_copy_user").

    Signed-off-by: Dan Carpenter <error27@gmail.com>
    Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig · Linus Torvalds · 2010-05-30 · 1 · -4/+5
| |\ \ \ \
    * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-kconfig:
      kconfig: Hide error output in find command in streamline_config.pl
      kconfig: Fix typo in comment in streamline_config.pl
      kconfig: Make a variable local in streamline_config.pl
| | * | | | kconfig: Hide error output in find command in streamline_config.pl · Toralf Förster · 2010-05-28 · 1 · -2/+3
    Finding the list of Makefiles in streamline-config should not report errors.

    Also move the "chomp" to the @makefiles array instead of doing it in the for loop. This is more efficient, and does not make it any less readable by C programmers.

    Signed-off-by: Toralf Foerster <toralf.foerster@gmx.de>
    LKML-Reference: <201005262022.02928.toralf.foerster@gmx.de>
    Signed-off-by: Steven Rostedt <rostedt@goodmis.org>