linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	percpu: teach large page allocator about NUMA	Tejun Heo	2009-07-04	3	-95/+359
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Large page first chunk allocator is primarily used for NUMA machines; however, its NUMA handling is extremely simplistic. Regardless of their proximity, each cpu is put into separate large page just to return most of the allocated space back wasting large amount of vmalloc space and increasing cache footprint. This patch teachs NUMA details to large page allocator. Given processor proximity information, pcpu_lpage_build_unit_map() will find fitting cpu -> unit mapping in which cpus in LOCAL_DISTANCE share the same large page and not too much virtual address space is wasted. This greatly reduces the unit and thus chunk size and wastes much less address space for the first chunk. For example, on 4/4 NUMA machine, the original code occupied 16MB of virtual space for the first chunk while the new code only uses 4MB - one 2MB page for each node. [ Impact: much better space efficiency on NUMA machines ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jan Beulich <JBeulich@novell.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Miller <davem@davemloft.net>
*	percpu: allow non-linear / sparse cpu -> unit mapping	Tejun Heo	2009-07-04	3	-37/+97
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently cpu and unit are always identity mapped. To allow more efficient large page support on NUMA and lazy allocation for possible but offline cpus, cpu -> unit mapping needs to be non-linear and/or sparse. This can be easily implemented by adding a cpu -> unit mapping array and using it whenever looking up the matching unit for a cpu. The only unusal conversion is in pcpu_chunk_addr_search(). The passed in address is unit0 based and unit0 might not be in use so it needs to be converted to address of an in-use unit. This is easily done by adding the unit offset for the current processor. [ Impact: allows non-linear/sparse cpu -> unit mapping, no visible change yet ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
*	percpu: drop pcpu_chunk->page[]	Tejun Heo	2009-07-04	3	-249/+400
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	percpu core doesn't need to tack all the allocated pages. It needs to know whether certain pages are populated and a way to reverse map address to page when freeing. This patch drops pcpu_chunk->page[] and use populated bitmap and vmalloc_to_page() lookup instead. Using vmalloc_to_page() exclusively is also possible but complicates first chunk handling, inflates cache footprint and prevents non-standard memory allocation for percpu memory. pcpu_chunk->page[] was used to track each page's allocation and allowed asymmetric population which happens during failure path; however, with single bitmap for all units, this is no longer possible. Bite the bullet and rewrite (de)populate functions so that things are done in clearly separated steps such that asymmetric population doesn't happen. This makes the (de)population process much more modular and will also ease implementing non-standard memory usage in the future (e.g. large pages). This makes @get_page_fn parameter to pcpu_setup_first_chunk() unnecessary. The parameter is dropped and all first chunk helpers are updated accordingly. Please note that despite the volume most changes to first chunk helpers are symbol renames for variables which don't need to be referenced outside of the helper anymore. This change reduces memory usage and cache footprint of pcpu_chunk. Now only #unit_pages bits are necessary per chunk. [ Impact: reduced memory usage and cache footprint for bookkeeping ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: David Miller <davem@davemloft.net>
*	percpu: reorder a few functions in mm/percpu.c	Tejun Heo	2009-07-04	1	-45/+45
\| \| \| \| \| \| \| \| \| \| \|	(de)populate functions are about to be reimplemented to drop pcpu_chunk->page array. Move a few functions so that the rewrite patch doesn't have code movement making it more difficult to read. [ Impact: code movement ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	percpu: simplify pcpu_setup_first_chunk()	Tejun Heo	2009-07-04	3	-78/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that all first chunk allocator helpers allocate and map the first chunk themselves, there's no need to have optional default alloc/map in pcpu_setup_first_chunk(). Drop @populate_pte_fn and only leave @dyn_size optional and make all other params mandatory. This makes it much easier to follow what pcpu_setup_first_chunk() is doing and what actual differences tweaking each parameter results in. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	x86,percpu: generalize lpage first chunk allocator	Tejun Heo	2009-07-04	5	-171/+244
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generalize and move x86 setup_pcpu_lpage() into pcpu_lpage_first_chunk(). setup_pcpu_lpage() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free/map memory, pcpu_lpage_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, factor out pcpu_calc_fc_sizes() which is common to pcpu_embed_first_chunk() and pcpu_lpage_first_chunk(). [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	percpu: make 4k first chunk allocator map memory	Tejun Heo	2009-07-04	1	-17/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	At first, percpu first chunk was always setup page-by-page by the generic code. To add other allocators, different parts of the generic initialization was made optional. Now we have three allocators - embed, remap and 4k. embed and remap fully handle allocation and mapping of the first chunk while 4k still depends on generic code for those. This makes the generic alloc/map paths specifci to 4k and makes the code unnecessary complicated with optional generic behaviors. This patch makes the 4k allocator to allocate and map memory directly instead of depending on the generic code. The only outside visible change is that now dynamic area in the first chunk is allocated up-front instead of on-demand. This doesn't make any meaningful difference as the area is minimal (usually less than a page, just enough to fill the alignment) on 4k allocator. Plus, dynamic area in the first chunk usually gets fully used anyway. This will allow simplification of pcpu_setpu_first_chunk() and removal of chunk->page array. [ Impact: no outside visible change other than up-front allocation of dyn area ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	x86,percpu: generalize 4k first chunk allocator	Tejun Heo	2009-07-04	3	-62/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Generalize and move x86 setup_pcpu_4k() into pcpu_4k_first_chunk(). setup_pcpu_4k() now is a simple wrapper around the generalized version. Other than taking size parameters and using arch supplied callbacks to allocate/free memory, pcpu_4k_first_chunk() is identical to the original implementation. This simplifies arch code and will help converting more archs to dynamic percpu allocator. While at it, s/pcpu_populate_pte_fn_t/pcpu_fc_populate_pte_fn_t/ for consistency. [ Impact: code reorganization and generalization ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	percpu: drop @unit_size from embed first chunk allocator	Tejun Heo	2009-07-04	3	-14/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only extra feature @unit_size provides is making dead space at the end of the first chunk which doesn't have any valid usecase. Drop the parameter. This will increase consistency with generalized 4k allocator. James Bottomley spotted missing conversion for the default setup_per_cpu_areas() which caused build breakage on all arcsh which use it. [ Impact: drop unused code path ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: Ingo Molnar <mingo@elte.hu>
*	x86: make pcpu_chunk_addr_search() matching stricter	Tejun Heo	2009-07-04	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \|	The @addr passed into pcpu_chunk_addr_search() is unit0 based address and thus should be matched inside unit0 area. Currently, when it uses chunk size when determining whether the address falls in the first chunk. Addresses in unitN where N>0 shouldn't be passed in anyway, so this doesn't cause any malfunction but fix it for consistency. [ Impact: mostly cleanup ] Signed-off-by: Tejun Heo <tj@kernel.org> Cc: Ingo Molnar <mingo@elte.hu>
*	Merge branch 'master' into for-next	Tejun Heo	2009-07-04	1365	-17518/+55214
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull linus#master to merge PER_CPU_DEF_ATTRIBUTES and alpha build fix changes. As alpha in percpu tree uses 'weak' attribute instead of inline assembly, there's no need for __used attribute. Conflicts: arch/alpha/include/asm/percpu.h arch/mn10300/kernel/vmlinux.lds.S include/linux/percpu-defs.h
\| *	Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify	Linus Torvalds	2009-07-03	1	-0/+3
\| \|\ \| \| \| \| \| \| \| \| \| \| \| \|	* 'for-linus' of git://git.infradead.org/users/eparis/notify: fs/notify/inotify: decrement user inotify count on close
\| \| *	fs/notify/inotify: decrement user inotify count on close	Keith Packard	2009-07-02	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The per-user inotify_devs value is incremented each time a new file is allocated, but never decremented. This led to inotify_init failing after a limited number of calls. Signed-off-by: Keith Packard <keithp@keithp.com> Signed-off-by: Eric Paris <eparis@redhat.com>
\| * \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable	Linus Torvalds	2009-07-03	8	-193/+423
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: fix error message formatting Btrfs: fix use after free in btrfs_start_workers fail path Btrfs: honor nodatacow/sum mount options for new files Btrfs: update backrefs while dropping snapshot Btrfs: account for space we may use in fallocate Btrfs: fix the file clone ioctl for preallocated extents Btrfs: don't log the inode in file_write while growing the file
\| \| * \|	Btrfs: fix error message formatting	Hu Tao	2009-07-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make an error msg look nicer by inserting a space between number and word. Signed-off-by: Hu Tao <hu.taoo@gmail.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| * \|	Btrfs: fix use after free in btrfs_start_workers fail path	Jiri Slaby	2009-07-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	worker memory is already freed on one fail path in btrfs_start_workers, but is still dereferenced. Switch the dereference and kfree. Signed-off-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| * \|	Btrfs: honor nodatacow/sum mount options for new files	Chris Mason	2009-07-02	1	-6/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The btrfs attr patches unconditionally inherited the inode flags field without honoring nodatacow and nodatasum. This fix makes sure we properly record the nodatacow/sum mount options in new inodes. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| * \|	Btrfs: update backrefs while dropping snapshot	Yan Zheng	2009-07-02	4	-181/+395
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The new backref format has restriction on type of backref item. If a tree block isn't referenced by its owner tree, full backrefs must be used for the pointers in it. When a tree block loses its owner tree's reference, backrefs for the pointers in it should be updated to full backrefs. Current btrfs_drop_snapshot misses the code that updates backrefs, so it's unsafe for general use. This patch adds backrefs update code to btrfs_drop_snapshot. It isn't a problem in the restricted form btrfs_drop_snapshot is used today, but for general snapshot deletion this update is required. Signed-off-by: Yan Zheng <zheng.yan@oracle.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| * \|	Btrfs: account for space we may use in fallocate	Josef Bacik	2009-07-02	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Using Eric Sandeen's xfstest for fallocate, you can easily trigger a ENOSPC panic on btrfs. This is because we do not account for data we may use when doing the fallocate. This patch fixes the problem by properly reserving space, and then just freeing it when we are done. The reservation stuff was made with delalloc in mind, so its a little crude for this case, but it keeps the box from panicing. Signed-off-by: Josef Bacik <jbacik@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| \| * \|	Btrfs: fix the file clone ioctl for preallocated extents	Chris Mason	2009-07-02	1	-2/+4
\| \| \| \|
\| \| * \|	Btrfs: don't log the inode in file_write while growing the file	Chris Mason	2009-07-02	1	-1/+4
\| \| \| \|
\| * \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6	Linus Torvalds	2009-07-03	6	-13/+20
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-rc-fixes-2.6: [SCSI] cxgb3i: fix connection error when vlan is enabled [SCSI] FC transport: Locking fix for common-code FC pass-through patch [SCSI] zalon: fix oops on attach failure [SCSI] fnic: use DMA_BIT_MASK(nn) instead of deprecated DMA_nnBIT_MASK [SCSI] fnic: remove redundant BUG_ONs and fix checks on unsigned [SCSI] ibmvscsi: Fix module load hang
\| \| * \| \|	[SCSI] cxgb3i: fix connection error when vlan is enabled	Karen Xie	2009-06-27	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is a bug when VLAN is configured on the cxgb3 interface, the iscsi conn. would be denied with message "cxgb3i: NOT going through cxgbi device." This patch adds code to get the real egress net_device when vlan is configured. Signed-off-by: Karen Xie <kxie@chelsio.com> Reviewed-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| \| * \| \|	[SCSI] FC transport: Locking fix for common-code FC pass-through patch	Christof Schmitt	2009-06-26	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix this: ------------[ cut here ]------------ Badness at block/blk-core.c:244 CPU: 0 Tainted: G W 2.6.31-rc1-00004-gd3a263a #3 Process zfcp_wq (pid: 901, task: 000000002fb7a038, ksp: 000000002f02bc78) Krnl PSW : 0704300180000000 00000000002141ba (blk_remove_plug+0xb2/0xb8) R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:3 PM:0 EA:3 Krnl GPRS: 0000000000000001 0000000000000001 0000000022811440 0000000022811798 000000000027ff4e 0000000000000000 0000000000000000 000000002f00f000 070000000006a0f4 000000002af70000 000000002af2a800 00000000228d1c00 0000000022811440 000000000050c708 000000002f02bca8 000000002f02bc80 Krnl Code: 00000000002141b0: b9140022 lgfr %r2,%r2 00000000002141b4: 07fe bcr 15,%r14 00000000002141b6: a7f40001 brc 15,2141b8 >00000000002141ba: a7f4ffbe brc 15,214136 00000000002141be: 0707 bcr 0,%r7 00000000002141c0: ebaff0680024 stmg %r10,%r15,104(%r15) 00000000002141c6: c0d00017c2a9 larl %r13,50c718 00000000002141cc: a7f13fc0 tmll %r15,16320 Call Trace: ([<000000000050e7d8>] C.272.16122+0x88/0x110) [<00000000002141ec>] __blk_run_queue+0x2c/0x154 [<000000000028013a>] fc_remote_port_add+0x85e/0x95c [<000000000037596e>] zfcp_scsi_rport_work+0xe6/0x148 [<000000000006908c>] worker_thread+0x25c/0x318 [<000000000006f10c>] kthread+0x94/0x9c [<000000000001c2b2>] kernel_thread_starter+0x6/0xc [<000000000001c2ac>] kernel_thread_starter+0x0/0xc INFO: lockdep is turned off. Last Breaking-Event-Address: [<00000000002141b6>] blk_remove_plug+0xae/0xb8 The FC pass-through support triggers the WARN_ON(!irqs_disabled()) in blk_plug_device. Since blk_plug_device requires being called with disabled interrupts, use spin_lock_irqsave in fc_bsg_goose_queue to disable the interrupts before calling into the block layer. Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com> Acked-by: James Smart <james.smart@emulex.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| \| * \| \|	[SCSI] zalon: fix oops on attach failure	James Bottomley	2009-06-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I recently discovered on my zalon that if the attachment fails because of a bus misconfiguration (I scrapped my HVD array, so the card is now unterminated) then the system oopses. The reason is that if ncr_attach() returns NULL (signalling failure) that NULL is passed by the goto failed straight into ncr_detach() which oopses. The fix is just to return -ENODEV in this case. Cc: Stable Tree <stable@kernel.org> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| \| * \| \|	[SCSI] fnic: use DMA_BIT_MASK(nn) instead of deprecated DMA_nnBIT_MASK	Abhijeet Joglekar	2009-06-25	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Robert Love reported warning while building fnic_main.c: drivers/scsi/fnic/fnic_main.c:478: warning: `DMA_nnBIT_MASK' is deprecated. Replaced use of DMA_nnBIT_MASK by DMA_BIT_MASK(nn) Signed-off-by: Abhijeet Joglekar <abjoglek@cisco.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| \| * \| \|	[SCSI] fnic: remove redundant BUG_ONs and fix checks on unsigned	Roel Kluin	2009-06-25	1	-5/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The shost sg tablesize is set to FNIC_MAX_SG_DESC_CNT and fnic uses scsi_dma_map, so both BUG_ONs can be removed. scsi_dma_map may return -ENOMEM, sg_count should be int to catch that. Signed-off-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| \| * \| \|	[SCSI] ibmvscsi: Fix module load hang	Brian King	2009-06-25	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fixes a regression seen in the ibmvscsi driver when using the VSCSI server in SLES 9 and SLES 10. The VSCSI server in these releases has a bug in it in which it does not send responses to unknown MADs. Check the OS Type field in the adapter info response and do not send these unsupported commands when talking to an older server. Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
\| * \| \| \|	Merge git://git.infradead.org/iommu-2.6	Linus Torvalds	2009-07-03	3	-359/+353
\| \|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.infradead.org/iommu-2.6: (38 commits) intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable() intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops intel-iommu: Use cmpxchg64_local() for setting PTEs intel-iommu: Warn about unmatched unmap requests intel-iommu: Kill superfluous mapping_lock intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386 intel-iommu: Make iommu=pt work on i386 too intel-iommu: Performance improvement for dma_pte_free_pagetable() intel-iommu: Don't free too much in dma_pte_free_pagetable() intel-iommu: dump mappings but don't die on pte already set intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping() intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg() intel-iommu: Simplify __intel_alloc_iova() intel-iommu: Performance improvement for domain_pfn_mapping() intel-iommu: Performance improvement for dma_pte_clear_range() intel-iommu: Clean up iommu_domain_identity_map() intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument intel-iommu: Change aligned_size() to aligned_nrpages() intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping() ...
\| \| * \| \| \|	intel-iommu: Don't keep freeing page zero in dma_pte_free_pagetable()	David Woodhouse	2009-07-02	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Check dma_pte_present() and only free the page if there _is_ one. Kind of surprising that there was no warning about this. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Introduce first_pte_in_page() to simplify PTE-setting loops	David Woodhouse	2009-07-02	1	-11/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	On Wed, 2009-07-01 at 16:59 -0700, Linus Torvalds wrote: > I also _really_ hate how you do > > (unsigned long)pte >> VTD_PAGE_SHIFT == > (unsigned long)first_pte >> VTD_PAGE_SHIFT Kill this, in favour of just looking to see if the incremented pte pointer has 'wrapped' onto the next page. Which means we have to check it _after_ incrementing it, not before. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Use cmpxchg64_local() for setting PTEs	David Woodhouse	2009-07-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Warn about unmatched unmap requests	David Woodhouse	2009-07-01	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This would have found the bug in i386 pci_unmap_addr() a long time ago. We shouldn't just silently return without doing anything. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Kill superfluous mapping_lock	David Woodhouse	2009-07-01	1	-10/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since we're using cmpxchg64() anyway (because that's the only way to do an atomic 64-bit store on i386), we might as well ditch the extra locking and just use cmpxchg64() to ensure that we don't add the page twice. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Ensure that PTE writes are 64-bit atomic, even on i386	David Woodhouse	2009-07-01	1	-14/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Make iommu=pt work on i386 too	David Woodhouse	2009-07-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Performance improvement for dma_pte_free_pagetable()	David Woodhouse	2009-06-30	1	-10/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As with other functions, batch the CPU data cache flushes and don't keep recalculating PTE addresses. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Don't free too much in dma_pte_free_pagetable()	David Woodhouse	2009-06-30	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The loop condition was wrong -- we should free a PMD only if its _entire_ range is within the range we're intending to clear. The early-termination condition was right, but not the loop. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: dump mappings but don't die on pte already set	David Woodhouse	2009-06-30	1	-1/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Combine domain_pfn_mapping() and domain_sg_mapping()	David Woodhouse	2009-06-30	1	-40/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Introduce domain_sg_mapping() to speed up intel_map_sg()	David Woodhouse	2009-06-30	1	-21/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of calling domain_pfn_mapping() repeatedly with single or small numbers of pages, just pass the sglist in. It can optimise the number of cache flushes like domain_pfn_mapping() does, and gives a huge speedup for large scatterlists. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Simplify __intel_alloc_iova()	David Woodhouse	2009-06-29	1	-31/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's no need for the separate iommu_alloc_iova() function, and certainly not for it to be global. Remove the underscores while we're at it. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Performance improvement for domain_pfn_mapping()	David Woodhouse	2009-06-29	1	-9/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As with dma_pte_clear_range(), don't keep flushing a single PTE at a time. And also micro-optimise the setting of PTE values rather than using the helper functions to do all the masking. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Performance improvement for dma_pte_clear_range()	David Woodhouse	2009-06-29	1	-16/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's a bit silly to repeatedly call domain_flush_cache() for each PTE individually, as we clear it. Instead, batch them up and flush a whole range at a time. We might as well refrain from recalculating the PTE address from scratch each time round the loop too. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Clean up iommu_domain_identity_map()	David Woodhouse	2009-06-29	1	-15/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Remove last use of PHYSICAL_PAGE_MASK, for reserving PCI BARs	David Woodhouse	2009-06-29	1	-10/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is fairly broken anyway -- it doesn't take hotplug into account. We should probably be checking page_is_ram() instead. Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Make iommu_flush_iotlb_psi() take pfn as argument	David Woodhouse	2009-06-29	1	-12/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Most of its callers are having to shift for themselves anyway, so we might as well do it in iommu_flush_iotlb_psi(). Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Change aligned_size() to aligned_nrpages()	David Woodhouse	2009-06-29	1	-9/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Clean up intel_map_sg(), remove domain_page_mapping()	David Woodhouse	2009-06-29	1	-35/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
\| \| * \| \| \|	intel-iommu: Use domain_pfn_mapping() in intel_iommu_map_range()	David Woodhouse	2009-06-29	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>