summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* percpu: fix per_cpu_ptr_to_phys() handling of non-page-aligned addressesEugene Surovegin2011-12-151-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | per_cpu_ptr_to_phys() incorrectly rounds up its result for non-kmalloc case to the page boundary, which is bogus for any non-page-aligned address. This affects the only in-tree user of this function - sysfs handler for per-cpu 'crash_notes' physical address. The trouble is that the crash_notes per-cpu variable is not page-aligned: crash_notes = 0xc08e8ed4 PER-CPU OFFSET VALUES: CPU 0: 3711f000 CPU 1: 37129000 CPU 2: 37133000 CPU 3: 3713d000 So, the per-cpu addresses are: crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4 crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4 crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4 crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4 However, /sys/devices/system/cpu/cpu*/crash_notes says: /sys/devices/system/cpu/cpu0/crash_notes: 36b57000 /sys/devices/system/cpu/cpu1/crash_notes: 36b4d000 /sys/devices/system/cpu/cpu2/crash_notes: 36b43000 /sys/devices/system/cpu/cpu3/crash_notes: 36b39000 As you can see, all values are rounded down to a page boundary. Consequently, this is where kexec sets up the NOTE segments, and thus where the secondary kernel is looking for them. However, when the first kernel crashes, it saves the notes to the unaligned addresses, where they are not found. Fix it by adding offset_in_page() to the translated page address. -tj: Combined Eugene's and Petr's commit messages. Signed-off-by: Eugene Surovegin <ebs@ebshome.net> Signed-off-by: Tejun Heo <tj@kernel.org> Reported-by: Petr Tesarik <ptesarik@suse.cz> Cc: stable@kernel.org
* Merge branch 'stable/for-linus-fixes-3.2' of ↵Linus Torvalds2011-12-152-5/+17
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen * 'stable/for-linus-fixes-3.2' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen: xen/swiotlb: Use page alignment for early buffer allocation. xen: only limit memory map to maximum reservation for domain 0.
| * xen/swiotlb: Use page alignment for early buffer allocation.Konrad Rzeszutek Wilk2011-12-151-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes an odd bug found on a Dell PowerEdge 1850/0RC130 (BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask would fail with: ata_piix 0000:00:1f.1: enabling device (0005 -> 0007) ata_piix 0000:00:1f.1: can't derive routing for PCI INT A ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO The issue was the Xen-SWIOTLB was allocated such as that the end of buffer was stradling a page (and also above 4GB). The fix was spotted by Kalev Leonid which was to piggyback on git commit e79f86b2ef9c0a8c47225217c1018b7d3d90101c "swiotlb: Use page alignment for early buffer allocation" which: We could call free_bootmem_late() if swiotlb is not used, and it will shrink to page alignment. So alloc them with page alignment at first, to avoid lose two pages And doing that fixes the outstanding issue. CC: stable@kernel.org Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com> Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
| * xen: only limit memory map to maximum reservation for domain 0.Ian Campbell2011-12-151-3/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | d312ae878b6a "xen: use maximum reservation to limit amount of usable RAM" clamped the total amount of RAM to the current maximum reservation. This is correct for dom0 but is not correct for guest domains. In order to boot a guest "pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for future memory expansion the guest must derive max_pfn from the e820 provided by the toolstack and not the current maximum reservation (which can reflect only the current maximum, not the guest lifetime max). The existing algorithm already behaves this correctly if we do not artificially limit the maximum number of pages for the guest case. For a guest booted with maxmem=512, memory=128 this results in: [ 0.000000] BIOS-provided physical RAM map: [ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable) [ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved) -[ 0.000000] Xen: 0000000000100000 - 0000000008100000 (usable) -[ 0.000000] Xen: 0000000008100000 - 0000000020800000 (unusable) +[ 0.000000] Xen: 0000000000100000 - 0000000020800000 (usable) ... [ 0.000000] NX (Execute Disable) protection: active [ 0.000000] DMI not present or invalid. [ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved) [ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable) -[ 0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000 +[ 0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000 [ 0.000000] initial memory mapped : 0 - 027ff000 [ 0.000000] Base memory trampoline at [c009f000] 9f000 size 4096 -[ 0.000000] init_memory_mapping: 0000000000000000-0000000008100000 -[ 0.000000] 0000000000 - 0008100000 page 4k -[ 0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000 +[ 0.000000] init_memory_mapping: 0000000000000000-0000000020800000 +[ 0.000000] 0000000000 - 0020800000 page 4k +[ 0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000 [ 0.000000] xen: setting RW the range 27e8000 - 27ff000 [ 0.000000] 0MB HIGHMEM available. -[ 0.000000] 129MB LOWMEM available. -[ 0.000000] mapped low ram: 0 - 08100000 -[ 0.000000] low ram: 0 - 08100000 +[ 0.000000] 520MB LOWMEM available. +[ 0.000000] mapped low ram: 0 - 20800000 +[ 0.000000] low ram: 0 - 20800000 With this change "xl mem-set <domain> 512M" will successfully increase the guest RAM (by reducing the balloon). There is no change for dom0. Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com> Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: stable@kernel.org Reviewed-by: David Vrabel <david.vrabel@citrix.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
* | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2011-12-151-0/+10
|\ \ | | | | | | | | | | | | * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: drm/radeon/kms: add some new pci ids
| * | drm/radeon/kms: add some new pci idsAlex Deucher2011-12-141-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=43739 Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com>
* | | Merge tag 'tytso-for-linus-20111214' of ↵Linus Torvalds2011-12-154-55/+31
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 * tag 'tytso-for-linus-20111214' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: handle EOF correctly in ext4_bio_write_page() ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initialized ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers() ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesize ext4: avoid hangs in ext4_da_should_update_i_disksize() ext4: display the correct mount option in /proc/mounts for [no]init_itable ext4: Fix crash due to getting bogus eh_depth value on big-endian systems ext4: fix ext4_end_io_dio() racing against fsync() .. using the new signed tag merge of git that now verifies the gpg signature automatically. Yay. The branchname was just 'dev', which is prettier. I'll tell Ted to use nicer tag names for future cases.
| * | | ext4: handle EOF correctly in ext4_bio_write_page()Yongqiang Yang2011-12-141-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to zero out part of a page which beyond EOF before setting uptodate, otherwise, mapread or write will see non-zero data beyond EOF. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: remove a wrong BUG_ON in ext4_ext_convert_to_initializedYongqiang Yang2011-12-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a file is fallocated on a hole, map->m_lblk + map->m_len may be greater than ee_block + ee_len. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: correctly handle pages w/o buffers in ext4_discard_partial_buffers()Yongqiang Yang2011-12-141-39/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a page has been read into memory and never been written, it has no buffers, but we should handle the page in truncate or punch hole. VFS code of writing operations has handled holes correctly, so this patch removes the code handling holes in writing operations. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: avoid potential hang in mpage_submit_io() when blocksize < pagesizeYongqiang Yang2011-12-141-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If there is an unwritten but clean buffer in a page and there is a dirty buffer after the buffer, then mpage_submit_io does not write the dirty buffer out. As a result, da_writepages loops forever. This patch fixes the problem by checking dirty flag. Signed-off-by: Yongqiang Yang <xiaoqiangnk@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: avoid hangs in ext4_da_should_update_i_disksize()Andrea Arcangeli2011-12-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the pte mapping in generic_perform_write() is unmapped between iov_iter_fault_in_readable() and iov_iter_copy_from_user_atomic(), the "copied" parameter to ->end_write can be zero. ext4 couldn't cope with it with delayed allocations enabled. This skips the i_disksize enlargement logic if copied is zero and no new data was appeneded to the inode. gdb> bt #0 0xffffffff811afe80 in ext4_da_should_update_i_disksize (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x1\ 08000, len=0x1000, copied=0x0, page=0xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2467 #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 #2 0xffffffff810d97f1 in generic_perform_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value o\ ptimized out>, pos=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2440 #3 generic_file_buffered_write (iocb=<value optimized out>, iov=<value optimized out>, nr_segs=<value optimized out>, p\ os=0x108000, ppos=0xffff88001e26be40, count=<value optimized out>, written=0x0) at mm/filemap.c:2482 #4 0xffffffff810db5d1 in __generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, ppos=0\ xffff88001e26be40) at mm/filemap.c:2600 #5 0xffffffff810db853 in generic_file_aio_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=<value optimi\ zed out>, pos=<value optimized out>) at mm/filemap.c:2632 #6 0xffffffff811a71aa in ext4_file_write (iocb=0xffff88001e26bde8, iov=0xffff88001e26bec8, nr_segs=0x1, pos=0x108000) a\ t fs/ext4/file.c:136 #7 0xffffffff811375aa in do_sync_write (filp=0xffff88003f606a80, buf=<value optimized out>, len=<value optimized out>, \ ppos=0xffff88001e26bf48) at fs/read_write.c:406 #8 0xffffffff81137e56 in vfs_write (file=0xffff88003f606a80, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x4\ 000, pos=0xffff88001e26bf48) at fs/read_write.c:435 #9 0xffffffff8113816c in sys_write (fd=<value optimized out>, buf=0x1ec2960 <Address 0x1ec2960 out of bounds>, count=0x\ 4000) at fs/read_write.c:487 #10 <signal handler called> #11 0x00007f120077a390 in __brk_reservation_fn_dmi_alloc__ () #12 0x0000000000000000 in ?? () gdb> print offset $22 = 0xffffffffffffffff gdb> print idx $23 = 0xffffffff gdb> print inode->i_blkbits $24 = 0xc gdb> up #1 ext4_da_write_end (file=0xffff88003f606a80, mapping=0xffff88001d3824e0, pos=0x108000, len=0x1000, copied=0x0, page=0\ xffffea0000d792e8, fsdata=0x0) at fs/ext4/inode.c:2512 2512 if (ext4_da_should_update_i_disksize(page, end)) { gdb> print start $25 = 0x0 gdb> print end $26 = 0xffffffffffffffff gdb> print pos $27 = 0x108000 gdb> print new_i_size $28 = 0x108000 gdb> print ((struct ext4_inode_info *)((char *)inode-((int)(&((struct ext4_inode_info *)0)->vfs_inode))))->i_disksize $29 = 0xd9000 gdb> down 2467 for (i = 0; i < idx; i++) gdb> print i $30 = 0xd44acbee This is 100% reproducible with some autonuma development code tuned in a very aggressive manner (not normal way even for knumad) which does "exotic" changes to the ptes. It wouldn't normally trigger but I don't see why it can't happen normally if the page is added to swap cache in between the two faults leading to "copied" being zero (which then hangs in ext4). So it should be fixed. Especially possible with lumpy reclaim (albeit disabled if compaction is enabled) as that would ignore the young bits in the ptes. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: display the correct mount option in /proc/mounts for [no]init_itableTheodore Ts'o2011-12-131-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | /proc/mounts was showing the mount option [no]init_inode_table when the correct mount option that will be accepted by parse_options() is [no]init_itable. Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
| * | | ext4: Fix crash due to getting bogus eh_depth value on big-endian systemsPaul Mackerras2011-12-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1939dd84b3 ("ext4: cleanup ext4_ext_grow_indepth code") added a reference to ext4_extent_header.eh_depth, but forget to pass the value read through le16_to_cpu. The result is a crash on big-endian machines, such as this crash on a POWER7 server: attempt to access beyond end of device sda8: rw=0, want=776392648163376, limit=168558560 Unable to handle kernel paging request for data at address 0x6b6b6b6b6b6b6bcb Faulting instruction address: 0xc0000000001f5f38 cpu 0x14: Vector: 300 (Data Access) at [c000001bd1aaecf0] pc: c0000000001f5f38: .__brelse+0x18/0x60 lr: c0000000002e07a4: .ext4_ext_drop_refs+0x44/0x80 sp: c000001bd1aaef70 msr: 9000000000009032 dar: 6b6b6b6b6b6b6bcb dsisr: 40000000 current = 0xc000001bd15b8010 paca = 0xc00000000ffe4600 pid = 19911, comm = flush-8:0 enter ? for help [c000001bd1aaeff0] c0000000002e07a4 .ext4_ext_drop_refs+0x44/0x80 [c000001bd1aaf090] c0000000002e0c58 .ext4_ext_find_extent+0x408/0x4c0 [c000001bd1aaf180] c0000000002e145c .ext4_ext_insert_extent+0x2bc/0x14c0 [c000001bd1aaf2c0] c0000000002e3fb8 .ext4_ext_map_blocks+0x628/0x1710 [c000001bd1aaf420] c0000000002b2974 .ext4_map_blocks+0x224/0x310 [c000001bd1aaf4d0] c0000000002b7f2c .mpage_da_map_and_submit+0xbc/0x490 [c000001bd1aaf5a0] c0000000002b8688 .write_cache_pages_da+0x2c8/0x430 [c000001bd1aaf720] c0000000002b8b28 .ext4_da_writepages+0x338/0x670 [c000001bd1aaf8d0] c000000000157280 .do_writepages+0x40/0x90 [c000001bd1aaf940] c0000000001ea830 .writeback_single_inode+0xe0/0x530 [c000001bd1aafa00] c0000000001eb680 .writeback_sb_inodes+0x210/0x300 [c000001bd1aafb20] c0000000001ebc84 .__writeback_inodes_wb+0xd4/0x140 [c000001bd1aafbe0] c0000000001ebfec .wb_writeback+0x2fc/0x3e0 [c000001bd1aafce0] c0000000001ed770 .wb_do_writeback+0x2f0/0x300 [c000001bd1aafdf0] c0000000001ed848 .bdi_writeback_thread+0xc8/0x340 [c000001bd1aafed0] c0000000000c5494 .kthread+0xb4/0xc0 [c000001bd1aaff90] c000000000021f48 .kernel_thread+0x54/0x70 This is due to getting ext_depth(inode) == 0x101 and therefore running off the end of the path array in ext4_ext_drop_refs into following unallocated structures. This fixes it by adding the necessary le16_to_cpu. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: fix ext4_end_io_dio() racing against fsync()Theodore Ts'o2011-12-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make sure iocb->private is cleared *before* we put the io_end structure on i_completed_io_list. Otherwise fsync() could potentially run on another CPU and free the iocb structure out from under us. Reported-by: Kent Overstreet <koverstreet@google.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-12-152-2/+7
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse: fuse: llseek fix race fuse: fix llseek bug fuse: fix fuse_retrieve
| * | | | fuse: llseek fix raceMiklos Szeredi2011-12-131-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix race between lseek(fd, 0, SEEK_CUR) and read/write. This was fixed in generic code by commit 5b6f1eb97d (vfs: lseek(fd, 0, SEEK_CUR) race condition). Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
| * | | | fuse: fix llseek bugRoel Kluin2011-12-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test in fuse_file_llseek() "not SEEK_CUR or not SEEK_SET" always evaluates to true. This was introduced in 3.1 by commit 06222e49 (fs: handle SEEK_HOLE/SEEK_DATA properly in all fs's that define their own llseek) and changed the behavior of SEEK_CUR and SEEK_SET to always retrieve the file attributes. This is a performance regression. Fix the test so that it makes sense. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org CC: Josef Bacik <josef@redhat.com> CC: Al Viro <viro@zeniv.linux.org.uk>
| * | | | fuse: fix fuse_retrieveMiklos Szeredi2011-12-131-1/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix two bugs in fuse_retrieve(): - retrieving more than one page would yield repeated instances of the first page - if more than FUSE_MAX_PAGES_PER_REQ pages were requested than the request page array would overflow fuse_retrieve() was added in 2.6.36 and these bugs had been there since the beginning. Signed-off-by: Miklos Szeredi <mszeredi@suse.cz> CC: stable@vger.kernel.org
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-12-158-61/+48
|\ \ \ \ | |_|/ / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: fs/ncpfs: fix error paths and goto statements in ncp_fill_super() configfs: register_filesystem() called too early fuse: register_filesystem() called too early ubifs: too early register_filesystem() ... and the same kind of leak for mqueue procfs: fix a vfsmount longterm reference leak
| * | | fs/ncpfs: fix error paths and goto statements in ncp_fill_super()Djalal Harouni2011-12-141-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The label 'out_bdi' should be followed by bdi_destroy() instead of fput() which should be after the 'out_fput' label. If bdi_setup_and_register() fails then jump to the 'out_fput' label instead of the 'out_bdi' one. If fget(data.info_fd) fails then jump to the previously fixed 'out_bdi' label to call bdi_destroy() otherwise the bdi object will not be destroyed. Compile tested only. Signed-off-by: Djalal Harouni <tixxdz@opendz.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | configfs: register_filesystem() called too earlyAl Viro2011-12-132-20/+18
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | fuse: register_filesystem() called too earlyAl Viro2011-12-131-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | same story as with ubifs Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | ubifs: too early register_filesystem()Al Viro2011-12-131-10/+8
| | | | | | | | | | | | | | | | | | | | | | | | doing that before you are ready to handle mount() is a Bad Idea(tm)... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | ... and the same kind of leak for mqueueAl Viro2011-12-092-10/+3
| | | | | | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * | | procfs: fix a vfsmount longterm reference leakAl Viro2011-12-091-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | kern_mount() doesn't pair with plain mntput()... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | | Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds2011-12-147-94/+37
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid" x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-aware
| * | | | Revert "x86, efi: Calling __pa() with an ioremap()ed address is invalid"Keith Packard2011-12-126-48/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This hangs my MacBook Air at boot time; I get no console messages at all. I reverted this on top of -rc5 and my machine boots again. This reverts commit e8c7106280a305e1ff2a3a8a4dfce141469fb039. Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Keith Packard <keithp@keithp.com> Acked-by: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg@redhat.com> Cc: Zhang Rui <rui.zhang@intel.com> Cc: Huang Ying <huang.ying.caritas@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: http://lkml.kernel.org/r/1321621751-3650-1-git-send-email-matt@console Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | x86, efi: Make efi_call_phys_{prelog,epilog} CONFIG_RELOCATABLE-awareMatt Fleming2011-12-101-46/+2
| | |/ / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | efi_call_phys_prelog() sets up a 1:1 mapping of the physical address range in swapper_pg_dir. Instead of replacing then restoring entries in swapper_pg_dir we should be using initial_page_table which already contains the 1:1 mapping. It's safe to blindly switch back to swapper_pg_dir in the epilog because the physical EFI routines are only called before efi_enter_virtual_mode(), e.g. before any user processes have been forked. Therefore, we don't need to track which pgd was in %cr3 when we entered the prelog. The previous code actually contained a bug because it assumed that the kernel was loaded at a physical address within the first 8MB of ram, usually at 0x100000. However, this isn't the case with a CONFIG_RELOCATABLE=y kernel which could have been loaded anywhere in the physical address space. Also delete the ancient (and bogus) comments about the page table being restored after the lock is released. There is no locking. Cc: Matthew Garrett <mjg@redhat.com> Cc: Darrent Hart <dvhart@linux.intel.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Link: http://lkml.kernel.org/r/1323346250.3894.74.camel@mfleming-mobl1.ger.corp.intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-12-1315-329/+239
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: add missing spin_unlock at ceph_mdsc_build_path() ceph: fix SEEK_CUR, SEEK_SET regression crush: fix mapping calculation when force argument doesn't exist ceph: use i_ceph_lock instead of i_lock rbd: remove buggy rollback functionality rbd: return an error when an invalid header is read ceph: fix rasize reporting by ceph_show_options
| * | | | ceph: add missing spin_unlock at ceph_mdsc_build_path()Yehuda Sadeh2011-12-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | one of the paths was missing spin_unlock Signed-off-by: Yehuda Sadeh <yehuda@hq.newdream.net>
| * | | | ceph: fix SEEK_CUR, SEEK_SET regressionSage Weil2011-12-131-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 06222e491e663dac939f04b125c9dc52126a75c4 got the if wrong so that it always evaluates as true. This is semantically harmless, but makes SEEK_CUR and SEEK_SET needlessly query the server. Rewrite the if to explicitly enumerate the cases we DO need a valid i_size to make this code less fragile. Reported-by: Roel Kluin <roel.kluin@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
| * | | | crush: fix mapping calculation when force argument doesn't existSage Weil2011-12-121-22/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the force argument isn't valid, we should continue calculating a mapping as if it weren't specified. Signed-off-by: Sage Weil <sage@newdream.net>
| * | | | ceph: use i_ceph_lock instead of i_lockSage Weil2011-12-0711-207/+212
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have been using i_lock to protect all kinds of data structures in the ceph_inode_info struct, including lists of inodes that we need to iterate over while avoiding races with inode destruction. That requires grabbing a reference to the inode with the list lock protected, but igrab() now takes i_lock to check the inode flags. Changing the list lock ordering would be a painful process. However, using a ceph-specific i_ceph_lock in the ceph inode instead of i_lock is a simple mechanical change and avoids the ordering constraints imposed by igrab(). Reported-by: Amon Ott <a.ott@m-privacy.de> Signed-off-by: Sage Weil <sage@newdream.net>
| * | | | rbd: remove buggy rollback functionalityJosh Durgin2011-12-072-97/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This doesn't interact with resizing well, since it doesn't set the size of the device to the size at the snapshot. It's also an expensive operation to be synchronous. Rollback can still be done with the userspace rbd tool. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| * | | | rbd: return an error when an invalid header is readJosh Durgin2011-12-071-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This protects against opening future rbd images that have incompatible format changes. Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
| * | | | ceph: fix rasize reporting by ceph_show_optionsSage Weil2011-12-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix typo. Reported-by: mowang da <whooya.xxl@gmail.com> Signed-off-by: Sage Weil <sage@newdream.net>
* | | | | Merge branch 'writeback-for-linus' of ↵Linus Torvalds2011-12-133-6/+37
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux * 'writeback-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/wfg/linux: writeback: set max_pause to lowest value on zero bdi_dirty writeback: permit through good bdi even when global dirty exceeded writeback: comment on the bdi dirty threshold fs: Make write(2) interruptible by a fatal signal writeback: Fix issue on make htmldocs
| * | | | | writeback: set max_pause to lowest value on zero bdi_dirtyWu Fengguang2011-12-081-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some trace shows lots of bdi_dirty=0 lines where it's actually some small value if w/o the accounting errors in the per-cpu bdi stats. In this case the max pause time should really be set to the smallest (non-zero) value to avoid IO queue underrun and improve throughput. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
| * | | | | writeback: permit through good bdi even when global dirty exceededWu Fengguang2011-12-081-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a system with 1 local mount and 1 NFS mount, if the NFS server becomes not responding when dd to the NFS mount, the NFS dirty pages may exceed the global dirty limit and _every_ task involving writing will be blocked. The whole system appears unresponsive. The workaround is to permit through the bdi's that only has a small number of dirty pages. The number chosen (bdi_stat_error pages) is not enough to enable the local disk to run in optimal throughput, however is enough to make the system responsive on a broken NFS mount. The user can then kill the dirtiers on the NFS mount and increase the global dirty limit to bring up the local disk's throughput. It risks allowing dirty pages to grow much larger than the global dirty limit when there are 1000+ mounts, however that's very unlikely to happen, especially in low memory profiles. Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
| * | | | | writeback: comment on the bdi dirty thresholdWu Fengguang2011-12-081-2/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We do "floating proportions" to let active devices to grow its target share of dirty pages and stalled/inactive devices to decrease its target share over time. It works well except in the case of "an inactive disk suddenly goes busy", where the initial target share may be too small. To mitigate this, bdi_position_ratio() has the below line to raise a small bdi_thresh when it's safe to do so, so that the disk be feed with enough dirty pages for efficient IO and in turn fast rampup of bdi_thresh: bdi_thresh = max(bdi_thresh, (limit - dirty) / 8); balance_dirty_pages() normally does negative feedback control which adjusts ratelimit to balance the bdi dirty pages around the target. In some extreme cases when that is not enough, it will have to block the tasks completely until the bdi dirty pages drop below bdi_thresh. Acked-by: Jan Kara <jack@suse.cz> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
| * | | | | fs: Make write(2) interruptible by a fatal signalJan Kara2011-12-021-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently write(2) to a file is not interruptible by any signal. Sometimes this is desirable, e.g. when you want to quickly kill a process hogging your disk. Also, with commit 499d05ecf990 ("mm: Make task in balance_dirty_pages() killable"), it's necessary to abort the current write accordingly to avoid it quickly dirtying lots more pages at unthrottled rate. This patch makes write interruptible by SIGKILL. We do not allow write to be interruptible by any other signal because that has larger potential of screwing some badly written applications. Reported-by: Kazuya Mio <k-mio@sx.jp.nec.com> Tested-by: Kazuya Mio <k-mio@sx.jp.nec.com> Acked-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
| * | | | | writeback: Fix issue on make htmldocsMarcos Paulo de Souza2011-11-291-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Document the @reason parameter to make "make htmldocs" happy. Acked-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Marcos Paulo de Souza <marcos.mage@gmail.com> Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
* | | | | | Merge branch 'fixes' of ↵Linus Torvalds2011-12-135-66/+99
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm * 'fixes' of http://ftp.arm.linux.org.uk/pub/linux/arm/kernel/git-cur/linux-2.6-arm: ARM: 7204/1: arch/arm/kernel/setup.c: initialize arm_dma_zone_size earlier ARM: 7185/1: perf: don't assign platform_device on unsupported CPUs ARM: 7187/1: fix unwinding for XIP kernels ARM: 7186/1: fix Kconfig issue with PHYS_OFFSET and !MMU
| * | | | | | ARM: 7204/1: arch/arm/kernel/setup.c: initialize arm_dma_zone_size earlierArnaud Patard2011-12-111-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | arm_dma_zone_size is used by arm_bootmem_free() which is called by paging_init(). Thus it needs to be set before calling it. Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Acked-by: Nicolas Pitre <nico@linaro.org> Cc: stable@kernel.org Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | ARM: 7185/1: perf: don't assign platform_device on unsupported CPUsWill Deacon2011-12-061-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In the unlikely case that a platform registers a PMU platform_device when running on a CPU that is unsupported by perf, we will encounter a NULL dereference when trying to assign the platform_device to the cpu_pmu structure. This patch checks that the CPU is supported by perf before assigning the platform_device. Reported-by: Pawel Moll <pawel.moll@arm.com> Signed-off-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | ARM: 7187/1: fix unwinding for XIP kernelsUwe Kleine-König2011-12-063-59/+88
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The linker places the unwind tables in readonly sections. So when using an XIP kernel these are located in ROM and cannot be modified. For that reason the current approach to convert the relative offsets in the unwind index to absolute addresses early in the boot process doesn't work with XIP. The offsets in the unwind index section are signed 31 bit numbers and the structs are sorted by this offset. So it first has offsets between 0x40000000 and 0x7fffffff (i.e. the negative offsets) and then offsets between 0x00000000 and 0x3fffffff. When seperating these two blocks the numbers are sorted even when interpreting the offsets as unsigned longs. So determine the first non-negative entry once and track that using the new origin pointer. The actual bisection can then use a plain unsigned long comparison. The only thing that makes the new bisection more complicated is that the offsets are relative to their position in the index section, so the key to search needs to be adapted accordingly in each step. Moreover several consts are added to catch future writes and rename the member "addr" of struct unwind_idx to "addr_offset" to better match the new semantic. (This has the additional benefit of breaking eventual users at compile time to make them aware of the change.) In my tests the new algorithm was a tad faster than the original and has the additional upside of not needing the initial conversion and so saves some boot time and it's possible to unwind even earlier. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
| * | | | | | ARM: 7186/1: fix Kconfig issue with PHYS_OFFSET and !MMUNicolas Pitre2011-12-061-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 1b9f95f8ade9 (ARM: prepare for removal of a bunch of <mach/memory.h> files) introduced CONFIG_PHYS_OFFSET but the Kconfig hex prompt did not provide a default value. This has the undesired side effect of breaking a reportedly used trick for updating defconfigs on the fly for routine buildtesting across all arch and all platforms, i.e. cp /path/to/somedefconfig .config ; yes "" | make oldconfig because the config system will endlessly loop until a valid address is provided. However we can't just pick a random default value since it is likely to be wrong for the majority of the boards as the right answer for this option is quite varied. So the fact that the config system insists on having a proper value be entered is actually a good thing. It turns out that only at91x40_defconfig has this problem because it has CONFIG_MMU=n. However, in the !MMU case, there is already a CONFIG_DRAM_BASE value that can be used here. So let's use that as a default in that case and suppress the redundant CONFIG_PHYS_OFFSET prompt. Eventually the DRAM_BASE config option could simply be replaced by PHYS_OFFSET directly, but that's a larger change better suited for later. Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Nicolas Pitre <nico@linaro.org> Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
* | | | | | | linux/log2.h: Fix rounddown_pow_of_two(1)Linus Torvalds2011-12-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Exactly like roundup_pow_of_two(1), the rounddown version was buggy for the case of a compile-time constant '1' argument. Probably because it originated from the same code, sharing history with the roundup version from before the bugfix (for that one, see commit 1a06a52ee1b0: "Fix roundup_pow_of_two(1)"). However, unlike the roundup version, the fix for rounddown is to just remove the broken special case entirely. It's simply not needed - the generic code 1UL << ilog2(n) does the right thing for the constant '1' argment too. The only reason roundup needed that special case was because rounding up does so by subtracting one from the argument (and then adding one to the result) causing the obvious problems with "ilog2(0)". But rounddown doesn't do any of that, since ilog2() naturally truncates (ie "rounds down") to the right rounded down value. And without the ilog2(0) case, there's no reason for the special case that had the wrong value. tl;dr: rounddown_pow_of_two(1) should be 1, not 0. Acked-by: Dmitry Torokhov <dtor@vmware.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | | | Merge branch 'hwmon-for-linus' of ↵Linus Torvalds2011-12-131-2/+2
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging: hwmon: (jz4740) Staticise jz4740_hwmon_driver hwmon: (jz4740) fix signedness bug