| Commit message (Collapse) | Author | Files | Lines |
|
Static storage is not required for the struct vmap_area in
__get_vm_area_node.
Removing "static" to store this variable on the stack instead.
Signed-off-by: Kautuk Consul <consul.kautuk@gmail.com>
Acked-by: David Rientjes <rientjes@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the BMC gets reset, it will return 0x80 response errors.
In less than a week
# grep "Error 80 on cmd 22" /var/log/kernel |wc -l
378681
In this case, it is probably a good idea to restore the IPMI settings.
Signed-off-by: Corey Minyard <cminyard@mvista.com>
Tested-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com>
Reported-by: Arkadiusz Miśkiewicz <a.miskiewicz@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
An integer overflow will happen on 64bit archs if task's sum of rss,
swapents and nr_ptes exceeds (2^31)/1000 value. This was introduced by
commit
f755a04 oom: use pte pages in OOM score
where the oom score computation was divided into several steps and it's no
longer computed as one expression in unsigned long(rss, swapents, nr_pte
are unsigned long), where the result value assigned to points(int) is in
range(1..1000). So there could be an int overflow while computing
176 points *= 1000;
and points may have negative value. Meaning the oom score for a mem hog task
will be one.
196 if (points <= 0)
197 return 1;
For example:
[ 3366] 0 3366 35390480 24303939 5 0 0 oom01
Out of memory: Kill process 3366 (oom01) score 1 or sacrifice child
Here the oom1 process consumes more than 24303939(rss)*4096~=92GB physical
memory, but it's oom score is one.
In this situation the mem hog task is skipped and oom killer kills another and
most probably innocent task with oom score greater than one.
The points variable should be of type long instead of int to prevent the
int overflow.
Signed-off-by: Frantisek Hrbata <fhrbata@redhat.com>
Acked-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Oleg Nesterov <oleg@redhat.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: <stable@vger.kernel.org> [2.6.36+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If the request is to create non-root group and we fail to meet it, we
should leave the root unchanged.
Signed-off-by: Hillf Danton <dhillf@gmail.com>
Acked-by: Hugh Dickins <hughd@google.com>
Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: Michal Hocko <mhocko@suse.cz>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
There is a potential integer overflow in nilfs_ioctl_clean_segments().
When a large argv[n].v_nmembs is passed from the userspace, the subsequent
call to vmalloc() will allocate a buffer smaller than expected, which
leads to out-of-bound access in nilfs_ioctl_move_blocks() and
lfs_clean_segments().
The following check does not prevent the overflow because nsegs is also
controlled by the userspace and could be very large.
if (argv[n].v_nmembs > nsegs * nilfs->ns_blocks_per_segment)
goto out_free;
This patch clamps argv[n].v_nmembs to UINT_MAX / argv[n].v_size, and
returns -EINVAL when overflow.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
commit 828b1c50ae ("nilfs2: add compat ioctl") incidentally broke all
other NILFS compat ioctls. Make them work again.
Signed-off-by: Thomas Meyer <thomas@m3y3r.de>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: <stable@vger.kernel.org> [3.0+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Kernels where MAX_NUMNODES > BITS_PER_LONG may temporarily see an empty
nodemask in a tsk's mempolicy if its previous nodemask is remapped onto a
new set of allowed cpuset nodes where the two nodemasks, as a result of
the remap, are now disjoint.
c0ff7453bb5c ("cpuset,mm: fix no node to alloc memory when changing
cpuset's mems") adds get_mems_allowed() to prevent the set of allowed
nodes from changing for a thread. This causes any update to a set of
allowed nodes to stall until put_mems_allowed() is called.
This stall is unncessary, however, if at least one node remains unchanged
in the update to the set of allowed nodes. This was addressed by
89e8a244b97e ("cpusets: avoid looping when storing to mems_allowed if one
node remains set"), but it's still possible that an empty nodemask may be
read from a mempolicy because the old nodemask may be remapped to the new
nodemask during rebind. To prevent this, only avoid the stall if there is
no mempolicy for the thread being changed.
This is a temporary solution until all reads from mempolicy nodemasks can
be guaranteed to not be empty without the get_mems_allowed()
synchronization.
Also moves the check for nodemask intersection inside task_lock() so that
tsk->mems_allowed cannot change. This ensures that nothing can set this
tsk's mems_allowed out from under us and also protects tsk->mempolicy.
Reported-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 53ab1c64983 ("IB/qib: Correct nfreectxts for multiple HCAs")
reversed the increments and decrements of dd->nfreectxts. Fix it.
Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
private_data_len is defined as a u8. If the user specifies a large
private_data size (> 220 bytes), we will calculate a total length that
exceeds 255, resulting in private_data_len wrapping back to 0. This
can lead to overwriting random kernel memory. Avoid this by verifying
that the resulting size fits into a u8.
Reported-by: B. Thery <benjamin.thery@bull.net>
Addresses: <http://bugs.openfabrics.org/bugzilla/show_bug.cgi?id=2335>
Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
|
|
I noticed that hotplug of one setup does not work with recent change in
pci tree.
After checking the bridge conf setup, I noticed that the bridges get
assigned but do not get enabled.
The reason is the following commit, while simply ignores bridge
resources when enabling a pci device:
| commit bbef98ab0f019f1b0c25c1acdf1683c68933d41b
| Author: Ram Pai <linuxram@us.ibm.com>
| Date: Sun Nov 6 10:33:10 2011 +0800
|
| PCI: defer enablement of SRIOV BARS
|...
| NOTE: Note, there is subtle change in the pci_enable_device() API. Any
| driver that depends on SRIOV BARS to be enabled in pci_enable_device()
| can fail.
Put back bridge resource and ROM resource checking to fix the problem.
That should fix regression like BIOS does not assign correct resource to
bridge.
Discussion can be found at:
http://www.spinics.net/lists/linux-pci/msg12874.html
Signed-off-by: Yinghai Lu <yinghai@kernel.org>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Before registering an adapter to i2c subsystem, we need make sure
driver is ready for incoming i2c xfer, becasue the i2c_add_adapter()
may trigger a i2c device driver's proble function which may start
some real i2c xfer. I met this issue when integrating a TSC2007 i2c
touch screen device with the i2c-eg20t driver.
This patch will call request_irq() and hw init before calling
i2c_add_adapter().
Signed-off-by: Feng Tang <feng.tang@intel.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
|
Currently the fifo depth is set to zero for OMAP4 which disables
the FIFO usage. This patch enables the FIFO usage for I2C transactions
on OMAP4 also.
Tested on omap4430 and 3430.
Tested-and-Reported-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Shubhrajyoti D <shubhrajyoti@ti.com>
Signed-off-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
|
s3c24xx_i2c_parse_dt_gpio is called when cfg_gpio is not defined
in the platform data of the i2c device. When DT is not enabled,
the above function always returns -EINVAL. Since there can be
some i2c devices which don't need to configure any gpio lines,
the probe of such devices would fail here. Changing the default
return value to success would fix this issue.
Signed-off-by: Tushar Behera <tushar.behera@linaro.org>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
|
Be a bit more friendly.
Signed-off-by: Mark Brown <broonie@opensource.wolfsonmicro.com>
Signed-off-by: Ben Dooks <ben-linux@fluff.org>
|
|
|
|
This reverts commit eb1711bb94991e93669c5a1b5f84f11be2d51ea1.
It blows up the i915 seqno tracking, resulting in the
BUG_ON(seqno == 0);
in i915_wait_request() triggering, which will cause lock-ups.
See for example
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/903010
https://lkml.org/lkml/2011/12/14/395
Reported-requested-and-tested-by: Dirk Hohndel <dirk@hohndel.org>
Reported-by: Richard Eames <Richard.Eames@flinders.edu.au>
Reported-by: Rocko Requin <rockorequin@hotmail.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Keith Packard <keithp@keithp.com>
Cc: Eric Anholt <eric@anholt.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Tests show that the original large intervals can easily make the dirty
limit exceeded on 100 concurrent dd's. So adapt to as large as the
next check point selected by the dirty throttling algorithm.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Some active adaptors (VGA usually) only have two lanes at 2.7GHz.
That's a maximum pixel clock of 144MHz at 8bpc, but 192MHz at 6bpc.
Fixes Asus UX31 panel being black at startup due to no valid modes since
dc22ee6fc18ce0f15424e753e8473c306ece95c1.
v2: Rebased to current code, resulting in the fix applying to EDP panels as
well. Also changed from spatio-temporal to just spatial dithering on
pre-ironlake, to be conssitent (and less visual flicker)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Tested-by: Eric Anholt <eric@anholt.net>
Tested-by: Dirk Hohndel <hohndel@infradead.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
This adds a default setting for semaphores parameter, and enables
semaphores by default on IVB.
For now, as semaphores interaction with VTd causes random issues on
SNB, we do not enable them by default. But they can still be enabled
via the semaphores=1 kernel parameter.
v2: enables semaphores on SNB when IO remapping is disabled, with base
on Keith Packard patch.
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
CC: Ben Widawsky <ben@bwidawsk.net>
CC: Keith Packard <keithp@keithp.com>
CC: Jesse Barnes <jbarnes@virtuousgeek.org>
CC: Chris Wilson <chris@chris-wilson.co.uk>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=42696
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=40564
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41353
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38862
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
This fixes a race where we may try to finish a page flip and decrement
the refcount even if our vblank_get failed and we ended up with a
spurious flip pending interrupt.
Fixes https://bugs.freedesktop.org/show_bug.cgi?id=34211.
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
RC6 should always work on IVB, and should work on SNB whenever IO
remapping is disabled. RC6 never works on Ironlake. Make the default
value for the parameter follow these guidelines. Setting the value
to either 0 or 1 will force the specified behavior.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Reviewed-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=38567
Cc: Ted Phelps <phelps@gnusto.com>
Cc: Peter <pab1612@gmail.com>
Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Cc: Andrew Lutomirski <luto@mit.edu>
|
|
In i915 driver, we do not enable either rc6 or semaphores on SNB when dmar
is enabled. The new 'intel_iommu_enabled' variable signals when the
iommu code is in operation.
Cc: Ted Phelps <phelps@gnusto.com>
Cc: Peter <pab1612@gmail.com>
Cc: Lukas Hejtmanek <xhejtman@fi.muni.cz>
Cc: Andrew Lutomirski <luto@mit.edu>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
We were checking whether the supplied edid matched the connector it was
read from. We do this in case a DDC read returns an EDID for another
device on a multifunction or otherwise interesting card. However, we
failed to include LVDS as a digital device and so rejecting an otherwise
valid EDID.
Fixes the detection of the secondary SDVO LVDS panel on the Libretto
W105.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Adam Jackson <ajax@redhat.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=39216
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
This prevents an in-kernel division by zero which happens when we are
asking for i915_chipset_val too quickly, or within a race condition
between the power monitoring thread and userspace accesses via debugfs.
The issue can be reproduced easily via the following command:
while ``; do cat /sys/kernel/debug/dri/0/i915_emon_status; done
This is particularly dangerous because it can be triggered by
a non-privileged user by just reading the debugfs entry.
This issue was also found independently by Konstantin Belousov
<kostikbel@gmail.com>, who proposed a similar patch.
Reported-by: Konstantin Belousov <kostikbel@gmail.com>
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Acked-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: <stable@vger.kernel.org>
Signed-off-by: Eugeni Dodonov <eugeni.dodonov@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
v2: add a CPT-specific macro, make code cleaner
v3: fix commit message
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=41272
Cc: stable@kernel.org
Signed-off-by: Paulo Zanoni <paulo.r.zanoni@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
https://bugzilla.redhat.com/show_bug.cgi?id=750006
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
|
|
Mike Galbraith reported that this recent commit:
commit 4dcfe1025b513c2c1da5bf5586adb0e80148f612
Author: Peter Zijlstra <peterz@infradead.org>
Date: Thu Nov 10 13:01:10 2011 +0100
sched: Avoid SMT siblings in select_idle_sibling() if possible
stopped selecting an idle SMT sibling when there are no idle
cores in a single socket system.
Intent of the select_idle_sibling() was to fallback to an idle
SMT sibling, if it fails to identify an idle core. But this
fallback was not happening on systems where all the scheduler
domains had `SD_SHARE_PKG_RESOURCES' flag set.
Fix it. Slightly bigger patch of cleaning all these goto's etc
is queued up for the next release.
Reported-by: Mike Galbraith <efault@gmx.de>
Reported-by: Alex Shi <alex.shi@intel.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Link: http://lkml.kernel.org/r/1323978421.1984.244.camel@sbsiddha-desk.sc.intel.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|
The bisection implemented in unwind_find_origin() stopped to early. If
there is only a single entry left to check the original code just took
the end point as origin which might be wrong.
This was introduced in commit de66a979012d ("ARM: 7187/1: fix unwinding
for XIP kernels").
Reported-and-tested-by: Nick Bowler <nbowler@elliptictech.com>
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The btrfs io submission threads can build up massive plug lists. This
keeps things more reasonable so we don't hand over huge dumps of IO at
once.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
per_cpu_ptr_to_phys() incorrectly rounds up its result for non-kmalloc
case to the page boundary, which is bogus for any non-page-aligned
address.
This affects the only in-tree user of this function - sysfs handler
for per-cpu 'crash_notes' physical address. The trouble is that the
crash_notes per-cpu variable is not page-aligned:
crash_notes = 0xc08e8ed4
PER-CPU OFFSET VALUES:
CPU 0: 3711f000
CPU 1: 37129000
CPU 2: 37133000
CPU 3: 3713d000
So, the per-cpu addresses are:
crash_notes on CPU 0: f7a07ed4 => phys 36b57ed4
crash_notes on CPU 1: f7a11ed4 => phys 36b4ded4
crash_notes on CPU 2: f7a1bed4 => phys 36b43ed4
crash_notes on CPU 3: f7a25ed4 => phys 36b39ed4
However, /sys/devices/system/cpu/cpu*/crash_notes says:
/sys/devices/system/cpu/cpu0/crash_notes: 36b57000
/sys/devices/system/cpu/cpu1/crash_notes: 36b4d000
/sys/devices/system/cpu/cpu2/crash_notes: 36b43000
/sys/devices/system/cpu/cpu3/crash_notes: 36b39000
As you can see, all values are rounded down to a page
boundary. Consequently, this is where kexec sets up the NOTE segments,
and thus where the secondary kernel is looking for them. However, when
the first kernel crashes, it saves the notes to the unaligned
addresses, where they are not found.
Fix it by adding offset_in_page() to the translated page address.
-tj: Combined Eugene's and Petr's commit messages.
Signed-off-by: Eugene Surovegin <ebs@ebshome.net>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Petr Tesarik <ptesarik@suse.cz>
Cc: stable@kernel.org
|
|
While probing, fd sets up queue, probes hardware and tears down the
queue if probing fails. In the process, blk_drain_queue() kicks the
queue which failed to finish initialization and fd is unhappy about
that.
floppy0: no floppy controllers found
------------[ cut here ]------------
WARNING: at drivers/block/floppy.c:2929 do_fd_request+0xbf/0xd0()
Hardware name: To Be Filled By O.E.M.
VFS: do_fd_request called on non-open device
Modules linked in:
Pid: 1, comm: swapper Not tainted 3.2.0-rc4-00077-g5983fe2 #2
Call Trace:
[<ffffffff81039a6a>] warn_slowpath_common+0x7a/0xb0
[<ffffffff81039b41>] warn_slowpath_fmt+0x41/0x50
[<ffffffff813d657f>] do_fd_request+0xbf/0xd0
[<ffffffff81322b95>] blk_drain_queue+0x65/0x80
[<ffffffff81322c93>] blk_cleanup_queue+0xe3/0x1a0
[<ffffffff818a809d>] floppy_init+0xdeb/0xe28
[<ffffffff818a72b2>] ? daring+0x6b/0x6b
[<ffffffff810002af>] do_one_initcall+0x3f/0x170
[<ffffffff81884b34>] kernel_init+0x9d/0x11e
[<ffffffff810317c2>] ? schedule_tail+0x22/0xa0
[<ffffffff815dbb14>] kernel_thread_helper+0x4/0x10
[<ffffffff81884a97>] ? start_kernel+0x2be/0x2be
[<ffffffff815dbb10>] ? gs_change+0xb/0xb
Avoid it by making blk_drain_queue() kick queue iff dispatch queue has
something on it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Ralf Hildebrandt <Ralf.Hildebrandt@charite.de>
Reported-by: Wu Fengguang <fengguang.wu@intel.com>
Tested-by: Sergei Trofimovich <slyich@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
btrfs_update_inode is sometimes called with a null reservation.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
This patch is functionally tested on Samsung SMDKV6410.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: David Dajun Chen <dchen@diasemi.com>
Signed-off-by: Ashish Jangam <ashish.jangam@kpitcummins.com>
[grant.likely: don't create an unnecessary header file]
Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
|
|
This fixes an odd bug found on a Dell PowerEdge 1850/0RC130
(BIOS A05 01/09/2006) where all of the modules doing pci_set_dma_mask
would fail with:
ata_piix 0000:00:1f.1: enabling device (0005 -> 0007)
ata_piix 0000:00:1f.1: can't derive routing for PCI INT A
ata_piix 0000:00:1f.1: BMDMA: failed to set dma mask, falling back to PIO
The issue was the Xen-SWIOTLB was allocated such as that the end of
buffer was stradling a page (and also above 4GB). The fix was
spotted by Kalev Leonid which was to piggyback on git commit
e79f86b2ef9c0a8c47225217c1018b7d3d90101c "swiotlb: Use page alignment
for early buffer allocation" which:
We could call free_bootmem_late() if swiotlb is not used, and
it will shrink to page alignment.
So alloc them with page alignment at first, to avoid lose two pages
And doing that fixes the outstanding issue.
CC: stable@kernel.org
Suggested-by: "Kalev, Leonid" <Leonid.Kalev@ca.com>
Reported-and-Tested-by: "Taylor, Neal E" <Neal.Taylor@ca.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
d312ae878b6a "xen: use maximum reservation to limit amount of usable RAM"
clamped the total amount of RAM to the current maximum reservation. This is
correct for dom0 but is not correct for guest domains. In order to boot a guest
"pre-ballooned" (e.g. with memory=1G but maxmem=2G) in order to allow for
future memory expansion the guest must derive max_pfn from the e820 provided by
the toolstack and not the current maximum reservation (which can reflect only
the current maximum, not the guest lifetime max). The existing algorithm
already behaves this correctly if we do not artificially limit the maximum
number of pages for the guest case.
For a guest booted with maxmem=512, memory=128 this results in:
[ 0.000000] BIOS-provided physical RAM map:
[ 0.000000] Xen: 0000000000000000 - 00000000000a0000 (usable)
[ 0.000000] Xen: 00000000000a0000 - 0000000000100000 (reserved)
-[ 0.000000] Xen: 0000000000100000 - 0000000008100000 (usable)
-[ 0.000000] Xen: 0000000008100000 - 0000000020800000 (unusable)
+[ 0.000000] Xen: 0000000000100000 - 0000000020800000 (usable)
...
[ 0.000000] NX (Execute Disable) protection: active
[ 0.000000] DMI not present or invalid.
[ 0.000000] e820 update range: 0000000000000000 - 0000000000010000 (usable) ==> (reserved)
[ 0.000000] e820 remove range: 00000000000a0000 - 0000000000100000 (usable)
-[ 0.000000] last_pfn = 0x8100 max_arch_pfn = 0x1000000
+[ 0.000000] last_pfn = 0x20800 max_arch_pfn = 0x1000000
[ 0.000000] initial memory mapped : 0 - 027ff000
[ 0.000000] Base memory trampoline at [c009f000] 9f000 size 4096
-[ 0.000000] init_memory_mapping: 0000000000000000-0000000008100000
-[ 0.000000] 0000000000 - 0008100000 page 4k
-[ 0.000000] kernel direct mapping tables up to 8100000 @ 27bb000-27ff000
+[ 0.000000] init_memory_mapping: 0000000000000000-0000000020800000
+[ 0.000000] 0000000000 - 0020800000 page 4k
+[ 0.000000] kernel direct mapping tables up to 20800000 @ 26f8000-27ff000
[ 0.000000] xen: setting RW the range 27e8000 - 27ff000
[ 0.000000] 0MB HIGHMEM available.
-[ 0.000000] 129MB LOWMEM available.
-[ 0.000000] mapped low ram: 0 - 08100000
-[ 0.000000] low ram: 0 - 08100000
+[ 0.000000] 520MB LOWMEM available.
+[ 0.000000] mapped low ram: 0 - 20800000
+[ 0.000000] low ram: 0 - 20800000
With this change "xl mem-set <domain> 512M" will successfully increase the
guest RAM (by reducing the balloon).
There is no change for dom0.
Reported-and-Tested-by: George Shuklin <george.shuklin@gmail.com>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: stable@kernel.org
Reviewed-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
|
|
A user reported a problem booting into a new kernel with the old format inodes.
He was panicing in cow_file_range while writing out the inode cache. This is
because if the block group is not cached we'll just skip writing out the cache,
however if it gets dirtied again in the same transaction and it finished caching
we'd go ahead and write it out, but since we set cache_generation to the transid
we think we've already truncated it and will just carry on, running into
cow_file_range and blowing up. We need to make sure we only set
cache_generation if we've done the truncate. The user tested this patch and
verified that the panic no longer occured. Thanks,
Reported-and-Tested-by: Klaus Bitto <klaus.bitto@gmail.com>
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
I've been hitting this BUG_ON() in btrfs_orphan_add when running xfstest 269 in
a loop. This is because we will add an orphan item, do the truncate, the
truncate will fail for whatever reason (*cough*ENOSPC*cough*) and then we're
left with an orphan item still in the fs. Then we come back later to do another
truncate and it blows up because we already have an orphan item. This is ok so
just fix the BUG_ON() to only BUG() if ret is not EEXIST. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
We were occasionaly leaking space when running xfstest 269. This is because if
we failed to start the transaction in the truncate loop we'd just goto out, but
we need to break so that the inode is removed from the orphan list and the space
is properly freed. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
Running xfstests 269 with some tracing my scripts kept spitting out errors about
releasing bytes that we didn't actually have reserved. This took me down a huge
rabbit hole and it turns out the way we deal with reserved_extents is wrong,
we need to only be setting it if the reservation succeeds, otherwise the free()
method will come in and unreserve space that isn't actually reserved yet, which
can lead to other warnings and such. The math was all working out right in the
end, but it caused all sorts of other issues in addition to making my scripts
yell and scream and generally make it impossible for me to track down the
original issue I was looking for. The other problem is with our error handling
in the reservation code. There are two cases that we need to deal with
1) We raced with free. In this case free won't free anything because csum_bytes
is modified before we dro the lock in our reservation path, so free rightly
doesn't release any space because the reservation code may be depending on that
reservation. However if we fail, we need the reservation side to do the free at
that point since that space is no longer in use. So as it stands the code was
doing this fine and it worked out, except in case #2
2) We don't race with free. Nobody comes in and changes anything, and our
reservation fails. In this case we didn't reserve anything anyway and we just
need to clean up csum_bytes but not free anything. So we keep track of
csum_bytes before we drop the lock and if it hasn't changed we know we can just
decrement csum_bytes and carry on.
Because of the case where we can race with free()'s since we have to drop our
spin_lock to do the reservation, I'm going to serialize all reservations with
the i_mutex. We already get this for free in the heavy use paths, truncate and
file write all hold the i_mutex, just needed to add it to page_mkwrite and
various ioctl/balance things. With this patch my space leak scripts no longer
scream bloody murder. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
Now that we're properly keeping track of delayed inode space we've been getting
a lot of warnings out of btrfs_dirty_inode() when running xfstest 83. This is
because a bunch of people call mark_inode_dirty, which is void so we can't
return ENOSPC. This needs to be fixed in a few areas
1) file_update_time - this updates the mtime and such when writing to a file,
which will call mark_inode_dirty. So copy file_update_time into btrfs so we can
call btrfs_dirty_inode directly and return an error if we get one appropriately.
2) fix symlinks to use btrfs_setattr for ->setattr. For some reason we weren't
setting ->setattr for symlinks, even though we should have been. This catches
one of the cases where we were getting errors in mark_inode_dirty.
3) Fix btrfs_setattr and btrfs_setsize to call btrfs_dirty_inode directly
instead of mark_inode_dirty. This lets us return errors properly for truncate
and chown/anything related to setattr.
4) Add a new btrfs_fs_dirty_inode which will just call btrfs_dirty_inode and
print an error if we have one. The only remaining user we can't control for
this is touch_atime(), but we don't really want to keep people from walking
down the tree if we don't have space to save the atime update, so just complain
but don't worry about it.
With this patch xfstests 83 complains a handful of times instead of hundreds of
times. Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
Al pointed out we have some random problems with the way we account for
num_workers_starting in the async thread stuff. First of all we need to make
sure to decrement num_workers_starting if we fail to start the worker, so make
__btrfs_start_workers do this. Also fix __btrfs_start_workers so that it
doesn't call btrfs_stop_workers(), there is no point in stopping everybody if we
failed to create a worker. Also check_pending_worker_creates needs to call
__btrfs_start_work in it's work function since it already increments
num_workers_starting.
People only start one worker at a time, so get rid of the num_workers argument
everywhere, and make btrfs_queue_worker a void since it will always succeed.
Thanks,
Signed-off-by: Josef Bacik <josef@redhat.com>
|
|
The Smack LSM hook for security_d_instantiate checks
the inode's i_op->getxattr value to determine if the
containing filesystem supports extended attributes.
The BTRFS filesystem sets the inode's i_op value only
after it has instantiated the inode. This results in
Smack incorrectly giving new BTRFS inodes attributes
from the filesystem defaults on the assumption that
values can't be stored on the filesystem. This patch
moves the assignment of inode operation vectors ahead
of the calls to d_instantiate, letting Smack know that
the filesystem supports extended attributes. There
should be no impact on the performance or behavior of
BTRFS.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
If we have a constant stream of end_io completions or crc work,
we can hit softlockup messages from the async helper threads. This
adds a cond_resched() into the loop to avoid them.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
To reproduce the bug:
# touch /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800
# chattr +i /mnt/tmp
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:43.198105295 +0800
# umount /mnt
# mount /dev/loop1 /mnt
# stat /mnt/tmp | grep Change
Change: 2011-12-09 09:32:23.412105981 +0800
We should update ctime of in-memory inode before calling
btrfs_update_inode().
Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Since we have the free space caches, btrfs_orphan_cleanup also runs for
the tree_root. Unfortunately this also cleans up the orphans used to mark
subvol deletions in progress.
Currently if a subvol deletion gets interrupted twice by umount/mount, the
deletion will not be continued and the space permanently lost, though it
would be possible to write a tool to recover those lost subvol deletions.
This patch checks if the orphan belongs to a subvol (dead root) and skips
the deletion.
Signed-off-by: Arne Jansen <sensille@gmx.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
When we use raid0 as the data profile, df command may show us a very
inaccurate value of the available space, which may be much less than the
real one. It may make the users puzzled. Fix it by changing the calculation
of the available space, and making it be more similar to a fake chunk
allocation.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Btrfsck report errors after the 83th case of xfstests was run, The error
number is 400, it means the used disk space of the file is wrong.
The reason of this bug is that:
The file truncation may fail when the space of the file system is not enough,
and leave some file extents, whose offset are beyond the end of the files.
When we want to expand those files, we will drop those file extents, and
put in dummy file extents, and then we should update the i-node. But btrfs
forgets to do it.
This patch adds the forgotten i-node update.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Btrfsck report error 100 after the 83th case of xfstests was run, it means
the i_size of the file is wrong.
The reason of this bug is that:
Btrfs increased i_size of the file at the beginning, but it failed to expand
the file, and failed to update the i_size to the old size because there is no
enough space in the file system, so we found a wrong i_size.
This patch fixes this bug by updating the i_size just when we pass the file
expanding and get enough space to update i-node.
Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
|
Several people with this chipset have reported inconsistent/sloppy
values for position reporting when the DMA position buffer is used,
and that setting position_fix=1 have fixed their problems.
BugLink: https://bugs.launchpad.net/bugs/825709
Signed-off-by: David Henningsson <david.henningsson@canonical.com>
Signed-off-by: Takashi Iwai <tiwai@suse.de>
|