| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'x86-asm-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86: Mark atomic irq ops raw for 32bit legacy
x86: Merge show_regs()
x86: Macroise x86 cache descriptors
x86-32: clean up rwsem inline asm statements
x86: Merge asm/atomic_{32,64}.h
x86: Sync asm/atomic_32.h and asm/atomic_64.h
x86: Split atomic64_t functions into seperate headers
x86-64: Modify memcpy()/memset() alternatives mechanism
x86-64: Modify copy_user_generic() alternatives mechanism
x86: Lift restriction on the location of FIX_BTMAP_*
x86, core: Optimize hweight32()
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The atomic ops emulation for 32bit legacy CPUs floods the tracer with
irq off/on entries. The irq disabled regions are short and therefor
not interesting when chasing long irq disabled latencies. Mark them
raw and keep them out of the trace.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Using kernel_stack_pointer() allows 32-bit and 64-bit versions to
be merged. This is more correct for 64-bit, since the old %rsp is
always saved on the stack.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1263397555-27695-1-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Use a macro to define the cache sizes when cachesize > 1 MB.
This is less typing, and less prone to introducing bugs like we
saw in e02e0e1a130b9ca37c5186d38ad4b3aaf58bb149, and means we
don't have to do maths when adding new non-power-of-2 updates
like those seen recently.
Signed-off-by: Dave Jones <davej@redhat.com>
LKML-Reference: <20100104144735.GA18390@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This makes gcc use the right register names and instruction operand sizes
automatically for the rwsem inline asm statements.
So instead of using "(%%eax)" to specify the memory address that is the
semaphore, we use "(%1)" or similar. And instead of forcing the operation
to always be 32-bit, we use "%z0", taking the size from the actual
semaphore data structure itself.
This doesn't actually matter on x86-32, but if we want to use the same
inline asm for x86-64, we'll need to have the compiler generate the proper
64-bit names for the registers (%rax instead of %eax), and if we want to
use a 64-bit counter too (in order to avoid the 15-bit limit on the
write counter that limits concurrent users to 32767 threads), we'll need
to be able to generate instructions with "q" accesses rather than "l".
Since this header currently isn't enabled on x86-64, none of that matters,
but we do want to use the xadd version of the semaphores rather than have
to take spinlocks to do a rwsem. The mm->mmap_sem can be heavily contended
when you have lots of threads all taking page faults, and the fallback
rwsem code that uses a spinlock performs abysmally badly in that case.
[ hpa: modified the patch to skip size suffixes entirely when they are
redundant due to register operands. ]
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <alpine.LFD.2.00.1001121613560.17145@localhost.localdomain>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Merge the now identical code from asm/atomic_32.h and asm/atomic_64.h
into asm/atomic.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-4-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Prepare for merging into asm/atomic.h.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-3-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Split atomic64_t functions out into separate headers, since they will
not be practical to merge between 32 and 64 bits.
Signed-off-by: Brian Gerst <brgerst@gmail.com>
LKML-Reference: <1262883215-4034-2-git-send-email-brgerst@gmail.com>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to avoid unnecessary chains of branches, rather than
implementing memcpy()/memset()'s access to their alternative
implementations via a jump, patch the (larger) original function
directly.
The memcpy() part of this is slightly subtle: while alternative
instruction patching does itself use memcpy(), with the
replacement block being less than 64-bytes in size the main loop
of the original function doesn't get used for copying memcpy_c()
over memcpy(), and hence we can safely write over its beginning.
Also note that the CFI annotations are fine for both variants of
each of the functions.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB8D30200007800026AF2@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In order to avoid unnecessary chains of branches, rather than
implementing copy_user_generic() as a function consisting of
just a single (possibly patched) branch, instead properly deal
with patching call instructions in the alternative instructions
framework, and move the patching into the callers.
As a follow-on, one could also introduce something like
__EXPORT_SYMBOL_ALT() to avoid patching call sites in modules.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Nick Piggin <npiggin@suse.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB8180200007800026AE7@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The early ioremap fixmap entries cover half (or for 32-bit
non-PAE, a quarter) of a page table, yet they got
uncondtitionally aligned so far to a 256-entry boundary. This is
not necessary if the range of page table entries anyway falls
into a single page table.
This buys back, for (theoretically) 50% of all configurations
(25% of all non-PAE ones), at least some of the lowmem
necessarily lost with commit e621bd18958ef5dbace3129ebe17a0a475e127d9.
Signed-off-by: Jan Beulich <jbeulich@novell.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <4B2BB66F0200007800026AD6@vpn.id2.novell.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Optimize hweight32 by using the same technique in hweight64.
The proof of this technique can be found in the commit log for
f9b4192923fa6e38331e88214b1fe5fc21583fcc ("bitops: hweight()
speedup").
The userspace benchmark on x86_32 showed 20% speedup with
bitmap_weight() which uses hweight32 to count bits for each
unsigned long on 32bit architectures.
int main(void)
{
#define SZ (1024 * 1024 * 512)
static DECLARE_BITMAP(bitmap, SZ) = {
[0 ... 100] = 1,
};
return bitmap_weight(bitmap, SZ);
}
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
LKML-Reference: <1258603932-4590-1-git-send-email-akinobu.mita@gmail.com>
[ only x86 sets ARCH_HAS_FAST_MULTIPLIER so we do this via the x86 tree]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
sched: Fix SCHED_MC regression caused by change in sched cpu_power
sched: Don't use possibly stale sched_class
kthread, sched: Remove reference to kthread_create_on_cpu
sched: cpuacct: Use bigger percpu counter batch values for stats counters
percpu_counter: Make __percpu_counter_add an inline function on UP
sched: Remove member rt_se from struct rt_rq
sched: Change usage of rt_rq->rt_se to rt_rq->tg->rt_se[cpu]
sched: Remove unused update_shares_locked()
sched: Use for_each_bit
sched: Queue a deboosted task to the head of the RT prio queue
sched: Implement head queueing for sched_rt
sched: Extend enqueue_task to allow head queueing
sched: Remove USER_SCHED
sched: Fix the place where group powers are updated
sched: Assume *balance is valid
sched: Remove load_balance_newidle()
sched: Unify load_balance{,_newidle}()
sched: Add a lock break for PREEMPT=y
sched: Remove from fwd decls
sched: Remove rq_iterator from move_one_task
...
Fix up trivial conflicts in kernel/sched.c
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On platforms like dual socket quad-core platform, the scheduler load
balancer is not detecting the load imbalances in certain scenarios. This
is leading to scenarios like where one socket is completely busy (with
all the 4 cores running with 4 tasks) and leaving another socket
completely idle. This causes performance issues as those 4 tasks share
the memory controller, last-level cache bandwidth etc. Also we won't be
taking advantage of turbo-mode as much as we would like, etc.
Some of the comparisons in the scheduler load balancing code are
comparing the "weighted cpu load that is scaled wrt sched_group's
cpu_power" with the "weighted average load per task that is not scaled
wrt sched_group's cpu_power". While this has probably been broken for a
longer time (for multi socket numa nodes etc), the problem got aggrevated
via this recent change:
|
| commit f93e65c186ab3c05ce2068733ca10e34fd00125e
| Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
| Date: Tue Sep 1 10:34:32 2009 +0200
|
| sched: Restore __cpu_power to a straight sum of power
|
Also with this change, the sched group cpu power alone no longer reflects
the group capacity that is needed to implement MC, MT performance
(default) and power-savings (user-selectable) policies.
We need to use the computed group capacity (sgs.group_capacity, that is
computed using the SD_PREFER_SIBLING logic in update_sd_lb_stats()) to
find out if the group with the max load is above its capacity and how
much load to move etc.
Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
[ -v2: build fix ]
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: <stable@kernel.org> # [2.6.32.x, 2.6.33.x]
LKML-Reference: <1266970432.11588.22.camel@sbs-t61.sc.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
setscheduler() saves task->sched_class outside of the rq->lock held
region for a check after the setscheduler changes have become
effective. That might result in checking a stale value.
rtmutex_setprio() has the same problem, though it is protected by
p->pi_lock against setscheduler(), but for correctness sake (and to
avoid bad examples) it needs to be fixed as well.
Retrieve task->sched_class inside of the rq->lock held region.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: stable@kernel.org
|
| |\ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Conflicts: kernel/sched.c
Necessary due to the urgent fixes which conflict with the code move
from sched.c to sched_fair.c
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
kthread_create_on_cpu doesn't exist so update a comment in
kthread.c to reflect this.
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: Rusty Russell <rusty@rustcorp.com.au>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100209040740.GB3702@kryten>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
When CONFIG_VIRT_CPU_ACCOUNTING and CONFIG_CGROUP_CPUACCT are
enabled we can call cpuacct_update_stats with values much larger
than percpu_counter_batch. This means the call to
percpu_counter_add will always add to the global count which is
protected by a spinlock and we end up with a global spinlock in
the scheduler.
Based on an idea by KOSAKI Motohiro, this patch scales the batch
value by cputime_one_jiffy such that we have the same batch
limit as we would if CONFIG_VIRT_CPU_ACCOUNTING was disabled.
His patch did this once at boot but that initialisation happened
too early on PowerPC (before time_init) and it was never updated
at runtime as a result of a hotplug cpu add/remove.
This patch instead scales percpu_counter_batch by
cputime_one_jiffy at runtime, which keeps the batch correct even
after cpu hotplug operations. We cap it at INT_MAX in case of
overflow.
For architectures that do not support
CONFIG_VIRT_CPU_ACCOUNTING, cputime_one_jiffy is the constant 1
and gcc is smart enough to optimise min(s32
percpu_counter_batch, INT_MAX) to just percpu_counter_batch at
least on x86 and PowerPC. So there is no need to add an #ifdef.
On a 64 thread PowerPC box with CONFIG_VIRT_CPU_ACCOUNTING and
CONFIG_CGROUP_CPUACCT enabled, a context switch microbenchmark
is 234x faster and almost matches a CONFIG_CGROUP_CPUACCT
disabled kernel:
CONFIG_CGROUP_CPUACCT disabled: 16906698 ctx switches/sec
CONFIG_CGROUP_CPUACCT enabled: 61720 ctx switches/sec
CONFIG_CGROUP_CPUACCT + patch: 16663217 ctx switches/sec
Tested with:
wget http://ozlabs.org/~anton/junkcode/context_switch.c
make context_switch
for i in `seq 0 63`; do taskset -c $i ./context_switch & done
vmstat 1
Signed-off-by: Anton Blanchard <anton@samba.org>
Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Tested-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Even though batch isn't used on UP, we may want to pass one in
to keep the SMP and UP code paths similar. Convert
__percpu_counter_add to an inline function so we wont get
variable unused warnings if we do.
Signed-off-by: Anton Blanchard <anton@samba.org>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| |\ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Merge reason: Merge dependent fix, update to latest -rc.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
It's a duplicate of tg->rt_se[cpu] and the only usage is
sched_rt_rq_dequeue() and sched_rt_rq_enqueue(). After the
first patch to those two function. rt_se can be removed.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282258q38781619u653ca4a7dd267347@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
This is the first step to remove rt_rq member rt_se because it have the
same meaning with tg->rt_se[cpu]. And the latter style is also used by
the fair scheduling class.
Signed-off-by: Yong Zhang <yong.zhang0@gmail.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <2674af741001282257r28c97a92o9f90cf16fe8d3d84@mail.gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Commit f492e12ef050e02bf0185b6b57874992591b9be1 ("sched: Remove
load_balance_newidle()") removed the only user of this function,
so remove it too.
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1265019219.24455.128.camel@laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
No change in functionality.
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
LKML-Reference: <1264938810-4173-1-git-send-email-akinobu.mita@gmail.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
rtmutex_set_prio() is used to implement priority inheritance for
futexes. When a task is deboosted it gets enqueued at the tail of its
RT priority list. This is violating the POSIX scheduling semantics:
rt priority list X contains two runnable tasks A and B
task A runs with priority X and holds mutex M
task C preempts A and is blocked on mutex M
-> task A is boosted to priority of task C (Y)
task A unlocks the mutex M and deboosts itself
-> A is dequeued from rt priority list Y
-> A is enqueued to the tail of rt priority list X
task C schedules away
task B runs
This is wrong as task A did not schedule away and therefor violates
the POSIX scheduling semantics.
Enqueue the task to the head of the priority list instead.
Reported-by: Mathias Weber <mathias.weber.mw1@roche.com>
Reported-by: Carsten Emde <cbe@osadl.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.809074113@linutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.
Implement the functionality in sched_rt.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.772169931@linutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The ability of enqueueing a task to the head of a SCHED_FIFO priority
list is required to fix some violations of POSIX scheduling policy.
Extend the related functions with a "head" argument.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Tested-by: Carsten Emde <cbe@osadl.org>
Tested-by: Mathias Weber <mathias.weber.mw1@roche.com>
LKML-Reference: <20100120171629.734886007@linutronix.de>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Remove the USER_SCHED feature. It has been scheduled to be removed in
2.6.34 as per http://marc.info/?l=linux-kernel&m=125728479022976&w=2
Signed-off-by: Dhaval Giani <dhaval.giani@gmail.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1263990378.24844.3.camel@localhost>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
We want to update the sched_group_powers when balance_cpu == this_cpu.
Currently the group powers are updated only if the balance_cpu is the
first CPU in the local group. But balance_cpu = this_cpu could also be
the first idle cpu in the group. Hence fix the place where the group
powers are updated.
Signed-off-by: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Joel Schopp <jschopp@austin.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1264017764.5717.127.camel@jschopp-laptop>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since all load_balance() callers will have !NULL balance parameters we
can now assume so and remove a few checks.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The two functions: load_balance{,_newidle}() are very similar, with the
following differences:
- rq->lock usage
- sb->balance_interval updates
- *balance check
So remove the load_balance_newidle() call with load_balance(.idle =
CPU_NEWLY_IDLE), explicitly unlock the rq->lock before calling (would be
done by double_lock_balance() anyway), and ignore the other differences
for now.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
load_balance() and load_balance_newidle() look remarkably similar, one
key point they differ in is the condition on when to active balance.
So split out that logic into a separate function.
One side effect is that previously load_balance_newidle() used to fail
and return -1 under these conditions, whereas now it doesn't. I've not
yet fully figured out the whole -1 return case for either
load_balance{,_newidle}().
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since load-balancing can hold rq->locks for quite a long while, allow
breaking out early when there is lock contention.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Move code around to get rid of fwd declarations.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Again, since we only iterate the fair class, remove the abstraction.
Since this is the last user of the rq_iterator, remove all that too.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Since we only ever iterate the fair class, do away with this abstraction.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Take out the sched_class methods for load-balancing.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
Straight fwd code movement.
Since non of the load-balance abstractions are used anymore, do away with
them and simplify the code some. In preparation move the code around.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
kernel/sched: don't expose local functions
The get_rr_interval_* functions are all class methods of
struct sched_class. They are not exported so make them
static.
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Cc: Peter Zijlstra <peterz@infradead.org>
LKML-Reference: <201001132021.53253.hartleys@visionengravers.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fixes a warning when building with g++:
warning: deprecated conversion from string constant to 'char*'
And the file parameter use is constant, so mark it as such.
Signed-off-by: Simon Kagstrom <simon.kagstrom@netinsight.net>
Cc: peterz@infradead.org
LKML-Reference: <20091223110818.442d848e@marrow.netinsight.se>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \ \
| | |_|/
| |/| |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'sched-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix race between ttwu() and task_rq_lock()
sched: Fix SMT scheduler regression in find_busiest_queue()
sched: Fix sched_mv_power_savings for !SMT
kernel/sched.c: Suppress unused var warning
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Thomas found that due to ttwu() changing a task's cpu without holding
the rq->lock, task_rq_lock() might end up locking the wrong rq.
Avoid this by serializing against TASK_WAKING.
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266241712.15770.420.camel@laptop>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Fix a SMT scheduler performance regression that is leading to a scenario
where SMT threads in one core are completely idle while both the SMT threads
in another core (on the same socket) are busy.
This is caused by this commit (with the problematic code highlighted)
commit bdb94aa5dbd8b55e75f5a50b61312fe589e2c2d1
Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Tue Sep 1 10:34:38 2009 +0200
sched: Try to deal with low capacity
@@ -4203,15 +4223,18 @@ find_busiest_queue()
...
for_each_cpu(i, sched_group_cpus(group)) {
+ unsigned long power = power_of(i);
...
- wl = weighted_cpuload(i);
+ wl = weighted_cpuload(i) * SCHED_LOAD_SCALE;
+ wl /= power;
- if (rq->nr_running == 1 && wl > imbalance)
+ if (capacity && rq->nr_running == 1 && wl > imbalance)
continue;
On a SMT system, power of the HT logical cpu will be 589 and
the scheduler load imbalance (for scenarios like the one mentioned above)
can be approximately 1024 (SCHED_LOAD_SCALE). The above change of scaling
the weighted load with the power will result in "wl > imbalance" and
ultimately resulting in find_busiest_queue() return NULL, causing
load_balance() to think that the load is well balanced. But infact
one of the tasks can be moved to the idle core for optimal performance.
We don't need to use the weighted load (wl) scaled by the cpu power to
compare with imabalance. In that condition, we already know there is only a
single task "rq->nr_running == 1" and the comparison between imbalance,
wl is to make sure that we select the correct priority thread which matches
imbalance. So we really need to compare the imabalnce with the original
weighted load of the cpu and not the scaled load.
But in other conditions where we want the most hammered(busiest) cpu, we can
use scaled load to ensure that we consider the cpu power in addition to the
actual load on that cpu, so that we can move the load away from the
guy that is getting most hammered with respect to the actual capacity,
as compared with the rest of the cpu's in that busiest group.
Fix it.
Reported-by: Ma Ling <ling.ma@intel.com>
Initial-Analysis-by: Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1266023662.2808.118.camel@sbs-t61.sc.intel.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |/
| |/|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Fix for sched_mc_powersavigs for pre-Nehalem platforms.
Child sched domain should clear SD_PREFER_SIBLING if parent will have
SD_POWERSAVINGS_BALANCE because they are contradicting.
Sets the flags correctly based on sched_mc_power_savings.
Signed-off-by: Vaidyanathan Srinivasan <svaidy@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100208100555.GD2931@dirshya.in.ibm.com>
Cc: stable@kernel.org [2.6.32.x]
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
On UP:
kernel/sched.c: In function 'wake_up_new_task':
kernel/sched.c:2631: warning: unused variable 'cpu'
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (172 commits)
perf_event, amd: Fix spinlock initialization
perf_event: Fix preempt warning in perf_clock()
perf tools: Flush maps on COMM events
perf_events, x86: Split PMU definitions into separate files
perf annotate: Handle samples not at objdump output addr boundaries
perf_events, x86: Remove superflous MSR writes
perf_events: Simplify code by removing cpu argument to hw_perf_group_sched_in()
perf_events, x86: AMD event scheduling
perf_events: Add new start/stop PMU callbacks
perf_events: Report the MMAP pgoff value in bytes
perf annotate: Defer allocating sym_priv->hist array
perf symbols: Improve debugging information about symtab origins
perf top: Use a macro instead of a constant variable
perf symbols: Check the right return variable
perf/scripts: Tag syscall_name helper as not yet available
perf/scripts: Add perf-trace-python Documentation
perf/scripts: Remove unnecessary PyTuple resizes
perf/scripts: Add syscall tracing scripts
perf/scripts: Add Python scripting engine
perf/scripts: Remove check-perf-trace from listed scripts
...
Fix trivial conflict in tools/perf/util/probe-event.c
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Avoid kernels from exploding on AMD machines when they have any
lock debugging bits enabled.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
A recent commit introduced a preemption warning for
perf_clock(), use raw_smp_processor_id() to avoid this, it
really doesn't matter which cpu we use here.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1267198583.22519.684.camel@laptop>
Cc: <stable@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Even though we don't register the counters until the child is right about
to exec(), we're still going to get at least a few events while the
fork()'d child is still executing 'perf' and in particular we're going to
get the MMAP events.
We can't distinguish the ones in the newly executed process because the
PID will be the same.
One way to solve this would be to have a PERF_RECORD_EXEC event, and when
this is seen 'perf' can flush it's map cache. We can't use
PERF_RECORD_COMM since that's generated by other things, not just exec().
Actually, thinking about it some more, using PERF_RECORD_COMM might be a
good enough approximation.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frédéric Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
LKML-Reference: <1267196914-16238-1-git-send-email-acme@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Split amd,p6,intel into separate files so that we can easily deal with
CONFIG_CPU_SUP_* things, needed to make things build now that perf_event.c
relies on symbols from amd.c
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
|