linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	sched: Implement hierarchical task accounting for SCHED_OTHER	Paul Turner	2011-08-14	4	-5/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduce hierarchical task accounting for the group scheduling case in CFS, as well as promoting the responsibility for maintaining rq->nr_running to the scheduling classes. The primary motivation for this is that with scheduling classes supporting bandwidth throttling it is possible for entities participating in throttled sub-trees to not have root visible changes in rq->nr_running across activate and de-activate operations. This in turn leads to incorrect idle and weight-per-task load balance decisions. This also allows us to make a small fixlet to the fastpath in pick_next_task() under group scheduling. Note: this issue also exists with the existing sched_rt throttling mechanism. This patch does not address that. Signed-off-by: Paul Turner <pjt@google.com> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110721184756.878333391@google.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched/cpupri: Remove cpupri->pri_active	Yong Zhang	2011-08-14	2	-4/+1
\| \| \| \| \| \| \| \| \| \| \|	Since [sched/cpupri: Remove the vec->lock], member pri_active of struct cpupri is not needed any more, just remove it. Also clean stuff related to it. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110806001004.GA2207@zhy Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched/cpupri: Fix memory barriers for vec updates to always be in order	Steven Rostedt	2011-08-14	1	-3/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ This patch actually compiles. Thanks to Mike Galbraith for pointing that out. I compiled and booted this patch with no issues. ] Re-examining the cpupri patch, I see there's a possible race because the update of the two priorities vec->counts are not protected by a memory barrier. When a RT runqueue is overloaded and wants to push an RT task to another runqueue, it scans the RT priority vectors in a loop from lowest priority to highest. When we queue or dequeue an RT task that changes a runqueue's highest priority task, we update the vectors to show that a runqueue is rated at a different priority. To do this, we first set the new priority mask, and increment the vec->count, and then set the old priority mask by decrementing the vec->count. If we are lowering the runqueue's RT priority rating, it will trigger a RT pull, and we do not care if we miss pushing to this runqueue or not. But if we raise the priority, but the priority is still lower than an RT task that is looking to be pushed, we must make sure that this runqueue is still seen by the push algorithm (the loop). Because the loop reads from lowest to highest, and the new priority is set before the old one is cleared, we will either see the new or old priority set and the vector will be checked. But! Since there's no memory barrier between the updates of the two, the old count may be decremented first before the new count is incremented. This means the loop may see the old count of zero and skip it, and also the new count of zero before it was updated. A possible runqueue that the RT task could move to could be missed. A conditional memory barrier is placed between the vec->count updates and is only called when both updates are done. The smp_wmb() has also been changed to smp_mb__before_atomic_inc/dec(), as they are not needed by archs that already synchronize atomic_inc/dec(). The smp_rmb() has been moved to be called at every iteration of the loop so that the race between seeing the two updates is visible by each iteration of the loop, as an arch is free to optimize the reading of memory of the counters in the loop. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1312547269.18583.194.camel@gandalf.stny.rr.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched/cpupri: Remove the vec->lock	Steven Rostedt	2011-08-14	2	-26/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sched/cpupri: Remove the vec->lock The cpupri vec->lock has been showing up as a top contention lately. This is because of the RT push/pull logic takes an agressive approach for migrating RT tasks. The cpupri logic is in place to improve the performance of the push/pull when dealing with large number CPU machines. The problem though is a vec->lock is required, where a vec is a global per RT priority structure. That is, if there are lots of RT tasks at the same priority, every time they are added or removed from the RT queue, this global vec->lock is taken. Now that more kernel threads are becoming RT (RCU boost and threaded interrupts) this is becoming much more of an issue. There are two variables that are being synced by the vec->lock. The cpupri bitmask, and the vec->counter. The cpupri bitmask is one bit per priority. If a RT priority vec has a process queued, then the vec->count is > 0 and the cpupri bitmask is set for that RT priority. If the cpupri bitmask gets out of sync with the vec->counter, we could end up pushing a low proirity RT task to a high priority queue. That RT task that could have run immediately could be queued on a run queue with a higher priority task indefinitely. The solution is not to use the cpupri bitmask and just look at the vec->count directly when doing a pull. The cpupri bitmask is just a fast way to scan the RT priorities when a pull is made. Instead of using the bitmask, and just examine all RT priorities, and look at the vec->counts, we could eliminate the vec->lock. The scan of RT tasks is to find a run queue that we can push an RT task to, and we do not push to a high priority queue, thus the scan only needs to go from 1 to RT task->prio, and not all 100 RT priorities. The push algorithm, which does the scan of RT priorities (and scan of the bitmask) only happens when we have an overloaded RT run queue (more than one RT task queued). The grabbing of the vec->lock happens every time any RT task is queued or dequeued on the run queue for that priority. The slowing down of the scan by not using a bitmask is negligible by the speed up of removing the vec->lock contention, and replacing it with an atomic counter and memory barrier. To prove this, I wrote a patch that times both the loop and the code that grabs the vec->locks. I passed the patches to various people (and companies) to test and show the results. I let everyone choose their own load to test, giving different loads on the system, for various different setups. Here's some of the results: (snipping to a few CPUs to not make this change log huge, but the results were consistent across the entire system). System 1 (24 CPUs) Before patch: CPU: Name Count Max Min Average Total ---- ---- ----- --- --- ------- ----- [...] cpu 20: loop 3057 1.766 0.061 0.642 1963.170 vec 6782949 90.469 0.089 0.414 2811760.503 cpu 21: loop 2617 1.723 0.062 0.641 1679.074 vec 6782810 90.499 0.089 0.291 1978499.900 cpu 22: loop 2212 1.863 0.063 0.699 1547.160 vec 6767244 85.685 0.089 0.435 2949676.898 cpu 23: loop 2320 2.013 0.062 0.594 1380.265 vec 6781694 87.923 0.088 0.431 2928538.224 After patch: cpu 20: loop 2078 1.579 0.061 0.533 1108.006 vec 6164555 5.704 0.060 0.143 885185.809 cpu 21: loop 2268 1.712 0.065 0.575 1305.248 vec 6153376 5.558 0.060 0.187 1154960.469 cpu 22: loop 1542 1.639 0.095 0.533 823.249 vec 6156510 5.720 0.060 0.190 1172727.232 cpu 23: loop 1650 1.733 0.068 0.545 900.781 vec 6170784 5.533 0.060 0.167 1034287.953 All times are in microseconds. The 'loop' is the amount of time spent doing the loop across the priorities (before patch uses bitmask). the 'vec' is the amount of time in the code that requires grabbing the vec->lock. The second patch just does not have the vec lock, but encompasses the same code. Amazingly the loop code even went down on average. The vec code went from .5 down to .18, that's more than half the time spent! Note, more than one test was run, but they all had the same results. System 2 (64 CPUs) Before patch: CPU: Name Count Max Min Average Total ---- ---- ----- --- --- ------- ----- cpu 60: loop 0 0 0 0 0 vec 5410840 277.954 0.084 0.782 4232895.727 cpu 61: loop 0 0 0 0 0 vec 4915648 188.399 0.084 0.570 2803220.301 cpu 62: loop 0 0 0 0 0 vec 5356076 276.417 0.085 0.786 4214544.548 cpu 63: loop 0 0 0 0 0 vec 4891837 170.531 0.085 0.799 3910948.833 After patch: cpu 60: loop 0 0 0 0 0 vec 5365118 5.080 0.021 0.063 340490.267 cpu 61: loop 0 0 0 0 0 vec 4898590 1.757 0.019 0.071 347903.615 cpu 62: loop 0 0 0 0 0 vec 5737130 3.067 0.021 0.119 687108.734 cpu 63: loop 0 0 0 0 0 vec 4903228 1.822 0.021 0.071 348506.477 The test run during the measurement did not have any (very few, from other CPUs) RT tasks pushing. But this shows that it helped out tremendously with the contention, as the contention happens because the vec->lock is taken only on queuing at an RT priority, and different CPUs that queue tasks at the same priority will have contention. I tested on my own 4 CPU machine with the following results: Before patch: CPU: Name Count Max Min Average Total ---- ---- ----- --- --- ------- ----- cpu 0: loop 2377 1.489 0.158 0.588 1398.395 vec 4484 770.146 2.301 4.396 19711.755 cpu 1: loop 2169 1.962 0.160 0.576 1250.110 vec 4425 152.769 2.297 4.030 17834.228 cpu 2: loop 2324 1.749 0.155 0.559 1299.799 vec 4368 779.632 2.325 4.665 20379.268 cpu 3: loop 2325 1.629 0.157 0.561 1306.113 vec 4650 408.782 2.394 4.348 20222.577 After patch: CPU: Name Count Max Min Average Total ---- ---- ----- --- --- ------- ----- cpu 0: loop 2121 1.616 0.113 0.636 1349.189 vec 4303 1.151 0.225 0.421 1811.966 cpu 1: loop 2130 1.638 0.178 0.644 1372.927 vec 4627 1.379 0.235 0.428 1983.648 cpu 2: loop 2056 1.464 0.165 0.637 1310.141 vec 4471 1.311 0.217 0.433 1937.927 cpu 3: loop 2154 1.481 0.162 0.601 1295.083 vec 4236 1.253 0.230 0.425 1803.008 This was running my migrate.c code that can be found at: http://lwn.net/Articles/425763/ The migrate code does stress the RT tasks a bit. This shows that the loop did increase a little after the patch, but not by much. The vec code dropped dramatically. From 4.3us down to .42us. That's a 10x improvement! Tested-by: Mike Galbraith <mgalbraith@suse.de> Tested-by: Luis Claudio R. Gonçalves <lgoncalv@redhat.com> Tested-by: Matthew Hank Sabins<msabins@linux.vnet.ibm.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Gregory Haskins <gregory.haskins@gmail.com> Acked-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Chris Mason <chris.mason@oracle.com> Link: http://lkml.kernel.org/r/1312317372.18583.101.camel@gandalf.stny.rr.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Use pushable_tasks to determine next highest prio	Steven Rostedt	2011-08-14	1	-43/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Hillf Danton proposed a patch (see link) that cleaned up the sched_rt code that calculates the priority of the next highest priority task to be used in finding run queues to pull from. His patch removed the calculating of the next prio to just use the current prio when deteriming if we should examine a run queue to pull from. The problem with his patch was that it caused more false checks. Because we check a run queue for pushable tasks if the current priority of that run queue is higher in priority than the task about to run on our run queue. But after grabbing the locks and doing the real check, we find that there may not be a task that has a higher prio task to pull. Thus the locks were taken with nothing to do. I added some trace_printks() to record when and how many times the run queue locks were taken to check for pullable tasks, compared to how many times we pulled a task. With the current method, it was: 3806 locks taken vs 2812 pulled tasks With Hillf's patch: 6728 locks taken vs 2804 pulled tasks The number of times locks were taken to pull a task went up almost double with no more success rate. But his patch did get me thinking. When we look at the priority of the highest task to consider taking the locks to do a pull, a failure to pull can be one of the following: (in order of most likely) o RT task was pushed off already between the check and taking the lock o Waiting RT task can not be migrated o RT task's CPU affinity does not include the target run queue's CPU o RT task's priority changed between the check and taking the lock And with Hillf's patch, the thing that caused most of the failures, is the RT task to pull was not at the right priority to pull (not greater than the current RT task priority on the target run queue). Most of the above cases we can't help. But the current method does not check if the next highest prio RT task can be migrated or not, and if it can not, we still grab the locks to do the test (we don't find out about this fact until after we have the locks). I thought about this case, and realized that the pushable task plist that is maintained only holds RT tasks that can migrate. If we move the calculating of the next highest prio task from the inc/dec_rt_task() functions into the queuing of the pushable tasks, then we only measure the priorities of those tasks that we push, and we get this basically for free. Not only does this patch make the code a little more efficient, it cleans it up and makes it a little simpler. Thanks to Hillf Danton for inspiring me on this patch. Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hillf Danton <dhillf@gmail.com> Cc: Gregory Haskins <ghaskins@novell.com> Link: http://lkml.kernel.org/r/BANLkTimQ67180HxCx5vgMqumqw1EkFh3qg@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Balance RT tasks when forked as well	Steven Rostedt	2011-08-14	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When a new task is woken, the code to balance the RT task is currently skipped in the select_task_rq() call. But it will be pushed if the rq is currently overloaded with RT tasks anyway. The issue is that we already queued the task, and if it does get pushed, it will have to be dequeued and requeued on the new run queue. The advantage with pushing it first is that we avoid this requeuing as we are pushing it off before the task is ever queued. See commit 318e0893ce3f524 ("sched: pre-route RT tasks on wakeup") for more details. The return of select_task_rq() when it is not a wake up has also been changed to return task_cpu() instead of smp_processor_id(). This is more of a sanity because the current only other user of select_task_rq() besides wake ups, is an exec, where task_cpu() should also be the same as smp_processor_id(). But if it is used for other purposes, lets keep the task on the same CPU. Why would we mant to migrate it to the current CPU? Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Hillf Danton <dhillf@gmail.com> Link: http://lkml.kernel.org/r/20110617015919.832743148@goodmis.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Remove resetting exec_start in put_prev_task_rt()	Hillf Danton	2011-08-14	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	There's no reason to clean the exec_start in put_prev_task_rt() as it is reset when the task gets back to the run queue. This saves us doing a store() in the fast path. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/BANLkTimqWD=q6YnSDi-v9y=LMWecgEzEWg@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched, rt: Fix rq->rt.pushable_tasks bug in push_rt_task()	Hillf Danton	2011-08-14	1	-7/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	Do not call dequeue_pushable_task() when failing to push an eligible task, as it remains pushable, merely not at this particular moment. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Mike Galbraith <mgalbraith@gmx.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Yong Zhang <yong.zhang0@gmail.com> Link: http://lkml.kernel.org/r/1306895385.4791.26.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Remove noop in lowest_flag_domain()	Hillf Danton	2011-08-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Checking for the validity of sd is removed, since it is already checked by the for_each_domain macro. Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/BANLkTimT+Tut-3TshCDm-NiLLXrOznibNA@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Remove noop in next_prio()	Hillf Danton	2011-08-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	When computing the next priority for a given run-queue, the check for RT priority of the task determined by the pick_next_highest_task_rt() function could be removed, since only RT tasks are returned by the function. Reviewed-by: Yong Zhang <yong.zhang0@gmail.com> Signed-off-by: Hillf Danton <dhillf@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/BANLkTimxmWiof9s5AvS3v_0X+sMiE=0x5g@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: fix broken SCHED_RESET_ON_FORK handling	Mike Galbraith	2011-08-14	1	-13/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Setting child->prio = current->normal_prio _after_ SCHED_RESET_ON_FORK has been handled for an RT parent gives birth to a deranged mutant child with non-RT policy, but RT prio and sched_class. Move PI leakage protection up, always set priorities and weight, and if the child is leaving RT class, reset rt_priority to the proper value. Signed-off-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1311779695.8691.2.camel@marge.simson.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Kill WAKEUP_PREEMPT	Yong Zhang	2011-08-14	2	-13/+1
\| \| \| \| \| \| \| \| \| \| \|	Remove the WAKEUP_PREEMPT feature, disabling it doesn't make any sense and its outlived its use by a long long while. Signed-off-by: Yong Zhang <yong.zhang0@gmail.com> Acked-by: Mike Galbraith <efault@gmx.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110729082033.GB12106@zhy Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	sched: Remove rq->avg_load_per_task	Jan H. Schönherr	2011-08-14	1	-6/+2
\| \| \| \| \| \| \| \| \| \|	Since commit a2d47777 ("sched: fix stale value in average load per task") the variable rq->avg_load_per_task is no longer required. Remove it. Signed-off-by: Jan H. Schönherr <schnhrr@cs.tu-berlin.de> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312189408-17172-1-git-send-email-schnhrr@cs.tu-berlin.de Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc	Linus Torvalds	2011-08-12	1	-6/+7
\|\ \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: Don't do hypervisor calls on non-sun4v in DS driver.
\| *	sparc: Don't do hypervisor calls on non-sun4v in DS driver.	David S. Miller	2011-08-12	1	-6/+7
\| \| \| \| \| \| \| \| \| \|	Reported-by: Pieter-Paul Giesberts <pieterpg@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* \|	pnfs: Automatically select blocks & objects layouts	Boaz Harrosh	2011-08-12	1	-14/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just like files-layout, blocks & objects layouts are part of the NFS 4.1 protocol and should be automatically selected if NFS_4_1 is selected. The small problem is that these depend on other Kernel support being present, while files only depends on NFS itself. This patch removes from the user choice the presence of objects and blocks layout. But makes sure these are selected only if the depended subsystems are present in the Kernel. Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Acked-by: Peng Tao <peng_tao@emc.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	ext4: Properly count journal credits for long symlinks	Eric Sandeen	2011-08-12	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit df5e6223407e ("ext4: fix deadlock in ext4_symlink() in ENOSPC conditions") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in jbd2_journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	ext3: Properly count journal credits for long symlinks	Eric Sandeen	2011-08-12	1	-2/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit ae54870a1dc9 ("ext3: Fix lock inversion in ext3_symlink()") recalculated the number of credits needed for a long symlink, in the process of splitting it into two transactions. However, the first credit calculation under-counted because if selinux is enabled, credits are needed to create the selinux xattr as well. Overrunning the reservation will result in an OOPS in journal_dirty_metadata() due to this assert: J_ASSERT_JH(jh, handle->h_buffer_credits > 0); Fix this by increasing the reservation size. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	move RLIMIT_NPROC check from set_user() to do_execve_common()	Vasiliy Kulikov	2011-08-11	5	-8/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The patch http://lkml.org/lkml/2003/7/13/226 introduced an RLIMIT_NPROC check in set_user() to check for NPROC exceeding via setuid() and similar functions. Before the check there was a possibility to greatly exceed the allowed number of processes by an unprivileged user if the program relied on rlimit only. But the check created new security threat: many poorly written programs simply don't check setuid() return code and believe it cannot fail if executed with root privileges. So, the check is removed in this patch because of too often privilege escalations related to buggy programs. The NPROC can still be enforced in the common code flow of daemons spawning user processes. Most of daemons do fork()+setuid()+execve(). The check introduced in execve() (1) enforces the same limit as in setuid() and (2) doesn't create similar security issues. Neil Brown suggested to track what specific process has exceeded the limit by setting PF_NPROC_EXCEEDED process flag. With the change only this process would fail on execve(), and other processes' execve() behaviour is not changed. Solar Designer suggested to re-check whether NPROC limit is still exceeded at the moment of execve(). If the process was sleeping for days between set*uid() and execve(), and the NPROC counter step down under the limit, the defered execve() failure because NPROC limit was exceeded days ago would be unexpected. If the limit is not exceeded anymore, we clear the flag on successful calls to execve() and fork(). The flag is also cleared on successful calls to set_user() as the limit was exceeded for the previous user, not the current one. Similar check was introduced in -ow patches (without the process flag). v3 - clear PF_NPROC_EXCEEDED on successful calls to set_user(). Reviewed-by: James Morris <jmorris@namei.org> Signed-off-by: Vasiliy Kulikov <segoon@openwall.com> Acked-by: NeilBrown <neilb@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	Merge branch 'perf-urgent-for-linus' of ↵	Linus Torvalds	2011-08-11	16	-68/+239
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: perf symbols: Check '/tmp/perf-' symbol file ownership perf sched: Usage leftover from trace -> script rename perf sched: Do not delete session object prematurely perf tools: Check $HOME/.perfconfig ownership perf, x86: Add model 45 SandyBridge support perf tools: Add support to install perf python extension perf tools: do not look at ./config for configuration perf tools: Make clean leaves some files perf lock: Dropping unsupported ':r' modifier perf probe: Fix coredump introduced by probe module option jump label: Reduce the cycle count by changing the link order perf report: Use ui__warning in some more places perf python: Add PERF_RECORD_{LOST,READ,SAMPLE} routine tables perf evlist: Introduce 'disable' method trace events: Update version number reference to new 3.x scheme for EVENT_POWER_TRACING_DEPRECATED perf buildid-cache: Zero out buffer of filenames when adding/removing buildid
\| * \	Merge branch 'perf/core' of ↵	Ingo Molnar	2011-08-10	3	-12/+50
\| \|\ \ \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
\| \| * \|	perf symbols: Check '/tmp/perf-' symbol file ownership	Pekka Enberg	2011-08-09	1	-0/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The external symbol files are generated by JIT compilers, for example, but we need to make sure they're ours before injecting them to 'perf report'. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1312919658-17158-1-git-send-email-penberg@kernel.org Signed-off-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf sched: Usage leftover from trace -> script rename	Jiri Olsa	2011-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 'perf sched' command usage still showing 'trace' command instead of the 'script' command. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110809124651.GD2056@jolsa.brq.redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf sched: Do not delete session object prematurely	Jiri Olsa	2011-08-09	1	-7/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The session object is released prematurely when processing events for latency command. The session's thread objects are used within the output_lat_thread function. Runnning following commands: # perf sched record # perf sched latency the latter displays incorrect data and might cause access violation. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312837414-3819-1-git-send-email-jolsa@redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf tools: Check $HOME/.perfconfig ownership	Arnaldo Carvalho de Melo	2011-08-09	1	-4/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Just like we do already for perf.data files. Requested-by: Ingo Molnar <mingo@elte.hu> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Christian Ohm <chr.ohm@gmx.net> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jonathan Nieder <jrnieder@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qgokmxsmvppwpc5404qhyk7e@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| * \| \|	Merge branch 'perf/core' of ↵	Ingo Molnar	2011-08-09	5	-23/+46
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
\| \| * \|	perf tools: Add support to install perf python extension	Jiri Olsa	2011-08-08	2	-7/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Adding install-python_ext target to install python extension related files. Installation directory is governed by python distutils package and follows the DESTDIR variable settings. Also moving python extension build output into '$(O)python_ext_build' directory and making it configurable via PYTHON_EXTBUILD variable. Keeping the '$(O)python/perf.so' file, so it could be used for testing as of until now. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110722113307.GA1931@jolsa.brq.redhat.com Signed-off-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf tools: do not look at ./config for configuration	Jonathan Nieder	2011-08-08	1	-7/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In addition to /etc/perfconfig and $HOME/.perfconfig, perf looks for configuration in the file ./config, imitating git which looks at $GIT_DIR/config. If ./config is not a perf configuration file, it fails, or worse, treats it as a configuration file and changes behavior in some unexpected way. "config" is not an unusual name for a file to be lying around and perf does not have a private directory dedicated for its own use, so let's just stop looking for configuration in the cwd. Callers needing context-sensitive configuration can use the PERF_CONFIG environment variable. Requested-by: Christian Ohm <chr.ohm@gmx.net> Cc: 632923@bugs.debian.org Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Christian Ohm <chr.ohm@gmx.net> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110805165838.GA7237@elie.gateway.2wire.net Signed-off-by: Jonathan Nieder <jrnieder@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf tools: Make clean leaves some files	Kusanagi Kouichi	2011-08-08	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use LIB_OBJS and BUILTIN_OBJS for .o files. LIB_FILE is already prefixed with OUTPUT. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/20110807083932.9C0E514C03B@msa103.auone-net.jp Signed-off-by: Kusanagi Kouichi <slash@ac.auone-net.jp> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf lock: Dropping unsupported ':r' modifier	Zhu Yanhai	2011-08-08	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Looks to me like the :r modifier is not supported anymore, so remove it from the list of events. Without this fix 'perf lock record' doesn't work. Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Zhu Yanhai <gaoyang.zyh@taobao.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312035232-9534-1-git-send-email-gaoyang.zyh@taobao.com Signed-off-by: Zhu Yanhai <gaoyang.zyh@taobao.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf probe: Fix coredump introduced by probe module option	Jovi Zhang	2011-08-08	1	-4/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	perf will coredump if the user doesn't give the "-m" option in probe command, this patch fixes it. [root@localhost perf]# ./perf probe --add='PROBE' Segmentation fault (core dumped) Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1311602888-2389-1-git-send-email-bookjovi@gmail.com Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| * \| \|	perf, x86: Add model 45 SandyBridge support	Youquan Song	2011-08-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support to Romely-EP SandyBridge. Signed-off-by: Youquan Song <youquan.song@intel.com> Signed-off-by: Anhua Xu <anhua.xu@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Link: http://lkml.kernel.org/r/1312264895-2010-1-git-send-email-youquan.song@intel.com Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| * \| \|	jump label: Reduce the cycle count by changing the link order	Jason Baron	2011-08-05	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In the course of testing jump labels for use with the CFS bandwidth controller, Paul Turner, discovered that using jump labels reduced the branch count and the instruction count, but did not reduce the cycle count or wall time. I noticed that having the jump_label.o included in the kernel but not used in any way still caused this increase in cycle count and wall time. Thus, I moved jump_label.o in the kernel/Makefile, thus changing the link order, and presumably moving it out of hot icache areas. This brought down the cycle count/time as expected. In addition to Paul's testing, I've tested the patch using a single 'static_branch()' in the getppid() path, and basically running tight loops of calls to getppid(). Here are my results for the branch disabled case: With jump labels turned on (CONFIG_JUMP_LABEL), branch disabled: Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs): 3,969,510,217 instructions # 0.864 IPC ( +-0.000% ) 4,592,334,954 cycles ( +- 0.046% ) 751,634,470 branches ( +- 0.000% ) 1.722635797 seconds time elapsed ( +- 0.046% ) Jump labels turned off (CONFIG_JUMP_LABEL not set), branch disabled: Performance counter stats for 'bash -c /tmp/getppid;true' (50 runs): 4,009,611,846 instructions # 0.867 IPC ( +-0.000% ) 4,622,210,580 cycles ( +- 0.012% ) 771,662,904 branches ( +- 0.000% ) 1.734341454 seconds time elapsed ( +- 0.022% ) Signed-off-by: Jason Baron <jbaron@redhat.com> Cc: rth@redhat.com Cc: a.p.zijlstra@chello.nl Cc: rostedt@goodmis.org Link: http://lkml.kernel.org/r/20110805204040.GG2522@redhat.com Signed-off-by: Ingo Molnar <mingo@elte.hu> Tested-by: Paul Turner <pjt@google.com>
\| * \| \|	Merge branch 'perf/core' of ↵	Ingo Molnar	2011-08-05	6	-32/+140
\| \|\\| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent
\| \| * \|	perf report: Use ui__warning in some more places	Arnaldo Carvalho de Melo	2011-08-03	1	-8/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So that we get a proper warning in the TUI in cases like: $ perf report --stdio -g fractal,0.5,caller --sort pid Selected -g but no callchain data. Did you call 'perf record' without -g? $ The --stdio case is ok because it uses fprintf, ui__warning is needed to figure out if --stdio or --tui is being used. Cc: Arun Sharma <asharma@fb.com> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sam Liao <phyomh@gmail.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-ag9fz2wd17mbbfjsbznq1wms@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf python: Add PERF_RECORD_{LOST,READ,SAMPLE} routine tables	Arnaldo Carvalho de Melo	2011-07-25	1	-3/+112
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	So those friggin "spurious" PERF_RECORD_MMAP events were actually a brain fart copy'n'paste error in the python binding, doh. I.e. they weren't MMAPs, just SAMPLEs. Fix it by providing routines for these events instead of using the MMAP ones. Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-b0rc8y5jd03f9f11kftodvkm@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf evlist: Introduce 'disable' method	Arnaldo Carvalho de Melo	2011-07-25	3	-17/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	To remove the last case of access to the FD() macro outside the library. Inspired by a patch by Borislav that moved the FD() macro to util.h, for namespace concerns I rather preferred to constrain it to ev{sel,list}.c. Cc: Borislav Petkov <bp@amd64.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-qn893qsstcg366tkucu649qj@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| \| * \|	perf buildid-cache: Zero out buffer of filenames when adding/removing buildid	Han Pingtian	2011-07-22	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The readlink() function doesn't append a null byte to buf. So we should zero out buf with zalloc(). Or we'll see sometimes error like this: [root@intel-s3e36-01]~# /usr/bin/perf buildid-cache -a /lib/modules/2.6.32-130.el6.x86_64/kernel/crypto/twofish_common.ko -v Adding f64ba8efd5f53c7ad332fc17db1d21de309038e1 /lib/modules/2.6.32-130.el6.x86_64/kernel/crypto/twofish_common.ko: Ok [root@intel-s3e36-01]~# /usr/bin/perf buildid-cache -r /lib/modules/2.6.32-130.el6.x86_64/kernel/crypto/twofish_common.ko -v Removing f64ba8efd5f53c7ad332fc17db1d21de309038e1 /lib/modules/2.6.32-130.el6.x86_64/kernel/crypto/twofish_common.ko: FAIL /lib/modules/2.6.32-130.el6.x86_64/kernel/crypto/twofish_common.ko wasn't in the cache The change in build_id_cache__add_s() is a defense. Tested-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lkml.kernel.org/r/20110718031314.GA5802@hpt.nay.redhat.com Signed-off-by: Han Pingtian <phan@redhat.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
\| * \| \|	Merge branch 'linus' into perf/urgent	Ingo Molnar	2011-08-05	8119	-374350/+487400
\| \|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Merge reason: Include most of the merge window trees, to do fixes on top. Signed-off-by: Ingo Molnar <mingo@elte.hu>
\| * \| \| \|	trace events: Update version number reference to new 3.x scheme for ↵	Jesper Juhl	2011-07-25	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	EVENT_POWER_TRACING_DEPRECATED What was scheduled to be 2.6.41 is now going to be 3.1 . Signed-off-by: Jesper Juhl <jj@chaosbits.net> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/alpine.LNX.2.00.1107250929370.8080@swampdragon.chaosbits.net Signed-off-by: Ingo Molnar <mingo@elte.hu>
* \| \| \| \|	MAINTAINERS: Update linus' git repository	Tracey Dent	2011-08-11	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change to new git tree - (git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git). Signed-off-by: Tracey Dent <tdent48227@gmail.com> Acked-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	Revert "EDAC: Correct Kconfig dependencies"	Linus Torvalds	2011-08-11	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit af9d220bac41dc3201893e1601cc7c44f7da4498. It turns out that one was meant to be applied on top of the edac.git tree in -next that has more i7core_edac changes, but that wasn't clear in the original email. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Borislav Petkov <borislav.petkov@amd.com> Cc: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	NFS41: make PNFS_BLOCK selectable	Peng Tao	2011-08-11	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PNFS_BLOCK needs BLK_DEV_DM/MD, which is not a dependency for other pnfs layout drivers. Seperate it out so others can still build when BLK_DEV_DM/MD is not enabled. Also change select to depends on to avoid build failures. Reported-and-tested-by: Randy Dunlap <rdunlap@xenotime.net> Signed-off-by: Peng Tao <peng_tao@emc.com> Acked-by: Benny Halevy <bhalevy@tonian.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \| \|	Merge branch 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm	Linus Torvalds	2011-08-11	6	-19/+50
\|\ \ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* 'fixes' of master.kernel.org:/home/rmk/linux-2.6-arm: ARM: drop experimental status for ARM_PATCH_PHYS_VIRT ARM: 7008/1: alignment: Make SIGBUS sent to userspace POSIXly correct ARM: 7007/1: alignment: Prevent ignoring of faults with ARMv6 unaligned access model ARM: 7010/1: mm: fix invalid loop for poison_init_mem ARM: 7005/1: freshen up mm/proc-arm946.S dmaengine: PL08x: Fix trivial build error ARM: Fix build error for SMP=n builds
\| * \| \| \| \|	ARM: drop experimental status for ARM_PATCH_PHYS_VIRT	Russell King	2011-08-10	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This has now been well tested, and several platforms are now selecting this directly. It's time to drop its experimental status. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
\| * \| \| \| \|	ARM: 7008/1: alignment: Make SIGBUS sent to userspace POSIXly correct	Dave Martin	2011-08-09	1	-3/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	With the UM_SIGNAL alignment fault mode, no siginfo structure is passed to userspace. POSIX specifies how siginfo_t should be populated for alignment faults, so this patch does just that: * si_signo = SIGBUS * si_code = BUS_ADRALN * si_addr = misaligned data address at which access was attempted Signed-off-by: Dave Martin <dave.martin@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Acked-by: Kirill A. Shutemov <kirill@shutemov.name> Reviewed-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
\| * \| \| \| \|	ARM: 7007/1: alignment: Prevent ignoring of faults with ARMv6 unaligned ↵	Dave Martin	2011-08-09	1	-12/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	access model Currently, it's possible to set the kernel to ignore alignment faults when changing the alignment fault handling mode at runtime via /proc/sys/alignment, even though this is undesirable on ARMv6 and above, where it can result in infinite spins where an un-fixed- up instruction repeatedly faults. In addition, the kernel clobbers any alignment mode specified on the command-line if running on ARMv6 or above. This patch factors out the necessary safety check into a couple of new helper functions, and checks and modifies the fault handling mode as appropriate on boot and on writes to /proc/cpu/alignment. Prior to ARMv6, the behaviour is unchanged. For ARMv6 and above, the behaviour changes as follows: * Attempting to ignore faults on ARMv6 results in the mode being forced to UM_FIXUP instead. A warning is printed if this happened as a result of a write to /proc/cpu/alignment. The user's UM_WARN bit (if present) is still honoured. * An alignment= argument from the kernel command-line is now honoured, except that the kernel will modify the specified mode as described above. This is allows modes such as UM_SIGNAL and UM_WARN to be active immediately from boot, which is useful for debugging purposes. Signed-off-by: Dave Martin <dave.martin@linaro.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
\| * \| \| \| \|	ARM: 7010/1: mm: fix invalid loop for poison_init_mem	Jamie Iles	2011-08-09	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	poison_init_mem() used a loop of: while ((count = count - 4)) which has 2 problems - an off by one error so that we do one less word than we should, and the other is that if count == 0 then we loop forever and poison too much. On a platform with HAVE_TCM=y but nothing in the TCM's, this caused corruption and the platform failed to boot. Acked-by: Stephen Boyd <sboyd@codeaurora.org> Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org> Signed-off-by: Jamie Iles <jamie@jamieiles.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
\| * \| \| \| \|	ARM: 7005/1: freshen up mm/proc-arm946.S	Brian S. Julin	2011-08-09	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The file mm/proc-arm946.S contains a typo and is missing a structure member in __arm946_proc_info. The former prevents compilation and the latter causes problems during boot. It is likely this file was manually copied from a similar file and not tested, then later updates to the *_proc_info structures missed this file. This patch will apply (with offset) with or without the recent macro unification work that has been done in this directory. This was verified against linux-next/stable last week. See arm-linux-kernel thread: http://lists.arm.linux.org.uk/lurker/message/20110718.103237.0106d468.en.html Signed-off-by: Brian S. Julin <bri@abrij.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
\| * \| \| \| \|	dmaengine: PL08x: Fix trivial build error	Russell King	2011-08-09	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Something changed during the 3.1 merge window in the include files which now causes the pl08x DMA engine driver to fail to build. Fix this by adding the now necessary dma-mapping.h include: drivers/dma/amba-pl08x.c: In function ■pl08x_unmap_buffers■: drivers/dma/amba-pl08x.c:1524: error: implicit declaration of function ■dma_unmap_single■ drivers/dma/amba-pl08x.c:1527: error: implicit declaration of function ■dma_unmap_page■ Acked-by: Vinod Koul <vinod.koul@intel.com> Acked-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>