summaryrefslogtreecommitdiffstats
path: root/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* modules: warn about suspicious return values from module's ->init() hookAlexey Dobriyan2008-03-111-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Return value convention of module's init functions is 0/-E. Sometimes, e.g. during forward-porting mistakes happen and buggy module created, where result of comparison "workqueue != NULL" is propagated all the way up to sys_init_module. What happens is that some other module created workqueue in question, our module created it again and module was successfully loaded. Or it could be some other bug. Let's make such mistakes much more visible. In retrospective, such messages would noticeably shorten some of my head-scratching sessions. Note, that dump_stack() is just a way to get attention from user. Sample message: sys_init_module: 'foo'->init suspiciously returned 1, it should follow 0/-E convention sys_init_module: loading module anyway... Pid: 4223, comm: modprobe Not tainted 2.6.24-25f666300625d894ebe04bac2b4b3aadb907c861 #5 Call Trace: [<ffffffff80254b05>] sys_init_module+0xe5/0x1d0 [<ffffffff8020b39b>] system_call_after_swapgs+0x7b/0x80 Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* modules: fix module waiting for dependent modules' initRusty Russell2008-03-111-3/+4
| | | | | | | | | | | | | | | | Commit c9a3ba55 (module: wait for dependent modules doing init.) didn't quite work because the waiter holds the module lock, meaning that the state of the module it's waiting for cannot change. Fortunately, it's fairly simple to update the state outside the lock and do the wakeup. Thanks to Jan Glauber for tracking this down and testing (qdio and qeth). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jan Glauber <jang@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrtLinus Torvalds2008-03-093-11/+20
|\ | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/tglx/linux-2.6-hrt: time: remove obsolete CLOCK_TICK_ADJUST time: don't touch an offlined CPU's ts->tick_stopped in tick_cancel_sched_timer() time: prevent the loop in timespec_add_ns() from being optimised away ntp: use unsigned input for do_div()
| * time: remove obsolete CLOCK_TICK_ADJUSTRoman Zippel2008-03-092-5/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The first version of the ntp_interval/tick_length inconsistent usage patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 While the fix did greatly improve the situation, it was correctly pointed out by Roman that it does have a small bug: If the users change clocksources after the system has been running and NTP has made corrections, the correctoins made against the old clocksource will be applied against the new clocksource, causing error. The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH definition has also made it up-stream as commit e13a2e61dd5152f5499d2003470acf9c838eab84 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84 Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated based on the PIT's frequency, and isn't really relevant to non-PIT driven clocksources (that is, clocksources other then jiffies and pit). This patch reverts both of those changes, and simply removes CLOCK_TICK_ADJUST. This does remove the granularity error correction for users of PIT and Jiffies clocksource users, but the granularity error but for the majority of users, it should be within the 500ppm range NTP can accommodate for. For systems that have granularity errors greater then 500ppm, the "ntp_tick_adj=" boot option can be used to compensate. [johnstul@us.ibm.com: provided changelog] [mattilinnanvuori@yahoo.com: maek ntp_tick_adj static] Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: john stultz <johnstul@us.ibm.com> Signed-off-by: Matti Linnanvuori <mattilinnanvuori@yahoo.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: mingo@elte.hu Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * time: don't touch an offlined CPU's ts->tick_stopped in ↵Karsten Wiese2008-03-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | tick_cancel_sched_timer() Silences WARN_ONs in rcu_enter_nohz() and rcu_exit_nohz(), which appeared before caused by (repeated) calls to: $ echo 0 > /sys/devices/system/cpu/cpu1/online $ echo 1 > /sys/devices/system/cpu/cpu1/online Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Cc: johnstul@us.ibm.com Cc: Rafael Wysocki <rjw@sisk.pl> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * ntp: use unsigned input for do_div()David Howells2008-03-091-5/+7
| | | | | | | | | | | | | | | | | | | | | | | | The kernel NTP code shouldn't hand 64-bit *signed* values to do_div(). Make it instead hand 64-bit unsigned values. This gets rid of a couple of warnings. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Roman Zippel <zippel@linux-m68k.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | cpu hotplug: adjust root-domain->online span in response to hotplug eventGregory Haskins2008-03-091-11/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We currently set the root-domain online span automatically when the domain is added to the cpu if the cpu is already a member of cpu_online_map. This was done as a hack/bug-fix for s2ram, but it also causes a problem with hotplug CPU_DOWN transitioning. The right way to fix the original problem is to actually respond to CPU_UP events, instead of CPU_ONLINE, which is already too late. This solves the hung reboot regression reported by Andrew Morton and others. Signed-off-by: Gregory Haskins <ghaskins@novell.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Fix waitid si_code regressionRoland McGrath2008-03-081-1/+1
|/ | | | | | | | | | | In commit ee7c82da830ea860b1f9274f1f0cdf99f206e7c2 ("wait_task_stopped: simplify and fix races with SIGCONT/SIGKILL/untrace"), the magic (short) cast when storing si_code was lost in wait_task_stopped. This leaks the in-kernel CLD_* values that do not match what userland expects. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* sched: don't allow rt_runtime_us to be zero for groups having rt tasksDhaval Giani2008-03-071-0/+17
| | | | | | | | | | This patch checks if we can set the rt_runtime_us to 0. If there is a realtime task in the group, we don't want to set the rt_runtime_us as 0 or bad things will happen. (that task wont get any CPU time despite being TASK_RUNNNG) Signed-off-by: Dhaval Giani <dhaval@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: rt-group: fixup schedulability constraints calculationPeter Zijlstra2008-03-071-7/+3
| | | | | | | | | | | | | | | it was only possible to configure the rt-group scheduling parameters beyond the default value in a very small range. that's because div64_64() has a different calling convention than do_div() :/ fix a few untidies while we are here; sysctl_sched_rt_period may overflow due to that multiplication, so cast to u64 first. Also that RUNTIME_INF juggling makes little sense although its an effective NOP. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: fix the wrong time slice value for SCHED_FIFO tasksMiao Xie2008-03-071-1/+1
| | | | | | | | Function sys_sched_rr_get_interval returns wrong time slice value for SCHED_FIFO tasks. The time slice for SCHED_FIFO tasks should be 0. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: export task_nicePavel Roskin2008-03-071-1/+1
| | | | | | | The API is trivial, and so is the implementation. Signed-off-by: Pavel Roskin <proski@gnu.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: balance RT task resched only on runqueueSteven Rostedt2008-03-071-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sripathi Kodi reported a crash in the -rt kernel: https://bugzilla.redhat.com/show_bug.cgi?id=435674 this is due to a place that can reschedule a task without holding the tasks runqueue lock. This was caused by the RT balancing code that pulls RT tasks to the current run queue and will reschedule the current task. There's a slight chance that the pulling of the RT tasks will release the current runqueue's lock and retake it (in the double_lock_balance). During this time that the runqueue is released, the current task can migrate to another runqueue. In the prio_changed_rt code, after the pull, if the current task is of lesser priority than one of the RT tasks pulled, resched_task is called on the current task. If the current task had migrated in that small window, resched_task will be called without holding the runqueue lock for the runqueue that the task is on. This race condition also exists in the mainline kernel and this patch adds a check to make sure the task hasn't migrated before calling resched_task. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Tested-by: Sripathi Kodi <sripathik@in.ibm.com> Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* sched: retain vruntimePeter Zijlstra2008-03-072-0/+19
| | | | | | | | | | | | | Kei Tokunaga reported an interactivity problem when moving tasks between control groups. Tasks would retain their old vruntime when moved between groups, this can cause funny lags. Re-set the vruntime on group move to fit within the new tree. Reported-by: Kei Tokunaga <tokunaga.keiich@jp.fujitsu.com> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* cpusets: fix obsolete commentDavid Rientjes2008-03-061-2/+2
| | | | | | | | | | | | | mm migration is no longer done in cpuset_update_task_memory_state() so it can no longer take current->mm->mmap_sem, so fix the obsolete comment. [ This changed in commit 04c19fa6f16047abff2288ddbc1f0798ede5a849 ("cpuset: migrate all tasks in cpuset at once") when the mm migration was moved from cpuset_update_task_memory_state() to update_nodemask() ] Signed-off-by: David Rientjes <rientjes@google.com> Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* module: allow ndiswrapper to use GPL-only symbolsPavel Roskin2008-03-051-1/+8
| | | | | | | | | | | | | | A change after 2.6.24 broke ndiswrapper by accidentally removing its access to GPL-only symbols. Revert that change and add comments about the reasons why ndiswrapper and driverloader are treated in a special way. Signed-off-by: Pavel Roskin <proski@gnu.org> Acked-by: Greg KH <gregkh@suse.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Jon Masters <jonathan@jonmasters.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kprobes: fix a null pointer bug in register_kretprobe()Masami Hiramatsu2008-03-051-17/+26
| | | | | | | | | | | | | | Fix a bug in regiseter_kretprobe() which does not check rp->kp.symbol_name == NULL before calling kprobe_lookup_name. For maintainability, this introduces kprobe_addr helper function which resolves addr field. It is used by register_kprobe and register_kretprobe. Signed-off-by: Masami Hiramatsu <mhiramat@redhat.com> Cc: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Cc: Jim Keniston <jkenisto@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* markers: don't risk NULL deref in markerJesper Juhl2008-03-051-4/+5
| | | | | | | | | | get_marker() may return NULL, so test for it. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jesper Juhl <jesper.juhl@gmail.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Kprobes: indicate kretprobe support in KconfigAnanth N Mavinakayanahalli2008-03-051-6/+3
| | | | | | | | | | | | | | | | | Add CONFIG_HAVE_KRETPROBES to the arch/<arch>/Kconfig file for relevant architectures with kprobes support. This facilitates easy handling of in-kernel modules (like samples/kprobes/kretprobe_example.c) that depend on kretprobes being present in the kernel. Thanks to Sam Ravnborg for helping make the patch more lean. Per Mathieu's suggestion, added CONFIG_KRETPROBES and fixed up dependencies. Signed-off-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Acked-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Memory Resource Controller use strstrip while parsing argumentsBalbir Singh2008-03-051-0/+1
| | | | | | | | | | The memory controller has a requirement that while writing values, we need to use echo -n. This patch fixes the problem and makes the UI more consistent. Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Cc: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: fix default notify_on_release settingLi Zefan2008-03-051-1/+3
| | | | | | | | | | | The documentation says the default value of notify_on_release of a child cgroup is inherited from its parent, which is reasonable, but the implementation just sets the flag disabled. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2008-03-044-350/+70
|\ | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched-devel: sched: revert load_balance_monitor() changes
| * sched: revert load_balance_monitor() changesPeter Zijlstra2008-03-044-350/+70
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following commits cause a number of regressions: commit 58e2d4ca581167c2a079f4ee02be2f0bc52e8729 Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduling, change how cpu load is calculated commit 6b2d7700266b9402e12824e11e0099ae6a4a6a79 Author: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Date: Fri Jan 25 21:08:00 2008 +0100 sched: group scheduler, fix fairness of cpu bandwidth allocation for task groups Namely: - very frequent wakeups on SMP, reported by PowerTop users. - cacheline trashing on (large) SMP - some latencies larger than 500ms While there is a mergeable patch to fix the latter, the former issues are not fixable in a manner suitable for .25 (we're at -rc3 now). Hence we revert them and try again in v2.6.26. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> Tested-by: Alexey Zaytsev <alexey.zaytsev@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | freezer vs stopped or tracedRoland McGrath2008-03-042-19/+26
|/ | | | | | | | | | | | | | | | | | | | | | | | This changes the "freezer" code used by suspend/hibernate in its treatment of tasks in TASK_STOPPED (job control stop) and TASK_TRACED (ptrace) states. As I understand it, the intent of the "freezer" is to hold all tasks from doing anything significant. For this purpose, TASK_STOPPED and TASK_TRACED are "frozen enough". It's possible the tasks might resume from ptrace calls (if the tracer were unfrozen) or from signals (including ones that could come via timer interrupts, etc). But this doesn't matter as long as they quickly block again while "freezing" is in effect. Some minor adjustments to the signal.c code make sure that try_to_freeze() very shortly follows all wakeups from both kinds of stop. This lets the freezer code safely leave stopped tasks unmolested. Changing this fixes the longstanding bug of seeing after resuming from suspend/hibernate your shell report "[1] Stopped" and the like for all your jobs stopped by ^Z et al, as if you had freshly fg'd and ^Z'd them. It also removes from the freezer the arcane special case treatment for ptrace'd tasks, which relied on intimate knowledge of ptrace internals. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* exit_notify: fix kill_orphaned_pgrp() usage with mt exitOleg Nesterov2008-03-031-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. exit_notify() always calls kill_orphaned_pgrp(). This is wrong, we should do this only when the whole process exits. 2. exit_notify() uses "current" as "ignored_task", obviously wrong. Use ->group_leader instead. Test case: void hup(int sig) { printf("HUP received\n"); } void *tfunc(void *arg) { sleep(2); printf("sub-thread exited\n"); return NULL; } int main(int argc, char *argv[]) { if (!fork()) { signal(SIGHUP, hup); kill(getpid(), SIGSTOP); exit(0); } pthread_t thr; pthread_create(&thr, NULL, tfunc, NULL); sleep(1); printf("main thread exited\n"); syscall(__NR_exit, 0); return 0; } output: main thread exited HUP received Hangup With this patch the output is: main thread exited sub-thread exited HUP received Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* will_become_orphaned_pgrp: partially fix insufficient ->exit_state checkOleg Nesterov2008-03-031-9/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | p->exit_state != 0 doesn't mean this process is dead, it may have sub-threads. Change the code to use "p->exit_state && thread_group_empty(p)" instead. Without this patch, ^Z doesn't deliver SIGTSTP to the foreground process if the main thread has exited. However, the new check is not perfect either. There is a window when exit_notify() drops tasklist and before release_task(). Suppose that the last (non-leader) thread exits. This means that entire group exits, but thread_group_empty() is not true yet. As Eric pointed out, is_global_init() is wrong as well, but I did not dare to do other changes. Just for the record, has_stopped_jobs() is absolutely wrong too. But we can't fix it now, we should first fix SIGNAL_STOP_STOPPED issues. Even with this patch ^Z doesn't play well with the dead main thread. The task is stopped correctly but do_wait(WSTOPPED) won't see it. This is another unrelated issue, will be (hopefully) fixed separately. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* introduce kill_orphaned_pgrp() helperOleg Nesterov2008-03-031-39/+35
| | | | | | | | | Factor out the common code in reparent_thread() and exit_notify(). No functional changes. Signed-off-by: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* [PATCH] drop EOE records from printkSteve Grubb2008-03-011-6/+8
| | | | | | | | | | | | Hi, While we are looking at the printk issue, I see that its printk'ing the EOE (end of event) records which is really not something that we need in syslog. Its really intended for the realtime audit event stream handled by the audit daemon. So, lets avoid printk'ing that record type. Signed-off-by: Steve Grubb <sgrubb@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [RFC] AUDIT: do not panic when printk loses messagesEric Paris2008-03-011-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | On the latest kernels if one was to load about 15 rules, set the failure state to panic, and then run service auditd stop the kernel will panic. This is because auditd stops, then the script deletes all of the rules. These deletions are sent as audit messages out of the printk kernel interface which is already known to be lossy. These will overun the default kernel rate limiting (10 really fast messages) and will call audit_panic(). The same effect can happen if a slew of avc's come through while auditd is stopped. This can be fixed a number of ways but this patch fixes the problem by just not panicing if auditd is not running. We know printk is lossy and if the user chooses to set the failure mode to panic and tries to use printk we can't make any promises no matter how hard we try, so why try? At least in this way we continue to get lost message accounting and will eventually know that things went bad. The other change is to add a new call to audit_log_lost() if auditd disappears. We already pulled the skb off the queue and couldn't send it so that message is lost. At least this way we will account for the last message and panic if the machine is configured to panic. This code path should only be run if auditd dies for unforeseen reasons. If auditd closes correctly audit_pid will get set to 0 and we won't walk this code path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* [PATCH] Audit: Fix the format type for size_t variablesPaul Moore2008-03-011-1/+1
| | | | | | | | | | | | Fix the following compiler warning by using "%zu" as defined in C99. CC kernel/auditsc.o kernel/auditsc.c: In function 'audit_log_single_execve_arg': kernel/auditsc.c:1074: warning: format '%ld' expects type 'long int', but argument 4 has type 'size_t' Signed-off-by: Paul Moore <paul.moore@hp.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* rcupreempt: remove never-migrates assumption from rcu_process_callbacks()Paul E. McKenney2008-02-291-2/+4
| | | | | | | | | | | | | | | This patch fixes a potentially invalid access to a per-CPU variable in rcu_process_callbacks(). This per-CPU access needs to be done in such a way as to guarantee that the code using it cannot move to some other CPU before all uses of the value accessed have completed. Even though this code is currently only invoked from softirq context, which currrently cannot migrate to some other CPU, life would be better if this code did not silently make such an assumption. Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcupreempt: fix hibernate/resume in presence of PREEMPT_RCU and hotplugPaul E. McKenney2008-02-291-1/+2
| | | | | | | | | | | | | | | This fixes a oops encountered when doing hibernate/resume in presence of PREEMPT_RCU. The problem was that the code failed to disable preemption when accessing a per-CPU variable. This is OK when called from code that already has preemption disabled, but such is not the case from the suspend/resume code path. Reported-by: Dave Young <hidave.darkstar@gmail.com> Tested-by: Dave Young <hidave.darkstar@gmail.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* softlockup: fix task state settingDmitry Adamushko2008-02-291-6/+7
| | | | | | | | kthread_stop() can be called when a 'watchdog' thread is executing after kthread_should_stop() but before set_task_state(TASK_INTERRUPTIBLE). Signed-off-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* rcu: add support for dynamic ticks and preempt rcuSteven Rostedt2008-02-293-4/+224
| | | | | | | | | | | | | | | The PREEMPT-RCU can get stuck if a CPU goes idle and NO_HZ is set. The idle CPU will not progress the RCU through its grace period and a synchronize_rcu my get stuck. Without this patch I have a box that will not boot when PREEMPT_RCU and NO_HZ are set. That same box boots fine with this patch. This patch comes from the -rt kernel where it has been tested for several months. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* Merge branch 'v2.6.25-rc3-lockdep' of ↵Linus Torvalds2008-02-261-4/+4
|\ | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep * 'v2.6.25-rc3-lockdep' of git://git.kernel.org/pub/scm/linux/kernel/git/peterz/linux-2.6-lockdep: Subject: lockdep: include all lock classes in all_lock_classes lockdep: increase MAX_LOCK_DEPTH
| * Subject: lockdep: include all lock classes in all_lock_classesDale Farnsworth2008-02-251-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | Add each lock class to the all_lock_classes list when it is first registered. Previously, lock classes were added to all_lock_classes when the lock class was first used. Since one of the uses of the list is to find unused locks, this didn't work well. Signed-off-by: Dale Farnsworth <dale@farnsworth.org> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds2008-02-262-14/+15
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: latencytop: change /proc task_struct access method latencytop: fix memory leak on latency proc file latencytop: fix kernel panic while reading latency proc file sched: add declaration of sched_tail to sched.h sched: fix signedness warnings in sched.c sched: clean up __pick_last_entity() a bit sched: remove duplicate code from sched_fair.c sched: make early bootup sched_clock() use safer
| * | sched: fix signedness warnings in sched.cHarvey Harrison2008-02-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unsigned long values are always assigned to switch_count, make it unsigned long. kernel/sched.c:3897:15: warning: incorrect type in assignment (different signedness) kernel/sched.c:3897:15: expected long *switch_count kernel/sched.c:3897:15: got unsigned long *<noident> kernel/sched.c:3921:16: warning: incorrect type in assignment (different signedness) kernel/sched.c:3921:16: expected long *switch_count kernel/sched.c:3921:16: got unsigned long *<noident> Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: clean up __pick_last_entity() a bitIngo Molnar2008-02-251-5/+3
| | | | | | | | | | | | Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: remove duplicate code from sched_fair.cBalbir Singh2008-02-251-9/+6
| | | | | | | | | | | | | | | | | | | | | | | | pick_task_entity() duplicates existing code. This functionality can be easily obtained using rb_last(). Avoid code duplication by using rb_last(). Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | sched: make early bootup sched_clock() use saferIngo Molnar2008-02-251-4/+10
| |/ | | | | | | | | | | | | | | | | | | | | | | | | do not call sched_clock() too early. Not only might rq->idle not be set up - but pure per-cpu data might not be accessible either. this solves an ia64 early bootup hang with CONFIG_PRINTK_TIME=y. Tested-by: Tony Luck <tony.luck@gmail.com> Acked-by: Tony Luck <tony.luck@gmail.com> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* / printk: fix possible printk overrunTejun Heo2008-02-261-1/+1
|/ | | | | | | | | | | | | | | | | | | | | printk recursion detection prepends message to printk_buf and offsets printk_buf when actual message is printed but it forgets to trim buffer length accordingly. This can result in overrun in extreme cases. Fix it. [ mingo@elte.hu: bug was introduced by me via: commit 32a76006683f7b28ae3cc491da37716e002f198e Author: Ingo Molnar <mingo@elte.hu> Date: Fri Jan 25 21:07:58 2008 +0100 printk: make printk more robust by not allowing recursion ] Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Add memory barrier semantics to wake_up() & coLinus Torvalds2008-02-241-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Oleg Nesterov and others have pointed out that on some architectures, the traditional sequence of set_current_state(TASK_INTERRUPTIBLE); if (CONDITION) return; schedule(); is racy wrt another CPU doing CONDITION = 1; wake_up_process(p); because while set_current_state() has a memory barrier separating setting of the TASK_INTERRUPTIBLE state from reading of the CONDITION variable, there is no such memory barrier on the wakeup side. Now, wake_up_process() does actually take a spinlock before it reads and sets the task state on the waking side, and on x86 (and many other architectures) that spinlock is in fact equivalent to a memory barrier, but that is not generally guaranteed. The write that sets CONDITION could move into the critical region protected by the runqueue spinlock. However, adding a smp_wmb() to before the spinlock should now order the writing of CONDITION wrt the lock itself, which in turn is ordered wrt the accesses within the spinlock (which includes the reading of the old state). This should thus close the race (which probably has never been seen in practice, but since smp_wmb() is a no-op on x86, it's not like this will make anything worse either on the most common architecture where the spinlock already gave the required protection). Acked-by: Oleg Nesterov <oleg@tv-sign.ru> Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: remove dead code in cgroup_get_rootdir()Li Zefan2008-02-241-1/+0
| | | | | | | Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: remove duplicate code in find_css_set()Li Zefan2008-02-241-1/+0
| | | | | | | | | The list head res->tasks gets initialized twice in find_css_set(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: fix subsys bitopsLi Zefan2008-02-241-2/+2
| | | | | | | | | Cgroup uses unsigned long for subsys bitops, not unsigned long long. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: fix memory leak in cgroup_get_sb()Li Zefan2008-02-241-1/+4
| | | | | | | | | opts.release_agent is not kfree()ed in all necessary places. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* cgroup: fix commentsLi Zefan2008-02-241-63/+79
| | | | | | | | | | | | fix: - comments about need_forkexit_callback - comments about release agent - typo and comment style, etc. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* kprobes: refuse kprobe insertion on add/sub_preempt_counter()Srinivasa Ds2008-02-241-2/+2
| | | | | | | | | | | | | Kprobes makes use of preempt_disable(),preempt_enable_noresched() and these functions inturn call add/sub_preempt_count(). So we need to refuse user from inserting probe in to these functions. This patch disallows user from probing add/sub_preempt_count(). Signed-off-by: Srinivasa DS <srinivasa@in.ibm.com> Acked-by: Ananth N Mavinakayanahalli <ananth@in.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* futex: runtime enable pi and robust functionalityThomas Gleixner2008-02-242-4/+43
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Not all architectures implement futex_atomic_cmpxchg_inatomic(). The default implementation returns -ENOSYS, which is currently not handled inside of the futex guts. Futex PI calls and robust list exits with a held futex result in an endless loop in the futex code on architectures which have no support. Fixing up every place where futex_atomic_cmpxchg_inatomic() is called would add a fair amount of extra if/else constructs to the already complex code. It is also not possible to disable the robust feature before user space tries to register robust lists. Compile time disabling is not a good idea either, as there are already architectures with runtime detection of futex_atomic_cmpxchg_inatomic support. Detect the functionality at runtime instead by calling cmpxchg_futex_value_locked() with a NULL pointer from the futex initialization code. This is guaranteed to fail, but the call of futex_atomic_cmpxchg_inatomic() happens with pagefaults disabled. On architectures, which use the asm-generic implementation or have a runtime CPU feature detection, a -ENOSYS return value disables the PI/robust features. On architectures with a working implementation the call returns -EFAULT and the PI/robust features are enabled. The relevant syscalls return -ENOSYS and the robust list exit code is blocked, when the detection fails. Fixes http://lkml.org/lkml/2008/2/11/149 Originally reported by: Lennart Buytenhek Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Lennert Buytenhek <buytenh@wantstofly.org> Cc: Riku Voipio <riku.voipio@movial.fi> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>