summaryrefslogtreecommitdiffstats
path: root/kernel (follow)
Commit message (Collapse)AuthorAgeFilesLines
* [PATCH] pid: remove temporary debug code in attach_pidEric W. Biederman2006-09-271-3/+0
| | | | | | | | | | | With the patches flying between Oleg and myself somehow this temporary debug code got left in pid.c. It was never intended to make it to the stable kernel. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] pid: Implement transfer_pid and use it to simplify de_threadEric W. Biederman2006-09-271-0/+9
| | | | | | | | | | | | | | | | | | | In de_thread we move pids from one process to another, a rather ugly case. The function transfer_pid makes it clear what we are doing, and makes the action atomic. This is useful we ever want to atomically traverse the process group and session lists, in a rcu safe manner. Even if the atomic properties this change should be a win as transfer_pid should be less code to execute than executing both attach_pid and detach_pid, and this should make de_thread slightly smaller as only a single function call needs to be emitted. The only downside is that the code might be slower to execute as the odds are against transfer_pid being in cache. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Cc: Oleg Nesterov <oleg@tv-sign.ru> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] sysctl: Allow /proc/sys without sys_sysctlEric W. Biederman2006-09-271-75/+38
| | | | | | | | | | | Since sys_sysctl is deprecated start allow it to be compiled out. This should catch any remaining user space code that cares, and paves the way for further sysctl cleanups. [akpm@osdl.org: If sys_sysctl() is not compiled-in, emit a warning] Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] inode-diet: Eliminate i_blksize from the inode structureTheodore Ts'o2006-09-271-1/+0
| | | | | | | | | | | | | | | | This eliminates the i_blksize field from struct inode. Filesystems that want to provide a per-inode st_blksize can do so by providing their own getattr routine instead of using the generic_fillattr() function. Note that some filesystems were providing pretty much random (and incorrect) values for i_blksize. [bunk@stusta.de: cleanup] [akpm@osdl.org: generic_fillattr() fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Adrian Bunk <bunk@stusta.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] inode_diet: Replace inode.u.generic_ip with inode.i_privateTheodore Ts'o2006-09-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | The following patches reduce the size of the VFS inode structure by 28 bytes on a UP x86. (It would be more on an x86_64 system). This is a 10% reduction in the inode size on a UP kernel that is configured in a production mode (i.e., with no spinlock or other debugging functions enabled; if you want to save memory taken up by in-core inodes, the first thing you should do is disable the debugging options; they are responsible for a huge amount of bloat in the VFS inode structure). This patch: The filesystem or device-specific pointer in the inode is inside a union, which is pretty pointless given that all 30+ users of this field have been using the void pointer. Get rid of the union and rename it to i_private, with a comment to explain who is allowed to use the void pointer. This is just a cleanup, but it allows us to reuse the union 'u' for something something where the union will actually be used. [judith@osdl.org: powerpc build fix] Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Judith Lebzelter <judith@osdl.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NOMMU: move the fallback arch_vma_name() to a sensible placeDavid Howells2006-09-271-0/+5
| | | | | | | | | | | | | Move the fallback arch_vma_name() to a sensible place (kernel/signal.c). Currently it's in fs/proc/task_mmu.c, a file that is dependent on both CONFIG_PROC_FS and CONFIG_MMU being enabled, but it's used from kernel/signal.c from where it is called unconditionally. [akpm@osdl.org: build fix] Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] NOMMU: Check that access_process_vm() has a valid targetDavid Howells2006-09-271-54/+0
| | | | | | | | | | | | Check that access_process_vm() is accessing a valid mapping in the target process. This limits ptrace() accesses and accesses through /proc/<pid>/maps to only those regions actually mapped by a program. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* Resources: insert identical resources above existing resourcesMatthew Wilcox2006-09-271-16/+16
| | | | | | | | | | | | | | | | | If you have two resources which aree exactly the same size, insert_resource() currently inserts the new one below the existing one. This is wrong because there's no way to insert a resource of the same size above an existing one. I took this opportunity to rewrite the initial loop to be a for-loop instead of a goto-loop and fix the documentation. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* Merge branch 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6Linus Torvalds2006-09-267-11/+110
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://one.firstfloor.org/home/andi/git/linux-2.6: (225 commits) [PATCH] Don't set calgary iommu as default y [PATCH] i386/x86-64: New Intel feature flags [PATCH] x86: Add a cumulative thermal throttle event counter. [PATCH] i386: Make the jiffies compares use the 64bit safe macros. [PATCH] x86: Refactor thermal throttle processing [PATCH] Add 64bit jiffies compares (for use with get_jiffies_64) [PATCH] Fix unwinder warning in traps.c [PATCH] x86: Allow disabling early pci scans with pci=noearly or disallowing conf1 [PATCH] x86: Move direct PCI scanning functions out of line [PATCH] i386/x86-64: Make all early PCI scans dependent on CONFIG_PCI [PATCH] Don't leak NT bit into next task [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinder [PATCH] Fix some broken white space in ia32_signal.c [PATCH] Initialize argument registers for 32bit signal handlers. [PATCH] Remove all traces of signal number conversion [PATCH] Don't synchronize time reading on single core AMD systems [PATCH] Remove outdated comment in x86-64 mmconfig code [PATCH] Use string instructions for Core2 copy/clear [PATCH] x86: - restore i8259A eoi status on resume [PATCH] i386: Split multi-line printk in oops output. ...
| * [PATCH] i386/x86-64: Work around gcc bug with noreturn functions in unwinderJan Beulich2006-09-261-7/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current gcc generates calls not jumps to noreturn functions. When that happens the return address can point to the next function, which confuses the unwinder. This patch works around it by marking asynchronous exception frames in contrast normal call frames in the unwind information. Then teach the unwinder to decode this. For normal call frames the unwinder now subtracts one from the address which avoids this problem. The standard libgcc unwinder uses the same trick. It doesn't include adjustment of the printed address (i.e. for the original example, it'd still be kernel_math_error+0 that gets displayed, but the unwinder wouldn't get confused anymore. This only works with binutils 2.6.17+ and some versions of H.J.Lu's 2.6.16 unfortunately because earlier binutils don't support .cfi_signal_frame [AK: added automatic detection of the new binutils and wrote description] Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] Add the __stack_chk_fail() functionArjan van de Ven2006-09-261-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | GCC emits a call to a __stack_chk_fail() function when the stack canary is not matching the expected value. Since this is a bad security issue; lets panic the kernel rather than limping along; the kernel really can't be trusted anymore when this happens. Signed-off-by: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> CC: Andi Kleen <ak@suse.de>
| * [PATCH] Add the canary field to the PDA area and the task structArjan van de Ven2006-09-261-0/+5
| | | | | | | | | | | | | | | | | | | | | | This patch adds the per thread cookie field to the task struct and the PDA. Also it makes sure that the PDA value gets the new cookie value at context switch, and that a new task gets a new cookie at task creation time. Signed-off-by: Arjan van Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andi Kleen <ak@suse.de> CC: Andi Kleen <ak@suse.de>
| * [PATCH] Avoid recursion in lockdep when stack tracer takes locksAndi Kleen2006-09-261-0/+4
| | | | | | | | | | | | | | | | | | | | | | The new dwarf2 unwinder needs to take locks to do backtraces inside modules. This patch makes sure lockdep which calls stacktrace is not reentered. Thanks to Ingo for suggesting this simpler approach. Cc: mingo@elte.hu Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] x86: Some preparationary cleanup for stack traceAndi Kleen2006-09-261-1/+4
| | | | | | | | | | | | | | | | | | | | | | - Remove unused all_contexts parameter No caller used it - Move skip argument into the structure (needed for followon patches) Cc: mingo@elte.hu Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386: Account spinlocks to the caller during profiling for !FP kernelsAndi Kleen2006-09-261-0/+5
| | | | | | | | | | | | | | | | | | This ports the algorithm from x86-64 (with improvements) to i386. Previously this only worked for frame pointer enabled kernels. But spinlocks have a very simple stack frame that can be manually analyzed. Do this. Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] x86: Add portable getcpu callAndi Kleen2006-09-261-0/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | For NUMA optimization and some other algorithms it is useful to have a fast to get the current CPU and node numbers in user space. x86-64 added a fast way to do this in a vsyscall. This adds a generic syscall for other architectures to make it a generic portable facility. I expect some of them will also implement it as a faster vsyscall. The cache is an optimization for the x86-64 vsyscall optimization. Since what the syscall returns is an approximation anyways and user space often wants very fast results it can be cached for some time. The norma methods to get this information in user space are relatively slow The vsyscall is in a better position to manage the cache because it has direct access to a fast time stamp (jiffies). For the generic syscall optimization it doesn't help much, but enforce a valid argument to keep programs portable I only added an i386 syscall entry for now. Other architectures can follow as needed. AK: Also added some cleanups from Andrew Morton Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] x86: Allow users to force a panic on NMIDon Zickus2006-09-262-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To quote Alan Cox: The default Linux behaviour on an NMI of either memory or unknown is to continue operation. For many environments such as scientific computing it is preferable that the box is taken out and the error dealt with than an uncorrected parity/ECC error get propogated. A small number of systems do generate NMI's for bizarre random reasons such as power management so the default is unchanged. In other respects the new proc/sys entry works like the existing panic controls already in that directory. This is separate to the edac support - EDAC allows supported chipsets to handle ECC errors well, this change allows unsupported cases to at least panic rather than cause problems further down the line. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] x86: Add abilty to enable/disable nmi watchdog with sysctlDon Zickus2006-09-261-0/+11
| | | | | | | | | | | | | | | | Adds a new /proc/sys/kernel/nmi call that will enable/disable the nmi watchdog. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
| * [PATCH] i386/x86-64: Remove un/set_nmi_callback and ↵Don Zickus2006-09-261-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | reserve/release_lapic_nmi functions Removes the un/set_nmi_callback and reserve/release_lapic_nmi functions as they are no longer needed. The various subsystems are modified to register with the die_notifier instead. Also includes compile fixes by Andrew Morton. Signed-off-by: Don Zickus <dzickus@redhat.com> Signed-off-by: Andi Kleen <ak@suse.de>
* | Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6Linus Torvalds2006-09-264-4/+22
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * master.kernel.org:/pub/scm/linux/kernel/git/gregkh/driver-2.6: (47 commits) Driver core: Don't call put methods while holding a spinlock Driver core: Remove unneeded routines from driver core Driver core: Fix potential deadlock in driver core PCI: enable driver multi-threaded probe Driver Core: add ability for drivers to do a threaded probe sysfs: add proper sysfs_init() prototype drivers/base: check errors drivers/base: Platform notify needs to occur before drivers attach to the device v4l-dev2: handle __must_check add CONFIG_ENABLE_MUST_CHECK add __must_check to device management code Driver core: fixed add_bind_files() definition Driver core: fix comments in drivers/base/power/resume.c sysfs_remove_bin_file: no return value, dump_stack on error kobject: must_check fixes Driver core: add ability for devices to create and remove bin files Class: add support for class interfaces for devices Driver core: create devices/virtual/ tree Driver core: add device_rename function Driver core: add ability for classes to handle devices properly ...
| * | PM: no suspend_prepare() phaseDavid Brownell2006-09-261-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the new suspend_prepare() phase. It doesn't seem very usable, has never been tested, doesn't address fault cleanup, and would need a sibling resume_complete(); plus there are no real use cases. It could be restored later if those issues get resolved. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | PM: add kconfig option for deprecated .../power/state filesDavid Brownell2006-09-261-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new PM_SYSFS_DEPRECATED config option to control whether or not the /sys/devices/.../power/state files are provided. This will make it easier to get rid of that mechanism when the time comes, and to verify that userspace tools work right without it. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | PM: issue PM_EVENT_PRETHAWDavid Brownell2006-09-263-4/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch is the first of this series that should actually change any behavior ... by issuing the new event, now tha the rest of the kernel is prepared to receive it. This converts the PM core to issue the new PRETHAW message, which the rest of the kernel is now ready to receive. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
| * | Suspend infrastructure cleanup and extensionLinus Torvalds2006-09-261-0/+4
| |/ | | | | | | | | | | | | | | | | | | | | | | Allow devices to participate in the suspend process more intimately, in particular, allow the final phase (with interrupts disabled) to also be open to normal devices, not just system devices. Also, allow classes to participate in device suspend. Signed-off-by: Linus Torvalds <torvalds@osdl.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
* | [PATCH] PM: Add pm_trace switchRafael J. Wysocki2006-09-261-0/+30
| | | | | | | | | | | | | | | | | | | | | | Add the pm_trace attribute in /sys/power which has to be explicitly set to one to really enable the "PM tracing" code compiled in when CONFIG_PM_TRACE is set (which modifies the machine's CMOS clock in unpredictable ways). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] PM: make it possible to disable console suspendingRafael J. Wysocki2006-09-262-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | Change suspend_console() so that it waits for all consoles to flush the remaining messages and make it possible to switch the console suspending off with the help of a Kconfig option. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Cc: Stefan Seyfried <seife@suse.de> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Use memory bitmaps during resumeRafael J. Wysocki2006-09-264-246/+186
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make swsusp use memory bitmaps to store its internal information during the resume phase of the suspend-resume cycle. If the pfns of saveable pages are saved during the suspend phase instead of the kernel virtual addresses of these pages, we can use them during the resume phase directly to set the corresponding bits in a memory bitmap. Then, this bitmap is used to mark the page frames corresponding to the pages that were saveable before the suspend (aka "unsafe" page frames). Next, we allocate as many page frames as needed to store the entire suspend image and make sure that there will be some extra free "safe" page frames for the list of PBEs constructed later. Subsequently, the image is loaded and, if possible, the data loaded from it are written into their "original" page frames (ie. the ones they had occupied before the suspend). The image data that cannot be written into their "original" page frames are loaded into "safe" page frames and their "original" kernel virtual addresses, as well as the addresses of the "safe" pages containing their copies, are stored in a list of PBEs. Finally, the list of PBEs is used to copy the remaining image data into their "original" page frames (this is done atomically, by the architecture-dependent parts of swsusp). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Introduce memory bitmapsRafael J. Wysocki2006-09-263-70/+540
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce the memory bitmap data structure and make swsusp use in the suspend phase. The current swsusp's internal data structure is not very efficient from the memory usage point of view, so it seems reasonable to replace it with a data structure that will require less memory, such as a pair of bitmaps. The idea is to use bitmaps that may be allocated as sets of individual pages, so that we can avoid making allocations of order greater than 0. For this reason the memory bitmap structure consists of several linked lists of objects that contain pointers to memory pages with the actual bitmap data. Still, for a typical system all of these lists fit in a single page, so it's reasonable to introduce an additional mechanism allowing us to allocate all of them efficiently without sacrificing the generality of the design. This is done with the help of the chain_allocator structure and associated functions. We need to use two memory bitmaps during the suspend phase of the suspend-resume cycle. One of them is necessary for marking the saveable pages, and the second is used to mark the pages in which to store the copies of them (aka image pages). First, the bitmaps are created and we allocate as many image pages as needed (the corresponding bits in the second bitmap are set as soon as the pages are allocated). Second, the bits corresponding to the saveable pages are set in the first bitmap and the saveable pages are copied to the image pages. Finally, the first bitmap is used to save the kernel virtual addresses of the saveable pages and the second one is used to save the contents of the image pages. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Introduce some helpful constantsRafael J. Wysocki2006-09-261-9/+15
| | | | | | | | | | | | | | | | | | | | Introduce some constants that hopefully will help improve the readability of code in kernel/power/snapshot.c. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Change the name of pagedir_nosaveRafael J. Wysocki2006-09-262-14/+14
| | | | | | | | | | | | | | | | | | | | The name of the pagedir_nosave variable does not make sense any more, so it seems reasonable to change it to something more meaningful. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Fix alloc_pagedirRafael J. Wysocki2006-09-261-15/+17
| | | | | | | | | | | | | | | | | | | | Get rid of the FIXME in kernel/power/snapshot.c#alloc_pagedir() and simplify the functions called by it. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Reorder memory-allocating functionsRafael J. Wysocki2006-09-261-40/+53
| | | | | | | | | | | | | | | | | | | | | | Move some functions in kernel/power/snapshot.c to a better place (in the same file) and introduce free_image_page() (will be necessary in the future). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: Fix mark_free_pagesRafael J. Wysocki2006-09-261-14/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up mm/page_alloc.c#mark_free_pages() and make it avoid clearing PageNosaveFree for PageNosave pages. This allows us to get rid of an ugly hack in kernel/power/snapshot.c#copy_data_pages(). Additionally, the page-copying loop in copy_data_pages() is moved to an inline function. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Disable CPU hotplug during suspendRafael J. Wysocki2006-09-266-96/+137
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current suspend code has to be run on one CPU, so we use the CPU hotplug to take the non-boot CPUs offline on SMP machines. However, we should also make sure that these CPUs will not be enabled by someone else after we have disabled them. The functions disable_nonboot_cpus() and enable_nonboot_cpus() are moved to kernel/cpu.c, because they now refer to some stuff in there that should better be static. Also it's better if disable_nonboot_cpus() returns an error instead of panicking if something goes wrong, and enable_nonboot_cpus() has no reason to panic(), because the CPUs may have been enabled by the userland before it tries to take them online. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: struct snapshot_handle cleanupRafael J. Wysocki2006-09-262-28/+76
| | | | | | | | | | | | | | | | | | | | Add comments describing struct snapshot_handle and its members, change the confusing name of its member 'page' to 'cur'. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: clean up browsing of pfnsRafael J. Wysocki2006-09-261-30/+32
| | | | | | | | | | | | | | | | | | | | | | Clean up some loops over pfns for each zone in snapshot.c: reduce the number of additions to perform, rework detection of saveable pages and make the code a bit less difficult to understand, hopefully. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Acked-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: read speedupAndrew Morton2006-09-263-71/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implement async reads for swsusp resuming. Crufty old PIII testbox: 15.7 MB/s -> 20.3 MB/s Sony Vaio: 14.6 MB/s -> 33.3 MB/s I didn't implement the post-resume bio_set_pages_dirty(). I don't really understand why resume needs to run set_page_dirty() against these pages. It might be a worry that this code modifies PG_Uptodate, PG_Error and PG_Locked against the image pages. Can this possibly affect the resumed-into kernel? Hopefully not, if we're atomically restoring its mem_map? Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Jens Axboe <axboe@suse.de> Cc: Laurent Riffard <laurent.riffard@free.fr> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: add read-speed instrumentationAndrew Morton2006-09-261-0/+5
| | | | | | | | | | | | | | | | | | | | Add some instrumentation to the swsusp readin code to show what bandwidth we're achieving. Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: write speedupAndrew Morton2006-09-261-18/+75
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Switch the swsusp writeout code from 4k-at-a-time to 4MB-at-a-time. Crufty old PIII testbox: 12.9 MB/s -> 20.9 MB/s Sony Vaio: 14.7 MB/s -> 26.5 MB/s The implementation is crude. A better one would use larger BIOs, but wouldn't gain any performance. The memcpys will be mostly pipelined with the IO and basically come for free. The ENOMEM path has not been tested. It should be. Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] swsusp: add write-speed instrumentationAndrew Morton2006-09-261-3/+29
| | | | | | | | | | | | | | | | | | | | Add some instrumentation to the swsusp writeout code to show what bandwidth we're achieving. Cc: Pavel Machek <pavel@ucw.cz> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] FRV: permit __do_IRQ() to be dispensed withDavid Howells2006-09-261-0/+2
| | | | | | | | | | | | | | | | | | | | | | Permit __do_IRQ() to be dispensed with based on a configuration option. Signed-off-by: David Howells <dhowells@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] selinux: rename selinux_ctxid_to_stringStephen Smalley2006-09-263-10/+10
| | | | | | | | | | | | | | | | | | | | Rename selinux_ctxid_to_string to selinux_sid_to_string to be consistent with other interfaces. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] selinux: eliminate selinux_task_ctxidStephen Smalley2006-09-261-1/+1
| | | | | | | | | | | | | | | | | | Eliminate selinux_task_ctxid since it duplicates selinux_task_get_sid. Signed-off-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: James Morris <jmorris@namei.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] NUMA: Add zone_to_nid functionChristoph Lameter2006-09-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | There are many places where we need to determine the node of a zone. Currently we use a difficult to read sequence of pointer dereferencing. Put that into an inline function and use throughout VM. Maybe we can find a way to optimize the lookup in the future. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] zone_reclaim: dynamic slab reclaimChristoph Lameter2006-09-261-0/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently one can enable slab reclaim by setting an explicit option in /proc/sys/vm/zone_reclaim_mode. Slab reclaim is then used as a final option if the freeing of unmapped file backed pages is not enough to free enough pages to allow a local allocation. However, that means that the slab can grow excessively and that most memory of a node may be used by slabs. We have had a case where a machine with 46GB of memory was using 40-42GB for slab. Zone reclaim was effective in dealing with pagecache pages. However, slab reclaim was only done during global reclaim (which is a bit rare on NUMA systems). This patch implements slab reclaim during zone reclaim. Zone reclaim occurs if there is a danger of an off node allocation. At that point we 1. Shrink the per node page cache if the number of pagecache pages is more than min_unmapped_ratio percent of pages in a zone. 2. Shrink the slab cache if the number of the nodes reclaimable slab pages (patch depends on earlier one that implements that counter) are more than min_slab_ratio (a new /proc/sys/vm tunable). The shrinking of the slab cache is a bit problematic since it is not node specific. So we simply calculate what point in the slab we want to reach (current per node slab use minus the number of pages that neeed to be allocated) and then repeately run the global reclaim until that is unsuccessful or we have reached the limit. I hope we will have zone based slab reclaim at some point which will make that easier. The default for the min_slab_ratio is 5% Also remove the slab option from /proc/sys/vm/zone_reclaim_mode. [akpm@osdl.org: cleanups] Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Profiling: require buffer allocation on the correct nodeChristoph Lameter2006-09-261-4/+12
| | | | | | | | | | | | | | | | | | | | Profiling really suffers with off node buffers. Fail if no memory is available on the nodes. The profiling code can deal with these failures should they occur. Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Add __GFP_THISNODE to avoid fallback to other nodes and ignore ↵Christoph Lameter2006-09-261-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cpuset/memory policy restrictions Add a new gfp flag __GFP_THISNODE to avoid fallback to other nodes. This flag is essential if a kernel component requires memory to be located on a certain node. It will be needed for alloc_pages_node() to force allocation on the indicated node and for alloc_pages() to force allocation on the current node. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: Andy Whitcroft <apw@shadowen.org> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* | [PATCH] Fix longstanding load balancing bug in the schedulerChristoph Lameter2006-09-261-8/+46
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The scheduler will stop load balancing if the most busy processor contains processes pinned via processor affinity. The scheduler currently only does one search for busiest cpu. If it cannot pull any tasks away from the busiest cpu because they were pinned then the scheduler goes into a corner and sulks leaving the idle processors idle. F.e. If you have processor 0 busy running four tasks pinned via taskset, there are none on processor 1 and one just started two processes on processor 2 then the scheduler will not move one of the two processes away from processor 2. This patch fixes that issue by forcing the scheduler to come out of its corner and retrying the load balancing by considering other processors for load balancing. This patch was originally developed by John Hawkes and discussed at http://marc.theaimsgroup.com/?l=linux-kernel&m=113901368523205&w=2. I have removed extraneous material and gone back to equipping struct rq with the cpu the queue is associated with since this makes the patch much easier and it is likely that others in the future will have the same difficulty of figuring out which processor owns which runqueue. The overhead added through these patches is a single word on the stack if the kernel is configured to support 32 cpus or less (32 bit). For 32 bit environments the maximum number of cpus that can be configued is 255 which would result in the use of 32 bytes additional on the stack. On IA64 up to 1k cpus can be configured which will result in the use of 128 additional bytes on the stack. The maximum additional cache footprint is one cacheline. Typically memory use will be much less than a cacheline and the additional cpumask will be placed on the stack in a cacheline that already contains other local variable. Signed-off-by: Christoph Lameter <clameter@sgi.com> Cc: John Hawkes <hawkes@sgi.com> Cc: "Siddha, Suresh B" <suresh.b.siddha@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Nick Piggin <nickpiggin@yahoo.com.au> Cc: Peter Williams <pwil3058@bigpond.net.au> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [PATCH] load_module: no BUG if module_subsys uninitializedEd Swierk2006-09-261-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Invoking load_module() before param_sysfs_init() is called crashes in mod_sysfs_setup(), since the kset in module_subsys is not initialized yet. In my case, net-pf-1 is getting modprobed as a result of hotplug trying to create a UNIX socket. Calls to hotplug begin after the topology_init initcall. Another patch for the same symptom (module_subsys-initialize-earlier.patch) moves param_sysfs_init() to the subsys initcalls, but this is still not early enough in the boot process in some cases. In particular, topology_init() causes /sbin/hotplug to run, which requests net-pf-1 (the UNIX socket protocol) which can be compiled as a module. Moving param_sysfs_init() to the postcore initcalls fixes this particular race, but there might well be other cases where a usermodehelper causes a module to load earlier still. The patch makes load_module() return an error rather than crashing the kernel if invoked before module_subsys is initialized. Cc: Mark Huang <mlhuang@cs.princeton.edu> Cc: Greg KH <greg@kroah.com> Cc: <stable@kernel.org> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
* [NETLINK]: Extend netlink messaging interfaceThomas Graf2006-09-221-1/+1
| | | | | | | | | | | | | | Adds: nlmsg_get_pos() return current position in message nlmsg_trim() trim part of message nla_reserve_nohdr(skb, len) reserve room for an attribute w/o hdr nla_put_nohdr(skb, len, data) add attribute w/o hdr nla_find_nested() find attribute in nested attributes Fixes nlmsg_new() to take allocation flags and consider size. Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>