summaryrefslogtreecommitdiffstats
path: root/fs/proc (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mm: don't access vm_flags as 'int'KOSAKI Motohiro2011-05-261-1/+1
| | | | | | | | | | | The type of vma->vm_flags is 'unsigned long'. Neither 'int' nor 'unsigned int'. This patch fixes such misuse. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> [ Changed to use a typedef - we'll extend it to cover more cases later, since there has been discussion about making it a 64-bit type.. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfdLinus Torvalds2011-05-265-11/+233
|\ | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/linux-2.6-nsfd: net: fix get_net_ns_by_fd for !CONFIG_NET_NS ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry. ns: Declare sys_setns in syscalls.h net: Allow setting the network namespace by fd ns proc: Add support for the ipc namespace ns proc: Add support for the uts namespace ns proc: Add support for the network namespace. ns: Introduce the setns syscall ns: proc files for namespace naming policy.
| * ns proc: Return -ENOENT for a nonexistent /proc/self/ns/ entry.Eric W. Biederman2011-05-251-0/+1
| | | | | | | | | | Spotted-by: Nathan Lynch <ntl@pobox.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ns proc: Add support for the ipc namespaceEric W. Biederman2011-05-101-0/+3
| | | | | | | | | | Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ns proc: Add support for the uts namespaceEric W. Biederman2011-05-101-0/+3
| | | | | | | | | | Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ns proc: Add support for the network namespace.Eric W. Biederman2011-05-101-0/+3
| | | | | | | | | | | | | | | | | | Implementing file descriptors for the network namespace is simple and straight forward. Acked-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
| * ns: proc files for namespace naming policy.Eric W. Biederman2011-05-105-11/+223
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Create files under /proc/<pid>/ns/ to allow controlling the namespaces of a process. This addresses three specific problems that can make namespaces hard to work with. - Namespaces require a dedicated process to pin them in memory. - It is not possible to use a namespace unless you are the child of the original creator. - Namespaces don't have names that userspace can use to talk about them. The namespace files under /proc/<pid>/ns/ can be opened and the file descriptor can be used to talk about a specific namespace, and to keep the specified namespace alive. A namespace can be kept alive by either holding the file descriptor open or bind mounting the file someplace else. aka: mount --bind /proc/self/ns/net /some/filesystem/path mount --bind /proc/self/fd/<N> /some/filesystem/path This allows namespaces to be named with userspace policy. It requires additional support to make use of these filedescriptors and that will be comming in the following patches. Acked-by: Daniel Lezcano <daniel.lezcano@free.fr> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6Linus Torvalds2011-05-261-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: (89 commits) bonding: documentation and code cleanup for resend_igmp bonding: prevent deadlock on slave store with alb mode (v3) net: hold rtnl again in dump callbacks Add Fujitsu 1000base-SX PCI ID to tg3 bnx2x: protect sequence increment with mutex sch_sfq: fix peek() implementation isdn: netjet - blacklist Digium TDM400P via-velocity: don't annotate MAC registers as packed xen: netfront: hold RTNL when updating features. sctp: fix memory leak of the ASCONF queue when free asoc net: make dev_disable_lro use physical device if passed a vlan dev (v2) net: move is_vlan_dev into public header file (v2) bug.h: Fix build with CONFIG_PRINTK disabled. wireless: fix fatal kernel-doc error + warning in mac80211.h wireless: fix cfg80211.h new kernel-doc warnings iwlagn: dbg_fixed_rate only used when CONFIG_MAC80211_DEBUGFS enabled dst: catch uninitialized metrics be2net: hash key for rss-config cmd not set bridge: initialize fake_rtable metrics net: fix __dst_destroy_metrics_generic() ... Fix up trivial conflicts in drivers/staging/brcm80211/brcmfmac/wl_cfg80211.c
| * \ Merge ↵John W. Linville2011-05-241-0/+1
| |\ \ | | | | | | | | | | | | ssh://master.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next-2.6 into for-davem
| | * | airo: correct proc entry creation interfacesAlexey Dobriyan2011-05-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * use proc_mkdir_mode() instead of create_proc_entry(S_IFDIR|...), export proc_mkdir_mode() for that, oh well. * don't supply S_IFREG to proc_create_data(), it's unnecessary Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: John W. Linville <linville@tuxdriver.com>
* | | | proc: allocate storage for numa_maps statistics onceStephen Wilson2011-05-251-9/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In show_numa_map() we collect statistics into a numa_maps structure. Since the number of NUMA nodes can be very large, this structure is not a candidate for stack allocation. Instead of going thru a kmalloc()+kfree() cycle each time show_numa_map() is invoked, perform the allocation just once when /proc/pid/numa_maps is opened. Performing the allocation when numa_maps is opened, and thus before a reference to the target tasks mm is taken, eliminates a potential stalemate condition in the oom-killer as originally described by Hugh Dickins: ... imagine what happens if the system is out of memory, and the mm we're looking at is selected for killing by the OOM killer: while we wait in __get_free_page for more memory, no memory is freed from the selected mm because it cannot reach exit_mmap while we hold that reference. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | proc: make struct proc_maps_private truly privateStephen Wilson2011-05-251-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that mm/mempolicy.c is no longer implementing /proc/pid/numa_maps there is no need to export struct proc_maps_private to the world. Move it to fs/proc/internal.h instead. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | mm: proc: move show_numa_map() to fs/proc/task_mmu.cStephen Wilson2011-05-251-2/+182
|/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moving show_numa_map() from mempolicy.c to task_mmu.c solves several issues. - Having the show() operation "miles away" from the corresponding seq_file iteration operations is a maintenance burden. - The need to export ad hoc info like struct proc_maps_private is eliminated. - The implementation of show_numa_map() can be improved in a simple manner by cooperating with the other seq_file operations (start, stop, etc) -- something that would be messy to do without this change. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hugh Dickins <hughd@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Lee Schermerhorn <lee.schermerhorn@hp.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Christoph Lameter <cl@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | / Don't lock guardpage if the stack is growing upMikulas Patocka2011-05-101-5/+7
| |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux kernel excludes guard page when performing mlock on a VMA with down-growing stack. However, some architectures have up-growing stack and locking the guard page should be excluded in this case too. This patch fixes lvm2 on PA-RISC (and possibly other architectures with up-growing stack). lvm2 calculates number of used pages when locking and when unlocking and reports an internal error if the numbers mismatch. [ Patch changed fairly extensively to also fix /proc/<pid>/maps for the grows-up case, and to move things around a bit to clean it all up and share the infrstructure with the /proc bits. Tested on ia64 that has both grow-up and grow-down segments - Linus ] Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz> Tested-by: Tony Luck <tony.luck@gmail.com> Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | proc: do proper range check on readdir offsetLinus Torvalds2011-04-181-2/+7
| | | | | | | | | | | | | | | | | | | | Rather than pass in some random truncated offset to the pid-related functions, check that the offset is in range up-front. This is just cleanup, the previous commit fixed the real problem. Cc: stable@kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | Fix common misspellingsLucas De Marchi2011-03-311-1/+1
|/ | | | | | Fixes generated by 'codespell' and manually reviewed. Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
* proc: fix oops on invalid /proc/<pid>/maps accessLinus Torvalds2011-03-281-1/+2
| | | | | | | | | | | | | | | | | | | When m_start returns an error, the seq_file logic will still call m_stop with that error entry, so we'd better make sure that we check it before using it as a vma. Introduced by commit ec6fd8a4355c ("report errors in /proc/*/*map* sanely"), which replaced NULL with various ERR_PTR() cases. (On ia64, you happen to get a unaligned fault instead of a page fault, since the address used is generally some random error code like -EPERM) Reported-by: Anca Emanuel <anca.emanuel@gmail.com> Reported-by: Tony Luck <tony.luck@intel.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Américo Wang <xiyou.wangcong@gmail.com> Cc: Stephen Wilson <wilsons@start.ca> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2011-03-243-78/+132
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: deal with races in /proc/*/{syscall,stack,personality} proc: enable writing to /proc/pid/mem proc: make check_mem_permission() return an mm_struct on success proc: hold cred_guard_mutex in check_mem_permission() proc: disable mem_write after exec mm: implement access_remote_vm mm: factor out main logic of access_process_vm mm: use mm_struct to resolve gate vma's in __get_user_pages mm: arch: rename in_gate_area_no_task to in_gate_area_no_mm mm: arch: make in_gate_area take an mm_struct instead of a task_struct mm: arch: make get_gate_vma take an mm_struct instead of a task_struct x86: mark associated mm when running a task in 32 bit compatibility mode x86: add context tag to mark mm when running a task in 32-bit compatibility mode auxv: require the target to be tracable (or yourself) close race in /proc/*/environ report errors in /proc/*/*map* sanely pagemap: close races with suid execve make sessionid permissions in /proc/*/task/* match those in /proc/* fix leaks in path_lookupat() Fix up trivial conflicts in fs/proc/base.c
| * deal with races in /proc/*/{syscall,stack,personality}Al Viro2011-03-231-19/+50
| | | | | | | | | | | | | | | | | | | | | | All of those are rw-r--r-- and all are broken for suid - if you open a file before the target does suid-root exec, you'll be still able to access it. For personality it's not a big deal, but for syscall and stack it's a real problem. Fix: check that task is tracable for you at the time of read(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * proc: enable writing to /proc/pid/memStephen Wilson2011-03-231-5/+0
| | | | | | | | | | | | | | | | With recent changes there is no longer a security hazard with writing to /proc/pid/mem. Remove the #ifdef. Signed-off-by: Stephen Wilson <wilsons@start.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * proc: make check_mem_permission() return an mm_struct on successStephen Wilson2011-03-231-24/+34
| | | | | | | | | | | | | | | | | | | | | | | | | | This change allows us to take advantage of access_remote_vm(), which in turn eliminates a security issue with the mem_write() implementation. The previous implementation of mem_write() was insecure since the target task could exec a setuid-root binary between the permission check and the actual write. Holding a reference to the target mm_struct eliminates this vulnerability. Signed-off-by: Stephen Wilson <wilsons@start.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * proc: hold cred_guard_mutex in check_mem_permission()Stephen Wilson2011-03-231-4/+22
| | | | | | | | | | | | | | | | | | | | | | | | Avoid a potential race when task exec's and we get a new ->mm but check against the old credentials in ptrace_may_access(). Holding of the mutex is implemented by factoring out the body of the code into a helper function __check_mem_permission(). Performing this factorization now simplifies upcoming changes and minimizes churn in the diff's. Signed-off-by: Stephen Wilson <wilsons@start.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * proc: disable mem_write after execStephen Wilson2011-03-231-0/+4
| | | | | | | | | | | | | | | | | | | | This change makes mem_write() observe the same constraints as mem_read(). This is particularly important for mem_write as an accidental leak of the fd across an exec could result in arbitrary modification of the target process' memory. IOW, /proc/pid/mem is implicitly close-on-exec. Signed-off-by: Stephen Wilson <wilsons@start.ca> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * mm: arch: make get_gate_vma take an mm_struct instead of a task_structStephen Wilson2011-03-231-3/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | Morally, the presence of a gate vma is more an attribute of a particular mm than a particular task. Moreover, dropping the dependency on task_struct will help make both existing and future operations on mm's more flexible and convenient. Signed-off-by: Stephen Wilson <wilsons@start.ca> Reviewed-by: Michel Lespinasse <walken@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * auxv: require the target to be tracable (or yourself)Al Viro2011-03-231-3/+3
| | | | | | | | | | | | | | same as for environ, except that we didn't do any checks to prevent access after suid execve Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * close race in /proc/*/environAl Viro2011-03-231-6/+4
| | | | | | | | | | | | | | Switch to mm_for_maps(). Maybe we ought to make it r--r--r--, since we do checks on IO anyway... Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * report errors in /proc/*/*map* sanelyAl Viro2011-03-233-11/+13
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * pagemap: close races with suid execveAl Viro2011-03-232-7/+4
| | | | | | | | | | | | just use mm_for_maps() Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * make sessionid permissions in /proc/*/task/* match those in /proc/*Al Viro2011-03-231-1/+1
| | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | procfs: kill the global proc_mnt variableOleg Nesterov2011-03-243-6/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | After the previous cleanup in proc_get_sb() the global proc_mnt has no reasons to exists, kill it. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | pidns: call pid_ns_prepare_proc() from create_pid_namespace()Eric W. Biederman2011-03-241-18/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | Reorganize proc_get_sb() so it can be called before the struct pid of the first process is allocated. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | proc: protect mm start_code/end_code in /proc/pid/statKees Cook2011-03-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While mm->start_stack was protected from cross-uid viewing (commit f83ce3e6b02d5 ("proc: avoid information leaks to non-privileged processes")), the start_code and end_code values were not. This would allow the text location of a PIE binary to leak, defeating ASLR. Note that the value "1" is used instead of "0" for a protected value since "ps", "killall", and likely other readers of /proc/pid/stat, take start_code of "0" to mean a kernel thread and will misbehave. Thanks to Brad Spengler for pointing this out. Addresses CVE-2011-0726 Signed-off-by: Kees Cook <kees.cook@canonical.com> Cc: <stable@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: David Howells <dhowells@redhat.com> Cc: Eugene Teo <eugeneteo@kernel.sg> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Brad Spengler <spender@grsecurity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | proc: make struct proc_dir_entry::namelen unsigned intAlexey Dobriyan2011-03-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | 1. namelen is declared "unsigned short" which hints for "maybe space savings". Indeed in 2.4 struct proc_dir_entry looked like: struct proc_dir_entry { unsigned short low_ino; unsigned short namelen; Now, low_ino is "unsigned int", all savings were gone for a long time. "struct proc_dir_entry" is not that countless to worry about it's size, anyway. 2. converting from unsigned short to int/unsigned int can only create problems, we better play it safe. Space is not really conserved, because of natural alignment for the next field. sizeof(struct proc_dir_entry) remains the same. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | procfs: fix some wrong error code usageJovi Zhang2011-03-241-2/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | [root@wei 1]# cat /proc/1/mem cat: /proc/1/mem: No such process error code -ESRCH is wrong in this situation. Return -EPERM instead. Signed-off-by: Jovi Zhang <bookjovi@gmail.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | procfs: fix /proc/<pid>/maps heap checkAaro Koskinen2011-03-241-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The current code fails to print the "[heap]" marking if the heap is split into multiple mappings. Fix the check so that the marking is displayed in all possible cases: 1. vma matches exactly the heap 2. the heap vma is merged e.g. with bss 3. the heap vma is splitted e.g. due to locked pages Test cases. In all cases, the process should have mapping(s) with [heap] marking: (1) vma matches exactly the heap #include <stdio.h> #include <unistd.h> #include <sys/types.h> int main (void) { if (sbrk(4096) != (void *)-1) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test1 check /proc/553/maps [1] + Stopped ./test1 # cat /proc/553/maps | head -4 00008000-00009000 r-xp 00000000 01:00 3113640 /test1 00010000-00011000 rw-p 00000000 01:00 3113640 /test1 00011000-00012000 rw-p 00000000 00:00 0 [heap] 4006f000-40070000 rw-p 00000000 00:00 0 (2) the heap vma is merged #include <stdio.h> #include <unistd.h> #include <sys/types.h> char foo[4096] = "foo"; char bar[4096]; int main (void) { if (sbrk(4096) != (void *)-1) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test2 check /proc/556/maps [2] + Stopped ./test2 # cat /proc/556/maps | head -4 00008000-00009000 r-xp 00000000 01:00 3116312 /test2 00010000-00012000 rw-p 00000000 01:00 3116312 /test2 00012000-00014000 rw-p 00000000 00:00 0 [heap] 4004a000-4004b000 rw-p 00000000 00:00 0 (3) the heap vma is splitted (this fails without the patch) #include <stdio.h> #include <unistd.h> #include <sys/mman.h> #include <sys/types.h> int main (void) { if ((sbrk(4096) != (void *)-1) && !mlockall(MCL_FUTURE) && (sbrk(4096) != (void *)-1)) { printf("check /proc/%d/maps\n", (int)getpid()); while (1) sleep(1); } return 0; } # ./test3 check /proc/559/maps [1] + Stopped ./test3 # cat /proc/559/maps|head -4 00008000-00009000 r-xp 00000000 01:00 3119108 /test3 00010000-00011000 rw-p 00000000 01:00 3119108 /test3 00011000-00012000 rw-p 00000000 00:00 0 [heap] 00012000-00013000 rw-p 00000000 00:00 0 [heap] It looks like the bug has been there forever, and since it only results in some information missing from a procfile, it does not fulfil the -stable "critical issue" criteria. Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Reviewed-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | proc: hide kernel addresses via %pK in /proc/<pid>/stackKonstantin Khlebnikov2011-03-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | This file is readable for the task owner. Hide kernel addresses from unprivileged users, leave them function names and offsets. Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org> Acked-by: Kees Cook <kees.cook@canonical.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | smaps: have smaps show transparent huge pagesDave Hansen2011-03-231-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the mere act of _looking_ at /proc/$pid/smaps will not destroy transparent huge pages, tell how much of the VMA is actually mapped with them. This way, we can make sure that we're getting THPs where we expect to see them. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | smaps: teach smaps_pte_range() about THP pmdsDave Hansen2011-03-231-2/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds code to explicitly detect and handle pmd_trans_huge() pmds. It then passes HPAGE_SIZE units in to the smap_pte_entry() function instead of PAGE_SIZE. This means that using /proc/$pid/smaps now will no longer cause THPs to be broken down in to small pages. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | smaps: pass pte size argument in to smaps_pte_entry()Dave Hansen2011-03-231-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an argument to the new smaps_pte_entry() function to let it account in things other than PAGE_SIZE units. I changed all of the PAGE_SIZE sites, even though not all of them can be reached for transparent huge pages, just so this will continue to work without changes as THPs are improved. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | smaps: break out smaps_pte_entry() from smaps_pte_range()Dave Hansen2011-03-231-40/+47
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We will use smaps_pte_entry() in a moment to handle both small and transparent large pages. But, we must break it out of smaps_pte_range() first. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | pagewalk: only split huge pages when necessaryDave Hansen2011-03-231-0/+6
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Right now, if a mm_walk has either ->pte_entry or ->pmd_entry set, it will unconditionally split any transparent huge pages it runs in to. In practice, that means that anyone doing a cat /proc/$pid/smaps will unconditionally break down every huge page in the process and depend on khugepaged to re-collapse it later. This is fairly suboptimal. This patch changes that behavior. It teaches each ->pmd_entry handler (there are five) that they must break down the THPs themselves. Also, the _generic_ code will never break down a THP unless a ->pte_entry handler is actually set. This means that the ->pmd_entry handlers can now choose to deal with THPs without breaking them down. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Eric B Munson <emunson@mgebm.net> Tested-by: Eric B Munson <emunson@mgebm.net> Cc: Michael J Wolf <mjwolf@us.ibm.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matt Mackall <mpm@selenic.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge branch 'next' into for-linusJames Morris2011-03-151-1/+0
|\
| * Merge branch 'master' of git://git.infradead.org/users/eparis/selinux into nextJames Morris2011-03-081-1/+0
| |\
| | * security/selinux: fix /proc/sys/ labelingLucian Adrian Grijincu2011-02-011-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This fixes an old (2007) selinux regression: filesystem labeling for /proc/sys returned -r--r--r-- unknown /proc/sys/fs/file-nr instead of -r--r--r-- system_u:object_r:sysctl_fs_t:s0 /proc/sys/fs/file-nr Events that lead to breaking of /proc/sys/ selinux labeling: 1) sysctl was reimplemented to route all calls through /proc/sys/ commit 77b14db502cb85a031fe8fde6c85d52f3e0acb63 [PATCH] sysctl: reimplement the sysctl proc support 2) proc_dir_entry was removed from ctl_table: commit 3fbfa98112fc3962c416452a0baf2214381030e6 [PATCH] sysctl: remove the proc_dir_entry member for the sysctl tables 3) selinux still walked the proc_dir_entry tree to apply labeling. Because ctl_tables don't have a proc_dir_entry, we did not label /proc/sys/ inodes any more. To achieve this the /proc/sys/ inodes were marked private and private inodes were ignored by selinux. commit bbaca6c2e7ef0f663bc31be4dad7cf530f6c4962 [PATCH] selinux: enhance selinux to always ignore private inodes commit 86a71dbd3e81e8870d0f0e56b87875f57e58222b [PATCH] sysctl: hide the sysctl proc inodes from selinux Access control checks have been done by means of a special sysctl hook that was called for read/write accesses to any /proc/sys/ entry. We don't have to do this because, instead of walking the proc_dir_entry tree we can walk the dentry tree (as done in this patch). With this patch: * we don't mark /proc/sys/ inodes as private * we don't need the sysclt security hook * we walk the dentry tree to find the path to the inode. We have to strip the PID in /proc/PID/ entries that have a proc_dir_entry because selinux does not know how to label paths like '/1/net/rpc/nfsd.fh' (and defaults to 'proc_t' labeling). Selinux does know of '/net/rpc/nfsd.fh' (and applies the 'sysctl_rpc_t' label). PID stripping from the path was done implicitly in the previous code because the proc_dir_entry tree had the root in '/net' in the example from above. The dentry tree has the root in '/1'. Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Lucian Adrian Grijincu <lucian.grijincu@gmail.com> Signed-off-by: Eric Paris <eparis@redhat.com>
* | | /proc/self is never going to be invalidated...Al Viro2011-03-101-30/+0
| | | | | | | | | | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | | unfuck proc_sysctl ->d_compare()Al Viro2011-03-082-4/+11
|/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a) struct inode is not going to be freed under ->d_compare(); however, the thing PROC_I(inode)->sysctl points to just might. Fortunately, it's enough to make freeing that sucker delayed, provided that we don't step on its ->unregistering, clear the pointer to it in PROC_I(inode) before dropping the reference and check if it's NULL in ->d_compare(). b) I'm not sure that we *can* walk into NULL inode here (we recheck dentry->seq between verifying that it's still hashed / fetching dentry->d_inode and passing it to ->d_compare() and there's no negative hashed dentries in /proc/sys/*), but if we can walk into that, we really should not have ->d_compare() return 0 on it! Said that, I really suspect that this check can be simply killed. Nick? Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | of/flattree: Drop an uninteresting message to pr_debug levelPaul Bolle2011-03-021-1/+1
| | | | | | | | | | | | | | | | This message looks like an error (which it isn't) when booting with a flattened device tree. Remove the message from normal kernel builds. Signed-off-by: Paul Bolle <pebolle@tiscali.nl> Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
* | s390: remove task_show_regsMartin Schwidefsky2011-02-151-3/+0
| | | | | | | | | | | | | | | | | | | | | | task_show_regs used to be a debugging aid in the early bringup days of Linux on s390. /proc/<pid>/status is a world readable file, it is not a good idea to show the registers of a process. The only correct fix is to remove task_show_regs. Reported-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | console: rename acquire/release_console_sem() to console_lock/unlock()Torben Hohn2011-01-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The -rt patches change the console_semaphore to console_mutex. As a result, a quite large chunk of the patches changes all acquire/release_console_sem() to acquire/release_console_mutex() This commit makes things use more neutral function names which dont make implications about the underlying lock. The only real change is the return value of console_trylock which is inverted from try_acquire_console_sem() This patch also paves the way to switching console_sem from a semaphore to a mutex. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert] Signed-off-by: Torben Hohn <torbenh@gmx.de> Cc: Thomas Gleixner <tglx@tglx.de> Cc: Greg KH <gregkh@suse.de> Cc: Ingo Molnar <mingo@elte.hu> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERTDavid Rientjes2011-01-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option is used to configure any non-standard kernel with a much larger scope than only small devices. This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes references to the option throughout the kernel. A new CONFIG_EMBEDDED option is added that automatically selects CONFIG_EXPERT when enabled and can be used in the future to isolate options that should only be considered for embedded systems (RISC architectures, SLOB, etc). Calling the option "EXPERT" more accurately represents its intention: only expert users who understand the impact of the configuration changes they are making should enable it. Reviewed-by: Ingo Molnar <mingo@elte.hu> Acked-by: David Woodhouse <david.woodhouse@intel.com> Signed-off-by: David Rientjes <rientjes@google.com> Cc: Greg KH <gregkh@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jens Axboe <axboe@kernel.dk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Robin Holt <holt@sgi.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>