| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace
Pull user namespace related fixes from Eric Biederman:
"As these are bug fixes almost all of thes changes are marked for
backporting to stable.
The first change (implicitly adding MNT_NODEV on remount) addresses a
regression that was created when security issues with unprivileged
remount were closed. I go on to update the remount test to make it
easy to detect if this issue reoccurs.
Then there are a handful of mount and umount related fixes.
Then half of the changes deal with the a recently discovered design
bug in the permission checks of gid_map. Unix since the beginning has
allowed setting group permissions on files to less than the user and
other permissions (aka ---rwx---rwx). As the unix permission checks
stop as soon as a group matches, and setgroups allows setting groups
that can not later be dropped, results in a situtation where it is
possible to legitimately use a group to assign fewer privileges to a
process. Which means dropping a group can increase a processes
privileges.
The fix I have adopted is that gid_map is now no longer writable
without privilege unless the new file /proc/self/setgroups has been
set to permanently disable setgroups.
The bulk of user namespace using applications even the applications
using applications using user namespaces without privilege remain
unaffected by this change. Unfortunately this ix breaks a couple user
space applications, that were relying on the problematic behavior (one
of which was tools/selftests/mount/unprivileged-remount-test.c).
To hopefully prevent needing a regression fix on top of my security
fix I rounded folks who work with the container implementations mostly
like to be affected and encouraged them to test the changes.
> So far nothing broke on my libvirt-lxc test bed. :-)
> Tested with openSUSE 13.2 and libvirt 1.2.9.
> Tested-by: Richard Weinberger <richard@nod.at>
> Tested on Fedora20 with libvirt 1.2.11, works fine.
> Tested-by: Chen Hanxiao <chenhanxiao@cn.fujitsu.com>
> Ok, thanks - yes, unprivileged lxc is working fine with your kernels.
> Just to be sure I was testing the right thing I also tested using
> my unprivileged nsexec testcases, and they failed on setgroup/setgid
> as now expected, and succeeded there without your patches.
> Tested-by: Serge Hallyn <serge.hallyn@ubuntu.com>
> I tested this with Sandstorm. It breaks as is and it works if I add
> the setgroups thing.
> Tested-by: Andy Lutomirski <luto@amacapital.net> # breaks things as designed :("
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace:
userns: Unbreak the unprivileged remount tests
userns; Correct the comment in map_write
userns: Allow setting gid_maps without privilege when setgroups is disabled
userns: Add a knob to disable setgroups on a per user namespace basis
userns: Rename id_map_mutex to userns_state_mutex
userns: Only allow the creator of the userns unprivileged mappings
userns: Check euid no fsuid when establishing an unprivileged uid mapping
userns: Don't allow unprivileged creation of gid mappings
userns: Don't allow setgroups until a gid mapping has been setablished
userns: Document what the invariant required for safe unprivileged mappings.
groups: Consolidate the setgroups permission checks
mnt: Clear mnt_expire during pivot_root
mnt: Carefully set CL_UNPRIVILEGED in clone_mnt
mnt: Move the clear of MNT_LOCKED from copy_tree to it's callers.
umount: Do not allow unmounting rootfs.
umount: Disallow unprivileged mount force
mnt: Update unprivileged remount test
mnt: Implicitly add MNT_NODEV on remount when it was implicitly added by mount
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A security fix in caused the way the unprivileged remount tests were
using user namespaces to break. Tweak the way user namespaces are
being used so the test works again.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
It is important that all maps are less than PAGE_SIZE
or else setting the last byte of the buffer to '0'
could write off the end of the allocated storage.
Correct the misleading comment.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that setgroups can be disabled and not reenabled, setting gid_map
without privielge can now be enabled when setgroups is disabled.
This restores most of the functionality that was lost when unprivileged
setting of gid_map was removed. Applications that use this functionality
will need to check to see if they use setgroups or init_groups, and if they
don't they can be fixed by simply disabling setgroups before writing to
gid_map.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- Expose the knob to user space through a proc file /proc/<pid>/setgroups
A value of "deny" means the setgroups system call is disabled in the
current processes user namespace and can not be enabled in the
future in this user namespace.
A value of "allow" means the segtoups system call is enabled.
- Descendant user namespaces inherit the value of setgroups from
their parents.
- A proc file is used (instead of a sysctl) as sysctls currently do
not allow checking the permissions at open time.
- Writing to the proc file is restricted to before the gid_map
for the user namespace is set.
This ensures that disabling setgroups at a user namespace
level will never remove the ability to call setgroups
from a process that already has that ability.
A process may opt in to the setgroups disable for itself by
creating, entering and configuring a user namespace or by calling
setns on an existing user namespace with setgroups disabled.
Processes without privileges already can not call setgroups so this
is a noop. Prodcess with privilege become processes without
privilege when entering a user namespace and as with any other path
to dropping privilege they would not have the ability to call
setgroups. So this remains within the bounds of what is possible
without a knob to disable setgroups permanently in a user namespace.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Generalize id_map_mutex so it can be used for more state of a user namespace.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
If you did not create the user namespace and are allowed
to write to uid_map or gid_map you should already have the necessary
privilege in the parent user namespace to establish any mapping
you want so this will not affect userspace in practice.
Limiting unprivileged uid mapping establishment to the creator of the
user namespace makes it easier to verify all credentials obtained with
the uid mapping can be obtained without the uid mapping without
privilege.
Limiting unprivileged gid mapping establishment (which is temporarily
absent) to the creator of the user namespace also ensures that the
combination of uid and gid can already be obtained without privilege.
This is part of the fix for CVE-2014-8989.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
setresuid allows the euid to be set to any of uid, euid, suid, and
fsuid. Therefor it is safe to allow an unprivileged user to map
their euid and use CAP_SETUID privileged with exactly that uid,
as no new credentials can be obtained.
I can not find a combination of existing system calls that allows setting
uid, euid, suid, and fsuid from the fsuid making the previous use
of fsuid for allowing unprivileged mappings a bug.
This is part of a fix for CVE-2014-8989.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
As any gid mapping will allow and must allow for backwards
compatibility dropping groups don't allow any gid mappings to be
established without CAP_SETGID in the parent user namespace.
For a small class of applications this change breaks userspace
and removes useful functionality. This small class of applications
includes tools/testing/selftests/mount/unprivilged-remount-test.c
Most of the removed functionality will be added back with the addition
of a one way knob to disable setgroups. Once setgroups is disabled
setting the gid_map becomes as safe as setting the uid_map.
For more common applications that set the uid_map and the gid_map
with privilege this change will have no affect.
This is part of a fix for CVE-2014-8989.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
setgroups is unique in not needing a valid mapping before it can be called,
in the case of setgroups(0, NULL) which drops all supplemental groups.
The design of the user namespace assumes that CAP_SETGID can not actually
be used until a gid mapping is established. Therefore add a helper function
to see if the user namespace gid mapping has been established and call
that function in the setgroups permission check.
This is part of the fix for CVE-2014-8989, being able to drop groups
without privilege using user namespaces.
Cc: stable@vger.kernel.org
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The rule is simple. Don't allow anything that wouldn't be allowed
without unprivileged mappings.
It was previously overlooked that establishing gid mappings would
allow dropping groups and potentially gaining permission to files and
directories that had lesser permissions for a specific group than for
all other users.
This is the rule needed to fix CVE-2014-8989 and prevent any other
security issues with new_idmap_permitted.
The reason for this rule is that the unix permission model is old and
there are programs out there somewhere that take advantage of every
little corner of it. So allowing a uid or gid mapping to be
established without privielge that would allow anything that would not
be allowed without that mapping will result in expectations from some
code somewhere being violated. Violated expectations about the
behavior of the OS is a long way to say a security issue.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Today there are 3 instances of setgroups and due to an oversight their
permission checking has diverged. Add a common function so that
they may all share the same permission checking code.
This corrects the current oversight in the current permission checks
and adds a helper to avoid this in the future.
A user namespace security fix will update this new helper, shortly.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When inspecting the pivot_root and the current mount expiry logic I
realized that pivot_root fails to clear like mount move does.
Add the missing line in case someone does the interesting feat of
moving an expirable submount. This gives a strong guarantee that root
of the filesystem tree will never expire.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| | |
old->mnt_expiry should be ignored unless CL_EXPIRE is set.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Clear MNT_LOCKED in the callers of copy_tree except copy_mnt_ns, and
collect_mounts. In copy_mnt_ns it is necessary to create an exact
copy of a mount tree, so not clearing MNT_LOCKED is important.
Similarly collect_mounts is used to take a snapshot of the mount tree
for audit logging purposes and auditing using a faithful copy of the
tree is important.
This becomes particularly significant when we start setting MNT_LOCKED
on rootfs to prevent it from being unmounted.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Andrew Vagin <avagin@parallels.com> writes:
> #define _GNU_SOURCE
> #include <sys/types.h>
> #include <sys/stat.h>
> #include <fcntl.h>
> #include <sched.h>
> #include <unistd.h>
> #include <sys/mount.h>
>
> int main(int argc, char **argv)
> {
> int fd;
>
> fd = open("/proc/self/ns/mnt", O_RDONLY);
> if (fd < 0)
> return 1;
> while (1) {
> if (umount2("/", MNT_DETACH) ||
> setns(fd, CLONE_NEWNS))
> break;
> }
>
> return 0;
> }
>
> root@ubuntu:/home/avagin# gcc -Wall nsenter.c -o nsenter
> root@ubuntu:/home/avagin# strace ./nsenter
> execve("./nsenter", ["./nsenter"], [/* 22 vars */]) = 0
> ...
> open("/proc/self/ns/mnt", O_RDONLY) = 3
> umount("/", MNT_DETACH) = 0
> setns(3, 131072) = 0
> umount("/", MNT_DETACH
>
causes:
> [ 260.548301] ------------[ cut here ]------------
> [ 260.550941] kernel BUG at /build/buildd/linux-3.13.0/fs/pnode.c:372!
> [ 260.552068] invalid opcode: 0000 [#1] SMP
> [ 260.552068] Modules linked in: xt_CHECKSUM iptable_mangle xt_tcpudp xt_addrtype xt_conntrack ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack bridge stp llc dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison iptable_filter ip_tables x_tables crct10dif_pclmul crc32_pclmul ghash_clmulni_intel binfmt_misc nfsd auth_rpcgss nfs_acl aesni_intel nfs lockd aes_x86_64 sunrpc fscache lrw gf128mul glue_helper ablk_helper cryptd serio_raw ppdev parport_pc lp parport btrfs xor raid6_pq libcrc32c psmouse floppy
> [ 260.552068] CPU: 0 PID: 1723 Comm: nsenter Not tainted 3.13.0-30-generic #55-Ubuntu
> [ 260.552068] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
> [ 260.552068] task: ffff8800376097f0 ti: ffff880074824000 task.ti: ffff880074824000
> [ 260.552068] RIP: 0010:[<ffffffff811e9483>] [<ffffffff811e9483>] propagate_umount+0x123/0x130
> [ 260.552068] RSP: 0018:ffff880074825e98 EFLAGS: 00010246
> [ 260.552068] RAX: ffff88007c741140 RBX: 0000000000000002 RCX: ffff88007c741190
> [ 260.552068] RDX: ffff88007c741190 RSI: ffff880074825ec0 RDI: ffff880074825ec0
> [ 260.552068] RBP: ffff880074825eb0 R08: 00000000000172e0 R09: ffff88007fc172e0
> [ 260.552068] R10: ffffffff811cc642 R11: ffffea0001d59000 R12: ffff88007c741140
> [ 260.552068] R13: ffff88007c741140 R14: ffff88007c741140 R15: 0000000000000000
> [ 260.552068] FS: 00007fd5c7e41740(0000) GS:ffff88007fc00000(0000) knlGS:0000000000000000
> [ 260.552068] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 260.552068] CR2: 00007fd5c7968050 CR3: 0000000070124000 CR4: 00000000000406f0
> [ 260.552068] Stack:
> [ 260.552068] 0000000000000002 0000000000000002 ffff88007c631000 ffff880074825ed8
> [ 260.552068] ffffffff811dcfac ffff88007c741140 0000000000000002 ffff88007c741160
> [ 260.552068] ffff880074825f38 ffffffff811dd12b ffffffff811cc642 0000000075640000
> [ 260.552068] Call Trace:
> [ 260.552068] [<ffffffff811dcfac>] umount_tree+0x20c/0x260
> [ 260.552068] [<ffffffff811dd12b>] do_umount+0x12b/0x300
> [ 260.552068] [<ffffffff811cc642>] ? final_putname+0x22/0x50
> [ 260.552068] [<ffffffff811cc849>] ? putname+0x29/0x40
> [ 260.552068] [<ffffffff811dd88c>] SyS_umount+0xdc/0x100
> [ 260.552068] [<ffffffff8172aeff>] tracesys+0xe1/0xe6
> [ 260.552068] Code: 89 50 08 48 8b 50 08 48 89 02 49 89 45 08 e9 72 ff ff ff 0f 1f 44 00 00 4c 89 e6 4c 89 e7 e8 f5 f6 ff ff 48 89 c3 e9 39 ff ff ff <0f> 0b 66 2e 0f 1f 84 00 00 00 00 00 90 66 66 66 66 90 55 b8 01
> [ 260.552068] RIP [<ffffffff811e9483>] propagate_umount+0x123/0x130
> [ 260.552068] RSP <ffff880074825e98>
> [ 260.611451] ---[ end trace 11c33d85f1d4c652 ]--
Which in practice is totally uninteresting. Only the global root user can
do it, and it is just a stupid thing to do.
However that is no excuse to allow a silly way to oops the kernel.
We can avoid this silly problem by setting MNT_LOCKED on the rootfs
mount point and thus avoid needing any special cases in the unmount
code.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Forced unmount affects not just the mount namespace but the underlying
superblock as well. Restrict forced unmount to the global root user
for now. Otherwise it becomes possible a user in a less privileged
mount namespace to force the shutdown of a superblock of a filesystem
in a more privileged mount namespace, allowing a DOS attack on root.
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
- MNT_NODEV should be irrelevant except when reading back mount flags,
no longer specify MNT_NODEV on remount.
- Test MNT_NODEV on devpts where it is meaningful even for unprivileged mounts.
- Add a test to verify that remount of a prexisting mount with the same flags
is allowed and does not change those flags.
- Cleanup up the definitions of MS_REC, MS_RELATIME, MS_STRICTATIME that are used
when the code is built in an environment without them.
- Correct the test error messages when tests fail. There were not 5 tests
that tested MS_RELATIME.
Cc: stable@vger.kernel.org
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now that remount is properly enforcing the rule that you can't remove
nodev at least sandstorm.io is breaking when performing a remount.
It turns out that there is an easy intuitive solution implicitly
add nodev on remount when nodev was implicitly added on mount.
Tested-by: Cedric Bosdonnat <cbosdonnat@suse.com>
Tested-by: Richard Weinberger <richard@nod.at>
Cc: stable@vger.kernel.org
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Dave Hansen reports that commit fb7332a9fedf ("mmu_gather: move minimal
range calculations into generic code") caused a performance problem:
"tlb_finish_mmu() goes up about 9x in the profiles (~0.4%->3.6%) and
tlb_flush_mmu_free() takes about 3.1% of CPU time with the patch
applied, but does not show up at all on the commit before"
and the reason is that Will moved the test for whether we need to flush
from tlb_flush_mmu() into tlb_flush_mmu_tlbonly(). But that meant that
tlb_flush_mmu_free() basically lost that check.
Move it back into tlb_flush_mmu() where it belongs, so that it covers
both tlb_flush_mmu_tlbonly() _and_ tlb_flush_mmu_free().
Reported-and-tested-by: Dave Hansen <dave@sr71.net>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
My commit 26178ec11ef3 ("x86: mm: consolidate VM_FAULT_RETRY handling")
had a really stupid typo: the FAULT_FLAG_USER bit is in the 'flags'
variable, not the 'fault' variable. Duh,
The one silver lining in this is that Dave finding this at least
confirms that trinity actually triggers this special path easily, in a
way normal use does not.
Reported-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Pull VFIO updates from Alex Williamson:
- s390 support (Frank Blaschka)
- Enable iommu-type1 for ARM SMMU (Will Deacon)
* tag 'vfio-v3.19-rc1' of git://github.com/awilliam/linux-vfio:
drivers/vfio: allow type-1 IOMMU instantiation on top of an ARM SMMU
vfio: make vfio run on s390
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The ARM SMMU driver is compatible with the notion of a type-1 IOMMU in
VFIO.
This patch allows VFIO_IOMMU_TYPE1 to be selected if ARM_SMMU=y.
Signed-off-by: Will Deacon <will.deacon@arm.com>
[aw: update for existing S390 patch]
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
add Kconfig switch to hide INTx
add Kconfig switch to let vfio announce PCI BARs are not mapable
Signed-off-by: Frank Blaschka <frank.blaschka@de.ibm.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux
Pull virtio updates from Rusty Russell:
"A balloon enhancement, and a minor race-on-module-unload theoretical
bug which doesn't merit cc: stable.
All the exciting stuff went via MST this cycle"
* tag 'virtio-next-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rusty/linux:
virtio_balloon: free some memory from balloon on OOM
virtio_balloon: return the amount of freed memory from leak_balloon()
virtio_blk: fix race at module removal
virtio: Fix comment typo 'CONFIG_S_FAILED'
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Excessive virtio_balloon inflation can cause invocation of OOM-killer,
when Linux is under severe memory pressure. Various mechanisms are
responsible for correct virtio_balloon memory management. Nevertheless
it is often the case that these control tools does not have enough time
to react on fast changing memory load. As a result OS runs out of memory
and invokes OOM-killer. The balancing of memory by use of the virtio
balloon should not cause the termination of processes while there are
pages in the balloon. Now there is no way for virtio balloon driver to
free some memory at the last moment before some process will be get
killed by OOM-killer.
This does not provide a security breach as balloon itself is running
inside guest OS and is working in the cooperation with the host. Thus
some improvements from guest side should be considered as normal.
To solve the problem, introduce a virtio_balloon callback which is
expected to be called from the oom notifier call chain in out_of_memory()
function. If virtio balloon could release some memory, it will make
the system to return and retry the allocation that forced the out of
memory killer to run.
Allocate virtio feature bit for this: it is not set by default,
the the guest will not deflate virtio balloon on OOM without explicit
permission from host.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This value would be useful in the next patch to provide the amount of
the freed memory for OOM killer.
Signed-off-by: Raushaniya Maksudova <rmaksudova@parallels.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Rusty Russell <rusty@rustcorp.com.au>
CC: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If a device appears while module is being removed,
driver will get a callback after we've given up
on the major number.
In theory this means this major number can get reused
by something else, resulting in a conflict.
To fix, cleanup in reverse order of initialization.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Without the VIRTIO_ prefix CONFIG_S_FAILED looks like a Kconfig macro.
So use that prefix here too.
Signed-off-by: Paul Bolle <pebolle@tiscali.nl>
Acked-by: Cornelia Huck <cornelia.huck@de.ibm.com>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux
Pull thermal management update from Zhang Rui:
"Summary:
- of-thermal extension to allow drivers to register and use its
functionality in a better way, without exploiting thermal core.
From Lukasz Majewski.
- Fix a bug in intel_soc_dts_thermal driver which calls a sleep
function in interrupt handler. From Maurice Petallo.
- add a thermal UAPI header file for exporting the thermal generic
netlink information to user-space. From Florian Fainelli.
- First round of refactoring in Exynos driver. Bartlomiej and Lukasz
are attempting to make it lean and easier to understand.
- New thermal driver for Rockchip (rk3288), with support for DT
thermal. From Caesar Wang.
- New thermal driver for Nvidia, Tegra124 SOCTHERM driver, with
support for DT thermal. From Mikko Perttunen.
- New cooling device, based on common clock framework. From Eduardo
Valentin.
- a couple of small fixes in thermal core framework. From Srinivas
Pandruvada, Javi Merino, Luis Henriques.
- Dropping Armada A375-Z1 SoC thermal support as the chip is not in
the market, armada folks decided to drop its support.
- a couple of small fixes and cleanups in int340x thermal driver"
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (58 commits)
thermal: provide an UAPI header file
Thermal/int340x: Clear the error value of the last acpi_bus_get_device() call
thermal/powerclamp: add id for braswell cpu
thermal: Intel SoC DTS: Don't do thermal zone update inside spin_lock
Thermal: fix platform_no_drv_owner.cocci warnings
Thermal/int340x: avoid unnecessary pointer casting
thermal: int3403: Delete a check before thermal_zone_device_unregister()
thermal/int3400: export uuids
thermal: of: Extend current of-thermal.c code to allow setting emulated temp
thermal: of: Extend of-thermal to export table of trip points
thermal: of: Rename struct __thermal_trip to struct thermal_trip
thermal: of: Extend of-thermal.c to provide check if trip point is valid
thermal: of: Extend of-thermal.c to provide number of trip points
thermal: Fix error path in thermal_init()
thermal: lock the thermal zone when switching governors
thermal: core: ignore invalid trip temperature
thermal: armada: Remove support for A375-Z1 SoC
thermal: rockchip: add driver for thermal
dt-bindings: document Rockchip thermal
thermal: exynos: remove exynos_tmu_data.h include
...
|
| | \ \ | |
| | \ \ | |
| |\ \ \ \ |
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Previously the return value of the last acpi_bus_get_device() was
returned. However, since we only report those issues, it should be
cleared to continue as expected.
Signed-off-by: Ilkka Koskinen <ilkka.koskinen@linux.intel.com>
Acked-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
drivers/thermal/int340x_thermal/int3403_thermal.c:468:3-8: No need to set .owner here. The core will do it.
Remove .owner field if calls are used which set it automatically
Generated by: scripts/coccinelle/api/platform_no_drv_owner.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
Avoid pointer casting which may also lead to problems on big endian
64 bit systems.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
The thermal_zone_device_unregister() function tests whether its argument
is NULL and then returns immediately. Thus the test around the call
is not needed.
This issue was detected by using the Coccinelle software.
Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
|
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | | |
INT3400 currently supports only one policy, which can't be changed.
This change exports all available policies (uuids) to user space and
allow this to be changed.
It introduces an attribute group uuids in INT3400 platform driver.
There are two attributes exposed:
- available_uuids
- current_uuid
User space can set current_uuid via this interface to one of the
available uuids. The uuid change is communicated to firmware, only
when the current zone mode is changed to "enabled". So the ideal
sequence should be
- set INT3400 zone mode to "disabled"
- change current_uuid
- set INT3400 zone mode to "enabled"
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
|
| | |\ \ \ \ |
|
| | | |\ \ \ \
| | | | |/ / /
| | | |/| | |
| | | | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal into eduardo-soc-thermal
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Before this change it was only possible to set get_temp() and get_trend()
methods to be used in the common code handling passing parameters via
device tree to "cpu-thermal" CPU thermal zone device.
Now it is possible to also set emulated value of temperature for debug
purposes.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Acked-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch extends the of-thermal.c to export trip points for a given
thermal zone.
Thermal drivers should use of_thermal_get_trip_points() method to get
pointer to table of thermal trip points.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch changes name of struct __thermal_trip to thermal_trip and moves
declaration of the latter to ./include/linux/thermal.h for better visibility.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch extends the of-thermal.c to provide check if trip point is
valid.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This patch extends the of-thermal.c to provide information about number of
available trip points.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
The Armada 375 Z1 SoC revision is no longer supported. This commit
removes the quirk needed for the thermal sensor.
Acked-by: Jason Cooper <jason@lakedaemon.net>
Signed-off-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Thermal is TS-ADC Controller module supports
user-defined mode and automatic mode.
User-defined mode refers,TSADC all the control signals entirely by
software writing to register for direct control.
Automaic mode refers to the module automatically poll TSADC output,
and the results were checked.If you find that the temperature High
in a period of time,an interrupt is generated to the processor
down-measures taken;If the temperature over a period of time High,
the resulting TSHUT gave CRU module,let it reset the entire chip,
or via GPIO give PMIC.
Signed-off-by: zhaoyifeng <zyf@rock-chips.com>
Signed-off-by: Caesar Wang <caesar.wang@rock-chips.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
This add the necessary binding documentation for the thermal
found on Rockchip SoCs
Signed-off-by: zhaoyifeng <zyf@rock-chips.com>
Signed-off-by: Caesar Wang <caesar.wang@rock-chips.com>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
There is no longer need to share defines between exynos_tmu.c
and exynos_tmu_data.c (as they are now only used by the former
file) so move them accordingly. Then move externs for struct
exynos_tmu_init_data instances to exynos_tmu.h and remove no
longer needed exynos_tmu_data.h include.
There should be no functional changes caused by this patch.
Cc: Amit Daniel Kachhap <amit.daniel@samsung.com>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
__EXYNOS5420_TMU_DATA macro is now identical to __EXYNOS5260_TMU_DATA
one and can be removed.
There should be no functional changes caused by this patch.
Cc: Amit Daniel Kachhap <amit.daniel@samsung.com>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Maximum theoretical size saving (i.e. with only Exynos5410
SoC support enabled in kernel config so all SoC dependend
Exynos thermal driver code was dropped) is 4096 bytes so
there is no much sense in keeping these ifdefs (especially
given that they are useless once the driver gets updated to
use device tree).
While at it remove needless 'void *' casts.
There should be no functional changes caused by this patch.
Cc: Amit Daniel Kachhap <amit.daniel@samsung.com>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | |
| | | | | | | |
Replace pdata->test_mux check in get_con_reg() by explicitly
checking for SoC type.
Also since the used pdata->test_mux value is always identical
use it directly and remove pdata->test_mux completely.
There should be no functional changes caused by this patch.
Cc: Amit Daniel Kachhap <amit.daniel@samsung.com>
Cc: Lukasz Majewski <l.majewski@samsung.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Cc: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Kyungmin Park <kyungmin.park@samsung.com>
Tested-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
|