summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mm/memblock: add memblock memory allocation apisSantosh Shilimkar2014-01-221-0/+152
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce memblock memory allocation APIs which allow to support PAE or LPAE extension on 32 bits archs where the physical memory start address can be beyond 4GB. In such cases, existing bootmem APIs which operate on 32 bit addresses won't work and needs memblock layer which operates on 64 bit addresses. So we add equivalent APIs so that we can replace usage of bootmem with memblock interfaces. Architectures already converted to NO_BOOTMEM use these new memblock interfaces. The architectures which are still not converted to NO_BOOTMEM continue to function as is because we still maintain the fal lback option of bootmem back-end supporting these new interfaces. So no functional change as such. In long run, once all the architectures moves to NO_BOOTMEM, we can get rid of bootmem layer completely. This is one step to remove the core code dependency with bootmem and also gives path for architectures to move away from bootmem. The proposed interface will became active if both CONFIG_HAVE_MEMBLOCK and CONFIG_NO_BOOTMEM are specified by arch. In case !CONFIG_NO_BOOTMEM, the memblock() wrappers will fallback to the existing bootmem apis so that arch's not converted to NO_BOOTMEM continue to work as is. The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE is kept same. [akpm@linux-foundation.org: s/depricated/deprecated/] Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memblock: switch to use NUMA_NO_NODE instead of MAX_NUMNODESGrygorii Strashko2014-01-221-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It's recommended to use NUMA_NO_NODE everywhere to select "process any node" behavior or to indicate that "no node id specified". Hence, update __next_free_mem_range*() API's to accept both NUMA_NO_NODE and MAX_NUMNODES, but emit warning once on MAX_NUMNODES, and correct corresponding API's documentation to describe new behavior. Also, update other memblock/nobootmem APIs where MAX_NUMNODES is used dirrectly. The change was suggested by Tejun Heo. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/memblock: reorder parameters of memblock_find_in_range_nodeGrygorii Strashko2014-01-221-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | Reorder parameters of memblock_find_in_range_node to be consistent with other memblock APIs. The change was suggested by Tejun Heo <tj@kernel.org>. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/bootmem: remove duplicated declaration of __free_pages_bootmem()Grygorii Strashko2014-01-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | The __free_pages_bootmem is used internally by MM core and already defined in internal.h. So, remove duplicated declaration. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com> Reviewed-by: Tejun Heo <tj@kernel.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Paul Walmsley <paul@pwsan.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Russell King <linux@arm.linux.org.uk> Cc: Tony Lindgren <tony@atomide.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* introduce for_each_thread() to replace the buggy while_each_thread()Oleg Nesterov2014-01-222-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | while_each_thread() and next_thread() should die, almost every lockless usage is wrong. 1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits, next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec. 2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T, even the first next_thread() can point to the already freed/reused memory. This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away. Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group, we will kill it later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users, we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(), currently its semantics is not clear unless thread_group_leader(p) and we need to audit the callers before we can change it. So this patch adds the new interface which has to coexist with the old one for some time, hopefully the next changes will be more or less straightforward and the old one will go away soon. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Sergey Dyasly <dserrg@gmail.com> Tested-by: Sergey Dyasly <dserrg@gmail.com> Reviewed-by: Sameer Nanda <snanda@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/rmap: use rmap_walk() in page_referenced()Joonsoo Kim2014-01-222-3/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in page_referenced(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> page_referenced_ksm, page_referenced_anon, page_referenced_file 2. introduce new struct page_referenced_arg and pass it to page_referenced_one(), main function of rmap_walk, in order to count reference, to store vm_flags and to check finish condition. 3. mechanical change to use rmap_walk() in page_referenced(). [liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk] Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/rmap: use rmap_walk() in try_to_munlock()Joonsoo Kim2014-01-221-6/+0
| | | | | | | | | | | | | | | | | | | | | | | | Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_munlock(). In this patch, I change following things. 1. remove some variants of rmap traversing functions. cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file 2. mechanical change to use rmap_walk() in try_to_munlock(). 3. copy and paste comments. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/rmap: use rmap_walk() in try_to_unmap()Joonsoo Kim2014-01-221-4/+1
| | | | | | | | | | | | | | | | | | | | | | Now, we have an infrastructure in rmap_walk() to handle difference from variants of rmap traversing functions. So, just use it in try_to_unmap(). In this patch, I change following things. 1. enable rmap_walk() if !CONFIG_MIGRATION. 2. mechanical change to use rmap_walk() in try_to_unmap(). Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/rmap: extend rmap_walk_xxx() to cope with different casesJoonsoo Kim2014-01-221-0/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are a lot of common parts in traversing functions, but there are also a little of uncommon parts in it. By assigning proper function pointer on each rmap_walker_control, we can handle these difference correctly. Following are differences we should handle. 1. difference of lock function in anon mapping case 2. nonlinear handling in file mapping case 3. prechecked condition: checking memcg in page_referenced(), checking VM_SHARE in page_mkclean() checking temporary vma in try_to_unmap() 4. exit condition: checking page_mapped() in try_to_unmap() So, in this patch, I introduce 4 function pointers to handle above differences. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/rmap: make rmap_walk to get the rmap_walk_control argumentJoonsoo Kim2014-01-222-6/+10
| | | | | | | | | | | | | | | | | | | In each rmap traverse case, there is some difference so that we need function pointers and arguments to them in order to handle these For this purpose, struct rmap_walk_control is introduced in this patch, and will be extended in following patch. Introducing and extending are separate, because it clarify changes. Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: Rik van Riel <riel@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hillf Danton <dhillf@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memblock, mem_hotplug: make memblock skip hotpluggable regions if neededTang Chen2014-01-221-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Linux kernel cannot migrate pages used by the kernel. As a result, hotpluggable memory used by the kernel won't be able to be hot-removed. To solve this problem, the basic idea is to prevent memblock from allocating hotpluggable memory for the kernel at early time, and arrange all hotpluggable memory in ACPI SRAT(System Resource Affinity Table) as ZONE_MOVABLE when initializing zones. In the previous patches, we have marked hotpluggable memory regions with MEMBLOCK_HOTPLUG flag in memblock.memory. In this patch, we make memblock skip these hotpluggable memory regions in the default top-down allocation function if movable_node boot option is specified. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memblock: make memblock_set_node() support different memblock_typeTang Chen2014-01-221-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [sfr@canb.auug.org.au: fix powerpc build] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable ↵Tang Chen2014-01-221-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | regions In find_hotpluggable_memory, once we find out a memory region which is hotpluggable, we want to mark them in memblock.memory. So that we could control memblock allocator not to allocte hotpluggable memory for the kernel later. To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the hotpluggable memory regions in memblock and a function memblock_mark_hotplug() to mark hotpluggable memory if we find one. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Liu Jiang <jiang.liu@huawei.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Wen Congyang <wency@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* memblock, numa: introduce flags field into memblockTang Chen2014-01-221-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no flag in memblock to describe what type the memory is. Sometimes, we may use memblock to reserve some memory for special usage. And we want to know what kind of memory it is. So we need a way to In hotplug environment, we want to reserve hotpluggable memory so the kernel won't be able to use it. And when the system is up, we have to free these hotpluggable memory to buddy. So we need to mark these memory first. In order to do so, we need to mark out these special memory in memblock. In this patch, we introduce a new "flags" member into memblock_region: struct memblock_region { phys_addr_t base; phys_addr_t size; unsigned long flags; /* This is new. */ #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP int nid; #endif }; This patch does the following things: 1) Add "flags" member to memblock_region. 2) Modify the following APIs' prototype: memblock_add_region() memblock_insert_region() 3) Add memblock_reserve_region() to support reserve memory with flags, and keep memblock_reserve()'s prototype unmodified. 4) Modify other APIs to support flags, but keep their prototype unmodified. The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>. Suggested-by: Wen Congyang <wency@cn.fujitsu.com> Suggested-by: Liu Jiang <jiang.liu@huawei.com> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "Rafael J . Wysocki" <rjw@sisk.pl> Cc: Chen Tang <imtangchen@gmail.com> Cc: Gong Chen <gong.chen@linux.intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Jiang Liu <jiang.liu@huawei.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <laijs@cn.fujitsu.com> Cc: Larry Woodman <lwoodman@redhat.com> Cc: Len Brown <lenb@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Nazarewicz <mina86@mina86.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Rik van Riel <riel@redhat.com> Cc: Taku Izumi <izumi.taku@jp.fujitsu.com> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Renninger <trenn@suse.de> Cc: Toshi Kani <toshi.kani@hp.com> Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com> Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add overcommit_kbytes sysctl variableJerome Marchand2014-01-222-0/+10
| | | | | | | | | | | | | | | | | | | Some applications that run on HPC clusters are designed around the availability of RAM and the overcommit ratio is fine tuned to get the maximum usage of memory without swapping. With growing memory, the 1%-of-all-RAM grain provided by overcommit_ratio has become too coarse for these workload (on a 2TB machine it represents no less than 20GB). This patch adds the new overcommit_kbytes sysctl variable that allow a much finer grain. [akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: fix nommu build] Signed-off-by: Jerome Marchand <jmarchan@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Alan Cox <alan@lxorguk.ukuu.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNTMel Gorman2014-01-221-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Commit 4b59e6c47309 ("mm, show_mem: suppress page counts in non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to suppress PFN walks on large memory machines. Commit c78e93630d15 ("mm: do not walk all of system memory during show_mem") avoided a PFN walk in the generic show_mem helper which removes the requirement for SHOW_MEM_FILTER_PAGE_COUNT in that case. This patch removes PFN walkers from the arch-specific implementations that report on a per-node or per-zone granularity. ARM and unicore32 still do a PFN walk as they report memory usage on each bank which is a much finer granularity where the debugging information may still be of use. As the remaining arches doing PFN walks have relatively small amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT. [akpm@linux-foundation.org: fix parisc] Signed-off-by: Mel Gorman <mgorman@suse.de> Acked-by: David Rientjes <rientjes@google.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: James Bottomley <jejb@parisc-linux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, mempolicy: remove unneeded functions for UMA configsDavid Rientjes2014-01-221-32/+0
| | | | | | | | | | | | | | | | | | Mempolicies only exist for CONFIG_NUMA configurations. Therefore, a certain class of functions are unneeded in configurations where CONFIG_NUMA is disabled such as functions that duplicate existing mempolicies, lookup existing policies, set certain mempolicy traits, or test mempolicies for certain attributes. Remove the unneeded functions so that any future callers get a compile- time error and protect their code with CONFIG_NUMA as required. Signed-off-by: David Rientjes <rientjes@google.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: create a separate slab for page->ptl allocationKirill A. Shutemov2014-01-221-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64 is 72 bytes. For page->ptl they will be allocated from kmalloc-96 slab, so we loose 24 on each. An average system can easily allocate few tens thousands of page->ptl and overhead is significant. Let's create a separate slab for page->ptl allocation to solve this. To make sure that it really works this time, some numbers from my test machine (just booted, no load): Before: # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo kmalloc-96 31987 32190 128 30 1 : tunables 120 60 8 : slabdata 1073 1073 92 After: # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo page->ptl 27516 28143 72 53 1 : tunables 120 60 8 : slabdata 531 531 9 kmalloc-96 3853 5280 128 30 1 : tunables 120 60 8 : slabdata 176 176 0 Note that the patch is useful not only for debug case, but also for PREEMPT_RT, where spinlock_t is always bloated. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserveYasuaki Ishimatsu2014-01-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Yasuaki Ishimatsu reported memory hot-add spent more than 5 _hours_ on 9TB memory machine since onlining memory sections is too slow. And we found out setup_zone_migrate_reserve spent >90% of the time. The problem is, setup_zone_migrate_reserve scans all pageblocks unconditionally, but it is only necessary if the number of reserved block was reduced (i.e. memory hot remove). Moreover, maximum MIGRATE_RESERVE per zone is currently 2. It means that the number of reserved pageblocks is almost always unchanged. This patch adds zone->nr_migrate_reserve_block to maintain the number of MIGRATE_RESERVE pageblocks and it reduces the overhead of setup_zone_migrate_reserve dramatically. The following table shows time of onlining a memory section. Amount of memory | 128GB | 192GB | 256GB| --------------------------------------------- linux-3.12 | 23.9 | 31.4 | 44.5 | This patch | 8.3 | 8.3 | 8.6 | Mel's proposal patch | 10.9 | 19.2 | 31.3 | --------------------------------------------- (millisecond) 128GB : 4 nodes and each node has 32GB of memory 192GB : 6 nodes and each node has 32GB of memory 256GB : 8 nodes and each node has 32GB of memory (*1) Mel proposed his idea by the following threads. https://lkml.org/lkml/2013/10/30/272 [akpm@linux-foundation.org: tweak comment] Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail()Oleg Nesterov2014-01-221-4/+3
| | | | | | | | | | | | | | | | get_huge_page_tail()->compound_head() looks confusing. Every caller must check PageTail(page), otherwise atomic_inc(&page->_mapcount) is simply wrong if this page is compound-trans-head. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dave Jones <davej@redhat.com> Cc: Darren Hart <dvhart@linux.intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: tail page refcounting optimization for slab and hugetlbfsAndrea Arcangeli2014-01-222-7/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This skips the _mapcount mangling for slab and hugetlbfs pages. The main trouble in doing this is to guarantee that PageSlab and PageHeadHuge remains constant for all get_page/put_page run on the tail of slab or hugetlbfs compound pages. Otherwise if they're set during get_page but not set during put_page, the _mapcount of the tail page would underflow. PageHeadHuge will remain true until the compound page is released and enters the buddy allocator so it won't risk to change even if the tail page is the last reference left on the page. PG_slab instead is cleared before the slab frees the head page with put_page, so if the tail pin is released after the slab freed the page, we would have a problem. But in the slab case the tail pin cannot be the last reference left on the page. This is because the slab code is free to reuse the compound page after a kfree/kmem_cache_free without having to check if there's any tail pin left. In turn all tail pins must be always released while the head is still pinned by the slab code and so we know PG_slab will be still set too. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: thp: optimize compound_trans_hugeAndrea Arcangeli2014-01-221-0/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we don't clobber page_tail->first_page during split_huge_page, so compound_trans_head can be set to compound_head without adverse effects, and this mostly optimizes away a smp_rmb. It looks worthwhile to keep around the implementation that doesn't relay on page_tail->first_page not to be clobbered, because it would be necessary if we'll decide to enforce page->private to zero at all times whenever PG_private is not set, also for anonymous pages. For anonymous pages enforcing such an invariant doesn't matter as anonymous pages don't use page->private so we can get away with this microoptimization. Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Khalid Aziz <khalid.aziz@oracle.com> Cc: Pravin Shelar <pshelar@nicira.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Ben Hutchings <bhutchings@solarflare.com> Cc: Christoph Lameter <cl@linux.com> Cc: Johannes Weiner <jweiner@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pagesDave Hansen2014-01-221-0/+1
| | | | | | | | | | | | | | | | | | Dave Jiang reported that he was seeing oopses when running NUMA systems and default_hugepagesz=1G. I traced the issue down to migrate_page_copy() trying to use the same code for hugetlb pages and transparent hugepages. It should not have been trying to pass thp pages in there. So, add some VM_BUG_ON()s for the next hapless VM developer that tries the same thing. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com> Tested-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUALGeert Uytterhoeven2014-01-221-5/+8
| | | | | | | | | | | | | | | | | | | | | | | | {,set}page_address() are macros if WANT_PAGE_VIRTUAL. If !WANT_PAGE_VIRTUAL, they're plain C functions. If someone calls them with a void *, this pointer is auto-converted to struct page * if !WANT_PAGE_VIRTUAL, but causes a build failure on architectures using WANT_PAGE_VIRTUAL (arc, m68k and sparc64): drivers/md/bcache/bset.c: In function `__btree_sort': drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union Convert them to static inline functions to fix this. There are already plenty of users of struct page members inside <linux/mm.h>, so there's no reason to keep them as macros. Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Tested-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* posix_acl: uninliningAndrew Morton2014-01-221-72/+6
| | | | | | | | | | | | | | | | | | | | Uninline vast tracts of nested inline functions in include/linux/posix_acl.h. This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by 8026 bytes. The patch also regularises the positioning of the EXPORT_SYMBOLs in posix_acl.c. Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: J. Bruce Fields <bfields@fieldses.org> Cc: Trond Myklebust <Trond.Myklebust@netapp.com> Tested-by: Benny Halevy <bhalevy@primarydata.com> Cc: Benny Halevy <bhalevy@panasas.com> Cc: Andreas Gruenbacher <agruen@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: remove .should_send_event callbackJan Kara2014-01-221-4/+0
| | | | | | | | | | | | | After removing event structure creation from the generic layer there is no reason for separate .should_send_event and .handle_event callbacks. So just remove the first one. Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* fsnotify: do not share events between notification groupsJan Kara2014-01-221-87/+27
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently fsnotify framework creates one event structure for each notification event and links this event into all interested notification groups. This is done so that we save memory when several notification groups are interested in the event. However the need for event structure shared between inotify & fanotify bloats the event structure so the result is often higher memory consumption. Another problem is that fsnotify framework keeps path references with outstanding events so that fanotify can return open file descriptors with its events. This has the undesirable effect that filesystem cannot be unmounted while there are outstanding events - a regression for inotify compared to a situation before it was converted to fsnotify framework. For fanotify this problem is hard to avoid and users of fanotify should kind of expect this behavior when they ask for file descriptors from notified files. This patch changes fsnotify and its users to create separate event structure for each group. This allows for much simpler code (~400 lines removed by this patch) and also smaller event structures. For example on 64-bit system original struct fsnotify_event consumes 120 bytes, plus additional space for file name, additional 24 bytes for second and each subsequent group linking the event, and additional 32 bytes for each inotify group for private data. After the conversion inotify event consumes 48 bytes plus space for file name which is considerably less memory unless file names are long and there are several groups interested in the events (both of which are uncommon). Fanotify event fits in 56 bytes after the conversion (fanotify doesn't care about file names so its events don't have to have it allocated). A win unless there are four or more fanotify groups interested in the event. The conversion also solves the problem with unmount when only inotify is used as we don't have to grab path references for inotify events. [hughd@google.com: fanotify: fix corruption preventing startup] Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Eric Paris <eparis@parisplace.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* dma-debug: introduce debug_dma_assert_idle()Dan Williams2014-01-221-0/+6
| | | | | | | | | | | | | | | | | | | | | | | Record actively mapped pages and provide an api for asserting a given page is dma inactive before execution proceeds. Placing debug_dma_assert_idle() in cow_user_page() flagged the violation of the dma-api in the NET_DMA implementation (see commit 77873803363c "net_dma: mark broken"). The implementation includes the capability to count, in a limited way, repeat mappings of the same page that occur without an intervening unmap. This 'overlap' counter is limited to the few bits of tag space in a radix tree. This mechanism is added to mitigate false negative cases where, for example, a page is dma mapped twice and debug_dma_assert_idle() is called after the page is un-mapped once. Signed-off-by: Dan Williams <dan.j.williams@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Cc: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'for-v3.14' of git://git.infradead.org/battery-2.6Linus Torvalds2014-01-214-56/+43
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull battery updates from Dmitry Eremin-Solenikov: "I'm picking up power supply maintainership from Anton Vorontov. Could you please pull battery-2.6 git tree changes prepared for the v3.14 release. Highlights: - Power supply notifier - Several drivers gained DT support - Added Maxim 14577 driver - Change of maintainer" * tag 'for-v3.14' of git://git.infradead.org/battery-2.6: MAINTAINERS: Pick up power supply maintainership max17042_battery: Add IRQF_ONESHOT flag to use default irq handler gpio-charger: Support wakeup events power_supply: Add charger support for Maxim 14577 dt: Binding documentation for isp1704 charger isp1704_charger: Add DT support charger-manager: of_cm_parse_desc() should be static bq2415x_charger: Add DT support power_supply: Add power_supply_get_by_phandle bq2415x_charger: Use power_supply notifier for automode power: reset: Add as3722 power-off driver mfd: AS3722: Add dt node properties for system power controller charger-manager: Support deivce tree in charger manager driver charger-manager: Modify the way of checking battery's temperature power_supply: Add power_supply notifier
| * isp1704_charger: Add DT supportSebastian Reichel2013-12-241-0/+1
| | | | | | | | | | | | | | | | | | This patch introduces device tree support to the isp1704 charger driver. Adding support involved moving the handling of the enable GPIO from board code into the driver. Signed-off-by: Sebastian Reichel <sre@debian.org> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
| * power_supply: Add power_supply_get_by_phandleSebastian Reichel2013-12-241-0/+8
| | | | | | | | | | | | | | | | Add method to get power supply by device tree phandle. Signed-off-by: Sebastian Reichel <sre@debian.org> Reviewed-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
| * bq2415x_charger: Use power_supply notifier for automodePali Rohár2013-12-241-43/+5
| | | | | | | | | | | | | | | | | | | | | | | | This patch removing set_mode_hook function from board data and replacing it with new string variable of notifier power supply device. After this change it is possible to add DT support because driver does not need specific board function anymore. Only static data and name of power supply device is required. Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Reviewed-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
| * charger-manager: Support deivce tree in charger manager driverJonghwa Lee2013-12-241-6/+6
| | | | | | | | | | | | | | | | | | Charger-manager can parse charger_desc data from devicetree which is used to register charger manager. Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
| * charger-manager: Modify the way of checking battery's temperatureJonghwa Lee2013-12-241-8/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Charger-manager driver used to check battery temperature through the callback function passed by platform data. Unfortunatley, without that pre-defined callback function, charger-manager can't get battery's temperature at all. Also passing callback function through platform data ruins DT support for charger-manager. This patch mondifies charger-manager driver to get temperature of battery without pre-defined callback function. Now, charger-manager can use either of battery thermometer in fuel-gauge and ouside of battery. It uses thermal framework interface for outer thermometer if thermal fw is enabled. Otherwise, it tries to use fuel-gauge's through the power supply interface. Signed-off-by: Jonghwa Lee <jonghwa3.lee@samsung.com> Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
| * power_supply: Add power_supply notifierPali Rohár2013-12-011-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a notifier chain to the power_supply, this helps drivers in other subsystem to listen to changes in power supply subsystem. This would help to take some actions in those drivers on changing the power supply properties. One such scenario is to increase/decrease system performance based on the battery capacity/voltage. Another scenario is to adjust the h/w peak current detection voltage/current thresholds based on battery voltage/capacity. The notifier helps drivers to listen to changes in power_suppy susbystem without polling the power_supply properties Signed-off-by: Jenny TC <jenny.tc@intel.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Acked-by: Jenny TC <jenny.tc@intel.com> Signed-off-by: Anton Vorontsov <anton@enomsg.org>
* | Merge tag 'mfd-3.14-1' of git://git.linaro.org/people/ljones/mfdLinus Torvalds2014-01-2113-63/+533
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MFD changes from Lee Jones: "New drivers - Samsung Maxim 14577; Micro USB, Regulator, IRQ Controller and Battery Charger - TI/National Semiconductor LP3943 I2C GPIO Expander and PWM Generator Existing driver adaptions - Expansion of Wolfson Arizona DSP and High-Pass filter controls - TI TWL6040 default Regmap support and Regcache addition/bypass - Some nice Smatch catch fixes - Conversion of TI OMAP-USB and TI TWL6030 to endian neutralness - ChromeOS EC timing (delay) adaptions and added dependency on OF - Many constifications of 'struct {mfd_cell,regmap_irq,et.al}' - Watchdog support added for NVIDIA AS3722 - Convert functions to static in TI AM335x - Realigned previously defeated functionality in TI AM335x - IIO ADC-TSC concurrency dead-lock/timeout resolution - Addition of Power Management and Clock support for Samsung core - DEFINE_PCI_DEVICE_TABLE macro removal from MFD Subsystem - Greater use of irqdomain functionality in ST-E AB8500 - Removal of 'include/linux/mfd/abx500/ab8500-gpio.h' - Wolfson WM831x PMIC Power Management changes s/poweroff/shutdown/ - Device Tree documentation added for TI/Nat Semi LP3943 - Version detection and voltage tables for TI TPS6586x PMIC devices - Simplification of Freescale MC13XXX (de-)initialisation routines - Clean-up and simplification of the Realtek parent driver - Added support for RTL8402 Realtek PCI-Express card reader - Resource leak fix for Maxim 77686 - Possible suspend BUG() fix in OMAP USB TLL - Support for new Wolfson WM5110 Revision (D) - Testing of automatic assignment of of_node in mfd_add_device() - Reversion of the above when it started to cause issues - Remove legacy Platform Data from; TI TWL Core, Qualcomm SSBI and ST-E ABx500 Pinctrl - Clean-ups; tabbing issues, function name changes, 'drvdata = NULL' removal, unused uninitialised warning mitigation, error message clarity, removal of redundant/duplicate checks, licensing (GPL -> GPL2), coding consistency, duplicate function declaration, ret checks, commit corrections, redundant of_match_ptr() helper removal, spelling, #if-deffery removal and header guards name changes" * tag 'mfd-3.14-1' of git://git.linaro.org/people/ljones/mfd: (78 commits) mfd: wm5110: Add register patch for rev D chip mfd: omap-usb-tll: Don't hold lock during pm_runtime_get/put_sync() gpio: lp3943: Remove redundant of_match_ptr helper mfd: sta2x11-mfd: Use named constants for pci_power_t values Documentation: mfd: Fix LDO index in s2mps11.txt mfd: Cleanup mfd-mcp-sa11x0.h header mfd: max8997: Use "IS_ENABLED(CONFIG_OF)" for DT code. mfd: twl6030: Fix endianness problem in IRQ handler mfd: sec-core: Add cells for S5M8767-clocks mfd: max14577: Remove redundant of_match_ptr helper mfd: twl6040: Fix sparse non static symbol warning mfd: Revert "mfd: Always assign of_node in mfd_add_device()" mfd: rtsx: Fix sparse non static symbol warning mfd: max77693: Set proper maximum register for MUIC regmap mfd: max77686: Fix regmap resource leak on driver remove mfd: Represent correct filenames in file headers mfd: rtsx: Add support for card reader rtl8402 mfd: rtsx: Add set pull control macro and simplify rtl8411 mfd: max8997: Enforce mfd_add_devices() return value check mfd: mc13xxx: Simplify probe() & remove() ...
| * | mfd: Cleanup mfd-mcp-sa11x0.h headerSachin Kamat2014-01-211-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit a1fd844c6e62 ("ARM: sa1100: move platform_data definitions") moved the file to the current location but forgot to remove the pointer to its previous location. Clean it up. While at it also change the header file protection macros appropriately. Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: Represent correct filenames in file headersLaszlo Papp2014-01-213-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | The original author(s) probably copy/pasted these headers from the existing public header files. Signed-off-by: Laszlo Papp <lpapp@kde.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: mc13xxx: Remove duplicate mc13xxx_get_flags() declarationAlexander Shiyan2014-01-211-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | mc13xxx_get_flags() declaration given twice. This patch removes this duplicate. Signed-off-by: Alexander Shiyan <shc_work@mail.ru> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: ssbi: Constify buffer in ssbi_writeStephen Boyd2014-01-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | In preparation for passing a const pointer directly to ssbi_write() from the regmap APIs. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: ssbi: Remove platform data structs and hide ssbi type enumStephen Boyd2014-01-211-16/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The ssbi driver assumes that the device is DT based. Remove the platform data structs that will never be used and hide the enum in the only C file that uses it. Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: tps6586x: Add version detectionStefan Agner2014-01-211-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the VERSIONCRC to determine the exact device version. According to the datasheet this register can be used as device identifier. The identification is needed since some tps6586x regulators use a different voltage table. Signed-off-by: Stefan Agner <stefan@agner.ch> Reviewed-by: Thierry Reding <treding@nvidia.com> Acked-by: Stephen Warren <swarren@nvidia.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: Add LP3943 MFD driverMilo Kim2014-01-211-0/+114
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LP3943 has 16 output pins which can be used as GPIO expander and PWM generator. * Regmap I2C interface for R/W LP3943 registers * Atomic operations for output pin assignment The driver should check whether requested pin is available or not. If the pin is already used, pin request returns as a failure. A driver data, 'pin_used' is checked when gpio_request() and pwm_request() are called. If the pin is available, then pin_used is set. And it is cleared when gpio_free() and pwm_free(). * Device tree support Compatible strings for GPIO and PWM driver. LP3943 platform data is PWM related, so parsing the device tree is implemented in the PWM driver. Signed-off-by: Milo Kim <milo.kim@ti.com> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd/pinctrl: Delete platform data headerLinus Walleij2014-01-211-23/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This deletes the special AB8500 GPIO platform data passing header and merges the few remaining contents down into the abx500 pinctrl driver which handles the abx500 GPIO device. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | mfd: ab8500: Delete all GPIO platform data instancesLinus Walleij2014-01-212-12/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | This deletes all instances where the AB8500 GPIO platform data is passed around. It is completely unused in the kernel now, so it does not hurt anyone. Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | Merge tag 'ib-iio-input-3.13-1' into for-mfd-nextLee Jones2014-01-211-2/+6
| |\ \ | | | | | | | | | | | | Immutable branch for IIO and Input
| | * | mfd: input: iio: ti_amm335x: Rework TSC/ADC synchronizationSebastian Andrzej Siewior2014-01-071-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The ADC driver always programs all possible ADC values and discards them except for the value IIO asked for. On the am335x-evm the driver programs four values and it takes 500us to gather them. Reducing the number of conversations down to the (required) one also reduces the busy loop down to 125us. This leads to another error, namely the FIFOCOUNT register is sometimes (like one out of 10 attempts) not updated in time leading to EBUSY. The next read has the FIFOCOUNT register updated. Checking for the ADCSTAT register for being idle isn't a good choice either. The problem is that if TSC is used at the same time, the HW completes the conversation for ADC *and* before the driver noticed it, the HW begins to perform a TSC conversation and so the driver never seen the HW idle. The next time we would have two values in the FIFO but since the driver reads everything we always see the current one. So instead of polling for the IDLE bit in ADCStatus register, we should check the FIFOCOUNT register. It should be one instead of zero because we request one value. This change in turn leads to another error. Sometimes if TSC & ADC are used together the TSC starts generating interrupts even if nobody actually touched the touchscreen. The interrupts seem valid because TSC's FIFO is filled with values for each channel of the TSC. This condition stops after a few ADC reads but will occur again. Not good. On top of this (even without the changes I just mentioned) there is a ADC & TSC lockup condition which was reported to me by Jeff Lance including the following test case: A busy loop of "cat /sys/bus/iio/devices/iio\:device0/in_voltage4_raw" and a mug on touch screen. With this setup, the hardware will lockup after something between 20 minutes and it could take up to a couple of hours. During that lockup, the ADCSTAT register says 0x30 (or 0x70) which means STEP_ID = IDLE and FSM_BUSY = yes. That means the hardware says that it is idle and busy at the same time which is an invalid condition. For all this reasons I decided to rework this TSC/ADC part and add a handshake / synchronization here: First the ADC signals that it needs the HW and writes a 0 mask into the SE register. The HW (if active) will complete the current conversation and become idle. The TSC driver will gather the values from the FIFO (woken up by an interrupt) and won't "enable" another conversation. Instead it will wake up the ADC driver which is already waiting. The ADC driver will start "its" conversation and once it is done, it will enable the TSC steps so the TSC will work again. After this rework I haven't observed the lockup so far. Plus the busy loop has been reduced from 500us to 125us. The continues-read mode remains unchanged. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| | * | mfd: ti_am335x_tscadc: Don't read back REG_SESebastian Andrzej Siewior2014-01-071-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The purpose of reg_se_cache has been defeated. It should avoid the read-back of the register to avoid the latency and the fact that the bits are reset to 0 after the individual conversation took place. The reason why this is required like this to work, is that read-back of the register removes the bits of the ADC so they do not start another conversation after the register is re-written from the TSC side for the update. To avoid the not required read-back I introduce a "set once" variant which does not update the cache mask. After the conversation completes, the bit is removed from the SE register anyway and we don't plan a new conversation "any time soon". The current set function is renamed to set_cache to distinguish the two operations. This is a small preparation for a larger sync-rework. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Acked-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| | * | mfd: ti_am335x_tscadc: Make am335x_tsc_se_update() localSebastian Andrzej Siewior2014-01-071-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since the "recent" changes, am335x_tsc_se_update() has no longer any users outside of this file so make it local. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | | Merge tag 'ib-asoc-3.14.2' into for-mfd-nextLee Jones2014-01-211-0/+3
| |\ \ \ | | | | | | | | | | | | | | | Immutable branch between MFD and ASoC due for the v3.14 merge window