summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* mm, fs: move randomize_stack_top from fs to mmAlexandre Ghiti2019-09-251-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Provide generic top-down mmap layout functions", v6. This series introduces generic functions to make top-down mmap layout easily accessible to architectures, in particular riscv which was the initial goal of this series. The generic implementation was taken from arm64 and used successively by arm, mips and finally riscv. Note that in addition the series fixes 2 issues: - stack randomization was taken into account even if not necessary. - [1] fixed an issue with mmap base which did not take into account randomization but did not report it to arm and mips, so by moving arm64 into a generic library, this problem is now fixed for both architectures. This work is an effort to factorize architecture functions to avoid code duplication and oversights as in [1]. [1]: https://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1429066.html This patch (of 14): This preparatory commit moves this function so that further introduction of generic topdown mmap layout is contained only in mm/util.c. Link: http://lkml.kernel.org/r/20190730055113.23635-2-alex@ghiti.fr Signed-off-by: Alexandre Ghiti <alex@ghiti.fr> Acked-by: Kees Cook <keescook@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Paul Burton <paul.burton@mips.com> Cc: James Hogan <jhogan@kernel.org> Cc: Palmer Dabbelt <palmer@sifive.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* khugepaged: enable collapse pmd for pte-mapped THPSong Liu2019-09-251-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | khugepaged needs exclusive mmap_sem to access page table. When it fails to lock mmap_sem, the page will fault in as pte-mapped THP. As the page is already a THP, khugepaged will not handle this pmd again. This patch enables the khugepaged to retry collapse the page table. struct mm_slot (in khugepaged.c) is extended with an array, containing addresses of pte-mapped THPs. We use array here for simplicity. We can easily replace it with more advanced data structures when needed. In khugepaged_scan_mm_slot(), if the mm contains pte-mapped THP, we try to collapse the page table. Since collapse may happen at an later time, some pages may already fault in. collapse_pte_mapped_thp() is added to properly handle these pages. collapse_pte_mapped_thp() also double checks whether all ptes in this pmd are mapping to the same THP. This is necessary because some subpage of the THP may be replaced, for example by uprobe. In such cases, it is not possible to collapse the pmd. [kirill.shutemov@linux.intel.com: add comments for retract_page_tables()] Link: http://lkml.kernel.org/r/20190816145443.6ard3iilytc6jlgv@box Link: http://lkml.kernel.org/r/20190815164525.1848545-6-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, thp: introduce FOLL_SPLIT_PMDSong Liu2019-09-251-0/+1
| | | | | | | | | | | | | | | | | Introduce a new foll_flag: FOLL_SPLIT_PMD. As the name says FOLL_SPLIT_PMD splits huge pmd for given mm_struct, the underlining huge page stays as-is. FOLL_SPLIT_PMD is useful for cases where we need to use regular pages, but would switch back to huge page and huge pmd on. One of such example is uprobe. The following patches use FOLL_SPLIT_PMD in uprobe. Link: http://lkml.kernel.org/r/20190815164525.1848545-4-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: move memcmp_pages() and pages_identical()Song Liu2019-09-251-0/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "THP aware uprobe", v13. This patchset makes uprobe aware of THPs. Currently, when uprobe is attached to text on THP, the page is split by FOLL_SPLIT. As a result, uprobe eliminates the performance benefit of THP. This set makes uprobe THP-aware. Instead of FOLL_SPLIT, we introduces FOLL_SPLIT_PMD, which only split PMD for uprobe. After all uprobes within the THP are removed, the PTE-mapped pages are regrouped as huge PMD. This set (plus a few THP patches) is also available at https://github.com/liu-song-6/linux/tree/uprobe-thp This patch (of 6): Move memcmp_pages() to mm/util.c and pages_identical() to mm.h, so that we can use them in other files. Link: http://lkml.kernel.org/r/20190815164525.1848545-2-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Matthew Wilcox <matthew.wilcox@oracle.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: thp: make deferred split shrinker memcg awareYang Shi2019-09-253-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently THP deferred split shrinker is not memcg aware, this may cause premature OOM with some configuration. For example the below test would run into premature OOM easily: $ cgcreate -g memory:thp $ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes $ cgexec -g memory:thp transhuge-stress 4000 transhuge-stress comes from kernel selftest. It is easy to hit OOM, but there are still a lot THP on the deferred split queue, memcg direct reclaim can't touch them since the deferred split shrinker is not memcg aware. Convert deferred split shrinker memcg aware by introducing per memcg deferred split queue. The THP should be on either per node or per memcg deferred split queue if it belongs to a memcg. When the page is immigrated to the other memcg, it will be immigrated to the target memcg's deferred split queue too. Reuse the second tail page's deferred_list for per memcg list since the same THP can't be on multiple deferred split queues. [yang.shi@linux.alibaba.com: simplify deferred split queue dereference per Kirill Tkhai] Link: http://lkml.kernel.org/r/1566496227-84952-5-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1565144277-36240-5-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: shrinker: make shrinker not depend on memcg kmemYang Shi2019-09-252-9/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently shrinker is just allocated and can work when memcg kmem is enabled. But, THP deferred split shrinker is not slab shrinker, it doesn't make too much sense to have such shrinker depend on memcg kmem. It should be able to reclaim THP even though memcg kmem is disabled. Introduce a new shrinker flag, SHRINKER_NONSLAB, for non-slab shrinker. When memcg kmem is disabled, just such shrinkers can be called in shrinking memcg slab. [yang.shi@linux.alibaba.com: add comment] Link: http://lkml.kernel.org/r/1566496227-84952-4-git-send-email-yang.shi@linux.alibaba.com Link: http://lkml.kernel.org/r/1565144277-36240-4-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: thp: extract split_queue_* into a structYang Shi2019-09-251-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "Make deferred split shrinker memcg aware", v6. Currently THP deferred split shrinker is not memcg aware, this may cause premature OOM with some configuration. For example the below test would run into premature OOM easily: $ cgcreate -g memory:thp $ echo 4G > /sys/fs/cgroup/memory/thp/memory/limit_in_bytes $ cgexec -g memory:thp transhuge-stress 4000 transhuge-stress comes from kernel selftest. It is easy to hit OOM, but there are still a lot THP on the deferred split queue, memcg direct reclaim can't touch them since the deferred split shrinker is not memcg aware. Convert deferred split shrinker memcg aware by introducing per memcg deferred split queue. The THP should be on either per node or per memcg deferred split queue if it belongs to a memcg. When the page is immigrated to the other memcg, it will be immigrated to the target memcg's deferred split queue too. Reuse the second tail page's deferred_list for per memcg list since the same THP can't be on multiple deferred split queues. Make deferred split shrinker not depend on memcg kmem since it is not slab. It doesn't make sense to not shrink THP even though memcg kmem is disabled. With the above change the test demonstrated above doesn't trigger OOM even though with cgroup.memory=nokmem. This patch (of 4): Put split_queue, split_queue_lock and split_queue_len into a struct in order to reduce code duplication when we convert deferred_split to memcg aware in the later patches. Link: http://lkml.kernel.org/r/1565144277-36240-2-git-send-email-yang.shi@linux.alibaba.com Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Suggested-by: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: David Rientjes <rientjes@google.com> Cc: Qian Cai <cai@lca.pw> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm,thp: avoid writes to file with THP in pagecacheSong Liu2019-09-251-0/+32
| | | | | | | | | | | | | | | | | | | | | | | | | In previous patch, an application could put part of its text section in THP via madvise(). These THPs will be protected from writes when the application is still running (TXTBSY). However, after the application exits, the file is available for writes. This patch avoids writes to file THP by dropping page cache for the file when the file is open for write. A new counter nr_thps is added to struct address_space. In do_dentry_open(), if the file is open for write and nr_thps is non-zero, we drop page cache for the whole file. Link: http://lkml.kernel.org/r/20190801184244.3169074-8-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: kbuild test robot <lkp@intel.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm,thp: stats for file backed THPSong Liu2019-09-251-0/+2
| | | | | | | | | | | | | | | | | | | | | In preparation for non-shmem THP, this patch adds a few stats and exposes them in /proc/meminfo, /sys/bus/node/devices/<node>/meminfo, and /proc/<pid>/task/<tid>/smaps. This patch is mostly a rewrite of Kirill A. Shutemov's earlier version: https://lkml.kernel.org/r/20170126115819.58875-5-kirill.shutemov@linux.intel.com/ Link: http://lkml.kernel.org/r/20190801184244.3169074-5-songliubraving@fb.com Signed-off-by: Song Liu <songliubraving@fb.com> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Hillf Danton <hdanton@sina.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* thp: update split_huge_page_pmd() commentKefeng Wang2019-09-251-3/+2
| | | | | | | | | | | According to 78ddc5347341 ("thp: rename split_huge_page_pmd() to split_huge_pmd()"), update related comment. Link: http://lkml.kernel.org/r/20190731033406.185285-1-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, compaction: raise compaction priority after it withdrawnsVlastimil Babka2019-09-251-5/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mike Kravetz reports that "hugetlb allocations could stall for minutes or hours when should_compact_retry() would return true more often then it should. Specifically, this was in the case where compact_result was COMPACT_DEFERRED and COMPACT_PARTIAL_SKIPPED and no progress was being made." The problem is that the compaction_withdrawn() test in should_compact_retry() includes compaction outcomes that are only possible on low compaction priority, and results in a retry without increasing the priority. This may result in furter reclaim, and more incomplete compaction attempts. With this patch, compaction priority is raised when possible, or should_compact_retry() returns false. The COMPACT_SKIPPED result doesn't really fit together with the other outcomes in compaction_withdrawn(), as that's a result caused by insufficient order-0 pages, not due to low compaction priority. With this patch, it is moved to a new compaction_needs_reclaim() function, and for that outcome we keep the current logic of retrying if it looks like reclaim will be able to help. Link: http://lkml.kernel.org/r/20190806014744.15446-4-mike.kravetz@oracle.com Reported-by: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com> Tested-by: Mike Kravetz <mike.kravetz@oracle.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/vmalloc: modify struct vmap_area to reduce its sizePengfei Li2019-09-251-7/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Objective --------- The current implementation of struct vmap_area wasted space. After applying this commit, sizeof(struct vmap_area) has been reduced from 11 words to 8 words. Description ----------- 1) Pack "subtree_max_size", "vm" and "purge_list". This is no problem because A) "subtree_max_size" is only used when vmap_area is in "free" tree B) "vm" is only used when vmap_area is in "busy" tree C) "purge_list" is only used when vmap_area is in vmap_purge_list 2) Eliminate "flags". ;Since only one flag VM_VM_AREA is being used, and the same thing can be done by judging whether "vm" is NULL, then the "flags" can be eliminated. Link: http://lkml.kernel.org/r/20190716152656.12255-3-lpf.vector@gmail.com Signed-off-by: Pengfei Li <lpf.vector@gmail.com> Suggested-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Oleksiy Avramchenko <oleksiy.avramchenko@sonymobile.com> Cc: Roman Gushchin <guro@fb.com> Cc: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* drivers/base/memory.c: don't store end_section_nr in memory blocksDavid Hildenbrand2019-09-251-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | Each memory block spans the same amount of sections/pages/bytes. The size is determined before the first memory block is created. No need to store what we can easily calculate - and the calculations even look simpler now. Michal brought up the idea of variable-sized memory blocks. However, if we ever implement something like this, we will need an API compatibility switch and reworks at various places (most code assumes a fixed memory block size). So let's cleanup what we have right now. While at it, fix the variable naming in register_mem_sect_under_node() - we no longer talk about a single section. Link: http://lkml.kernel.org/r/20190809110200.2746-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* driver/base/memory.c: validate memory block size earlyDavid Hildenbrand2019-09-251-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | Let's validate the memory block size early, when initializing the memory device infrastructure. Fail hard in case the value is not suitable. As nobody checks the return value of memory_dev_init(), turn it into a void function and fail with a panic in all scenarios instead. Otherwise, we'll crash later during boot when core/drivers expect that the memory device infrastructure (including memory_block_size_bytes()) works as expected. I think long term, we should move the whole memory block size configuration (set_memory_block_size_order() and memory_block_size_bytes()) into drivers/base/memory.c. Link: http://lkml.kernel.org/r/20190806090142.22709-1-david@redhat.com Signed-off-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Pavel Tatashin <pasha.tatashin@soleen.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: consolidate pgtable_cache_init() and pgd_cache_init()Mike Rapoport2019-09-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | Both pgtable_cache_init() and pgd_cache_init() are used to initialize kmem cache for page table allocations on several architectures that do not use PAGE_SIZE tables for one or more levels of the page table hierarchy. Most architectures do not implement these functions and use __weak default NOP implementation of pgd_cache_init(). Since there is no such default for pgtable_cache_init(), its empty stub is duplicated among most architectures. Rename the definitions of pgd_cache_init() to pgtable_cache_init() and drop empty stubs of pgtable_cache_init(). Link: http://lkml.kernel.org/r/1566457046-22637-1-git-send-email-rppt@linux.ibm.com Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Will Deacon <will@kernel.org> [arm64] Acked-by: Thomas Gleixner <tglx@linutronix.de> [x86] Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: remove quicklist page table cachesNicholas Piggin2019-09-252-99/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Patch series "mm: remove quicklist page table caches". A while ago Nicholas proposed to remove quicklist page table caches [1]. I've rebased his patch on the curren upstream and switched ia64 and sh to use generic versions of PTE allocation. [1] https://lore.kernel.org/linux-mm/20190711030339.20892-1-npiggin@gmail.com This patch (of 3): Remove page table allocator "quicklists". These have been around for a long time, but have not got much traction in the last decade and are only used on ia64 and sh architectures. The numbers in the initial commit look interesting but probably don't apply anymore. If anybody wants to resurrect this it's in the git history, but it's unhelpful to have this code and divergent allocator behaviour for minor archs. Also it might be better to instead make more general improvements to page allocator if this is still so slow. Link: http://lkml.kernel.org/r/1565250728-21721-2-git-send-email-rppt@linux.ibm.com Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm/gup: add make_dirty arg to put_user_pages_dirty_lock()akpm@linux-foundation.org2019-09-251-2/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | [11~From: John Hubbard <jhubbard@nvidia.com> Subject: mm/gup: add make_dirty arg to put_user_pages_dirty_lock() Patch series "mm/gup: add make_dirty arg to put_user_pages_dirty_lock()", v3. There are about 50+ patches in my tree [2], and I'll be sending out the remaining ones in a few more groups: * The block/bio related changes (Jerome mostly wrote those, but I've had to move stuff around extensively, and add a little code) * mm/ changes * other subsystem patches * an RFC that shows the current state of the tracking patch set. That can only be applied after all call sites are converted, but it's good to get an early look at it. This is part a tree-wide conversion, as described in fc1d8e7cca2d ("mm: introduce put_user_page*(), placeholder versions"). This patch (of 3): Provide more capable variation of put_user_pages_dirty_lock(), and delete put_user_pages_dirty(). This is based on the following: 1. Lots of call sites become simpler if a bool is passed into put_user_page*(), instead of making the call site choose which put_user_page*() variant to call. 2. Christoph Hellwig's observation that set_page_dirty_lock() is usually correct, and set_page_dirty() is usually a bug, or at least questionable, within a put_user_page*() calling chain. This leads to the following API choices: * put_user_pages_dirty_lock(page, npages, make_dirty) * There is no put_user_pages_dirty(). You have to hand code that, in the rare case that it's required. [jhubbard@nvidia.com: remove unused variable in siw_free_plist()] Link: http://lkml.kernel.org/r/20190729074306.10368-1-jhubbard@nvidia.com Link: http://lkml.kernel.org/r/20190724044537.10458-2-jhubbard@nvidia.com Signed-off-by: John Hubbard <jhubbard@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Jan Kara <jack@suse.cz> Cc: Christoph Hellwig <hch@lst.de> Cc: Ira Weiny <ira.weiny@intel.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: page cache: store only head pages in i_pagesMatthew Wilcox (Oracle)2019-09-251-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Transparent Huge Pages are currently stored in i_pages as pointers to consecutive subpages. This patch changes that to storing consecutive pointers to the head page in preparation for storing huge pages more efficiently in i_pages. Large parts of this are "inspired" by Kirill's patch https://lore.kernel.org/lkml/20170126115819.58875-2-kirill.shutemov@linux.intel.com/ Kirill and Huang Ying contributed several fixes. [willy@infradead.org: use compound_nr, squish uninit-var warning] Link: http://lkml.kernel.org/r/20190731210400.7419-1-willy@infradead.org Signed-off-by: Matthew Wilcox <willy@infradead.org> Acked-by: Jan Kara <jack@suse.cz> Reviewed-by: Kirill Shutemov <kirill@shutemov.name> Reviewed-by: Song Liu <songliubraving@fb.com> Tested-by: Song Liu <songliubraving@fb.com> Tested-by: William Kucharski <william.kucharski@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Tested-by: Qian Cai <cai@lca.pw> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, page_owner: keep owner info when freeing the pageVlastimil Babka2019-09-251-0/+1
| | | | | | | | | | | | | | | | | | | | | | | For debugging purposes it might be useful to keep the owner info even after page has been freed, and include it in e.g. dump_page() when detecting a bad page state. For that, change the PAGE_EXT_OWNER flag meaning to "page owner info has been set at least once" and add new PAGE_EXT_OWNER_ACTIVE for tracking whether page is supposed to be currently tracked allocated or free. Adjust dump_page() accordingly, distinguishing free and allocated pages. In the page_owner debugfs file, keep printing only allocated pages so that existing scripts are not confused, and also because free pages are irrelevant for the memory statistics or leak detection that's the typical use case of the file, anyway. Link: http://lkml.kernel.org/r/20190820131828.22684-4-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Matthew Wilcox <willy@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: introduce compound_nr()Matthew Wilcox (Oracle)2019-09-251-0/+6
| | | | | | | | | | | | | | Replace 1 << compound_order(page) with compound_nr(page). Minor improvements in readability. Link: http://lkml.kernel.org/r/20190721104612.19120-4-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: introduce page_shift()Matthew Wilcox (Oracle)2019-09-251-0/+6
| | | | | | | | | | | | | | | | Replace PAGE_SHIFT + compound_order(page) with the new page_shift() function. Minor improvements in readability. [akpm@linux-foundation.org: fix build in tce_page_is_contained()] Link: http://lkml.kernel.org/r/201907241853.yNQTrJWd%25lkp@intel.com Link: http://lkml.kernel.org/r/20190721104612.19120-3-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: introduce page_size()Matthew Wilcox (Oracle)2019-09-252-1/+7
| | | | | | | | | | | | | | | | | | | | | Patch series "Make working with compound pages easier", v2. These three patches add three helpers and convert the appropriate places to use them. This patch (of 3): It's unnecessarily hard to find out the size of a potentially huge page. Replace 'PAGE_SIZE << compound_order(page)' with page_size(page). Link: http://lkml.kernel.org/r/20190721104612.19120-2-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm, slab: move memcg_cache_params structure to mm/slab.hWaiman Long2019-09-251-62/+0
| | | | | | | | | | | | | | | | | | | | | | | | | The memcg_cache_params structure is only embedded into the kmem_cache of slab and slub allocators as defined in slab_def.h and slub_def.h and used internally by mm code. There is no needed to expose it in a public header. So move it from include/linux/slab.h to mm/slab.h. It is just a refactoring patch with no code change. In fact both the slub_def.h and slab_def.h should be moved into the mm directory as well, but that will probably cause many merge conflicts. Link: http://lkml.kernel.org/r/20190718180827.18758-1-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Roman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* jbd2: remove jbd2_journal_inode_add_[write|wait]Joseph Qi2019-09-251-2/+0
| | | | | | | | | | | | | | | | | | | Since ext4/ocfs2 are using jbd2_inode dirty range scoping APIs now, jbd2_journal_inode_add_[write|wait] are not used any more, remove them. Link: http://lkml.kernel.org/r/1562977611-8412-2-git-send-email-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Acked-by: Changwei Ge <chge@linux.alibaba.com> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <jiangqi903@gmail.com> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: "Theodore Ts'o" <tytso@mit.edu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* mm: add dummy can_do_mlock() helperArnd Bergmann2019-09-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | On kernels without CONFIG_MMU, we get a link error for the siw driver: drivers/infiniband/sw/siw/siw_mem.o: In function `siw_umem_get': siw_mem.c:(.text+0x4c8): undefined reference to `can_do_mlock' This is probably not the only driver that needs the function and could otherwise build correctly without CONFIG_MMU, so add a dummy variant that always returns false. Link: http://lkml.kernel.org/r/20190909204201.931830-1-arnd@arndb.de Fixes: 2251334dcac9 ("rdma/siw: application buffer management") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Bernard Metzler <bmt@zurich.ibm.com> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* modules: make MODULE_IMPORT_NS() work even when modular builds are disabledLinus Torvalds2019-09-221-2/+2
| | | | | | | | | | | | | | | It's an unusual configuration, and was apparently never tested, and not caught in linux-next because of a combination of travels and it making it into the tree too late. The fix is to simply move the #define to outside the CONFIG_MODULE section, since MODULE_INFO() will do the right thing. Cc: Martijn Coenen <maco@android.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Matthias Maennich <maennich@google.com> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'soundwire-5.4-rc1' of ↵Linus Torvalds2019-09-222-2/+19
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire Pull soundwire updates from Vinod Koul: "This includes DT support thanks to Srini and more work done by Intel (Pierre) on improving cadence and intel support. Summary: - Add DT bindings and DT support in core - Add debugfs support for soundwire properties - Improvements on streaming handling to core - Improved handling of Cadence module - More updates and improvements to Intel driver" * tag 'soundwire-5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/soundwire: (30 commits) soundwire: stream: make stream name a const pointer soundwire: Add compute_params callback soundwire: core: add device tree support for slave devices dt-bindings: soundwire: add slave bindings soundwire: bus: set initial value to port_status soundwire: intel: handle disabled links soundwire: intel: add debugfs register dump soundwire: cadence_master: add debugfs register dump soundwire: add debugfs support soundwire: intel: remove unused variables soundwire: intel: move shutdown() callback and don't export symbol soundwire: cadence_master: add kernel parameter to override interrupt mask soundwire: intel_init: add kernel module parameter to filter out links soundwire: cadence_master: fix divider setting in clock register soundwire: cadence_master: make use of mclk_freq property soundwire: intel: read mclk_freq property from firmware soundwire: add new mclk_freq field for properties soundwire: stream: remove unnecessary variable initializations soundwire: stream: fix disable sequence soundwire: include mod_devicetable.h to avoid compiling warnings ...
| * soundwire: stream: make stream name a const pointerSrinivas Kandagatla2019-09-041-2/+2
| | | | | | | | | | | | | | | | Make stream name const pointer Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20190813083550.5877-3-srinivas.kandagatla@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: Add compute_params callbackVinod Koul2019-09-041-0/+2
| | | | | | | | | | | | | | | | This callback allows masters to compute the bus parameters required. Signed-off-by: Srinivas Kandagatla <srinivas.kandagatla@linaro.org> Link: https://lore.kernel.org/r/20190813083550.5877-2-srinivas.kandagatla@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: intel: handle disabled linksPierre-Louis Bossart2019-08-231-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On most hardware platforms, SoundWire interfaces are pin-muxed with other interfaces (typically DMIC or I2S) and the status of each link needs to be checked at boot time. For Intel platforms, the BIOS provides a menu to enable/disable the links separately, and the information is provided to the OS with an Intel-specific _DSD property. The same capability will be added to revisions of the MIPI DisCo specification. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190821185821.12690-5-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: add debugfs supportPierre-Louis Bossart2019-08-231-0/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add base debugfs mechanism for SoundWire bus by creating soundwire root and master-N and slave-x hierarchy. Also add SDW Slave SCP, DP0 and DP-N register debug file. Registers not implemented will print as "XX" Credits: this patch is based on an earlier internal contribution by Vinod Koul, Sanyog Kale, Shreyas Nc and Hardik Shah. Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Sanyog Kale <sanyog.r.kale@intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190821185821.12690-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: add new mclk_freq field for propertiesPierre-Louis Bossart2019-08-211-0/+2
| | | | | | | | | | | | | | | | | | To help pass platform-specific values, add a new field that can either be set by the Master driver or read from firmware (BIOS/DT). Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190806005522.22642-11-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: include mod_devicetable.h to avoid compiling warningsBard liao2019-08-211-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When integrating SoundWire, kbuild throws this warning with randconfig: >> include/linux/soundwire/sdw.h:571:17: warning: 'struct sdw_device_id' declared inside parameter list will not be visible outside of this definition or declaration const struct sdw_device_id *id); ^~~~~~~~~~~~~ Fix by adding the relevant include Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Bard liao <yung-chuan.liao@linux.intel.com> Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190806005522.22642-8-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
| * soundwire: intel: prevent possible dereference in hw_paramsPierre-Louis Bossart2019-08-211-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This should not happen in production systems but we should test for all callback arguments before invoking the config_stream callback. Update the prototype to clarify that the first argument is mandatory. Also use local variable instead of multiple dereferences to improve readability. Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Link: https://lore.kernel.org/r/20190806005522.22642-2-pierre-louis.bossart@linux.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org>
* | Merge tag 'modules-for-v5.4' of ↵Linus Torvalds2019-09-223-19/+74
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux Pull modules updates from Jessica Yu: "The main bulk of this pull request introduces a new exported symbol namespaces feature. The number of exported symbols is increasingly growing with each release (we're at about 31k exports as of 5.3-rc7) and we currently have no way of visualizing how these symbols are "clustered" or making sense of this huge export surface. Namespacing exported symbols allows kernel developers to more explicitly partition and categorize exported symbols, as well as more easily limiting the availability of namespaced symbols to other parts of the kernel. For starters, we have introduced the USB_STORAGE namespace to demonstrate the API's usage. I have briefly summarized the feature and its main motivations in the tag below. Summary: - Introduce exported symbol namespaces. This new feature allows subsystem maintainers to partition and categorize their exported symbols into explicit namespaces. Module authors are now required to import the namespaces they need. Some of the main motivations of this feature include: allowing kernel developers to better manage the export surface, allow subsystem maintainers to explicitly state that usage of some exported symbols should only be limited to certain users (think: inter-module or inter-driver symbols, debugging symbols, etc), as well as more easily limiting the availability of namespaced symbols to other parts of the kernel. With the module import requirement, it is also easier to spot the misuse of exported symbols during patch review. Two new macros are introduced: EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL(). The API is thoroughly documented in Documentation/kbuild/namespaces.rst. - Some small code and kbuild cleanups here and there" * tag 'modules-for-v5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux: module: Remove leftover '#undef' from export header module: remove unneeded casts in cmp_name() module: move CONFIG_UNUSED_SYMBOLS to the sub-menu of MODULES module: remove redundant 'depends on MODULES' module: Fix link failure due to invalid relocation on namespace offset usb-storage: export symbols in USB_STORAGE namespace usb-storage: remove single-use define for debugging docs: Add documentation for Symbol Namespaces scripts: Coccinelle script for namespace dependencies. modpost: add support for generating namespace dependencies export: allow definition default namespaces in Makefiles or sources module: add config option MODULE_ALLOW_MISSING_NAMESPACE_IMPORTS modpost: add support for symbol namespaces module: add support for symbol namespaces. export: explicitly align struct kernel_symbol module: support reading multiple values per modinfo tag
| * | module: Remove leftover '#undef' from export headerWill Deacon2019-09-121-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 7290d5809571 ("module: use relative references for __ksymtab entries") converted the '__put' #define into an assembly macro in asm-generic/export.h but forgot to remove the corresponding '#undef'. Remove the leftover '#undef'. Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Jessica Yu <jeyu@kernel.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | module: Fix link failure due to invalid relocation on namespace offsetWill Deacon2019-09-112-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 8651ec01daed ("module: add support for symbol namespaces.") broke linking for arm64 defconfig: | lib/crypto/arc4.o: In function `__ksymtab_arc4_setkey': | arc4.c:(___ksymtab+arc4_setkey+0x8): undefined reference to `no symbol' | lib/crypto/arc4.o: In function `__ksymtab_arc4_crypt': | arc4.c:(___ksymtab+arc4_crypt+0x8): undefined reference to `no symbol' This is because the dummy initialisation of the 'namespace_offset' field in 'struct kernel_symbol' when using EXPORT_SYMBOL on architectures with support for PREL32 locations uses an offset from an absolute address (0) in an effort to trick 'offset_to_pointer' into behaving as a NOP, allowing non-namespaced symbols to be treated in the same way as those belonging to a namespace. Unfortunately, place-relative relocations require a symbol reference rather than an absolute value and, although x86 appears to get away with this due to placing the kernel text at the top of the address space, it almost certainly results in a runtime failure if the kernel is relocated dynamically as a result of KASLR. Rework 'namespace_offset' so that a value of 0, which cannot occur for a valid namespaced symbol, indicates that the corresponding symbol does not belong to a namespace. Cc: Matthias Maennich <maennich@google.com> Cc: Jessica Yu <jeyu@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Fixes: 8651ec01daed ("module: add support for symbol namespaces.") Reported-by: kbuild test robot <lkp@intel.com> Tested-by: Matthias Maennich <maennich@google.com> Tested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Matthias Maennich <maennich@google.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | export: allow definition default namespaces in Makefiles or sourcesMatthias Maennich2019-09-101-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To avoid excessive usage of EXPORT_SYMBOL_NS(sym, MY_NAMESPACE), where MY_NAMESPACE will always be the namespace we are exporting to, allow exporting all definitions of EXPORT_SYMBOL() and friends by defining DEFAULT_SYMBOL_NAMESPACE. For example, to export all symbols defined in usb-common into the namespace USB_COMMON, add a line like this to drivers/usb/common/Makefile: ccflags-y += -DDEFAULT_SYMBOL_NAMESPACE=USB_COMMON That is equivalent to changing all EXPORT_SYMBOL(sym) definitions to EXPORT_SYMBOL_NS(sym, USB_COMMON). Subsequently all symbol namespaces functionality will apply. Another way of making use of this feature is to define the namespace within source or header files similar to how TRACE_SYSTEM defines are used: #undef DEFAULT_SYMBOL_NAMESPACE #define DEFAULT_SYMBOL_NAMESPACE USB_COMMON Please note that, as opposed to TRACE_SYSTEM, DEFAULT_SYMBOL_NAMESPACE has to be defined before including include/linux/export.h. If DEFAULT_SYMBOL_NAMESPACE is defined, a symbol can still be exported to another namespace by using EXPORT_SYMBOL_NS() and friends with explicitly specifying the namespace. Suggested-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | module: add support for symbol namespaces.Matthias Maennich2019-09-103-19/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The EXPORT_SYMBOL_NS() and EXPORT_SYMBOL_NS_GPL() macros can be used to export a symbol to a specific namespace. There are no _GPL_FUTURE and _UNUSED variants because these are currently unused, and I'm not sure they are necessary. I didn't add EXPORT_SYMBOL_NS() for ASM exports; this patch sets the namespace of ASM exports to NULL by default. In case of relative references, it will be relocatable to NULL. If there's a need, this should be pretty easy to add. A module that wants to use a symbol exported to a namespace must add a MODULE_IMPORT_NS() statement to their module code; otherwise, modpost will complain when building the module, and the kernel module loader will emit an error and fail when loading the module. MODULE_IMPORT_NS() adds a modinfo tag 'import_ns' to the module. That tag can be observed by the modinfo command, modpost and kernel/module.c at the time of loading the module. The ELF symbols are renamed to include the namespace with an asm label; for example, symbol 'usb_stor_suspend' in namespace USB_STORAGE becomes 'usb_stor_suspend.USB_STORAGE'. This allows modpost to do namespace checking, without having to go through all the effort of parsing ELF and relocation records just to get to the struct kernel_symbols. On x86_64 I saw no difference in binary size (compression), but at runtime this will require a word of memory per export to hold the namespace. An alternative could be to store namespaced symbols in their own section and use a separate 'struct namespaced_kernel_symbol' for that section, at the cost of making the module loader more complex. Co-developed-by: Martijn Coenen <maco@android.com> Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
| * | export: explicitly align struct kernel_symbolMatthias Maennich2019-09-102-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This change allows growing struct kernel_symbol without wasting bytes to alignment. It also concretized the alignment of ksymtab entries if relative references are used for ksymtab entries. struct kernel_symbol was already implicitly being aligned to the word size, except on x86_64 and m68k, where it is aligned to 16 and 2 bytes, respectively. As far as I can tell there is no requirement for aligning struct kernel_symbol to 16 bytes on x86_64, but gcc aligns structs to their size, and the linker aligns the custom __ksymtab sections to the largest data type contained within, so setting KSYM_ALIGN to 16 was necessary to stay consistent with the code generated for non-ASM EXPORT_SYMBOL(). Now that non-ASM EXPORT_SYMBOL() explicitly aligns to word size (8), KSYM_ALIGN is no longer necessary. In case of relative references, the alignment has been changed accordingly to not waste space when adding new struct members. As for m68k, struct kernel_symbol is aligned to 2 bytes even though the structure itself is 8 bytes; using a 4-byte alignment shouldn't hurt. I manually verified the output of the __ksymtab sections didn't change on x86, x86_64, arm, arm64 and m68k. As expected, the section contents didn't change, and the ELF section alignment only changed on x86_64 and m68k. Feedback from other archs more than welcome. Co-developed-by: Martijn Coenen <maco@android.com> Signed-off-by: Martijn Coenen <maco@android.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Matthias Maennich <maennich@google.com> Signed-off-by: Jessica Yu <jeyu@kernel.org>
* | | Merge tag 'mips_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linuxLinus Torvalds2019-09-223-0/+27
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MIPS updates from Paul Burton: "Main MIPS changes: - boot_mem_map is removed, providing a nice cleanup made possible by the recent removal of bootmem. - Some fixes to atomics, in general providing compiler barriers for smp_mb__{before,after}_atomic plus fixes specific to Loongson CPUs or MIPS32 systems using cmpxchg64(). - Conversion to the new generic VDSO infrastructure courtesy of Vincenzo Frascino. - Removal of undefined behavior in set_io_port_base(), fixing the behavior of some MIPS kernel configurations when built with recent clang versions. - Initial MIPS32 huge page support, functional on at least Ingenic SoCs. - pte_special() is now supported for some configurations, allowing among other things generic fast GUP to be used. - Miscellaneous fixes & cleanups. And platform specific changes: - Major improvements to Ingenic SoC support from Paul Cercueil, mostly enabled by the inclusion of the new TCU (timer-counter unit) drivers he's spent a very patient year or so working on. Plus some fixes for X1000 SoCs from Zhou Yanjie. - Netgear R6200 v1 systems are now supported by the bcm47xx platform. - DT updates for BMIPS, Lantiq & Microsemi Ocelot systems" * tag 'mips_5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux: (89 commits) MIPS: Detect bad _PFN_SHIFT values MIPS: Disable pte_special() for MIPS32 with RiXi MIPS: ralink: deactivate PCI support for SOC_MT7621 mips: compat: vdso: Use legacy syscalls as fallback MIPS: Drop Loongson _CACHE_* definitions MIPS: tlbex: Remove cpu_has_local_ebase MIPS: tlbex: Simplify r3k check MIPS: Select R3k-style TLB in Kconfig MIPS: PCI: refactor ioc3 special handling mips: remove ioremap_cachable mips/atomic: Fix smp_mb__{before,after}_atomic() mips/atomic: Fix loongson_llsc_mb() wreckage mips/atomic: Fix cmpxchg64 barriers MIPS: Octeon: remove duplicated include from dma-octeon.c firmware: bcm47xx_nvram: Allow COMPILE_TEST firmware: bcm47xx_nvram: Correct size_t printf format MIPS: Treat Loongson Extensions as ASEs MIPS: Remove dev_err() usage after platform_get_irq() MIPS: dts: mscc: describe the PTP ready interrupt MIPS: dts: mscc: describe the PTP register range ...
| * | | clk: jz4740: Add TCU clockPaul Cercueil2019-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add the missing TCU clock to the list of clocks supplied by the CGU for the JZ4740 SoC. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Tested-by: Mathieu Malaterre <malat@debian.org> Tested-by: Artur Rojek <contact@artur-rojek.eu> Acked-by: Stephen Boyd <sboyd@kernel.org> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: od@zcrc.me
| * | | mfd/syscon: Add device_node_to_regmap()Paul Cercueil2019-08-091-0/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | device_node_to_regmap() is exactly like syscon_node_to_regmap(), but it does not check that the node is compatible with "syscon", and won't attach the first clock it finds to the regmap. The rationale behind this, is that one device node with a standard compatible string "foo,bar" can be covered by multiple drivers sharing a regmap, or by a single driver doing all the job without a regmap, but these are implementation details which shouldn't reflect on the devicetree. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Acked-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Stephen Boyd <sboyd@kernel.org> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: od@zcrc.me Cc: Mathieu Malaterre <malat@debian.org>
| * | | dt-bindings: ingenic: Add DT bindings for TCU clocksPaul Cercueil2019-08-091-0/+20
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This header provides clock numbers for the ingenic,tcu DT binding. Signed-off-by: Paul Cercueil <paul@crapouillou.net> Tested-by: Mathieu Malaterre <malat@debian.org> Tested-by: Artur Rojek <contact@artur-rojek.eu> Reviewed-by: Rob Herring <robh@kernel.org> Acked-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Lee Jones <lee.jones@linaro.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Michael Turquette <mturquette@baylibre.com> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Rob Herring <robh+dt@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: devicetree@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-mips@vger.kernel.org Cc: linux-clk@vger.kernel.org Cc: od@zcrc.me
* | | Merge tag 'f2fs-for-5.4' of ↵Linus Torvalds2019-09-212-1/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs Pull f2fs updates from Jaegeuk Kim: "In this round, we introduced casefolding support in f2fs, and fixed various bugs in individual features such as IO alignment, checkpoint=disable, quota, and swapfile. Enhancement: - support casefolding w/ enhancement in ext4 - support fiemap for directory - support FS_IO_GET|SET_FSLABEL Bug fix: - fix IO stuck during checkpoint=disable - avoid infinite GC loop - fix panic/overflow related to IO alignment feature - fix livelock in swap file - fix discard command leak - disallow dio for atomic_write" * tag 'f2fs-for-5.4' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs: (51 commits) f2fs: add a condition to detect overflow in f2fs_ioc_gc_range() f2fs: fix to add missing F2FS_IO_ALIGNED() condition f2fs: fix to fallback to buffered IO in IO aligned mode f2fs: fix to handle error path correctly in f2fs_map_blocks f2fs: fix extent corrupotion during directIO in LFS mode f2fs: check all the data segments against all node ones f2fs: Add a small clarification to CONFIG_FS_F2FS_FS_SECURITY f2fs: fix inode rwsem regression f2fs: fix to avoid accessing uninitialized field of inode page in is_alive() f2fs: avoid infinite GC loop due to stale atomic files f2fs: Fix indefinite loop in f2fs_gc() f2fs: convert inline_data in prior to i_size_write f2fs: fix error path of f2fs_convert_inline_page() f2fs: add missing documents of reserve_root/resuid/resgid f2fs: fix flushing node pages when checkpoint is disabled f2fs: enhance f2fs_is_checkpoint_ready()'s readability f2fs: clean up __bio_alloc()'s parameter f2fs: fix wrong error injection path in inc_valid_block_count() f2fs: fix to writeout dirty inode during node flush f2fs: optimize case-insensitive lookups ...
| * | | f2fs: include charset encoding information in the superblockDaniel Rosenberg2019-08-231-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add charset encoding to f2fs to support casefolding. It is modeled after the same feature introduced in commit c83ad55eaa91 ("ext4: include charset encoding information in the superblock") Currently this is not compatible with encryption, similar to the current ext4 imlpementation. This will change in the future. >From the ext4 patch: """ The s_encoding field stores a magic number indicating the encoding format and version used globally by file and directory names in the filesystem. The s_encoding_flags defines policies for using the charset encoding, like how to handle invalid sequences. The magic number is mapped to the exact charset table, but the mapping is specific to ext4. Since we don't have any commitment to support old encodings, the only encoding I am supporting right now is utf8-12.1.0. The current implementation prevents the user from enabling encoding and per-directory encryption on the same filesystem at the same time. The incompatibility between these features lies in how we do efficient directory searches when we cannot be sure the encryption of the user provided fname will match the actual hash stored in the disk without decrypting every directory entry, because of normalization cases. My quickest solution is to simply block the concurrent use of these features for now, and enable it later, once we have a better solution. """ Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | fs: Reserve flag for casefoldingDaniel Rosenberg2019-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In preparation for including the casefold feature within f2fs, elevate the EXT4_CASEFOLD_FL flag to FS_CASEFOLD_FL. Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
| * | | f2fs: fix panic of IO alignment featureChao Yu2019-08-231-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 07173c3ec276 ("block: enable multipage bvecs"), one bio vector can store multi pages, so that we can not calculate max IO size of bio as PAGE_SIZE * bio->bi_max_vecs. However IO alignment feature of f2fs always has that assumption, so finally, it may cause panic during IO submission as below stack. kernel BUG at fs/f2fs/data.c:317! RIP: 0010:__submit_merged_bio+0x8b0/0x8c0 Call Trace: f2fs_submit_page_write+0x3cd/0xdd0 do_write_page+0x15d/0x360 f2fs_outplace_write_data+0xd7/0x210 f2fs_do_write_data_page+0x43b/0xf30 __write_data_page+0xcf6/0x1140 f2fs_write_cache_pages+0x3ba/0xb40 f2fs_write_data_pages+0x3dd/0x8b0 do_writepages+0xbb/0x1e0 __writeback_single_inode+0xb6/0x800 writeback_sb_inodes+0x441/0x910 wb_writeback+0x261/0x650 wb_workfn+0x1f9/0x7a0 process_one_work+0x503/0x970 worker_thread+0x7d/0x820 kthread+0x1ad/0x210 ret_from_fork+0x35/0x40 This patch adds one extra condition to check left space in bio while trying merging page to bio, to avoid panic. This bug was reported in bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=204043 Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
* | | | Merge tag 'for_v5.4-rc1' of ↵Linus Torvalds2019-09-211-1/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, quota, udf fixes and cleanups from Jan Kara: - two small quota fixes (in grace time handling and possible missed accounting of preallocated blocks beyond EOF). - some ext2 cleanups - udf fixes for better compatibility with Windows 10 generated media (named streams, write-protection using domain-identifier, placement of volume recognition sequence) - some udf cleanups * tag 'for_v5.4-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: quota: fix wrong condition in is_quota_modification() fs-udf: Delete an unnecessary check before brelse() ext2: Delete an unnecessary check before brelse() udf: Drop forward function declarations udf: Verify domain identifier fields udf: augment UDF permissions on new inodes udf: Use dynamic debug infrastructure udf: reduce leakage of blocks related to named streams udf: prevent allocation beyond UDF partition quota: fix condition for resetting time limit in do_set_dqblk() ext2: code cleanup for ext2_free_blocks() ext2: fix block range in ext2_data_block_valid() udf: support 2048-byte spacing of VRS descriptors on 4K media udf: refactor VRS descriptor identification
| * | | | quota: fix wrong condition in is_quota_modification()Chao Yu2019-09-121-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoted from commit 3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize") " At LSF we decided that if we truncate up from isize we shouldn't trim fallocated blocks that were fallocated with KEEP_SIZE and are past the new i_size. This patch fixes ext4 to do this. " And generic/092 of fstest have covered this case for long time, however is_quota_modification() didn't adjust based on that rule, so that in below condition, we will lose to quota block change: - fallocate blocks beyond EOF - remount - truncate(file_path, file_size) Fix it. Link: https://lore.kernel.org/r/20190911093650.35329-1-yuchao0@huawei.com Fixes: 3da40c7b0898 ("ext4: only call ext4_truncate when size <= isize") CC: stable@vger.kernel.org Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jan Kara <jack@suse.cz>