summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* x86/build: Add arch/x86/tools/insn_decoder_test to .gitignoreProgyan Bhattacharya2018-02-131-0/+1
| | | | | | | | | | | The file was generated by make command and should not be in the source tree. Signed-off-by: Progyan Bhattacharya <progyanb@acm.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/smpboot: Fix uncore_pci_remove() indexing bug when hot-removing a ↵Masayoshi Mizuma2018-02-131-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | physical CPU When a physical CPU is hot-removed, the following warning messages are shown while the uncore device is removed in uncore_pci_remove(): WARNING: CPU: 120 PID: 5 at arch/x86/events/intel/uncore.c:988 uncore_pci_remove+0xf1/0x110 ... CPU: 120 PID: 5 Comm: kworker/u1024:0 Not tainted 4.15.0-rc8 #1 Workqueue: kacpi_hotplug acpi_hotplug_work_fn ... Call Trace: pci_device_remove+0x36/0xb0 device_release_driver_internal+0x145/0x210 pci_stop_bus_device+0x76/0xa0 pci_stop_root_bus+0x44/0x60 acpi_pci_root_remove+0x1f/0x80 acpi_bus_trim+0x54/0x90 acpi_bus_trim+0x2e/0x90 acpi_device_hotplug+0x2bc/0x4b0 acpi_hotplug_work_fn+0x1a/0x30 process_one_work+0x141/0x340 worker_thread+0x47/0x3e0 kthread+0xf5/0x130 When uncore_pci_remove() runs, it tries to get the package ID to clear the value of uncore_extra_pci_dev[].dev[] by using topology_phys_to_logical_pkg(). The warning messesages are shown because topology_phys_to_logical_pkg() returns -1. arch/x86/events/intel/uncore.c: static void uncore_pci_remove(struct pci_dev *pdev) { ... phys_id = uncore_pcibus_to_physid(pdev->bus); ... pkg = topology_phys_to_logical_pkg(phys_id); // returns -1 for (i = 0; i < UNCORE_EXTRA_PCI_DEV_MAX; i++) { if (uncore_extra_pci_dev[pkg].dev[i] == pdev) { uncore_extra_pci_dev[pkg].dev[i] = NULL; break; } } WARN_ON_ONCE(i >= UNCORE_EXTRA_PCI_DEV_MAX); // <=========== HERE!! topology_phys_to_logical_pkg() tries to find cpuinfo_x86->phys_proc_id that matches the phys_pkg argument. arch/x86/kernel/smpboot.c: int topology_phys_to_logical_pkg(unsigned int phys_pkg) { int cpu; for_each_possible_cpu(cpu) { struct cpuinfo_x86 *c = &cpu_data(cpu); if (c->initialized && c->phys_proc_id == phys_pkg) return c->logical_proc_id; } return -1; } However, the phys_proc_id was already set to 0 by remove_siblinginfo() when the CPU was offlined. So, topology_phys_to_logical_pkg() cannot find the correct logical_proc_id and always returns -1. As the result, uncore_pci_remove() calls WARN_ON_ONCE() and the warning messages are shown. What is worse is that the bogus 'pkg' index results in two bugs: - We dereference uncore_extra_pci_dev[] with a negative index - We fail to clean up a stale pointer in uncore_extra_pci_dev[][] To fix these bugs, remove the clearing of ->phys_proc_id from remove_siblinginfo(). This should not cause any problems, because ->phys_proc_id is not used after it is hot-removed and it is re-set while hot-adding. Signed-off-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Acked-by: Thomas Gleixner <tglx@linutronix.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: yasu.isimatu@gmail.com Cc: <stable@vger.kernel.org> Fixes: 30bb9811856f ("x86/topology: Avoid wasting 128k for package id array") Link: http://lkml.kernel.org/r/ed738d54-0f01-b38b-b794-c31dc118c207@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/mm/kcore: Add vsyscall page to /proc/kcore conditionallyJia Zhang2018-02-131-1/+2
| | | | | | | | | | | | | | The vsyscall page should be visible only if vsyscall=emulate/native when dumping /proc/kcore. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-3-git-send-email-zhang.jia@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* vfs/proc/kcore, x86/mm/kcore: Fix SMAP fault when dumping vsyscall user pageJia Zhang2018-02-133-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | Commit: df04abfd181a ("fs/proc/kcore.c: Add bounce buffer for ktext data") ... introduced a bounce buffer to work around CONFIG_HARDENED_USERCOPY=y. However, accessing the vsyscall user page will cause an SMAP fault. Replace memcpy() with copy_from_user() to fix this bug works, but adding a common way to handle this sort of user page may be useful for future. Currently, only vsyscall page requires KCORE_USER. Signed-off-by: Jia Zhang <zhang.jia@linux.alibaba.com> Reviewed-by: Jiri Olsa <jolsa@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jolsa@redhat.com Link: http://lkml.kernel.org/r/1518446694-21124-2-git-send-email-zhang.jia@linux.alibaba.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/Kconfig: Further simplify the NR_CPUS configIngo Molnar2018-02-111-26/+40
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clean up various aspects of the x86 CONFIG_NR_CPUS configuration switches: - Rename the three CONFIG_NR_CPUS related variables to create a common namespace for them: RANGE_BEGIN_CPUS => NR_CPUS_RANGE_BEGIN RANGE_END_CPUS => NR_CPUS_RANGE_END DEF_CONFIG_CPUS => NR_CPUS_DEFAULT - Align them vertically, such as: config NR_CPUS_RANGE_END int depends on X86_64 default 8192 if SMP && ( MAXSMP || CPUMASK_OFFSTACK) default 512 if SMP && (!MAXSMP && !CPUMASK_OFFSTACK) default 1 if !SMP - Update help text, add more comments. Test results: # i386 allnoconfig: CONFIG_NR_CPUS_RANGE_BEGIN=1 CONFIG_NR_CPUS_RANGE_END=1 CONFIG_NR_CPUS_DEFAULT=1 CONFIG_NR_CPUS=1 # i386 defconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=8 CONFIG_NR_CPUS_DEFAULT=8 CONFIG_NR_CPUS=8 # i386 allyesconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=64 CONFIG_NR_CPUS_DEFAULT=32 CONFIG_NR_CPUS=32 # x86_64 allnoconfig: CONFIG_NR_CPUS_RANGE_BEGIN=1 CONFIG_NR_CPUS_RANGE_END=1 CONFIG_NR_CPUS_DEFAULT=1 CONFIG_NR_CPUS=1 # x86_64 defconfig: CONFIG_NR_CPUS_RANGE_BEGIN=2 CONFIG_NR_CPUS_RANGE_END=512 CONFIG_NR_CPUS_DEFAULT=64 CONFIG_NR_CPUS=64 # x86_64 allyesconfig: CONFIG_NR_CPUS_RANGE_BEGIN=8192 CONFIG_NR_CPUS_RANGE_END=8192 CONFIG_NR_CPUS_DEFAULT=8192 CONFIG_NR_CPUS=8192 Acked-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20180210113629.jcv6su3r4suuno63@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/Kconfig: Simplify NR_CPUS configRandy Dunlap2018-02-111-15/+42
| | | | | | | | | | | | | | | | | | | | | | | Clean up and simplify the X86 NR_CPUS Kconfig symbol/option by introducing RANGE_BEGIN_CPUS, RANGE_END_CPUS, and DEF_CONFIG_CPUS. Then combine some default values when their conditionals can be reduced. Also move the X86_BIGSMP kconfig option inside an "if X86_32"/"endif" config block and drop its explicit "depends on X86_32". Combine the max. 8192 cases of RANGE_END_CPUS (X86_64 only). Split RANGE_END_CPUS and DEF_CONFIG_CPUS into separate cases for X86_32 and X86_64. Suggested-by: Linus Torvalds <torvalds@linuxfoundation.org> Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/0b833246-ed4b-e451-c426-c4464725be92@infradead.org Link: lkml.kernel.org/r/CA+55aFzOd3j6ZUSkEwTdk85qtt1JywOtm3ZAb-qAvt8_hJ6D4A@mail.gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
* x86/MCE: Fix build warning introduced by "x86: do not use print_symbol()"Borislav Petkov2018-02-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | The following commit: 7b6061627eb8 ("x86: do not use print_symbol()") ... introduced a new build warning on 32-bit x86: arch/x86/kernel/cpu/mcheck/mce.c:237:21: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast] pr_cont("{%pS}", (void *)m->ip); ^ Fix the type mismatch between the 'void *' expected by %pS and the mce->ip field which is u64 by casting to long. Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: linux-kernel@vger.kernel.org Fixes: 7b6061627eb8 ("x86: do not use print_symbol()") Link: http://lkml.kernel.org/r/20180210145314.22174-1-bp@alien8.de [ Cleaned up the changelog. ] Signed-off-by: Ingo Molnar <mingo@kernel.org>
* Merge branch 'linus' into x86/urgent, to pick up dependent commitsIngo Molnar2018-02-119604-262736/+407704
|\ | | | | | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| * Merge tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linuxLinus Torvalds2018-02-0914-97/+73
| |\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull nfsd update from Bruce Fields: "A fairly small update this time around. Some cleanup, RDMA fixes, overlayfs fixes, and a fix for an NFSv4 state bug. The bigger deal for nfsd this time around was Jeff Layton's already-merged i_version patches" * tag 'nfsd-4.16' of git://linux-nfs.org/~bfields/linux: svcrdma: Fix Read chunk round-up NFSD: hide unused svcxdr_dupstr() nfsd: store stat times in fill_pre_wcc() instead of inode times nfsd: encode stat->mtime for getattr instead of inode->i_mtime nfsd: return RESOURCE not GARBAGE_ARGS on too many ops nfsd4: don't set lock stateid's sc_type to CLOSED nfsd: Detect unhashed stids in nfsd4_verify_open_stid() sunrpc: remove dead code in svc_sock_setbufsize svcrdma: Post Receives in the Receive completion handler nfsd4: permit layoutget of executable-only files lockd: convert nlm_rqst.a_count from atomic_t to refcount_t lockd: convert nlm_lockowner.count from atomic_t to refcount_t lockd: convert nsm_handle.sm_count from atomic_t to refcount_t
| | * svcrdma: Fix Read chunk round-upChuck Lever2018-02-081-4/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A single NFSv4 WRITE compound can often have three operations: PUTFH, WRITE, then GETATTR. When the WRITE payload is sent in a Read chunk, the client places the GETATTR in the inline part of the RPC/RDMA message, just after the WRITE operation (sans payload). The position value in the Read chunk enables the receiver to insert the Read chunk at the correct place in the received XDR stream; that is between the WRITE and GETATTR. According to RFC 8166, an NFS/RDMA client does not have to add XDR round-up to the Read chunk that carries the WRITE payload. The receiver adds XDR round-up padding if it is absent and the receiver's XDR decoder requires it to be present. Commit 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving") attempted to add support for receiving such a compound so that just the WRITE payload appears in rq_arg's page list, and the trailing GETATTR is placed in rq_arg's tail iovec. (TCP just strings the whole compound into the head iovec and page list, without regard to the alignment of the WRITE payload). The server transport logic also had to accommodate the optional XDR round-up of the Read chunk, which it did simply by lengthening the tail iovec when round-up was needed. This approach is adequate for the NFSv2 and NFSv3 WRITE decoders. Unfortunately it is not sufficient for nfsd4_decode_write. When the Read chunk length is a couple of bytes less than PAGE_SIZE, the computation at the end of nfsd4_decode_write allows argp->pagelen to go negative, which breaks the logic in read_buf that looks for the tail iovec. The result is that a WRITE operation whose payload length is just less than a multiple of a page succeeds, but the subsequent GETATTR in the same compound fails with NFS4ERR_OP_ILLEGAL because the XDR decoder can't find it. Clients ignore the error, but they must update their attribute cache via a separate round trip. As nfsd4_decode_write appears to expect the payload itself to always have appropriate XDR round-up, have svc_rdma_build_normal_read_chunk add the Read chunk XDR round-up to the page_len rather than lengthening the tail iovec. Reported-by: Olga Kornievskaia <kolga@netapp.com> Fixes: 193bcb7b3719 ("svcrdma: Populate tail iovec when receiving") Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * NFSD: hide unused svcxdr_dupstr()Arnd Bergmann2018-02-081-3/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is now only one caller left for svcxdr_dupstr() and this is inside of an #ifdef, so we can get a warning when the option is disabled: fs/nfsd/nfs4xdr.c:241:1: error: 'svcxdr_dupstr' defined but not used [-Werror=unused-function] This changes the remaining caller to use a nicer IS_ENABLED() check, which lets the compiler drop the unused code silently. Fixes: e40d99e6183e ("NFSD: Clean up symlink argument XDR decoders") Suggested-by: Rasmus Villemoes <rasmus.villemoes@prevas.dk> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: store stat times in fill_pre_wcc() instead of inode timesAmir Goldstein2018-02-083-24/+37
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The time values in stat and inode may differ for overlayfs and stat time values are the correct ones to use. This is also consistent with the fact that fill_post_wcc() also stores stat time values. This means introducing a stat call that could fail, where previously we were just copying values out of the inode. To be conservative about changing behavior, we fall back to copying values out of the inode in the error case. It might be better just to clear fh_pre_saved (though note the BUG_ON in set_change_info). Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: encode stat->mtime for getattr instead of inode->i_mtimeAmir Goldstein2018-02-082-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The values of stat->mtime and inode->i_mtime may differ for overlayfs and stat->mtime is the correct value to use when encoding getattr. This is also consistent with the fact that other attr times are also encoded from stat values. Both callers of lease_get_mtime() already have the value of stat->mtime, so the only needed change is that lease_get_mtime() will not overwrite this value with inode->i_mtime in case the inode does not have an exclusive lease. Signed-off-by: Amir Goldstein <amir73il@gmail.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: return RESOURCE not GARBAGE_ARGS on too many opsJ. Bruce Fields2018-02-082-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A client that sends more than a hundred ops in a single compound currently gets an rpc-level GARBAGE_ARGS error. It would be more helpful to return NFS4ERR_RESOURCE, since that gives the client a better idea how to recover (for example by splitting up the compound into smaller compounds). This is all a bit academic since we've never actually seen a reason for clients to send such long compounds, but we may as well fix it. While we're there, just use NFSD4_MAX_OPS_PER_COMPOUND == 16, the constant we already use in the 4.1 case, instead of hard-coding 100. Chances anyone actually uses even 16 ops per compound are small enough that I think there's a neglible risk or any regression. This fixes pynfs test COMP6. Reported-by: "Lu, Xinyu" <luxy.fnst@cn.fujitsu.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: don't set lock stateid's sc_type to CLOSEDJ. Bruce Fields2018-02-051-4/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There's no point I can see to stp->st_stid.sc_type = NFS4_CLOSED_STID; given release_lock_stateid immediately sets sc_type to 0. That set of sc_type to 0 should be enough to prevent it being used where we don't want it to be; NFS4_CLOSED_STID should only be needed for actual open stateid's that are actually closed. Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd: Detect unhashed stids in nfsd4_verify_open_stid()Trond Myklebust2018-02-051-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The state of the stid is guaranteed by 2 locks: - The nfs4_client 'cl_lock' spinlock - The nfs4_ol_stateid 'st_mutex' mutex so it is quite possible for the stid to be unhashed after lookup, but before calling nfsd4_lock_ol_stateid(). So we do need to check for a zero value for 'sc_type' in nfsd4_verify_open_stid(). Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Tested-by: Checuk Lever <chuck.lever@oracle.com> Cc: stable@vger.kernel.org Fixes: 659aefb68eca "nfsd: Ensure we don't recognise lock stateids..." Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * sunrpc: remove dead code in svc_sock_setbufsizeChristoph Hellwig2018-02-051-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | Setting values in struct sock directly is the usual method. Remove the long dead code using set_fs() and the related comment. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * svcrdma: Post Receives in the Receive completion handlerChuck Lever2018-01-185-39/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | This change improves Receive efficiency by posting Receives only on the same CPU that handles Receive completion. Improved latency and throughput has been noted with this change. Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * nfsd4: permit layoutget of executable-only filesBenjamin Coddington2017-12-211-3/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Clients must be able to read a file in order to execute it, and for pNFS that means the client needs to be able to perform a LAYOUTGET on the file. This behavior for executable-only files was added for OPEN in commit a043226bc140 "nfsd4: permit read opens of executable-only files". This fixes up xfstests generic/126 on block/scsi layouts. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * lockd: convert nlm_rqst.a_count from atomic_t to refcount_tElena Reshetova2017-12-213-6/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_rqst.a_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_rqst.a_count it might make a difference in following places: - nlmclnt_release_call() and nlmsvc_release_call(): decrement in refcount_dec_and_test() only provides RELEASE ordering and control dependency on success vs. fully ordered atomic counterpart Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * lockd: convert nlm_lockowner.count from atomic_t to refcount_tElena Reshetova2017-12-212-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nlm_lockowner.count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nlm_lockowner.count it might make a difference in following places: - nlm_put_lockowner(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No changes in spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| | * lockd: convert nsm_handle.sm_count from atomic_t to refcount_tElena Reshetova2017-12-213-9/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | atomic_t variables are currently used to implement reference counters with the following properties: - counter is initialized to 1 using atomic_set() - a resource is freed upon counter reaching zero - once counter reaches zero, its further increments aren't allowed - counter schema uses basic atomic operations (set, inc, inc_not_zero, dec_and_test, etc.) Such atomic variables should be converted to a newly provided refcount_t type and API that prevents accidental counter overflows and underflows. This is important since overflows and underflows can lead to use-after-free situation and be exploitable. The variable nsm_handle.sm_count is used as pure reference counter. Convert it to refcount_t and fix up the operations. **Important note for maintainers: Some functions from refcount_t API defined in lib/refcount.c have different memory ordering guarantees than their atomic counterparts. The full comparison can be seen in https://lkml.org/lkml/2017/11/15/57 and it is hopefully soon in state to be merged to the documentation tree. Normally the differences should not matter since refcount_t provides enough guarantees to satisfy the refcounting use cases, but in some rare cases it might matter. Please double check that you don't have some undocumented memory guarantees for this variable usage. For the nsm_handle.sm_count it might make a difference in following places: - nsm_release(): decrement in refcount_dec_and_lock() only provides RELEASE ordering, control dependency on success and holds a spin lock on success vs. fully ordered atomic counterpart. No change for the spin lock guarantees. Suggested-by: Kees Cook <keescook@chromium.org> Reviewed-by: David Windsor <dwindsor@gmail.com> Reviewed-by: Hans Liljestrand <ishkamiel@gmail.com> Signed-off-by: Elena Reshetova <elena.reshetova@intel.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com>
| * | Merge branch 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-daxLinus Torvalds2018-02-0815-325/+471
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull idr updates from Matthew Wilcox: - test-suite improvements - replace the extended API by improving the normal API - performance improvement for IDRs which are 1-based rather than 0-based - add documentation * 'idr-2018-02-06' of git://git.infradead.org/users/willy/linux-dax: idr: Add documentation idr: Make 1-based IDRs more efficient idr: Warn if old iterators see large IDs idr: Rename idr_for_each_entry_ext idr: Remove idr_alloc_ext cls_u32: Convert to idr_alloc_u32 cls_u32: Reinstate cyclic allocation cls_flower: Convert to idr_alloc_u32 cls_bpf: Convert to use idr_alloc_u32 cls_basic: Convert to use idr_alloc_u32 cls_api: Convert to idr_alloc_u32 net sched actions: Convert to use idr_alloc_u32 idr: Add idr_alloc_u32 helper idr: Delete idr_find_ext function idr: Delete idr_replace_ext function idr: Delete idr_remove_ext function IDR test suite: Check handling negative end correctly idr test suite: Fix ida_test_random() radix tree test suite: Remove ARRAY_SIZE
| | * | idr: Add documentationMatthew Wilcox2018-02-064-13/+95
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Move the idr kernel-doc to its own idr.rst file and add a few paragraphs about how to use it. Also add some more kernel-doc. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Make 1-based IDRs more efficientMatthew Wilcox2018-02-063-40/+103
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | About 20% of the IDR users in the kernel want the allocated IDs to start at 1. The implementation currently searches all the way down the left hand side of the tree, finds no free ID other than ID 0, walks all the way back up, and then all the way down again. This patch 'rebases' the ID so we fill the entire radix tree, rather than leave a gap at 0. Chris Wilson says: "I did the quick hack of allocating index 0 of the idr and that eradicated idr_get_free() from being at the top of the profiles for the many-object stress tests. This improvement will be much appreciated." Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Warn if old iterators see large IDsMatthew Wilcox2018-02-061-1/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that the IDR can be used to store large IDs, it is possible somebody might only partially convert their old code and use the iterators which can only handle IDs up to INT_MAX. It's probably unwise to show them a truncated ID, so settle for spewing warnings to dmesg, and terminating the iteration. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Rename idr_for_each_entry_extMatthew Wilcox2018-02-063-33/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Most places in the kernel that we need to distinguish functions by the type of their arguments, we use '_ul' as a suffix for the unsigned long variant, not '_ext'. Also add kernel-doc. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Remove idr_alloc_extMatthew Wilcox2018-02-065-111/+105
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It has no more users, so remove it. Move idr_alloc() back into idr.c, move the guts of idr_alloc_cmn() into idr_alloc_u32(), remove the wrappers around idr_get_free_cmn() and rename it to idr_get_free(). While there is now no interface to allocate IDs larger than a u32, the IDR internals remain ready to handle a larger ID should a need arise. These changes make it possible to provide the guarantee that, if the nextid pointer points into the object, the object's ID will be initialised before a concurrent lookup can find the object. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_u32: Convert to idr_alloc_u32Matthew Wilcox2018-02-061-13/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | No real benefit to this classifier, but since we're allocating a u32 anyway, we should use this function. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_u32: Reinstate cyclic allocationMatthew Wilcox2018-02-061-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit e7614370d6f0 ("net_sched: use idr to allocate u32 filter handles) converted htid allocation to use the IDR. The ID allocated by this scheme changes; it used to be cyclic, but now always allocates the lowest available. The IDR supports cyclic allocation, so just use the right function. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_flower: Convert to idr_alloc_u32Matthew Wilcox2018-02-061-16/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new helper which saves a temporary variable and a few lines of code. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_bpf: Convert to use idr_alloc_u32Matthew Wilcox2018-02-061-14/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new helper. This has a modest reduction in both lines of code and compiled code size. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_basic: Convert to use idr_alloc_u32Matthew Wilcox2018-02-061-15/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new helper which saves a temporary variable and a few lines of code. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | cls_api: Convert to idr_alloc_u32Matthew Wilcox2018-02-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | Use the new helper. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | net sched actions: Convert to use idr_alloc_u32Matthew Wilcox2018-02-061-35/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use the new helper. Also untangle the error path, and in so doing noticed that estimator generator failure would lead to us leaking an ID. Fix that bug. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Add idr_alloc_u32 helperMatthew Wilcox2018-02-062-0/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | All current users of idr_alloc_ext() actually want to allocate a u32 and idr_alloc_u32() fits their needs better. Like idr_get_next(), it uses a 'nextid' argument which serves as both a pointer to the start ID and the assigned ID (instead of a separate minimum and pointer-to-assigned-ID argument). It uses a 'max' argument rather than 'end' because the semantics that idr_alloc has for 'end' don't work well for unsigned types. Since idr_alloc_u32() returns an errno instead of the allocated ID, mark it as __must_check to help callers use it correctly. Include copious kernel-doc. Chris Mi <chrism@mellanox.com> has promised to contribute test-cases for idr_alloc_u32. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Delete idr_find_ext functionMatthew Wilcox2018-02-064-9/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Simply changing idr_remove's 'id' argument to 'unsigned long' works for all callers. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Delete idr_replace_ext functionMatthew Wilcox2018-02-067-19/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Changing idr_replace's 'id' argument to 'unsigned long' works for all callers. Callers which passed a negative ID now get -ENOENT instead of -EINVAL. No callers relied on this error value. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr: Delete idr_remove_ext functionMatthew Wilcox2018-02-067-19/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Simply changing idr_remove's 'id' argument to 'unsigned long' suffices for all callers. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | IDR test suite: Check handling negative end correctlyMatthew Wilcox2018-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the charming quirks of the idr_alloc() interface is that you can pass a negative end and it will be interpreted as "maximum". Ensure we don't break that. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | idr test suite: Fix ida_test_random()Matthew Wilcox2018-02-061-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The test was checking the wrong errno; ida_get_new_above() returns EAGAIN, not ENOMEM on memory allocation failure. Double the number of threads to increase the chance that we actually exercise this path during the test suite (it was a bit sporadic before). Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| | * | radix tree test suite: Remove ARRAY_SIZEMatthew Wilcox2018-02-061-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This is now defined in tools/include/linux/kernel.h, so our definition generates a warning. Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
| * | | Merge tag 'gcc-plugins-v4.16-rc1' of ↵Linus Torvalds2018-02-084-78/+37
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull gcc plugins updates from Kees Cook: - update includes for gcc 8 (Valdis Kletnieks) - update initializers for gcc 8 * tag 'gcc-plugins-v4.16-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: gcc-plugins: Use dynamic initializers gcc-plugins: Add include required by GCC release 8
| | * | | gcc-plugins: Use dynamic initializersKees Cook2018-02-063-78/+33
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC 8 changed the order of some fields and is very picky about ordering in static initializers, so instead just move to dynamic initializers, and drop the redundant already-zero field assignments. Suggested-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Kees Cook <keescook@chromium.org>
| | * | | gcc-plugins: Add include required by GCC release 8valdis.kletnieks@vt.edu2018-02-061-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | GCC requires another #include to get the gcc-plugins to build cleanly. Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Signed-off-by: Kees Cook <keescook@chromium.org>
| * | | | Merge tag 'for-linus-4.16' of ↵Linus Torvalds2018-02-089-93/+76
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux Pull orangefs updates from Mike Marshall: "Mostly cleanups, but three bug fixes: - don't pass garbage return codes back up the call chain (Mike Marshall) - fix stale inode test (Martin Brandenburg) - fix off-by-one errors (Xiongfeng Wang) Also add Martin as a reviewer in the Maintainers file" * tag 'for-linus-4.16' of git://git.kernel.org/pub/scm/linux/kernel/git/hubcap/linux: orangefs: reverse sense of is-inode-stale test in d_revalidate orangefs: simplify orangefs_inode_is_stale Orangefs: don't propogate whacky error codes orangefs: use correct string length orangefs: make orangefs_make_bad_inode static orangefs: remove ORANGEFS_KERNEL_DEBUG orangefs: remove gossip_ldebug and gossip_lerr orangefs: make orangefs_client_debug_init static MAINTAINERS: update orangefs list and add myself as reviewer
| | * | | | orangefs: reverse sense of is-inode-stale test in d_revalidateMartin Brandenburg2018-02-061-10/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If a dentry is deleted, then a dentry is recreated with the same handle but a different type (i.e. it was a file and now it's a symlink), then its a different inode. The check was backwards, so d_revalidate would not have noticed. Due to the design of the OrangeFS server, this is rather unlikely. It's also possible for the dentry to be deleted and recreated with the same type. This would be undetectable. It's a bit of a ship of Theseus. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| | * | | | orangefs: simplify orangefs_inode_is_staleMartin Brandenburg2018-02-061-24/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Check whether this is a new inode at location of call. Raises the question of what to do with an unknown inode type. Old code would've marked the inode bad and returned ESTALE. Signed-off-by: Martin Brandenburg <martin@omnibond.com> Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| | * | | | Orangefs: don't propogate whacky error codesMike Marshall2018-02-061-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we get an error return code from userspace (the client-core) we check to make sure it is a valid code. This patch maps the whacky return code to -EINVAL instead of propagating garbage back up the call chain potentially resulting in a hard-to-find train-wreck. The client-core doesn't have any business returning whacky return codes, but if it does, we don't want the kernel to crash as a result. Signed-off-by: Mike Marshall <hubcap@omnibond.com>
| | * | | | orangefs: use correct string lengthXiongfeng Wang2018-02-063-12/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | gcc-8 reports fs/orangefs/dcache.c: In function 'orangefs_d_revalidate': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] fs/orangefs/namei.c: In function 'orangefs_rename': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] fs/orangefs/super.c: In function 'orangefs_mount': ./include/linux/string.h:245:9: warning: '__builtin_strncpy' specified bound 256 equals destination size [-Wstringop-truncation] We need one less byte or call strlcpy() to make it a nul-terminated string. Signed-off-by: Xiongfeng Wang <xiongfeng.wang@linaro.org> Signed-off-by: Mike Marshall <hubcap@omnibond.com>