summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* mmc: improve API to make clear hw_reset callback is for cardsWolfram Sang2022-04-2610-12/+12
| | | | | | | | | | | To make it unambiguous that the hw_reset callback is for cards and not for controllers, we add 'card' to the callback name and convert all users in one go. We keep the argument as mmc_host, though, because the callback is used very early when mmc_card is not yet populated. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220408080045.6497-4-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: core: improve API to make clear that mmc_sw_reset is for cardsWolfram Sang2022-04-262-2/+3
| | | | | | | | | | | To make it unambiguous that mmc_sw_reset() is for cards and not for controllers, we make the function argument mmc_card instead of mmc_host. There are no users to convert currently. Suggested-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220408080045.6497-3-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* MAINTAINERS: Add linux-renesas-soc@vger.kernel.org list for Renesas ↵Lad Prabhakar2022-04-261-0/+1
| | | | | | | | | | | | TMIO/SDHI driver Add linux-renesas-soc@vger.kernel.org list entry for Renesas TMIO/SDHI driver. Signed-off-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220404174159.571-1-prabhakar.mahadev-lad.rj@bp.renesas.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: remove superfluous specific M3W entryWolfram Sang2022-04-261-1/+0
| | | | | | | | | | We don't need to specify the Gen3 compatible entry for M3W because it will be provided by the generic Gen3 fallback. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220404130551.20209-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: R-Car V3H ES2.0 gained HS400 supportWolfram Sang2022-04-261-1/+1
| | | | | | | | | | | | The hardware evolved, so we only need to disable HS400 support on ES1.* revisions. Update the code. Signed-off-by: Takeshi Saito <takeshi.saito.xv@renesas.com> [wsa: refactored to top-of-tree] Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220404123404.16289-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: omap: Make it CCF clk API compatibleJanusz Krzysztofik2022-04-261-9/+14
| | | | | | | | | | | | | The driver, OMAP specific, now omits clk_prepare/unprepare() steps, not supported by OMAP custom implementation of clock API. However, non-CCF stubs of those functions exist for use on such platforms until converted to CCF. Update the driver to be compatible with CCF implementation of clock API. Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Link: https://lore.kernel.org/r/20220402112004.129886-1-jmkrzyszt@gmail.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: mmc_spi: parse speed mode optionsChristian Löhle2022-04-261-0/+4
| | | | | | | | | | Since SD and MMC Highspeed modes are also valid for SPI let's parse them too. Signed-off-by: Christian Loehle <cloehle@hyperstone.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20c6efa9a4c7423bbfb9352705c4a53a@hyperstone.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: core: block: fix sloppy typing in mmc_blk_ioctl_multi_cmd()Sergey Shtylyov2022-04-261-7/+9
| | | | | | | | | | | | | | | | | | Despite mmc_ioc_multi_cmd::num_of_cmds is a 64-bit field, its maximum value is limited to MMC_IOC_MAX_CMDS (only 255); using a 64-bit local variable to hold a copy of that field leads to gcc generating ineffective loop code: despite the source code using an *int* variable for the loop counters, the 32-bit object code uses 64-bit unsigned counters. Also, gcc has to drop the most significant word of that 64-bit variable when calling kcalloc() and assigning to mmc_queue_req::ioc_count anyway. Using the *unsigned int* variable instead results in a better code. Found by Linux Verification Center (linuxtesting.org) with the SVACE static analysis tool. Signed-off-by: Sergey Shtylyov <s.shtylyov@omp.ru> Link: https://lore.kernel.org/r/eea3b0bd-6091-f005-7189-b5b7868abdb6@omp.ru Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* dt-bindings: mmc: mtk-sd: increase reg itemsTinghan Shen2022-04-261-1/+14
| | | | | | | | | | | | | | | MediaTek has a new version of mmc IP since mt8183. Some IO registers are moved to top to improve hardware design and named as "host top registers". Add host top register in the reg binding description for mt8183 and successors. Signed-off-by: Wenbin Mei <wenbin.mei@mediatek.com> Signed-off-by: Tinghan Shen <tinghan.shen@mediatek.com> Reviewed-by: Rob Herring <robh@kernel.org> Link: https://lore.kernel.org/r/20220330094532.21721-2-tinghan.shen@mediatek.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* dt-bindings: mmc: xenon: Convert to JSON schemaChris Packham2022-04-263-174/+276
| | | | | | | | | | | | | | | | Convert the marvell,xenon-sdhci binding to JSON schema. Currently the in-tree dts files don't validate because they use sdhci@ instead of mmc@ as required by the generic mmc-controller schema. The compatible "marvell,sdhci-xenon" was not documented in the old binding but it accompanies the of "marvell,armada-3700-sdhci" in the armada-37xx SoC dtsi so this combination is added to the new binding document. Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Link: https://lore.kernel.org/r/20220329220544.2132135-3-chris.packham@alliedtelesis.co.nz Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: R-Car V3M also has no HS400Wolfram Sang2022-04-261-0/+1
| | | | | | | | | | Further digging in the datasheets revealed that R-Car V3M also has no HS400 support. Reported-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220404105831.5096-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: Add missing checks for the presence of quirksGeert Uytterhoeven2022-04-261-2/+2
| | | | | | | | | | | | | | | | | | | | | | | When running on an system without any quirks (e.g. R-Car V3U), the kernel crashes with a NULL pointer dereference: Unable to handle kernel NULL pointer dereference at virtual address 0000000000000002 ... Hardware name: Renesas Falcon CPU and Breakout boards based on r8a779a0 (DT) Workqueue: events_freezable mmc_rescan ... Call trace: renesas_sdhi_internal_dmac_start_dma+0x54/0x12c tmio_process_mrq+0x124/0x274 Fix this by adding the missing checks for the validatity of the priv->quirks pointer. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/cc3178c2ff60f640f4d5a071d51f6b0b1db37656.1648822020.git.geert+renesas@glider.be Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: mmci: stm32: use a buffer for unaligned DMA requestsYann Gautier2022-04-261-17/+71
| | | | | | | | | | | | | In SDIO mode, the sg list for requests can be unaligned with what the STM32 SDMMC internal DMA can support. In that case, instead of failing, use a temporary bounce buffer to copy from/to the sg list. This buffer is limited to 1MB. But for that we need to also limit max_req_size to 1MB. It has not shown any throughput penalties for SD-cards or eMMC. Signed-off-by: Yann Gautier <yann.gautier@foss.st.com> Link: https://lore.kernel.org/r/20220328145114.334577-1-yann.gautier@foss.st.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: style fix for proper function bodiesWolfram Sang2022-04-261-2/+4
| | | | | | | | Put the braces to the proper position to make reading the code easier. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220320124538.62028-1-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: make 'dmac_only_one_rx' a quirkWolfram Sang2022-04-262-17/+12
| | | | | | | | | | | After Shimoda-san's much appreciated refactoring of the quirk handling, we can convert now 'dmac_only_one_rx' from an ugly global flag to a regular quirk. This makes quirk handling more consistent and easier to maintain. After this patch, soc_dma_quirks is completely gone, hooray! Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Link: https://lore.kernel.org/r/20220320123016.57991-7-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: make 'fixed_addr_mode' a quirkWolfram Sang2022-04-262-10/+12
| | | | | | | | | | | | After Shimoda-san's much appreciated refactoring of the quirk handling, we can convert now the 'fixed_addr_mode' from an ugly global flag to a regular quirk. This makes quirk handling more consistent and easier to maintain. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220320123016.57991-6-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: remove a stale commentWolfram Sang2022-04-261-4/+0
| | | | | | | | | | | The whitelist has been refactored away with a0fb3fc8af01 ("mmc: renesas_sdhi: remove whitelist for internal DMAC") so the comment doesn't make any sense anymore. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220320123016.57991-5-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: make setup selection more understandableWolfram Sang2022-04-261-2/+2
| | | | | | | | | | | When I read 'no_fallback', I forgot what fallback even though I was the author of this change. Name it better to make the code easier to understand. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220320123016.57991-4-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: R-Car D3 also has no HS400Wolfram Sang2022-04-261-6/+7
| | | | | | | | | | | It is not explicitly expressed in the docs, but the needed data strobe pin is indeed missing for D3. The BSP disables HS400 as well. This means a little refactoring to reuse an already existing setup. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220320123016.57991-3-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* mmc: renesas_sdhi: remove outdated headersWolfram Sang2022-04-261-2/+0
| | | | | | | | | | We moved quirk handling out of the SDHI core to the individual drivers. So, no need to include headers needed for soc_device_match et al. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20220320123016.57991-2-wsa+renesas@sang-engineering.com Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
* Linux 5.18-rc4v5.18-rc4Linus Torvalds2022-04-241-1/+1
|
* Merge tag 'sched_urgent_for_v5.18_rc4' of ↵Linus Torvalds2022-04-241-5/+5
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fix from Borislav Petkov: - Fix a corner case when calculating sched runqueue variables That fix also removes a check for a zero divisor in the code, without mentioning it. Vincent clarified that it's ok after I whined about it: https://lore.kernel.org/all/CAKfTPtD2QEyZ6ADd5WrwETMOX0XOwJGnVddt7VHgfURdqgOS-Q@mail.gmail.com/ * tag 'sched_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched/pelt: Fix attach_entity_load_avg() corner case
| * sched/pelt: Fix attach_entity_load_avg() corner casekuyo chang2022-04-191-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The warning in cfs_rq_is_decayed() triggered: SCHED_WARN_ON(cfs_rq->avg.load_avg || cfs_rq->avg.util_avg || cfs_rq->avg.runnable_avg) There exists a corner case in attach_entity_load_avg() which will cause load_sum to be zero while load_avg will not be. Consider se_weight is 88761 as per the sched_prio_to_weight[] table. Further assume the get_pelt_divider() is 47742, this gives: se->avg.load_avg is 1. However, calculating load_sum: se->avg.load_sum = div_u64(se->avg.load_avg * se->avg.load_sum, se_weight(se)); se->avg.load_sum = 1*47742/88761 = 0. Then enqueue_load_avg() adds this to the cfs_rq totals: cfs_rq->avg.load_avg += se->avg.load_avg; cfs_rq->avg.load_sum += se_weight(se) * se->avg.load_sum; Resulting in load_avg being 1 with load_sum is 0, which will trigger the WARN. Fixes: f207934fb79d ("sched/fair: Align PELT windows between cfs_rq and its se") Signed-off-by: kuyo chang <kuyo.chang@mediatek.com> [peterz: massage changelog] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org> Tested-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Link: https://lkml.kernel.org/r/20220414090229.342-1-kuyo.chang@mediatek.com
* | Merge tag 'powerpc-5.18-3' of ↵Linus Torvalds2022-04-246-66/+66
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - Partly revert a change to our timer_interrupt() that caused lockups with high res timers disabled. - Fix a bug in KVM TCE handling that could corrupt kernel memory. - Two commits fixing Power9/Power10 perf alternative event selection. Thanks to Alexey Kardashevskiy, Athira Rajeev, David Gibson, Frederic Barrat, Madhavan Srinivasan, Miguel Ojeda, and Nicholas Piggin. * tag 'powerpc-5.18-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/perf: Fix 32bit compile powerpc/perf: Fix power10 event alternatives powerpc/perf: Fix power9 event alternatives KVM: PPC: Fix TCE handling for VFIO powerpc/time: Always set decrementer in timer_interrupt()
| * | powerpc/perf: Fix 32bit compileAlexey Kardashevskiy2022-04-211-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The "read_bhrb" global symbol is only called under CONFIG_PPC64 of arch/powerpc/perf/core-book3s.c but it is compiled for both 32 and 64 bit anyway (and LLVM fails to link this on 32bit). This fixes it by moving bhrb.o to obj64 targets. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220421025756.571995-1-aik@ozlabs.ru
| * | powerpc/perf: Fix power10 event alternativesAthira Rajeev2022-04-211-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When scheduling a group of events, there are constraint checks done to make sure all events can go in a group. Example, one of the criteria is that events in a group cannot use the same PMC. But platform specific PMU supports alternative event for some of the event codes. During perf_event_open(), if any event group doesn't match constraint check criteria, further lookup is done to find alternative event. By current design, the array of alternatives events in PMU code is expected to be sorted by column 0. This is because in find_alternative() the return criteria is based on event code comparison. ie. "event < ev_alt[i][0])". This optimisation is there since find_alternative() can be called multiple times. In power10 PMU code, the alternative event array is not sorted properly and hence there is breakage in finding alternative event. To work with existing logic, fix the alternative event array to be sorted by column 0 for power10-pmu.c Results: In case where an alternative event is not chosen when we could, events will be multiplexed. ie, time sliced where it could actually run concurrently. Example, in power10 PM_INST_CMPL_ALT(0x00002) has alternative event, PM_INST_CMPL(0x500fa). Without the fix, if a group of events with PMC1 to PMC4 is used along with PM_INST_CMPL_ALT, it will be time sliced since all programmable PMC's are consumed already. But with the fix, when it picks alternative event on PMC5, all events will run concurrently. Before: # perf stat -e r00002,r100fc,r200fa,r300fc,r400fc Performance counter stats for 'system wide': 328668935 r00002 (79.94%) 56501024 r100fc (79.95%) 49564238 r200fa (79.95%) 376 r300fc (80.19%) 660 r400fc (79.97%) 4.039150522 seconds time elapsed With the fix, since alternative event is chosen to run on PMC6, events will be run concurrently. After: # perf stat -e r00002,r100fc,r200fa,r300fc,r400fc Performance counter stats for 'system wide': 23596607 r00002 4907738 r100fc 2283608 r200fa 135 r300fc 248 r400fc 1.664671390 seconds time elapsed Fixes: a64e697cef23 ("powerpc/perf: power10 Performance Monitoring support") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220419114828.89843-2-atrajeev@linux.vnet.ibm.com
| * | powerpc/perf: Fix power9 event alternativesAthira Rajeev2022-04-211-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When scheduling a group of events, there are constraint checks done to make sure all events can go in a group. Example, one of the criteria is that events in a group cannot use the same PMC. But platform specific PMU supports alternative event for some of the event codes. During perf_event_open(), if any event group doesn't match constraint check criteria, further lookup is done to find alternative event. By current design, the array of alternatives events in PMU code is expected to be sorted by column 0. This is because in find_alternative() the return criteria is based on event code comparison. ie. "event < ev_alt[i][0])". This optimisation is there since find_alternative() can be called multiple times. In power9 PMU code, the alternative event array is not sorted properly and hence there is breakage in finding alternative events. To work with existing logic, fix the alternative event array to be sorted by column 0 for power9-pmu.c Results: With alternative events, multiplexing can be avoided. That is, for example, in power9 PM_LD_MISS_L1 (0x3e054) has alternative event, PM_LD_MISS_L1_ALT (0x400f0). This is an identical event which can be programmed in a different PMC. Before: # perf stat -e r3e054,r300fc Performance counter stats for 'system wide': 1057860 r3e054 (50.21%) 379 r300fc (49.79%) 0.944329741 seconds time elapsed Since both the events are using PMC3 in this case, they are multiplexed here. After: # perf stat -e r3e054,r300fc Performance counter stats for 'system wide': 1006948 r3e054 182 r300fc Fixes: 91e0bd1e6251 ("powerpc/perf: Add PM_LD_MISS_L1 and PM_BR_2PATH to power9 event list") Signed-off-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Reviewed-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220419114828.89843-1-atrajeev@linux.vnet.ibm.com
| * | KVM: PPC: Fix TCE handling for VFIOAlexey Kardashevskiy2022-04-212-44/+45
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The LoPAPR spec defines a guest visible IOMMU with a variable page size. Currently QEMU advertises 4K, 64K, 2M, 16MB pages, a Linux VM picks the biggest (16MB). In the case of a passed though PCI device, there is a hardware IOMMU which does not support all pages sizes from the above - P8 cannot do 2MB and P9 cannot do 16MB. So for each emulated 16M IOMMU page we may create several smaller mappings ("TCEs") in the hardware IOMMU. The code wrongly uses the emulated TCE index instead of hardware TCE index in error handling. The problem is easier to see on POWER8 with multi-level TCE tables (when only the first level is preallocated) as hash mode uses real mode TCE hypercalls handlers. The kernel starts using indirect tables when VMs get bigger than 128GB (depends on the max page order). The very first real mode hcall is going to fail with H_TOO_HARD as in the real mode we cannot allocate memory for TCEs (we can in the virtual mode) but on the way out the code attempts to clear hardware TCEs using emulated TCE indexes which corrupts random kernel memory because it_offset==1<<59 is subtracted from those indexes and the resulting index is out of the TCE table bounds. This fixes kvmppc_clear_tce() to use the correct TCE indexes. While at it, this fixes TCE cache invalidation which uses emulated TCE indexes instead of the hardware ones. This went unnoticed as 64bit DMA is used these days and VMs map all RAM in one go and only then do DMA and this is when the TCE cache gets populated. Potentially this could slow down mapping, however normally 16MB emulated pages are backed by 64K hardware pages so it is one write to the "TCE Kill" per 256 updates which is not that bad considering the size of the cache (1024 TCEs or so). Fixes: ca1fc489cfa0 ("KVM: PPC: Book3S: Allow backing bigger guest IOMMU pages with smaller physical pages") Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Tested-by: David Gibson <david@gibson.dropbear.id.au> Reviewed-by: Frederic Barrat <fbarrat@linux.ibm.com> Reviewed-by: David Gibson <david@gibson.dropbear.id.au> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220420050840.328223-1-aik@ozlabs.ru
| * | powerpc/time: Always set decrementer in timer_interrupt()Michael Ellerman2022-04-211-15/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a partial revert of commit 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use"). Prior to that commit, we always set the decrementer in timer_interrupt(), to clear the timer interrupt. Otherwise we could end up continuously taking timer interrupts. When high res timers are enabled there is no problem seen with leaving the decrementer untouched in timer_interrupt(), because it will be programmed via hrtimer_interrupt() -> tick_program_event() -> clockevents_program_event() -> decrementer_set_next_event(). However with CONFIG_HIGH_RES_TIMERS=n or booting with highres=off, we see a stall/lockup, because tick_nohz_handler() does not cause a reprogram of the decrementer, leading to endless timer interrupts. Example trace: [ 1.898617][ T7] Freeing initrd memory: 2624K^M [ 22.680919][ C1] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:^M [ 22.682281][ C1] rcu: 0-....: (25 ticks this GP) idle=073/0/0x1 softirq=10/16 fqs=1050 ^M [ 22.682851][ C1] (detected by 1, t=2102 jiffies, g=-1179, q=476)^M [ 22.683649][ C1] Sending NMI from CPU 1 to CPUs 0:^M [ 22.685252][ C0] NMI backtrace for cpu 0^M [ 22.685649][ C0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-rc2-00185-g0faf20a1ad16 #145^M [ 22.686393][ C0] NIP: c000000000016d64 LR: c000000000f6cca4 CTR: c00000000019c6e0^M [ 22.686774][ C0] REGS: c000000002833590 TRAP: 0500 Not tainted (5.16.0-rc2-00185-g0faf20a1ad16)^M [ 22.687222][ C0] MSR: 8000000000009033 <SF,EE,ME,IR,DR,RI,LE> CR: 24000222 XER: 00000000^M [ 22.688297][ C0] CFAR: c00000000000c854 IRQMASK: 0 ^M ... [ 22.692637][ C0] NIP [c000000000016d64] arch_local_irq_restore+0x174/0x250^M [ 22.694443][ C0] LR [c000000000f6cca4] __do_softirq+0xe4/0x3dc^M [ 22.695762][ C0] Call Trace:^M [ 22.696050][ C0] [c000000002833830] [c000000000f6cc80] __do_softirq+0xc0/0x3dc (unreliable)^M [ 22.697377][ C0] [c000000002833920] [c000000000151508] __irq_exit_rcu+0xd8/0x130^M [ 22.698739][ C0] [c000000002833950] [c000000000151730] irq_exit+0x20/0x40^M [ 22.699938][ C0] [c000000002833970] [c000000000027f40] timer_interrupt+0x270/0x460^M [ 22.701119][ C0] [c0000000028339d0] [c0000000000099a8] decrementer_common_virt+0x208/0x210^M Possibly this should be fixed in the lowres timing code, but that would be a generic change and could take some time and may not backport easily, so for now make the programming of the decrementer unconditional again in timer_interrupt() to avoid the stall/lockup. Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Nicholas Piggin <npiggin@gmail.com> Link: https://lore.kernel.org/r/20220420141657.771442-1-mpe@ellerman.id.au
* | | Merge tag 'perf_urgent_for_v5.18_rc4' of ↵Linus Torvalds2022-04-244-9/+10
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Add Sapphire Rapids CPU support - Fix a perf vmalloc-ed buffer mapping error (PERF_USE_VMALLOC in use) * tag 'perf_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU support perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabled
| * | | perf/x86/cstate: Add SAPPHIRERAPIDS_X CPU supportZhang Rui2022-04-191-3/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From the perspective of Intel cstate residency counters, SAPPHIRERAPIDS_X is the same as ICELAKE_X. Share the code with it. And update the comments for SAPPHIRERAPIDS_X. Signed-off-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Kan Liang <kan.liang@linux.intel.com> Link: https://lkml.kernel.org/r/20220415104520.2737004-1-rui.zhang@intel.com
| * | | perf/core: Fix perf_mmap fail when CONFIG_PERF_USE_VMALLOC enabledZhipeng Xie2022-04-193-6/+6
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This problem can be reproduced with CONFIG_PERF_USE_VMALLOC enabled on both x86_64 and aarch64 arch when using sysdig -B(using ebpf)[1]. sysdig -B works fine after rebuilding the kernel with CONFIG_PERF_USE_VMALLOC disabled. I tracked it down to the if condition event->rb->nr_pages != nr_pages in perf_mmap is true when CONFIG_PERF_USE_VMALLOC is enabled where event->rb->nr_pages = 1 and nr_pages = 2048 resulting perf_mmap to return -EINVAL. This is because when CONFIG_PERF_USE_VMALLOC is enabled, rb->nr_pages is always equal to 1. Arch with CONFIG_PERF_USE_VMALLOC enabled by default: arc/arm/csky/mips/sh/sparc/xtensa Arch with CONFIG_PERF_USE_VMALLOC disabled by default: x86_64/aarch64/... Fix this problem by using data_page_nr() [1] https://github.com/draios/sysdig Fixes: 906010b2134e ("perf_event: Provide vmalloc() based mmap() backing") Signed-off-by: Zhipeng Xie <xiezhipeng1@huawei.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20220209145417.6495-1-xiezhipeng1@huawei.com
* | | Merge tag 'edac_urgent_for_v5.18_rc4' of ↵Linus Torvalds2022-04-241-5/+11
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras Pull EDAC fix from Borislav Petkov: - Read the reported error count from the proper register on synopsys_edac * tag 'edac_urgent_for_v5.18_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras: EDAC/synopsys: Read the error count from the correct register
| * | | EDAC/synopsys: Read the error count from the correct registerShubhrajyoti Datta2022-04-141-5/+11
| | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, the error count is read wrongly from the status register. Read the count from the proper error count register (ERRCNT). [ bp: Massage. ] Fixes: b500b4a029d5 ("EDAC, synopsys: Add ECC support for ZynqMP DDR controller") Signed-off-by: Shubhrajyoti Datta <shubhrajyoti.datta@xilinx.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Michal Simek <michal.simek@xilinx.com> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220414102813.4468-1-shubhrajyoti.datta@xilinx.com
* | | kvmalloc: use vmalloc_huge for vmalloc allocationsLinus Torvalds2022-04-241-2/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 559089e0a93d ("vmalloc: replace VM_NO_HUGE_VMAP with VM_ALLOW_HUGE_VMAP"), the use of hugepage mappings for vmalloc is an opt-in strategy, because it caused a number of problems that weren't noticed until x86 enabled it too. One of the issues was fixed by Nick Piggin in commit 3b8000ae185c ("mm/vmalloc: huge vmalloc backing pages should be split rather than compound"), but I'm still worried about page protection issues, and VM_FLUSH_RESET_PERMS in particular. However, like the hash table allocation case (commit f2edd118d02d: "page_alloc: use vmalloc_huge for large system hash"), the use of kvmalloc() should be safe from any such games, since the returned pointer might be a SLUB allocation, and as such no user should reasonably be using it in any odd ways. We also know that the allocations are fairly large, since it falls back to the vmalloc case only when a kmalloc() fails. So using a hugepage mapping seems both safe and relevant. This patch does show a weakness in the opt-in strategy: since the opt-in flag is in the 'vm_flags', not the usual gfp_t allocation flags, very few of the usual interfaces actually expose it. That's not much of an issue in this case that already used one of the fairly specialized low-level vmalloc interfaces for the allocation, but for a lot of other vmalloc() users that might want to opt in, it's going to be very inconvenient. We'll either have to fix any compatibility problems, or expose it in the gfp flags (__GFP_COMP would have made a lot of sense) to allow normal vmalloc() users to use hugepage mappings. That said, the cases that really matter were probably already taken care of by the hash tabel allocation. Link: https://lore.kernel.org/all/20220415164413.2727220-1-song@kernel.org/ Link: https://lore.kernel.org/all/CAHk-=whao=iosX1s5Z4SF-ZGa-ebAukJoAdUJFk5SPwnofV+Vg@mail.gmail.com/ Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Song Liu <songliubraving@fb.com> Cc: Rick Edgecombe <rick.p.edgecombe@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | page_alloc: use vmalloc_huge for large system hashSong Liu2022-04-241-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Use vmalloc_huge() in alloc_large_system_hash() so that large system hash (>= PMD_SIZE) could benefit from huge pages. Note that vmalloc_huge only allocates huge pages for systems with HAVE_ARCH_HUGE_VMALLOC. Signed-off-by: Song Liu <song@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Rik van Riel <riel@surriel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbdLinus Torvalds2022-04-248-66/+52
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull ksmbd server fixes from Steve French: - cap maximum sector size reported to avoid mount problems - reference count fix - fix filename rename race * tag '5.18-rc3-ksmbd-fixes' of git://git.samba.org/ksmbd: ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATION ksmbd: increment reference count of parent fp ksmbd: remove filename in ksmbd_file
| * | | ksmbd: set fixed sector size to FS_SECTOR_SIZE_INFORMATIONNamjae Jeon2022-04-151-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently ksmbd is using ->f_bsize from vfs_statfs() as sector size. If fat/exfat is a local share, ->f_bsize is a cluster size that is too large to be used as a sector size. Sector sizes larger than 4K cause problem occurs when mounting an iso file through windows client. The error message can be obtained using Mount-DiskImage command, the error is: "Mount-DiskImage : The sector size of the physical disk on which the virtual disk resides is not supported." This patch reports fixed 4KB sector size if ->s_blocksize is bigger than 4KB. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | ksmbd: increment reference count of parent fpNamjae Jeon2022-04-152-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add missing increment reference count of parent fp in ksmbd_lookup_fd_inode(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
| * | | ksmbd: remove filename in ksmbd_fileNamjae Jeon2022-04-158-62/+42
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | If the filename is change by underlying rename the server, fp->filename and real filename can be different. This patch remove the uses of fp->filename in ksmbd and replace it with d_path(). Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Reviewed-by: Hyunchul Lee <hyc.lee@gmail.com> Signed-off-by: Steve French <stfrench@microsoft.com>
* | | Merge tag 'arc-5.18-rc4' of ↵Linus Torvalds2022-04-249-27/+24
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc Pull ARC fixes from Vineet Gupta: - Assorted fixes * tag 'arc-5.18-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/vgupta/arc: ARC: remove redundant READ_ONCE() in cmpxchg loop ARC: atomic: cleanup atomic-llsc definitions arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirely ARC: dts: align SPI NOR node name with dtschema ARC: Remove a redundant memset() ARC: fix typos in comments ARC: entry: fix syscall_trace_exit argument
| * | | ARC: remove redundant READ_ONCE() in cmpxchg loopBang Li2022-04-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reverts commit 7082a29c22ac ("ARC: use ACCESS_ONCE in cmpxchg loop"). It is not necessary to use READ_ONCE() because cmpxchg contains barrier. We can get it from commit d57f727264f1 ("ARC: add compiler barrier to LLSC based cmpxchg"). Signed-off-by: Bang Li <libang.linuxer@gmail.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | ARC: atomic: cleanup atomic-llsc definitionsSergey Matyukevich2022-04-181-16/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove redundant c_op macro argument. Only asm_op is needed to define atomic operations using llock/scond. Signed-off-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | arc: drop definitions of pgd_index() and pgd_offset{, _k}() entirelyRolf Eike Beer2022-04-181-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | They were in <asm/pgtables.h> and have been removed from there in 974b9b2c68f ("mm: consolidate pte_index() and pte_offset_*() definitions") in favor of the generic version. But that missed that the same definitons also existed in <asm/pgtable-levels.h>, where they were (inadvertently?) introduced in fe6cb7b043b6 ("ARC: mm: disintegrate pgtable.h into levels and flags"). Fixes: 974b9b2c68f3 ("mm: consolidate pte_index() and pte_offset_*() definitions") Fixes: fe6cb7b043b6 ("ARC: mm: disintegrate pgtable.h into levels and flags") Signed-off-by: Rolf Eike Beer <eb@emlix.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | ARC: dts: align SPI NOR node name with dtschemaKrzysztof Kozlowski2022-04-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | The node names should be generic and SPI NOR dtschema expects "flash". Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | ARC: Remove a redundant memset()Christophe JAILLET2022-04-181-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | disasm_instr() already call memset(0) on its 2nd argument, so there is no need to clear it explicitly before calling this function. Remove the redundant memset(). Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | ARC: fix typos in commentsJulia Lawall2022-04-185-5/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Various spelling mistakes in comments. Detected with the help of Coccinelle. Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
| * | | ARC: entry: fix syscall_trace_exit argumentSergey Matyukevich2022-04-181-0/+1
| |/ / | | | | | | | | | | | | | | | | | | | | | | | | | | | Function syscall_trace_exit expects pointer to pt_regs. However r0 is also used to keep syscall return value. Restore pointer to pt_regs before calling syscall_trace_exit. Cc: <stable@vger.kernel.org> Signed-off-by: Sergey Matyukevich <sergey.matyukevich@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@kernel.org>
* | | Merge tag 'scsi-fixes' of ↵Linus Torvalds2022-04-231-3/+12
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fix from James Bottomley: "One fix for an information leak caused by copying a buffer to userspace without checking for error first in the sr driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sr: Do not leak information in ioctl
| * | | scsi: sr: Do not leak information in ioctlTom Rix2022-04-191-3/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sr_ioctl.c uses this pattern: result = sr_do_ioctl(cd, &cgc); to-user = buffer[]; kfree(buffer); return result; Use of a buffer without checking leaks information. Check result and jump over the use of buffer if there is an error. result = sr_do_ioctl(cd, &cgc); if (result) goto err; to-user = buffer[]; err: kfree(buffer); return result; Additionally, initialize the buffer to zero. This problem can be seen in the 2.4.0 kernel. Link: https://lore.kernel.org/r/20220411174756.2418435-1-trix@redhat.com Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>