summaryrefslogtreecommitdiffstats
path: root/include (unfollow)
Commit message (Collapse)AuthorFilesLines
2022-02-21firmware: arm_scmi: Add support for clock_enable_latencyCristian Marussi2-3/+10
An SCMI platform can optionally advertise an enable latency typically associated with a specific clock resource: add support for parsing such optional message field and export such information in the usual publicly accessible clock descriptor. Link: https://lore.kernel.org/r/20220217131234.50328-8-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21firmware: arm_scmi: Add atomic support to clock protocolCristian Marussi2-3/+22
Introduce new _atomic variant for SCMI clock protocol operations related to enable disable operations: when an atomic operation is required the xfer poll_completion flag is set for that transaction. Link: https://lore.kernel.org/r/20220217131234.50328-7-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21firmware: arm_scmi: Support optional system wide atomic-threshold-usCristian Marussi2-4/+28
An SCMI agent can be configured system-wide with a well-defined atomic threshold: only SCMI synchronous command whose latency has been advertised by the SCMI platform to be lower or equal to this configured threshold will be considered for atomic operations, when requested and if supported by the underlying transport at all. Link: https://lore.kernel.org/r/20220217131234.50328-6-cristian.marussi@arm.com Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21dt-bindings: firmware: arm,scmi: Add atomic-threshold-us optional propertyCristian Marussi1-0/+10
SCMI protocols in the platform can optionally signal to the OSPM agent the expected execution latency for a specific resource/operation pair. Introduce an SCMI system wide optional property to describe a global time threshold which can be configured on a per-platform base to determine the opportunity, or not, for an SCMI command advertised to have a higher latency than the threshold, to be considered for atomic operations: high-latency SCMI synchronous commands should be preferably issued in the usual non-atomic mode. Link: https://lore.kernel.org/r/20220217131234.50328-5-cristian.marussi@arm.com Cc: Rob Herring <robh+dt@kernel.org> Cc: devicetree@vger.kernel.org Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21firmware: arm_scmi: Add atomic mode support to virtio transportCristian Marussi3-20/+391
Add support for .mark_txdone and .poll_done transport operations to SCMI VirtIO transport as pre-requisites to enable atomic operations. Add a Kernel configuration option to enable SCMI VirtIO transport polling and atomic mode for selected SCMI transactions while leaving it default disabled. Link: https://lore.kernel.org/r/20220217131234.50328-4-cristian.marussi@arm.com Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Skalkin <igor.skalkin@opensynergy.com> Cc: Peter Hilber <peter.hilber@opensynergy.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21firmware: arm_scmi: Review virtio free_list handlingCristian Marussi1-31/+57
Add a new spinlock dedicated to the access of the TX free list and a couple of helpers to get and put messages back and forth from the free_list. Link: https://lore.kernel.org/r/20220217131234.50328-3-cristian.marussi@arm.com Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Skalkin <igor.skalkin@opensynergy.com> Cc: Peter Hilber <peter.hilber@opensynergy.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-21firmware: arm_scmi: Add a virtio channel refcountCristian Marussi1-51/+92
Currently SCMI VirtIO channels are marked with a ready flag and related lock to track channel lifetime and support proper synchronization at shutdown when virtqueues have to be stopped. This leads to some extended spinlocked sections with IRQs off on the RX path to keep hold of the ready flag and does not scale well especially when SCMI VirtIO polling mode will be introduced. Add an SCMI VirtIO channel dedicated refcount to track active users on both the TX and the RX path and properly enforce synchronization and cleanup at shutdown, inhibiting further usage of the channel once freed. Link: https://lore.kernel.org/r/20220217131234.50328-2-cristian.marussi@arm.com Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Igor Skalkin <igor.skalkin@opensynergy.com> Cc: Peter Hilber <peter.hilber@opensynergy.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: Cristian Marussi <cristian.marussi@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-02-09firmware: arm_scmi: Disable ftrace for Clang Thumb2 buildsArd Biesheuvel1-0/+7
The SMC calling convention designates R0-R7 as input registers in AArch32 mode, and this conflicts with the compiler's use of R7 as a frame pointer when building in Thumb2 mode. Generally, we don't enable the frame pointer, and GCC happily enables the -pg profiling hooks without them. However, Clang refuses, and errors out with the message below: drivers/firmware/arm_scmi/smc.c:152:2: error: write to reserved register 'R7' arm_smccc_1_1_invoke(scmi_info->func_id, 0, 0, 0, 0, 0, 0, 0, &res); ^ include/linux/arm-smccc.h:550:4: note: expanded from macro 'arm_smccc_1_1_invoke' arm_smccc_1_1_smc(__VA_ARGS__); \ ^ Let's just disable ftrace for the compilation unit when building this configuration. Link: https://lore.kernel.org/r/20220203082204.1176734-11-ardb@kernel.org Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com>
2022-01-23Linux 5.17-rc1v5.17-rc1Linus Torvalds1-2/+2
2022-01-23ftrace: Fix assuming build time sort works for s390Steven Rostedt (Google)2-3/+10
To speed up the boot process, as mcount_loc needs to be sorted for ftrace to work properly, sorting it at build time is more efficient than boot up and can save milliseconds of time. Unfortunately, this change broke s390 as it will modify the mcount_loc location after the sorting takes place and will put back the unsorted locations. Since the sorting is skipped at boot up if it is believed that it was sorted at run time, ftrace can crash as its algorithms are dependent on the list being sorted. Add a new config BUILDTIME_MCOUNT_SORT that is set when BUILDTIME_TABLE_SORT but not if S390 is set. Use this config to determine if sorting should take place at boot up. Link: https://lore.kernel.org/all/yt9dee51ctfn.fsf@linux.ibm.com/ Fixes: 72b3942a173c ("scripts: ftrace - move the sort-processing in ftrace_init") Reported-by: Sven Schnelle <svens@linux.ibm.com> Tested-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-01-22perf tools: Remove redundant err variableMinghao Chi1-4/+1
Return value from perf_event__process_tracing_data() directly instead of taking this in another redundant variable. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Minghao Chi <chi.minghao@zte.com.cn> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20220112080109.666800-1-chi.minghao@zte.com.cn Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf test: Add parse-events test for aliases with hyphensJohn Garry2-9/+82
Add a test which allows us to test parsing an event alias with hyphens. Since these events typically do not exist on most host systems, add the alias to the fake pmu. Function perf_pmu__test_parse_init() has terms added to match known test aliases. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-4-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf test: Add pmu-events test for aliases with hyphensJohn Garry2-0/+48
Add a test for aliases with hyphens in the name to ensure that the pmu-events tables are as expects. There should be no reason why these sort of aliases would be treated differently, but no harm in checking. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-3-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf parse-events: Support event alias in form foo-bar-bazJohn Garry4-4/+41
Event aliasing for events whose name in the form foo-bar-baz is not supported, while foo-bar, foo_bar_baz, and other combinations are, i.e. two hyphens are not supported. The HiSilicon D06 platform has events in such form: $ ./perf list sdir-home-migrate List of pre-defined events (to be used in -e): uncore hha: sdir-home-migrate [Unit: hisi_sccl,hha] $ sudo ./perf stat -e sdir-home-migrate event syntax error: 'sdir-home-migrate' \___ parser error Run 'perf list' for a list of valid events Usage: perf stat [<options>] [<command>] -e, --event <event>event selector. use 'perf list' to list available events To support, add an extra PMU event symbol type for "baz", and add a new rule in the bison file. Signed-off-by: John Garry <john.garry@huawei.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Qi Liu <liuqi115@huawei.com> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Cc: linuxarm@huawei.com Link: https://lore.kernel.org/r/1642432215-234089-2-git-send-email-john.garry@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf evsel: Override attr->sample_period for non-libpfm4 eventsGerman Gomez1-8/+17
A previous patch preventing "attr->sample_period" values from being overridden in pfm events changed a related behaviour in arm-spe. Before said patch: perf record -c 10000 -e arm_spe_0// -- sleep 1 Would yield an SPE event with period=10000. After the patch, the period in "-c 10000" was being ignored because the arm-spe code initializes sample_period to a non-zero value. This patch restores the previous behaviour for non-libpfm4 events. Fixes: ae5dcc8abe31 (“perf record: Prevent override of attr->sample_period for libpfm4 events”) Reported-by: Chase Conklin <chase.conklin@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220118144054.2541-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Remove duplicate include in cpumap.hLv Ruyi1-1/+0
Remove all but the first include of stdbool.h from cpumap.h. Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Lv Ruyi <lv.ruyi@zte.com.cn> Acked-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20220117083730.863200-1-lv.ruyi@zte.com.cn Signed-off-by: CGEL ZTE <cgel.zte@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf cpumap: Migrate to libperf cpumap apiIan Rogers31-87/+99
Switch from directly accessing the perf_cpu_map to using the appropriate libperf API when possible. Using the API simplifies the job of refactoring use of perf_cpu_map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf python: Fix cpu_map__item() buildingIan Rogers1-3/+3
Value should be built as an integer. Switch some uses of perf_cpu_map to use the library API. Fixes: 6d18804b963b78dc ("perf cpumap: Give CPUs their own type") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: André Almeida <andrealmeid@collabora.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: German Gomez <german.gomez@arm.com> Cc: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Garry <john.garry@huawei.com> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Miaoqian Lin <linmq006@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Song Liu <song@kernel.org> Cc: Stephane Eranian <eranian@google.com> Cc: Stephen Brennan <stephen.s.brennan@oracle.com> Cc: Steven Rostedt (VMware) <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Yury Norov <yury.norov@gmail.com> Link: http://lore.kernel.org/lkml/20220122045811.3402706-2-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22perf script: Fix printing 'phys_addr' failure issueYao Jin1-1/+1
Perf script was failed to print the phys_addr for SPE profiling. One 'dummy' event is added by SPE profiling but it doesn't have PHYS_ADDR attribute set, perf script then exits with error. Now referring to 'addr', use evsel__do_check_stype() to check the type. Before: # perf record -e arm_spe_0/branch_filter=0,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,\ store_filter=0,min_latency=0,event_filter=2/ -p 4064384 -- sleep 3 # perf script -F pid,tid,addr,phys_addr Samples for 'dummy:u' event do not have PHYS_ADDR attribute set. Cannot print 'phys_addr' field. After: # perf record -e arm_spe_0/branch_filter=0,ts_enable=1,pa_enable=1,load_filter=1,jitter=0,\ store_filter=0,min_latency=0,event_filter=2/ -p 4064384 -- sleep 3 # perf script -F pid,tid,addr,phys_addr 4064384/4064384 ffff802f921be0d0 2f921be0d0 4064384/4064384 ffff802f921be0d0 2f921be0d0 Reviewed-by: German Gomez <german.gomez@arm.com> Signed-off-by: Yao Jin <jinyao5@huawei.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220121065954.2121900-1-liwei391@huawei.com Signed-off-by: Wei Li <liwei391@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22certs: Fix build error when CONFIG_MODULE_SIG_KEY is emptyMasahiro Yamada1-1/+1
Since b8c96a6b466c ("certs: simplify $(srctree)/ handling and remove config_filename macro"), when CONFIG_MODULE_SIG_KEY is empty, signing_key.x509 fails to build: CERT certs/signing_key.x509 Usage: extract-cert <source> <dest> make[1]: *** [certs/Makefile:78: certs/signing_key.x509] Error 2 make: *** [Makefile:1831: certs] Error 2 Pass "" to the first argument of extract-cert to fix the build error. Link: https://lore.kernel.org/linux-kbuild/20220120094606.2skuyb26yjlnu66q@lion.mk-sys.cz/T/#u Fixes: b8c96a6b466c ("certs: simplify $(srctree)/ handling and remove config_filename macro") Reported-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Tested-by: Michal Kubecek <mkubecek@suse.cz>
2022-01-22certs: Fix build error when CONFIG_MODULE_SIG_KEY is PKCS#11 URIMasahiro Yamada1-1/+1
When CONFIG_MODULE_SIG_KEY is PKCS#11 URL (pkcs11:*), signing_key.x509 fails to build: certs/Makefile:77: *** target pattern contains no '%'. Stop. Due to the typo, $(X509_DEP) contains a colon. Fix it. Fixes: b8c96a6b466c ("certs: simplify $(srctree)/ handling and remove config_filename macro") Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-22Revert "Makefile: Do not quote value for CONFIG_CC_IMPLICIT_FALLTHROUGH"Masahiro Yamada1-1/+1
This reverts commit cd8c917a56f20f48748dd43d9ae3caff51d5b987. Commit 129ab0d2d9f3 ("kbuild: do not quote string values in include/config/auto.conf") provided the final solution. Now reverting the temporary workaround. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-22usr/include/Makefile: add linux/nfc.h to the compile-test coverageDmitry V. Levin1-1/+0
As linux/nfc.h userspace compilation was finally fixed by commits 79b69a83705e ("nfc: uapi: use kernel size_t to fix user-space builds") and 7175f02c4e5f ("uapi: fix linux/nfc.h userspace compilation errors"), there is no need to keep the compile-test exception for it in usr/include/Makefile. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
2022-01-22mm: hide the FRONTSWAP Kconfig symbolChristoph Hellwig1-15/+3
Select FRONTSWAP from ZSWAP instead of prompting for it. Link: https://lkml.kernel.org/r/20211224062246.1258487-14-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove support for multiple opsChristoph Hellwig3-42/+19
There is only a single instance of frontswap ops in the kernel, so simplify the frontswap code by removing support for multiple operations. Link: https://lkml.kernel.org/r/20211224062246.1258487-13-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22mm: mark swap_lock and swap_active_head staticChristoph Hellwig2-4/+2
swap_lock and swap_active_head are only used in swapfile.c, so mark them static. Link: https://lkml.kernel.org/r/20211224062246.1258487-12-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: simplify frontswap_register_opsChristoph Hellwig1-41/+0
Given that frontswap_register_ops must be called from built-in code, there is no need to handle the case of swapfiles coming online before or during it, so delete the code that deals with that case. Link: https://lkml.kernel.org/r/20211224062246.1258487-11-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove frontswap_testChristoph Hellwig2-12/+1
frontswap_test is unused now, remove it. Link: https://lkml.kernel.org/r/20211224062246.1258487-10-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22mm: simplify try_to_unuseChristoph Hellwig5-97/+30
Remove the unused frontswap and pages_to_unuse arguments, and mark the function static now that the caller in frontswap is gone. [akpm@linux-foundation.org: fix shmem_unuse() stub, per Matthew] Link: https://lkml.kernel.org/r/20211224062246.1258487-9-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove the frontswap exportsChristoph Hellwig1-6/+0
None of the frontswap API is called from modular code. Link: https://lkml.kernel.org/r/20211224062246.1258487-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: simplify frontswap_initChristoph Hellwig3-11/+4
Just use IS_ENABLED() and remove the __frontswap_init indirection. Also remove the unused export. Link: https://lkml.kernel.org/r/20211224062246.1258487-7-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove frontswap_curr_pagesChristoph Hellwig2-29/+0
frontswap_curr_pages is never called, so remove it. Link: https://lkml.kernel.org/r/20211224062246.1258487-6-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove frontswap_shrinkChristoph Hellwig3-97/+0
frontswap_shrink is never called, so remove it. Link: https://lkml.kernel.org/r/20211224062246.1258487-5-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove frontswap_tmem_exclusive_getsChristoph Hellwig2-24/+1
frontswap_tmem_exclusive_gets is never called, so remove it. Link: https://lkml.kernel.org/r/20211224062246.1258487-4-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22frontswap: remove frontswap_writethroughChristoph Hellwig3-29/+1
frontswap_writethrough is never called, so remove it. Link: https://lkml.kernel.org/r/20211224062246.1258487-3-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Hugh Dickins <hughd@google.com> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Seth Jennings <sjenning@redhat.com> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22mm: remove cleancacheChristoph Hellwig37-873/+4
Patch series "remove Xen tmem leftovers". Since the removal of the Xen tmem driver in 2019, the cleancache hooks are entirely unused, as are large parts of frontswap. This series against linux-next (with the folio changes included) removes cleancaches, and cuts down frontswap to the bits actually used by zswap. This patch (of 13): The cleancache subsystem is unused since the removal of Xen tmem driver in commit 814bbf49dcd0 ("xen: remove tmem driver"). [akpm@linux-foundation.org: remove now-unreachable code] Link: https://lkml.kernel.org/r/20211224062246.1258487-1-hch@lst.de Link: https://lkml.kernel.org/r/20211224062246.1258487-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Juergen Gross <jgross@suse.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Konrad Rzeszutek Wilk <Konrad.wilk@oracle.com> Cc: Hugh Dickins <hughd@google.com> Cc: Seth Jennings <sjenning@redhat.com> Cc: Dan Streetman <ddstreet@ieee.org> Cc: Vitaly Wool <vitaly.wool@konsulko.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22lib/stackdepot: always do filter_irq_stacks() in stack_depot_save()Marco Elver2-1/+13
The non-interrupt portion of interrupt stack traces before interrupt entry is usually arbitrary. Therefore, saving stack traces of interrupts (that include entries before interrupt entry) to stack depot leads to unbounded stackdepot growth. As such, use of filter_irq_stacks() is a requirement to ensure stackdepot can efficiently deduplicate interrupt stacks. Looking through all current users of stack_depot_save(), none (except KASAN) pass the stack trace through filter_irq_stacks() before passing it on to stack_depot_save(). Rather than adding filter_irq_stacks() to all current users of stack_depot_save(), it became clear that stack_depot_save() should simply do filter_irq_stacks(). Link: https://lkml.kernel.org/r/20211130095727.2378739-1-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Alexander Potapenko <glider@google.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: "Gustavo A. R. Silva" <gustavoars@kernel.org> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22lib/stackdepot: allow optional init and stack_table allocation by kvmalloc()Vlastimil Babka11-18/+76
Currently, enabling CONFIG_STACKDEPOT means its stack_table will be allocated from memblock, even if stack depot ends up not actually used. The default size of stack_table is 4MB on 32-bit, 8MB on 64-bit. This is fine for use-cases such as KASAN which is also a config option and has overhead on its own. But it's an issue for functionality that has to be actually enabled on boot (page_owner) or depends on hardware (GPU drivers) and thus the memory might be wasted. This was raised as an issue [1] when attempting to add stackdepot support for SLUB's debug object tracking functionality. It's common to build kernels with CONFIG_SLUB_DEBUG and enable slub_debug on boot only when needed, or create only specific kmem caches with debugging for testing purposes. It would thus be more efficient if stackdepot's table was allocated only when actually going to be used. This patch thus makes the allocation (and whole stack_depot_init() call) optional: - Add a CONFIG_STACKDEPOT_ALWAYS_INIT flag to keep using the current well-defined point of allocation as part of mem_init(). Make CONFIG_KASAN select this flag. - Other users have to call stack_depot_init() as part of their own init when it's determined that stack depot will actually be used. This may depend on both config and runtime conditions. Convert current users which are page_owner and several in the DRM subsystem. Same will be done for SLUB later. - Because the init might now be called after the boot-time memblock allocation has given all memory to the buddy allocator, change stack_depot_init() to allocate stack_table with kvmalloc() when memblock is no longer available. Also handle allocation failure by disabling stackdepot (could have theoretically happened even with memblock allocation previously), and don't unnecessarily align the memblock allocation to its own size anymore. [1] https://lore.kernel.org/all/CAMuHMdW=eoVzM1Re5FVoEN87nKfiLmM2+Ah7eNu2KXEhCvbZyA@mail.gmail.com/ Link: https://lkml.kernel.org/r/20211013073005.11351-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Marco Elver <elver@google.com> # stackdepot Cc: Marco Elver <elver@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Oliver Glitta <glittao@gmail.com> Cc: Imran Khan <imran.f.khan@oracle.com> From: Colin Ian King <colin.king@canonical.com> Subject: lib/stackdepot: fix spelling mistake and grammar in pr_err message There is a spelling mistake of the work allocation so fix this and re-phrase the message to make it easier to read. Link: https://lkml.kernel.org/r/20211015104159.11282-1-colin.king@canonical.com Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Vlastimil Babka <vbabka@suse.cz> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup On FLATMEM, we call page_ext_init_flatmem_late() just before kmem_cache_init() which means stack_depot_init() (called by page owner init) will not recognize properly it should use kvmalloc() and not memblock_alloc(). memblock_alloc() will also not issue a warning and return a block memory that can be invalid and cause kernel page fault when saving stacks, as reported by the kernel test robot [1]. Fix this by moving page_ext_init_flatmem_late() below kmem_cache_init() so that slab_is_available() is true during stack_depot_init(). SPARSEMEM doesn't have this issue, as it doesn't do page_ext_init_flatmem_late(), but a different page_ext_init() even later in the boot process. Thanks to Mike Rapoport for pointing out the FLATMEM init ordering issue. While at it, also actually resolve a checkpatch warning in stack_depot_init() from DRM CI, which was supposed to be in the original patch already. [1] https://lore.kernel.org/all/20211014085450.GC18719@xsang-OptiPlex-9020/ Link: https://lkml.kernel.org/r/6abd9213-19a9-6d58-cedc-2414386d2d81@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reported-by: kernel test robot <oliver.sang@intel.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup3 Due to cd06ab2fd48f ("drm/locking: add backtrace for locking contended locks without backoff") landing recently to -next adding a new stack depot user in drivers/gpu/drm/drm_modeset_lock.c we need to add an appropriate call to stack_depot_init() there as well. Link: https://lkml.kernel.org/r/2a692365-cfa1-64f2-34e0-8aa5674dce5e@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Marco Elver <elver@google.com> Cc: Vijayanand Jitta <vjitta@codeaurora.org> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Oliver Glitta <glittao@gmail.com> Cc: Imran Khan <imran.f.khan@oracle.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> From: Vlastimil Babka <vbabka@suse.cz> Subject: lib/stackdepot: allow optional init and stack_table allocation by kvmalloc() - fixup4 Due to 4e66934eaadc ("lib: add reference counting tracking infrastructure") landing recently to net-next adding a new stack depot user in lib/ref_tracker.c we need to add an appropriate call to stack_depot_init() there as well. Link: https://lkml.kernel.org/r/45c1b738-1a2f-5b5f-2f6d-86fab206d01c@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Eric Dumazet <edumazet@google.com> Cc: Jiri Slab <jirislaby@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22proc: remove PDE_DATA() completelyMuchun Song50-180/+178
Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22fs: proc: store PDE()->data into inode->i_privateMuchun Song4-12/+13
PDE_DATA(inode) is introduced to get user private data and hide the layout of struct proc_dir_entry. The inode->i_private is used to do the same thing as well. Save a copy of user private data to inode-> i_private when proc inode is allocated. This means the user also can get their private data by inode->i_private. Introduce pde_data() to wrap inode->i_private so that we can remove PDE_DATA() from fs/proc/generic.c and make PTE_DATE() as a wrapper of pde_data(). It will be easier if we decide to remove PDE_DATE() in the future. Link: https://lkml.kernel.org/r/20211124081956.87711-1-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: replace get_cpu_var with local_lockMike Galbraith1-3/+8
The usage of get_cpu_var() in zs_map_object() is problematic because it disables preemption and makes it impossible to acquire any sleeping lock on PREEMPT_RT such as a spinlock_t. Replace the get_cpu_var() usage with a local_lock_t which is embedded struct mapping_area. It ensures that the access the struct is synchronized against all users on the same CPU. [minchan: remove the bit_spin_lock part and change the title] Link: https://lkml.kernel.org/r/20211115185909.3949505-10-minchan@kernel.org Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Minchan Kim <minchan@kernel.org> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: replace per zpage lock with pool->migrate_lockMinchan Kim1-109/+96
The zsmalloc has used a bit for spin_lock in zpage handle to keep zpage object alive during several operations. However, it causes the problem for PREEMPT_RT as well as introducing too complicated. This patch replaces the bit spin_lock with pool->migrate_lock rwlock. It could make the code simple as well as zsmalloc work under PREEMPT_RT. The drawback is the pool->migrate_lock is bigger granuarity than per zpage lock so the contention would be higher than old when both IO-related operations(i.e., zsmalloc, zsfree, zs_[map|unmap]) and compaction(page/zpage migration) are going in parallel(*, the migrate_lock is rwlock and IO related functions are all read side lock so there is no contention). However, the write-side is fast enough(dominant overhead is just page copy) so it wouldn't affect much. If the lock granurity becomes more problem later, we could introduce table locks based on handle as a hash value. Link: https://lkml.kernel.org/r/20211115185909.3949505-9-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22locking/rwlocks: introduce write_lock_nestedMinchan Kim6-0/+47
In preparation for converting bit_spin_lock to rwlock in zsmalloc so that multiple writers of zspages can run at the same time but those zspages are supposed to be different zspage instance. Thus, it's not deadlock. This patch adds write_lock_nested to support the case for LOCKDEP. [minchan@kernel.org: fix write_lock_nested for RT] Link: https://lkml.kernel.org/r/YZfrMTAXV56HFWJY@google.com [bigeasy@linutronix.de: fixup write_lock_nested() implementation] Link: https://lkml.kernel.org/r/20211123170134.y6xb7pmpgdn4m3bn@linutronix.de Link: https://lkml.kernel.org/r/20211115185909.3949505-8-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: remove zspage isolation for migrationMinchan Kim1-149/+8
zspage isolation for migration introduced additional exceptions to be dealt with since the zspage was isolated from class list. The reason why I isolated zspage from class list was to prevent race between obj_malloc and page migration via allocating zpage from the zspage further. However, it couldn't prevent object freeing from zspage so it needed corner case handling. This patch removes the whole mess. Now, we are fine since class->lock and zspage->lock can prevent the race. Link: https://lkml.kernel.org/r/20211115185909.3949505-7-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: move huge compressed obj from page to zspageMinchan Kim1-24/+26
The flag aims for zspage, not per page. Let's move it to zspage. Link: https://lkml.kernel.org/r/20211115185909.3949505-6-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: introduce obj_allocatedMinchan Kim1-17/+16
The usage pattern for obj_to_head is to check whether the zpage is allocated or not. Thus, introduce obj_allocated. Link: https://lkml.kernel.org/r/20211115185909.3949505-5-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: decouple class actions from zspage worksMinchan Kim1-10/+13
This patch moves class stat update out of obj_malloc since it's not related to zspage operation. This is a preparation to introduce new lock scheme in next patch. Link: https://lkml.kernel.org/r/20211115185909.3949505-4-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: rename zs_stat_type to class_stat_typeMinchan Kim1-12/+12
The stat aims for class stat, not zspage so rename it. Link: https://lkml.kernel.org/r/20211115185909.3949505-3-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22zsmalloc: introduce some helper functionsMinchan Kim1-31/+23
Patch series "zsmalloc: remove bit_spin_lock", v2. zsmalloc uses bit_spin_lock to minimize space overhead since it's zpage granularity lock. However, it causes zsmalloc non-working under PREEMPT_RT as well as adding too much complication. This patchset tries to replace the bit_spin_lock with per-pool rwlock. It also removes unnecessary zspage isolation logic from class, which was the other part too much complication added into zsmalloc. Last patch changes the get_cpu_var to local_lock to make it work in PREEMPT_RT. This patch (of 9): get_zspage_mapping returns fullness as well as class_idx. However, the fullness is usually not used since it could be stale in some contexts. It causes misleading as well as unnecessary instructions so this patch introduces zspage_class. obj_to_location also produces page and index but we don't need always the index, either so this patch introduces obj_to_page. Link: https://lkml.kernel.org/r/20211115185909.3949505-1-minchan@kernel.org Link: https://lkml.kernel.org/r/20211115185909.3949505-2-minchan@kernel.org Signed-off-by: Minchan Kim <minchan@kernel.org> Acked-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-22sysctl: returns -EINVAL when a negative value is passed to ↵Baokun Li1-3/+4
proc_doulongvec_minmax When we pass a negative value to the proc_doulongvec_minmax() function, the function returns 0, but the corresponding interface value does not change. we can easily reproduce this problem with the following commands: cd /proc/sys/fs/epoll echo -1 > max_user_watches; echo $?; cat max_user_watches This function requires a non-negative number to be passed in, so when a negative number is passed in, -EINVAL is returned. Link: https://lkml.kernel.org/r/20211220092627.3744624-1-libaokun1@huawei.com Signed-off-by: Baokun Li <libaokun1@huawei.com> Reported-by: Hulk Robot <hulkci@huawei.com> Acked-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>