pci_call_probe() prevents the nesting of work_on_cpu() for a scenario
where a VF device is probed from work_on_cpu() of the PF.
Change the cpumask used in pci_call_probe() from all online CPUs to only
the housekeeping CPUs. This ensures that no additional latency overhead
is caused by pinning jobs on isolated CPUs.
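A sketch of the change in pci_call_probe() (assuming HK_FLAG_WQ is the
housekeeping type consulted; only the mask selection changes):

  /* before: any online CPU on the device's node could be chosen */
  cpu = cpumask_any_and(cpumask_of_node(node), cpu_online_mask);

  /* after: only housekeeping CPUs are eligible for the probe work */
  cpu = cpumask_any_and(cpumask_of_node(node),
                        housekeeping_cpumask(HK_FLAG_WQ));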
Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://lkml.kernel.org/r/20200625223443.2684-3-nitesh@redhat.com
|
|
The current implementation of cpumask_local_spread() does not respect
isolated CPUs, i.e., even if a CPU has been isolated for a real-time
task, it will still be returned to the caller for the pinning of IRQ
threads. Having these unwanted IRQ threads on an isolated CPU adds
latency overhead.
Restrict the CPUs that are returned for spreading IRQs to only the
available housekeeping CPUs.
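A sketch of the resulting loop shape (HK_FLAG_MANAGED_IRQ as the
housekeeping type is an assumption here):

  const struct cpumask *hk_mask =
          housekeeping_cpumask(HK_FLAG_MANAGED_IRQ);

  /* wrap the index around the housekeeping CPUs instead of all online */
  i %= cpumask_weight(hk_mask);
  for_each_cpu(cpu, hk_mask)
          if (i-- == 0)
                  return cpu;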
Signed-off-by: Alex Belits <abelits@marvell.com>
Signed-off-by: Nitesh Narayan Lal <nitesh@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200625223443.2684-2-nitesh@redhat.com
|
|
There is a report that when uclamp is enabled, a netperf UDP test
regresses compared to a kernel compiled without uclamp.
https://lore.kernel.org/lkml/20200529100806.GA3070@suse.de/
While investigating the root cause, there was no sign that the uclamp
code is doing anything particularly expensive, but it could suffer from
bad cache behavior under certain circumstances that are yet to be
understood.
https://lore.kernel.org/lkml/20200616110824.dgkkbyapn3io6wik@e107158-lin/
To reduce the pressure on the fast path anyway, add a static key that
by default will skip executing the uclamp logic in the
enqueue/dequeue_task() fast path until it's needed.
As soon as the user starts using util clamp by:
1. Changing the uclamp value of a task with sched_setattr()
2. Modifying the default sysctl_sched_util_clamp_{min, max}
3. Modifying the default cpu.uclamp.{min, max} value in cgroup
we flip the static key, now that the user has opted in to using util
clamp, effectively re-introducing the uclamp logic in the
enqueue/dequeue_task() fast path. It stays on from that point forward
until the next reboot.
This should help minimize the effect of util clamp on workloads that
don't need it but still allow distros to ship their kernels with uclamp
compiled in by default.
The SCHED_WARN_ON() in uclamp_rq_dec_id() was removed since we can now
end up with an unbalanced call to uclamp_rq_dec_id() if we flip the key
while a task is running in the rq. Since we know it is harmless, we just
quietly return if we attempt a uclamp_rq_dec_id() when
rq->uclamp[].bucket[].tasks is 0.
In schedutil, we introduce a new uclamp_is_enabled() helper which takes
the static key into account to ensure RT boosting behavior is retained.
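A rough sketch of the mechanism (sched_uclamp_used is the key; the slow
path is elided):

  DEFINE_STATIC_KEY_FALSE(sched_uclamp_used);

  static inline void uclamp_rq_inc(struct rq *rq, struct task_struct *p)
  {
          /* fast path: a NOP until userspace opts in to util clamp */
          if (!static_branch_unlikely(&sched_uclamp_used))
                  return;

          /* ... refcount the clamp buckets as before ... */
  }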
The following results demonstrate how this helps on a 2-socket Xeon E5
2x10-core system.
nouclamp uclamp uclamp-static-key
Hmean send-64 162.43 ( 0.00%) 157.84 * -2.82%* 163.39 * 0.59%*
Hmean send-128 324.71 ( 0.00%) 314.78 * -3.06%* 326.18 * 0.45%*
Hmean send-256 641.55 ( 0.00%) 628.67 * -2.01%* 648.12 * 1.02%*
Hmean send-1024 2525.28 ( 0.00%) 2448.26 * -3.05%* 2543.73 * 0.73%*
Hmean send-2048 4836.14 ( 0.00%) 4712.08 * -2.57%* 4867.69 * 0.65%*
Hmean send-3312 7540.83 ( 0.00%) 7425.45 * -1.53%* 7621.06 * 1.06%*
Hmean send-4096 9124.53 ( 0.00%) 8948.82 * -1.93%* 9276.25 * 1.66%*
Hmean send-8192 15589.67 ( 0.00%) 15486.35 * -0.66%* 15819.98 * 1.48%*
Hmean send-16384 26386.47 ( 0.00%) 25752.25 * -2.40%* 26773.74 * 1.47%*
The perf diff between nouclamp and uclamp-static-key when uclamp is
disabled in the fast path:
8.73% -1.55% [kernel.kallsyms] [k] try_to_wake_up
0.07% +0.04% [kernel.kallsyms] [k] deactivate_task
0.13% -0.02% [kernel.kallsyms] [k] activate_task
The diff between nouclamp and uclamp-static-key when uclamp is enabled
in the fast path:
8.73% -0.72% [kernel.kallsyms] [k] try_to_wake_up
0.13% +0.39% [kernel.kallsyms] [k] activate_task
0.07% +0.38% [kernel.kallsyms] [k] deactivate_task
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Reported-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20200630112123.12076-3-qais.yousef@arm.com
|
|
struct uclamp_rq was zeroed out entirely on the assumption that, on the
first call to uclamp_rq_inc(), it would be initialized correctly in
accordance with the default settings.
But when the next patch introduces a static key to skip
uclamp_rq_{inc,dec}() until userspace opts in to using uclamp, schedutil
will fail to perform any frequency changes because
rq->uclamp[UCLAMP_MAX].value is zeroed at init and stays as such, which
means all rqs are capped to 0 by default.
Fix it by making sure we do proper initialization at init without
relying on uclamp_rq_inc() doing it later.
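A sketch of the explicit initialization:

  static void init_uclamp_rq(struct rq *rq)
  {
          unsigned int clamp_id;

          for_each_clamp_id(clamp_id) {
                  rq->uclamp[clamp_id] = (struct uclamp_rq) {
                          .value = uclamp_none(clamp_id)
                  };
          }
          rq->uclamp_flags = 0;
  }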
Fixes: 69842cba9ace ("sched/uclamp: Add CPU's clamp buckets refcounting")
Signed-off-by: Qais Yousef <qais.yousef@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20200630112123.12076-2-qais.yousef@arm.com
|
|
For some mysterious reason GCC-4.9 has a 64-byte section alignment for
structures; all other GCC versions (and Clang) tested (including 4.8 and
5.0) are fine with the 32-byte alignment.
Getting this right is important for the new SCHED_DATA macro that
creates an explicitly ordered array of 'struct sched_class' in the
linker script and expects pointer arithmetic to work.
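A sketch of the workaround in include/asm-generic/vmlinux.lds.h,
assuming bumping the alignment is the chosen fix:

  /* was 32; GCC-4.9 aligns the structs to 64 bytes, and the pointer
   * arithmetic over the array breaks if padding sneaks in between */
  #define STRUCT_ALIGNMENT 64
  #define STRUCT_ALIGN() . = ALIGN(STRUCT_ALIGNMENT)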
Fixes: c3a340f7e7ea ("sched: Have sched_class_highest define by vmlinux.lds.h")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20200630144905.GX4817@hirez.programming.kicks-ass.net
|
|
While integrating rseq into glibc and replacing glibc's sched_getcpu
implementation with rseq, glibc's tests discovered an issue with an
incorrect __rseq_abi.cpu_id field value right after the first time
a newly created process issues sched_setaffinity.
For the record, it triggers after building glibc and running tests, and
then issuing:
for x in {1..2000} ; do posix/tst-affinity-static & done
and shows up as:
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 2, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
error: Unexpected CPU 138, expected 0
This is caused by the scheduler invoking __set_task_cpu() directly from
sched_fork() and wake_up_new_task(), thus bypassing rseq_migrate() which
is done by set_task_cpu().
Add the missing rseq_migrate() to both functions. The only other direct
use of __set_task_cpu() is done by init_idle(), which does not involve a
user-space task.
Based on my testing with the glibc test-case, just adding rseq_migrate()
to wake_up_new_task() is sufficient to fix the observed issue. Also add
it to sched_fork() to keep things consistent.
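A sketch of the wake_up_new_task() side:

  /* tell rseq about the migration before the direct CPU assignment
   * that used to bypass it */
  rseq_migrate(p);
  __set_task_cpu(p, select_task_rq(p, task_cpu(p), SD_BALANCE_FORK, 0));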
The reason why this never triggered so far with the rseq/basic_test
selftest is unclear.
The current use of sched_getcpu(3) does not typically require it to be
always accurate. However, use of the __rseq_abi.cpu_id field within rseq
critical sections requires it to be accurate. If it is not accurate, it
can cause corruption in the per-cpu data targeted by rseq critical
sections in user-space.
Reported-By: Florian Weimer <fweimer@redhat.com>
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-By: Florian Weimer <fweimer@redhat.com>
Cc: stable@vger.kernel.org # v4.18+
Link: https://lkml.kernel.org/r/20200707201505.2632-1-mathieu.desnoyers@efficios.com
|
|
The recent commit:
c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
moved these lines in ttwu():
p->sched_contributes_to_load = !!task_contributes_to_load(p);
p->state = TASK_WAKING;
up before:
smp_cond_load_acquire(&p->on_cpu, !VAL);
into the 'p->on_rq == 0' block, with the thinking that once we hit
schedule() the current task cannot change its ->state anymore. And
while this is true, it is both incorrect and flawed.
It is incorrect in that we need at least an ACQUIRE on 'p->on_rq == 0'
to avoid weak hardware from re-ordering things for us. This can fairly
easily be achieved by relying on the control-dependency already in
place.
The second problem, which is the flaw in the original argument, is
that while schedule() will not change prev->state, it will read it a
number of times (arguably too many times, since it's marked volatile).
The previous condition 'p->on_cpu == 0' was sufficient because that
indicates schedule() has completed, and will no longer read
prev->state. So now the trick is to make this same true for the (much)
earlier 'prev->on_rq == 0' case.
Furthermore, in order to make the ordering stick, the 'prev->on_rq = 0'
assignment needs to be a RELEASE, but adding additional ordering to
schedule() is an unwelcome proposition at the best of times, doubly so
for mere accounting.
Luckily we can push the prev->state load up before rq->lock, with the
only caveat that we then have to re-read the state after. However, we
know that if it changed, we no longer have to worry about the blocking
path. This gives us the required ordering: if we block, we did the
prev->state load before an (effective) smp_mb(), and the p->on_rq store
need not change.
With this we end up with the effective ordering:
LOAD p->state LOAD-ACQUIRE p->on_rq == 0
MB
STORE p->on_rq, 0 STORE p->state, TASK_WAKING
which ensures the TASK_WAKING store happens after the prev->state
load, and all is well again.
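On the ttwu() side, the control-dependency upgrade can be sketched as:

  /* the branch forms a control dependency on the p->on_rq load;
   * upgrade it to ACQUIRE so weak hardware cannot reorder the
   * later loads up before it */
  if (READ_ONCE(p->on_rq) && ttwu_remote(p, wake_flags))
          goto unlock;

  smp_acquire__after_ctrl_dep();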
Fixes: c6e7bd7afaeb ("sched/core: Optimize ttwu() spinning on p->on_cpu")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Dave Jones <davej@codemonkey.org.uk>
Tested-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Link: https://lkml.kernel.org/r/20200707102957.GN117543@hirez.programming.kicks-ass.net
|
|
Using a mutex for "print this warning only once" is so overdesigned as
to be actively offensive to my sensitive stomach.
Just use "pr_info_once()" that already does this, although in a
(harmlessly) racy manner that can in theory cause the message to be
printed twice if more than one CPU races on that "is this the first
time" test.
[ If somebody really cares about that harmless data race (which sounds
very unlikely indeed), that person can trivially fix printk_once() by
using a simple atomic access, preferably with an optimistic non-atomic
test first before even bothering to treat the pointless "make sure it
is _really_ just once" case.
A mutex is most definitely never the right primitive to use for
something like this. ]
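For illustration only, the "optimistic non-atomic test first" shape
could look like this (a sketch, not the actual printk_once()
implementation):

  #define my_pr_info_once(fmt, ...)                                  \
  ({                                                                 \
          static atomic_t __done = ATOMIC_INIT(0);                   \
                                                                     \
          /* cheap peek first, atomic claim only if it looks unset */\
          if (!atomic_read(&__done) && !atomic_xchg(&__done, 1))     \
                  pr_info(fmt, ##__VA_ARGS__);                       \
  })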
Yes, this is a small and meaningless detail in a code path that hardly
matters. But let's keep some code quality standards here, and not
accept outrageously bad code.
Link: https://lore.kernel.org/lkml/CAHk-=wgV9toS7GU3KmNpj8hCS9SeF+A0voHS8F275_mgLhL4Lw@mail.gmail.com/
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Use preempt_disable() to fix the following bug under CONFIG_DEBUG_PREEMPT.
[ 21.915305] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1056
[ 21.923996] caller is do_ri+0x1d4/0x690
[ 21.927921] CPU: 0 PID: 1056 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[ 21.934913] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
[ 21.942984] a80f926d5ac95694 0000000000000000 98000007f0043c88 ffffffff80f2fe40
[ 21.951054] 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[ 21.959123] ffffffff802d60cc 98000007f0043dd8 ffffffff81f4b1e8 ffffffff81f60000
[ 21.967192] ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
[ 21.975261] fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
[ 21.983331] ffffffff80fe1a40 0000000000000006 ffffffff8077f940 0000000000000000
[ 21.991401] ffffffff81460000 98000007f0040000 98000007f0043c80 000000fffba8cf20
[ 21.999471] ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
[ 22.007541] 0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
[ 22.015610] ...
[ 22.018086] Call Trace:
[ 22.020562] [<ffffffff80212ab4>] show_stack+0xa4/0x138
[ 22.025732] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
[ 22.030903] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
[ 22.037375] [<ffffffff80213b84>] do_ri+0x1d4/0x690
[ 22.042198] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
[ 24.359386] BUG: using smp_processor_id() in preemptible [00000000] code: qemu-system-mip/1072
[ 24.368204] caller is do_ri+0x1a8/0x690
[ 24.372169] CPU: 4 PID: 1072 Comm: qemu-system-mip Not tainted 5.8.0-rc2 #3
[ 24.379170] Stack : 0000000000000001 ffffffff81370000 ffffffff8071cd60 a80f926d5ac95694
[ 24.387246] a80f926d5ac95694 0000000000000000 98001007ef06bc88 ffffffff80f2fe40
[ 24.395318] 0000000000000000 0000000000000000 0000000000000001 0000000000000000
[ 24.403389] ffffffff802d60cc 98001007ef06bdd8 ffffffff81f4b818 ffffffff81f60000
[ 24.411461] ffffffff81f60000 ffffffff80fe0000 ffff000000000000 0000000000000000
[ 24.419533] fffffffff500cce1 0000000000000001 0000000000000002 0000000000000000
[ 24.427603] ffffffff80fe0000 0000000000000006 ffffffff8077f940 0000000000000020
[ 24.435673] ffffffff81460020 98001007ef068000 98001007ef06bc80 000000fffbbbb370
[ 24.443745] ffffffff8071cd60 0000000000000000 0000000000000000 0000000000000000
[ 24.451816] 0000000000000000 0000000000000000 ffffffff80212ab4 a80f926d5ac95694
[ 24.459887] ...
[ 24.462367] Call Trace:
[ 24.464846] [<ffffffff80212ab4>] show_stack+0xa4/0x138
[ 24.470029] [<ffffffff8071cd60>] dump_stack+0xf0/0x150
[ 24.475208] [<ffffffff80c73f5c>] check_preemption_disabled+0xf4/0x100
[ 24.481682] [<ffffffff80213b58>] do_ri+0x1a8/0x690
[ 24.486509] [<ffffffff8020b828>] handle_ri_int+0x44/0x5c
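The shape of the fix, as a sketch (the exact span wrapped inside
do_ri() is not shown in this log):

  preempt_disable();
  /* ... emulation code that ends up calling smp_processor_id(),
   * e.g. via current_cpu_data ... */
  preempt_enable();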
Signed-off-by: Xingxing Su <suxingxing@loongson.cn>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
This resolves the hazard between the mtc0 in the change_c0_status() and
the mfc0 in configure_exception_vector(). Without resolving this hazard
configure_exception_vector() could read an old value and would restore
this old value again. This would revert the changes change_c0_status()
did. I checked this by printing out the read_c0_status() at the end of
per_cpu_trap_init() and the ST0_MX is not set without this patch.
The hazard is documented in the MIPS Architecture Reference Manual Vol.
III: MIPS32/microMIPS32 Privileged Resource Architecture (MD00088), rev
6.03 table 8.1 which includes:
Producer | Consumer | Hazard
----------|----------|----------------------------
mtc0 | mfc0 | any coprocessor 0 register
I saw this hazard on an Atheros AR9344 rev 2 SoC with a MIPS 74Kc CPU.
There the change_c0_status() function would activate the DSPen by
setting ST0_MX in the c0_status register. This was reverted and then the
system got a DSP exception when the DSP registers were saved in
save_dsp() in the first process switch. The crash looks like this:
[ 0.089999] Mount-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.097796] Mountpoint-cache hash table entries: 1024 (order: 0, 4096 bytes, linear)
[ 0.107070] Kernel panic - not syncing: Unexpected DSP exception
[ 0.113470] Rebooting in 1 seconds..
We saw this problem in OpenWrt only on the MIPS 74Kc based Atheros SoCs,
not on the 24Kc based SoCs. We only saw it with kernel 5.4, not with
kernel 4.19; in addition, we had to use GCC 8.4 or 9.X, as with GCC 8.3
it did not happen.
In the kernel I bisected this problem to commit 9012d011660e ("compiler:
allow all arches to enable CONFIG_OPTIMIZE_INLINING"), but when this was
reverted it also happened after commit 172dcd935c34b ("MIPS: Always
allocate exception vector for MIPSr2+").
Commit 0b24cae4d535 ("MIPS: Add missing EHB in mtc0 -> mfc0 sequence.")
does similar changes to a different file. I am not sure if there are
more places affected by this problem.
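A sketch of the barrier placement, assuming back_to_back_c0_hazard()
is the helper used to execute the EHB:

  change_c0_status(ST0_MX, ST0_MX);  /* mtc0: enable the DSP ASE */
  back_to_back_c0_hazard();          /* settle the mtc0 -> mfc0 hazard */
  configure_exception_vector();      /* mfc0 now sees the new value */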
Signed-off-by: Hauke Mehrtens <hauke@hauke-m.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
|
|
Running `make savedefconfig` creates `defconfig` by default, which is
currently on git’s radar; for example, `git status` lists this file as
untracked.
So, add the file to `.gitignore`, so it’s ignored by git.
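The addition is a single ignore rule along these lines (a sketch; the
exact .gitignore file depends on where savedefconfig drops the output):

  # ignore the file `make savedefconfig` emits
  defconfig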
Signed-off-by: Paul Menzel <pmenzel@molgen.mpg.de>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
When switching to TWA_SIGNAL for task_work notifications, we also made
any signal based condition in io_cqring_wait() return -ERESTARTSYS.
This breaks applications that rely on using signals to abort someone
waiting for events.
Check if we have a signal pending because of queued task_work, and
repeat the signal check once we've run the task_work. This provides a
reliable way of telling the two apart.
Additionally, only use TWA_SIGNAL if we are using an eventfd. If not,
we don't have the dependency situation described in the original commit,
and we can get by with just using TWA_RESUME like we previously did.
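A sketch of telling the two cases apart inside the wait loop
(simplified):

  if (signal_pending(current)) {
          if (current->task_works) {
                  task_work_run();        /* TWA_SIGNAL wakeup only */
                  continue;               /* re-check; no real signal */
          }
          ret = -ERESTARTSYS;             /* genuine signal: abort */
          break;
  }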
Fixes: ce593a6c480a ("io_uring: use signal based task_work running")
Cc: stable@vger.kernel.org # v5.7
Reported-by: Andres Freund <andres@anarazel.de>
Tested-by: Andres Freund <andres@anarazel.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Xen PV doesn't implement ESPFIX64, so 16-bit segments don't work right.
Disable them. Also, to help people diagnose this change in behavior,
print a warning the first time anyone tries to use a 16-bit segment on
a Xen PV guest that would otherwise allow it.
This gets us closer to having all x86 selftests pass on Xen PV.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/92b2975459dfe5929ecf34c3896ad920bd9e3f2d.1593795633.git.luto@kernel.org
|
|
DEFINE_IDTENTRY_MCE and DEFINE_IDTENTRY_DEBUG were wired up as non-RAW
on x86_32, but the code expected them to be RAW.
Get rid of all the macro indirection for them on 32-bit and just use
DECLARE_IDTENTRY_RAW and DEFINE_IDTENTRY_RAW directly.
Also add a warning to make sure that we only hit the _kernel paths
in kernel mode.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9e90a7ee8e72fd757db6d92e1e5ff16339c1ecf9.1593795633.git.luto@kernel.org
|
|
On Xen PV, #DB doesn't use IST. It still needs to be correctly routed
depending on whether it came from user or kernel mode.
Get rid of DECLARE/DEFINE_IDTENTRY_XEN -- it was too hard to follow the
logic. Instead, route #DB and NMI through DECLARE/DEFINE_IDTENTRY_RAW on
Xen, and do the right thing for #DB. Also add more warnings to the
exc_debug* handlers to make this type of failure more obvious.
This fixes various forms of corruption that happen when usermode
triggers #DB on Xen PV.
Fixes: 4c0dcd8350a0 ("x86/entry: Implement user mode C entry points for #DB and #MCE")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/4163e733cce0b41658e252c6c6b3464f33fdff17.1593795633.git.luto@kernel.org
|
|
Chasing down a Xen bug caused me to realize that the new entry sanity
checks are still fairly weak. Add some more checks.
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/881de09e786ab93ce56ee4a2437ba2c308afe7a9.1593795633.git.luto@kernel.org
|
|
Move the clearing of the high bits of RAX after Xen PV joins the SYSENTER
path so that Xen PV doesn't skip it.
Arguably this code should be deleted instead, but that would belong in the
merge window.
Fixes: ffae641f5747 ("x86/entry/64/compat: Fix Xen PV SYSENTER frame setup")
Signed-off-by: Andy Lutomirski <luto@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/9d33b3f3216dcab008070f1c28b6091ae7199969.1593795633.git.luto@kernel.org
|
|
I2C_SMBUS_BLOCK_MAX already defines the maximum number as specified in
the SMBus 2.0 specs, so I don't see a reason to add 1 here. Also, fix
the errno to what is suggested for this error.
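A sketch of the resulting check (per Documentation/i2c/fault-codes,
-EPROTO fits a device sending back an invalid length):

  if (len > I2C_SMBUS_BLOCK_MAX)  /* was: > I2C_SMBUS_BLOCK_MAX + 1 */
          return -EPROTO;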
Fixes: c9bfdc7c16cb ("i2c: mlxcpld: Add support for smbus block read transaction")
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Michael Shych <michaelsh@mellanox.com>
Tested-by: Michael Shych <michaelsh@mellanox.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
I can't recall why there was none, but we surely want to have it.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Add more details which have either been missing ever since or describe
recent additions.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Luca Ceresoli <luca@lucaceresoli.net>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The driver can't be loaded automatically because it lacks the module
aliases that a device table provides. Add the corresponding
MODULE_DEVICE_TABLE() call to the driver.
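The mechanism, sketched with hypothetical names (the driver's actual
table may be an ACPI or OF one):

  static const struct of_device_id foo_i2c_of_match[] = {
          { .compatible = "vendor,foo-i2c" },
          { }
  };
  /* emits the aliases modprobe needs for automatic loading */
  MODULE_DEVICE_TABLE(of, foo_i2c_of_match);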
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Current AMD Zen-based APUs use this core for some of their i2c buses.
With this patch we re-enable autodetection of hwmon-like devices, so
lm-sensors will be able to work automatically.
It does not affect the boot-time of embedded devices, as the class is
set based on the DMI information.
DMI is probed only on Qtechnology QT5222 Industrial Camera Platform.
DocLink: https://qtec.com/camera-technology-camera-platforms/
Fixes: 3eddad96c439 ("i2c: designware: reverts "i2c: designware: Add support for AMD I2C controller"")
Signed-off-by: Ricardo Ribalda <ribalda@kernel.org>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
The PCA9665 datasheet says that I2CSTA = 78h indicates that SCL is stuck
low; this differs from the PCA9564, which uses 90h for this indication.
Treat either 0x78 or 0x90 as an indication that the SCL line is stuck.
Based on looking through the PCA9564 and PCA9665 datasheets this should
be safe for both chips. The PCA9564 should not return 0x78 for any valid
state and the PCA9665 should not return 0x90.
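A sketch of the state-machine handling (helper names as in
i2c-algo-pca.c):

  case 0x78:      /* PCA9665: SCL stuck low */
  case 0x90:      /* PCA9564: SCL stuck low */
          DEB2("BUS ERROR - SCL stuck low\n");
          pca_reset(adap);
          goto out;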
Fixes: eff9ec95efaa ("i2c-algo-pca: Add PCA9665 support")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
When I increased the upper bound of the min_free_kbytes value in
ee8eb9a5fe863 ("mm/page_alloc: increase default min_free_kbytes bound") I
forgot to tweak the above comment to reflect the new value. This patch
fixes that mistake.
Signed-off-by: Joel Savitz <jsavitz@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Rafael Aquini <aquini@redhat.com>
Cc: Fabrizio D'Angelo <fdangelo@redhat.com>
Link: http://lkml.kernel.org/r/20200624221236.29560-1-jsavitz@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Fix the recently added new __vmalloc_node_range callers to pass the
correct values as the owner for display in /proc/vmallocinfo.
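The last argument of __vmalloc_node_range() is the owner shown in
/proc/vmallocinfo; a sketch of a fixed caller:

  return __vmalloc_node_range(size, 1, VMALLOC_START, VMALLOC_END,
                              GFP_KERNEL, PAGE_KERNEL_ROX,
                              VM_FLUSH_RESET_PERMS, NUMA_NO_NODE,
                              __builtin_return_address(0));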
Fixes: 800e26b81311 ("x86/hyperv: allocate the hypercall page with only read and execute bits")
Fixes: 10d5e97c1bf8 ("arm64: use PAGE_KERNEL_ROX directly in alloc_insn_page")
Fixes: 7a0e27b2a0ce ("mm: remove vmalloc_exec")
Reported-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20200627075649.2455097-1-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Calling cma_declare_contiguous_nid() with a false exact_nid for per-numa
reservation can easily cause cma leaks and various confusion. For example,
mm/hugetlb.c is trying to reserve per-numa cma for gigantic pages. But it
can easily leak cma and confuse users when the system has memoryless
nodes.
Consider a system with 4 numa nodes where only numa node0 has memory. If
we set hugetlb_cma=4G in bootargs, mm/hugetlb.c will get 4 cma areas for 4
different numa nodes. Since exact_nid=false in the current code, all 4
numa nodes will get cma successfully from node0, but hugetlb_cma[1] to
hugetlb_cma[3] will never be available to hugepages since mm/hugetlb.c
will only allocate memory from hugetlb_cma[0].
Now consider a system with 4 numa nodes where both numa node0 and node2
have memory and the other nodes have none. If we set hugetlb_cma=4G in
bootargs, mm/hugetlb.c will get 4 cma areas for 4 different numa nodes.
Since exact_nid=false in the current code, all 4 numa nodes will get cma
successfully from node0 or node2, but hugetlb_cma[1] and [3] will never
be available to hugepages, as mm/hugetlb.c will only allocate memory from
hugetlb_cma[0] and hugetlb_cma[2]. This causes a permanent leak of the
cma areas which are supposed to be used by the memoryless nodes.
Of course we could work around the issue by letting mm/hugetlb.c scan
all cma areas in alloc_gigantic_page() even when node_mask includes only
node0; that means when node_mask includes only node0, we could get pages
from hugetlb_cma[1] to hugetlb_cma[3]. But this would cause a kernel
crash in free_gigantic_page() when it tries to free the page by:
cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
On the other hand, exact_nid=false doesn't consider numa distance, so it
might not be that useful to leverage cma areas on remote nodes anyway. I
feel it is much simpler to make exact_nid true to make everything clear.
After that, memoryless nodes won't be able to reserve per-numa CMA from
other nodes which have memory.
Fixes: cf11e85fc08c ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Roman Gushchin <guro@fb.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Aslan Bakirov <aslan@fb.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Rik van Riel <riel@surriel.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Robin Murphy <robin.murphy@arm.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Something changed recently to uncover this warning:
samples/vfs/test-statx.c:24:15: warning: `struct foo' declared inside parameter list will not be visible outside of this definition or declaration
24 | #define statx foo
| ^~~
This is due to the use of "struct statx" (here, "struct foo") in a
function prototype argument list before it has been defined:
int
# 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
foo
# 56 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
(int __dirfd, const char *__restrict __path, int __flags,
unsigned int __mask, struct
# 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h"
foo
# 57 "/usr/include/x86_64-linux-gnu/bits/statx-generic.h" 3 4
*__restrict __buf)
__attribute__ ((__nothrow__ , __leaf__)) __attribute__ ((__nonnull__ (2, 5)));
Add explicit struct before #include to avoid warning.
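A sketch of the fix shape, forward-declaring the renamed structs before
the header can use them in a prototype:

  #define statx foo
  #define statx_timestamp foo_timestamp
  struct statx;                   /* i.e. struct foo */
  struct statx_timestamp;         /* i.e. struct foo_timestamp */
  #include <sys/stat.h>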
Fixes: f1b5618e013a ("vfs: Add a sample program for the new mount API")
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Miklos Szeredi <mszeredi@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: David Howells <dhowells@redhat.com>
Link: http://lkml.kernel.org/r/202006282213.C516EA6@keescook
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
The routine hpage_nr_pages() was incorrectly used to calculate the number
of base pages in a hugetlb page. hpage_nr_pages is designed to be called
for THP pages and will return HPAGE_PMD_NR for hugetlb pages of any size.
Due to the context in which hpage_nr_pages was called, it is unlikely to
produce a user visible error. The routine with the incorrect call is only
exercised in the case of hugetlb memory error or migration. In addition,
this would need to be on an architecture which supports huge page sizes
less than PMD_SIZE. And, the vma containing the huge page would also need
to be smaller than PMD_SIZE.
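For a hugetlb page, the base-page count has to come from the page
itself, e.g. (a sketch):

  /* valid for hugetlb pages of any size, unlike hpage_nr_pages() */
  unsigned long nr_pages = 1UL << compound_order(hpage);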
Fixes: c0d0381ade79 ("hugetlbfs: use i_mmap_rwsem for more pmd sharing synchronization")
Reported-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200629185003.97202-1-mike.kravetz@oracle.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This error path returned directly instead of calling sysctl_head_finish().
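The shape of the fix, sketched (the size check is the one added by the
blamed commit):

  error = -ENOMEM;
  if (count >= KMALLOC_MAX_SIZE)
          goto out;               /* was: return -ENOMEM; */
  /* ... */
  out:
          sysctl_head_finish(head);
          return error;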
Fixes: ef9d965bc8b6 ("sysctl: reject gigantic reads/write to sysctl files")
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
|
|
KRYO4XX silver/LITTLE CPU cores with revision r1p0 are affected by
errata 1530923 and 1024718, so add them to the respective lists.
The variant and revision bits are implementation defined and differ
from those of the Cortex CPU counterparts on which these cores are
based, i.e., r1p0 is equivalent to rdpe.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/7013e8a3f857ca7e82863cc9e34a614293d7f80c.1593539394.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
KRYO4XX gold/big CPU core revisions r0p0 to r3p1 are affected by
errata 1463225 and 1418040, so add them to the respective lists.
The variant and revision bits are implementation defined and differ
from those of the Cortex CPU counterparts on which these cores are
based, i.e., (r0p0 to r3p1) is equivalent to (rcpe to rfpf).
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/83780e80c6377c12ca51b5d53186b61241685e49.1593539394.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add MIDR value for KRYO4XX gold/big CPU cores which are
used in Qualcomm Technologies, Inc. SoCs. This will be
used to identify and apply erratum which are applicable
for these CPU cores.
Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
Link: https://lore.kernel.org/r/9093fb82e22441076280ca1b729242ffde80c432.1593539394.git.saiprakash.ranjan@codeaurora.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Before this patch, some gfs2 code locked the freeze glock with the
LM_FLAG_NOEXP (do not freeze) flag, and some did not. We never want to
freeze the freeze glock, so this patch makes it consistently use
LM_FLAG_NOEXP always.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, the freeze code in gfs2 specified GL_NOCACHE in
several places. That's wrong because we always want to know whether the
file system is frozen.
There was also a problem with freeze/thaw transitioning the glock from
frozen (EX) to thawed (SH) because gfs2 will normally grant glocks in EX
to processes that request it in SH mode, unless GL_EXACT is specified.
Therefore, the freeze/thaw code, which tried to reacquire the glock in
SH mode would get the glock in EX mode, and miss the transition from EX
to SH. That made it think the thaw had completed normally, but since the
glock was still cached in EX, other nodes could not freeze again.
This patch removes the GL_NOCACHE flag to allow the freeze glock to be
cached. It also adds the GL_EXACT flag so the glock is fully transitioned
from EX to SH, thereby allowing future freeze operations.
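The resulting holder request can be sketched as (combining the flags
discussed in these freeze patches):

  /* SH, cached (no GL_NOCACHE), and exactly SH (GL_EXACT) */
  error = gfs2_glock_nq_init(sdp->sd_freeze_gl, LM_ST_SHARED,
                             LM_FLAG_NOEXP | GL_EXACT, &freeze_gh);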
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, only read-write mounts would grab the freeze glock
in read-only mode (as part of gfs2_make_fs_rw), so on read-only mounts
the freeze glock was never initialized. That meant requests to freeze,
which request the glock in EX, were granted without any state
transition. That meant you could mount, in read-only mode, a gfs2 file
system that is currently frozen on a different cluster node.
This patch makes read-only mounts lock the freeze glock in SH mode,
which will block for file systems that are frozen on another node.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Before this patch, function freeze_go_sync, called when promoting
the freeze glock, was testing for the SDF_JOURNAL_LIVE superblock flag.
That's only set for read-write mounts. Read-only mounts don't use a
journal, so the bit is never set, so the freeze never happened.
This patch removes the check for SDF_JOURNAL_LIVE for freeze requests
but still checks it when deciding whether to flush a journal.
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
In several places, we used the GIF_ORDERED inode flag to determine
if an inode was on the ordered writes list. However, since we always
held the sd_ordered_lock spin_lock during the manipulation, we can
just as easily check list_empty(&ip->i_ordered) instead.
This allows us to keep more than one ordered writes list to make
journal writing improvements.
This patch eliminates GIF_ORDERED in favor of checking list_empty.
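A sketch of the list_empty()-based test under the existing lock:

  spin_lock(&sdp->sd_ordered_lock);
  if (list_empty(&ip->i_ordered))         /* replaces GIF_ORDERED */
          list_add(&ip->i_ordered, &sdp->sd_log_ordered);
  spin_unlock(&sdp->sd_ordered_lock);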
Signed-off-by: Bob Peterson <rpeterso@redhat.com>
|
|
Fix a typo in SENSORS_IR35221 option: module name should be "ir35221"
instead of "ir35521".
Fixes: 8991ebd9c9a6 ("hwmon: (pmbus) Add client driver for IR35221")
Cc: Samuel Mendoza-Jonas <sam@mendozajonas.com>
Signed-off-by: Tao Ren <rentao.bupt@gmail.com>
Link: https://lore.kernel.org/r/20200702221349.18139-1-rentao.bupt@gmail.com
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
|
|
Today xenbus_map_ring_valloc() can return either a negative errno
value (-ENOMEM or -EINVAL) or a grant status value. This is a mess, as
e.g. -ENOMEM and GNTST_eagain have the same numeric value.
Fix that by turning all grant mapping errors into -ENOENT. This is
no problem as all callers of xenbus_map_ring_valloc() only use the
return value to print an error message, and in case of mapping errors
the grant status value has already been printed by __xenbus_map_ring()
before.
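The idea, sketched (the condition name is hypothetical):

  if (map_status != GNTST_okay)   /* already logged by __xenbus_map_ring() */
          return -ENOENT;         /* never mix GNTST_* values into errnos */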
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20200701121638.19840-3-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
xenbus_map_ring_valloc() and its sub-functions are putting quite large
structs and arrays on the stack. This is problematic at runtime, but
might also result in build failures (e.g. with clang due to the option
-Werror,-Wframe-larger-than=... used).
Fix that by moving most of the data from the stack into a dynamically
allocated struct. Performance is no issue here, as
xenbus_map_ring_valloc() is used only when adding a new PV device to
a backend driver.
While at it move some duplicated code from pv/hvm specific mapping
functions to the single caller.
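The general shape of such a fix, as a sketch:

  struct map_ring_valloc *info;   /* holds what used to live on the stack */

  info = kzalloc(sizeof(*info), GFP_KERNEL);
  if (!info)
          return -ENOMEM;
  /* ... map the ring using info->... fields ... */
  kfree(info);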
Reported-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Link: https://lore.kernel.org/r/20200701121638.19840-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
Fix sparse build warning:
block/bio-integrity.c:27:6: warning:
symbol '__bio_integrity_free' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Acer C720 running Linux v5.3 reports this in klog:
tpm_tis: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis tpm_tis: Could not get TPM timeouts and durations
tpm_tis 00:08: 1.2 TPM (device-id 0xB, rev-id 16)
tpm tpm0: tpm_try_transmit: send(): error -5
tpm tpm0: A TPM error (-5) occurred attempting to determine the timeouts
tpm_tis 00:08: Could not get TPM timeouts and durations
ima: No TPM chip found, activating TPM-bypass!
tpm_inf_pnp 00:08: Found TPM with ID IFX0102
% git --no-pager grep IFX0102 drivers/char/tpm
drivers/char/tpm/tpm_infineon.c: {"IFX0102", 0},
drivers/char/tpm/tpm_tis.c: {"IFX0102", 0}, /* Infineon */
Obviously IFX0102 was added to the HID table for the TCG TIS driver by
mistake.
Fixes: 93e1b7d42e1e ("[PATCH] tpm: add HID module parameter")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=203877
Cc: stable@vger.kernel.org
Cc: Kylene Jo Hall <kjhall@us.ibm.com>
Reported-by: Ferry Toth <ferry.toth@elsinga.info>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
On a Chromebook I'm working on I noticed a big (~1 second) delay
during bootup where nothing was happening. Right around this big
delay there were messages about the TPM:
[ 2.311352] tpm_tis_spi spi0.0: TPM ready IRQ confirmed on attempt 2
[ 3.332790] tpm_tis_spi spi0.0: Cr50 firmware version: ...
I put a few printouts in and saw that tpm_tis_spi_init() (specifically
tpm_chip_register() in that function) was taking the lion's share of
this time, though ~115 ms of the time was in cr50_print_fw_version().
Let's make a one-line change to prefer async probe for tpm_tis_spi.
There's no reason we need to block other drivers from probing while we
load.
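The one-line change, sketched in context:

  static struct spi_driver tpm_tis_spi_driver = {
          .driver = {
                  .name = "tpm_tis_spi",
                  .probe_type = PROBE_PREFER_ASYNCHRONOUS,
          },
          /* ... probe/remove/id tables as before ... */
  };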
NOTES:
* It's possible that other hardware runs through the init sequence
faster than Cr50 and this isn't such a big problem for them.
However, even if they are faster they are still doing _some_
transfers over a SPI bus so this should benefit everyone even if to
a lesser extent.
* It's possible that there are extra delays in the code that could be
optimized out. I didn't dig since once I enabled async probe they
no longer impacted me.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
The tpm2_get_cc_attrs_tbl() call will result in TPM commands being issued,
which will need the use of the internal command/response buffer. But,
we're issuing this *before* we've waited to make sure that buffer is
allocated.
This can result in intermittent failures to probe if the hypervisor / TPM
implementation doesn't respond quickly enough. I find it fails almost
every time with an 8 vcpu guest under KVM with software emulated TPM.
To fix it, just move the tpm2_get_cc_attrs_tbl() call after the
existing code to wait for initialization, which will ensure the buffer
is allocated.
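A sketch of the reordering:

  /* wait until the CRQ response has set up the rtce buffer ... */
  if (!wait_event_timeout(ibmvtpm->crq_queue.wq,
                          ibmvtpm->rtce_buf != NULL, HZ)) {
          dev_err(dev, "CRQ response timed out\n");
          goto init_irq_cleanup;
  }

  /* ... and only then issue TPM commands */
  rc = tpm2_get_cc_attrs_tbl(chip);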
Fixes: 18b3670d79ae9 ("tpm: ibmvtpm: Add support for TPM2")
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
Trivial fix: the spelling of "drescription" is incorrect in a function
comment. Fix this.
Signed-off-by: Binbin Zhou <zhoubinbin@uniontech.com>
Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
Found by smatch:
drivers/char/tpm/tpm_tis_core.c:1088 tpm_tis_core_init() warn:
variable dereferenced before check 'chip->ops' (see line 979)
'chip->ops' is assigned at the beginning of the function
in tpmm_chip_alloc->tpm_chip_alloc
and is used before the first possible goto to the error path.
Signed-off-by: Vasily Averin <vvs@virtuozzo.com>
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
During flow control we are just reading from the TPM, yet our spi_xfer
has the tx_buf and rx_buf both non-NULL which means we're requesting a
full duplex transfer.
SPI is always somewhat of a full duplex protocol anyway and in theory
the other side shouldn't really be looking at what we're sending it
during flow control, but it's still a bit ugly to be sending some
"random" data when we shouldn't.
The default tpm_tis_spi_flow_control() tries to address this by
setting 'phy->iobuf[0] = 0'. This partially avoids the problem of
sending "random" data, but since our tx_buf and rx_buf both point to
the same place I believe there is the potential of us sending the
TPM's previous byte back to it if we hit the retry loop.
Another flow control implementation, cr50_spi_flow_control(), doesn't
address this at all.
Let's clean this up and just make the tx_buf NULL before we call
flow_control(). Not only does this ensure that we're not sending any
"random" bytes but it also possibly could make the SPI controller
behave in a slightly more optimal way.
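A sketch of the change around the flow-control call:

  spi_xfer.tx_buf = NULL;         /* rx-only while polling for ready */
  ret = phy->flow_control(phy, &spi_xfer);
  spi_xfer.tx_buf = phy->iobuf;   /* restore for the data transfer */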
NOTE: no actual observed problems are fixed by this patch--it was
just made based on code inspection.
Signed-off-by: Douglas Anderson <dianders@chromium.org>
Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
It has been reported that some TIS based TPMs are giving unexpected
errors when using the O_NONBLOCK path of the TPM device. The problem
is that some TPMs don't like it when you get and then relinquish a
locality (as the tpm_try_get_ops()/tpm_put_ops() pair does) without
sending a command. This currently happens all the time in the
O_NONBLOCK write path. Fix this by moving the tpm_try_get_ops()
further down the code to after the O_NONBLOCK determination is made.
This is safe because the priv->buffer_mutex still protects the priv
state being modified.
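A sketch of the reordered write path:

  if (file->f_flags & O_NONBLOCK) {
          priv->command_enqueued = true;
          queue_work(tpm_dev_wq, &priv->async_work);
          mutex_unlock(&priv->buffer_mutex);
          return size;    /* no locality taken and dropped here anymore */
  }

  /* only the synchronous path gets/puts the ops (and the locality) */
  ret = tpm_try_get_ops(priv->chip);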
BugLink: https://bugzilla.kernel.org/show_bug.cgi?id=206275
Fixes: d23d12484307 ("tpm: fix invalid locking in NONBLOCKING mode")
Reported-by: Mario Limonciello <Mario.Limonciello@dell.com>
Tested-by: Alex Guzman <alex@guzman.io>
Cc: stable@vger.kernel.org
Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com>
|
|
When building very large kernels, the logic that emits replacement
sequences for alternatives fails when relative branches are present
in the code that is emitted into the .altinstr_replacement section
and patched in at the original site and fixed up. The reason is that
the linker will insert veneers if relative branches go out of range,
and due to the relative distance of the .altinstr_replacement from
the .text section where its branch targets usually live, veneers
may be emitted at the end of the .altinstr_replacement section, with
the relative branches in the sequence pointed at the veneers instead
of the actual target.
The alternatives patching logic will attempt to fix up the branch to
point to its original target, which will be the veneer in this case,
but given that the patch site is likely to be far away as well, it
will be out of range and so patching will fail. There are other cases
where these veneers are problematic, e.g., when the target of the
branch is in .text while the patch site is in .init.text, in which
case putting the replacement sequence inside .text may not help either.
So let's use subsections to emit the replacement code as closely as
possible to the patch site, to ensure that veneers are only likely to
be emitted if they are required at the patch site as well, in which
case they will be in range for the replacement sequence both before
and after it is transported to the patch site.
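A sketch of the assembler-side change:

  .macro alternative_else
  662:
          .if .Lasm_alt_mode==0
          /* was: .pushsection .altinstr_replacement, "ax" */
          .subsection 1   /* stay in the current section, close by */
          .else
          .previous
          .endif
  663:
  .endm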
This will prevent alternative sequences in non-init code from being
released from memory after boot, but this is tolerable given that the
entire section is only 512 KB on an allyesconfig build (which weighs in
at 500+ MB for the entire Image). Also, note that modules today carry
the replacement sequences in non-init sections as well, and any of
those that target init code will be emitted into init sections after
this change.
This fixes an early crash when booting an allyesconfig kernel on a
system where any of the alternatives sequences containing relative
branches are activated at boot (e.g., ARM64_HAS_PAN on TX2).
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Andre Przywara <andre.przywara@arm.com>
Cc: Dave P Martin <dave.martin@arm.com>
Link: https://lore.kernel.org/r/20200630081921.13443-1-ardb@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|