| Commit message (Collapse) | Author | Files | Lines |
|
tick_broadcast_setup_oneshot() accesses tick_next_period twice without any
serialization. This is wrong in two aspects:
- Reading it twice might make the broadcast data inconsistent if the
variable is updated concurrently.
- On 32bit systems the access might see an partial update
Protect it with jiffies_lock. That's safe as none of the callchains leading
up to this function can create a lock ordering violation:
timer interrupt
run_local_timers()
hrtimer_run_queues()
hrtimer_switch_to_hres()
tick_init_highres()
tick_switch_to_oneshot()
tick_broadcast_switch_to_oneshot()
or
tick_check_oneshot_change()
tick_nohz_switch_to_nohz()
tick_switch_to_oneshot()
tick_broadcast_switch_to_oneshot()
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201117132006.061341507@linutronix.de
|
|
The hrtimer_get_remaining() markup is documenting, instead,
__hrtimer_get_remaining(), as it is placed at the C file.
In order to properly document it, a kernel-doc markup is needed together
with the function prototype. So, add a new one, while preserving the
existing one, just fixing the function name.
The hrtimer_is_queued prototype has a typo: it is using
'=' instead of '-' to split: identifier - description
as required by kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/9dc87808c2fd07b7e050bafcd033c5ef05808fea.1605521731.git.mchehab+huawei@kernel.org
|
|
No users outside of the timer code. Move the caller below this function to
avoid a pointless forward declaration.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
The kernel-doc parser complains:
kernel/time/timekeeping.c:1543: warning: Function parameter or member
'ts' not described in 'read_persistent_clock64'
kernel/time/timekeeping.c:764: warning: Function parameter or member
'tk' not described in 'timekeeping_forward_now'
kernel/time/timekeeping.c:1331: warning: Function parameter or member
'ts' not described in 'timekeeping_inject_offset'
kernel/time/timekeeping.c:1331: warning: Excess function parameter 'tv'
description in 'timekeeping_inject_offset'
Add the missing parameter documentations and rename the 'tv' parameter of
timekeeping_inject_offset() to 'ts' so it matches the implemention.
[ tglx: Reworded a few docs and massaged changelog ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-5-git-send-email-alex.shi@linux.alibaba.com
|
|
Address the following kernel-doc markup warnings:
kernel/time/timekeeping.c:1563: warning: Function parameter or member
'wall_time' not described in 'read_persistent_wall_and_boot_offset'
kernel/time/timekeeping.c:1563: warning: Function parameter or member
'boot_offset' not described in 'read_persistent_wall_and_boot_offset'
The parameters are described but miss the leading '@' and the colon after
the parameter names.
[ tglx: Massaged changelog ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-6-git-send-email-alex.shi@linux.alibaba.com
|
|
The kernel-doc parser complains about:
kernel/time/timekeeping.c:651: warning: Function parameter or member
'nb' not described in 'pvclock_gtod_register_notifier'
kernel/time/timekeeping.c:670: warning: Function parameter or member
'nb' not described in 'pvclock_gtod_unregister_notifier'
Add the missing parameter explanations.
[ tglx: Massaged changelog ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-3-git-send-email-alex.shi@linux.alibaba.com
|
|
Alex reported the following warning:
kernel/time/timekeeping.c:464: warning: Function parameter or member
'tkf' not described in '__ktime_get_fast_ns'
which is not entirely correct because the documented function is
ktime_get_mono_fast_ns() which does not have a parameter, but the
kernel-doc parser looks at the function declaration which follows the
comment and complains about the missing parameter documentation.
Aside of that the documentation for the rest of the NMI safe accessors is
either incomplete or missing.
- Move the function documentation to the right place
- Fixup the references and inconsistencies
- Add the missing documentation for ktime_get_raw_fast_ns()
Reported-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
|
|
Address the following warning:
kernel/time/timekeeping.c:415: warning: Function parameter or member
'tkf' not described in 'update_fast_timekeeper'
[ tglx: Remove the bogus ktime_get_mono_fast_ns() part ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-2-git-send-email-alex.shi@linux.alibaba.com
|
|
Various static functions in the timekeeping code have function comments
which pretend to be kernel-doc, but are incomplete and trigger parser
warnings.
As these functions are local to the timekeeping core code there is no need
to expose them via kernel-doc.
Remove the double star kernel-doc marker and remove excess newlines.
[ tglx: Massaged changelog and removed excess newlines ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-4-git-send-email-alex.shi@linux.alibaba.com
|
|
Address these kernel-doc warnings:
kernel/time/timeconv.c:79: warning: Function parameter or member
'totalsecs' not described in 'time64_to_tm'
kernel/time/timeconv.c:79: warning: Function parameter or member
'offset' not described in 'time64_to_tm'
kernel/time/timeconv.c:79: warning: Function parameter or member
'result' not described in 'time64_to_tm'
The parameters are described but lack colons after the parameter name.
[ tglx: Massaged changelog ]
Signed-off-by: Alex Shi <alex.shi@linux.alibaba.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/1605252275-63652-1-git-send-email-alex.shi@linux.alibaba.com
|
|
PREEMPT_RT does not spin and wait until a running timer completes its
callback but instead it blocks on a sleeping lock to prevent a livelock in
the case that the task waiting for the callback completion preempted the
callback.
This cannot be done for timers flagged with TIMER_IRQSAFE. These timers can
be canceled from an interrupt disabled context even on RT kernels.
The expiry callback of such timers is invoked with interrupts disabled so
there is no need to use the expiry lock mechanism because obviously the
callback cannot be preempted even on RT kernels.
Do not use the timer_base::expiry_lock mechanism when waiting for a running
callback to complete if the timer is flagged with TIMER_IRQSAFE.
Also add a lockdep assertion for RT kernels to validate that the expiry
lock mechanism is always invoked in preemptible context.
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201103190937.hga67rqhvknki3tp@linutronix.de
|
|
Use the "%ps" printk format string to resolve symbol names.
This works on all platforms, including ia64, ppc64 and parisc64 on which
one needs to dereference pointers to function descriptors instead of
function pointers.
Signed-off-by: Helge Deller <deller@gmx.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201104163401.GA3984@ls3530.fritz.box
|
|
Use the new api and associate the seqcounter to the jiffies_lock enabling
lockdep support - although for this particular case the write-side locking
and non-preemptibility are quite obvious.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20201021190749.19363-1-dave@stgolabs.net
|
|
|
|
Use a more generic form for __section that requires quotes to avoid
complications with clang and gcc differences.
Remove the quote operator # from compiler_attributes.h __section macro.
Convert all unquoted __section(foo) uses to quoted __section("foo").
Also convert __attribute__((section("foo"))) uses to __section("foo")
even if the __attribute__ has multiple list entry forms.
Conversion done using the script at:
https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
tid_addr is not a "pointer to (pointer to int in userspace)"; it is in
fact a "pointer to (pointer to int in userspace) in userspace". So
sparse rightfully complains about passing a kernel pointer to
put_user().
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Commit 453431a54934 ("mm, treewide: rename kzfree() to
kfree_sensitive()") renamed kzfree() to kfree_sensitive(),
but it left a compatibility definition of kzfree() to avoid
being too disruptive.
Since then a few more instances of kzfree() have slipped in.
Just get rid of them and remove the compatibility definition
once and for all.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
If set, use the environment variable GIT_DIR to change the default .git
location of the kernel git tree.
If GIT_DIR is unset, keep using the current ".git" default.
Link: https://lkml.kernel.org/r/c5e23b45562373d632fccb8bc04e563abba4dd1d.camel@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
registering the ACPI i2c devs
Commit 21653a4181ff ("i2c: core: Call i2c_acpi_install_space_handler()
before i2c_acpi_register_devices()")'s intention was to only move the
acpi_install_address_space_handler() call to the point before where
the ACPI declared i2c-children of the adapter where instantiated by
i2c_acpi_register_devices().
But i2c_acpi_install_space_handler() had a call to
acpi_walk_dep_device_list() hidden (that is I missed it) at the end
of it, so as an unwanted side-effect now acpi_walk_dep_device_list()
was also being called before i2c_acpi_register_devices().
Move the acpi_walk_dep_device_list() call to the end of
i2c_acpi_register_devices(), so that it is once again called *after*
the i2c_client-s hanging of the adapter have been created.
This fixes the Microsoft Surface Go 2 hanging at boot.
Fixes: 21653a4181ff ("i2c: core: Call i2c_acpi_install_space_handler() before i2c_acpi_register_devices()")
Link: https://bugzilla.kernel.org/show_bug.cgi?id=209627
Reported-by: Rainer Finke <rainer@finke.cc>
Reported-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Suggested-by: Maximilian Luz <luzmaximilian@gmail.com>
Tested-by: Kieran Bingham <kieran.bingham@ideasonboard.com>
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Wolfram Sang <wsa@kernel.org>
|
|
Given that this code is new, let's add a selftest for it as well.
It doesn't rely on fixed sets, instead it picks 1024 numbers and
verifies that they're not more correlated than desired.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
With the removal of the interrupt perturbations in previous random32
change (random32: make prandom_u32() output unpredictable), the PRNG
has become 100% deterministic again. While SipHash is expected to be
way more robust against brute force than the previous Tausworthe LFSR,
there's still the risk that whoever has even one temporary access to
the PRNG's internal state is able to predict all subsequent draws till
the next reseed (roughly every minute). This may happen through a side
channel attack or any data leak.
This patch restores the spirit of commit f227e3ec3b5c ("random32: update
the net random state on interrupt and activity") in that it will perturb
the internal PRNG's statee using externally collected noise, except that
it will not pick that noise from the random pool's bits nor upon
interrupt, but will rather combine a few elements along the Tx path
that are collectively hard to predict, such as dev, skb and txq
pointers, packet length and jiffies values. These ones are combined
using a single round of SipHash into a single long variable that is
mixed with the net_rand_state upon each invocation.
The operation was inlined because it produces very small and efficient
code, typically 3 xor, 2 add and 2 rol. The performance was measured
to be the same (even very slightly better) than before the switch to
SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC
(i40e), the connection rate dropped from 556k/s to 555k/s while the
SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
Non-cryptographic PRNGs may have great statistical properties, but
are usually trivially predictable to someone who knows the algorithm,
given a small sample of their output. An LFSR like prandom_u32() is
particularly simple, even if the sample is widely scattered bits.
It turns out the network stack uses prandom_u32() for some things like
random port numbers which it would prefer are *not* trivially predictable.
Predictability led to a practical DNS spoofing attack. Oops.
This patch replaces the LFSR with a homebrew cryptographic PRNG based
on the SipHash round function, which is in turn seeded with 128 bits
of strong random key. (The authors of SipHash have *not* been consulted
about this abuse of their algorithm.) Speed is prioritized over security;
attacks are rare, while performance is always wanted.
Replacing all callers of prandom_u32() is the quick fix.
Whether to reinstate a weaker PRNG for uses which can tolerate it
is an open question.
Commit f227e3ec3b5c ("random32: update the net random state on interrupt
and activity") was an earlier attempt at a solution. This patch replaces
it.
Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec3b5c ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec3b5c; moved SIPROUND definitions
to prandom.h for later use; merged George's prandom_seed() proposal;
inlined siprand_u32(); replaced the net_rand_state[] array with 4
members to fix a build issue; cosmetic cleanups to make checkpatch
happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
|
|
During shutdown the IOAPIC trigger mode is reset to edge triggered
while the vfio-pci INTx is still registered with a resampler.
This allows us to get into an infinite loop:
ioapic_set_irq
-> ioapic_lazy_update_eoi
-> kvm_ioapic_update_eoi_one
-> kvm_notify_acked_irq
-> kvm_notify_acked_gsi
-> (via irq_acked fn ptr) irqfd_resampler_ack
-> kvm_set_irq
-> (via set fn ptr) kvm_set_ioapic_irq
-> kvm_ioapic_set_irq
-> ioapic_set_irq
Commit 8be8f932e3db ("kvm: ioapic: Restrict lazy EOI update to
edge-triggered interrupts", 2020-05-04) acknowledges that this recursion
loop exists and tries to avoid it at the call to ioapic_lazy_update_eoi,
but at this point the scenario is already set, we have an edge interrupt
with resampler on the same gsi.
Fortunately, the only user of irq ack notifiers (in addition to resamplefd)
is i8254 timer interrupt reinjection. These are edge-triggered, so in
principle they would need the call to kvm_ioapic_update_eoi_one from
ioapic_lazy_update_eoi, but they already disable AVIC(*), so they don't
need the lazy EOI behavior. Therefore, remove the call to
kvm_ioapic_update_eoi_one from ioapic_lazy_update_eoi.
This fixes CVE-2020-27152. Note that this issue cannot happen with
SR-IOV assigned devices because virtual functions do not have INTx,
only MSI.
Fixes: f458d039db7e ("kvm: ioapic: Lazy update IOAPIC EOI")
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Tested-by: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
allyesconfig results in:
ld: drivers/block/paride/paride.o: in function `pi_init':
(.text+0x1340): multiple definition of `pi_init'; arch/x86/kvm/vmx/posted_intr.o:posted_intr.c:(.init.text+0x0): first defined here
make: *** [Makefile:1164: vmlinux] Error 1
because commit:
commit 8888cdd0996c2d51cd417f9a60a282c034f3fa28
Author: Xiaoyao Li <xiaoyao.li@intel.com>
Date: Wed Sep 23 11:31:11 2020 -0700
KVM: VMX: Extract posted interrupt support to separate files
added another pi_init(), though one already existed in the paride code.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Replace a modulo operator with the more common pattern for computing the
gfn "offset" of a huge page to fix an i386 build error.
arch/x86/kvm/mmu/tdp_mmu.c:212: undefined reference to `__umoddi3'
In fact, almost all of tdp_mmu.c can be elided on 32-bit builds, but
that is a much larger patch.
Fixes: 2f2fad0897cb ("kvm: x86/mmu: Add functions to handle changed TDP SPTEs")
Reported-by: Daniel Díaz <daniel.diaz@linaro.org>
Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
Message-Id: <20201024031150.9318-1-sean.j.christopherson@intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To 2.29
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Quoting https://gcc.gnu.org/onlinedocs/gcc/Local-Register-Variables.html:
You can define a local register variable and associate it with a
specified register...
The only supported use for this feature is to specify registers for
input and output operands when calling Extended asm (see Extended
Asm). This may be necessary if the constraints for a particular
machine don't provide sufficient control to select the desired
register.
On 32-bit x86, this is used to ensure that gcc will put an 8-byte value
into the %edx:%eax pair, while all other cases will just use the single
register %eax (%rax on x86-64). While the _ASM_AX actually just expands
to "%eax", note this comment next to get_user() which does something
very similar:
* The use of _ASM_DX as the register specifier is a bit of a
* simplification, as gcc only cares about it as the starting point
* and not size: for a 64-bit value it will use %ecx:%edx on 32 bits
* (%ecx being the next register in gcc's x86 register sequence), and
* %rdx on 64 bits.
However, getting this to work requires that there is no code between the
assignment to the local register variable and its use as an input to the
asm() which can possibly clobber any of the registers involved -
including evaluation of the expressions making up other inputs.
In the current code, the ptr expression used directly as an input may
cause such code to be emitted. For example, Sean Christopherson
observed that with KASAN enabled and ptr being current->set_child_tid
(from chedule_tail()), the load of current->set_child_tid causes a call
to __asan_load8() to be emitted immediately prior to the __put_user_4
call, and Naresh Kamboju reports that various mmstress tests fail on
KASAN-enabled builds.
It's also possible to synthesize a broken case without KASAN if one uses
"foo()" as the ptr argument, with foo being some "extern u64 __user
*foo(void);" (though I don't know if that appears in real code).
Fix it by making sure ptr gets evaluated before the assignment to
__val_pu, and add a comment that __val_pu must be the last thing
computed before the asm() is entered.
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Fixes: d55564cfc222 ("x86: Make __put_user() generate an out-of-line call")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
Add some structures and defines that were recently added to
the protocol documentation (see MS-FSCC sections 2.3.29-2.3.34).
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
Fix two unused variables in commit
"add support for stat of WSL reparse points for special file types"
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
This is needed so when mounting to Windows we do not
misinterpret various special files created by Linux (WSL) as symlinks.
An earlier patch addressed readdir. This patch fixes stat (getattr).
With this patch:
File: /mnt1/char
Size: 0 Blocks: 0 IO Block: 16384 character special file
Device: 34h/52d Inode: 844424930132069 Links: 1 Device type: 0,0
Access: (0755/crwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 17:46:51.839458900 -0500
Modify: 2020-10-21 17:46:51.839458900 -0500
Change: 2020-10-21 18:30:39.797358800 -0500
Birth: -
File: /mnt1/fifo
Size: 0 Blocks: 0 IO Block: 16384 fifo
Device: 34h/52d Inode: 1125899906842722 Links: 1
Access: (0755/prwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 16:21:37.259249700 -0500
Modify: 2020-10-21 16:21:37.259249700 -0500
Change: 2020-10-21 18:30:39.797358800 -0500
Birth: -
File: /mnt1/block
Size: 0 Blocks: 0 IO Block: 16384 block special file
Device: 34h/52d Inode: 844424930132068 Links: 1 Device type: 0,0
Access: (0755/brwxr-xr-x) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 17:10:47.913103200 -0500
Modify: 2020-10-21 17:10:47.913103200 -0500
Change: 2020-10-21 18:30:39.796725500 -0500
Birth: -
without the patch all show up incorrectly as symlinks with annoying "operation not supported error also returned"
File: /mnt1/charstat: cannot read symbolic link '/mnt1/char': Operation not supported
Size: 0 Blocks: 0 IO Block: 16384 symbolic link
Device: 34h/52d Inode: 844424930132069 Links: 1
Access: (0000/l---------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 17:46:51.839458900 -0500
Modify: 2020-10-21 17:46:51.839458900 -0500
Change: 2020-10-21 18:30:39.797358800 -0500
Birth: -
File: /mnt1/fifostat: cannot read symbolic link '/mnt1/fifo': Operation not supported
Size: 0 Blocks: 0 IO Block: 16384 symbolic link
Device: 34h/52d Inode: 1125899906842722 Links: 1
Access: (0000/l---------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 16:21:37.259249700 -0500
Modify: 2020-10-21 16:21:37.259249700 -0500
Change: 2020-10-21 18:30:39.797358800 -0500
Birth: -
File: /mnt1/blockstat: cannot read symbolic link '/mnt1/block': Operation not supported
Size: 0 Blocks: 0 IO Block: 16384 symbolic link
Device: 34h/52d Inode: 844424930132068 Links: 1
Access: (0000/l---------) Uid: ( 0/ root) Gid: ( 0/ root)
Access: 2020-10-21 17:10:47.913103200 -0500
Modify: 2020-10-21 17:10:47.913103200 -0500
Change: 2020-10-21 18:30:39.796725500 -0500
Signed-off-by: Steve French <stfrench@microsoft.com>
Reviewed-by: Ronnie Sahlberg <lsahlber@redhat.com>
|
|
I tested this driver on my HP PA-RISC C3000 workstation and it does
work with the built-in TEAC CD-532E-B CD-ROM drive.
So drop the TODO item and adjust the file header.
Signed-off-by: Helge Deller <deller@gmx.de>
|
|
Some functions have different names between their prototypes
and the kernel-doc markup.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Fix a typo:
blk_mq_run_hw_queue -> blk_mq_run_hw_queues
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
The commit 75ae04206a4d ("parisc: Define O_NONBLOCK to become
000200000") changed the O_NONBLOCK constant to have only one bit set
(like all other architectures). This change broke some existing
userspace code (e.g. udevadm, systemd-udevd, elogind) which called
specific syscalls which do strict value checking on their flag
parameter.
This patch adds wrapper functions for the relevant syscalls. The
wrappers masks out any old invalid O_NONBLOCK flags, reports in the
syslog if the old O_NONBLOCK value was used and then calls the target
syscall with the new O_NONBLOCK value.
Fixes: 75ae04206a4d ("parisc: Define O_NONBLOCK to become 000200000")
Signed-off-by: Helge Deller <deller@gmx.de>
Tested-by: Meelis Roos <mroos@linux.ee>
Tested-by: Jeroen Roovers <jer@xs4all.nl>
|
|
Apply the outstanding statfs changes in the journal head to the
master statfs file. Zero out the local statfs file for good measure.
Previously, statfs updates would be read in from the local statfs inode and
synced to the master statfs inode during recovery.
We now use the statfs updates in the journal head to update the master statfs
inode instead of reading in from the local statfs inode. To preserve backward
compatibility with kernels that can't do this, we still need to keep the
local statfs inode up to date by writing changes to it. At some point in the
future, we can do away with the local statfs inodes altogether and keep the
statfs changes solely in the journal.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We need to lookup the master statfs inode and the local statfs
inodes earlier in the mount process (in init_journal) so journal
recovery can use them when it attempts to recover the statfs info.
We lookup all the local statfs inodes and store them in a linked
list to allow a node to recover statfs info for other nodes in the
cluster.
Signed-off-by: Abhi Das <adas@redhat.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We've had several complaints about a 10s reconnect delay (the default)
when there was an error while there is connectivity to a subsystem.
The max_reconnects and reconnect_delay are set in common code prior to
calling the transport to create the controller.
This change checks if the default reconnect delay is being used, and if
so, it adjusts it to a shorter period (2s) for the nvme-fc transport.
It does so by calculating the controller loss tmo window, changing the
value of the reconnect delay, and then recalculating the maximum number
of reconnect attempts allowed.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
On reconnect, the code currently does not freeze the controller before
possibly updating the number hw queues for the controller.
Add the freeze before updating the number of hw queues. Note: the queues
are already started and remain started through the reconnect.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The loop that backs out of hw io queue creation continues through index
0, which corresponds to the admin queue as well.
Fix the loop so it only proceeds through indexes 1..n which correspond to
I/O queues.
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
Currently, an I/O timeout unconditionally invokes
nvme_fc_error_recovery() which checks for LIVE or CONNECTING state. If
live, the routine resets the controller which initiates a reconnect -
which is valid. If CONNECTING, err_work is scheduled. Err_work then
calls the terminate_io routine, which also checks for CONNECTING and
noops any further action on outstanding I/O. The result is nothing
happened to the timed out io. As such, if the command was dropped on
the wire, it will never timeout / complete, and the connect process
will hang.
Change the behavior of the io timeout routine to unconditionally abort
the I/O. I/O completion handling will note that an io failed due to an
abort and will terminate the connection / association as needed. If the
abort was unable to happen, continue with a call to
nvme_fc_error_recovery(). To ensure something different happens in
nvme_fc_error_recovery() rework it so at it will abort all I/Os on the
association to force a failure.
As I/O aborts now may occur outside of delete_association, counting for
completion must be wary and only count those aborted during
delete_association when TERMIO is set on the controller.
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
|
|
The kernel boot parameter xen.fifo_events isn't listed in
Documentation/admin-guide/kernel-parameters.txt. Add it.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-6-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
Unmasking an event channel with fifo events channels being used can
require a hypercall to be made, so try to avoid that by checking
whether the event channel was really masked.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-5-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
xen_debug_interrupt() is specific to 2-level event handling. So don't
register it with fifo event handling being active.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-4-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
The struct irq_info of Xen's event handling is used only for two
evtchn_ops functions outside of events_base.c. Those two functions
can easily be switched to avoid that usage.
This allows to make struct irq_info and its related access functions
private to events_base.c.
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-3-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
With the switch to the lateeoi model for interdomain event channels
some functions are no longer in use. Remove them.
Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
|
|
Document the new dma_alloc_pages and dma_free_pages APIs, and fix
up the documentation for dma_alloc_noncoherent and dma_free_noncoherent.
Reported-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
|
|
When KVM maps a largepage backed region at a lower level in order to
make it executable (i.e. NX large page shattering), it reduces the TLB
performance of that region. In order to avoid making this degradation
permanent, KVM must periodically reclaim shattered NX largepages by
zapping them and allowing them to be rebuilt in the page fault handler.
With this patch, the TDP MMU does not respect KVM's rate limiting on
reclaim. It traverses the entire TDP structure every time. This will be
addressed in a future patch.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-21-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
Direct roots don't have a write flooding count because the guest can't
affect that paging structure. Thus there's no need to clear the write
flooding count on a fast CR3 switch for direct roots.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-20-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
In order to support MMIO, KVM must be able to walk the TDP paging
structures to find mappings for a given GFN. Support this walk for
the TDP MMU.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
v2: Thanks to Dan Carpenter and kernel test robot for finding that root
was used uninitialized in get_mmio_spte.
Signed-off-by: Ben Gardon <bgardon@google.com>
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Message-Id: <20201014182700.2888246-19-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
To support nested virtualization, KVM will sometimes need to write
protect pages which are part of a shadowed paging structure or are not
writable in the shadowed paging structure. Add a function to write
protect GFN mappings for this purpose.
Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell
machine. This series introduced no new failures.
This series can be viewed in Gerrit at:
https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538
Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-18-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|