| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux
Pull Devicetree updates from Rob Herring:
- Fix possible memory leak in reserved-memory failure case
- Support for DMA parent bus which are not a parent node
- Clang -Wunsequenced fix
- Remove some unnecessary prints on memory alloc failures
- Various printk msg and comment fixes
- Update DT schema tools repository location
- Convert simple-framebuffer binding to DT schema
- Bindings for isl68137 and ir38064 trivial devices
- New documentation on binding do's and don't's for binding writers to
ignore
* tag 'devicetree-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (22 commits)
of: unittest: Remove error printing on OOM
of: irq: Remove WARN_ON() for kzalloc() failure
dt-bindings: pinctrl: fix bias-pull,up typo
dt-bindings: Update schema project location to devicetree.org github group
of: fix clang -Wunsequenced for be32_to_cpu()
of/device.c: fix the wrong comments
dt-bindings: Add isl68137 as a trivial device
dt-bindings: Add ir38064 as a trivial device
of: del redundant type conversion
dt-bindings: mfd: axp20x: Add fallback for axp805
of: Improve of_phandle_iterator_next() error message
dt-bindings: connector: Spelling mistake
dt-bindings: Add schemas for simple-framebuffer
of: address: Add support for the parent DMA bus
of: address: Retrieve a parent through a callback in __of_translate_address
dt-bindings: bus: Add binding for the Allwinner MBUS controller
dt-bindings: interconnect: Add a dma interconnect name
of: use correct function prototype for of_overlay_fdt_apply()
of: reserved_mem: fix reserve memory leak
of: property: Document that of_graph_get_endpoint_by_regs needs of_node_put
...
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
There is no need to print a backtrace or other error message if
kzalloc(), kmemdup(), or devm_kzalloc() fails, as the memory allocation
core already takes care of that.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
There is no need to print a backtrace if kzalloc() fails, as the memory
allocation core already takes care of that.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
This patch fixes a shared typo in several qcom pinctrl dt-bindings.
Signed-off-by: Christian Lamparter <chunkeey@googlemail.com>
Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The DT schema tools are moving from my personal GH repo to the
devicetree.org group on GH. The new location is here:
https://github.com/devicetree-org/dt-schema.git
The old repo will be kept as a mirror.
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Now, make the loop explicit to avoid clang warning.
./include/linux/of.h:238:37: warning: multiple unsequenced modifications
to 'cell' [-Wunsequenced]
r = (r << 32) | be32_to_cpu(*(cell++));
^~
./include/linux/byteorder/generic.h:95:21: note: expanded from macro
'be32_to_cpu'
^
./include/uapi/linux/byteorder/little_endian.h:40:59: note: expanded
from macro '__be32_to_cpu'
^
./include/uapi/linux/swab.h:118:21: note: expanded from macro '__swab32'
___constant_swab32(x) : \
^
./include/uapi/linux/swab.h:18:12: note: expanded from macro
'___constant_swab32'
(((__u32)(x) & (__u32)0x000000ffUL) << 24) | \
^
Signed-off-by: Phong Tran <tranmanphong@gmail.com>
Reported-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://github.com/ClangBuiltLinux/linux/issues/460
Suggested-by: David Laight <David.Laight@ACULAB.COM>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Cc: stable@vger.kernel.org
[robh: fix up whitespace]
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
the comments which discribed the input parameters of of_match_device().
the name is changed, so fix it.
Signed-off-by: Jojo Zeng <jojo_zeng@126.com>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The isl68137 is a digital output 7-phrase configurable PWM controller
with an AVSBus interface from Intersil.
Signed-off-by: Patrick Venture <venture@google.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
The ir38064 is a voltage regulator from Infineon.
Signed-off-by: Patrick Venture <venture@google.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The type of variable l in early_init_dt_scan_chosen is
int, there is no need to convert to int.
Signed-off-by: xiaojiangfeng <xiaojiangfeng@huawei.com>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
axp805 is actually compatible and used with axp806 as fallback.
But it's actually undocumented and trig a warning with checkpatch.
DT compatible string "x-powers,axp805" appears un-documented.
Add this compatible in the dt-bindings documentation.
Signed-off-by: Clément Péron <peron.clem@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Understanding why of_phandle_iterator_next() returns an error can
sometimes be hard to decipher based solely on the error message, a
typical error example is that #foo-cells = <X> and the phandle property
used has a smaller number of cells specified, e.g.:
#thermal-sensor-cells = <1>;
phandle = <&scmi_sensor>
instead of:
phandle <&scmi_sensor 0>;
Instead, make it clear what the expectations are towards debugging
incorrect Device Tree faster:
OF: /thermal-zones/scmi-thermal: #thermal-sensor-cells = 1, found 0 instead
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| | |
Cosmetic change multpile -> multiple.
Fixes: 593aa2b405f9 ("dt-bindings: add bindings for USB physical connector")
Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The simple framebuffer is a binding that allows the bootloader to setup a
framebuffer, describe it in the Device Tree for the OS to pick it up and
use it as is.
Replace the current binding by a schema to allow the validation tools to
check those nodes.
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Chen-Yu Tsai <wens@csie.org>
Cc: Hans de Goede <hdegoede@redhat.com>
Cc: Maxime Jourdan <mjourdan@baylibre.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some SoCs have devices that are using a separate bus from the main bus to
perform DMA.
These buses might have some restrictions and/or different mapping than from
the CPU side, so we'd need to express those using the usual dma-ranges, but
using a different DT node than the node's parent.
Now that the generic interconnect bindings are available, we can model an
interconnect with the reserved name "dma-mem" for those use-cases.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The __of_translate_address function is used to translate the device tree
addresses to physical addresses using the various ranges property to create
the offset.
However, it's shared between the CPU addresses (based on the ranges
property) and the DMA addresses (based on dma-ranges). Since we're going to
add support for a DMA parent node that is not the DT parent node, we need
to change the logic a bit to have a callback function that will retrieve
the parent node we should use.
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The MBUS controller drives the MBUS that other devices in the SoC will
use to perform DMA. It also has a register interface that allows to
monitor and control the bandwidth and priorities for masters on that
bus.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The current DT bindings assume that the DMA will be performed by the
devices through their parent DT node, and rely on that assumption for the
address translation using dma-ranges.
However, some SoCs have devices that will perform DMA through another bus,
with separate address translation rules. We therefore need to express that
relationship, through the special interconnect name "dma-mem".
Acked-by: Georgi Djakov <georgi.djakov@linaro.org>
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
When CONFIG_OF_OVERLAY is not enabled the fallback stub for
of_overlay_fdt_apply() does not match the prototype for the case when
CONFIG_OF_OVERLAY is enabled. Update the stub to use the correct
function prototype.
Fixes: 39a751a4cb7e ("of: change overlay apply input data from unflattened to FDT")
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Reviewed-by: Frank Rowand <frank.rowand@sony.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The __reserved_mem_init_node will call region specific reserved memory
init codes, but once all compatibled init codes failed, the memory region
will left in memory.reserved and cause leakage.
Take cma reserve memory DTS for example, if user declare 1MB size,
which is not align to (PAGE_SIZE << max(MAX_ORDER - 1,
pageblock_order)), rmem_cma_setup will return -EINVAL.
Meanwhile, rmem_dma_setup will also return -EINVAL since "reusable"
property is not set. If finally there is no reserved memory init pick up
this memory, kernel will left the 1MB leak in memory.reserved.
This patch will remove this kind of memory from memory.reserved, only
when __reserved_mem_init_node return neither 0 nor -ENOENT.
Signed-off-by: pierre Kuo <vichy.kuo@gmail.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The node returned by of_graph_get_endpoint_by_regs has a reference taken,
and we need to put that reference back when done with the node.
However, the documentation for that node doesn't mention it, so let's make
sure it does.
Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
A node is always an object (aka a dictionary), so make that explicit for
child node schemas.
A meta-schema update will enforce having 'type' specified.
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Jason Cooper <jason@lakedaemon.net>
Cc: Marc Zyngier <marc.zyngier@arm.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rob Herring <robh@kernel.org>
|
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Devicetree binding reviews have a lot of repeated review comments. Much
of the guidelines aren't written down. This list of do's and don't's is
by no means an exhaustive guide for how to write bindings, but at least
the "rules" are written down in some form.
Signed-off-by: Rob Herring <robh@kernel.org>
|
|\ \
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random
Pull randomness updates from Ted Ts'o:
- initialize the random driver earler
- fix CRNG initialization when we trust the CPU's RNG on NUMA systems
- other miscellaneous cleanups and fixes.
* tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random:
random: add a spinlock_t to struct batched_entropy
random: document get_random_int() family
random: fix CRNG initialization when random.trust_cpu=1
random: move rand_initialize() earlier
random: only read from /dev/random after its pool has received 128 bits
drivers/char/random.c: make primary_crng static
drivers/char/random.c: remove unused stuct poolinfo::poolbits
drivers/char/random.c: constify poolinfo_table
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The per-CPU variable batched_entropy_uXX is protected by get_cpu_var().
This is just a preempt_disable() which ensures that the variable is only
from the local CPU. It does not protect against users on the same CPU
from another context. It is possible that a preemptible context reads
slot 0 and then an interrupt occurs and the same value is read again.
The above scenario is confirmed by lockdep if we add a spinlock:
| ================================
| WARNING: inconsistent lock state
| 5.1.0-rc3+ #42 Not tainted
| --------------------------------
| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes:
| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0
| {SOFTIRQ-ON-W} state was registered at:
| _raw_spin_lock+0x2a/0x40
| get_random_u32+0x3e/0xe0
| new_slab+0x15c/0x7b0
| ___slab_alloc+0x492/0x620
| __slab_alloc.isra.73+0x53/0xa0
| kmem_cache_alloc_node+0xaf/0x2a0
| copy_process.part.41+0x1e1/0x2370
| _do_fork+0xdb/0x6d0
| kernel_thread+0x20/0x30
| kthreadd+0x1ba/0x220
| ret_from_fork+0x3a/0x50
…
| other info that might help us debug this:
| Possible unsafe locking scenario:
|
| CPU0
| ----
| lock(batched_entropy_u32.lock);
| <Interrupt>
| lock(batched_entropy_u32.lock);
|
| *** DEADLOCK ***
|
| stack backtrace:
| Call Trace:
…
| kmem_cache_alloc_trace+0x20e/0x270
| ipmi_alloc_recv_msg+0x16/0x40
…
| __do_softirq+0xec/0x48d
| run_ksoftirqd+0x37/0x60
| smpboot_thread_fn+0x191/0x290
| kthread+0xfe/0x130
| ret_from_fork+0x3a/0x50
Add a spinlock_t to the batched_entropy data structure and acquire the
lock while accessing it. Acquire the lock with disabled interrupts
because this function may be used from interrupt context.
Remove the batched_entropy_reset_lock lock. Now that we have a lock for
the data scructure, we can access it from a remote CPU.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Explain what these functions are for and when they offer
an advantage over get_random_bytes().
(We still need documentation on rng_is_initialized(), the
random_ready_callback system, and early boot in general.)
Signed-off-by: George Spelvin <lkml@sdf.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
When the system boots with random.trust_cpu=1 it doesn't initialize the
per-NUMA CRNGs because it skips the rest of the CRNG startup code. This
means that the code from 1e7f583af67b ("random: make /dev/urandom scalable
for silly userspace programs") is not used when random.trust_cpu=1.
crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314029] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$6 = (struct crng_state **) 0x0
After adding the missing call to numa_crng_init() the per-NUMA CRNGs are
initialized again:
crash> dmesg | grep random:
[ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0
[ 0.314031] random: crng done (trusting CPU's manufacturer)
crash> print crng_node_pool
$1 = (struct crng_state **) 0xffff9a915f4014a0
The call to invalidate_batched_entropy() was also missing. This is
important for architectures like PPC and S390 which only have the
arch_get_random_seed_* functions.
Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng")
Signed-off-by: Jon DeVree <nuxi@vault24.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Right now rand_initialize() is run as an early_initcall(), but it only
depends on timekeeping_init() (for mixing ktime_get_real() into the
pools). However, the call to boot_init_stack_canary() for stack canary
initialization runs earlier, which triggers a warning at boot:
random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0
Instead, this moves rand_initialize() to after timekeeping_init(), and moves
canary initialization here as well.
Note that this warning may still remain for machines that do not have
UEFI RNG support (which initializes the RNG pools during setup_arch()),
or for x86 machines without RDRAND (or booting without "random.trust=on"
or CONFIG_RANDOM_TRUST_CPU=y).
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Immediately after boot, we allow reads from /dev/random before its
entropy pool has been fully initialized. Fix this so that we don't
allow this until the blocking pool has received 128 bits.
We do this by repurposing the initialized flag in the entropy pool
struct, and use the initialized flag in the blocking pool to indicate
whether it is safe to pull from the blocking pool.
To do this, we needed to rework when we decide to push entropy from the
input pool to the blocking pool, since the initialized flag for the
input pool was used for this purpose. To simplify things, we no
longer use the initialized flag for that purpose, nor do we use the
entropy_total field any more.
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Since the definition of struct crng_state is private to random.c, and
primary_crng is neither declared or used elsewhere, there's no reason
for that symbol to have external linkage.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This field is never used, might as well remove it.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Never modified, might as well be put in .rodata.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Pull fscrypt updates from Ted Ts'o:
"Clean up fscrypt's dcache revalidation support, and other
miscellaneous cleanups"
* tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt:
fscrypt: cache decrypted symlink target in ->i_link
vfs: use READ_ONCE() to access ->i_link
fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext
fscrypt: only set dentry_operations on ciphertext dentries
fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory
fscrypt: fix race allowing rename() and link() of ciphertext dentries
fscrypt: clean up and improve dentry revalidation
fscrypt: use READ_ONCE() to access ->i_crypt_info
fscrypt: remove WARN_ON_ONCE() when decryption fails
fscrypt: drop inode argument from fscrypt_get_ctx()
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Path lookups that traverse encrypted symlink(s) are very slow because
each encrypted symlink needs to be decrypted each time it's followed.
This also involves dropping out of rcu-walk mode.
Make encrypted symlinks faster by caching the decrypted symlink target
in ->i_link. The first call to fscrypt_get_symlink() sets it. Then,
the existing VFS path lookup code uses the non-NULL ->i_link to take the
fast path where ->get_link() isn't called, and lookups in rcu-walk mode
remain in rcu-walk mode.
Also set ->i_link immediately when a new encrypted symlink is created.
To safely free the symlink target after an RCU grace period has elapsed,
introduce a new function fscrypt_free_inode(), and make the relevant
filesystems call it just before actually freeing the inode.
Cc: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Use 'READ_ONCE(inode->i_link)' to explicitly support filesystems caching
the symlink target in ->i_link later if it was unavailable at iget()
time, or wasn't easily available. I'll be doing this in fscrypt, to
improve the performance of encrypted symlinks on ext4, f2fs, and ubifs.
->i_link will start NULL and may later be set to a non-NULL value by a
smp_store_release() or cmpxchg_release(). READ_ONCE() is needed on the
read side. smp_load_acquire() is unnecessary because only a data
dependency barrier is required. (Thanks to Al for pointing this out.)
Acked-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
->lookup() in an encrypted directory begins as follows:
1. fscrypt_prepare_lookup():
a. Try to load the directory's encryption key.
b. If the key is unavailable, mark the dentry as a ciphertext name
via d_flags.
2. fscrypt_setup_filename():
a. Try to load the directory's encryption key.
b. If the key is available, encrypt the name (treated as a plaintext
name) to get the on-disk name. Otherwise decode the name
(treated as a ciphertext name) to get the on-disk name.
But if the key is concurrently added, it may be found at (2a) but not at
(1a). In this case, the dentry will be wrongly marked as a ciphertext
name even though it was actually treated as plaintext.
This will cause the dentry to be wrongly invalidated on the next lookup,
potentially causing problems. For example, if the racy ->lookup() was
part of sys_mount(), then the new mount will be detached when anything
tries to access it. This is despite the mountpoint having a plaintext
path, which should remain valid now that the key was added.
Of course, this is only possible if there's a userspace race. Still,
the additional kernel-side race is confusing and unexpected.
Close the kernel-side race by changing fscrypt_prepare_lookup() to also
set the on-disk filename (step 2b), consistent with the d_flags update.
Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Plaintext dentries are always valid, so only set fscrypt_d_ops on
ciphertext dentries.
Besides marginally improved performance, this allows overlayfs to use an
fscrypt-encrypted upperdir, provided that all the following are true:
(1) The fscrypt encryption key is placed in the keyring before
mounting overlayfs, and remains while the overlayfs is mounted.
(2) The overlayfs workdir uses the same encryption policy.
(3) No dentries for the ciphertext names of subdirectories have been
created in the upperdir or workdir yet. (Since otherwise
d_splice_alias() will reuse the old dentry with ->d_op set.)
One potential use case is using an ephemeral encryption key to encrypt
all files created or changed by a container, so that they can be
securely erased ("crypto-shredded") after the container stops.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Make __d_move() clear DCACHE_ENCRYPTED_NAME on the source dentry. This
is needed for when d_splice_alias() moves a directory's encrypted alias
to its decrypted alias as a result of the encryption key being added.
Otherwise, the decrypted alias will incorrectly be invalidated on the
next lookup, causing problems such as unmounting a mount the user just
mount()ed there.
Note that we don't have to support arbitrary moves of this flag because
fscrypt doesn't allow dentries with DCACHE_ENCRYPTED_NAME to be the
source or target of a rename().
Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key")
Reported-by: Sarthak Kukreti <sarthakkukreti@chromium.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Close some race conditions where fscrypt allowed rename() and link() on
ciphertext dentries that had been looked up just prior to the key being
concurrently added. It's better to return -ENOKEY in this case.
This avoids doing the nonsensical thing of encrypting the names a second
time when searching for the actual on-disk dir entries. It also
guarantees that DCACHE_ENCRYPTED_NAME dentries are never rename()d, so
the dcache won't have support all possible combinations of moving
DCACHE_ENCRYPTED_NAME around during __d_move().
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Make various improvements to fscrypt dentry revalidation:
- Don't try to handle the case where the per-directory key is removed,
as this can't happen without the inode (and dentries) being evicted.
- Flag ciphertext dentries rather than plaintext dentries, since it's
ciphertext dentries that need the special handling.
- Avoid doing unnecessary work for non-ciphertext dentries.
- When revalidating ciphertext dentries, try to set up the directory's
i_crypt_info to make sure the key is really still absent, rather than
invalidating all negative dentries as the previous code did. An old
comment suggested we can't do this for locking reasons, but AFAICT
this comment was outdated and it actually works fine.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
->i_crypt_info starts out NULL and may later be locklessly set to a
non-NULL value by the cmpxchg() in fscrypt_get_encryption_info().
But ->i_crypt_info is used directly, which technically is incorrect.
It's a data race, and it doesn't include the data dependency barrier
needed to safely dereference the pointer on at least one architecture.
Fix this by using READ_ONCE() instead. Note: we don't need to use
smp_load_acquire(), since dereferencing the pointer only requires a data
dependency barrier, which is already included in READ_ONCE(). We also
don't need READ_ONCE() in places where ->i_crypt_info is unconditionally
dereferenced, since it must have already been checked.
Also downgrade the cmpxchg() to cmpxchg_release(), since RELEASE
semantics are sufficient on the write side.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
If decrypting a block fails, fscrypt did a WARN_ON_ONCE(). But WARN is
meant for kernel bugs, which this isn't; this could be hit by fuzzers
using fault injection, for example. Also, there is already a proper
warning message logged in fscrypt_do_page_crypto(), so the WARN doesn't
add much.
Just remove the unnessary WARN.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| |/ /
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The only reason the inode is being passed to fscrypt_get_ctx() is to
verify that the encryption key is available. However, all callers
already ensure this because if we get as far as trying to do I/O to an
encrypted file without the key, there's already a bug.
Therefore, remove this unnecessary argument.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4
Pull ext4 updates from Ted Ts'o:
"Add as a feature case-insensitive directories (the casefold feature)
using Unicode 12.1.
Also, the usual largish number of cleanups and bug fixes"
* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits)
ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present
ext4: fix ext4_show_options for file systems w/o journal
unicode: refactor the rule for regenerating utf8data.h
docs: ext4.rst: document case-insensitive directories
ext4: Support case-insensitive file name lookups
ext4: include charset encoding information in the superblock
MAINTAINERS: add Unicode subsystem entry
unicode: update unicode database unicode version 12.1.0
unicode: introduce test module for normalized utf8 implementation
unicode: implement higher level API for string handling
unicode: reduce the size of utf8data[]
unicode: introduce code for UTF-8 normalization
unicode: introduce UTF-8 character database
ext4: actually request zeroing of inode table after grow
ext4: cond_resched in work-heavy group loops
ext4: fix use-after-free race with debug_want_extra_isize
ext4: avoid drop reference to iloc.bh twice
ext4: ignore e_value_offs for xattrs with value-in-ea-inode
ext4: protect journal inode's blocks using block_validity
ext4: use BUG() instead of BUG_ON(1)
...
|
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as
I assume was intended, all other options were blown away leading to
_ext4_show_options() output being incorrect.
Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal")
Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Reviewed-by: Jan Kara <jack@suse.cz>
Cc: stable@kernel.org
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
scripts/mkutf8data is used only when regenerating utf8data.h,
which never happens in the normal kernel build. However, it is
irrespectively built if CONFIG_UNICODE is enabled.
Moreover, there is no good reason for it to reside in the scripts/
directory since it is only used in fs/unicode/.
Hence, move it from scripts/ to fs/unicode/.
In some cases, we bypass build artifacts in the normal build. The
conventional way to do so is to surround the code with ifdef REGENERATE_*.
For example,
- 7373f4f83c71 ("kbuild: add implicit rules for parser generation")
- 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped")
I rewrote the rule in a more kbuild'ish style.
In the normal build, utf8data.h is just shipped from the check-in file.
$ make
[ snip ]
SHIPPED fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
If you want to generate utf8data.h based on UCD, put *.txt files into
fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line.
The mkutf8data tool will be automatically compiled to generate the
utf8data.h from the *.txt files.
$ make REGENERATE_UTF8DATA=1
[ snip ]
HOSTCC fs/unicode/mkutf8data
GEN fs/unicode/utf8data.h
CC fs/unicode/utf8-norm.o
CC fs/unicode/utf8-core.o
CC fs/unicode/utf8-selftest.o
AR fs/unicode/built-in.a
I renamed the check-in utf8data.h to utf8data.h_shipped so that this
will work for the out-of-tree build.
You can update it based on the latest UCD like this:
$ make REGENERATE_UTF8DATA=1 fs/unicode/
$ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped
Also, I added entries to .gitignore and dontdiff.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Introduces the case-insensitive features on ext4 for system
administrators. Explain the minimum of design decisions that are
important for sysadmins wanting to enable this feature.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch implements the actual support for case-insensitive file name
lookups in ext4, based on the feature bit and the encoding stored in the
superblock.
A filesystem that has the casefold feature set is able to configure
directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups
to succeed in that directory in a case-insensitive fashion, i.e: match
a directory entry even if the name used by userspace is not a byte per
byte match with the disk name, but is an equivalent case-insensitive
version of the Unicode string. This operation is called a
case-insensitive file name lookup.
The feature is configured as an inode attribute applied to directories
and inherited by its children. This attribute can only be enabled on
empty directories for filesystems that support the encoding feature,
thus preventing collision of file names that only differ by case.
* dcache handling:
For a +F directory, Ext4 only stores the first equivalent name dentry
used in the dcache. This is done to prevent unintentional duplication of
dentries in the dcache, while also allowing the VFS code to quickly find
the right entry in the cache despite which equivalent string was used in
a previous lookup, without having to resort to ->lookup().
d_hash() of casefolded directories is implemented as the hash of the
casefolded string, such that we always have a well-known bucket for all
the equivalencies of the same string. d_compare() uses the
utf8_strncasecmp() infrastructure, which handles the comparison of
equivalent, same case, names as well.
For now, negative lookups are not inserted in the dcache, since they
would need to be invalidated anyway, because we can't trust missing file
dentries. This is bad for performance but requires some leveraging of
the vfs layer to fix. We can live without that for now, and so does
everyone else.
* on-disk data:
Despite using a specific version of the name as the internal
representation within the dcache, the name stored and fetched from the
disk is a byte-per-byte match with what the user requested, making this
implementation 'name-preserving'. i.e. no actual information is lost
when writing to storage.
DX is supported by modifying the hashes used in +F directories to make
them case/encoding-aware. The new disk hashes are calculated as the
hash of the full casefolded string, instead of the string directly.
This allows us to efficiently search for file names in the htree without
requiring the user to provide an exact name.
* Dealing with invalid sequences:
By default, when a invalid UTF-8 sequence is identified, ext4 will treat
it as an opaque byte sequence, ignoring the encoding and reverting to
the old behavior for that unique file. This means that case-insensitive
file name lookup will not work only for that file. An optional bit can
be set in the superblock telling the filesystem code and userspace tools
to enforce the encoding. When that optional bit is set, any attempt to
create a file name using an invalid UTF-8 sequence will fail and return
an error to userspace.
* Normalization algorithm:
The UTF-8 algorithms used to compare strings in ext4 is implemented
lives in fs/unicode, and is based on a previous version developed by
SGI. It implements the Canonical decomposition (NFD) algorithm
described by the Unicode specification 12.1, or higher, combined with
the elimination of ignorable code points (NFDi) and full
case-folding (CF) as documented in fs/unicode/utf8_norm.c.
NFD seems to be the best normalization method for EXT4 because:
- It has a lower cost than NFC/NFKC (which requires
decomposing to NFD as an intermediary step)
- It doesn't eliminate important semantic meaning like
compatibility decompositions.
Although:
- This implementation is not completely linguistic accurate, because
different languages have conflicting rules, which would require the
specialization of the filesystem to a given locale, which brings all
sorts of problems for removable media and for users who use more than
one language.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Support for encoding is considered an incompatible feature, since it has
potential to create collisions of file names in existing filesystems.
If the feature flag is not enabled, the entire filesystem will operate
on opaque byte sequences, respecting the original behavior.
The s_encoding field stores a magic number indicating the encoding
format and version used globally by file and directory names in the
filesystem. The s_encoding_flags defines policies for using the charset
encoding, like how to handle invalid sequences. The magic number is
mapped to the exact charset table, but the mapping is specific to ext4.
Since we don't have any commitment to support old encodings, the only
encoding I am supporting right now is utf8-12.1.0.
The current implementation prevents the user from enabling encoding and
per-directory encryption on the same filesystem at the same time. The
incompatibility between these features lies in how we do efficient
directory searches when we cannot be sure the encryption of the user
provided fname will match the actual hash stored in the disk without
decrypting every directory entry, because of normalization cases. My
quickest solution is to simply block the concurrent use of these
features for now, and enable it later, once we have a better solution.
Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|