linux - linux

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge tag 'devicetree-for-5.2' of ↵	Linus Torvalds	2019-05-08	28	-196/+341
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull Devicetree updates from Rob Herring: - Fix possible memory leak in reserved-memory failure case - Support for DMA parent bus which are not a parent node - Clang -Wunsequenced fix - Remove some unnecessary prints on memory alloc failures - Various printk msg and comment fixes - Update DT schema tools repository location - Convert simple-framebuffer binding to DT schema - Bindings for isl68137 and ir38064 trivial devices - New documentation on binding do's and don't's for binding writers to ignore * tag 'devicetree-for-5.2' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (22 commits) of: unittest: Remove error printing on OOM of: irq: Remove WARN_ON() for kzalloc() failure dt-bindings: pinctrl: fix bias-pull,up typo dt-bindings: Update schema project location to devicetree.org github group of: fix clang -Wunsequenced for be32_to_cpu() of/device.c: fix the wrong comments dt-bindings: Add isl68137 as a trivial device dt-bindings: Add ir38064 as a trivial device of: del redundant type conversion dt-bindings: mfd: axp20x: Add fallback for axp805 of: Improve of_phandle_iterator_next() error message dt-bindings: connector: Spelling mistake dt-bindings: Add schemas for simple-framebuffer of: address: Add support for the parent DMA bus of: address: Retrieve a parent through a callback in __of_translate_address dt-bindings: bus: Add binding for the Allwinner MBUS controller dt-bindings: interconnect: Add a dma interconnect name of: use correct function prototype for of_overlay_fdt_apply() of: reserved_mem: fix reserve memory leak of: property: Document that of_graph_get_endpoint_by_regs needs of_node_put ...
\| *	of: unittest: Remove error printing on OOM	Geert Uytterhoeven	2019-05-02	1	-9/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to print a backtrace or other error message if kzalloc(), kmemdup(), or devm_kzalloc() fails, as the memory allocation core already takes care of that. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: irq: Remove WARN_ON() for kzalloc() failure	Geert Uytterhoeven	2019-05-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	There is no need to print a backtrace if kzalloc() fails, as the memory allocation core already takes care of that. Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: pinctrl: fix bias-pull,up typo	Christian Lamparter	2019-05-02	5	-5/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes a shared typo in several qcom pinctrl dt-bindings. Signed-off-by: Christian Lamparter <chunkeey@googlemail.com> Acked-by: Bjorn Andersson <bjorn.andersson@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Update schema project location to devicetree.org github group	Rob Herring	2019-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The DT schema tools are moving from my personal GH repo to the devicetree.org group on GH. The new location is here: https://github.com/devicetree-org/dt-schema.git The old repo will be kept as a mirror. Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: fix clang -Wunsequenced for be32_to_cpu()	Phong Tran	2019-05-01	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now, make the loop explicit to avoid clang warning. ./include/linux/of.h:238:37: warning: multiple unsequenced modifications to 'cell' [-Wunsequenced] r = (r << 32) \| be32_to_cpu(*(cell++)); ^~ ./include/linux/byteorder/generic.h:95:21: note: expanded from macro 'be32_to_cpu' ^ ./include/uapi/linux/byteorder/little_endian.h:40:59: note: expanded from macro '__be32_to_cpu' ^ ./include/uapi/linux/swab.h:118:21: note: expanded from macro '__swab32' ___constant_swab32(x) : \ ^ ./include/uapi/linux/swab.h:18:12: note: expanded from macro '___constant_swab32' (((__u32)(x) & (__u32)0x000000ffUL) << 24) \| \ ^ Signed-off-by: Phong Tran <tranmanphong@gmail.com> Reported-by: Nick Desaulniers <ndesaulniers@google.com> Link: https://github.com/ClangBuiltLinux/linux/issues/460 Suggested-by: David Laight <David.Laight@ACULAB.COM> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Cc: stable@vger.kernel.org [robh: fix up whitespace] Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of/device.c: fix the wrong comments	Jojo Zeng	2019-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the comments which discribed the input parameters of of_match_device(). the name is changed, so fix it. Signed-off-by: Jojo Zeng <jojo_zeng@126.com> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Add isl68137 as a trivial device	Patrick Venture	2019-04-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The isl68137 is a digital output 7-phrase configurable PWM controller with an AVSBus interface from Intersil. Signed-off-by: Patrick Venture <venture@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Add ir38064 as a trivial device	Patrick Venture	2019-04-30	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The ir38064 is a voltage regulator from Infineon. Signed-off-by: Patrick Venture <venture@google.com> Acked-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: del redundant type conversion	xiaojiangfeng	2019-04-29	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The type of variable l in early_init_dt_scan_chosen is int, there is no need to convert to int. Signed-off-by: xiaojiangfeng <xiaojiangfeng@huawei.com> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: mfd: axp20x: Add fallback for axp805	Clément Péron	2019-04-29	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	axp805 is actually compatible and used with axp806 as fallback. But it's actually undocumented and trig a warning with checkpatch. DT compatible string "x-powers,axp805" appears un-documented. Add this compatible in the dt-bindings documentation. Signed-off-by: Clément Péron <peron.clem@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: Improve of_phandle_iterator_next() error message	Florian Fainelli	2019-04-10	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Understanding why of_phandle_iterator_next() returns an error can sometimes be hard to decipher based solely on the error message, a typical error example is that #foo-cells = <X> and the phandle property used has a smaller number of cells specified, e.g.: #thermal-sensor-cells = <1>; phandle = <&scmi_sensor> instead of: phandle <&scmi_sensor 0>; Instead, make it clear what the expectations are towards debugging incorrect Device Tree faster: OF: /thermal-zones/scmi-thermal: #thermal-sensor-cells = 1, found 0 instead Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: connector: Spelling mistake	Miquel Raynal	2019-04-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Cosmetic change multpile -> multiple. Fixes: 593aa2b405f9 ("dt-bindings: add bindings for USB physical connector") Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Add schemas for simple-framebuffer	Maxime Ripard	2019-04-10	4	-160/+160
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The simple framebuffer is a binding that allows the bootloader to setup a framebuffer, describe it in the Device Tree for the OS to pick it up and use it as is. Replace the current binding by a schema to allow the validation tools to check those nodes. Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com> Cc: Chen-Yu Tsai <wens@csie.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Maxime Jourdan <mjourdan@baylibre.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: address: Add support for the parent DMA bus	Maxime Ripard	2019-04-10	1	-2/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some SoCs have devices that are using a separate bus from the main bus to perform DMA. These buses might have some restrictions and/or different mapping than from the CPU side, so we'd need to express those using the usual dma-ranges, but using a different DT node than the node's parent. Now that the generic interconnect bindings are available, we can model an interconnect with the reserved name "dma-mem" for those use-cases. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: address: Retrieve a parent through a callback in __of_translate_address	Maxime Ripard	2019-04-10	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The __of_translate_address function is used to translate the device tree addresses to physical addresses using the various ranges property to create the offset. However, it's shared between the CPU addresses (based on the ranges property) and the DMA addresses (based on dma-ranges). Since we're going to add support for a DMA parent node that is not the DT parent node, we need to change the logic a bit to have a callback function that will retrieve the parent node we should use. Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: bus: Add binding for the Allwinner MBUS controller	Maxime Ripard	2019-04-10	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MBUS controller drives the MBUS that other devices in the SoC will use to perform DMA. It also has a register interface that allows to monitor and control the bandwidth and priorities for masters on that bus. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: interconnect: Add a dma interconnect name	Maxime Ripard	2019-04-10	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current DT bindings assume that the DMA will be performed by the devices through their parent DT node, and rely on that assumption for the address translation using dma-ranges. However, some SoCs have devices that will perform DMA through another bus, with separate address translation rules. We therefore need to express that relationship, through the special interconnect name "dma-mem". Acked-by: Georgi Djakov <georgi.djakov@linaro.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: use correct function prototype for of_overlay_fdt_apply()	Chris Packham	2019-04-10	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When CONFIG_OF_OVERLAY is not enabled the fallback stub for of_overlay_fdt_apply() does not match the prototype for the case when CONFIG_OF_OVERLAY is enabled. Update the stub to use the correct function prototype. Fixes: 39a751a4cb7e ("of: change overlay apply input data from unflattened to FDT") Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: reserved_mem: fix reserve memory leak	pierre Kuo	2019-04-10	1	-5/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The __reserved_mem_init_node will call region specific reserved memory init codes, but once all compatibled init codes failed, the memory region will left in memory.reserved and cause leakage. Take cma reserve memory DTS for example, if user declare 1MB size, which is not align to (PAGE_SIZE << max(MAX_ORDER - 1, pageblock_order)), rmem_cma_setup will return -EINVAL. Meanwhile, rmem_dma_setup will also return -EINVAL since "reusable" property is not set. If finally there is no reserved memory init pick up this memory, kernel will left the 1MB leak in memory.reserved. This patch will remove this kind of memory from memory.reserved, only when __reserved_mem_init_node return neither 0 nor -ENOENT. Signed-off-by: pierre Kuo <vichy.kuo@gmail.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	of: property: Document that of_graph_get_endpoint_by_regs needs of_node_put	Maxime Ripard	2019-04-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The node returned by of_graph_get_endpoint_by_regs has a reference taken, and we need to put that reference back when done with the node. However, the documentation for that node doesn't mention it, so let's make sure it does. Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Require child nodes type to be 'object'	Rob Herring	2019-04-10	3	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A node is always an object (aka a dictionary), so make that explicit for child node schemas. A meta-schema update will enforce having 'type' specified. Cc: Mark Rutland <mark.rutland@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Rob Herring <robh@kernel.org>
\| *	dt-bindings: Add a guide of do's and don't's for writing bindings	Rob Herring	2019-04-10	1	-0/+60
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Devicetree binding reviews have a lot of repeated review comments. Much of the guidelines aren't written down. This list of do's and don't's is by no means an exhaustive guide for how to write bindings, but at least the "rules" are written down in some form. Signed-off-by: Rob Herring <robh@kernel.org>
* \|	Merge tag 'random_for_linus' of ↵	Linus Torvalds	2019-05-08	4	-78/+156
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random Pull randomness updates from Ted Ts'o: - initialize the random driver earler - fix CRNG initialization when we trust the CPU's RNG on NUMA systems - other miscellaneous cleanups and fixes. * tag 'random_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/random: random: add a spinlock_t to struct batched_entropy random: document get_random_int() family random: fix CRNG initialization when random.trust_cpu=1 random: move rand_initialize() earlier random: only read from /dev/random after its pool has received 128 bits drivers/char/random.c: make primary_crng static drivers/char/random.c: remove unused stuct poolinfo::poolbits drivers/char/random.c: constify poolinfo_table
\| * \|	random: add a spinlock_t to struct batched_entropy	Sebastian Andrzej Siewior	2019-04-20	1	-25/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The per-CPU variable batched_entropy_uXX is protected by get_cpu_var(). This is just a preempt_disable() which ensures that the variable is only from the local CPU. It does not protect against users on the same CPU from another context. It is possible that a preemptible context reads slot 0 and then an interrupt occurs and the same value is read again. The above scenario is confirmed by lockdep if we add a spinlock: \| ================================ \| WARNING: inconsistent lock state \| 5.1.0-rc3+ #42 Not tainted \| -------------------------------- \| inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. \| ksoftirqd/9/56 [HC0[0]:SC1[1]:HE0:SE0] takes: \| (____ptrval____) (batched_entropy_u32.lock){+.?.}, at: get_random_u32+0x3e/0xe0 \| {SOFTIRQ-ON-W} state was registered at: \| _raw_spin_lock+0x2a/0x40 \| get_random_u32+0x3e/0xe0 \| new_slab+0x15c/0x7b0 \| ___slab_alloc+0x492/0x620 \| __slab_alloc.isra.73+0x53/0xa0 \| kmem_cache_alloc_node+0xaf/0x2a0 \| copy_process.part.41+0x1e1/0x2370 \| _do_fork+0xdb/0x6d0 \| kernel_thread+0x20/0x30 \| kthreadd+0x1ba/0x220 \| ret_from_fork+0x3a/0x50 … \| other info that might help us debug this: \| Possible unsafe locking scenario: \| \| CPU0 \| ---- \| lock(batched_entropy_u32.lock); \| <Interrupt> \| lock(batched_entropy_u32.lock); \| \| * DEADLOCK * \| \| stack backtrace: \| Call Trace: … \| kmem_cache_alloc_trace+0x20e/0x270 \| ipmi_alloc_recv_msg+0x16/0x40 … \| __do_softirq+0xec/0x48d \| run_ksoftirqd+0x37/0x60 \| smpboot_thread_fn+0x191/0x290 \| kthread+0xfe/0x130 \| ret_from_fork+0x3a/0x50 Add a spinlock_t to the batched_entropy data structure and acquire the lock while accessing it. Acquire the lock with disabled interrupts because this function may be used from interrupt context. Remove the batched_entropy_reset_lock lock. Now that we have a lock for the data scructure, we can access it from a remote CPU. Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	random: document get_random_int() family	George Spelvin	2019-04-20	1	-7/+76
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Explain what these functions are for and when they offer an advantage over get_random_bytes(). (We still need documentation on rng_is_initialized(), the random_ready_callback system, and early boot in general.) Signed-off-by: George Spelvin <lkml@sdf.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	random: fix CRNG initialization when random.trust_cpu=1	Jon DeVree	2019-04-20	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When the system boots with random.trust_cpu=1 it doesn't initialize the per-NUMA CRNGs because it skips the rest of the CRNG startup code. This means that the code from 1e7f583af67b ("random: make /dev/urandom scalable for silly userspace programs") is not used when random.trust_cpu=1. crash> dmesg \| grep random: [ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0 [ 0.314029] random: crng done (trusting CPU's manufacturer) crash> print crng_node_pool $6 = (struct crng_state ) 0x0 After adding the missing call to numa_crng_init() the per-NUMA CRNGs are initialized again: crash> dmesg \| grep random: [ 0.000000] random: get_random_bytes called from start_kernel+0x94/0x530 with crng_init=0 [ 0.314031] random: crng done (trusting CPU's manufacturer) crash> print crng_node_pool $1 = (struct crng_state ) 0xffff9a915f4014a0 The call to invalidate_batched_entropy() was also missing. This is important for architectures like PPC and S390 which only have the arch_get_random_seed_* functions. Fixes: 39a8883a2b98 ("random: add a config option to trust the CPU's hwrng") Signed-off-by: Jon DeVree <nuxi@vault24.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	random: move rand_initialize() earlier	Kees Cook	2019-04-20	3	-10/+17
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now rand_initialize() is run as an early_initcall(), but it only depends on timekeeping_init() (for mixing ktime_get_real() into the pools). However, the call to boot_init_stack_canary() for stack canary initialization runs earlier, which triggers a warning at boot: random: get_random_bytes called from start_kernel+0x357/0x548 with crng_init=0 Instead, this moves rand_initialize() to after timekeeping_init(), and moves canary initialization here as well. Note that this warning may still remain for machines that do not have UEFI RNG support (which initializes the RNG pools during setup_arch()), or for x86 machines without RDRAND (or booting without "random.trust=on" or CONFIG_RANDOM_TRUST_CPU=y). Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	random: only read from /dev/random after its pool has received 128 bits	Theodore Ts'o	2019-04-17	2	-30/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Immediately after boot, we allow reads from /dev/random before its entropy pool has been fully initialized. Fix this so that we don't allow this until the blocking pool has received 128 bits. We do this by repurposing the initialized flag in the entropy pool struct, and use the initialized flag in the blocking pool to indicate whether it is safe to pull from the blocking pool. To do this, we needed to rework when we decide to push entropy from the input pool to the blocking pool, since the initialized flag for the input pool was used for this purpose. To simplify things, we no longer use the initialized flag for that purpose, nor do we use the entropy_total field any more. Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	drivers/char/random.c: make primary_crng static	Rasmus Villemoes	2019-04-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the definition of struct crng_state is private to random.c, and primary_crng is neither declared or used elsewhere, there's no reason for that symbol to have external linkage. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	drivers/char/random.c: remove unused stuct poolinfo::poolbits	Rasmus Villemoes	2019-04-17	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This field is never used, might as well remove it. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \|	drivers/char/random.c: constify poolinfo_table	Rasmus Villemoes	2019-04-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Never modified, might as well be put in .rodata. Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* \| \|	Merge tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt	Linus Torvalds	2019-05-08	18	-144/+294
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pull fscrypt updates from Ted Ts'o: "Clean up fscrypt's dcache revalidation support, and other miscellaneous cleanups" * tag 'fscrypt_for_linus' of git://git.kernel.org/pub/scm/fs/fscrypt/fscrypt: fscrypt: cache decrypted symlink target in ->i_link vfs: use READ_ONCE() to access ->i_link fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext fscrypt: only set dentry_operations on ciphertext dentries fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory fscrypt: fix race allowing rename() and link() of ciphertext dentries fscrypt: clean up and improve dentry revalidation fscrypt: use READ_ONCE() to access ->i_crypt_info fscrypt: remove WARN_ON_ONCE() when decryption fails fscrypt: drop inode argument from fscrypt_get_ctx()
\| * \| \|	fscrypt: cache decrypted symlink target in ->i_link	Eric Biggers	2019-04-17	6	-7/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Path lookups that traverse encrypted symlink(s) are very slow because each encrypted symlink needs to be decrypted each time it's followed. This also involves dropping out of rcu-walk mode. Make encrypted symlinks faster by caching the decrypted symlink target in ->i_link. The first call to fscrypt_get_symlink() sets it. Then, the existing VFS path lookup code uses the non-NULL ->i_link to take the fast path where ->get_link() isn't called, and lookups in rcu-walk mode remain in rcu-walk mode. Also set ->i_link immediately when a new encrypted symlink is created. To safely free the symlink target after an RCU grace period has elapsed, introduce a new function fscrypt_free_inode(), and make the relevant filesystems call it just before actually freeing the inode. Cc: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	vfs: use READ_ONCE() to access ->i_link	Eric Biggers	2019-04-17	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Use 'READ_ONCE(inode->i_link)' to explicitly support filesystems caching the symlink target in ->i_link later if it was unavailable at iget() time, or wasn't easily available. I'll be doing this in fscrypt, to improve the performance of encrypted symlinks on ext4, f2fs, and ubifs. ->i_link will start NULL and may later be set to a non-NULL value by a smp_store_release() or cmpxchg_release(). READ_ONCE() is needed on the read side. smp_load_acquire() is unnecessary because only a data dependency barrier is required. (Thanks to Al for pointing this out.) Acked-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: fix race where ->lookup() marks plaintext dentry as ciphertext	Eric Biggers	2019-04-17	7	-66/+139
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	->lookup() in an encrypted directory begins as follows: 1. fscrypt_prepare_lookup(): a. Try to load the directory's encryption key. b. If the key is unavailable, mark the dentry as a ciphertext name via d_flags. 2. fscrypt_setup_filename(): a. Try to load the directory's encryption key. b. If the key is available, encrypt the name (treated as a plaintext name) to get the on-disk name. Otherwise decode the name (treated as a ciphertext name) to get the on-disk name. But if the key is concurrently added, it may be found at (2a) but not at (1a). In this case, the dentry will be wrongly marked as a ciphertext name even though it was actually treated as plaintext. This will cause the dentry to be wrongly invalidated on the next lookup, potentially causing problems. For example, if the racy ->lookup() was part of sys_mount(), then the new mount will be detached when anything tries to access it. This is despite the mountpoint having a plaintext path, which should remain valid now that the key was added. Of course, this is only possible if there's a userspace race. Still, the additional kernel-side race is confusing and unexpected. Close the kernel-side race by changing fscrypt_prepare_lookup() to also set the on-disk filename (step 2b), consistent with the d_flags update. Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: only set dentry_operations on ciphertext dentries	Eric Biggers	2019-04-17	1	-2/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Plaintext dentries are always valid, so only set fscrypt_d_ops on ciphertext dentries. Besides marginally improved performance, this allows overlayfs to use an fscrypt-encrypted upperdir, provided that all the following are true: (1) The fscrypt encryption key is placed in the keyring before mounting overlayfs, and remains while the overlayfs is mounted. (2) The overlayfs workdir uses the same encryption policy. (3) No dentries for the ciphertext names of subdirectories have been created in the upperdir or workdir yet. (Since otherwise d_splice_alias() will reuse the old dentry with ->d_op set.) One potential use case is using an ephemeral encryption key to encrypt all files created or changed by a container, so that they can be securely erased ("crypto-shredded") after the container stops. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fs, fscrypt: clear DCACHE_ENCRYPTED_NAME when unaliasing directory	Eric Biggers	2019-04-17	2	-0/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make __d_move() clear DCACHE_ENCRYPTED_NAME on the source dentry. This is needed for when d_splice_alias() moves a directory's encrypted alias to its decrypted alias as a result of the encryption key being added. Otherwise, the decrypted alias will incorrectly be invalidated on the next lookup, causing problems such as unmounting a mount the user just mount()ed there. Note that we don't have to support arbitrary moves of this flag because fscrypt doesn't allow dentries with DCACHE_ENCRYPTED_NAME to be the source or target of a rename(). Fixes: 28b4c263961c ("ext4 crypto: revalidate dentry after adding or removing the key") Reported-by: Sarthak Kukreti <sarthakkukreti@chromium.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: fix race allowing rename() and link() of ciphertext dentries	Eric Biggers	2019-04-17	2	-5/+16
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Close some race conditions where fscrypt allowed rename() and link() on ciphertext dentries that had been looked up just prior to the key being concurrently added. It's better to return -ENOKEY in this case. This avoids doing the nonsensical thing of encrypting the names a second time when searching for the actual on-disk dir entries. It also guarantees that DCACHE_ENCRYPTED_NAME dentries are never rename()d, so the dcache won't have support all possible combinations of moving DCACHE_ENCRYPTED_NAME around during __d_move(). Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: clean up and improve dentry revalidation	Eric Biggers	2019-04-17	4	-35/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make various improvements to fscrypt dentry revalidation: - Don't try to handle the case where the per-directory key is removed, as this can't happen without the inode (and dentries) being evicted. - Flag ciphertext dentries rather than plaintext dentries, since it's ciphertext dentries that need the special handling. - Avoid doing unnecessary work for non-ciphertext dentries. - When revalidating ciphertext dentries, try to set up the directory's i_crypt_info to make sure the key is really still absent, rather than invalidating all negative dentries as the previous code did. An old comment suggested we can't do this for locking reasons, but AFAICT this comment was outdated and it actually works fine. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: use READ_ONCE() to access ->i_crypt_info	Eric Biggers	2019-04-17	5	-9/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	->i_crypt_info starts out NULL and may later be locklessly set to a non-NULL value by the cmpxchg() in fscrypt_get_encryption_info(). But ->i_crypt_info is used directly, which technically is incorrect. It's a data race, and it doesn't include the data dependency barrier needed to safely dereference the pointer on at least one architecture. Fix this by using READ_ONCE() instead. Note: we don't need to use smp_load_acquire(), since dereferencing the pointer only requires a data dependency barrier, which is already included in READ_ONCE(). We also don't need READ_ONCE() in places where ->i_crypt_info is unconditionally dereferenced, since it must have already been checked. Also downgrade the cmpxchg() to cmpxchg_release(), since RELEASE semantics are sufficient on the write side. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: remove WARN_ON_ONCE() when decryption fails	Eric Biggers	2019-04-17	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If decrypting a block fails, fscrypt did a WARN_ON_ONCE(). But WARN is meant for kernel bugs, which this isn't; this could be hit by fuzzers using fault injection, for example. Also, there is already a proper warning message logged in fscrypt_do_page_crypto(), so the WARN doesn't add much. Just remove the unnessary WARN. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	fscrypt: drop inode argument from fscrypt_get_ctx()	Eric Biggers	2019-04-17	4	-16/+9
\| \|/ / \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The only reason the inode is being passed to fscrypt_get_ctx() is to verify that the encryption key is available. However, all callers already ensure this because if we get as far as trying to do I/O to an encrypted file without the key, there's already a bug. Therefore, remove this unnecessary argument. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
* \| \|	Merge tag 'ext4_for_linus' of ↵	Linus Torvalds	2019-05-08	35	-59/+9591
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: "Add as a feature case-insensitive directories (the casefold feature) using Unicode 12.1. Also, the usual largish number of cleanups and bug fixes" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (25 commits) ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present ext4: fix ext4_show_options for file systems w/o journal unicode: refactor the rule for regenerating utf8data.h docs: ext4.rst: document case-insensitive directories ext4: Support case-insensitive file name lookups ext4: include charset encoding information in the superblock MAINTAINERS: add Unicode subsystem entry unicode: update unicode database unicode version 12.1.0 unicode: introduce test module for normalized utf8 implementation unicode: implement higher level API for string handling unicode: reduce the size of utf8data[] unicode: introduce code for UTF-8 normalization unicode: introduce UTF-8 character database ext4: actually request zeroing of inode table after grow ext4: cond_resched in work-heavy group loops ext4: fix use-after-free race with debug_want_extra_isize ext4: avoid drop reference to iloc.bh twice ext4: ignore e_value_offs for xattrs with value-in-ea-inode ext4: protect journal inode's blocks using block_validity ext4: use BUG() instead of BUG_ON(1) ...
\| * \| \|	ext4: export /sys/fs/ext4/feature/casefold if Unicode support is present	Theodore Ts'o	2019-05-06	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	ext4: fix ext4_show_options for file systems w/o journal	Debabrata Banerjee	2019-05-01	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of removing EXT4_MOUNT_JOURNAL_CHECKSUM from s_def_mount_opt as I assume was intended, all other options were blown away leading to _ext4_show_options() output being incorrect. Fixes: 1e381f60dad9 ("ext4: do not allow journal_opts for fs w/o journal") Signed-off-by: Debabrata Banerjee <dbanerje@akamai.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org
\| * \| \|	unicode: refactor the rule for regenerating utf8data.h	Masahiro Yamada	2019-04-28	7	-17/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	scripts/mkutf8data is used only when regenerating utf8data.h, which never happens in the normal kernel build. However, it is irrespectively built if CONFIG_UNICODE is enabled. Moreover, there is no good reason for it to reside in the scripts/ directory since it is only used in fs/unicode/. Hence, move it from scripts/ to fs/unicode/. In some cases, we bypass build artifacts in the normal build. The conventional way to do so is to surround the code with ifdef REGENERATE_. For example, - 7373f4f83c71 ("kbuild: add implicit rules for parser generation") - 6aaf49b495b4 ("crypto: arm,arm64 - Fix random regeneration of S_shipped") I rewrote the rule in a more kbuild'ish style. In the normal build, utf8data.h is just shipped from the check-in file. $ make [ snip ] SHIPPED fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a If you want to generate utf8data.h based on UCD, put .txt files into fs/unicode/, then pass REGENERATE_UTF8DATA=1 from the command line. The mkutf8data tool will be automatically compiled to generate the utf8data.h from the *.txt files. $ make REGENERATE_UTF8DATA=1 [ snip ] HOSTCC fs/unicode/mkutf8data GEN fs/unicode/utf8data.h CC fs/unicode/utf8-norm.o CC fs/unicode/utf8-core.o CC fs/unicode/utf8-selftest.o AR fs/unicode/built-in.a I renamed the check-in utf8data.h to utf8data.h_shipped so that this will work for the out-of-tree build. You can update it based on the latest UCD like this: $ make REGENERATE_UTF8DATA=1 fs/unicode/ $ cp fs/unicode/utf8data.h fs/unicode/utf8data.h_shipped Also, I added entries to .gitignore and dontdiff. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	docs: ext4.rst: document case-insensitive directories	Gabriel Krisman Bertazi	2019-04-25	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Introduces the case-insensitive features on ext4 for system administrators. Explain the minimum of design decisions that are important for sysadmins wanting to enable this feature. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	ext4: Support case-insensitive file name lookups	Gabriel Krisman Bertazi	2019-04-25	10	-21/+223
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch implements the actual support for case-insensitive file name lookups in ext4, based on the feature bit and the encoding stored in the superblock. A filesystem that has the casefold feature set is able to configure directories with the +F (EXT4_CASEFOLD_FL) attribute, enabling lookups to succeed in that directory in a case-insensitive fashion, i.e: match a directory entry even if the name used by userspace is not a byte per byte match with the disk name, but is an equivalent case-insensitive version of the Unicode string. This operation is called a case-insensitive file name lookup. The feature is configured as an inode attribute applied to directories and inherited by its children. This attribute can only be enabled on empty directories for filesystems that support the encoding feature, thus preventing collision of file names that only differ by case. * dcache handling: For a +F directory, Ext4 only stores the first equivalent name dentry used in the dcache. This is done to prevent unintentional duplication of dentries in the dcache, while also allowing the VFS code to quickly find the right entry in the cache despite which equivalent string was used in a previous lookup, without having to resort to ->lookup(). d_hash() of casefolded directories is implemented as the hash of the casefolded string, such that we always have a well-known bucket for all the equivalencies of the same string. d_compare() uses the utf8_strncasecmp() infrastructure, which handles the comparison of equivalent, same case, names as well. For now, negative lookups are not inserted in the dcache, since they would need to be invalidated anyway, because we can't trust missing file dentries. This is bad for performance but requires some leveraging of the vfs layer to fix. We can live without that for now, and so does everyone else. * on-disk data: Despite using a specific version of the name as the internal representation within the dcache, the name stored and fetched from the disk is a byte-per-byte match with what the user requested, making this implementation 'name-preserving'. i.e. no actual information is lost when writing to storage. DX is supported by modifying the hashes used in +F directories to make them case/encoding-aware. The new disk hashes are calculated as the hash of the full casefolded string, instead of the string directly. This allows us to efficiently search for file names in the htree without requiring the user to provide an exact name. * Dealing with invalid sequences: By default, when a invalid UTF-8 sequence is identified, ext4 will treat it as an opaque byte sequence, ignoring the encoding and reverting to the old behavior for that unique file. This means that case-insensitive file name lookup will not work only for that file. An optional bit can be set in the superblock telling the filesystem code and userspace tools to enforce the encoding. When that optional bit is set, any attempt to create a file name using an invalid UTF-8 sequence will fail and return an error to userspace. * Normalization algorithm: The UTF-8 algorithms used to compare strings in ext4 is implemented lives in fs/unicode, and is based on a previous version developed by SGI. It implements the Canonical decomposition (NFD) algorithm described by the Unicode specification 12.1, or higher, combined with the elimination of ignorable code points (NFDi) and full case-folding (CF) as documented in fs/unicode/utf8_norm.c. NFD seems to be the best normalization method for EXT4 because: - It has a lower cost than NFC/NFKC (which requires decomposing to NFD as an intermediary step) - It doesn't eliminate important semantic meaning like compatibility decompositions. Although: - This implementation is not completely linguistic accurate, because different languages have conflicting rules, which would require the specialization of the filesystem to a given locale, which brings all sorts of problems for removable media and for users who use more than one language. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
\| * \| \|	ext4: include charset encoding information in the superblock	Gabriel Krisman Bertazi	2019-04-25	2	-1/+105
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for encoding is considered an incompatible feature, since it has potential to create collisions of file names in existing filesystems. If the feature flag is not enabled, the entire filesystem will operate on opaque byte sequences, respecting the original behavior. The s_encoding field stores a magic number indicating the encoding format and version used globally by file and directory names in the filesystem. The s_encoding_flags defines policies for using the charset encoding, like how to handle invalid sequences. The magic number is mapped to the exact charset table, but the mapping is specific to ext4. Since we don't have any commitment to support old encodings, the only encoding I am supporting right now is utf8-12.1.0. The current implementation prevents the user from enabling encoding and per-directory encryption on the same filesystem at the same time. The incompatibility between these features lies in how we do efficient directory searches when we cannot be sure the encryption of the user provided fname will match the actual hash stored in the disk without decrypting every directory entry, because of normalization cases. My quickest solution is to simply block the concurrent use of these features for now, and enable it later, once we have a better solution. Signed-off-by: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Signed-off-by: Theodore Ts'o <tytso@mit.edu>