Commit message  Author  Age  Files  Lines
* nvme: add a newline to the 'tls_key' sysfs attribute  Hannes Reinecke  2024-08-22  1  -1/+1

    Print a newline for easier userspace handling.

    Signed-off-by: Hannes Reinecke <hare@kernel.org>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme-tcp: check for invalidated or revoked key  Hannes Reinecke  2024-08-22  5  -3/+30

    key_lookup() will always return a key, even if that key is revoked
    or invalidated. So check for invalid keys before continuing.

    Signed-off-by: Hannes Reinecke <hare@kernel.org>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme-tcp: sanitize TLS key handling  Hannes Reinecke  2024-08-22  4  -17/+43

    There is a difference between TLS configured (i.e. the user has
    provisioned/requested a key) and TLS enabled (i.e. the connection
    is encrypted with TLS). This becomes important for secure
    concatenation, where the initial authentication is run on an
    unencrypted connection (i.e. with TLS configured, but not enabled),
    and then the queue is reset to run over TLS (i.e. TLS configured
    _and_ enabled).

    So to differentiate between those two states, store the generated
    key in opts->tls_key (as we're using the same TLS key for all
    queues), the key serial of the resulting TLS handshake in
    ctrl->tls_pskid (to signal that TLS on the admin queue is enabled),
    and a simple flag for the queues to indicate that TLS has been
    enabled.

    Signed-off-by: Hannes Reinecke <hare@kernel.org>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
* nvme-keyring: restrict match length for version '1' identifiers  Hannes Reinecke  2024-08-22  1  -10/+26

    TP8018 introduced a new TLS PSK identifier version (version 1),
    which appended a PSK hash value to the existing identifier (cf NVMe
    TCP specification v1.1, section 3.6.1.3 'TLS PSK and PSK Identity
    Derivation'). An original (version 0) identifier has the form:

        NVMe0<type><hmac> <hostnqn> <subsysnqn>

    and a version 1 identifier has the form:

        NVMe1<type><hmac> <hostnqn> <subsysnqn> <hash>

    This patch modifies the lookup algorithm to compare only the first
    part of the identifier (excluding the hash value) to handle both
    version 0 and version 1 identifiers. And the spec declares
    'version 0' identifiers obsolete, so the lookup algorithm is
    modified to prefer v1 identifiers.

    Signed-off-by: Hannes Reinecke <hare@kernel.org>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: Keith Busch <kbusch@kernel.org>
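
    The restricted match described above can be illustrated with a small
    userspace sketch. This is not the kernel's code: psk_id_matches() is a
    hypothetical helper, and the identity strings in the usage note are made
    up. It only shows the idea of comparing the leading part of the
    identifier so that a trailing hash field is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (an assumption for illustration, not a kernel API):
 * returns true when the stored key description 'desc' matches the lookup
 * identity 'match'. Only the leading part (version, type, hmac, host NQN,
 * subsystem NQN) is compared, so a version 1 description that carries a
 * trailing " <hash>" field still matches a lookup string without a hash.
 */
static bool psk_id_matches(const char *desc, const char *match)
{
	size_t match_len;

	assert(desc != NULL && match != NULL);
	match_len = strlen(match);

	/* The description must start with the lookup string ... */
	if (strncmp(desc, match, match_len) != 0)
		return false;

	/* ... and either end there or continue with a new field. */
	return desc[match_len] == '\0' || desc[match_len] == ' ';
}
```

    With this sketch, a lookup for "NVMe1R01 host1 subsys1" accepts the
    description "NVMe1R01 host1 subsys1 abcd" but rejects
    "NVMe1R01 host1 subsys10 abcd", because the match must end on a field
    boundary.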
* nvme_core: scan namespaces asynchronously  Stuart Hayes  2024-08-22  1  -1/+39

    Use async function calls to make namespace scanning happen in
    parallel.

    Without the patch, NVME namespaces are scanned serially, so it can
    take a long time for all of a controller's namespaces to become
    available, especially with a slower (TCP) interface with large
    number of namespaces.

    It is not uncommon to have large numbers (hundreds or thousands) of
    namespaces on nvme-of with storage servers.

    The time it took for all namespaces to show up after connecting
    (via TCP) to a controller with 1002 namespaces was measured on one
    system:

    network latency   without patch   with patch
    0                 6s              1s
    50ms              210s            10s
    100ms             417s            18s

    Measurements taken on another system show the effect of the patch
    on the time nvme_scan_work() took to complete, when connecting to a
    linux nvme-of target with varying numbers of namespaces, on a
    network of 400us.

    namespaces   without patch   with patch
    1            16ms            14ms
    2            24ms            16ms
    4            49ms            22ms
    8            101ms           33ms
    16           207ms           56ms
    100          1.4s            0.6s
    1000         12.9s           2.0s

    On the same system, connecting to a local PCIe NVMe drive (a
    Samsung PM1733) instead of a network target:

    namespaces   without patch   with patch
    1            13ms            12ms
    2            41ms            13ms

    Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
* blk-cgroup: Remove unused declaration blkg_path()  Yue Haibing  2024-08-16  1  -1/+0

    Commit bb7e5a193d8b ("block, bfq: remove blkg_path()") removed the
    implementation but left the declaration.

    Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
    Link: https://lore.kernel.org/r/20240816095821.877842-1-yuehaibing@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: constify ext_pi_ref_escape()  Alexey Dobriyan  2024-08-13  1  -2/+2

    This function doesn't mutate data.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/d24611b3-dddf-473a-903d-39290db03b11@p183
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* block: delete module stuff from t10-pi  Alexey Dobriyan  2024-08-13  1  -4/+0

    It is not possible to build t10-pi.ko anymore.

    This file doesn't even export functions.

    Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
    Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Link: https://lore.kernel.org/r/216ccc79-5b80-47b2-b507-990951aa810c@p183
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* ublk: move zone report data out of request pdu  Ming Lei  2024-08-12  1  -16/+46

    ublk zoned takes 16 bytes in each request pdu just to handle the
    REPORT_ZONE operation, which wastes memory because the request pdu
    is allocated statically.

    Store the transient zone report data in one global xarray, and
    remove it after the report zone request is completed. This is
    reasonable since report zone runs in a slow code path.

    Fixes: 29802d7ca33b ("ublk: enable zoned storage support")
    Cc: Damien Le Moal <dlemoal@kernel.org>
    Cc: Andreas Hindborg <a.hindborg@samsung.com>
    Signed-off-by: Ming Lei <ming.lei@redhat.com>
    Link: https://lore.kernel.org/r/20240812013624.587587-1-ming.lei@redhat.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: Remove unused extern declarations  YueHaibing  2024-08-02  1  -10/+0

    Commit b411b3637fa7 ("The DRBD driver") declared but never
    implemented drbd_read_remote(), is_valid_ar_handle() and
    drbd_set_recv_tcq(). And commit 668700b40a7c ("drbd: Create a
    dedicated workqueue for sending acks on the control connection")
    never implemented drbd_send_ping_wf().

    Commit 2451fc3b2bd3 ("drbd: Removed the BIO_RW_BARRIER support form
    the receiver/epoch code") left the w_e_reissue() declaration unused.

    Commit 8fe605513ab4 ("drbd: Rename drbdd_init() -> drbd_receiver()")
    renamed drbdd_init() and left an unused declaration.

    Also, drbd_asender() was removed in commit 1c03e52083c8 ("drbd:
    Rename asender to ack_receiver").

    Signed-off-by: YueHaibing <yuehaibing@huawei.com>
    Link: https://lore.kernel.org/r/20240802095147.2788218-1-yuehaibing@huawei.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nbd: add support for rotational devices  Wouter Verhelst  2024-07-29  2  -1/+5

    The NBD protocol defines the flag NBD_FLAG_ROTATIONAL to flag that
    the export in use should be treated as a rotational device.

    Add support for that flag to the kernel driver.

    Signed-off-by: Wouter Verhelst <w@uter.be>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Link: https://lore.kernel.org/r/20240725164536.1275851-1-w@uter.be
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* drbd: use sendpages_ok() instead of sendpage_ok()  Ofir Gal  2024-07-29  1  -1/+1

    Currently _drbd_send_page() uses sendpage_ok() in order to enable
    MSG_SPLICE_PAGES. It checks only the first page of the iterator,
    but the iterator may represent contiguous pages.

    MSG_SPLICE_PAGES enables skb_splice_from_iter(), which checks all
    the pages it sends with sendpage_ok().

    When _drbd_send_page() sends an iterator whose first page is
    sendable but one of the other pages isn't, skb_splice_from_iter()
    warns and aborts the data transfer.

    Using the new helper sendpages_ok() in order to enable
    MSG_SPLICE_PAGES solves the issue.

    Acked-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com>
    Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
    Link: https://lore.kernel.org/r/20240718084515.3833733-4-ofir.gal@volumez.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* nvme-tcp: use sendpages_ok() instead of sendpage_ok()  Ofir Gal  2024-07-29  1  -1/+1

    Currently nvme_tcp_try_send_data() uses sendpage_ok() in order to
    disable MSG_SPLICE_PAGES. It checks only the first page of the
    iterator, but the iterator may represent contiguous pages.

    MSG_SPLICE_PAGES enables skb_splice_from_iter(), which checks all
    the pages it sends with sendpage_ok().

    When nvme_tcp_try_send_data() sends an iterator whose first page is
    sendable but one of the other pages isn't, skb_splice_from_iter()
    warns and aborts the data transfer.

    Using the new helper sendpages_ok() in order to disable
    MSG_SPLICE_PAGES solves the issue.

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Hannes Reinecke <hare@suse.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
    Link: https://lore.kernel.org/r/20240718084515.3833733-3-ofir.gal@volumez.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* net: introduce helper sendpages_ok()  Ofir Gal  2024-07-29  1  -0/+19

    Network drivers are using sendpage_ok() to check the first page of
    an iterator in order to disable MSG_SPLICE_PAGES. The iterator can
    represent a list of contiguous pages.

    When MSG_SPLICE_PAGES is enabled, skb_splice_from_iter() is used,
    and it requires all pages in the iterator to be sendable. Therefore
    each page needs to be checked.

    The patch introduces a helper sendpages_ok(), which returns true if
    all the contiguous pages are sendable.

    Drivers that want to send contiguous pages with MSG_SPLICE_PAGES
    may use this helper to check whether the page list is OK. If the
    helper does not return true, the driver should remove the
    MSG_SPLICE_PAGES flag.

    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
    Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
    Acked-by: Jakub Kicinski <kuba@kernel.org>
    Link: https://lore.kernel.org/r/20240718084515.3833733-2-ofir.gal@volumez.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
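
    The difference between checking the first page and checking every page
    can be modelled in userspace. The sketch below is not the kernel helper:
    struct model_page, model_sendpage_ok(), model_sendpages_ok() and
    MODEL_PAGE_SIZE are stand-ins invented for illustration, and the page
    size is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_PAGE_SIZE 4096u	/* assumed page size for this model */

/* Hypothetical stand-in for struct page: it only records sendability. */
struct model_page {
	bool sendable;
};

/* Stand-in for the per-page check. */
static bool model_sendpage_ok(const struct model_page *page)
{
	return page->sendable;
}

/*
 * Returns true only if every page that covers the byte range
 * [offset, offset + len) of the contiguous array 'pages' passes the
 * per-page check. An empty range is trivially OK.
 */
static bool model_sendpages_ok(const struct model_page *pages,
			       size_t len, size_t offset)
{
	size_t first, last, i;

	assert(pages != NULL);
	if (len == 0)
		return true;

	first = offset / MODEL_PAGE_SIZE;
	last = (offset + len - 1) / MODEL_PAGE_SIZE;
	for (i = first; i <= last; i++) {
		if (!model_sendpage_ok(&pages[i]))
			return false;
	}
	return true;
}
```

    For an array whose third page is not sendable, a first-page check passes
    for any length, while model_sendpages_ok() fails as soon as the range
    reaches the third page. In that case a caller following the message
    above would clear MSG_SPLICE_PAGES before sending.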
* blk-ioprio: remove per-disk structure  Yu Kuai  2024-07-29  3  -62/+0

    ioprio works on the blk-cgroup level: all disks in the same cgroup
    are treated the same, and the struct ioprio_blkg doesn't have
    anything in it. Hence registering the policy is enough, because
    cpd_alloc/free_fn will be handled for each blk-cgroup, and there is
    no need to activate the policy for each disk. Hence remove
    blk_ioprio_init/exit and ioprio_alloc/free_pd.

    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20240719071506.158075-4-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-ioprio: remove ioprio_blkcg_from_bio()  Yu Kuai  2024-07-29  1  -11/+1

    Currently, if the config is enabled, ioprio is always enabled by
    default from blkcg_init_disk(), hence there is no point in checking
    whether the policy is enabled from blkg in ioprio_blkcg_from_bio().
    Hence remove it and get the blkcg directly from the bio.

    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Reviewed-by: Bart Van Assche <bvanassche@acm.org>
    Link: https://lore.kernel.org/r/20240719071506.158075-3-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* blk-cgroup: check for pd_(alloc|free)_fn in blkcg_activate_policy()  Yu Kuai  2024-07-29  1  -2/+13

    Currently all policies implement pd_(alloc|free)_fn. However, this
    is not necessary for ioprio, which only works for blkcg, not blkg.

    There are no functional changes; this prepares for cleaning up the
    activation of the ioprio policy.

    Signed-off-by: Yu Kuai <yukuai3@huawei.com>
    Reviewed-by: Christoph Hellwig <hch@lst.de>
    Acked-by: Tejun Heo <tj@kernel.org>
    Link: https://lore.kernel.org/r/20240719071506.158075-2-yukuai1@huaweicloud.com
    Signed-off-by: Jens Axboe <axboe@kernel.dk>
* Linux 6.11-rc1 [v6.11-rc1]  Linus Torvalds  2024-07-28  1  -2/+2
|
* Merge tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild  Linus Torvalds  2024-07-28  4  -4/+4
|\
    Pull Kbuild fixes from Masahiro Yamada:

     - Fix RPM package build error caused by an incorrect locale setup

     - Mark modules.weakdep as ghost in RPM package

     - Fix the odd combination of -S and -c in stack protector scripts,
       which is an error with the latest Clang

    * tag 'kbuild-fixes-v6.11' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild:
      kbuild: Fix '-S -c' in x86 stack protector scripts
      kbuild: rpm-pkg: ghost modules.weakdep file
      kbuild: rpm-pkg: Fix C locale setup
| * kbuild: Fix '-S -c' in x86 stack protector scripts  Nathan Chancellor  2024-07-28  2  -2/+2

    After a recent change in clang to stop consuming all instances of
    '-S' and '-c' [1], the stack protector scripts break due to the
    kernel's use of -Werror=unused-command-line-argument to catch cases
    where flags are not being properly consumed by the compiler driver:

      $ echo | clang -o - -x c - -S -c -Werror=unused-command-line-argument
      clang: error: argument unused during compilation: '-c' [-Werror,-Wunused-command-line-argument]

    This results in CONFIG_STACKPROTECTOR getting disabled because
    CONFIG_CC_HAS_SANE_STACKPROTECTOR is no longer set.

    '-c' and '-S' both instruct the compiler to stop at different
    stages of the pipeline ('-S' after compiling, '-c' after
    assembling), so having them present together in the same command
    makes little sense. In this case, the test wants to stop before
    assembling because it is looking at the textual assembly output of
    the compiler for either '%fs' or '%gs', so remove '-c' from the
    list of arguments to resolve the error.

    All versions of GCC continue to work after this change, along with
    versions of clang that do or do not contain the change mentioned
    above.

    Cc: stable@vger.kernel.org
    Fixes: 4f7fd4d7a791 ("[PATCH] Add the -fstack-protector option to the CFLAGS")
    Fixes: 60a5317ff0f4 ("x86: implement x86_32 stack protector")
    Link: https://github.com/llvm/llvm-project/commit/6461e537815f7fa68cef06842505353cf5600e9c [1]
    Signed-off-by: Nathan Chancellor <nathan@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * kbuild: rpm-pkg: ghost modules.weakdep file  Jose Ignacio Tornos Martinez  2024-07-28  1  -1/+1

    In the same way as for other similar files, mark as ghost the new
    file generated by depmod for configured weak dependencies for
    modules, modules.weakdep, so that the package claims ownership of
    it even though the file is not included in the package.

    Signed-off-by: Jose Ignacio Tornos Martinez <jtornosm@redhat.com>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
| * kbuild: rpm-pkg: Fix C locale setup  Petr Vorel  2024-07-24  1  -1/+1

    Semicolon separation in LC_ALL is wrong. Either the variable needs
    to be exported beforehand as a separate command, or it needs to be
    set at the beginning of the command. The second variant is used.

    This fixes a broken build on a user's locale setup which makes the
    'date' binary produce invalid characters in the rpm changelog (e.g.
    cs_CZ.UTF-8 'čec'):

      $ make binrpm-pkg
        GEN     rpmbuild/SPECS/kernel.spec
      rpmbuild -bb rpmbuild/SPECS/kernel.spec --define='_topdirlinux/rpmbuild' \
          --target x86_64-linux --build-in-place --noprep --define='_smp_mflags \
          %{nil}' $(rpm -q rpm >/dev/null 2>&1 || echo --nodeps)
      Building target platforms: x86_64-linux
      Building for target x86_64-linux
      error: bad date in %changelog: St čec 24 2024 user <user@somehost>
      make[2]: *** [scripts/Makefile.package:71: binrpm-pkg] Error 1
      make[1]: *** [linux/Makefile:1546: binrpm-pkg] Error 2
      make: *** [Makefile:224: __sub-make] Error 2

    Fixes: 301c10908e42 ("kbuild: rpm-pkg: introduce a simple changelog section for kernel.spec")
    Signed-off-by: Petr Vorel <pvorel@suse.cz>
    Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
    Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
* | minmax: simplify and clarify min_t()/max_t() implementation  Linus Torvalds  2024-07-28  1  -8/+11

    This simplifies the min_t() and max_t() macros by no longer making
    them work in the context of a C constant expression.

    That means that you can no longer use them for static initializers
    or for array sizes in type definitions, but there were only a
    couple of such uses, and all of them were converted (famous last
    words) to use MIN_T/MAX_T instead.

    Cc: David Laight <David.Laight@aculab.com>
    Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | minmax: add a few more MIN_T/MAX_T users  Linus Torvalds  2024-07-28  7  -10/+10

    Commit 3a7e02c040b1 ("minmax: avoid overly complicated constant
    expressions in VM code") added the simpler MIN_T/MAX_T macros in
    order to avoid some excessive expansion from the rather complicated
    regular min/max macros.

    The complexity of those macros stems from two issues:

     (a) trying to use them in situations that require a C constant
         expression (in static initializers and for array sizes)

     (b) the type sanity checking

    and MIN_T/MAX_T avoids both of these issues.

    Now, in the whole (long) discussion about all this, it was pointed
    out that the whole type sanity checking is entirely unnecessary for
    min_t/max_t which get a fixed type that the comparison is done in.

    But that still leaves min_t/max_t unnecessarily complicated due to
    worries about the C constant expression case.

    However, it turns out that there really aren't very many cases that
    use min_t/max_t for this, and we can just force-convert those.

    This does exactly that.

    Which in turn will then allow for much simpler implementations of
    min_t()/max_t(). All the usual "macros in all upper case will
    evaluate the arguments multiple times" rules apply.

    We should do all the same things for the regular min/max() vs
    MIN/MAX() cases, but that has the added complexity of various
    drivers defining their own local versions of MIN/MAX, so that needs
    another level of fixes first.

    Link: https://lore.kernel.org/all/b47fad1d0cf8449886ad148f8c013dae@AcuMS.aculab.com/
    Cc: David Laight <David.Laight@aculab.com>
    Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
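
    The trade-off described above can be shown with a small standalone
    sketch. The macro bodies below are simplified assumptions written for
    illustration, not copies of the kernel headers; scratch, min_uint(),
    next_counter and next_value() are hypothetical names.

```c
#include <assert.h>

/*
 * Simplified upper-case forms: a plain conditional expression on the
 * casted arguments. They are valid in C constant expressions, but each
 * argument may be evaluated more than once.
 */
#define MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))
#define MAX_T(type, a, b) ((type)(a) > (type)(b) ? (type)(a) : (type)(b))

/* Usable where C requires a constant expression, e.g. an array size. */
static char scratch[MAX_T(unsigned int, 16, 64)];
static_assert(sizeof(scratch) == 64, "array size comes from MAX_T");

/*
 * A function-based counterpart (hypothetical) evaluates each argument
 * exactly once, but a call to it is not a constant expression.
 */
static inline unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Helper with a side effect, used to make double evaluation visible. */
static unsigned int next_counter;

static unsigned int next_value(void)
{
	return next_counter++;
}
```

    Calling MIN_T(unsigned int, next_value(), 10) in this sketch invokes
    next_value() twice, once for the comparison and once for the result,
    which is the "evaluates the arguments multiple times" rule the message
    mentions. min_uint(next_value(), 10) would invoke it once.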
* | Merge tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs  Linus Torvalds  2024-07-28  21  -214/+135
|\ \
    Pull UBI and UBIFS updates from Richard Weinberger:

     - Many fixes for power-cut issues by Zhihao Cheng

     - Another ubiblock error path fix

     - ubiblock section mismatch fix

     - Misc fixes all over the place

    * tag 'ubifs-for-linus-6.11-rc1-take2' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/ubifs:
      ubi: Fix ubi_init() ubiblock_exit() section mismatch
      ubifs: add check for crypto_shash_tfm_digest
      ubifs: Fix inconsistent inode size when powercut happens during appendant writing
      ubi: block: fix null-pointer-dereference in ubiblock_create()
      ubifs: fix kernel-doc warnings
      ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity
      mtd: ubi: Restore missing cleanup on ubi_init() failure path
      ubifs: dbg_orphan_check: Fix missed key type checking
      ubifs: Fix unattached inode when powercut happens in creating
      ubifs: Fix space leak when powercut happens in linking tmpfile
      ubifs: Move ui->data initialization after initializing security
      ubifs: Fix adding orphan entry twice for the same inode
      ubifs: Remove insert_dead_orphan from replaying orphan process
      Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"
      ubifs: Don't add xattr inode into orphan area
      ubifs: Fix unattached xattr inode if powercut happens after deleting
      mtd: ubi: avoid expensive do_div() on 32-bit machines
      mtd: ubi: make ubi_class constant
      ubi: eba: properly rollback inside self_check_eba
| * | ubi: Fix ubi_init() ubiblock_exit() section mismatch  Richard Weinberger  2024-07-28  1  -1/+1

    Since ubiblock_exit() is now called from an init function, the
    __exit section no longer makes sense.

    Cc: Ben Hutchings <bwh@kernel.org>
    Reported-by: kernel test robot <lkp@intel.com>
    Closes: https://lore.kernel.org/oe-kbuild-all/202407131403.wZJpd8n2-lkp@intel.com/
    Signed-off-by: Richard Weinberger <richard@nod.at>
    Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
| * | ubifs: add check for crypto_shash_tfm_digest  Chen Ni  2024-07-12  1  -1/+4

    Add check for the return value of crypto_shash_tfm_digest() and
    return the error if it fails in order to catch the error.

    Fixes: 817aa094842d ("ubifs: support offline signed images")
    Signed-off-by: Chen Ni <nichen@iscas.ac.cn>
    Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix inconsistent inode size when powercut happens during appendant writing  Zhihao Cheng  2024-07-12  1  -1/+1

    UBIFS always makes sure that the data length won't go beyond the
    inode size by writing the inode before writing the page (see
    ubifs_writepage). After commit c35acef383f4a2f2cfc30 ("ubifs:
    Convert ubifs_writepage to use a folio"), the rule is broken in one
    case: given a file with size 3, then writing 4096 bytes from offset
    0, the following process will make the inode size smaller than the
    file data length after powercut & recovery:

             P1                                  P2
    ubifs_writepage
     len = folio_size(folio) // 4096
     if (folio_pos(folio) + len <= i_size) // condition 1: 0 + 4096 <= 4096
     //(i_size is updated as 4096 in ubifs_write_end)
      if (folio_pos(folio) >= synced_i_size) // condition 2: 0 >= 3, false
       write_inode // Skipped, because condition 2 is false
      do_writepage(folio, len) // write one page
                                          do_commit // data node won't be replayed in next mounting
           >> Powercut <<

    So, the inode size (4096) is not written to disk, and we will get
    the following error messages on the next mount (chk_fs = 1):

     check_leaf [ubifs]: data node at LEB 14:2048 is not within inode size 3
     dbg_walk_index [ubifs]: leaf checking function returned error -22, for leaf at LEB 14:2048

    Fix it by restoring the original comparison for condition 2
    (compare the page index of synced_i_size with the current page
    index).

    Fixes: c35acef383f4 ("ubifs: Convert ubifs_writepage to use a folio")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=218934
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
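
    The two forms of condition 2 can be compared in a standalone sketch.
    The function names and MODEL_PAGE_SHIFT are assumptions made for
    illustration (4096-byte pages, single-page folios), and the index-based
    form is written from the description in the message rather than copied
    from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumed 4096-byte pages */
#define MODEL_PAGE_SIZE (UINT64_C(1) << MODEL_PAGE_SHIFT)

/* Page index of a page-aligned byte position. */
static uint64_t folio_index_of(uint64_t folio_pos)
{
	assert((folio_pos & (MODEL_PAGE_SIZE - 1)) == 0);
	return folio_pos >> MODEL_PAGE_SHIFT;
}

/* Byte-offset comparison, as quoted in the message (the broken form). */
static bool inode_write_needed_by_offset(uint64_t folio_pos,
					 uint64_t synced_i_size)
{
	return folio_pos >= synced_i_size;
}

/* Page-index comparison, as described by the fix. */
static bool inode_write_needed_by_index(uint64_t folio_index,
					uint64_t synced_i_size)
{
	return folio_index >= (synced_i_size >> MODEL_PAGE_SHIFT);
}
```

    For the case in the message (folio at position 0, synced_i_size 3), the
    byte-offset form evaluates 0 >= 3 and skips the inode write, while the
    page-index form evaluates 0 >= 0 and writes the inode first.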
| * | ubi: block: fix null-pointer-dereference in ubiblock_create()  Li Nan  2024-07-12  1  -3/+4

    Similar to commit adbf4c4954e3 ("ubi: block: fix memleak in
    ubiblock_create()"), 'dev->gd' is not assigned but dereferenced if
    blk_mq_alloc_tag_set() fails, and leading to a
    null-pointer-dereference. Fix it by using pr_err() and variable
    'dev' to print error log.

    Additionally, the log in the error handle path of idr_alloc() has
    been improved by using pr_err(), too. Before initializing device
    name, using dev_err() will print error log with 'null' instead of
    the actual device name, like this:

      block (null): ...
            ~~~~~~

    It is unclear. Using pr_err() can print more details of the device.
    The improved log is:

      ubiblock0_0: ...

    Fixes: 77567b25ab9f ("ubi: use blk_mq_alloc_disk and blk_cleanup_disk")
    Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
    Signed-off-by: Li Nan <linan122@huawei.com>
    Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Reviewed-by: Daniel Golle <daniel@makrotopia.org>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: fix kernel-doc warnings  Jeff Johnson  2024-07-12  6  -6/+10

    make C=1 reports the following kernel-doc warnings:

    fs/ubifs/compress.c:103: warning: Function parameter or struct member 'c' not described in 'ubifs_compress'
    fs/ubifs/compress.c:155: warning: Function parameter or struct member 'c' not described in 'ubifs_decompress'
    fs/ubifs/find.c:353: warning: Excess function parameter 'data' description in 'scan_for_free_cb'
    fs/ubifs/find.c:353: warning: Function parameter or struct member 'arg' not described in 'scan_for_free_cb'
    fs/ubifs/find.c:594: warning: Excess function parameter 'data' description in 'scan_for_idx_cb'
    fs/ubifs/find.c:594: warning: Function parameter or struct member 'arg' not described in 'scan_for_idx_cb'
    fs/ubifs/find.c:786: warning: Excess function parameter 'data' description in 'scan_dirty_idx_cb'
    fs/ubifs/find.c:786: warning: Function parameter or struct member 'arg' not described in 'scan_dirty_idx_cb'
    fs/ubifs/find.c:86: warning: Excess function parameter 'data' description in 'scan_for_dirty_cb'
    fs/ubifs/find.c:86: warning: Function parameter or struct member 'arg' not described in 'scan_for_dirty_cb'
    fs/ubifs/journal.c:369: warning: expecting prototype for wake_up_reservation(). Prototype was for add_or_start_queue() instead
    fs/ubifs/lprops.c:1018: warning: Excess function parameter 'lst' description in 'scan_check_cb'
    fs/ubifs/lprops.c:1018: warning: Function parameter or struct member 'arg' not described in 'scan_check_cb'
    fs/ubifs/lpt.c:1938: warning: Function parameter or struct member 'ptr' not described in 'lpt_scan_node'
    fs/ubifs/replay.c:60: warning: Function parameter or struct member 'hash' not described in 'replay_entry'

    Fix them.

    Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
    Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: correct UBIFS_DFS_DIR_LEN macro definition and improve code clarity  ZhaoLong Wang  2024-07-12  6  -18/+12

    The UBIFS_DFS_DIR_LEN macro, which defines the maximum length of
    the UBIFS debugfs directory name, has an incorrect formula and
    misleading comments. The current formula is (3 + 1 + 2*2 + 1),
    which assumes that both UBI device number and volume ID are limited
    to 2 characters. However, UBI device number ranges from 0 to 31 (2
    characters), and volume ID ranges from 0 to 127 (up to 3
    characters).

    Although the current code works due to the cancellation of
    mathematical errors (9 + 1 = 10, which matches the correct
    UBIFS_DFS_DIR_LEN value), it can lead to confusion and potential
    issues in the future.

    This patch aims to improve the code clarity and maintainability by
    making the following changes:

    1. Corrects the UBIFS_DFS_DIR_LEN macro definition to
       (3 + 1 + 2 + 3 + 1), accommodating the maximum lengths of both
       UBI device number and volume ID, plus the separators and null
       terminator.
    2. Updates the snprintf calls to use UBIFS_DFS_DIR_LEN instead of
       UBIFS_DFS_DIR_LEN + 1, removing the unnecessary +1.
    3. Modifies the error checks to compare against UBIFS_DFS_DIR_LEN
       using >= instead of >, aligning with the corrected macro
       definition.
    4. Removes the redundant +1 in the dfs_dir_name array definitions
       in ubi.h and debug.h.

    While these changes do not affect the runtime behavior, they make
    the code more readable, maintainable, and less prone to future
    errors.

    v2->v3:
     - Removes the duplicated UBIFS_DFS_DIR_LEN and UBIFS_DFS_DIR_NAME
       macro definitions in ubifs.h, as they are already defined in
       debug.h.

    Signed-off-by: ZhaoLong Wang <wangzhaolong1@huawei.com>
    Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
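
    The arithmetic in items 1 to 3 can be checked with a short userspace
    sketch. DFS_DIR_LEN and format_dfs_dir_name() are hypothetical names,
    and the "ubi%d_%d" name format is an assumption based on the lengths
    listed in the message ("ubi", a separator, device number, volume ID).

```c
#include <assert.h>
#include <stdio.h>

/* "ubi" (3) + "_" (1) + device number (2) + volume ID (3) + NUL (1) */
#define DFS_DIR_LEN (3 + 1 + 2 + 3 + 1)

/*
 * Formats the directory name into 'buf', which must hold DFS_DIR_LEN
 * bytes. Returns the name length, or -1 if the name would not fit.
 * snprintf() returns the length without the terminator, so a return
 * value >= DFS_DIR_LEN means the output was truncated.
 */
static int format_dfs_dir_name(char *buf, int ubi_num, int vol_id)
{
	int n;

	assert(buf != NULL);
	n = snprintf(buf, DFS_DIR_LEN, "ubi%d_%d", ubi_num, vol_id);
	if (n < 0 || n >= DFS_DIR_LEN)
		return -1;
	return n;
}
```

    The longest valid name, for device 31 and volume 127, is "ubi31_127":
    9 characters plus the terminator, which exactly fills the 10-byte
    buffer. That is why the size passed to snprintf() and the >= check both
    use the macro without an extra +1.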
| * | mtd: ubi: Restore missing cleanup on ubi_init() failure path  Ben Hutchings  2024-07-12  1  -1/+4

    We need to clean-up debugfs and ubiblock if we fail after
    initialising them.

    Signed-off-by: Ben Hutchings <ben.hutchings@mind.be>
    Fixes: 927c145208b0 ("mtd: ubi: attach from device tree")
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: dbg_orphan_check: Fix missed key type checking  Zhihao Cheng  2024-07-12  1  -2/+6

    When selinux/encryption is enabled, the xattr entry node is added
    into the TNC before the host inode when creating a new file. So it
    is possible to find an xattr entry without a host inode in the TNC.
    Orphan debug checking is called by ubifs_orphan_end_commit(); at
    that time, the commit semaphore is already unlocked, so the new
    creation won't be blocked.

    Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS")
    Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix unattached inode when powercut happens in creating  Zhihao Cheng  2024-07-12  3  -26/+51

    For selinux or encryption scenarios, UBIFS could become
    inconsistent while creating new files if a powercut happens.
    Encryption/selinux related xattrs are created before the file
    dentry, which makes the creation process non-atomic. Details are
    shown below:

    Encryption case:
    ubifs_create
     ubifs_new_inode
      fscrypt_set_context
       ubifs_xattr_set
        create_xattr
         ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP)
     >> power cut <<
     ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP)

    Selinux case:
    ubifs_create
     ubifs_new_inode
     ubifs_init_security
      security_inode_init_security
       ubifs_xattr_set
        create_xattr
         ubifs_jnl_update // Disk: xentry xinode inode(LAST_OF_NODE_GROUP)
     >> power cut <<
     ubifs_jnl_update // Disk: dentry inode parent_inode(LAST_OF_NODE_GROUP)

    The above process makes chk_fs fail on the next mount:
     UBIFS error (ubi0:0 pid 7995): dbg_check_filesystem [ubifs]: inode 66
     nlink is 1, but calculated nlink is 0

    Fix it by allocating an orphan inode for each non-xattr file
    creation, then removing it from the orphan list in the journal
    writing process, which ensures that both xattr and dentry take
    effect atomically when a powercut happens.

    Fixes: d7f0b70d30ff ("UBIFS: Add security.* XATTR support for the UBIFS")
    Fixes: d475a507457b ("ubifs: Add skeleton for fscrypto")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=218309
    Suggested-by: Zhang Yi <yi.zhang@huawei.com>
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix space leak when powercut happens in linking tmpfile  Zhihao Cheng  2024-07-12  4  -17/+15

    There is a potential space leak problem when a powercut happens
    while linking a tmpfile. In that case, the inode node (with
    nlink=0) and its data nodes can be found in the TNC (on flash), but
    there are no dentries related to the inode, so the file is
    invisible but takes up space. The detailed process is:

    ubifs_tmpfile
     ubifs_jnl_update // Add bud A into log area
      ubifs_add_orphan // Add inode into orphan list

         P1                            P2
    ubifs_link
     ubifs_delete_orphan // Delete inode from orphan list, then inode won't
                         // be written into orphan area, there is no chance
                         // to delete inode by replaying orphan.
                                       commit // bud A won't be replayed in next mounting
       >> powercut <<
     ubifs_jnl_update // Link inode to dentry

    The root cause is that orphan entry deletion and journal writing
    (for link) are interrupted by a commit, which makes the two
    operations non-atomic. Fix it by doing ubifs_delete_orphan under
    the protection of c->commit_sem within ubifs_jnl_update. This is
    also a preparation for creating all new files via an orphan inode.

    v1 is https://lore.kernel.org/linux-mtd/20200701093227.674945-1-chengzhihao1@huawei.com/

    Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()")
    Link: https://bugzilla.kernel.org/show_bug.cgi?id=208405
    Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
    Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Move ui->data initialization after initializing securityZhihao Cheng2024-07-121-8/+10
| | | The host inode and its xattr are written to disk after initializing
| | | security when creating a symlink or dev, then the host inode and its
| | | dentry are written again in ubifs_jnl_update. There is no need to
| | | write inode data in the security initialization pass, so just move
| | | the ui->data initialization after initializing security.
| | |
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix adding orphan entry twice for the same inodeZhihao Cheng2024-07-121-1/+1
| | | A tmpfile could be added into the orphan list twice: the first time
| | | on creation, the second time when it is removed after being linked.
| | | The orphan entry is added twice if the following sequence occurs:
| | |
| | | ubifs_tmpfile
| | |  ubifs_jnl_update
| | |   ubifs_add_orphan // first time to add orphan entry
| | |
| | |     P1                        P2
| | | ubifs_link                 do_commit
| | |                             ubifs_orphan_start_commit
| | |                              orphan->cmt = 1
| | |  ubifs_delete_orphan
| | |   orphan_delete
| | |    if (orph->cmt)
| | |     orph->del = 1; // orphan entry is not deleted from tree
| | |     return
| | | ubifs_unlink
| | |  ubifs_jnl_update
| | |   ubifs_add_orphan
| | |    orphan_add // found old orphan entry, second time to add orphan entry
| | |     ubifs_err(c, "orphaned twice")
| | |     return -EINVAL // unlink failed!
| | |                             ubifs_orphan_end_commit
| | |                              erase_deleted // delete old orphan entry
| | |                               rb_erase(&orphan->rb, &c->orph_tree)
| | |
| | | Fix it by removing the orphan entry from the orphan tree in advance,
| | | rather than removing it from the orphan tree in the committing process.
| | |
| | | Fixes: 32fe905c17f0 ("ubifs: Fix O_TMPFILE corner case in ubifs_link()")
| | | Link: https://bugzilla.kernel.org/show_bug.cgi?id=218672
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
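The race in the commit above can be replayed with a small userspace model. This is only a sketch: `toy_orphan`, `toy_orphan_add`, `toy_orphan_delete` and the `erase_early` switch are hypothetical names invented for illustration, and a single struct stands in for the kernel's rb-tree. It shows why leaving a deleted entry in the tree until the commit ends makes the second add fail, and why erasing it early avoids that.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Toy model of one orphan-tree slot; NOT the kernel's struct ubifs_orphan. */
struct toy_orphan {
	unsigned long inum;
	bool in_tree;	/* entry is reachable through the lookup tree */
	bool cmt;	/* entry belongs to the running commit */
	bool del;	/* entry must be dropped when the commit ends */
};

/* Add an orphan; fails with -EINVAL if the inode is still in the tree. */
int toy_orphan_add(struct toy_orphan *o, unsigned long inum)
{
	if (o->in_tree && o->inum == inum)
		return -EINVAL;	/* models the "orphaned twice" error */
	o->inum = inum;
	o->in_tree = true;
	o->cmt = false;
	o->del = false;
	return 0;
}

/*
 * Delete an orphan. With erase_early == false this mimics the old behaviour
 * described in the commit message: an entry owned by a running commit is
 * only marked. With erase_early == true it also leaves the tree at once.
 */
void toy_orphan_delete(struct toy_orphan *o, bool erase_early)
{
	if (o->cmt) {
		o->del = true;
		if (erase_early)
			o->in_tree = false;
		return;
	}
	o->in_tree = false;
}

/* Replay the sequence: tmpfile, link while a commit runs, then unlink. */
int toy_link_then_unlink(bool erase_early)
{
	struct toy_orphan o = {0};
	int rc = toy_orphan_add(&o, 66);	/* ubifs_tmpfile */

	assert(rc == 0);
	(void)rc;
	o.cmt = true;				/* commit starts on P2 */
	toy_orphan_delete(&o, erase_early);	/* ubifs_link on P1 */
	return toy_orphan_add(&o, 66);		/* ubifs_unlink on P1 */
}
```

Calling `toy_link_then_unlink(false)` reproduces the failing unlink, while `toy_link_then_unlink(true)` models the fixed ordering.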
| * | ubifs: Remove insert_dead_orphan from replaying orphan processZhihao Cheng2024-07-121-49/+0
| | | UBIFS does a commit at the end of the mounting process (rw mode), and
| | | dead orphans (added by insert_dead_orphan while replaying orphans) are
| | | deleted by ubifs_orphan_end_commit(). The only reason why dead orphans
| | | are added into the orphan list is that old orphans may be lost when a
| | | powercut happens in ubifs_orphan_end_commit():
| | |
| | | ubifs_orphan_end_commit // TNC(updated by orphans) is not written yet
| | |  if (c->cmt_orphans != 0)
| | |   commit_orphans
| | |    consolidate // traverse orphan list
| | |     write_orph_nodes // rewrite all orphans by ubifs_leb_change
| | | // If dead orphans are not in list, they will be lost when powercut
| | | // happens, then TNC won't be updated by old orphans in next mounting.
| | |
| | | Luckily, the condition 'c->cmt_orphans != 0' will never be true in the
| | | mounting process: no new orphans can be added into the orphan list
| | | before mounting returns, but the commit is done at the end of mounting.
| | |
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | Revert "ubifs: ubifs_symlink: Fix memleak of inode->i_link in error path"Zhihao Cheng2024-07-121-2/+0
| | | This reverts commit 6379b44cdcd67f5f5d986b73953e99700591edfa.
| | |
| | | Commit 1e022216dcd2 ("ubifs: ubifs_symlink: Fix memleak of
| | | inode->i_link in error path") was applied again as commit
| | | 6379b44cdcd6 ("ubifs: ubifs_symlink: Fix memleak of inode->i_link in
| | | error path"), which changed ubifs_mknod (it won't become a real
| | | problem). Just revert it.
| | |
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Don't add xattr inode into orphan areaZhihao Cheng2024-07-122-74/+14
| | | Now the entire inode, together with its xattrs, is removed while
| | | replaying orphan nodes. There is no need to add xattr inodes into the
| | | orphan area, because xattr entries won't be cleared from disk before
| | | the xattr inodes are deleted. In other words, the current logic makes
| | | sure that the xattr inode is deleted in all cases, even if UBIFS does
| | | not record the xattr inode in the orphan area.
| | | Let's look for possible paths that could clear xattr entries from disk
| | | but leave the xattr inode in TNC:
| | |  1. unlink/tmpfile -> ubifs_jnl_update: inode(nlink=0) is written
| | |     into bud LEB and added into orphan list, then:
| | |     a. powercut: ubifs_tnc_remove_ino(xattr entry/inode can be found
| | |        from TNC and being deleted) is invoked in replaying journal.
| | |     b. commit + powercut: inode is written into orphan area, and
| | |        ubifs_tnc_remove_ino is invoked in replaying orphan nodes.
| | |     c. evicting + powercut: xattr inode(nlink=0) is written on disk,
| | |        xattr is removed from TNC, gc could clear xattr entries from
| | |        disk. ubifs_tnc_remove_ino will apply on inode and xattr inode
| | |        in replaying journal, so lost xattr entries will have no
| | |        influence.
| | |     d. evicting + commit + powercut: xattr inode/entry are removed
| | |        from index tree(on disk) by ubifs_jnl_write_inode, xattr inode
| | |        is cleared from orphan area by ubifs_jnl_write_inode + commit.
| | |     e. commit + evicting + powercut: inode is written into orphan
| | |        area, then equivalent to c.
| | |  2. remove xattr -> ubifs_jnl_delete_xattr: xattr entry(inum=0) and
| | |     xattr inode(nlink=0) are written into bud LEB, xattr entry/inode
| | |     are removed from TNC, then:
| | |     a. powercut: gc could clear xattr entries from disk, which won't
| | |        affect deleting xattr entry from TNC.
| | |        ubifs_tnc_remove_ino will apply on xattr inode in replaying
| | |        journal, ubifs_tnc_remove_nm will apply on xattr entry in
| | |        replaying journal.
| | |     b. commit + powercut: xattr entry/inode are removed from index
| | |        tree (on disk).
| | |
| | | Tracking xattr inodes in the orphan list was introduced by commit
| | | 988bec41318f3f ("ubifs: orphan: Handle xattrs like files"), which
| | | aimed to fix a problem similar to the one described in commit
| | | 7959cf3a7506d4a ("ubifs: journal: Handle xattrs like files").
| | | Actually, the problem only exists in the journal case, not in the
| | | orphan case. So, we can remove the orphan tracking for xattr inodes.
| | |
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubifs: Fix unattached xattr inode if powercut happens after deletingZhihao Cheng2024-07-121-7/+4
| | | When a powercut happens after deleting a file, the xattr inode could
| | | be left alone in TNC while its xattr entry cannot be found in TNC.
| | | The file inode and xattr inode are added into the orphan list after
| | | deleting the file; the file inode's nlink is 0 but the xattr inode's
| | | nlink is not 0 (PS: a zero-nlink xattr inode is written to disk in
| | | the evicting process by ubifs_jnl_write_inode).
| | | So, the following process could happen:
| | |  1. touch file
| | |  2. setxattr(file)
| | |  3. unlink file
| | |     // inode(nlink=0), xattr inode(nlink=1) are added into orphan list
| | |  4. commit
| | |     // write inode inum and xattr inum into orphan area
| | |  5. powercut
| | |  6. mount
| | |     do_kill_orphans
| | |      // inode(nlink=0) is deleted from TNC by ubifs_tnc_remove_range,
| | |      // xattr entry is deleted too.
| | |      // xattr inode(nlink=1) is not deleted from TNC
| | |
| | | Finally we could see the following error while debugging UBIFS:
| | | UBIFS error (ubi0:0 pid 1093): dbg_check_filesystem [ubifs]: inode 66
| | |  nlink is 1, but calculated nlink is 0
| | | UBIFS (ubi0:0): dump of the inode 66 sitting in LEB 12:2128
| | |    node_type 0 (inode node)
| | |    group_type 1 (in node group)
| | |    len 197
| | |    key (66, inode)
| | |    size 37
| | |    nlink 1
| | |    flags 0x20
| | |    xattr_cnt 0
| | |    xattr_size 0
| | |    xattr_names 0
| | |    data len 37
| | |
| | | Fix it by removing the entire inode with its xattrs while replaying
| | | orphans: just replace ubifs_tnc_remove_range with ubifs_tnc_remove_ino.
| | |
| | | Fixes: ee1438ce5dc4 ("ubifs: Check link count of inodes when killing orphans.")
| | | Link: https://bugzilla.kernel.org/show_bug.cgi?id=218661
| | | Signed-off-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | mtd: ubi: avoid expensive do_div() on 32-bit machinesArnd Bergmann2024-07-121-3/+3
| | | The use of do_div() in ubi_nvmem_reg_read() makes calling it on 32-bit
| | | machines rather expensive. Since the 'from' variable is known to be a
| | | 32-bit quantity, do_div() is clearly never needed and can be replaced
| | | by a regular division operation.
| | |
| | | Fixes: b8a77b9a5f9c ("mtd: ubi: fix NVMEM over UBI volumes on 32-bit systems")
| | | Fixes: 3ce485803da1 ("mtd: ubi: provide NVMEM layer over UBI volumes")
| | | Signed-off-by: Arnd Bergmann <arnd@arndb.de>
| | | Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
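The equivalence the commit above relies on can be checked in userspace. The following is a sketch, not the patch itself: `sketch_do_div` is a hypothetical stand-in that only mirrors the documented do_div() contract (divide a 64-bit value in place, return the remainder), and the names `lnum`, `offs` and `leb_size` are assumptions about how the offset is split. When `from` fits in 32 bits, plain `/` and `%` yield the same quotient and remainder.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-in for the do_div() contract: divide the 64-bit value *n
 * in place by a 32-bit base and return the remainder.
 * Hypothetical helper; the real do_div() is a kernel macro.
 */
uint32_t sketch_do_div(uint64_t *n, uint32_t base)
{
	uint32_t rem = (uint32_t)(*n % base);

	*n /= base;
	return rem;
}

/* Plain 32-bit split of an offset into (block number, offset in block). */
void split_offset32(uint32_t from, uint32_t leb_size,
		    uint32_t *lnum, uint32_t *offs)
{
	assert(leb_size != 0);
	*lnum = from / leb_size;
	*offs = from % leb_size;
}

/* Returns 1 when both approaches agree for the given inputs. */
int split_matches(uint32_t from, uint32_t leb_size)
{
	uint64_t n = from;
	uint32_t rem = sketch_do_div(&n, leb_size);
	uint32_t lnum, offs;

	split_offset32(from, leb_size, &lnum, &offs);
	return n == lnum && rem == offs;
}
```

On a 32-bit target the second form compiles to an ordinary 32-bit divide, which is the saving the commit message describes.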
| * | mtd: ubi: make ubi_class constantRicardo B. Marliere2024-07-122-2/+2
| | | Since commit 43a7206b0963 ("driver core: class: make class_register()
| | | take a const *"), the driver core allows for struct class to be in
| | | read-only memory, so move the ubi_class structure to be declared at
| | | build time placing it into read-only memory, instead of having to be
| | | dynamically allocated at boot time.
| | |
| | | Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | | Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | | Signed-off-by: Ricardo B. Marliere <ricardo@marliere.net>
| | | Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
| * | ubi: eba: properly rollback inside self_check_ebaFedor Pchelkin2024-07-121-1/+2
| | | In case of a memory allocation failure in the volumes loop we can only
| | | process the already allocated scan_eba and fm_eba array elements on the
| | | error path - others are still uninitialized.
| | |
| | | Found by Linux Verification Center (linuxtesting.org).
| | |
| | | Fixes: 00abf3041590 ("UBI: Add self_check_eba()")
| | | Cc: stable@vger.kernel.org
| | | Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru>
| | | Reviewed-by: Zhihao Cheng <chengzhihao1@huawei.com>
| | | Signed-off-by: Richard Weinberger <richard@nod.at>
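The rollback rule in the commit above (unwind only what was actually allocated) is a general pattern and can be sketched outside the kernel. Everything here is hypothetical scaffolding: `sketch_alloc`, `sketch_free`, the `fail_at` knob and the single `tables` array are illustration only, whereas the real function handles two arrays, scan_eba and fm_eba.

```c
#include <assert.h>
#include <stdlib.h>

/* Counters used only to observe the sketch's behaviour. */
int alloc_calls;
int free_calls;

/* Hypothetical allocator hook: fails on the call whose index is fail_at. */
void *sketch_alloc(int fail_at)
{
	if (alloc_calls++ == fail_at)
		return NULL;
	return malloc(16);
}

void sketch_free(void *p)
{
	free_calls++;
	free(p);
}

/*
 * Allocate one table per volume. On failure, unwind only the entries that
 * were actually allocated (indices 0 .. i-1); the rest of the array was
 * never initialised and must not be touched.
 */
int alloc_tables(void **tables, int num_volumes, int fail_at)
{
	int i;

	assert(num_volumes >= 0);
	alloc_calls = 0;
	free_calls = 0;
	for (i = 0; i < num_volumes; i++) {
		tables[i] = sketch_alloc(fail_at);
		if (!tables[i])
			goto rollback;
	}
	return 0;

rollback:
	while (--i >= 0)
		sketch_free(tables[i]);
	return -1;
}
```

Looping over the full `num_volumes` range on the error path instead would hand uninitialised pointers to the free routine, which is the class of bug the commit describes.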
* | | Merge tag 'v6.11-merge' of ↵Linus Torvalds2024-07-285-495/+2274
|\ \ \
| | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
| | | |
| | | | Pull turbostat updates from Len Brown:
| | | |
| | | |  - Enable turbostat extensions to add both perf and PMT (Intel
| | | |    Platform Monitoring Technology) counters via the cmdline
| | | |
| | | |  - Demonstrate PMT access with built-in support for Meteor Lake's
| | | |    Die C6 counter
| | | |
| | | | * tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
| | | |   tools/power turbostat: version 2024.07.26
| | | |   tools/power turbostat: Include umask=%x in perf counter's config
| | | |   tools/power turbostat: Document PMT in turbostat.8
| | | |   tools/power turbostat: Add MTL's PMT DC6 builtin counter
| | | |   tools/power turbostat: Add early support for PMT counters
| | | |   tools/power turbostat: Add selftests for added perf counters
| | | |   tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
| | | |   tools/power turbostat: Move verbose counter messages to level 2
| | | |   tools/power turbostat: Move debug prints from stdout to stderr
| | | |   tools/power turbostat: Fix typo in turbostat.8
| | | |   tools/power turbostat: Add perf added counter example to turbostat.8
| | | |   tools/power turbostat: Fix formatting in turbostat.8
| | | |   tools/power turbostat: Extend --add option with perf counters
| | | |   tools/power turbostat: Group SMI counter with APERF and MPERF
| | | |   tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
| | | |   tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
| | | |   tools/power turbostat: Remove anonymous union from rapl_counter_info_t
| | | |   tools/power/turbostat: Switch to new Intel CPU model defines
| * | | tools/power turbostat: version 2024.07.26Len Brown2024-07-261-53/+52
| | | | Release 2024.07.26:
| | | |
| | | | Enable turbostat extensions to add both perf and PMT (Intel Platform
| | | | Monitoring Technology) counters from the cmdline.
| | | |
| | | | Demonstrate PMT access with built-in support for Meteor Lake's Die%c6
| | | | counter.
| | | |
| | | | This commit:
| | | |
| | | | Clean up white-space nits introduced since version 2024.05.10
| | | |
| | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | | tools/power turbostat: Include umask=%x in perf counter's configPatryk Wlazlyn2024-07-261-10/+50
| | | | Some counters, like cpu/cache-misses/, expose and require a umask=%x
| | | | parameter alongside event=%x in the sysfs perf counter's event file.
| | | |
| | | | This change makes sure we parse and use it when opening user added
| | | | counters.
| | | |
| | | | Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
| | | | Signed-off-by: Len Brown <len.brown@intel.com>
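A minimal sketch of what parsing such an event description might look like, assuming the sysfs event file holds a comma-separated string such as "event=0x2e,umask=0x41". The helpers `find_hex_field` and `build_config` are hypothetical and are not turbostat's code. The bit layout used to combine the two fields (event in config bits 0-7, umask in bits 8-15) is also an assumption that matches x86 core PMUs; other PMUs may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up "key=<hex>" inside a comma-separated event description such as
 * "event=0x2e,umask=0x41". Returns 1 and stores the value when found.
 */
int find_hex_field(const char *desc, const char *key, unsigned int *value)
{
	size_t key_len = strlen(key);
	const char *p = desc;

	while (p && *p) {
		if (strncmp(p, key, key_len) == 0 && p[key_len] == '=')
			return sscanf(p + key_len + 1, "%x", value) == 1;
		p = strchr(p, ',');
		if (p)
			p++;	/* skip the comma */
	}
	return 0;
}

/*
 * Build a perf config value. Assumption: event occupies config bits 0-7 and
 * umask bits 8-15; this layout is not taken from the commit.
 */
int build_config(const char *desc, uint64_t *config)
{
	unsigned int event = 0, umask = 0;

	assert(desc && config);
	if (!find_hex_field(desc, "event", &event))
		return -1;
	find_hex_field(desc, "umask", &umask);	/* optional field */
	*config = (uint64_t)(event & 0xff) | ((uint64_t)(umask & 0xff) << 8);
	return 0;
}
```

Ignoring the umask field, as a parser that only looks for event=%x would, leaves the upper byte zero and so selects a different counter than the one requested.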
| * | | tools/power turbostat: Document PMT in turbostat.8Patryk Wlazlyn2024-07-261-0/+65
| | | | Add a general description of the user interface for adding PMT
| | | | counters with the new --add pmt,... option. Provide a complete example
| | | | for requesting two counters.
| | | |
| | | | Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
| | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | | tools/power turbostat: Add MTL's PMT DC6 builtin counterPatryk Wlazlyn2024-07-261-1/+69
| | | | Provide a definition for the metadata that allows reading the DC6
| | | | residency counter via PMT and expose it as a builtin counter.
| | | |
| | | | Note that this residency counter is updated and read via entirely
| | | | different mechanisms vs the MSR-based residency counters. On MTL
| | | | processors, there are times when Die%c6 will report above 100%.
| | | | This is still useful, but don't expect 3 digits of precision...
| | | |
| | | | Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
| | | | Signed-off-by: Len Brown <len.brown@intel.com>
| * | | tools/power turbostat: Add early support for PMT countersPatryk Wlazlyn2024-07-261-2/+766
| | | | Allow users to read Intel PMT (Platform Monitoring Technology)
| | | | counters, providing an interface similar to the one used to add MSR
| | | | and perf counters.
| | | |
| | | | Because PMT is exposed as a raw MMIO range, without metadata, the user
| | | | has to supply the necessary information to find and correctly display
| | | | the requested counter.
| | | |
| | | | Signed-off-by: Patryk Wlazlyn <patryk.wlazlyn@linux.intel.com>
| | | | Signed-off-by: Len Brown <len.brown@intel.com>
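To illustrate why a raw range without metadata needs user-supplied information, here is a sketch of pulling one field out of a 64-bit word. It assumes the field is described by a word index plus an inclusive least/most significant bit pair. The helpers `extract_bits` and `read_field` are hypothetical, and a plain array stands in for the mapped region.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Extract the inclusive bit range [msb:lsb] from a 64-bit word read out of
 * a raw counter region. The lsb/msb description is an assumption of this
 * sketch, not something stated in the commit message.
 */
uint64_t extract_bits(uint64_t word, unsigned int lsb, unsigned int msb)
{
	unsigned int width;

	assert(lsb <= msb && msb < 64);
	width = msb - lsb + 1;
	word >>= lsb;
	if (width < 64)
		word &= (UINT64_C(1) << width) - 1;
	return word;
}

/* Read one field from a buffer standing in for the mapped region. */
uint64_t read_field(const uint64_t *region, unsigned int word_index,
		    unsigned int lsb, unsigned int msb)
{
	return extract_bits(region[word_index], lsb, msb);
}
```

Without the location and bit range, the same bytes could be read as any number of different counters, which is why the description has to come from the user.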