summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* ceph: mark directory as non-complete after loading keyLuís Henriques2023-08-244-9/+46
| | | | | | | | | | | | | | | | When setting a directory's crypt context, ceph_dir_clear_complete() needs to be called otherwise if it was complete before, any existing (old) dentry will still be valid. This patch adds a wrapper around __fscrypt_prepare_readdir() which will ensure a directory is marked as non-complete if key status changes. [ xiubli: revise commit title per Milind ] Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: allow encrypting a directory while not having Ax capsLuís Henriques2023-08-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | If a client doesn't have Fx caps on a directory, it will get errors while trying encrypt it: ceph: handle_cap_grant: cap grant attempt to change fscrypt_auth on non-I_NEW inode (old len 0 new len 48) fscrypt (ceph, inode 1099511627812): Error -105 getting encryption context A simple way to reproduce this is to use two clients: client1 # mkdir /mnt/mydir client2 # ls /mnt/mydir client1 # fscrypt encrypt /mnt/mydir client1 # echo hello > /mnt/mydir/world This happens because, in __ceph_setattr(), we only initialize ci->fscrypt_auth if we have Ax and ceph_fill_inode() won't use the fscrypt_auth received if the inode state isn't I_NEW. Fix it by allowing ceph_fill_inode() to also set ci->fscrypt_auth if the inode doesn't have it set already. Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add some fscrypt guardrailsJeff Layton2023-08-245-9/+55
| | | | | | | | | | | | | | | | | | | | | | | | | Add the appropriate calls into fscrypt for various actions, including link, rename, setattr, and the open codepaths. Disable fallocate for encrypted inodes -- hopefully, just for now. If we have an encrypted inode, then the client will need to re-encrypt the contents of the new object. Disable copy offload to or from encrypted inodes. Set i_blkbits to crypto block size for encrypted inodes -- some of the underlying infrastructure for fscrypt relies on i_blkbits being aligned to crypto blocksize. Report STATX_ATTR_ENCRYPTED on encrypted inodes. [ lhenriques: forbid encryption with striped layouts ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: create symlinks with encrypted and base64-encoded targetsJeff Layton2023-08-242-17/+150
| | | | | | | | | | | | | | | | When creating symlinks in encrypted directories, encrypt and base64-encode the target with the new inode's key before sending to the MDS. When filling a symlinked inode, base64-decode it into a buffer that we'll keep in ci->i_symlink. When get_link is called, decrypt the buffer into a new one that will hang off i_link. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add support to readdir for encrypted namesXiubo Li2023-08-246-26/+144
| | | | | | | | | | | | | | | | | | | To make it simpler to decrypt names in a readdir reply (i.e. before we have a dentry), add a new ceph_encode_encrypted_fname()-like helper that takes a qstr pointer instead of a dentry pointer. Once we've decrypted the names in a readdir reply, we no longer need the crypttext, so overwrite them in ceph_mds_reply_dir_entry with the unencrypted names. Then in both ceph_readdir_prepopulate() and ceph_readdir() we will use the dencrypted name directly. [ jlayton: convert some BUG_ONs into error returns ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: pass the request to parse_reply_info_readdir()Xiubo Li2023-08-241-10/+13
| | | | | | | | | | | | Instead of passing just the r_reply_info to the readdir reply parser, pass the request pointer directly instead. This will facilitate implementing readdir on fscrypted directories. Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: make ceph_fill_trace and ceph_get_name decrypt namesJeff Layton2023-08-242-14/+60
| | | | | | | | | | | When we get a dentry in a trace, decrypt the name so we can properly instantiate the dentry or fill out ceph_get_name() buffer. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add helpers for converting names for userland presentationJeff Layton2023-08-242-0/+123
| | | | | | | | | | | | | | | | Define a new ceph_fname struct that we can use to carry information about encrypted dentry names. Add helpers for working with these objects, including ceph_fname_to_usr which formats an encrypted filename for userland presentation. [ xiubli: fix resulting name length check -- neither name_len nor ctext_len should exceed NAME_MAX ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: make d_revalidate call fscrypt revalidator for encrypted dentriesJeff Layton2023-08-241-2/+7
| | | | | | | | | | | | If we have a dentry which represents a no-key name, then we need to test whether the parent directory's encryption key has since been added. Do that before we test anything else about the dentry. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: set DCACHE_NOKEY_NAME flag in ceph_lookup/atomic_open()Jeff Layton2023-08-242-0/+18
| | | | | | | | | | | | | | | | | This is required so that we know to invalidate these dentries when the directory is unlocked. Atomic open can act as a lookup if handed a dentry that is negative on the MDS. Ensure that we set DCACHE_NOKEY_NAME on the dentry in atomic_open, if we don't have the key for the parent. Otherwise, we can end up validating the dentry inappropriately if someone later adds a key. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: decode alternate_name in lease infoJeff Layton2023-08-242-10/+38
| | | | | | | | | | | | | | | | | | | | | | | Ceph is a bit different from local filesystems, in that we don't want to store filenames as raw binary data, since we may also be dealing with clients that don't support fscrypt. We could just base64-encode the encrypted filenames, but that could leave us with filenames longer than NAME_MAX. It turns out that the MDS doesn't care much about filename length, but the clients do. To manage this, we've added a new "alternate name" field that can be optionally added to any dentry that we'll use to store the binary crypttext of the filename if its base64-encoded value will be longer than NAME_MAX. When a dentry has one of these names attached, the MDS will send it along in the lease info, which we can then store for later usage. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: send alternate_name in MClientRequestJeff Layton2023-08-242-5/+74
| | | | | | | | | | | | | | | | | | | In the event that we have a filename longer than CEPH_NOHASH_NAME_MAX, we'll need to hash the tail of the filename. The client however will still need to know the full name of the file if it has a key. To support this, the MClientRequest field has grown a new alternate_name field that we populate with the full (binary) crypttext of the filename. This is then transmitted to the clients in readdir or traces as part of the dentry lease. Add support for populating this field when the filenames are very long. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: encode encrypted name in ceph_mdsc_build_path and dentry releaseJeff Layton2023-08-245-25/+172
| | | | | | | | | | | | | | | | | | | | | | | Allow ceph_mdsc_build_path to encrypt and base64 encode the filename when the parent is encrypted and we're sending the path to the MDS. In a similar fashion, encode encrypted dentry names if including a dentry release in a request. In most cases, we just encrypt the filenames and base64 encode them, but when the name is longer than CEPH_NOHASH_NAME_MAX, we use a similar scheme to fscrypt proper, and hash the remaning bits with sha256. When doing this, we then send along the full crypttext of the name in the new alternate_name field of the MClientRequest. The MDS can then send that along in readdir responses and traces. [ idryomov: drop duplicate include reported by Abaci Robot ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add base64 endcoding routines for encrypted namesLuís Henriques2023-08-222-0/+92
| | | | | | | | | | | | | The base64url encoding used by fscrypt includes the '_' character, which may cause problems in snapshot names (if the name starts with '_'). Thus, use the base64 encoding defined for IMAP mailbox names (RFC 3501), which uses '+' and ',' instead of '-' and '_'. Signed-off-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: make ioctl cmds more readable in debug logXiubo Li2023-08-221-1/+38
| | | | | | | | | | | | | | | ioctl file 0000000004e6b054 cmd 2148296211 arg 824635143532 The numerical cmd value in the ioctl debug log message is too hard to understand even when you look at it in the code. Make it more readable. [ idryomov: add missing _ in ceph_ioctl_cmd_name() ] Signed-off-by: Xiubo Li <xiubli@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add fscrypt ioctls and ceph.fscrypt.auth vxattrJeff Layton2023-08-222-0/+111
| | | | | | | | | | | | | | | | | | We gate most of the ioctls on MDS feature support. The exception is the key removal and status functions that we still want to work if the MDS's were to (inexplicably) lose the feature. For the set_policy ioctl, we take Fs caps to ensure that nothing can create files in the directory while the ioctl is running. That should be enough to ensure that the "empty_dir" check is reliable. The vxattr is read-only, added mostly for future debugging purposes. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: implement -o test_dummy_encryption mount optionJeff Layton2023-08-226-6/+186
| | | | | | | | | | | | | | | Add support for the test_dummy_encryption mount option. This allows us to test the encrypted codepaths in ceph without having to manually set keys, etc. [ lhenriques: fix potential fsc->fsc_dummy_enc_policy memory leak in ceph_real_mount() ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: fscrypt_auth handling for cephJeff Layton2023-08-2211-35/+385
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Most fscrypt-enabled filesystems store the crypto context in an xattr, but that's problematic for ceph as xatts are governed by the XATTR cap, but we really want the crypto context as part of the AUTH cap. Because of this, the MDS has added two new inode metadata fields: fscrypt_auth and fscrypt_file. The former is used to hold the crypto context, and the latter is used to track the real file size. Parse new fscrypt_auth and fscrypt_file fields in inode traces. For now, we don't use fscrypt_file, but fscrypt_auth is used to hold the fscrypt context. Allow the client to use a setattr request for setting the fscrypt_auth field. Since this is not a standard setattr request from the VFS, we add a new field to __ceph_setattr that carries ceph-specific inode attrs. Have the set_context op do a setattr that sets the fscrypt_auth value, and get_context just return the contents of that field (since it should always be available). Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: use osd_req_op_extent_osd_iter for netfs readsJeff Layton2023-08-221-18/+1
| | | | | | | | | | | | | | | The netfs layer has already pinned the pages involved before calling issue_op, so we can just pass down the iter directly instead of calling iov_iter_get_pages_alloc. Instead of having to allocate a page array, use CEPH_MSG_DATA_ITER and pass it the iov_iter directly to clone. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: add new iov_iter-based ceph_msg_data_type and ceph_osd_data_typeJeff Layton2023-08-224-0/+116
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Add an iov_iter to the unions in ceph_msg_data and ceph_msg_data_cursor. Instead of requiring a list of pages or bvecs, we can just use an iov_iter directly, and avoid extra allocations. We assume that the pages represented by the iter are pinned such that they shouldn't incur page faults, which is the case for the iov_iters created by netfs. While working on this, Al Viro informed me that he was going to change iov_iter_get_pages to auto-advance the iterator as that pattern is more or less required for ITER_PIPE anyway. We emulate that here for now by advancing in the _next op and tracking that amount in the "lastlen" field. In the event that _next is called twice without an intervening _advance, we revert the iov_iter by the remaining lastlen before calling iov_iter_get_pages. Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: make ceph_msdc_build_path use ref-walkJeff Layton2023-08-221-16/+19
| | | | | | | | | | | | | | | | Encryption potentially requires allocation, at which point we'll need to be in a non-atomic context. Convert ceph_msdc_build_path to take dentry spinlocks and references instead of using rcu_read_lock to walk the path. This is slightly less efficient, and we may want to eventually allow using RCU when the leaf dentry isn't encrypted. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: preallocate inode for ops that may create oneJeff Layton2023-08-226-67/+179
| | | | | | | | | | | | | | | | | | | | | | | | | | | | When creating a new inode, we need to determine the crypto context before we can transmit the RPC. The fscrypt API has a routine for getting a crypto context before a create occurs, but it requires an inode. Change the ceph code to preallocate an inode in advance of a create of any sort (open(), mknod(), symlink(), etc). Move the existing code that generates the ACL and SELinux blobs into this routine since that's mostly common across all the different codepaths. In most cases, we just want to allow ceph_fill_trace to use that inode after the reply comes in, so add a new field to the MDS request for it (r_new_inode). The async create codepath is a bit different though. In that case, we want to hash the inode in advance of the RPC so that it can be used before the reply comes in. If the call subsequently fails with -EJUKEBOX, then just put the references and clean up the as_ctx. Note that with this change, we now need to regenerate the as_ctx when this occurs, but it's quite rare for it to happen. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* ceph: add new mount option to enable sparse readsJeff Layton2023-08-224-11/+72
| | | | | | | | | | | | | | | | | | | | Add a new mount option that has the client issue sparse reads instead of normal ones. The callers now preallocate an sparse extent buffer that the libceph receive code can populate and hand back after the operation completes. After a successful sparse read, we can't use the req->r_result value to determine the amount of data "read", so instead we set the received length to be from the end of the last extent in the buffer. Any interstitial holes will have been filled by the receive code. [ xiubli: fix a double free on req reported by Ilya ] Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: add sparse read support to OSD clientJeff Layton2023-08-222-4/+285
| | | | | | | | | | | | | | | | | | | | | | Have get_reply check for the presence of sparse read ops in the request and set the sparse_read boolean in the msg. That will queue the messenger layer to use the sparse read codepath instead of the normal data receive. Add a new sparse_read operation for the OSD client, driven by its own state machine. The messenger will repeatedly call the sparse_read operation, and it will pass back the necessary info to set up to read the next extent of data, while zero-filling the sparse regions. The state machine will stop at the end of the last extent, and will attach the extent map buffer to the ceph_osd_req_op so that the caller can use it. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: add sparse read support to msgr1Jeff Layton2023-08-222-8/+94
| | | | | | | | | | | | | | | | | Add 2 new fields to ceph_connection_v1_info to track the necessary info in sparse reads. Skip initializing the cursor for a sparse read. Break out read_partial_message_section into a wrapper around a new read_partial_message_chunk function that doesn't zero out the crc first. Add new helper functions to drive receiving into the destinations provided by the sparse_read state machine. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: support sparse reads on msgr2 secure codepathJeff Layton2023-08-221-10/+110
| | | | | | | | | | | | | | | | | | | | | | Add a new init_sgs_pages helper that populates the scatterlist from an arbitrary point in an array of pages. Change setup_message_sgs to take an optional pointer to an array of pages. If that's set, then the scatterlist will be set using that array instead of the cursor. When given a sparse read on a secure connection, decrypt the data in-place rather than into the final destination, by passing it the in_enc_pages array. After decrypting, run the sparse_read state machine in a loop, copying data from the decrypted pages until it's complete. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: new sparse_read op, support sparse reads on msgr2 crc codepathJeff Layton2023-08-223-9/+187
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add support for a new sparse_read ceph_connection operation. The idea is that the client driver can define this operation use it to do special handling for incoming reads. The alloc_msg routine will look at the request and determine whether the reply is expected to be sparse. If it is, then we'll dispatch to a different set of state machine states that will repeatedly call the driver's sparse_read op to get length and placement info for reading the extent map, and the extents themselves. This necessitates adding some new field to some other structs: - The msg gets a new bool to track whether it's a sparse_read request. - A new field is added to the cursor to track the amount remaining in the current extent. This is used to cap the read from the socket into the msg_data - Handing a revoke with all of this is particularly difficult, so I've added a new data_len_remain field to the v2 connection info, and then use that to skip that much on a revoke. We may want to expand the use of that to the normal read path as well, just for consistency's sake. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: define struct ceph_sparse_extent and add some helpersJeff Layton2023-08-222-1/+55
| | | | | | | | | | | | | | | When the OSD sends back a sparse read reply, it contains an array of these structures. Define the structure and add a couple of helpers for dealing with them. Also add a place in struct ceph_osd_req_op to store the extent buffer, and code to free it if it's populated when the req is torn down. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* libceph: add spinlock around osd->o_requestsJeff Layton2023-08-222-1/+12
| | | | | | | | | | | | | | | | In a later patch, we're going to need to search for a request in the rbtree, but taking the o_mutex is inconvenient as we already hold the con mutex at the point where we need it. Add a new spinlock that we take when inserting and erasing entries from the o_requests tree. Search of the rbtree can be done with either the mutex or the spinlock, but insertion and removal requires both. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Reviewed-and-tested-by: Luís Henriques <lhenriques@suse.de> Reviewed-by: Milind Changire <mchangir@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
* Linux 6.5-rc7v6.5-rc7Linus Torvalds2023-08-201-1/+1
|
* Merge tag 'tty-6.5-rc7' of ↵Linus Torvalds2023-08-209-34/+72
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty Pull tty/serial fixes from Greg KH: "Here are some small tty and serial core fixes for 6.5-rc7 that resolve a lot of reported issues. Primarily in here are the fixes for the serial bus code from Tony that came in -rc1, as it hit wider testing with the huge number of different types of systems and serial ports. All of the reported issues with duplicate names and other issues with this code are now resolved. Other than that included in here is: - n_gsm fix for a previous fix - 8250 lockdep annotation fix - fsl_lpuart serial driver fix - TIOCSTI documentation update for previous CAP_SYS_ADMIN change All of these have been in linux-next for a while with no reported problems" * tag 'tty-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: serial: core: Fix serial core port id, including multiport devices serial: 8250: drop lockdep annotation from serial8250_clear_IER() tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_mux serial: core: Revert port_id use TIOCSTI: Document CAP_SYS_ADMIN behaviour in Kconfig serial: 8250: Fix oops for port->pm on uart_change_pm() serial: 8250: Reinit port_id when adding back serial8250_isa_devs serial: core: Fix kmemleak issue for serial core device remove MAINTAINERS: Merge TTY layer and serial drivers serial: core: Fix serial_base_match() after fixing controller port name serial: core: Fix serial core controller port name to show controller id serial: core: Fix serial core port id to not use port->line serial: core: Controller id cannot be negative tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 platforms
| * serial: core: Fix serial core port id, including multiport devicesTony Lindgren2023-08-112-1/+28
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We want to fix the serial core port DEVNAME to use a port id of the hardware specific controller port instance instead of the port->line. For example, the 8250 driver sets up a number of serial8250 ports initially that can be inherited by the hardware specific driver. At that the port->line no longer decribes the port's relation to the serial core controller instance. Let's fix the issue by assigning port->port_id for each serial core controller port instance. Fixes: 7d695d83767c ("serial: core: Fix serial_base_match() after fixing controller port name") Tested-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Dhruva Gole <d-gole@ti.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230811103648.2826-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: 8250: drop lockdep annotation from serial8250_clear_IER()Jiri Slaby (SUSE)2023-08-111-3/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The port lock is not always held when calling serial8250_clear_IER(). When an oops is in progress, the lock is tried to be taken and when it is not, a warning is issued: WARNING: CPU: 0 PID: 1 at drivers/tty/serial/8250/8250_port.c:707 +0x57/0x60 Modules linked in: CPU: 0 PID: 1 Comm: init Not tainted 6.5.0-rc5-1.g225bfb7-default+ #774 00f1be860db663ed29479b8255d3b01ab1135bd3 Hardware name: QEMU Standard PC ... RIP: 0010:serial8250_clear_IER+0x57/0x60 ... Call Trace: <TASK> serial8250_console_write+0x9e/0x4b0 console_flush_all+0x217/0x5f0 ... Therefore, remove the annotation as it doesn't hold for all invocations. The other option would be to make the lockdep test conditional on 'oops_in_progress' or pass 'locked' from serial8250_console_write(). I don't think, that is worth it. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Reported-by: Vlastimil Babka <vbabka@suse.cz> Cc: John Ogness <john.ogness@linutronix.de> Fixes: d0b309a5d3f4 (serial: 8250: synchronize and annotate UART_IER access) Link: https://lore.kernel.org/r/20230811064340.13400-1-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tty: n_gsm: fix the UAF caused by race condition in gsm_cleanup_muxYi Yang2023-08-111-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In commit 9b9c8195f3f0 ("tty: n_gsm: fix UAF in gsm_cleanup_mux"), the UAF problem is not completely fixed. There is a race condition in gsm_cleanup_mux(), which caused this UAF. The UAF problem is triggered by the following race: task[5046] task[5054] ----------------------- ----------------------- gsm_cleanup_mux(); dlci = gsm->dlci[0]; mutex_lock(&gsm->mutex); gsm_cleanup_mux(); dlci = gsm->dlci[0]; //Didn't take the lock gsm_dlci_release(gsm->dlci[i]); gsm->dlci[i] = NULL; mutex_unlock(&gsm->mutex); mutex_lock(&gsm->mutex); dlci->dead = true; //UAF Fix it by assigning values after mutex_lock(). Link: https://syzkaller.appspot.com/text?tag=CrashReport&x=176188b5a80000 Cc: stable <stable@kernel.org> Fixes: 9b9c8195f3f0 ("tty: n_gsm: fix UAF in gsm_cleanup_mux") Fixes: aa371e96f05d ("tty: n_gsm: fix restart handling via CLD command") Signed-off-by: Yi Yang <yiyang13@huawei.com> Co-developed-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Signed-off-by: Qiumiao Zhang <zhangqiumiao1@huawei.com> Link: https://lore.kernel.org/r/20230811031121.153237-1-yiyang13@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Revert port_id useTony Lindgren2023-08-091-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Guenter reports boot issues with duplicate sysfs entries for multiport drivers. Let's go back to using port->line for now to fix the regression. With this change, the serial core port device names are not correct for the hardware specific 8250 single port drivers, but that's a cosmetic issue for now. Fixes: d962de6ae51f ("serial: core: Fix serial core port id to not use port->line") Reported-by: Guenter Roeck <groeck7@gmail.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Tested-by: Guenter Roeck <linux@roeck-us.net> Link: https://lore.kernel.org/r/20230806062052.47737-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * TIOCSTI: Document CAP_SYS_ADMIN behaviour in KconfigGünther Noack2023-08-091-0/+3
| | | | | | | | | | | | | | | | | | | | | | Clarifies that the LEGACY_TIOCSTI setting is safe to turn off even when running BRLTTY, as it was introduced in commit 690c8b804ad2 ("TIOCSTI: always enable for CAP_SYS_ADMIN"). Signed-off-by: Günther Noack <gnoack3000@gmail.com> Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Link: https://lore.kernel.org/r/20230808201115.23993-1-gnoack3000@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: 8250: Fix oops for port->pm on uart_change_pm()Tony Lindgren2023-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Unloading a hardware specific 8250 driver can produce error "Unable to handle kernel paging request at virtual address" about ten seconds after unloading the driver. This happens on uart_hangup() calling uart_change_pm(). Turns out commit 04e82793f068 ("serial: 8250: Reinit port->pm on port specific driver unbind") was only a partial fix. If the hardware specific driver has initialized port->pm function, we need to clear port->pm too. Just reinitializing port->ops does not do this. Otherwise serial8250_pm() will call port->pm() instead of serial8250_do_pm(). Fixes: 04e82793f068 ("serial: 8250: Reinit port->pm on port specific driver unbind") Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230804131553.52927-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: 8250: Reinit port_id when adding back serial8250_isa_devsTony Lindgren2023-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After fixing the serial core port device to use port->port_id instead of port->line, unloading a hardware specific 8250 port driver started producing an error for "sysfs: cannot create duplicate filename". This is happening as we are wrongly initializing port->port_id to zero when adding back serial8250_isa_devs instances, and the serial8250:0.0 sysfs entry may already exist. For serial8250 devices, we typically have multiple devices mapped to a single driver instance. For the serial8250_isa_devs instances, the port->port_id is the same as port->line. Let's fix the issue by re-initializing port_id when adding back the serial8250_isa_devs instances in serial8250_unregister_port(). Fixes: d962de6ae51f ("serial: core: Fix serial core port id to not use port->line") Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230804123546.25293-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Fix kmemleak issue for serial core device removeTony Lindgren2023-08-041-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kmemleak reports issues for serial8250 ports after the hardware specific driver takes over on boot as noted by Tomi. The kerneldoc for device_initialize() says we must call device_put() after calling device_initialize(). We are calling device_put() on the error path, but are missing it from the device remove path. This causes release() to never get called for the devices on remove. Let's add the missing put_device() calls for both serial ctrl and port devices. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reported-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Tested-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com> Link: https://lore.kernel.org/r/20230804090909.51529-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * MAINTAINERS: Merge TTY layer and serial driversTony Lindgren2023-08-041-13/+2
| | | | | | | | | | | | | | | | | | | | | | | | Greg suggested we merge TTY layer and serial driver entries to avoid duplicates. Suggested-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Acked-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230804102042.53576-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Fix serial_base_match() after fixing controller port nameTony Lindgren2023-08-031-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | While fixing DEVNAME to be more usable, I broke serial_base_match() as the ctrl and port prefix for device names seemed unnecessary. The prefixes are still needed by serial_base_match() to probe the serial base controller port, and serial tx is now broken. Let's fix the issue by checking against dev->type and drv->name instead of the prefixes that are no longer in the DEVNAME. Fixes: 1ef2c2df1199 ("serial: core: Fix serial core controller port name to show controller id") Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202308021529.35b3ad6c-oliver.sang@intel.com Signed-off-by: Tony Lindgren <tony@atomide.com> Reviewed-by: Jiri Slaby <jirislaby@kernel.org> Link: https://lore.kernel.org/r/20230803071034.25571-1-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Fix serial core controller port name to show controller idTony Lindgren2023-08-011-12/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We are missing the serial core controller id for the serial core port name. Let's fix the issue for sane sysfs output, and to avoid issues addressing serial ports later on. And as we're now showing the controller id, the "ctrl" and "port" prefix for the DEVNAME become useless, we can just drop them. Let's standardize on DEVNAME:0 for controller name, where 0 is the controller id. And DEVNAME:0.0 for port name, where 0.0 are the controller id and port id. This makes the sysfs output nicer, on qemu for example: $ ls /sys/bus/serial-base/devices 00:04:0 serial8250:0 serial8250:0.2 00:04:0.0 serial8250:0.1 serial8250:0.3 Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20230725054216.45696-4-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Fix serial core port id to not use port->lineTony Lindgren2023-08-013-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The serial core port id should be serial core controller specific port instance, which is not always the port->line index. For example, 8250 driver maps a number of legacy ports, and when a hardware specific device driver takes over, we typically have one driver instance for each port. Let's instead add port->port_id to keep track serial ports mapped to each serial core controller instance. Currently this is only a cosmetic issue for the serial core port device names. The issue can be noticed looking at /sys/bus/serial-base/devices for example though. Let's fix the issue to avoid port addressing issues later on. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230725054216.45696-3-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * serial: core: Controller id cannot be negativeTony Lindgren2023-08-011-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | The controller id cannot be negative. Let's fix the ctrl_id in preparation for adding port_id to fix the device name. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Reported-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Link: https://lore.kernel.org/r/20230725054216.45696-2-tony@atomide.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| * tty: serial: fsl_lpuart: Clear the error flags by writing 1 for lpuart32 ↵Sherry Sun2023-08-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | platforms Do not read the data register to clear the error flags for lpuart32 platforms, the additional read may cause the receive FIFO underflow since the DMA has already read the data register. Actually all lpuart32 platforms support write 1 to clear those error bits, let's use this method to better clear the error flags. Fixes: 42b68768e51b ("serial: fsl_lpuart: DMA support for 32-bit variant") Cc: stable <stable@kernel.org> Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20230801022304.24251-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* | Merge tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linuxLinus Torvalds2023-08-201-0/+1
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull rust fix from Miguel Ojeda: - Macros: fix 'HAS_*' redefinition by the '#[vtable]' macro under conditional compilation * tag 'rust-fixes-6.5-rc7' of https://github.com/Rust-for-Linux/linux: rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)
| * | rust: macros: vtable: fix `HAS_*` redefinition (`gen_const_name`)Qingsong Chen2023-08-091-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we define the same function name twice in a trait (using `#[cfg]`), the `vtable` macro will redefine its `gen_const_name`, e.g. this will define `HAS_BAR` twice: #[vtable] pub trait Foo { #[cfg(CONFIG_X)] fn bar(); #[cfg(not(CONFIG_X))] fn bar(x: usize); } Fixes: b44becc5ee80 ("rust: macros: add `#[vtable]` proc macro") Signed-off-by: Qingsong Chen <changxian.cqs@antgroup.com> Reviewed-by: Andreas Hindborg <a.hindborg@samsung.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Sergio González Collado <sergio.collado@gmail.com> Link: https://lore.kernel.org/r/20230808025404.2053471-1-changxian.cqs@antgroup.com Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
* | | Merge tag 'i2c-for-6.5-rc7' of ↵Linus Torvalds2023-08-197-9/+37
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Pull i2c fixes from Wolfram Sang: "Usual set of driver fixes. A bit more than usual because I was unavailable for a while" * tag 'i2c-for-6.5-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux: i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issue i2c: Update documentation to use .probe() again i2c: sun6i-p2wi: Fix an error message in probe() i2c: hisi: Only handle the interrupt of the driver's transfer i2c: tegra: Fix i2c-tegra DMA config option processing i2c: tegra: Fix failure during probe deferral cleanup i2c: designware: Handle invalid SMBus block data response length value i2c: designware: Correct length byte validation logic i2c: imx-lpi2c: return -EINVAL when i2c peripheral clk doesn't work
| * | | i2c: bcm-iproc: Fix bcm_iproc_i2c_isr deadlock issueChengfeng Ye2023-08-141-4/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | iproc_i2c_rd_reg() and iproc_i2c_wr_reg() are called from both interrupt context (e.g. bcm_iproc_i2c_isr) and process context (e.g. bcm_iproc_i2c_suspend). Therefore, interrupts should be disabled to avoid potential deadlock. To prevent this scenario, use spin_lock_irqsave(). Fixes: 9a1038728037 ("i2c: iproc: add NIC I2C support") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Acked-by: Ray Jui <ray.jui@broadcom.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>
| * | | i2c: Update documentation to use .probe() againUwe Kleine-König2023-08-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since commit 03c835f498b5 ("i2c: Switch .probe() to not take an id parameter") .probe() is the recommended callback to implement (again). Reflect this in the documentation and don't mention .probe_new() any more. Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Jean Delvare <jdelvare@suse.de> Reviewed-by: Javier Martinez Canillas <javierm@redhat.com> Reviewed-by: Andi Shyti <andi.shyti@kernel.org> Signed-off-by: Wolfram Sang <wsa@kernel.org>