summaryrefslogtreecommitdiffstats
path: root/net/socket.c (follow)
Commit message (Collapse)AuthorAgeFilesLines
...
| * | sock: enable timestamping using control messagesSoheil Hassas Yeganeh2016-04-041-5/+5
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, SOL_TIMESTAMPING can only be enabled using setsockopt. This is very costly when users want to sample writes to gather tx timestamps. Add support for enabling SO_TIMESTAMPING via control messages by using tsflags added in `struct sockcm_cookie` (added in the previous patches in this series) to set the tx_flags of the last skb created in a sendmsg. With this patch, the timestamp recording bits in tx_flags of the skbuff is overridden if SO_TIMESTAMPING is passed in a cmsg. Please note that this is only effective for overriding the recording timestamps flags. Users should enable timestamp reporting (e.g., SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_OPT_ID) using socket options and then should ask for SOF_TIMESTAMPING_TX_* using control messages per sendmsg to sample timestamps for each write. Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* / ->getxattr(): pass dentry and inode as separate argumentsAl Viro2016-04-111-1/+1
|/ | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net: Fix use after free in the recvmmsg exit pathArnaldo Carvalho de Melo2016-03-141-19/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | The syzkaller fuzzer hit the following use-after-free: Call Trace: [<ffffffff8175ea0e>] __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:295 [<ffffffff851cc31a>] __sys_recvmmsg+0x6fa/0x7f0 net/socket.c:2261 [< inline >] SYSC_recvmmsg net/socket.c:2281 [<ffffffff851cc57f>] SyS_recvmmsg+0x16f/0x180 net/socket.c:2270 [<ffffffff86332bb6>] entry_SYSCALL_64_fastpath+0x16/0x7a arch/x86/entry/entry_64.S:185 And, as Dmitry rightly assessed, that is because we can drop the reference and then touch it when the underlying recvmsg calls return some packets and then hit an error, which will make recvmmsg to set sock->sk->sk_err, oops, fix it. Reported-and-Tested-by: Dmitry Vyukov <dvyukov@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Eric Dumazet <edumazet@google.com> Cc: Kostya Serebryany <kcc@google.com> Cc: Sasha Levin <sasha.levin@oracle.com> Fixes: a2e2725541fa ("net: Introduce recvmmsg socket syscall") http://lkml.kernel.org/r/20160122211644.GC2470@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: socket: use pr_info_once to tip the obsolete usage of PF_PACKETliping.zhang2016-03-141-6/+2
| | | | | | | | There is no need to use the static variable here, pr_info_once is more concise. Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Add MSG_BATCH flagTom Herbert2016-03-091-0/+5
| | | | | | | | | | | | | | | Add a new msg flag called MSG_BATCH. This flag is used in sendmsg to indicate that more messages will follow (i.e. a batch of messages is being sent). This is similar to MSG_MORE except that the following messages are not merged into one packet, they are sent individually. sendmmsg is updated so that each contained message except for the last one is marked as MSG_BATCH. MSG_BATCH is a performance optimization in cases where a socket implementation can benefit by transmitting packets in a batch. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Allow MSG_EOR in each msghdr of sendmmsgTom Herbert2016-03-091-4/+6
| | | | | | | | | This patch allows setting MSG_EOR in each individual msghdr passed in sendmmsg. This allows a sendmmsg to send multiple messages when using SOCK_SEQPACKET. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Make sock_alloc exportableTom Herbert2016-03-091-1/+2
| | | | | | | Export it for cases where we want to create sockets by hand. Signed-off-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* kmemcg: account certain kmem allocations to memcgVladimir Davydov2016-01-151-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mark those kmem allocations that are known to be easily triggered from userspace as __GFP_ACCOUNT/SLAB_ACCOUNT, which makes them accounted to memcg. For the list, see below: - threadinfo - task_struct - task_delay_info - pid - cred - mm_struct - vm_area_struct and vm_region (nommu) - anon_vma and anon_vma_chain - signal_struct - sighand_struct - fs_struct - files_struct - fdtable and fdtable->full_fds_bits - dentry and external_name - inode for all filesystems. This is the most tedious part, because most filesystems overwrite the alloc_inode method. The list is far from complete, so feel free to add more objects. Nevertheless, it should be close to "account everything" approach and keep most workloads within bounds. Malevolent users will be able to breach the limit, but this was possible even with the former "account everything" approach (simply because it did not account everything in fact). [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Tejun Heo <tj@kernel.org> Cc: Greg Thelen <gthelen@google.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* net: add scheduling point in recvmmsg/sendmmsgEric Dumazet2016-01-111-0/+2
| | | | | | | | | | | Applications often have to reduce number of datagrams they receive or send per system call to avoid starvation problems. Really the kernel should take care of this by using cond_resched(), so that applications can experiment bigger batch sizes. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net, socket, socket_wq: fix missing initialization of flagsNicolai Stange2015-12-301-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") from the current 4.4 release cycle introduced a new flags member in struct socket_wq and moved SOCKWQ_ASYNC_NOSPACE and SOCKWQ_ASYNC_WAITDATA from struct socket's flags member into that new place. Unfortunately, the new flags field is never initialized properly, at least not for the struct socket_wq instance created in sock_alloc_inode(). One particular issue I encountered because of this is that my GNU Emacs failed to draw anything on my desktop -- i.e. what I got is a transparent window, including the title bar. Bisection lead to the commit mentioned above and further investigation by means of strace told me that Emacs is indeed speaking to my Xorg through an O_ASYNC AF_UNIX socket. This is reproducible 100% of times and the fact that properly initializing the struct socket_wq ->flags fixes the issue leads me to the conclusion that somehow SOCKWQ_ASYNC_WAITDATA got set in the uninitialized ->flags, preventing my Emacs from receiving any SIGIO's due to data becoming available and it got stuck. Make sock_alloc_inode() set the newly created struct socket_wq's ->flags member to zero. Fixes: ceb5d58b2170 ("net: fix sock_wake_async() rcu protection") Signed-off-by: Nicolai Stange <nicstange@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix uninitialized variable issuetadeusz.struk@intel.com2015-12-151-0/+1
| | | | | | | | | | msg_iocb needs to be initialized on the recv/recvfrom path. Otherwise afalg will wrongly interpret it as an async call. Cc: stable@vger.kernel.org Reported-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: fix sock_wake_async() rcu protectionEric Dumazet2015-12-011-14/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Dmitry provided a syzkaller (http://github.com/google/syzkaller) triggering a fault in sock_wake_async() when async IO is requested. Said program stressed af_unix sockets, but the issue is generic and should be addressed in core networking stack. The problem is that by the time sock_wake_async() is called, we should not access the @flags field of 'struct socket', as the inode containing this socket might be freed without further notice, and without RCU grace period. We already maintain an RCU protected structure, "struct socket_wq" so moving SOCKWQ_ASYNC_NOSPACE & SOCKWQ_ASYNC_WAITDATA into it is the safe route. It also reduces number of cache lines needing dirtying, so might provide a performance improvement anyway. In followup patches, we might move remaining flags (SOCK_NOSPACE, SOCK_PASSCRED, SOCK_PASSSEC) to save 8 bytes and let 'struct socket' being mostly read and let it being shared between cpus. Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: rename SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATAEric Dumazet2015-12-011-2/+2
| | | | | | | | | | | | | | | | | | This patch is a cleanup to make following patch easier to review. Goal is to move SOCK_ASYNC_NOSPACE and SOCK_ASYNC_WAITDATA from (struct socket)->flags to a (struct socket_wq)->flags to benefit from RCU protection in sock_wake_async() To ease backports, we rename both constants. Two new helpers, sk_set_bit(int nr, struct sock *sk) and sk_clear_bit(int net, struct sock *sk) are added so that following patch can change their implementation. Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: Drop unlikely before IS_ERR(_OR_NULL)Viresh Kumar2015-09-291-3/+3
| | | | | | | | | IS_ERR(_OR_NULL) already contain an 'unlikely' compiler flag and there is no need to do that again from its callers. Drop it. Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
* net: Add a struct net parameter to sock_create_kernEric W. Biederman2015-05-111-2/+2
| | | | | | | | This is long overdue, and is part of cleaning up how we allocate kernel sockets that don't reference count struct net. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* tun: Utilize the normal socket network namespace refcounting.Eric W. Biederman2015-05-111-3/+0
| | | | | | | | | | | | | | | | | | | | There is no need for tun to do the weird network namespace refcounting. The existing network namespace refcounting in tfile has almost exactly the same lifetime. So rewrite the code to use the struct sock network namespace refcounting and remove the unnecessary hand rolled network namespace refcounting and the unncesary tfile->net. This change allows the tun code to directly call sock_put bypassing sock_release and making SOCK_EXTERNALLY_ALLOCATED unnecessary. Remove the now unncessary tun_release so that if anything tries to use the sock_release code path the kernel will oops, and let us know about the bug. The macvtap code already uses it's internal socket this way. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* VFS: net/: d_inode() annotationsDavid Howells2015-04-151-3/+3
| | | | | | | socket inodes and sunrpc filesystems - inodes owned by that code Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* make new_sync_{read,write}() staticAl Viro2015-04-121-2/+0
| | | | | | | | All places outside of core VFS that checked ->read and ->write for being NULL or called the methods directly are gone now, so NULL {read,write} with non-NULL {read,write}_iter will do the right thing in all cases. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* new helper: msg_data_left()Al Viro2015-04-111-2/+2
| | | | | | convert open-coded instances Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* get rid of the size argument of sock_sendmsg()Al Viro2015-04-111-13/+14
| | | | | | it's equal to iov_iter_count(&msg->msg_iter) in all cases Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* switch kernel_sendmsg() and kernel_recvmsg() to iov_iter_kvec()Al Viro2015-04-091-17/+3
| | | | | | | | | | | For kernel_sendmsg() that eliminates the need to play with setfs(); for kernel_recvmsg() it does *not* - a couple of callers are using it with non-NULL ->msg_control, which would be treated as userland address on recvmsg side of things. In all cases we are really setting a kvec-backed iov_iter, though. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net: switch importing msghdr from userland to {compat_,}import_iovec()Al Viro2015-04-091-19/+12
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net: switch sendto() and recvfrom() to import_single_range()Al Viro2015-04-091-16/+8
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'iocb' into for-davemAl Viro2015-04-091-3/+3
|\ | | | | | | | | | | | | trivial conflict in net/socket.c and non-trivial one in crypto - that one had evaded aio_complete() removal. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: don't allow to complete sync iocbs through aio_completeChristoph Hellwig2015-03-131-6/+3
| | | | | | | | | | | | | | | | | | | | | | | | The AIO interface is fairly complex because it tries to allow filesystems to always work async and then wakeup a synchronous caller through aio_complete. It turns out that basically no one was doing this to avoid the complexity and context switches, and we've already fixed up the remaining users and can now get rid of this case. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| * fs: remove ki_nbytesChristoph Hellwig2015-03-131-3/+3
| | | | | | | | | | | | | | | | There is no need to pass the total request length in the kiocb, as we already get passed in through the iov_iter argument. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | net: socket: add support for async operationstadeusz.struk@intel.com2015-03-231-2/+6
| | | | | | | | | | | | | | Add support for async operations. Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-03-201-0/+4
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: drivers/net/ethernet/emulex/benet/be_main.c net/core/sysctl_net_core.c net/ipv4/inet_diag.c The be_main.c conflict resolution was really tricky. The conflict hunks generated by GIT were very unhelpful, to say the least. It split functions in half and moved them around, when the real actual conflict only existed solely inside of one function, that being be_map_pci_bars(). So instead, to resolve this, I checked out be_main.c from the top of net-next, then I applied the be_main.c changes from 'net' since the last time I merged. And this worked beautifully. The inet_diag.c and sysctl_net_core.c conflicts were simple overlapping changes, and were easily to resolve. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfromAl Viro2015-03-201-0/+4
| |/ | | | | | | | | | | Cc: stable@vger.kernel.org # v3.19 Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: Remove iocb argument from sendmsg and recvmsgYing Xue2015-03-021-65/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | After TIPC doesn't depend on iocb argument in its internal implementations of sendmsg() and recvmsg() hooks defined in proto structure, no any user is using iocb argument in them at all now. Then we can drop the redundant iocb argument completely from kinds of implementations of both sendmsg() and recvmsg() in the entire networking stack. Cc: Christoph Hellwig <hch@lst.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | net: move skb->dropcount to skb->cb[]Eyal Birger2015-03-021-2/+2
|/ | | | | | | | | | | | | | | | | | Commit 977750076d98 ("af_packet: add interframe drop cmsg (v6)") unionized skb->mark and skb->dropcount in order to allow recording of the socket drop count while maintaining struct sk_buff size. skb->dropcount was introduced since there was no available room in skb->cb[] in packet sockets. However, its introduction led to the inability to export skb->mark, or any other aliased field to userspace if so desired. Moving the dropcount metric to skb->cb[] eliminates this problem at the expense of 4 bytes less in skb->cb[] for protocol families using it. Signed-off-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: switch sockets to ->read_iter/->write_iterAl Viro2015-02-041-29/+27
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net/socket.c: fold do_sock_{read,write} into callersAl Viro2015-02-041-35/+21
| | | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* net: remove sock_iocbChristoph Hellwig2015-01-291-41/+4
| | | | | | | | | | | The sock_iocb structure is allocate on stack for each read/write-like operation on sockets, and contains various fields of which only the embedded msghdr and sometimes a pointer to the scm_cookie is ever used. Get rid of the sock_iocb and put a msghdr directly on the stack and pass the scm_cookie explicitly to netlink_mmap_sendmsg. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2015-01-281-3/+0
|\ | | | | | | | | | | | | | | | | | | Conflicts: arch/arm/boot/dts/imx6sx-sdb.dts net/sched/cls_bpf.c Two simple sets of overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * net: don't OOPS on socket aioChristoph Hellwig2015-01-271-3/+0
| | | | | | | | | | Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | socket: use ki_nbytes instead of iov_length()Nicolas Dichtel2015-01-181-6/+4
| | | | | | | | | | | | | | | | | | This field already contains the length of the iovec, no need to calculate it again. Suggested-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | socket: use iov_length()Nicolas Dichtel2015-01-151-10/+2
|/ | | | | | | Better to use available helpers. Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* [regression] chunk lost from bd9b51Al Viro2014-12-191-1/+0
| | | | | | Reported-by: Pavel Emelyanov <xemul@parallels.com> Acked-by: Pavel Emelyanov <xemul@parallels.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge branch 'for-linus' of ↵Linus Torvalds2014-12-171-19/+0
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull vfs pile #2 from Al Viro: "Next pile (and there'll be one or two more). The large piece in this one is getting rid of /proc/*/ns/* weirdness; among other things, it allows to (finally) make nameidata completely opaque outside of fs/namei.c, making for easier further cleanups in there" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: coda_venus_readdir(): use file_inode() fs/namei.c: fold link_path_walk() call into path_init() path_init(): don't bother with LOOKUP_PARENT in argument fs/namei.c: new helper (path_cleanup()) path_init(): store the "base" pointer to file in nameidata itself make default ->i_fop have ->open() fail with ENXIO make nameidata completely opaque outside of fs/namei.c kill proc_ns completely take the targets of /proc/*/ns/* symlinks to separate fs bury struct proc_ns in fs/proc copy address of proc_ns_ops into ns_common new helpers: ns_alloc_inum/ns_free_inum make proc_ns_operations work with struct ns_common * instead of void * switch the rest of proc_ns_operations to working with &...->ns netns: switch ->get()/->put()/->install()/->inum() to working with &net->ns make mntns ->get()/->put()/->install()/->inum() work with &mnt_ns->ns common object embedded into various struct ....ns
| * make default ->i_fop have ->open() fail with ENXIOAl Viro2014-12-111-19/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As it is, default ->i_fop has NULL ->open() (along with all other methods). The only case where it matters is reopening (via procfs symlink) a file that didn't get its ->f_op from ->i_fop - anything else will have ->i_fop assigned to something sane (default would fail on read/write/ioctl/etc.). Unfortunately, such case exists - alloc_file() users, especially anon_get_file() ones. There we have tons of opened files of very different kinds sharing the same inode. As the result, attempt to reopen those via procfs succeeds and you get a descriptor you can't do anything with. Moreover, in case of sockets we set ->i_fop that will only be used on such reopen attempts - and put a failing ->open() into it to make sure those do not succeed. It would be simpler to put such ->open() into default ->i_fop and leave it unchanged both for anon inode (as we do anyway) and for socket ones. Result: * everything going through do_dentry_open() works as it used to * sock_no_open() kludge is gone * attempts to reopen anon-inode files fail as they really ought to * ditto for aio_private_file() * ditto for perfmon - this one actually tried to imitate sock_no_open() trick, but failed to set ->i_fop, so in the current tree reopens succeed and yield completely useless descriptor. Intent clearly had been to fail with -ENXIO on such reopens; now it actually does. * everything else that used alloc_file() keeps working - it has ->i_fop set for its inodes anyway Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | put iov_iter into msghdrAl Viro2014-12-091-15/+12
| | | | | | | | | | | | | | | | Note that the code _using_ ->msg_iter at that point will be very unhappy with anything other than unshifted iovec-backed iov_iter. We still need to convert users to proper primitives. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | net/socket.c : introduce helper function do_sock_sendmsg to replace ↵Gu Zheng2014-12-091-12/+10
| | | | | | | | | | | | | | | | | | | | reduplicate code Introduce helper function do_sock_sendmsg() to simplify sock_sendmsg{_nosec}, and replace reduplicate code. Signed-off-by: Gu Zheng <guz.fnst@cn.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | fold verify_iovec() into copy_msghdr_from_user()Al Viro2014-11-191-40/+53
| | | | | | | | | | | | ... and do the same on the compat side of things. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | {compat_,}verify_iovec(): switch to generic copying of iovecsAl Viro2014-11-191-30/+8
| | | | | | | | | | | | | | | | use {compat_,}rw_copy_check_uvector(). As the result, we are guaranteed that all iovecs seen in ->msg_iov by ->sendmsg() and ->recvmsg() will pass access_ok(). Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* | separate kernel- and userland-side msghdrAl Viro2014-11-191-13/+16
|/ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Kernel-side struct msghdr is (currently) using the same layout as userland one, but it's not a one-to-one copy - even without considering 32bit compat issues, we have msg_iov, msg_name and msg_control copied to kernel[1]. It's fairly localized, so we get away with a few functions where that knowledge is needed (and we could shrink that set even more). Pretty much everything deals with the kernel-side variant and the few places that want userland one just use a bunch of force-casts to paper over the differences. The thing is, kernel-side definition of struct msghdr is *not* exposed in include/uapi - libc doesn't see it, etc. So we can add struct user_msghdr, with proper annotations and let the few places that ever deal with those beasts use it for userland pointers. Saner typechecking aside, that will allow to change the layout of kernel-side msghdr - e.g. replace msg_iov/msg_iovlen there with struct iov_iter, getting rid of the need to modify the iovec as we copy data to/from it, etc. We could introduce kernel_msghdr instead, but that would create much more noise - the absolute majority of the instances would need to have the type switched to kernel_msghdr and definition of struct msghdr in include/linux/socket.h is not going to be seen by userland anyway. This commit just introduces user_msghdr and switches the few places that are dealing with userland-side msghdr to it. [1] actually, it's even trickier than that - we copy msg_control for sendmsg, but keep the userland address on recvmsg. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
* Merge tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linuxLinus Torvalds2014-10-111-1/+2
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull file locking related changes from Jeff Layton: "This release is a little more busy for file locking changes than the last: - a set of patches from Kinglong Mee to fix the lockowner handling in knfsd - a pile of cleanups to the internal file lease API. This should get us a bit closer to allowing for setlease methods that can block. There are some dependencies between mine and Bruce's trees this cycle, and I based my tree on top of the requisite patches in Bruce's tree" * tag 'locks-v3.18-1' of git://git.samba.org/jlayton/linux: (26 commits) locks: fix fcntl_setlease/getlease return when !CONFIG_FILE_LOCKING locks: flock_make_lock should return a struct file_lock (or PTR_ERR) locks: set fl_owner for leases to filp instead of current->files locks: give lm_break a return value locks: __break_lease cleanup in preparation of allowing direct removal of leases locks: remove i_have_this_lease check from __break_lease locks: move freeing of leases outside of i_lock locks: move i_lock acquisition into generic_*_lease handlers locks: define a lm_setup handler for leases locks: plumb a "priv" pointer into the setlease routines nfsd: don't keep a pointer to the lease in nfs4_file locks: clean up vfs_setlease kerneldoc comments locks: generic_delete_lease doesn't need a file_lock at all nfsd: fix potential lease memory leak in nfs4_setlease locks: close potential race in lease_get_mtime security: make security_file_set_fowner, f_setown and __f_setown void return locks: consolidate "nolease" routines locks: remove lock_may_read and lock_may_write lockd: rip out deferred lock handling from testlock codepath NFSD: Get reference of lockowner when coping file_lock ...
| * security: make security_file_set_fowner, f_setown and __f_setown void returnJeff Layton2014-09-091-1/+2
| | | | | | | | | | | | | | | | | | | | security_file_set_fowner always returns 0, so make it f_setown and __f_setown void return functions and fix up the error handling in the callers. Cc: linux-security-module@vger.kernel.org Signed-off-by: Jeff Layton <jlayton@primarydata.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
* | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netDavid S. Miller2014-09-231-0/+3
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Conflicts: arch/mips/net/bpf_jit.c drivers/net/can/flexcan.c Both the flexcan and MIPS bpf_jit conflicts were cases of simple overlapping changes. Signed-off-by: David S. Miller <davem@davemloft.net>
| * | net:socket: set msg_namelen to 0 if msg_name is passed as NULL in msghdr ↵Ani Sinha2014-09-101-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | struct from userland. Linux manpage for recvmsg and sendmsg calls does not explicitly mention setting msg_namelen to 0 when msg_name passed set as NULL. When developers don't set msg_namelen member in msghdr, it might contain garbage value which will fail the validation check and sendmsg and recvmsg calls from kernel will return EINVAL. This will break old binaries and any code for which there is no access to source code. To fix this, we set msg_namelen to 0 when msg_name is passed as NULL from userland. Signed-off-by: Ani Sinha <ani@arista.com> Signed-off-by: David S. Miller <davem@davemloft.net>