From 2b144498350860b6ee9dc57ff27a93ad488de5dc Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju
Date: Thu, 9 Feb 2012 14:56:42 +0530
Subject: uprobes, mm, x86: Add the ability to install and remove uprobes breakpoints

Add uprobes support to the core kernel, with x86 support. This commit adds the kernel facilities; the actual uprobes user-space ABI and perf probe support come in later commits.

General design:

Uprobes are maintained in an rb-tree indexed by inode and offset (the offset here is from the start of the mapping). For a unique (inode, offset) tuple, there can be at most one uprobe in the rb-tree.

Since the (inode, offset) tuple identifies a unique uprobe, more than one user may be interested in the same uprobe. This provides the ability to connect multiple 'consumers' to the same uprobe. Each consumer defines a handler and an optional filter. The 'handler' is run every time the uprobe is hit, provided it matches the 'filter' criteria.

The first consumer of a uprobe causes the breakpoint to be inserted at the specified address; each subsequent consumer is appended to the existing list of consumers. The breakpoint is removed when the last consumer unregisters; for all other unregistrations, the consumer is simply removed from the list of consumers.

Given an inode, we get a list of the mms that have mapped the inode, and do the actual registration only if an mm maps the page where the probe needs to be inserted/removed. We use a temporary list to walk through the vmas that map the inode, because:

- the number of mappings of the inode is not known before we walk the rmap, and it keeps changing;
- extending vm_area_struct wasn't recommended, as it's a size-critical data structure;
- there can be more than one mapping of the inode in the same mm.

We add callbacks to the mmap methods to keep an eye on text vmas that are of interest to uprobes. When a vma of interest is mapped, we insert the breakpoint at the right address.

A uprobe works by replacing the instruction at the address defined by (inode, offset) with the arch-specific breakpoint instruction. We save a copy of the original instruction at the uprobed address. This is needed for:

a. executing the instruction out-of-line (xol),
b. instruction analysis for any subsequent fixups,
c. restoring the instruction when the uprobe is unregistered.

We insert or delete a breakpoint instruction, and this breakpoint instruction is assumed to be the smallest instruction available on the platform. For fixed-size instruction platforms this is trivially true; for variable-size instruction platforms the breakpoint instruction is typically the smallest (often a single byte).

Writing the instruction is done by COWing the page and changing the instruction during the copy, even though most platforms allow atomic writes of the breakpoint instruction. This also mirrors the behaviour of a ptrace() memory write to a PRIVATE file map.

The core worker is derived from KSM's replace_page() logic. In essence, similar to KSM, we:

a. allocate a new page and copy over the contents of the page that has the uprobed vaddr,
b. modify the copy and insert the breakpoint at the required address,
c. switch the original page with the copy containing the breakpoint,
d. flush the page tables.

replace_page() is being replicated here because of some minor changes in the type of pages and also because Hugh Dickins had plans to improve replace_page() for KSM-specific work.
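To make the consumer model concrete, here is a minimal sketch of how kernel code might attach to an (inode, offset) pair through this interface. Only register_uprobe(), unregister_uprobe() and struct uprobe_consumer come from the header added by this patch; the module boilerplate, the kern_path()-based inode lookup, the "/bin/ls" path and the SAMPLE_OFFSET value are illustrative assumptions, not part of the patch:

/*
 * Illustrative consumer (not part of this patch): all names other
 * than the uprobes API itself are made up for the example.
 */
#include <linux/module.h>
#include <linux/namei.h>
#include <linux/ptrace.h>
#include <linux/sched.h>
#include <linux/string.h>
#include <linux/fs.h>
#include <linux/uprobes.h>

#define SAMPLE_OFFSET	0x4710	/* hypothetical file offset to probe */

/* Run on every hit, subject to the filter below. */
static int sample_handler(struct uprobe_consumer *self, struct pt_regs *regs)
{
	pr_info("uprobe hit, ip=%lx\n", instruction_pointer(regs));
	return 0;
}

/* Optional filter: only fire for tasks named "ls". */
static bool sample_filter(struct uprobe_consumer *self, struct task_struct *task)
{
	return strcmp(task->comm, "ls") == 0;
}

static struct uprobe_consumer sample_consumer = {
	.handler = sample_handler,
	.filter  = sample_filter,
};

static struct inode *sample_inode;

static int __init sample_init(void)
{
	struct path path;
	int ret;

	ret = kern_path("/bin/ls", LOOKUP_FOLLOW, &path);
	if (ret)
		return ret;

	/* register_uprobe() expects a sane inode; pin it ourselves. */
	sample_inode = igrab(path.dentry->d_inode);
	path_put(&path);
	if (!sample_inode)
		return -ENOENT;

	/* The first consumer inserts the breakpoint, later ones just chain. */
	return register_uprobe(sample_inode, SAMPLE_OFFSET, &sample_consumer);
}

static void __exit sample_exit(void)
{
	/* The breakpoint is removed when the last consumer unregisters. */
	unregister_uprobe(sample_inode, SAMPLE_OFFSET, &sample_consumer);
	iput(sample_inode);
}

module_init(sample_init);
module_exit(sample_exit);
MODULE_LICENSE("GPL");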
Instruction analysis on x86 is based on the instruction decoder; it determines whether an instruction can be probed and what fixups are necessary after single-stepping it. Instruction analysis is done at probe insertion time, so that we avoid having to repeat the same analysis every time a probe is hit.

A lot of the code here is due to improvements/suggestions/input from Peter Zijlstra.

Changelog:

(v10):
- Add code to clear the REX.B prefix, as suggested by Denys Vlasenko and Masami Hiramatsu.

(v9):
- Use insn_offset_modrm as suggested by Masami Hiramatsu.

(v7): Handle comments from Peter Zijlstra:
- Don't take a reference to the inode (expect the inode passed to uprobe_register to be sane).
- Use PTR_ERR to set the return value.
- register_uprobe and unregister_uprobe share code.

(v5):
- Modified del_consumer as per comments from Peter.
- Drop the reference to the inode before dropping the reference to the uprobe.
- Use i_size_read(inode) instead of inode->i_size.
- Ensure uprobe->consumers is NULL before __uprobe_unregister() is called.
- Include errno.h, as recommended by Stephen Rothwell, to fix a build issue on sparc defconfig.
- Remove restrictions while unregistering.
- Earlier code leaked inode references under some conditions while registering/unregistering.
- Continue the vma-rmap walk even if an intermediate vma doesn't meet the requirements.
- Validate the vma found by find_vma before inserting/removing the breakpoint.
- Call del_consumer under mutex_lock.
- Use hash locks.
- Handle mremap.
- Introduce find_least_offset_node() instead of the close-match logic in find_uprobe.
- Uprobes no longer depends on MM_OWNER; no reference to task_structs while inserting/removing a probe.
- Use read_mapping_page instead of grab_cache_page so that the pages have valid content.
- Pass NULL to get_user_pages for the task parameter.
- Call SetPageUptodate on the new page allocated in write_opcode.
- Fix leaking a reference to the new page under certain conditions.
- Include the instruction decoder if uprobes gets defined.
- Remove const attributes for the instruction prefix arrays.
- Use mm_context to know if the application is 32-bit.

Signed-off-by: Srikar Dronamraju
Also-written-by: Jim Keniston
Reviewed-by: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Andi Kleen
Cc: Christoph Hellwig
Cc: Steven Rostedt
Cc: Roland McGrath
Cc: Masami Hiramatsu
Cc: Arnaldo Carvalho de Melo
Cc: Anton Arapov
Cc: Ananth N Mavinakayanahalli
Cc: Stephen Rothwell
Cc: Denys Vlasenko
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Andrew Morton
Cc: Linux-mm
Link: http://lkml.kernel.org/r/20120209092642.GE16600@linux.vnet.ibm.com
[ Made various small edits to the commit log ]
Signed-off-by: Ingo Molnar

--- include/linux/uprobes.h | 98 +++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 98 insertions(+) create mode 100644 include/linux/uprobes.h (limited to 'include') diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h new file mode 100644 index 000000000000..f1d13fd140f2 --- /dev/null +++ b/include/linux/uprobes.h @@ -0,0 +1,98 @@ +#ifndef _LINUX_UPROBES_H +#define _LINUX_UPROBES_H +/* + * Userspace Probes (UProbes) + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version.
+ * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, write to the Free Software + * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. + * + * Copyright (C) IBM Corporation, 2008-2011 + * Authors: + * Srikar Dronamraju + * Jim Keniston + */ + +#include <linux/errno.h> +#include <linux/rbtree.h> + +struct vm_area_struct; +#ifdef CONFIG_ARCH_SUPPORTS_UPROBES +#include <asm/uprobes.h> +#else + +typedef u8 uprobe_opcode_t; +struct uprobe_arch_info {}; + +#define MAX_UINSN_BYTES 4 +#endif + +#define uprobe_opcode_sz sizeof(uprobe_opcode_t) + +/* flags that denote/change uprobes behaviour */ +/* Have a copy of original instruction */ +#define UPROBES_COPY_INSN 0x1 +/* Dont run handlers when first register/ last unregister in progress*/ +#define UPROBES_RUN_HANDLER 0x2 + +struct uprobe_consumer { + int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs); + /* + * filter is optional; If a filter exists, handler is run + * if and only if filter returns true. + */ + bool (*filter)(struct uprobe_consumer *self, struct task_struct *task); + + struct uprobe_consumer *next; +}; + +struct uprobe { + struct rb_node rb_node; /* node in the rb tree */ + atomic_t ref; + struct rw_semaphore consumer_rwsem; + struct list_head pending_list; + struct uprobe_arch_info arch_info; + struct uprobe_consumer *consumers; + struct inode *inode; /* Also hold a ref to inode */ + loff_t offset; + int flags; + u8 insn[MAX_UINSN_BYTES]; +}; + +#ifdef CONFIG_UPROBES +extern int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, + unsigned long vaddr); +extern int __weak set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, + unsigned long vaddr, bool verify); +extern bool __weak is_bkpt_insn(uprobe_opcode_t *insn); +extern int register_uprobe(struct inode *inode, loff_t offset, + struct uprobe_consumer *consumer); +extern void unregister_uprobe(struct inode *inode, loff_t offset, + struct uprobe_consumer *consumer); +extern int mmap_uprobe(struct vm_area_struct *vma); +#else /* CONFIG_UPROBES is not defined */ +static inline int register_uprobe(struct inode *inode, loff_t offset, + struct uprobe_consumer *consumer) +{ + return -ENOSYS; +} +static inline void unregister_uprobe(struct inode *inode, loff_t offset, + struct uprobe_consumer *consumer) +{ +} +static inline int mmap_uprobe(struct vm_area_struct *vma) +{ + return 0; +} +#endif /* CONFIG_UPROBES */ +#endif /* _LINUX_UPROBES_H */

-- cgit v1.2.3

From 7b2d81d48a2d8e37efb6ce7b4d5ef58822b30d89 Mon Sep 17 00:00:00 2001
From: Ingo Molnar
Date: Fri, 17 Feb 2012 09:27:41 +0100
Subject: uprobes/core: Clean up, refactor and improve the code

Make the uprobes code readable to me:

- improve the Kconfig text so that a mere mortal gets some idea what CONFIG_UPROBES=y is really about
- do trivial renames to standardize around the uprobes_*() namespace
- clean up and simplify various code flow details
- separate basic blocks of functionality
- remove line break artifacts and white space damage
- use standard local variable definition blocks
- use vertical spacing to make things more readable
- remove unnecessary volatile
- restructure comment blocks to make them more uniform and more readable in general

Cc: Srikar Dronamraju
Cc: Jim Keniston
Cc: Peter Zijlstra
Cc: Oleg Nesterov
Cc: Masami Hiramatsu Cc: Arnaldo Carvalho de Melo Cc: Anton Arapov Cc: Ananth N Mavinakayanahalli Link: http://lkml.kernel.org/n/tip-ewbwhb8o6navvllsauu7k07p@git.kernel.org Signed-off-by: Ingo Molnar --- arch/Kconfig | 14 ++- arch/x86/include/asm/uprobes.h | 17 ++-- arch/x86/kernel/uprobes.c | 129 ++++++++++++------------ include/linux/uprobes.h | 28 +++--- kernel/uprobes.c | 219 ++++++++++++++++++++++++----------------- mm/mmap.c | 12 +-- 6 files changed, 233 insertions(+), 186 deletions(-) (limited to 'include') diff --git a/arch/Kconfig b/arch/Kconfig index 284f5898f526..cca5b545d806 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -66,13 +66,19 @@ config OPTPROBES depends on !PREEMPT config UPROBES - bool "User-space probes (EXPERIMENTAL)" + bool "Transparent user-space probes (EXPERIMENTAL)" depends on ARCH_SUPPORTS_UPROBES default n help - Uprobes enables kernel subsystems to establish probepoints - in user applications and execute handler functions when - the probepoints are hit. + Uprobes is the user-space counterpart to kprobes: they + enable instrumentation applications (such as 'perf probe') + to establish unintrusive probes in user-space binaries and + libraries, by executing handler functions when the probes + are hit by user-space applications. + + ( These probes come in the form of single-byte breakpoints, + managed by the kernel and kept transparent to the probed + application. ) If in doubt, say "N". diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index 8208234391ff..072df3902636 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -1,7 +1,7 @@ #ifndef _ASM_UPROBES_H #define _ASM_UPROBES_H /* - * Userspace Probes (UProbes) for x86 + * User-space Probes (UProbes) for x86 * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -24,19 +24,20 @@ */ typedef u8 uprobe_opcode_t; -#define MAX_UINSN_BYTES 16 -#define UPROBES_XOL_SLOT_BYTES 128 /* to keep it cache aligned */ -#define UPROBES_BKPT_INSN 0xcc -#define UPROBES_BKPT_INSN_SIZE 1 +#define MAX_UINSN_BYTES 16 +#define UPROBES_XOL_SLOT_BYTES 128 /* to keep it cache aligned */ + +#define UPROBES_BKPT_INSN 0xcc +#define UPROBES_BKPT_INSN_SIZE 1 struct uprobe_arch_info { - u16 fixups; + u16 fixups; #ifdef CONFIG_X86_64 - unsigned long rip_rela_target_address; + unsigned long rip_rela_target_address; #endif }; struct uprobe; -extern int analyze_insn(struct mm_struct *mm, struct uprobe *uprobe); +extern int arch_uprobes_analyze_insn(struct mm_struct *mm, struct uprobe *uprobe); #endif /* _ASM_UPROBES_H */ diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 2a301bb91bdb..cf2a18498425 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -1,5 +1,5 @@ /* - * Userspace Probes (UProbes) for x86 + * User-space Probes (UProbes) for x86 * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -20,7 +20,6 @@ * Srikar Dronamraju * Jim Keniston */ - #include #include #include @@ -42,10 +41,10 @@ #define UPROBES_FIX_RIP_CX 0x4000 /* Adaptations for mhiramat x86 decoder v14. 
*/ -#define OPCODE1(insn) ((insn)->opcode.bytes[0]) -#define OPCODE2(insn) ((insn)->opcode.bytes[1]) -#define OPCODE3(insn) ((insn)->opcode.bytes[2]) -#define MODRM_REG(insn) X86_MODRM_REG(insn->modrm.value) +#define OPCODE1(insn) ((insn)->opcode.bytes[0]) +#define OPCODE2(insn) ((insn)->opcode.bytes[1]) +#define OPCODE3(insn) ((insn)->opcode.bytes[2]) +#define MODRM_REG(insn) X86_MODRM_REG(insn->modrm.value) #define W(row, b0, b1, b2, b3, b4, b5, b6, b7, b8, b9, ba, bb, bc, bd, be, bf)\ (((b0##UL << 0x0)|(b1##UL << 0x1)|(b2##UL << 0x2)|(b3##UL << 0x3) | \ @@ -55,7 +54,7 @@ << (row % 32)) #ifdef CONFIG_X86_64 -static volatile u32 good_insns_64[256 / 32] = { +static u32 good_insns_64[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 0, 0) | /* 00 */ @@ -81,7 +80,7 @@ static volatile u32 good_insns_64[256 / 32] = { /* Good-instruction tables for 32-bit apps */ -static volatile u32 good_insns_32[256 / 32] = { +static u32 good_insns_32[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 0) | /* 00 */ @@ -105,7 +104,7 @@ static volatile u32 good_insns_32[256 / 32] = { }; /* Using this for both 64-bit and 32-bit apps */ -static volatile u32 good_2byte_insns[256 / 32] = { +static u32 good_2byte_insns[256 / 32] = { /* 0 1 2 3 4 5 6 7 8 9 a b c d e f */ /* ---------------------------------------------- */ W(0x00, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1) | /* 00 */ @@ -132,42 +131,47 @@ static volatile u32 good_2byte_insns[256 / 32] = { /* * opcodes we'll probably never support: - * 6c-6d, e4-e5, ec-ed - in - * 6e-6f, e6-e7, ee-ef - out - * cc, cd - int3, int - * cf - iret - * d6 - illegal instruction - * f1 - int1/icebp - * f4 - hlt - * fa, fb - cli, sti - * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2 + * + * 6c-6d, e4-e5, ec-ed - in + * 6e-6f, e6-e7, ee-ef - out + * cc, cd - int3, int + * cf - iret + * d6 - illegal instruction + * f1 - int1/icebp + * f4 - hlt + * fa, fb - cli, sti + * 0f - lar, lsl, syscall, clts, sysret, sysenter, sysexit, invd, wbinvd, ud2 * * invalid opcodes in 64-bit mode: - * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5 * - * 63 - we support this opcode in x86_64 but not in i386. + * 06, 0e, 16, 1e, 27, 2f, 37, 3f, 60-62, 82, c4-c5, d4-d5 + * 63 - we support this opcode in x86_64 but not in i386. * * opcodes we may need to refine support for: - * 0f - 2-byte instructions: For many of these instructions, the validity - * depends on the prefix and/or the reg field. On such instructions, we - * just consider the opcode combination valid if it corresponds to any - * valid instruction. - * 8f - Group 1 - only reg = 0 is OK - * c6-c7 - Group 11 - only reg = 0 is OK - * d9-df - fpu insns with some illegal encodings - * f2, f3 - repnz, repz prefixes. These are also the first byte for - * certain floating-point instructions, such as addsd. - * fe - Group 4 - only reg = 0 or 1 is OK - * ff - Group 5 - only reg = 0-6 is OK + * + * 0f - 2-byte instructions: For many of these instructions, the validity + * depends on the prefix and/or the reg field. On such instructions, we + * just consider the opcode combination valid if it corresponds to any + * valid instruction. + * + * 8f - Group 1 - only reg = 0 is OK + * c6-c7 - Group 11 - only reg = 0 is OK + * d9-df - fpu insns with some illegal encodings + * f2, f3 - repnz, repz prefixes. 
These are also the first byte for + * certain floating-point instructions, such as addsd. + * + * fe - Group 4 - only reg = 0 or 1 is OK + * ff - Group 5 - only reg = 0-6 is OK * * others -- Do we need to support these? - * 0f - (floating-point?) prefetch instructions - * 07, 17, 1f - pop es, pop ss, pop ds - * 26, 2e, 36, 3e - es:, cs:, ss:, ds: segment prefixes -- + * + * 0f - (floating-point?) prefetch instructions + * 07, 17, 1f - pop es, pop ss, pop ds + * 26, 2e, 36, 3e - es:, cs:, ss:, ds: segment prefixes -- * but 64 and 65 (fs: and gs:) seem to be used, so we support them - * 67 - addr16 prefix - * ce - into - * f0 - lock prefix + * 67 - addr16 prefix + * ce - into + * f0 - lock prefix */ /* @@ -182,11 +186,11 @@ static bool is_prefix_bad(struct insn *insn) for (i = 0; i < insn->prefixes.nbytes; i++) { switch (insn->prefixes.bytes[i]) { - case 0x26: /*INAT_PFX_ES */ - case 0x2E: /*INAT_PFX_CS */ - case 0x36: /*INAT_PFX_DS */ - case 0x3E: /*INAT_PFX_SS */ - case 0xF0: /*INAT_PFX_LOCK */ + case 0x26: /* INAT_PFX_ES */ + case 0x2E: /* INAT_PFX_CS */ + case 0x36: /* INAT_PFX_DS */ + case 0x3E: /* INAT_PFX_SS */ + case 0xF0: /* INAT_PFX_LOCK */ return true; } } @@ -201,12 +205,15 @@ static int validate_insn_32bits(struct uprobe *uprobe, struct insn *insn) insn_get_opcode(insn); if (is_prefix_bad(insn)) return -ENOTSUPP; + if (test_bit(OPCODE1(insn), (unsigned long *)good_insns_32)) return 0; + if (insn->opcode.nbytes == 2) { if (test_bit(OPCODE2(insn), (unsigned long *)good_2byte_insns)) return 0; } + return -ENOTSUPP; } @@ -282,12 +289,12 @@ static void prepare_fixups(struct uprobe *uprobe, struct insn *insn) * disastrous. * * Some useful facts about rip-relative instructions: - * - There's always a modrm byte. - * - There's never a SIB byte. - * - The displacement is always 4 bytes. + * + * - There's always a modrm byte. + * - There's never a SIB byte. + * - The displacement is always 4 bytes. */ -static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, - struct insn *insn) +static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) { u8 *cursor; u8 reg; @@ -342,13 +349,12 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, } /* Target address = address of next instruction + (signed) offset */ - uprobe->arch_info.rip_rela_target_address = (long)insn->length - + insn->displacement.value; + uprobe->arch_info.rip_rela_target_address = (long)insn->length + insn->displacement.value; + /* Displacement field is gone; slide immediate field (if any) over. 
*/ if (insn->immediate.nbytes) { cursor++; - memmove(cursor, cursor + insn->displacement.nbytes, - insn->immediate.nbytes); + memmove(cursor, cursor + insn->displacement.nbytes, insn->immediate.nbytes); } return; } @@ -361,8 +367,10 @@ static int validate_insn_64bits(struct uprobe *uprobe, struct insn *insn) insn_get_opcode(insn); if (is_prefix_bad(insn)) return -ENOTSUPP; + if (test_bit(OPCODE1(insn), (unsigned long *)good_insns_64)) return 0; + if (insn->opcode.nbytes == 2) { if (test_bit(OPCODE2(insn), (unsigned long *)good_2byte_insns)) return 0; @@ -370,34 +378,31 @@ static int validate_insn_64bits(struct uprobe *uprobe, struct insn *insn) return -ENOTSUPP; } -static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, - struct insn *insn) +static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) { if (mm->context.ia32_compat) return validate_insn_32bits(uprobe, insn); return validate_insn_64bits(uprobe, insn); } -#else -static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, - struct insn *insn) +#else /* 32-bit: */ +static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) { - return; + /* No RIP-relative addressing on 32-bit */ } -static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, - struct insn *insn) +static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) { return validate_insn_32bits(uprobe, insn); } #endif /* CONFIG_X86_64 */ /** - * analyze_insn - instruction analysis including validity and fixups. + * arch_uprobes_analyze_insn - instruction analysis including validity and fixups. * @mm: the probed address space. * @uprobe: the probepoint information. * Return 0 on success or a -ve number on error. 
*/ -int analyze_insn(struct mm_struct *mm, struct uprobe *uprobe) +int arch_uprobes_analyze_insn(struct mm_struct *mm, struct uprobe *uprobe) { int ret; struct insn insn; @@ -406,7 +411,9 @@ int analyze_insn(struct mm_struct *mm, struct uprobe *uprobe) ret = validate_insn_bits(mm, uprobe, &insn); if (ret != 0) return ret; + handle_riprel_insn(mm, uprobe, &insn); prepare_fixups(uprobe, &insn); + return 0; } diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f1d13fd140f2..64e45f116b2a 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -1,7 +1,7 @@ #ifndef _LINUX_UPROBES_H #define _LINUX_UPROBES_H /* - * Userspace Probes (UProbes) + * User-space Probes (UProbes) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -40,8 +40,10 @@ struct uprobe_arch_info {}; #define uprobe_opcode_sz sizeof(uprobe_opcode_t) /* flags that denote/change uprobes behaviour */ + /* Have a copy of original instruction */ #define UPROBES_COPY_INSN 0x1 + /* Dont run handlers when first register/ last unregister in progress*/ #define UPROBES_RUN_HANDLER 0x2 @@ -70,27 +72,23 @@ struct uprobe { }; #ifdef CONFIG_UPROBES -extern int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, - unsigned long vaddr); -extern int __weak set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, - unsigned long vaddr, bool verify); +extern int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr); +extern int __weak set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr, bool verify); extern bool __weak is_bkpt_insn(uprobe_opcode_t *insn); -extern int register_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer); -extern void unregister_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer); -extern int mmap_uprobe(struct vm_area_struct *vma); +extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); +extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); +extern int uprobe_mmap(struct vm_area_struct *vma); #else /* CONFIG_UPROBES is not defined */ -static inline int register_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer) +static inline int +uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) { return -ENOSYS; } -static inline void unregister_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer) +static inline void +uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) { } -static inline int mmap_uprobe(struct vm_area_struct *vma) +static inline int uprobe_mmap(struct vm_area_struct *vma) { return 0; } diff --git a/kernel/uprobes.c b/kernel/uprobes.c index 72e8bb3b52cd..884817f1b0d3 100644 --- a/kernel/uprobes.c +++ b/kernel/uprobes.c @@ -1,5 +1,5 @@ /* - * Userspace Probes (UProbes) + * User-space Probes (UProbes) * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -29,24 +29,26 @@ #include /* anon_vma_prepare */ #include /* set_pte_at_notify */ #include /* try_to_free_swap */ + #include static struct rb_root uprobes_tree = RB_ROOT; + static DEFINE_SPINLOCK(uprobes_treelock); /* serialize rbtree access */ #define UPROBES_HASH_SZ 13 + /* serialize (un)register */ static struct mutex 
uprobes_mutex[UPROBES_HASH_SZ]; -#define uprobes_hash(v) (&uprobes_mutex[((unsigned long)(v)) %\ - UPROBES_HASH_SZ]) + +#define uprobes_hash(v) (&uprobes_mutex[((unsigned long)(v)) % UPROBES_HASH_SZ]) /* serialize uprobe->pending_list */ static struct mutex uprobes_mmap_mutex[UPROBES_HASH_SZ]; -#define uprobes_mmap_hash(v) (&uprobes_mmap_mutex[((unsigned long)(v)) %\ - UPROBES_HASH_SZ]) +#define uprobes_mmap_hash(v) (&uprobes_mmap_mutex[((unsigned long)(v)) % UPROBES_HASH_SZ]) /* - * uprobe_events allows us to skip the mmap_uprobe if there are no uprobe + * uprobe_events allows us to skip the uprobe_mmap if there are no uprobe * events active at this time. Probably a fine grained per inode count is * better? */ @@ -58,9 +60,9 @@ static atomic_t uprobe_events = ATOMIC_INIT(0); * vm_area_struct wasnt recommended. */ struct vma_info { - struct list_head probe_list; - struct mm_struct *mm; - loff_t vaddr; + struct list_head probe_list; + struct mm_struct *mm; + loff_t vaddr; }; /* @@ -79,8 +81,7 @@ static bool valid_vma(struct vm_area_struct *vma, bool is_register) if (!is_register) return true; - if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) == - (VM_READ|VM_EXEC)) + if ((vma->vm_flags & (VM_READ|VM_WRITE|VM_EXEC|VM_SHARED)) == (VM_READ|VM_EXEC)) return true; return false; @@ -92,6 +93,7 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset) vaddr = vma->vm_start + offset; vaddr -= vma->vm_pgoff << PAGE_SHIFT; + return vaddr; } @@ -105,8 +107,7 @@ static loff_t vma_address(struct vm_area_struct *vma, loff_t offset) * * Returns 0 on success, -EFAULT on failure. */ -static int __replace_page(struct vm_area_struct *vma, struct page *page, - struct page *kpage) +static int __replace_page(struct vm_area_struct *vma, struct page *page, struct page *kpage) { struct mm_struct *mm = vma->vm_mm; pgd_t *pgd; @@ -163,7 +164,7 @@ out: */ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) { - return (*insn == UPROBES_BKPT_INSN); + return *insn == UPROBES_BKPT_INSN; } /* @@ -203,6 +204,7 @@ static int write_opcode(struct mm_struct *mm, struct uprobe *uprobe, ret = get_user_pages(NULL, mm, vaddr, 1, 0, 0, &old_page, &vma); if (ret <= 0) return ret; + ret = -EINVAL; /* @@ -239,6 +241,7 @@ static int write_opcode(struct mm_struct *mm, struct uprobe *uprobe, vaddr_new = kmap_atomic(new_page); memcpy(vaddr_new, vaddr_old, PAGE_SIZE); + /* poke the new insn in, ASSUMES we don't cross page boundary */ vaddr &= ~PAGE_MASK; BUG_ON(vaddr + uprobe_opcode_sz > PAGE_SIZE); @@ -260,7 +263,8 @@ unlock_out: page_cache_release(new_page); put_out: - put_page(old_page); /* we did a get_page in the beginning */ + put_page(old_page); + return ret; } @@ -276,8 +280,7 @@ put_out: * For mm @mm, read the opcode at @vaddr and store it in @opcode. * Return 0 (success) or a negative errno. 
*/ -static int read_opcode(struct mm_struct *mm, unsigned long vaddr, - uprobe_opcode_t *opcode) +static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t *opcode) { struct page *page; void *vaddr_new; @@ -293,15 +296,18 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, memcpy(opcode, vaddr_new + vaddr, uprobe_opcode_sz); kunmap_atomic(vaddr_new); unlock_page(page); - put_page(page); /* we did a get_user_pages in the beginning */ + + put_page(page); + return 0; } static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) { uprobe_opcode_t opcode; - int result = read_opcode(mm, vaddr, &opcode); + int result; + result = read_opcode(mm, vaddr, &opcode); if (result) return result; @@ -320,11 +326,11 @@ static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) * For mm @mm, store the breakpoint instruction at @vaddr. * Return 0 (success) or a negative errno. */ -int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, - unsigned long vaddr) +int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr) { - int result = is_bkpt_at_addr(mm, vaddr); + int result; + result = is_bkpt_at_addr(mm, vaddr); if (result == 1) return -EEXIST; @@ -344,35 +350,35 @@ int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, * For mm @mm, restore the original opcode (opcode) at @vaddr. * Return 0 (success) or a negative errno. */ -int __weak set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, - unsigned long vaddr, bool verify) +int __weak +set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr, bool verify) { if (verify) { - int result = is_bkpt_at_addr(mm, vaddr); + int result; + result = is_bkpt_at_addr(mm, vaddr); if (!result) return -EINVAL; if (result != 1) return result; } - return write_opcode(mm, uprobe, vaddr, - *(uprobe_opcode_t *)uprobe->insn); + return write_opcode(mm, uprobe, vaddr, *(uprobe_opcode_t *)uprobe->insn); } static int match_uprobe(struct uprobe *l, struct uprobe *r) { if (l->inode < r->inode) return -1; + if (l->inode > r->inode) return 1; - else { - if (l->offset < r->offset) - return -1; - if (l->offset > r->offset) - return 1; - } + if (l->offset < r->offset) + return -1; + + if (l->offset > r->offset) + return 1; return 0; } @@ -391,6 +397,7 @@ static struct uprobe *__find_uprobe(struct inode *inode, loff_t offset) atomic_inc(&uprobe->ref); return uprobe; } + if (match < 0) n = n->rb_left; else @@ -411,6 +418,7 @@ static struct uprobe *find_uprobe(struct inode *inode, loff_t offset) spin_lock_irqsave(&uprobes_treelock, flags); uprobe = __find_uprobe(inode, offset); spin_unlock_irqrestore(&uprobes_treelock, flags); + return uprobe; } @@ -436,16 +444,18 @@ static struct uprobe *__insert_uprobe(struct uprobe *uprobe) p = &parent->rb_right; } + u = NULL; rb_link_node(&uprobe->rb_node, parent, p); rb_insert_color(&uprobe->rb_node, &uprobes_tree); /* get access + creation ref */ atomic_set(&uprobe->ref, 2); + return u; } /* - * Acquires uprobes_treelock. + * Acquire uprobes_treelock. * Matching uprobe already exists in rbtree; * increment (access refcount) and return the matching uprobe. 
* @@ -460,6 +470,7 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe) spin_lock_irqsave(&uprobes_treelock, flags); u = __insert_uprobe(uprobe); spin_unlock_irqrestore(&uprobes_treelock, flags); + return u; } @@ -490,19 +501,22 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset) kfree(uprobe); uprobe = cur_uprobe; iput(inode); - } else + } else { atomic_inc(&uprobe_events); + } + return uprobe; } /* Returns the previous consumer */ -static struct uprobe_consumer *add_consumer(struct uprobe *uprobe, - struct uprobe_consumer *consumer) +static struct uprobe_consumer * +consumer_add(struct uprobe *uprobe, struct uprobe_consumer *consumer) { down_write(&uprobe->consumer_rwsem); consumer->next = uprobe->consumers; uprobe->consumers = consumer; up_write(&uprobe->consumer_rwsem); + return consumer->next; } @@ -511,8 +525,7 @@ static struct uprobe_consumer *add_consumer(struct uprobe *uprobe, * Return true if the @consumer is deleted successfully * or return false. */ -static bool del_consumer(struct uprobe *uprobe, - struct uprobe_consumer *consumer) +static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *consumer) { struct uprobe_consumer **con; bool ret = false; @@ -526,6 +539,7 @@ static bool del_consumer(struct uprobe *uprobe, } } up_write(&uprobe->consumer_rwsem); + return ret; } @@ -557,15 +571,15 @@ static int __copy_insn(struct address_space *mapping, memcpy(insn, vaddr + off1, nbytes); kunmap_atomic(vaddr); page_cache_release(page); + return 0; } -static int copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, - unsigned long addr) +static int copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr) { struct address_space *mapping; - int bytes; unsigned long nbytes; + int bytes; addr &= ~PAGE_MASK; nbytes = PAGE_SIZE - addr; @@ -605,6 +619,7 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, return -EEXIST; addr = (unsigned long)vaddr; + if (!(uprobe->flags & UPROBES_COPY_INSN)) { ret = copy_insn(uprobe, vma, addr); if (ret) @@ -613,7 +628,7 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, if (is_bkpt_insn((uprobe_opcode_t *)uprobe->insn)) return -EEXIST; - ret = analyze_insn(mm, uprobe); + ret = arch_uprobes_analyze_insn(mm, uprobe); if (ret) return ret; @@ -624,8 +639,7 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, return ret; } -static void remove_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, - loff_t vaddr) +static void remove_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, loff_t vaddr) { set_orig_insn(mm, uprobe, (unsigned long)vaddr, true); } @@ -649,9 +663,11 @@ static struct vma_info *__find_next_vma_info(struct list_head *head, struct prio_tree_iter iter; struct vm_area_struct *vma; struct vma_info *tmpvi; - loff_t vaddr; - unsigned long pgoff = offset >> PAGE_SHIFT; + unsigned long pgoff; int existing_vma; + loff_t vaddr; + + pgoff = offset >> PAGE_SHIFT; vma_prio_tree_foreach(vma, &iter, &mapping->i_mmap, pgoff, pgoff) { if (!valid_vma(vma, is_register)) @@ -659,6 +675,7 @@ static struct vma_info *__find_next_vma_info(struct list_head *head, existing_vma = 0; vaddr = vma_address(vma, offset); + list_for_each_entry(tmpvi, head, probe_list) { if (tmpvi->mm == vma->vm_mm && tmpvi->vaddr == vaddr) { existing_vma = 1; @@ -670,14 +687,15 @@ static struct vma_info *__find_next_vma_info(struct list_head *head, * Another vma needs a probe to be installed. 
However skip * installing the probe if the vma is about to be unlinked. */ - if (!existing_vma && - atomic_inc_not_zero(&vma->vm_mm->mm_users)) { + if (!existing_vma && atomic_inc_not_zero(&vma->vm_mm->mm_users)) { vi->mm = vma->vm_mm; vi->vaddr = vaddr; list_add(&vi->probe_list, head); + return vi; } } + return NULL; } @@ -685,11 +703,12 @@ static struct vma_info *__find_next_vma_info(struct list_head *head, * Iterate in the rmap prio tree and find a vma where a probe has not * yet been inserted. */ -static struct vma_info *find_next_vma_info(struct list_head *head, - loff_t offset, struct address_space *mapping, - bool is_register) +static struct vma_info * +find_next_vma_info(struct list_head *head, loff_t offset, struct address_space *mapping, + bool is_register) { struct vma_info *vi, *retvi; + vi = kzalloc(sizeof(struct vma_info), GFP_KERNEL); if (!vi) return ERR_PTR(-ENOMEM); @@ -700,6 +719,7 @@ static struct vma_info *find_next_vma_info(struct list_head *head, if (!retvi) kfree(vi); + return retvi; } @@ -711,16 +731,23 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register) struct vma_info *vi, *tmpvi; struct mm_struct *mm; loff_t vaddr; - int ret = 0; + int ret; mapping = uprobe->inode->i_mapping; INIT_LIST_HEAD(&try_list); - while ((vi = find_next_vma_info(&try_list, uprobe->offset, - mapping, is_register)) != NULL) { + + ret = 0; + + for (;;) { + vi = find_next_vma_info(&try_list, uprobe->offset, mapping, is_register); + if (!vi) + break; + if (IS_ERR(vi)) { ret = PTR_ERR(vi); break; } + mm = vi->mm; down_read(&mm->mmap_sem); vma = find_vma(mm, (unsigned long)vi->vaddr); @@ -755,19 +782,21 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register) break; } } + list_for_each_entry_safe(vi, tmpvi, &try_list, probe_list) { list_del(&vi->probe_list); kfree(vi); } + return ret; } -static int __register_uprobe(struct uprobe *uprobe) +static int __uprobe_register(struct uprobe *uprobe) { return register_for_each_vma(uprobe, true); } -static void __unregister_uprobe(struct uprobe *uprobe) +static void __uprobe_unregister(struct uprobe *uprobe) { if (!register_for_each_vma(uprobe, false)) delete_uprobe(uprobe); @@ -776,15 +805,15 @@ static void __unregister_uprobe(struct uprobe *uprobe) } /* - * register_uprobe - register a probe + * uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. * @consumer: information on howto handle the probe.. * - * Apart from the access refcount, register_uprobe() takes a creation + * Apart from the access refcount, uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset - * tuple). Creation refcount stops unregister_uprobe from freeing the + * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation * refcount is released when the last @consumer for the @uprobe * unregisters. 
@@ -792,28 +821,29 @@ static void __unregister_uprobe(struct uprobe *uprobe) * Return errno if it cannot successully install probes * else return 0 (success) */ -int register_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer) +int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) { struct uprobe *uprobe; - int ret = -EINVAL; + int ret; if (!inode || !consumer || consumer->next) - return ret; + return -EINVAL; if (offset > i_size_read(inode)) - return ret; + return -EINVAL; ret = 0; mutex_lock(uprobes_hash(inode)); uprobe = alloc_uprobe(inode, offset); - if (uprobe && !add_consumer(uprobe, consumer)) { - ret = __register_uprobe(uprobe); + + if (uprobe && !consumer_add(uprobe, consumer)) { + ret = __uprobe_register(uprobe); if (ret) { uprobe->consumers = NULL; - __unregister_uprobe(uprobe); - } else + __uprobe_unregister(uprobe); + } else { uprobe->flags |= UPROBES_RUN_HANDLER; + } } mutex_unlock(uprobes_hash(inode)); @@ -823,15 +853,14 @@ int register_uprobe(struct inode *inode, loff_t offset, } /* - * unregister_uprobe - unregister a already registered probe. + * uprobe_unregister - unregister a already registered probe. * @inode: the file in which the probe has to be removed. * @offset: offset from the start of the file. * @consumer: identify which probe if multiple probes are colocated. */ -void unregister_uprobe(struct inode *inode, loff_t offset, - struct uprobe_consumer *consumer) +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) { - struct uprobe *uprobe = NULL; + struct uprobe *uprobe; if (!inode || !consumer) return; @@ -841,15 +870,14 @@ void unregister_uprobe(struct inode *inode, loff_t offset, return; mutex_lock(uprobes_hash(inode)); - if (!del_consumer(uprobe, consumer)) - goto unreg_out; - if (!uprobe->consumers) { - __unregister_uprobe(uprobe); - uprobe->flags &= ~UPROBES_RUN_HANDLER; + if (consumer_del(uprobe, consumer)) { + if (!uprobe->consumers) { + __uprobe_unregister(uprobe); + uprobe->flags &= ~UPROBES_RUN_HANDLER; + } } -unreg_out: mutex_unlock(uprobes_hash(inode)); if (uprobe) put_uprobe(uprobe); @@ -870,6 +898,7 @@ static struct rb_node *find_least_offset_node(struct inode *inode) while (n) { uprobe = rb_entry(n, struct uprobe, rb_node); match = match_uprobe(&u, uprobe); + if (uprobe->inode == inode) close_node = n; @@ -881,6 +910,7 @@ static struct rb_node *find_least_offset_node(struct inode *inode) else n = n->rb_right; } + return close_node; } @@ -890,11 +920,13 @@ static struct rb_node *find_least_offset_node(struct inode *inode) static void build_probe_list(struct inode *inode, struct list_head *head) { struct uprobe *uprobe; - struct rb_node *n; unsigned long flags; + struct rb_node *n; spin_lock_irqsave(&uprobes_treelock, flags); + n = find_least_offset_node(inode); + for (; n; n = rb_next(n)) { uprobe = rb_entry(n, struct uprobe, rb_node); if (uprobe->inode != inode) @@ -903,6 +935,7 @@ static void build_probe_list(struct inode *inode, struct list_head *head) list_add(&uprobe->pending_list, head); atomic_inc(&uprobe->ref); } + spin_unlock_irqrestore(&uprobes_treelock, flags); } @@ -912,42 +945,44 @@ static void build_probe_list(struct inode *inode, struct list_head *head) * * Return -ve no if we fail to insert probes and we cannot * bail-out. - * Return 0 otherwise. i.e : + * Return 0 otherwise. i.e: + * * - successful insertion of probes * - (or) no possible probes to be inserted. * - (or) insertion of probes failed but we can bail-out. 
*/ -int mmap_uprobe(struct vm_area_struct *vma) +int uprobe_mmap(struct vm_area_struct *vma) { struct list_head tmp_list; struct uprobe *uprobe, *u; struct inode *inode; - int ret = 0; + int ret; if (!atomic_read(&uprobe_events) || !valid_vma(vma, true)) - return ret; /* Bail-out */ + return 0; inode = vma->vm_file->f_mapping->host; if (!inode) - return ret; + return 0; INIT_LIST_HEAD(&tmp_list); mutex_lock(uprobes_mmap_hash(inode)); build_probe_list(inode, &tmp_list); + + ret = 0; + list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) { loff_t vaddr; list_del(&uprobe->pending_list); if (!ret) { vaddr = vma_address(vma, uprobe->offset); - if (vaddr < vma->vm_start || vaddr >= vma->vm_end) { - put_uprobe(uprobe); - continue; + if (vaddr >= vma->vm_start && vaddr < vma->vm_end) { + ret = install_breakpoint(vma->vm_mm, uprobe, vma, vaddr); + /* Ignore double add: */ + if (ret == -EEXIST) + ret = 0; } - ret = install_breakpoint(vma->vm_mm, uprobe, vma, - vaddr); - if (ret == -EEXIST) - ret = 0; } put_uprobe(uprobe); } diff --git a/mm/mmap.c b/mm/mmap.c index 1aed183636d7..5a863d328a44 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -618,10 +618,10 @@ again: remove_next = 1 + (end > next->vm_end); mutex_unlock(&mapping->i_mmap_mutex); if (root) { - mmap_uprobe(vma); + uprobe_mmap(vma); if (adjust_next) - mmap_uprobe(next); + uprobe_mmap(next); } if (remove_next) { @@ -646,7 +646,7 @@ again: remove_next = 1 + (end > next->vm_end); } } if (insert && file) - mmap_uprobe(insert); + uprobe_mmap(insert); validate_mm(mm); @@ -1340,7 +1340,7 @@ out: } else if ((flags & MAP_POPULATE) && !(flags & MAP_NONBLOCK)) make_pages_present(addr, addr + len); - if (file && mmap_uprobe(vma)) + if (file && uprobe_mmap(vma)) /* matching probes but cannot insert */ goto unmap_and_free_vma; @@ -2301,7 +2301,7 @@ int insert_vm_struct(struct mm_struct * mm, struct vm_area_struct * vma) security_vm_enough_memory_mm(mm, vma_pages(vma))) return -ENOMEM; - if (vma->vm_file && mmap_uprobe(vma)) + if (vma->vm_file && uprobe_mmap(vma)) return -EINVAL; vma_link(mm, vma, prev, rb_link, rb_parent); @@ -2374,7 +2374,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, if (new_vma->vm_file) { get_file(new_vma->vm_file); - if (mmap_uprobe(new_vma)) + if (uprobe_mmap(new_vma)) goto out_free_mempol; if (vma->vm_flags & VM_EXECUTABLE) -- cgit v1.2.3 From 96379f60075c75b261328aa7830ef8aa158247ac Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Wed, 22 Feb 2012 14:45:49 +0530 Subject: uprobes/core: Remove uprobe_opcode_sz uprobe_opcode_sz refers to the smallest instruction size for that architecture. UPROBES_BKPT_INSN_SIZE refers to the size of the breakpoint instruction for that architecture. For now we are assuming that both uprobe_opcode_sz and UPROBES_BKPT_INSN_SIZE are the same for all archs and hence removing uprobe_opcode_sz in favour of UPROBES_BKPT_INSN_SIZE. However if we have to support architectures where the smallest instruction size is different from the size of breakpoint instruction, we may have to re-introduce uprobe_opcode_sz. 
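As an illustration of the invariant this commit relies on (not something the patch itself adds), the assumption can be spelled out as a compile-time check against the x86 definitions quoted earlier, where the int3 breakpoint (0xcc) is one byte wide and uprobe_opcode_t is u8:

/*
 * Hedged sketch: the removal of uprobe_opcode_sz is only valid while the
 * breakpoint instruction is exactly as wide as the smallest opcode unit.
 * On x86: sizeof(u8) == 1 == UPROBES_BKPT_INSN_SIZE.
 */
#include <linux/bug.h>
#include <asm/uprobes.h>

static inline void uprobes_check_opcode_width(void)
{
	BUILD_BUG_ON(sizeof(uprobe_opcode_t) != UPROBES_BKPT_INSN_SIZE);
}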
Signed-off-by: Srikar Dronamraju
Cc: Peter Zijlstra
Cc: Linus Torvalds
Cc: Oleg Nesterov
Cc: Christoph Hellwig
Cc: Steven Rostedt
Cc: Masami Hiramatsu
Cc: Anton Arapov
Cc: Ananth N Mavinakayanahalli
Cc: Jim Keniston
Cc: Jiri Olsa
Cc: Josh Stone
Link: http://lkml.kernel.org/r/20120222091549.15880.67020.sendpatchset@srdronam.in.ibm.com
Signed-off-by: Ingo Molnar

--- include/linux/uprobes.h | 2 -- kernel/events/uprobes.c | 6 +++--- 2 files changed, 3 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 64e45f116b2a..fd45b70750d4 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -37,8 +37,6 @@ struct uprobe_arch_info {}; #define MAX_UINSN_BYTES 4 #endif -#define uprobe_opcode_sz sizeof(uprobe_opcode_t) - /* flags that denote/change uprobes behaviour */ /* Have a copy of original instruction */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 884817f1b0d3..ee496ad95db3 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -244,8 +244,8 @@ static int write_opcode(struct mm_struct *mm, struct uprobe *uprobe, /* poke the new insn in, ASSUMES we don't cross page boundary */ vaddr &= ~PAGE_MASK; - BUG_ON(vaddr + uprobe_opcode_sz > PAGE_SIZE); - memcpy(vaddr_new + vaddr, &opcode, uprobe_opcode_sz); + BUG_ON(vaddr + UPROBES_BKPT_INSN_SIZE > PAGE_SIZE); + memcpy(vaddr_new + vaddr, &opcode, UPROBES_BKPT_INSN_SIZE); kunmap_atomic(vaddr_new); kunmap_atomic(vaddr_old); @@ -293,7 +293,7 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_ lock_page(page); vaddr_new = kmap_atomic(page); vaddr &= ~PAGE_MASK; - memcpy(opcode, vaddr_new + vaddr, uprobe_opcode_sz); + memcpy(opcode, vaddr_new + vaddr, UPROBES_BKPT_INSN_SIZE); kunmap_atomic(vaddr_new); unlock_page(page);

-- cgit v1.2.3

From 3ff54efdfaace9e9b2b7c1959a865be6b91de96c Mon Sep 17 00:00:00 2001
From: Srikar Dronamraju
Date: Wed, 22 Feb 2012 14:46:02 +0530
Subject: uprobes/core: Move insn to arch specific structure

A few cleanups suggested by Ingo Molnar:

- Rename struct uprobe_arch_info to struct arch_uprobe.
- Move insn from struct uprobe to struct arch_uprobe.
- Make the arch-specific uprobe functions accept struct arch_uprobe instead of struct uprobe.
- Move struct uprobe to kernel/uprobes.c from include/linux/uprobes.h Signed-off-by: Srikar Dronamraju Cc: Peter Zijlstra Cc: Linus Torvalds Cc: Oleg Nesterov Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Masami Hiramatsu Cc: Anton Arapov Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Jiri Olsa Cc: Josh Stone Link: http://lkml.kernel.org/r/20120222091602.15880.40249.sendpatchset@srdronam.in.ibm.com [ Made various small improvements ] Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uprobes.h | 6 ++--- arch/x86/kernel/uprobes.c | 60 +++++++++++++++++++++--------------------- include/linux/uprobes.h | 23 ++-------------- kernel/events/uprobes.c | 38 +++++++++++++++++--------- 4 files changed, 61 insertions(+), 66 deletions(-) (limited to 'include') diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index 072df3902636..f7ce310a429d 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -31,13 +31,13 @@ typedef u8 uprobe_opcode_t; #define UPROBES_BKPT_INSN 0xcc #define UPROBES_BKPT_INSN_SIZE 1 -struct uprobe_arch_info { +struct arch_uprobe { u16 fixups; + u8 insn[MAX_UINSN_BYTES]; #ifdef CONFIG_X86_64 unsigned long rip_rela_target_address; #endif }; -struct uprobe; -extern int arch_uprobes_analyze_insn(struct mm_struct *mm, struct uprobe *uprobe); +extern int arch_uprobes_analyze_insn(struct mm_struct *mm, struct arch_uprobe *arch_uprobe); #endif /* _ASM_UPROBES_H */ diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 13d616d6519b..04dfcef2d028 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -200,9 +200,9 @@ static bool is_prefix_bad(struct insn *insn) return false; } -static int validate_insn_32bits(struct uprobe *uprobe, struct insn *insn) +static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn) { - insn_init(insn, uprobe->insn, false); + insn_init(insn, auprobe->insn, false); /* Skip good instruction prefixes; reject "bad" ones. */ insn_get_opcode(insn); @@ -222,11 +222,11 @@ static int validate_insn_32bits(struct uprobe *uprobe, struct insn *insn) /* * Figure out which fixups post_xol() will need to perform, and annotate - * uprobe->arch_info.fixups accordingly. To start with, - * uprobe->arch_info.fixups is either zero or it reflects rip-related + * arch_uprobe->fixups accordingly. To start with, + * arch_uprobe->fixups is either zero or it reflects rip-related * fixups. */ -static void prepare_fixups(struct uprobe *uprobe, struct insn *insn) +static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn) { bool fix_ip = true, fix_call = false; /* defaults */ int reg; @@ -269,17 +269,17 @@ static void prepare_fixups(struct uprobe *uprobe, struct insn *insn) break; } if (fix_ip) - uprobe->arch_info.fixups |= UPROBES_FIX_IP; + auprobe->fixups |= UPROBES_FIX_IP; if (fix_call) - uprobe->arch_info.fixups |= UPROBES_FIX_CALL; + auprobe->fixups |= UPROBES_FIX_CALL; } #ifdef CONFIG_X86_64 /* - * If uprobe->insn doesn't use rip-relative addressing, return + * If arch_uprobe->insn doesn't use rip-relative addressing, return * immediately. Otherwise, rewrite the instruction so that it accesses * its memory operand indirectly through a scratch register. Set - * uprobe->arch_info.fixups and uprobe->arch_info.rip_rela_target_address + * arch_uprobe->fixups and arch_uprobe->rip_rela_target_address * accordingly. (The contents of the scratch register will be saved * before we single-step the modified instruction, and restored * afterward.) 
@@ -297,7 +297,7 @@ static void prepare_fixups(struct uprobe *uprobe, struct insn *insn) * - There's never a SIB byte. * - The displacement is always 4 bytes. */ -static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) +static void handle_riprel_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) { u8 *cursor; u8 reg; @@ -305,7 +305,7 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, stru if (mm->context.ia32_compat) return; - uprobe->arch_info.rip_rela_target_address = 0x0; + auprobe->rip_rela_target_address = 0x0; if (!insn_rip_relative(insn)) return; @@ -315,7 +315,7 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, stru * we want to encode rax/rcx, not r8/r9. */ if (insn->rex_prefix.nbytes) { - cursor = uprobe->insn + insn_offset_rex_prefix(insn); + cursor = auprobe->insn + insn_offset_rex_prefix(insn); *cursor &= 0xfe; /* Clearing REX.B bit */ } @@ -324,7 +324,7 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, stru * displacement. Beyond the displacement, for some instructions, * is the immediate operand. */ - cursor = uprobe->insn + insn_offset_modrm(insn); + cursor = auprobe->insn + insn_offset_modrm(insn); insn_get_length(insn); /* @@ -341,18 +341,18 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, stru * is NOT the register operand, so we use %rcx (register * #1) for the scratch register. */ - uprobe->arch_info.fixups = UPROBES_FIX_RIP_CX; + auprobe->fixups = UPROBES_FIX_RIP_CX; /* Change modrm from 00 000 101 to 00 000 001. */ *cursor = 0x1; } else { /* Use %rax (register #0) for the scratch register. */ - uprobe->arch_info.fixups = UPROBES_FIX_RIP_AX; + auprobe->fixups = UPROBES_FIX_RIP_AX; /* Change modrm from 00 xxx 101 to 00 xxx 000 */ *cursor = (reg << 3); } /* Target address = address of next instruction + (signed) offset */ - uprobe->arch_info.rip_rela_target_address = (long)insn->length + insn->displacement.value; + auprobe->rip_rela_target_address = (long)insn->length + insn->displacement.value; /* Displacement field is gone; slide immediate field (if any) over. */ if (insn->immediate.nbytes) { @@ -362,9 +362,9 @@ static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, stru return; } -static int validate_insn_64bits(struct uprobe *uprobe, struct insn *insn) +static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn) { - insn_init(insn, uprobe->insn, true); + insn_init(insn, auprobe->insn, true); /* Skip good instruction prefixes; reject "bad" ones. 
*/ insn_get_opcode(insn); @@ -381,42 +381,42 @@ static int validate_insn_64bits(struct uprobe *uprobe, struct insn *insn) return -ENOTSUPP; } -static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) +static int validate_insn_bits(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) { if (mm->context.ia32_compat) - return validate_insn_32bits(uprobe, insn); - return validate_insn_64bits(uprobe, insn); + return validate_insn_32bits(auprobe, insn); + return validate_insn_64bits(auprobe, insn); } #else /* 32-bit: */ -static void handle_riprel_insn(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) +static void handle_riprel_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) { /* No RIP-relative addressing on 32-bit */ } -static int validate_insn_bits(struct mm_struct *mm, struct uprobe *uprobe, struct insn *insn) +static int validate_insn_bits(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) { - return validate_insn_32bits(uprobe, insn); + return validate_insn_32bits(auprobe, insn); } #endif /* CONFIG_X86_64 */ /** * arch_uprobes_analyze_insn - instruction analysis including validity and fixups. * @mm: the probed address space. - * @uprobe: the probepoint information. + * @arch_uprobe: the probepoint information. * Return 0 on success or a -ve number on error. */ -int arch_uprobes_analyze_insn(struct mm_struct *mm, struct uprobe *uprobe) +int arch_uprobes_analyze_insn(struct mm_struct *mm, struct arch_uprobe *auprobe) { int ret; struct insn insn; - uprobe->arch_info.fixups = 0; - ret = validate_insn_bits(mm, uprobe, &insn); + auprobe->fixups = 0; + ret = validate_insn_bits(mm, auprobe, &insn); if (ret != 0) return ret; - handle_riprel_insn(mm, uprobe, &insn); - prepare_fixups(uprobe, &insn); + handle_riprel_insn(mm, auprobe, &insn); + prepare_fixups(auprobe, &insn); return 0; } diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index fd45b70750d4..9c6be62787ed 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -29,12 +29,6 @@ struct vm_area_struct; #ifdef CONFIG_ARCH_SUPPORTS_UPROBES #include -#else - -typedef u8 uprobe_opcode_t; -struct uprobe_arch_info {}; - -#define MAX_UINSN_BYTES 4 #endif /* flags that denote/change uprobes behaviour */ @@ -56,22 +50,9 @@ struct uprobe_consumer { struct uprobe_consumer *next; }; -struct uprobe { - struct rb_node rb_node; /* node in the rb tree */ - atomic_t ref; - struct rw_semaphore consumer_rwsem; - struct list_head pending_list; - struct uprobe_arch_info arch_info; - struct uprobe_consumer *consumers; - struct inode *inode; /* Also hold a ref to inode */ - loff_t offset; - int flags; - u8 insn[MAX_UINSN_BYTES]; -}; - #ifdef CONFIG_UPROBES -extern int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr); -extern int __weak set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr, bool verify); +extern int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr); +extern int __weak set_orig_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr, bool verify); extern bool __weak is_bkpt_insn(uprobe_opcode_t *insn); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index ee496ad95db3..13f1b5909af4 100644 
--- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -65,6 +65,18 @@ struct vma_info { loff_t vaddr; }; +struct uprobe { + struct rb_node rb_node; /* node in the rb tree */ + atomic_t ref; + struct rw_semaphore consumer_rwsem; + struct list_head pending_list; + struct uprobe_consumer *consumers; + struct inode *inode; /* Also hold a ref to inode */ + loff_t offset; + int flags; + struct arch_uprobe arch; +}; + /* * valid_vma: Verify if the specified vma is an executable vma * Relax restrictions while unregistering: vm_flags might have @@ -180,7 +192,7 @@ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) /* * write_opcode - write the opcode at a given virtual address. * @mm: the probed process address space. - * @uprobe: the breakpointing information. + * @arch_uprobe: the breakpointing information. * @vaddr: the virtual address to store the opcode. * @opcode: opcode to be written at @vaddr. * @@ -190,13 +202,14 @@ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) * For mm @mm, write the opcode at @vaddr. * Return 0 (success) or a negative errno. */ -static int write_opcode(struct mm_struct *mm, struct uprobe *uprobe, +static int write_opcode(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; struct address_space *mapping; void *vaddr_old, *vaddr_new; struct vm_area_struct *vma; + struct uprobe *uprobe; loff_t addr; int ret; @@ -216,6 +229,7 @@ static int write_opcode(struct mm_struct *mm, struct uprobe *uprobe, if (!valid_vma(vma, is_bkpt_insn(&opcode))) goto put_out; + uprobe = container_of(auprobe, struct uprobe, arch); mapping = uprobe->inode->i_mapping; if (mapping != vma->vm_file->f_mapping) goto put_out; @@ -326,7 +340,7 @@ static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) * For mm @mm, store the breakpoint instruction at @vaddr. * Return 0 (success) or a negative errno. */ -int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr) +int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr) { int result; @@ -337,7 +351,7 @@ int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long v if (result) return result; - return write_opcode(mm, uprobe, vaddr, UPROBES_BKPT_INSN); + return write_opcode(mm, auprobe, vaddr, UPROBES_BKPT_INSN); } /** @@ -351,7 +365,7 @@ int __weak set_bkpt(struct mm_struct *mm, struct uprobe *uprobe, unsigned long v * Return 0 (success) or a negative errno. 
*/ int __weak -set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr, bool verify) +set_orig_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr, bool verify) { if (verify) { int result; @@ -363,7 +377,7 @@ set_orig_insn(struct mm_struct *mm, struct uprobe *uprobe, unsigned long vaddr, if (result != 1) return result; } - return write_opcode(mm, uprobe, vaddr, *(uprobe_opcode_t *)uprobe->insn); + return write_opcode(mm, auprobe, vaddr, *(uprobe_opcode_t *)auprobe->insn); } static int match_uprobe(struct uprobe *l, struct uprobe *r) @@ -593,13 +607,13 @@ static int copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned /* Instruction at the page-boundary; copy bytes in second page */ if (nbytes < bytes) { - if (__copy_insn(mapping, vma, uprobe->insn + nbytes, + if (__copy_insn(mapping, vma, uprobe->arch.insn + nbytes, bytes - nbytes, uprobe->offset + nbytes)) return -ENOMEM; bytes = nbytes; } - return __copy_insn(mapping, vma, uprobe->insn, bytes, uprobe->offset); + return __copy_insn(mapping, vma, uprobe->arch.insn, bytes, uprobe->offset); } static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, @@ -625,23 +639,23 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, if (ret) return ret; - if (is_bkpt_insn((uprobe_opcode_t *)uprobe->insn)) + if (is_bkpt_insn((uprobe_opcode_t *)uprobe->arch.insn)) return -EEXIST; - ret = arch_uprobes_analyze_insn(mm, uprobe); + ret = arch_uprobes_analyze_insn(mm, &uprobe->arch); if (ret) return ret; uprobe->flags |= UPROBES_COPY_INSN; } - ret = set_bkpt(mm, uprobe, addr); + ret = set_bkpt(mm, &uprobe->arch, addr); return ret; } static void remove_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, loff_t vaddr) { - set_orig_insn(mm, uprobe, (unsigned long)vaddr, true); + set_orig_insn(mm, &uprobe->arch, (unsigned long)vaddr, true); } static void delete_uprobe(struct uprobe *uprobe) -- cgit v1.2.3 From 35aa621b5ab9d08767f7bc8d209b696df281d715 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Wed, 22 Feb 2012 11:37:29 +0100 Subject: uprobes: Update copyright notices Add Peter Zijlstra's copyright to the uprobes code, whose contributions to the uprobes code are not visible in the Git history, because they were backmerged. Also update existing copyright notices to the year 2012. Acked-by: Srikar Dronamraju Cc: Peter Zijlstra Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Link: http://lkml.kernel.org/n/tip-vjqxst502pc1efz7ah8cyht4@git.kernel.org Signed-off-by: Ingo Molnar --- include/linux/uprobes.h | 3 ++- kernel/events/uprobes.c | 3 ++- 2 files changed, 4 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 9c6be62787ed..f85797e1ccd4 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -17,10 +17,11 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. * - * Copyright (C) IBM Corporation, 2008-2011 + * Copyright (C) IBM Corporation, 2008-2012 * Authors: * Srikar Dronamraju * Jim Keniston + * Copyright (C) 2011-2012 Red Hat, Inc., Peter Zijlstra */ #include diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 13f1b5909af4..5ce32e3ae9e9 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -15,10 +15,11 @@ * along with this program; if not, write to the Free Software * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. 
* - * Copyright (C) IBM Corporation, 2008-2011 + * Copyright (C) IBM Corporation, 2008-2012 * Authors: * Srikar Dronamraju * Jim Keniston + * Copyright (C) 2011-2012 Red Hat, Inc., Peter Zijlstra */ #include -- cgit v1.2.3 From b2fab5acd28ead6f0dd6c3996ba23f0ef1772f15 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:14:57 -0800 Subject: elevator: make elevator_init_fn() return 0/-errno elevator_ops->elevator_init_fn() has a weird return value. It returns a void * which the caller should assign to q->elevator->elevator_data and %NULL return denotes init failure. Update such that it returns integer 0/-errno and sets elevator_data directly as necessary. This makes the interface more conventional and eases further cleanup. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 9 +++++---- block/deadline-iosched.c | 8 +++++--- block/elevator.c | 12 ++---------- block/noop-iosched.c | 8 +++++--- include/linux/elevator.h | 2 +- 5 files changed, 18 insertions(+), 21 deletions(-) (limited to 'include') diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 388fe01de18e..72680a6715fc 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3656,7 +3656,7 @@ static void cfq_exit_queue(struct elevator_queue *e) kfree(cfqd); } -static void *cfq_init_queue(struct request_queue *q) +static int cfq_init_queue(struct request_queue *q) { struct cfq_data *cfqd; int i, j; @@ -3665,7 +3665,7 @@ static void *cfq_init_queue(struct request_queue *q) cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node); if (!cfqd) - return NULL; + return -ENOMEM; /* Init root service tree */ cfqd->grp_service_tree = CFQ_RB_ROOT; @@ -3692,7 +3692,7 @@ static void *cfq_init_queue(struct request_queue *q) if (blkio_alloc_blkg_stats(&cfqg->blkg)) { kfree(cfqg); kfree(cfqd); - return NULL; + return -ENOMEM; } rcu_read_lock(); @@ -3723,6 +3723,7 @@ static void *cfq_init_queue(struct request_queue *q) cfq_link_cfqq_cfqg(&cfqd->oom_cfqq, &cfqd->root_group); cfqd->queue = q; + q->elevator->elevator_data = cfqd; init_timer(&cfqd->idle_slice_timer); cfqd->idle_slice_timer.function = cfq_idle_slice_timer; @@ -3747,7 +3748,7 @@ static void *cfq_init_queue(struct request_queue *q) * second, in order to have larger depth for async operations. */ cfqd->last_delayed_sync = jiffies - HZ; - return cfqd; + return 0; } /* diff --git a/block/deadline-iosched.c b/block/deadline-iosched.c index 7bf12d793fcd..599b12e5380f 100644 --- a/block/deadline-iosched.c +++ b/block/deadline-iosched.c @@ -337,13 +337,13 @@ static void deadline_exit_queue(struct elevator_queue *e) /* * initialize elevator private data (deadline_data). 
*/ -static void *deadline_init_queue(struct request_queue *q) +static int deadline_init_queue(struct request_queue *q) { struct deadline_data *dd; dd = kmalloc_node(sizeof(*dd), GFP_KERNEL | __GFP_ZERO, q->node); if (!dd) - return NULL; + return -ENOMEM; INIT_LIST_HEAD(&dd->fifo_list[READ]); INIT_LIST_HEAD(&dd->fifo_list[WRITE]); @@ -354,7 +354,9 @@ static void *deadline_init_queue(struct request_queue *q) dd->writes_starved = writes_starved; dd->front_merges = 1; dd->fifo_batch = fifo_batch; - return dd; + + q->elevator->elevator_data = dd; + return 0; } /* diff --git a/block/elevator.c b/block/elevator.c index f8c08e1bff2b..f81c061dad15 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -121,14 +121,6 @@ static struct elevator_type *elevator_get(const char *name) return e; } -static int elevator_init_queue(struct request_queue *q) -{ - q->elevator->elevator_data = q->elevator->type->ops.elevator_init_fn(q); - if (q->elevator->elevator_data) - return 0; - return -ENOMEM; -} - static char chosen_elevator[ELV_NAME_MAX]; static int __init elevator_setup(char *str) @@ -224,7 +216,7 @@ int elevator_init(struct request_queue *q, char *name) if (!q->elevator) return -ENOMEM; - err = elevator_init_queue(q); + err = e->ops.elevator_init_fn(q); if (err) { kobject_put(&q->elevator->kobj); return err; } @@ -927,7 +919,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) if (!q->elevator) goto fail_init; - err = elevator_init_queue(q); + err = new_e->ops.elevator_init_fn(q); if (err) { kobject_put(&q->elevator->kobj); goto fail_init; diff --git a/block/noop-iosched.c b/block/noop-iosched.c index 413a0b1d788c..5d1bf70e33d5 100644 --- a/block/noop-iosched.c +++ b/block/noop-iosched.c @@ -59,15 +59,17 @@ noop_latter_request(struct request_queue *q, struct request *rq) return list_entry(rq->queuelist.next, struct request, queuelist); } -static void *noop_init_queue(struct request_queue *q) +static int noop_init_queue(struct request_queue *q) { struct noop_data *nd; nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node); if (!nd) - return NULL; + return -ENOMEM; + INIT_LIST_HEAD(&nd->queue); - return nd; + q->elevator->elevator_data = nd; + return 0; } static void noop_exit_queue(struct elevator_queue *e) diff --git a/include/linux/elevator.h b/include/linux/elevator.h index 7d4e0356f329..97fb2557a18c 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -33,7 +33,7 @@ typedef void (elevator_put_req_fn) (struct request *); typedef void (elevator_activate_req_fn) (struct request_queue *, struct request *); typedef void (elevator_deactivate_req_fn) (struct request_queue *, struct request *); -typedef void *(elevator_init_fn) (struct request_queue *); +typedef int (elevator_init_fn) (struct request_queue *); typedef void (elevator_exit_fn) (struct elevator_queue *); struct elevator_ops -- cgit v1.2.3 From d732580b4eb31553c63744a47d590f770cafb8f0 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:14:58 -0800 Subject: block: implement blk_queue_bypass_start/end() Rename and extend elv_quiesce_start/end() to blk_queue_bypass_start/end(), which are exported and support nesting via @q->bypass_depth. Also add blk_queue_bypass() to test bypass state. This will be further extended and used for blkio_group management.
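To illustrate the nesting semantics, a sketch from a hypothetical caller's point of view (not part of this patch; the behavior follows from the implementation in the diff below):

	blk_queue_bypass_start(q);	/* bypass_depth 0 -> 1, QUEUE_FLAG_BYPASS set, queue drained */
	blk_queue_bypass_start(q);	/* bypass_depth 1 -> 2, flag already set */

	WARN_ON(!blk_queue_bypass(q));	/* still in bypass mode */

	blk_queue_bypass_end(q);	/* bypass_depth 2 -> 1, flag stays set */
	blk_queue_bypass_end(q);	/* bypass_depth 1 -> 0, flag cleared, normal queueing resumes */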
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-core.c | 39 +++++++++++++++++++++++++++++++++++++-- block/blk.h | 6 ++---- block/elevator.c | 25 +++---------------------- include/linux/blkdev.h | 5 ++++- 4 files changed, 46 insertions(+), 29 deletions(-) (limited to 'include') diff --git a/block/blk-core.c b/block/blk-core.c index fccb25021121..98ddef430093 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -409,6 +409,42 @@ void blk_drain_queue(struct request_queue *q, bool drain_all) } } +/** + * blk_queue_bypass_start - enter queue bypass mode + * @q: queue of interest + * + * In bypass mode, only the dispatch FIFO queue of @q is used. This + * function makes @q enter bypass mode and drains all requests which were + * issued before. On return, it's guaranteed that no request has ELVPRIV + * set. + */ +void blk_queue_bypass_start(struct request_queue *q) +{ + spin_lock_irq(q->queue_lock); + q->bypass_depth++; + queue_flag_set(QUEUE_FLAG_BYPASS, q); + spin_unlock_irq(q->queue_lock); + + blk_drain_queue(q, false); +} +EXPORT_SYMBOL_GPL(blk_queue_bypass_start); + +/** + * blk_queue_bypass_end - leave queue bypass mode + * @q: queue of interest + * + * Leave bypass mode and restore the normal queueing behavior. + */ +void blk_queue_bypass_end(struct request_queue *q) +{ + spin_lock_irq(q->queue_lock); + if (!--q->bypass_depth) + queue_flag_clear(QUEUE_FLAG_BYPASS, q); + WARN_ON_ONCE(q->bypass_depth < 0); + spin_unlock_irq(q->queue_lock); +} +EXPORT_SYMBOL_GPL(blk_queue_bypass_end); + /** * blk_cleanup_queue - shutdown a request queue * @q: request queue to shutdown @@ -862,8 +898,7 @@ retry: * Also, lookup icq while holding queue_lock. If it doesn't exist, * it will be created after releasing queue_lock. */ - if (blk_rq_should_init_elevator(bio) && - !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags)) { + if (blk_rq_should_init_elevator(bio) && !blk_queue_bypass(q)) { rw_flags |= REQ_ELVPRIV; rl->elvpriv++; if (et->icq_cache && ioc) diff --git a/block/blk.h b/block/blk.h index 9c12f80882b0..7422f3133c5d 100644 --- a/block/blk.h +++ b/block/blk.h @@ -23,7 +23,8 @@ void blk_rq_bio_prep(struct request_queue *q, struct request *rq, struct bio *bio); int blk_rq_append_bio(struct request_queue *q, struct request *rq, struct bio *bio); -void blk_drain_queue(struct request_queue *q, bool drain_all); +void blk_queue_bypass_start(struct request_queue *q); +void blk_queue_bypass_end(struct request_queue *q); void blk_dequeue_request(struct request *rq); void __blk_queue_free_tags(struct request_queue *q); bool __blk_end_bidi_request(struct request *rq, int error, @@ -144,9 +145,6 @@ void blk_queue_congestion_threshold(struct request_queue *q); int blk_dev_init(void); -void elv_quiesce_start(struct request_queue *q); -void elv_quiesce_end(struct request_queue *q); - /* * Return the threshold (number of used requests) at which the queue is diff --git a/block/elevator.c b/block/elevator.c index f81c061dad15..0bdea0ed03a3 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -553,25 +553,6 @@ void elv_drain_elevator(struct request_queue *q) } } -void elv_quiesce_start(struct request_queue *q) -{ - if (!q->elevator) - return; - - spin_lock_irq(q->queue_lock); - queue_flag_set(QUEUE_FLAG_ELVSWITCH, q); - spin_unlock_irq(q->queue_lock); - - blk_drain_queue(q, false); -} - -void elv_quiesce_end(struct request_queue *q) -{ - spin_lock_irq(q->queue_lock); - queue_flag_clear(QUEUE_FLAG_ELVSWITCH, q); - spin_unlock_irq(q->queue_lock); -} - void __elv_add_request(struct 
request_queue *q, struct request *rq, int where) { trace_block_rq_insert(q, rq); @@ -903,7 +884,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) * using INSERT_BACK. All requests have SOFTBARRIER set and no * merge happens either. */ - elv_quiesce_start(q); + blk_queue_bypass_start(q); /* unregister and clear all auxiliary data of the old elevator */ if (registered) @@ -933,7 +914,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) /* done, kill the old one and finish */ elevator_exit(old); - elv_quiesce_end(q); + blk_queue_bypass_end(q); blk_add_trace_msg(q, "elv switch: %s", new_e->elevator_name); @@ -945,7 +926,7 @@ fail_init: /* switch failed, restore and re-register old elevator */ q->elevator = old; elv_register_queue(q); - elv_quiesce_end(q); + blk_queue_bypass_end(q); return err; } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 606cf339bb56..315db1d91bc4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -389,6 +389,8 @@ struct request_queue { struct mutex sysfs_lock; + int bypass_depth; + #if defined(CONFIG_BLK_DEV_BSG) bsg_job_fn *bsg_job_fn; int bsg_job_size; @@ -406,7 +408,7 @@ struct request_queue { #define QUEUE_FLAG_SYNCFULL 3 /* read queue has been filled */ #define QUEUE_FLAG_ASYNCFULL 4 /* write queue has been filled */ #define QUEUE_FLAG_DEAD 5 /* queue being torn down */ -#define QUEUE_FLAG_ELVSWITCH 6 /* don't use elevator, just do FIFO */ +#define QUEUE_FLAG_BYPASS 6 /* act as dumb FIFO queue */ #define QUEUE_FLAG_BIDI 7 /* queue supports bidi requests */ #define QUEUE_FLAG_NOMERGES 8 /* disable merge attempts */ #define QUEUE_FLAG_SAME_COMP 9 /* complete on same CPU-group */ @@ -494,6 +496,7 @@ static inline void queue_flag_clear(unsigned int flag, struct request_queue *q) #define blk_queue_tagged(q) test_bit(QUEUE_FLAG_QUEUED, &(q)->queue_flags) #define blk_queue_stopped(q) test_bit(QUEUE_FLAG_STOPPED, &(q)->queue_flags) #define blk_queue_dead(q) test_bit(QUEUE_FLAG_DEAD, &(q)->queue_flags) +#define blk_queue_bypass(q) test_bit(QUEUE_FLAG_BYPASS, &(q)->queue_flags) #define blk_queue_nomerges(q) test_bit(QUEUE_FLAG_NOMERGES, &(q)->queue_flags) #define blk_queue_noxmerges(q) \ test_bit(QUEUE_FLAG_NOXMERGES, &(q)->queue_flags) -- cgit v1.2.3 From 923adde1be1df57cebd80c563058e503376645e8 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:13 -0800 Subject: blkcg: clear all request_queues on blkcg policy [un]registrations Keep track of all request_queues which have blkcg initialized and turn on bypass and invoke blkcg_clear_queue() on all before making changes to blkcg policies. This is to prepare for moving blkg management into blkcg core. Note that this uses more brute force than necessary. Finer grained shoot down will be implemented later and given that policy [un]registration almost never happens on running systems (blk-throtl can't be built as a module and cfq usually is the builtin default iosched), this shouldn't be a problem for the time being. 
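For illustration, the flow around policy registration after this patch can be sketched as follows (the CFQ policy is just an example caller; the function bodies are in the diff below):

	blkio_policy_register(&blkio_policy_cfq)
	    blkcg_bypass_start()		/* takes all_q_mutex */
	        list_for_each_entry(q, &all_q_list, all_q_node)
	            blk_queue_bypass_start(q);	/* bypass and drain each tracked queue */
	            blkg_destroy_all(q);	/* shoot down existing blkgs */
	    /* update blkio_policy[] and blkio_list under blkio_list_lock */
	    blkcg_bypass_end()			/* end bypass on each queue, drops all_q_mutex */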
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 48 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/blkdev.h | 3 +++ 2 files changed, 50 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index b302ce1d662b..266c0707d588 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -27,6 +27,9 @@ static DEFINE_SPINLOCK(blkio_list_lock); static LIST_HEAD(blkio_list); +static DEFINE_MUTEX(all_q_mutex); +static LIST_HEAD(all_q_list); + struct blkio_cgroup blkio_root_cgroup = { .weight = 2*BLKIO_WEIGHT_DEFAULT }; EXPORT_SYMBOL_GPL(blkio_root_cgroup); @@ -1472,9 +1475,20 @@ done: */ int blkcg_init_queue(struct request_queue *q) { + int ret; + might_sleep(); - return blk_throtl_init(q); + ret = blk_throtl_init(q); + if (ret) + return ret; + + mutex_lock(&all_q_mutex); + INIT_LIST_HEAD(&q->all_q_node); + list_add_tail(&q->all_q_node, &all_q_list); + mutex_unlock(&all_q_mutex); + + return 0; } /** @@ -1498,6 +1512,10 @@ void blkcg_drain_queue(struct request_queue *q) */ void blkcg_exit_queue(struct request_queue *q) { + mutex_lock(&all_q_mutex); + list_del_init(&q->all_q_node); + mutex_unlock(&all_q_mutex); + blk_throtl_exit(q); } @@ -1543,8 +1561,33 @@ static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, } } +static void blkcg_bypass_start(void) + __acquires(&all_q_mutex) +{ + struct request_queue *q; + + mutex_lock(&all_q_mutex); + + list_for_each_entry(q, &all_q_list, all_q_node) { + blk_queue_bypass_start(q); + blkg_destroy_all(q); + } +} + +static void blkcg_bypass_end(void) + __releases(&all_q_mutex) +{ + struct request_queue *q; + + list_for_each_entry(q, &all_q_list, all_q_node) + blk_queue_bypass_end(q); + + mutex_unlock(&all_q_mutex); +} + void blkio_policy_register(struct blkio_policy_type *blkiop) { + blkcg_bypass_start(); spin_lock(&blkio_list_lock); BUG_ON(blkio_policy[blkiop->plid]); @@ -1552,11 +1595,13 @@ void blkio_policy_register(struct blkio_policy_type *blkiop) list_add_tail(&blkiop->list, &blkio_list); spin_unlock(&blkio_list_lock); + blkcg_bypass_end(); } EXPORT_SYMBOL_GPL(blkio_policy_register); void blkio_policy_unregister(struct blkio_policy_type *blkiop) { + blkcg_bypass_start(); spin_lock(&blkio_list_lock); BUG_ON(blkio_policy[blkiop->plid] != blkiop); @@ -1564,5 +1609,6 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop) list_del_init(&blkiop->list); spin_unlock(&blkio_list_lock); + blkcg_bypass_end(); } EXPORT_SYMBOL_GPL(blkio_policy_unregister); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 315db1d91bc4..e8c0bbd06b9a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -397,6 +397,9 @@ struct request_queue { struct bsg_class_device bsg_dev; #endif +#ifdef CONFIG_BLK_CGROUP + struct list_head all_q_node; +#endif #ifdef CONFIG_BLK_DEV_THROTTLING /* Throttle data */ struct throtl_data *td; -- cgit v1.2.3 From 4eef3049986e8397d5003916aed8cad6567a5e02 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:18 -0800 Subject: blkcg: move per-queue blkg list heads and counters to queue and blkg Currently, specific policy implementations are responsible for maintaining list and number of blkgs. This duplicates code unnecessarily, and hinders factoring common code and providing blkcg API with better defined semantics. After this patch, request_queue hosts list heads and counters and blkg has list nodes for both policies. 
This patch only relocates the necessary fields and the next patch will actually move management code into blkcg core. Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to have 2 elements. This is to avoid include dependency and will be removed by the next patch. This patch doesn't introduce any behavior change. -v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed as pointed out by Vivek. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 2 ++ block/blk-cgroup.h | 1 + block/blk-core.c | 4 ++++ block/blk-throttle.c | 49 +++++++++++++++++++++++-------------------------- block/cfq-iosched.c | 47 +++++++++++++++++++---------------------------- include/linux/blkdev.h | 5 +++++ 6 files changed, 54 insertions(+), 54 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 91f9824be5cc..e940972ccd66 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -499,6 +499,8 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, spin_lock_init(&blkg->stats_lock); rcu_assign_pointer(blkg->q, q); + INIT_LIST_HEAD(&blkg->q_node[0]); + INIT_LIST_HEAD(&blkg->q_node[1]); blkg->blkcg = blkcg; blkg->plid = pol->plid; blkg->refcnt = 1; diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index 60e96b4be4ce..ae96f196d469 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -178,6 +178,7 @@ struct blkg_policy_data { struct blkio_group { /* Pointer to the associated request_queue, RCU protected */ struct request_queue __rcu *q; + struct list_head q_node[BLKIO_NR_POLICIES]; struct hlist_node blkcg_node; struct blkio_cgroup *blkcg; /* Store cgroup path */ diff --git a/block/blk-core.c b/block/blk-core.c index c3434c6395b9..83a47fcf5946 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -547,6 +547,10 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) INIT_LIST_HEAD(&q->queue_head); INIT_LIST_HEAD(&q->timeout_list); INIT_LIST_HEAD(&q->icq_list); +#ifdef CONFIG_BLK_CGROUP + INIT_LIST_HEAD(&q->blkg_list[0]); + INIT_LIST_HEAD(&q->blkg_list[1]); +#endif INIT_LIST_HEAD(&q->flush_queue[0]); INIT_LIST_HEAD(&q->flush_queue[1]); INIT_LIST_HEAD(&q->flush_data_in_flight); diff --git a/block/blk-throttle.c b/block/blk-throttle.c index b2fddaf20b98..c15d38307e1d 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -41,9 +41,6 @@ struct throtl_rb_root { #define rb_entry_tg(node) rb_entry((node), struct throtl_grp, rb_node) struct throtl_grp { - /* List of throtl groups on the request queue*/ - struct hlist_node tg_node; - /* active throtl group service_tree member */ struct rb_node rb_node; @@ -83,9 +80,6 @@ struct throtl_grp { struct throtl_data { - /* List of throtl groups */ - struct hlist_head tg_list; - /* service tree for active throtl groups */ struct throtl_rb_root tg_service_tree; @@ -152,7 +146,6 @@ static void throtl_init_blkio_group(struct blkio_group *blkg) { struct throtl_grp *tg = blkg_to_tg(blkg); - INIT_HLIST_NODE(&tg->tg_node); RB_CLEAR_NODE(&tg->rb_node); bio_list_init(&tg->bio_lists[0]); bio_list_init(&tg->bio_lists[1]); @@ -167,11 +160,9 @@ static void throtl_init_blkio_group(struct blkio_group *blkg) static void throtl_link_blkio_group(struct request_queue *q, struct blkio_group *blkg) { - struct throtl_data *td = q->td; - struct throtl_grp *tg = blkg_to_tg(blkg); - - hlist_add_head(&tg->tg_node, &td->tg_list); - td->nr_undestroyed_grps++; + list_add(&blkg->q_node[BLKIO_POLICY_THROTL], + &q->blkg_list[BLKIO_POLICY_THROTL]); + 
q->nr_blkgs[BLKIO_POLICY_THROTL]++; } static struct @@ -711,8 +702,8 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl) static void throtl_process_limit_change(struct throtl_data *td) { - struct throtl_grp *tg; - struct hlist_node *pos, *n; + struct request_queue *q = td->queue; + struct blkio_group *blkg, *n; if (!td->limits_changed) return; @@ -721,7 +712,10 @@ static void throtl_process_limit_change(struct throtl_data *td) throtl_log(td, "limits changed"); - hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) { + list_for_each_entry_safe(blkg, n, &q->blkg_list[BLKIO_POLICY_THROTL], + q_node[BLKIO_POLICY_THROTL]) { + struct throtl_grp *tg = blkg_to_tg(blkg); + if (!tg->limits_changed) continue; @@ -822,26 +816,31 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned long delay) static void throtl_destroy_tg(struct throtl_data *td, struct throtl_grp *tg) { + struct blkio_group *blkg = tg_to_blkg(tg); + /* Something wrong if we are trying to remove same group twice */ - BUG_ON(hlist_unhashed(&tg->tg_node)); + WARN_ON_ONCE(list_empty(&blkg->q_node[BLKIO_POLICY_THROTL])); - hlist_del_init(&tg->tg_node); + list_del_init(&blkg->q_node[BLKIO_POLICY_THROTL]); /* * Put the reference taken at the time of creation so that when all * queues are gone, group can be destroyed. */ blkg_put(tg_to_blkg(tg)); - td->nr_undestroyed_grps--; + td->queue->nr_blkgs[BLKIO_POLICY_THROTL]--; } static bool throtl_release_tgs(struct throtl_data *td, bool release_root) { - struct hlist_node *pos, *n; - struct throtl_grp *tg; + struct request_queue *q = td->queue; + struct blkio_group *blkg, *n; bool empty = true; - hlist_for_each_entry_safe(tg, pos, n, &td->tg_list, tg_node) { + list_for_each_entry_safe(blkg, n, &q->blkg_list[BLKIO_POLICY_THROTL], + q_node[BLKIO_POLICY_THROTL]) { + struct throtl_grp *tg = blkg_to_tg(blkg); + /* skip root? */ if (!release_root && tg == td->root_tg) continue; @@ -851,7 +850,7 @@ static bool throtl_release_tgs(struct throtl_data *td, bool release_root) * it from cgroup list, then it will take care of destroying * cfqg also. 
*/ - if (!blkiocg_del_blkio_group(tg_to_blkg(tg))) + if (!blkiocg_del_blkio_group(blkg)) throtl_destroy_tg(td, tg); else empty = false; @@ -1114,7 +1113,6 @@ int blk_throtl_init(struct request_queue *q) if (!td) return -ENOMEM; - INIT_HLIST_HEAD(&td->tg_list); td->tg_service_tree = THROTL_RB_ROOT; td->limits_changed = false; INIT_DELAYED_WORK(&td->throtl_work, blk_throtl_work); @@ -1144,7 +1142,7 @@ int blk_throtl_init(struct request_queue *q) void blk_throtl_exit(struct request_queue *q) { struct throtl_data *td = q->td; - bool wait = false; + bool wait; BUG_ON(!td); @@ -1154,8 +1152,7 @@ void blk_throtl_exit(struct request_queue *q) throtl_release_tgs(td, true); /* If there are other groups */ - if (td->nr_undestroyed_grps > 0) - wait = true; + wait = q->nr_blkgs[BLKIO_POLICY_THROTL]; spin_unlock_irq(q->queue_lock); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 11dd9d7f2edb..e846803280a6 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -208,9 +208,7 @@ struct cfq_group { unsigned long saved_workload_slice; enum wl_type_t saved_workload; enum wl_prio_t saved_serving_prio; -#ifdef CONFIG_CFQ_GROUP_IOSCHED - struct hlist_node cfqd_node; -#endif + /* number of requests that are on the dispatch list or inside driver */ int dispatched; struct cfq_ttime ttime; @@ -302,12 +300,6 @@ struct cfq_data { struct cfq_queue oom_cfqq; unsigned long last_delayed_sync; - - /* List of cfq groups being managed on this device*/ - struct hlist_head cfqg_list; - - /* Number of groups which are on blkcg->blkg_list */ - unsigned int nr_blkcg_linked_grps; }; static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg) @@ -1056,13 +1048,9 @@ static void cfq_update_blkio_group_weight(struct request_queue *q, static void cfq_link_blkio_group(struct request_queue *q, struct blkio_group *blkg) { - struct cfq_data *cfqd = q->elevator->elevator_data; - struct cfq_group *cfqg = blkg_to_cfqg(blkg); - - cfqd->nr_blkcg_linked_grps++; - - /* Add group on cfqd list */ - hlist_add_head(&cfqg->cfqd_node, &cfqd->cfqg_list); + list_add(&blkg->q_node[BLKIO_POLICY_PROP], + &q->blkg_list[BLKIO_POLICY_PROP]); + q->nr_blkgs[BLKIO_POLICY_PROP]++; } static void cfq_init_blkio_group(struct blkio_group *blkg) @@ -1110,13 +1098,15 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) static void cfq_destroy_cfqg(struct cfq_data *cfqd, struct cfq_group *cfqg) { + struct blkio_group *blkg = cfqg_to_blkg(cfqg); + /* Something wrong if we are trying to remove same group twice */ - BUG_ON(hlist_unhashed(&cfqg->cfqd_node)); + BUG_ON(list_empty(&blkg->q_node[BLKIO_POLICY_PROP])); - hlist_del_init(&cfqg->cfqd_node); + list_del_init(&blkg->q_node[BLKIO_POLICY_PROP]); - BUG_ON(cfqd->nr_blkcg_linked_grps <= 0); - cfqd->nr_blkcg_linked_grps--; + BUG_ON(cfqd->queue->nr_blkgs[BLKIO_POLICY_PROP] <= 0); + cfqd->queue->nr_blkgs[BLKIO_POLICY_PROP]--; /* * Put the reference taken at the time of creation so that when all @@ -1127,18 +1117,19 @@ static void cfq_destroy_cfqg(struct cfq_data *cfqd, struct cfq_group *cfqg) static bool cfq_release_cfq_groups(struct cfq_data *cfqd) { - struct hlist_node *pos, *n; - struct cfq_group *cfqg; + struct request_queue *q = cfqd->queue; + struct blkio_group *blkg, *n; bool empty = true; - hlist_for_each_entry_safe(cfqg, pos, n, &cfqd->cfqg_list, cfqd_node) { + list_for_each_entry_safe(blkg, n, &q->blkg_list[BLKIO_POLICY_PROP], + q_node[BLKIO_POLICY_PROP]) { /* * If cgroup removal path got to blk_group first and removed * it from cgroup list, then it will take 
care of destroying * cfqg also. */ - if (!cfq_blkiocg_del_blkio_group(cfqg_to_blkg(cfqg))) - cfq_destroy_cfqg(cfqd, cfqg); + if (!cfq_blkiocg_del_blkio_group(blkg)) + cfq_destroy_cfqg(cfqd, blkg_to_cfqg(blkg)); else empty = false; } @@ -3558,13 +3549,13 @@ static void cfq_exit_queue(struct elevator_queue *e) cfq_put_async_queues(cfqd); cfq_release_cfq_groups(cfqd); +#ifdef CONFIG_BLK_CGROUP /* * If there are groups which we could not unlink from blkcg list, * wait for a rcu period for them to be freed. */ - if (cfqd->nr_blkcg_linked_grps) - wait = true; - + wait = q->nr_blkgs[BLKIO_POLICY_PROP]; +#endif spin_unlock_irq(q->queue_lock); cfq_shutdown_timer_wq(cfqd); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index e8c0bbd06b9a..f4e35edea70f 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -362,6 +362,11 @@ struct request_queue { struct list_head timeout_list; struct list_head icq_list; +#ifdef CONFIG_BLK_CGROUP + /* XXX: array size hardcoded to avoid include dependency (temporary) */ + struct list_head blkg_list[2]; + int nr_blkgs[2]; +#endif struct queue_limits limits; -- cgit v1.2.3 From 03aa264ac15637b6f98374270bcdf31400965505 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:19 -0800 Subject: blkcg: let blkcg core manage per-queue blkg list and counter With the previous patch to move blkg list heads and counters to request_queue and blkg, the logic to manage them in both policies is almost identical and can be moved to blkcg core. This patch moves blkg link logic into blkg_lookup_create(), implements common blkg unlink code in blkg_destroy(), and updates blkg_destroy_all() so that it's policy specific and can skip the root group. The updated blkg_destroy_all() is now used both to clear the queue for bypassing and elv switching, and to release all blkgs on q exit. This patch introduces a race window where policy [de]registration may race against queue blkg clearing. This can only be a problem on cfq unload and shouldn't be a real problem in practice (and we have many other places where this race already exists). Future patches will remove these unlikely races.
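With the management code in blkcg core, the call sites reduce to the pattern below (a sketch of the two typical uses as updated by this patch; see the diffs that follow):

	/* bypass / elevator switch: clear the non-root groups of every policy */
	for (i = 0; i < BLKIO_NR_POLICIES; i++)
		blkg_destroy_all(q, i, false);

	/* queue exit, e.g. blk_throtl_exit(): release everything, root group included */
	blkg_destroy_all(q, BLKIO_POLICY_THROTL, true);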
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 72 +++++++++++++++++++++++++++-------- block/blk-cgroup.h | 15 +++----- block/blk-throttle.c | 99 +----------------------------------------------- block/cfq-iosched.c | 100 +++---------------------------------------------- block/elevator.c | 5 ++- include/linux/blkdev.h | 4 +- 6 files changed, 74 insertions(+), 221 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index e940972ccd66..2ca9a15db0f7 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -596,8 +596,11 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg, /* insert */ spin_lock(&blkcg->lock); swap(blkg, new_blkg); + hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list); - pol->ops.blkio_link_group_fn(q, blkg); + list_add(&blkg->q_node[plid], &q->blkg_list[plid]); + q->nr_blkgs[plid]++; + spin_unlock(&blkcg->lock); out: blkg_free(new_blkg); @@ -646,36 +649,69 @@ struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg, } EXPORT_SYMBOL_GPL(blkg_lookup); -void blkg_destroy_all(struct request_queue *q) +static void blkg_destroy(struct blkio_group *blkg, enum blkio_policy_id plid) +{ + struct request_queue *q = blkg->q; + + lockdep_assert_held(q->queue_lock); + + /* Something wrong if we are trying to remove same group twice */ + WARN_ON_ONCE(list_empty(&blkg->q_node[plid])); + list_del_init(&blkg->q_node[plid]); + + WARN_ON_ONCE(q->nr_blkgs[plid] <= 0); + q->nr_blkgs[plid]--; + + /* + * Put the reference taken at the time of creation so that when all + * queues are gone, group can be destroyed. + */ + blkg_put(blkg); +} + +void blkg_destroy_all(struct request_queue *q, enum blkio_policy_id plid, + bool destroy_root) { - struct blkio_policy_type *pol; + struct blkio_group *blkg, *n; while (true) { bool done = true; - spin_lock(&blkio_list_lock); spin_lock_irq(q->queue_lock); - /* - * clear_queue_fn() might return with non-empty group list - * if it raced cgroup removal and lost. cgroup removal is - * guaranteed to make forward progress and retrying after a - * while is enough. This ugliness is scheduled to be - * removed after locking update. - */ - list_for_each_entry(pol, &blkio_list, list) - if (!pol->ops.blkio_clear_queue_fn(q)) + list_for_each_entry_safe(blkg, n, &q->blkg_list[plid], + q_node[plid]) { + /* skip root? */ + if (!destroy_root && blkg->blkcg == &blkio_root_cgroup) + continue; + + /* + * If cgroup removal path got to blk_group first + * and removed it from cgroup list, then it will + * take care of destroying cfqg also. + */ + if (!blkiocg_del_blkio_group(blkg)) + blkg_destroy(blkg, plid); + else done = false; + } spin_unlock_irq(q->queue_lock); - spin_unlock(&blkio_list_lock); + /* + * Group list may not be empty if we raced cgroup removal + * and lost. cgroup removal is guaranteed to make forward + * progress and retrying after a while is enough. This + * ugliness is scheduled to be removed after locking + * update. + */ if (done) break; msleep(10); /* just some random duration I like */ } } +EXPORT_SYMBOL_GPL(blkg_destroy_all); static void blkg_rcu_free(struct rcu_head *rcu_head) { @@ -1549,11 +1585,13 @@ static int blkiocg_pre_destroy(struct cgroup_subsys *subsys, * this event. 
*/ spin_lock(&blkio_list_lock); + spin_lock_irqsave(q->queue_lock, flags); list_for_each_entry(blkiop, &blkio_list, list) { if (blkiop->plid != blkg->plid) continue; - blkiop->ops.blkio_unlink_group_fn(q, blkg); + blkg_destroy(blkg, blkiop->plid); } + spin_unlock_irqrestore(q->queue_lock, flags); spin_unlock(&blkio_list_lock); } while (1); @@ -1695,12 +1733,14 @@ static void blkcg_bypass_start(void) __acquires(&all_q_mutex) { struct request_queue *q; + int i; mutex_lock(&all_q_mutex); list_for_each_entry(q, &all_q_list, all_q_node) { blk_queue_bypass_start(q); - blkg_destroy_all(q); + for (i = 0; i < BLKIO_NR_POLICIES; i++) + blkg_destroy_all(q, i, false); } } diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index ae96f196d469..83ce5fa0a604 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -196,11 +196,6 @@ struct blkio_group { }; typedef void (blkio_init_group_fn)(struct blkio_group *blkg); -typedef void (blkio_link_group_fn)(struct request_queue *q, - struct blkio_group *blkg); -typedef void (blkio_unlink_group_fn)(struct request_queue *q, - struct blkio_group *blkg); -typedef bool (blkio_clear_queue_fn)(struct request_queue *q); typedef void (blkio_update_group_weight_fn)(struct request_queue *q, struct blkio_group *blkg, unsigned int weight); typedef void (blkio_update_group_read_bps_fn)(struct request_queue *q, @@ -214,9 +209,6 @@ typedef void (blkio_update_group_write_iops_fn)(struct request_queue *q, struct blkio_policy_ops { blkio_init_group_fn *blkio_init_group_fn; - blkio_link_group_fn *blkio_link_group_fn; - blkio_unlink_group_fn *blkio_unlink_group_fn; - blkio_clear_queue_fn *blkio_clear_queue_fn; blkio_update_group_weight_fn *blkio_update_group_weight_fn; blkio_update_group_read_bps_fn *blkio_update_group_read_bps_fn; blkio_update_group_write_bps_fn *blkio_update_group_write_bps_fn; @@ -238,7 +230,8 @@ extern void blkcg_exit_queue(struct request_queue *q); /* Blkio controller policy registration */ extern void blkio_policy_register(struct blkio_policy_type *); extern void blkio_policy_unregister(struct blkio_policy_type *); -extern void blkg_destroy_all(struct request_queue *q); +extern void blkg_destroy_all(struct request_queue *q, + enum blkio_policy_id plid, bool destroy_root); /** * blkg_to_pdata - get policy private data @@ -319,7 +312,9 @@ static inline void blkcg_drain_queue(struct request_queue *q) { } static inline void blkcg_exit_queue(struct request_queue *q) { } static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { } static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } -static inline void blkg_destroy_all(struct request_queue *q) { } +static inline void blkg_destroy_all(struct request_queue *q, + enum blkio_policy_id plid, + bool destory_root) { } static inline void *blkg_to_pdata(struct blkio_group *blkg, struct blkio_policy_type *pol) { return NULL; } diff --git a/block/blk-throttle.c b/block/blk-throttle.c index c15d38307e1d..132941260e58 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -157,14 +157,6 @@ static void throtl_init_blkio_group(struct blkio_group *blkg) tg->iops[WRITE] = -1; } -static void throtl_link_blkio_group(struct request_queue *q, - struct blkio_group *blkg) -{ - list_add(&blkg->q_node[BLKIO_POLICY_THROTL], - &q->blkg_list[BLKIO_POLICY_THROTL]); - q->nr_blkgs[BLKIO_POLICY_THROTL]++; -} - static struct throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg) { @@ -813,89 +805,6 @@ throtl_schedule_delayed_work(struct throtl_data *td, unsigned 
long delay) } } -static void -throtl_destroy_tg(struct throtl_data *td, struct throtl_grp *tg) -{ - struct blkio_group *blkg = tg_to_blkg(tg); - - /* Something wrong if we are trying to remove same group twice */ - WARN_ON_ONCE(list_empty(&blkg->q_node[BLKIO_POLICY_THROTL])); - - list_del_init(&blkg->q_node[BLKIO_POLICY_THROTL]); - - /* - * Put the reference taken at the time of creation so that when all - * queues are gone, group can be destroyed. - */ - blkg_put(tg_to_blkg(tg)); - td->queue->nr_blkgs[BLKIO_POLICY_THROTL]--; -} - -static bool throtl_release_tgs(struct throtl_data *td, bool release_root) -{ - struct request_queue *q = td->queue; - struct blkio_group *blkg, *n; - bool empty = true; - - list_for_each_entry_safe(blkg, n, &q->blkg_list[BLKIO_POLICY_THROTL], - q_node[BLKIO_POLICY_THROTL]) { - struct throtl_grp *tg = blkg_to_tg(blkg); - - /* skip root? */ - if (!release_root && tg == td->root_tg) - continue; - - /* - * If cgroup removal path got to blk_group first and removed - * it from cgroup list, then it will take care of destroying - * cfqg also. - */ - if (!blkiocg_del_blkio_group(blkg)) - throtl_destroy_tg(td, tg); - else - empty = false; - } - return empty; -} - -/* - * Blk cgroup controller notification saying that blkio_group object is being - * delinked as associated cgroup object is going away. That also means that - * no new IO will come in this group. So get rid of this group as soon as - * any pending IO in the group is finished. - * - * This function is called under rcu_read_lock(). @q is the rcu protected - * pointer. That means @q is a valid request_queue pointer as long as we - * are rcu read lock. - * - * @q was fetched from blkio_group under blkio_cgroup->lock. That means - * it should not be NULL as even if queue was going away, cgroup deltion - * path got to it first. - */ -void throtl_unlink_blkio_group(struct request_queue *q, - struct blkio_group *blkg) -{ - unsigned long flags; - - spin_lock_irqsave(q->queue_lock, flags); - throtl_destroy_tg(q->td, blkg_to_tg(blkg)); - spin_unlock_irqrestore(q->queue_lock, flags); -} - -static bool throtl_clear_queue(struct request_queue *q) -{ - lockdep_assert_held(q->queue_lock); - - /* - * Clear tgs but leave the root one alone. This is necessary - * because root_tg is expected to be persistent and safe because - * blk-throtl can never be disabled while @q is alive. This is a - * kludge to prepare for unified blkg. This whole function will be - * removed soon. 
- */ - return throtl_release_tgs(q->td, false); -} - static void throtl_update_blkio_group_common(struct throtl_data *td, struct throtl_grp *tg) { @@ -960,9 +869,6 @@ static void throtl_shutdown_wq(struct request_queue *q) static struct blkio_policy_type blkio_policy_throtl = { .ops = { .blkio_init_group_fn = throtl_init_blkio_group, - .blkio_link_group_fn = throtl_link_blkio_group, - .blkio_unlink_group_fn = throtl_unlink_blkio_group, - .blkio_clear_queue_fn = throtl_clear_queue, .blkio_update_group_read_bps_fn = throtl_update_blkio_group_read_bps, .blkio_update_group_write_bps_fn = @@ -1148,12 +1054,11 @@ void blk_throtl_exit(struct request_queue *q) throtl_shutdown_wq(q); - spin_lock_irq(q->queue_lock); - throtl_release_tgs(td, true); + blkg_destroy_all(q, BLKIO_POLICY_THROTL, true); /* If there are other groups */ + spin_lock_irq(q->queue_lock); wait = q->nr_blkgs[BLKIO_POLICY_THROTL]; - spin_unlock_irq(q->queue_lock); /* diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index e846803280a6..dc73690dec44 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1045,14 +1045,6 @@ static void cfq_update_blkio_group_weight(struct request_queue *q, cfqg->needs_update = true; } -static void cfq_link_blkio_group(struct request_queue *q, - struct blkio_group *blkg) -{ - list_add(&blkg->q_node[BLKIO_POLICY_PROP], - &q->blkg_list[BLKIO_POLICY_PROP]); - q->nr_blkgs[BLKIO_POLICY_PROP]++; -} - static void cfq_init_blkio_group(struct blkio_group *blkg) { struct cfq_group *cfqg = blkg_to_cfqg(blkg); @@ -1096,84 +1088,6 @@ static void cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) blkg_get(cfqg_to_blkg(cfqg)); } -static void cfq_destroy_cfqg(struct cfq_data *cfqd, struct cfq_group *cfqg) -{ - struct blkio_group *blkg = cfqg_to_blkg(cfqg); - - /* Something wrong if we are trying to remove same group twice */ - BUG_ON(list_empty(&blkg->q_node[BLKIO_POLICY_PROP])); - - list_del_init(&blkg->q_node[BLKIO_POLICY_PROP]); - - BUG_ON(cfqd->queue->nr_blkgs[BLKIO_POLICY_PROP] <= 0); - cfqd->queue->nr_blkgs[BLKIO_POLICY_PROP]--; - - /* - * Put the reference taken at the time of creation so that when all - * queues are gone, group can be destroyed. - */ - blkg_put(cfqg_to_blkg(cfqg)); -} - -static bool cfq_release_cfq_groups(struct cfq_data *cfqd) -{ - struct request_queue *q = cfqd->queue; - struct blkio_group *blkg, *n; - bool empty = true; - - list_for_each_entry_safe(blkg, n, &q->blkg_list[BLKIO_POLICY_PROP], - q_node[BLKIO_POLICY_PROP]) { - /* - * If cgroup removal path got to blk_group first and removed - * it from cgroup list, then it will take care of destroying - * cfqg also. - */ - if (!cfq_blkiocg_del_blkio_group(blkg)) - cfq_destroy_cfqg(cfqd, blkg_to_cfqg(blkg)); - else - empty = false; - } - return empty; -} - -/* - * Blk cgroup controller notification saying that blkio_group object is being - * delinked as associated cgroup object is going away. That also means that - * no new IO will come in this group. So get rid of this group as soon as - * any pending IO in the group is finished. - * - * This function is called under rcu_read_lock(). key is the rcu protected - * pointer. That means @q is a valid request_queue pointer as long as we - * are rcu read lock. - * - * @q was fetched from blkio_group under blkio_cgroup->lock. That means - * it should not be NULL as even if elevator was exiting, cgroup deltion - * path got to it first. 
- */ -static void cfq_unlink_blkio_group(struct request_queue *q, - struct blkio_group *blkg) -{ - struct cfq_data *cfqd = q->elevator->elevator_data; - unsigned long flags; - - spin_lock_irqsave(q->queue_lock, flags); - cfq_destroy_cfqg(cfqd, blkg_to_cfqg(blkg)); - spin_unlock_irqrestore(q->queue_lock, flags); -} - -static struct elevator_type iosched_cfq; - -static bool cfq_clear_queue(struct request_queue *q) -{ - lockdep_assert_held(q->queue_lock); - - /* shoot down blkgs iff the current elevator is cfq */ - if (!q->elevator || q->elevator->type != &iosched_cfq) - return true; - - return cfq_release_cfq_groups(q->elevator->elevator_data); -} - #else /* GROUP_IOSCHED */ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd, struct blkio_cgroup *blkcg) @@ -1186,8 +1100,6 @@ cfq_link_cfqq_cfqg(struct cfq_queue *cfqq, struct cfq_group *cfqg) { cfqq->cfqg = cfqg; } -static void cfq_release_cfq_groups(struct cfq_data *cfqd) {} - #endif /* GROUP_IOSCHED */ /* @@ -3547,17 +3459,20 @@ static void cfq_exit_queue(struct elevator_queue *e) __cfq_slice_expired(cfqd, cfqd->active_queue, 0); cfq_put_async_queues(cfqd); - cfq_release_cfq_groups(cfqd); + + spin_unlock_irq(q->queue_lock); + + blkg_destroy_all(q, BLKIO_POLICY_PROP, true); #ifdef CONFIG_BLK_CGROUP /* * If there are groups which we could not unlink from blkcg list, * wait for a rcu period for them to be freed. */ + spin_lock_irq(q->queue_lock); wait = q->nr_blkgs[BLKIO_POLICY_PROP]; -#endif spin_unlock_irq(q->queue_lock); - +#endif cfq_shutdown_timer_wq(cfqd); /* @@ -3794,9 +3709,6 @@ static struct elevator_type iosched_cfq = { static struct blkio_policy_type blkio_policy_cfq = { .ops = { .blkio_init_group_fn = cfq_init_blkio_group, - .blkio_link_group_fn = cfq_link_blkio_group, - .blkio_unlink_group_fn = cfq_unlink_blkio_group, - .blkio_clear_queue_fn = cfq_clear_queue, .blkio_update_group_weight_fn = cfq_update_blkio_group_weight, }, .plid = BLKIO_POLICY_PROP, diff --git a/block/elevator.c b/block/elevator.c index 8c7561fd2c79..d4d39dab841a 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -876,7 +876,7 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) { struct elevator_queue *old = q->elevator; bool registered = old->registered; - int err; + int i, err; /* * Turn on BYPASS and drain all requests w/ elevator private data. @@ -895,7 +895,8 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ioc_clear_queue(q); spin_unlock_irq(q->queue_lock); - blkg_destroy_all(q); + for (i = 0; i < BLKIO_NR_POLICIES; i++) + blkg_destroy_all(q, i, false); /* allocate, init and register new elevator */ err = -ENOMEM; diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index f4e35edea70f..b4d1d4bfc168 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -364,8 +364,8 @@ struct request_queue { struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP /* XXX: array size hardcoded to avoid include dependency (temporary) */ - struct list_head blkg_list[2]; - int nr_blkgs[2]; + struct list_head blkg_list; + int nr_blkgs; #endif struct queue_limits limits; -- cgit v1.2.3 From c875f4d0250a1f070fa26087a73bdd8f54c48100 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:22 -0800 Subject: blkcg: drop unnecessary RCU locking Now that blkg additions / removals are always done under both q and blkcg locks, the only places RCU locking is necessary are blkg_lookup[_create]() for lookup w/o blkcg lock. 
This patch drops unnecessary RCU locking, replacing it with plain blkcg locking as necessary. * blkiocg_pre_destroy() already performs proper locking and doesn't need RCU. Dropped. * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read lock. This isn't a hot path. * Now unnecessary synchronize_rcu() from queue exit paths removed. This makes q->nr_blkgs unnecessary. Dropped. * RCU annotation on blkg->q removed. -v2: Vivek pointed out that blkg_lookup_create() still needs to be called under rcu_read_lock(). Updated. -v3: After the update, stats_lock locking in blkio_read_blkg_stats() shouldn't be using _irq variant as it otherwise ends up enabling irq while blkcg->lock is locked. Fixed. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 24 +++++++++--------------- block/blk-cgroup.h | 4 ++-- block/blk-throttle.c | 33 +-------------------------------- block/cfq-iosched.c | 24 ------------------------ include/linux/blkdev.h | 1 - 5 files changed, 12 insertions(+), 74 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index e9e3b038c702..27d39a810cb6 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -500,7 +500,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, return NULL; spin_lock_init(&blkg->stats_lock); - rcu_assign_pointer(blkg->q, q); + blkg->q = q; INIT_LIST_HEAD(&blkg->q_node); blkg->blkcg = blkcg; blkg->refcnt = 1; @@ -611,7 +611,6 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg, hlist_add_head_rcu(&blkg->blkcg_node, &blkcg->blkg_list); list_add(&blkg->q_node, &q->blkg_list); - q->nr_blkgs++; spin_unlock(&blkcg->lock); out: @@ -648,9 +647,6 @@ static void blkg_destroy(struct blkio_group *blkg) list_del_init(&blkg->q_node); hlist_del_init_rcu(&blkg->blkcg_node); - WARN_ON_ONCE(q->nr_blkgs <= 0); - q->nr_blkgs--; - /* * Put the reference taken at the time of creation so that when all * queues are gone, group can be destroyed.
@@ -1232,8 +1228,9 @@ static int blkio_read_blkg_stats(struct blkio_cgroup *blkcg, struct hlist_node *n; uint64_t cgroup_total = 0; - rcu_read_lock(); - hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) { + spin_lock_irq(&blkcg->lock); + + hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) { const char *dname = blkg_dev_name(blkg); int plid = BLKIOFILE_POLICY(cft->private); @@ -1243,15 +1240,16 @@ static int blkio_read_blkg_stats(struct blkio_cgroup *blkcg, cgroup_total += blkio_get_stat_cpu(blkg, plid, cb, dname, type); } else { - spin_lock_irq(&blkg->stats_lock); + spin_lock(&blkg->stats_lock); cgroup_total += blkio_get_stat(blkg, plid, cb, dname, type); - spin_unlock_irq(&blkg->stats_lock); + spin_unlock(&blkg->stats_lock); } } if (show_total) cb->fill(cb, "Total", cgroup_total); - rcu_read_unlock(); + + spin_unlock_irq(&blkcg->lock); return 0; } @@ -1583,28 +1581,24 @@ static int blkiocg_pre_destroy(struct cgroup_subsys *subsys, { struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup); - rcu_read_lock(); spin_lock_irq(&blkcg->lock); while (!hlist_empty(&blkcg->blkg_list)) { struct blkio_group *blkg = hlist_entry(blkcg->blkg_list.first, struct blkio_group, blkcg_node); - struct request_queue *q = rcu_dereference(blkg->q); + struct request_queue *q = blkg->q; if (spin_trylock(q->queue_lock)) { blkg_destroy(blkg); spin_unlock(q->queue_lock); } else { spin_unlock_irq(&blkcg->lock); - rcu_read_unlock(); cpu_relax(); - rcu_read_lock(); spin_lock(&blkcg->lock); } } spin_unlock_irq(&blkcg->lock); - rcu_read_unlock(); return 0; } diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index df73040a6a5f..66eaefefcbd2 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -176,8 +176,8 @@ struct blkg_policy_data { }; struct blkio_group { - /* Pointer to the associated request_queue, RCU protected */ - struct request_queue __rcu *q; + /* Pointer to the associated request_queue */ + struct request_queue *q; struct list_head q_node; struct hlist_node blkcg_node; struct blkio_cgroup *blkcg; diff --git a/block/blk-throttle.c b/block/blk-throttle.c index e35ee7aeea69..bfa5168249eb 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -1046,39 +1046,8 @@ int blk_throtl_init(struct request_queue *q) void blk_throtl_exit(struct request_queue *q) { - struct throtl_data *td = q->td; - bool wait; - - BUG_ON(!td); - + BUG_ON(!q->td); throtl_shutdown_wq(q); - - /* If there are other groups */ - spin_lock_irq(q->queue_lock); - wait = q->nr_blkgs; - spin_unlock_irq(q->queue_lock); - - /* - * Wait for tg_to_blkg(tg)->q accessors to exit their grace periods. - * Do this wait only if there are other undestroyed groups out - * there (other than root group). This can happen if cgroup deletion - * path claimed the responsibility of cleaning up a group before - * queue cleanup code get to the group. - * - * Do not call synchronize_rcu() unconditionally as there are drivers - * which create/delete request queue hundreds of times during scan/boot - * and synchronize_rcu() can take significant time and slow down boot. - */ - if (wait) - synchronize_rcu(); - - /* - * Just being safe to make sure after previous flush if some body did - * update limits through cgroup and another work got queued, cancel - * it. 
- */ - throtl_shutdown_wq(q); - kfree(q->td); } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 393eaa59913b..9e386d9bcb79 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3449,7 +3449,6 @@ static void cfq_exit_queue(struct elevator_queue *e) { struct cfq_data *cfqd = e->elevator_data; struct request_queue *q = cfqd->queue; - bool wait = false; cfq_shutdown_timer_wq(cfqd); @@ -3462,31 +3461,8 @@ static void cfq_exit_queue(struct elevator_queue *e) spin_unlock_irq(q->queue_lock); -#ifdef CONFIG_BLK_CGROUP - /* - * If there are groups which we could not unlink from blkcg list, - * wait for a rcu period for them to be freed. - */ - spin_lock_irq(q->queue_lock); - wait = q->nr_blkgs; - spin_unlock_irq(q->queue_lock); -#endif cfq_shutdown_timer_wq(cfqd); - /* - * Wait for cfqg->blkg->key accessors to exit their grace periods. - * Do this wait only if there are other unlinked groups out - * there. This can happen if cgroup deletion path claimed the - * responsibility of cleaning up a group before queue cleanup code - * get to the group. - * - * Do not call synchronize_rcu() unconditionally as there are drivers - * which create/delete request queue hundreds of times during scan/boot - * and synchronize_rcu() can take significant time and slow down boot. - */ - if (wait) - synchronize_rcu(); - #ifndef CONFIG_CFQ_GROUP_IOSCHED kfree(cfqd->root_group); #endif diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b4d1d4bfc168..33f1b29e53f4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -365,7 +365,6 @@ struct request_queue { #ifdef CONFIG_BLK_CGROUP /* XXX: array size hardcoded to avoid include dependency (temporary) */ struct list_head blkg_list; - int nr_blkgs; #endif struct queue_limits limits; -- cgit v1.2.3 From 3d48749d93a3dce732dd30a14002ab90ec4355f3 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:25 -0800 Subject: block: ioc_task_link() can't fail ioc_task_link() is used to share %current's ioc on clone. If %current->io_context is set, %current is guaranteed to have refcount on the ioc and, thus, ioc_task_link() can't fail. Replace error checking in ioc_task_link() with WARN_ON_ONCE() and make it just increment refcount and nr_tasks. -v2: Description typo fix (Vivek). Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- include/linux/iocontext.h | 16 +++++----------- kernel/fork.c | 5 ++--- 2 files changed, 7 insertions(+), 14 deletions(-) (limited to 'include') diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h index 1a3018063034..81a8870ac224 100644 --- a/include/linux/iocontext.h +++ b/include/linux/iocontext.h @@ -120,18 +120,12 @@ struct io_context { struct work_struct release_work; }; -static inline struct io_context *ioc_task_link(struct io_context *ioc) +static inline void ioc_task_link(struct io_context *ioc) { - /* - * if ref count is zero, don't allow sharing (ioc is going away, it's - * a race). 
- */ - if (ioc && atomic_long_inc_not_zero(&ioc->refcount)) { - atomic_inc(&ioc->nr_tasks); - return ioc; - } - - return NULL; + WARN_ON_ONCE(atomic_long_read(&ioc->refcount) <= 0); + WARN_ON_ONCE(atomic_read(&ioc->nr_tasks) <= 0); + atomic_long_inc(&ioc->refcount); + atomic_inc(&ioc->nr_tasks); } struct task_struct; diff --git a/kernel/fork.c b/kernel/fork.c index b77fd559c78e..a1b632713e43 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -901,9 +901,8 @@ static int copy_io(unsigned long clone_flags, struct task_struct *tsk) * Share io context with parent, if CLONE_IO is set */ if (clone_flags & CLONE_IO) { - tsk->io_context = ioc_task_link(ioc); - if (unlikely(!tsk->io_context)) - return -ENOMEM; + ioc_task_link(ioc); + tsk->io_context = ioc; } else if (ioprio_valid(ioc->ioprio)) { new_ioc = get_task_io_context(tsk, GFP_KERNEL, NUMA_NO_NODE); if (unlikely(!new_ioc)) -- cgit v1.2.3 From f6e8d01bee036460e03bd4f6a79d014f98ba712e Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:26 -0800 Subject: block: add io_context->active_ref Currently ioc->nr_tasks is used to decide two things - whether an ioc is done issuing IOs and whether it's shared by multiple tasks. This patch separate out the first into ioc->active_ref, which is acquired and released using {get|put}_io_context_active() respectively. This will be used to associate bio's with a given task. This patch doesn't introduce any visible behavior change. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-ioc.c | 36 +++++++++++++++++++++++++----------- block/cfq-iosched.c | 4 ++-- include/linux/iocontext.h | 22 ++++++++++++++++++++-- 3 files changed, 47 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/block/blk-ioc.c b/block/blk-ioc.c index 10928740b5da..439ec21fd787 100644 --- a/block/blk-ioc.c +++ b/block/blk-ioc.c @@ -149,20 +149,20 @@ void put_io_context(struct io_context *ioc) } EXPORT_SYMBOL(put_io_context); -/* Called by the exiting task */ -void exit_io_context(struct task_struct *task) +/** + * put_io_context_active - put active reference on ioc + * @ioc: ioc of interest + * + * Undo get_io_context_active(). If active reference reaches zero after + * put, @ioc can never issue further IOs and ioscheds are notified. 
+ */ +void put_io_context_active(struct io_context *ioc) { - struct io_context *ioc; - struct io_cq *icq; struct hlist_node *n; unsigned long flags; + struct io_cq *icq; - task_lock(task); - ioc = task->io_context; - task->io_context = NULL; - task_unlock(task); - - if (!atomic_dec_and_test(&ioc->nr_tasks)) { + if (!atomic_dec_and_test(&ioc->active_ref)) { put_io_context(ioc); return; } @@ -191,6 +191,20 @@ retry: put_io_context(ioc); } +/* Called by the exiting task */ +void exit_io_context(struct task_struct *task) +{ + struct io_context *ioc; + + task_lock(task); + ioc = task->io_context; + task->io_context = NULL; + task_unlock(task); + + atomic_dec(&ioc->nr_tasks); + put_io_context_active(ioc); +} + /** * ioc_clear_queue - break any ioc association with the specified queue * @q: request_queue being cleared @@ -223,7 +237,7 @@ int create_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node) /* initialize */ atomic_long_set(&ioc->refcount, 1); - atomic_set(&ioc->nr_tasks, 1); + atomic_set(&ioc->active_ref, 1); spin_lock_init(&ioc->lock); INIT_RADIX_TREE(&ioc->icq_tree, GFP_ATOMIC | __GFP_HIGH); INIT_HLIST_HEAD(&ioc->icq_list); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 9e386d9bcb79..9a4eac490e0b 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1865,7 +1865,7 @@ static void cfq_arm_slice_timer(struct cfq_data *cfqd) * task has exited, don't wait */ cic = cfqd->active_cic; - if (!cic || !atomic_read(&cic->icq.ioc->nr_tasks)) + if (!cic || !atomic_read(&cic->icq.ioc->active_ref)) return; /* @@ -2841,7 +2841,7 @@ cfq_update_idle_window(struct cfq_data *cfqd, struct cfq_queue *cfqq, if (cfqq->next_rq && (cfqq->next_rq->cmd_flags & REQ_NOIDLE)) enable_idle = 0; - else if (!atomic_read(&cic->icq.ioc->nr_tasks) || + else if (!atomic_read(&cic->icq.ioc->active_ref) || !cfqd->cfq_slice_idle || (!cfq_cfqq_deep(cfqq) && CFQQ_SEEKY(cfqq))) enable_idle = 0; diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h index 81a8870ac224..6f1a2608e91f 100644 --- a/include/linux/iocontext.h +++ b/include/linux/iocontext.h @@ -100,6 +100,7 @@ struct io_cq { */ struct io_context { atomic_long_t refcount; + atomic_t active_ref; atomic_t nr_tasks; /* all the fields below are protected by this lock */ @@ -120,17 +121,34 @@ struct io_context { struct work_struct release_work; }; -static inline void ioc_task_link(struct io_context *ioc) +/** + * get_io_context_active - get active reference on ioc + * @ioc: ioc of interest + * + * Only iocs with active reference can issue new IOs. This function + * acquires an active reference on @ioc. The caller must already have an + * active reference on @ioc. 
+ */ +static inline void get_io_context_active(struct io_context *ioc) { WARN_ON_ONCE(atomic_long_read(&ioc->refcount) <= 0); - WARN_ON_ONCE(atomic_read(&ioc->nr_tasks) <= 0); + WARN_ON_ONCE(atomic_read(&ioc->active_ref) <= 0); atomic_long_inc(&ioc->refcount); + atomic_inc(&ioc->active_ref); +} + +static inline void ioc_task_link(struct io_context *ioc) +{ + get_io_context_active(ioc); + + WARN_ON_ONCE(atomic_read(&ioc->nr_tasks) <= 0); atomic_inc(&ioc->nr_tasks); } struct task_struct; #ifdef CONFIG_BLOCK void put_io_context(struct io_context *ioc); +void put_io_context_active(struct io_context *ioc); void exit_io_context(struct task_struct *task); struct io_context *get_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node); -- cgit v1.2.3
From 852c788f8365062c8a383c5a93f7f7289977cb50 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 5 Mar 2012 13:15:27 -0800 Subject: block: implement bio_associate_current()
IO scheduling and cgroup are tied to the issuing task via io_context and cgroup of %current. Unfortunately, there are cases where IOs need to be routed via a different task, which makes scheduling and cgroup limit enforcement apply to the wrong task. For example, all bios delayed by blk-throttle end up being issued by a delayed work item; they get assigned the io_context of whichever worker task happens to serve the work item and are dumped into the default block cgroup. This is doubly confusing, as bios which aren't delayed end up in the correct cgroup, and it makes using blk-throttle and cfq propio together impossible. Any code which punts IO issuing to another task is affected, and the pattern is getting more and more common (e.g. btrfs).
As both io_context and cgroup are firmly tied to the task, including the userland-visible APIs to manipulate them, it makes a lot of sense to match up tasks to bios. This patch implements bio_associate_current(), which associates the specified bio with %current. The bio records the associated ioc and blkcg at that point, and the block layer uses the recorded ones regardless of which task actually ends up issuing the bio. bio release puts the associated ioc and blkcg. It grabs and remembers the ioc and blkcg instead of the task itself, because the task may already be dead by the time the bio is issued, making them unreachable through the task, and they are all the block layer cares about.
elevator_set_req_fn() is updated such that the bio that elvdata is being allocated for is available to the elevator.
This doesn't update block cgroup policies yet; further patches will implement the support.
-v2: #ifdef CONFIG_BLK_CGROUP added around bio->bi_ioc dereference in rq_ioc() to fix build breakage.
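A minimal usage sketch (illustrative, not part of this patch): a layer that punts bio submission to a worker thread tags the bio in the submitting task's context before queueing it. The punt workqueue and work item below are assumptions made up for the example; only bio_associate_current() and its return codes come from this series.

	#include <linux/bio.h>
	#include <linux/workqueue.h>

	static struct workqueue_struct *punt_wq;	/* assumed to be created at init */

	/*
	 * Called in the context of the task that built @bio, before handing
	 * it to a worker.  After association, the block layer uses the
	 * recorded ioc/blkcg no matter which task calls submit_bio() later.
	 */
	static void punt_bio(struct bio *bio, struct work_struct *work)
	{
		/*
		 * -EBUSY: @bio is already associated; -ENOENT: %current has
		 * no io_context.  In both cases the bio simply keeps the old
		 * behavior of using the eventual issuer's context.
		 */
		bio_associate_current(bio);

		queue_work(punt_wq, work);
	}

No explicit teardown is needed: bio_disassociate_task() runs from bio_put(), so the ioc and blkcg references taken at association time are dropped when the bio is released.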
Signed-off-by: Tejun Heo Cc: Vivek Goyal Cc: Kent Overstreet Signed-off-by: Jens Axboe --- block/blk-core.c | 32 +++++++++++++++++++------ block/cfq-iosched.c | 3 ++- block/elevator.c | 5 ++-- fs/bio.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++ include/linux/bio.h | 8 +++++++ include/linux/blk_types.h | 10 ++++++++ include/linux/elevator.h | 6 +++-- 7 files changed, 113 insertions(+), 12 deletions(-) (limited to 'include') diff --git a/block/blk-core.c b/block/blk-core.c index b2d0fcd8f87f..991c1d6ef245 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -696,7 +696,7 @@ static inline void blk_free_request(struct request_queue *q, struct request *rq) } static struct request * -blk_alloc_request(struct request_queue *q, struct io_cq *icq, +blk_alloc_request(struct request_queue *q, struct bio *bio, struct io_cq *icq, unsigned int flags, gfp_t gfp_mask) { struct request *rq = mempool_alloc(q->rq.rq_pool, gfp_mask); @@ -710,7 +710,7 @@ blk_alloc_request(struct request_queue *q, struct io_cq *icq, if (flags & REQ_ELVPRIV) { rq->elv.icq = icq; - if (unlikely(elv_set_request(q, rq, gfp_mask))) { + if (unlikely(elv_set_request(q, rq, bio, gfp_mask))) { mempool_free(rq, q->rq.rq_pool); return NULL; } @@ -809,6 +809,22 @@ static bool blk_rq_should_init_elevator(struct bio *bio) return true; } +/** + * rq_ioc - determine io_context for request allocation + * @bio: request being allocated is for this bio (can be %NULL) + * + * Determine io_context to use for request allocation for @bio. May return + * %NULL if %current->io_context doesn't exist. + */ +static struct io_context *rq_ioc(struct bio *bio) +{ +#ifdef CONFIG_BLK_CGROUP + if (bio && bio->bi_ioc) + return bio->bi_ioc; +#endif + return current->io_context; +} + /** * get_request - get a free request * @q: request_queue to allocate request from @@ -836,7 +852,7 @@ static struct request *get_request(struct request_queue *q, int rw_flags, int may_queue; retry: et = q->elevator->type; - ioc = current->io_context; + ioc = rq_ioc(bio); if (unlikely(blk_queue_dead(q))) return NULL; @@ -919,14 +935,16 @@ retry: /* create icq if missing */ if ((rw_flags & REQ_ELVPRIV) && unlikely(et->icq_cache && !icq)) { - ioc = create_io_context(gfp_mask, q->node); - if (ioc) - icq = ioc_create_icq(ioc, q, gfp_mask); + create_io_context(gfp_mask, q->node); + ioc = rq_ioc(bio); + if (!ioc) + goto fail_alloc; + icq = ioc_create_icq(ioc, q, gfp_mask); if (!icq) goto fail_alloc; } - rq = blk_alloc_request(q, icq, rw_flags, gfp_mask); + rq = blk_alloc_request(q, bio, icq, rw_flags, gfp_mask); if (unlikely(!rq)) goto fail_alloc; diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 9a4eac490e0b..abac87337d70 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3299,7 +3299,8 @@ split_cfqq(struct cfq_io_cq *cic, struct cfq_queue *cfqq) * Allocate cfq data structures associated with this request. 
*/ static int -cfq_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask) +cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio, + gfp_t gfp_mask) { struct cfq_data *cfqd = q->elevator->elevator_data; struct cfq_io_cq *cic = icq_to_cic(rq->elv.icq); diff --git a/block/elevator.c b/block/elevator.c index 451654fadab0..be3ab6df0fea 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -663,12 +663,13 @@ struct request *elv_former_request(struct request_queue *q, struct request *rq) return NULL; } -int elv_set_request(struct request_queue *q, struct request *rq, gfp_t gfp_mask) +int elv_set_request(struct request_queue *q, struct request *rq, + struct bio *bio, gfp_t gfp_mask) { struct elevator_queue *e = q->elevator; if (e->type->ops.elevator_set_req_fn) - return e->type->ops.elevator_set_req_fn(q, rq, gfp_mask); + return e->type->ops.elevator_set_req_fn(q, rq, bio, gfp_mask); return 0; } diff --git a/fs/bio.c b/fs/bio.c index b980ecde026a..142214b80039 100644 --- a/fs/bio.c +++ b/fs/bio.c @@ -19,12 +19,14 @@ #include #include #include +#include #include #include #include #include #include #include +#include #include /* for struct sg_iovec */ #include @@ -418,6 +420,7 @@ void bio_put(struct bio *bio) * last put frees it */ if (atomic_dec_and_test(&bio->bi_cnt)) { + bio_disassociate_task(bio); bio->bi_next = NULL; bio->bi_destructor(bio); } @@ -1641,6 +1644,64 @@ bad: } EXPORT_SYMBOL(bioset_create); +#ifdef CONFIG_BLK_CGROUP +/** + * bio_associate_current - associate a bio with %current + * @bio: target bio + * + * Associate @bio with %current if it hasn't been associated yet. Block + * layer will treat @bio as if it were issued by %current no matter which + * task actually issues it. + * + * This function takes an extra reference of @task's io_context and blkcg + * which will be put when @bio is released. The caller must own @bio, + * ensure %current->io_context exists, and is responsible for synchronizing + * calls to this function. 
+ */ +int bio_associate_current(struct bio *bio) +{ + struct io_context *ioc; + struct cgroup_subsys_state *css; + + if (bio->bi_ioc) + return -EBUSY; + + ioc = current->io_context; + if (!ioc) + return -ENOENT; + + /* acquire active ref on @ioc and associate */ + get_io_context_active(ioc); + bio->bi_ioc = ioc; + + /* associate blkcg if exists */ + rcu_read_lock(); + css = task_subsys_state(current, blkio_subsys_id); + if (css && css_tryget(css)) + bio->bi_css = css; + rcu_read_unlock(); + + return 0; +} + +/** + * bio_disassociate_task - undo bio_associate_current() + * @bio: target bio + */ +void bio_disassociate_task(struct bio *bio) +{ + if (bio->bi_ioc) { + put_io_context(bio->bi_ioc); + bio->bi_ioc = NULL; + } + if (bio->bi_css) { + css_put(bio->bi_css); + bio->bi_css = NULL; + } +} + +#endif /* CONFIG_BLK_CGROUP */ + static void __init biovec_init_slabs(void) { int i; diff --git a/include/linux/bio.h b/include/linux/bio.h index 129a9c097958..692d3d5b49f5 100644 --- a/include/linux/bio.h +++ b/include/linux/bio.h @@ -268,6 +268,14 @@ extern struct bio_vec *bvec_alloc_bs(gfp_t, int, unsigned long *, struct bio_set extern void bvec_free_bs(struct bio_set *, struct bio_vec *, unsigned int); extern unsigned int bvec_nr_vecs(unsigned short idx); +#ifdef CONFIG_BLK_CGROUP +int bio_associate_current(struct bio *bio); +void bio_disassociate_task(struct bio *bio); +#else /* CONFIG_BLK_CGROUP */ +static inline int bio_associate_current(struct bio *bio) { return -ENOENT; } +static inline void bio_disassociate_task(struct bio *bio) { } +#endif /* CONFIG_BLK_CGROUP */ + /* * bio_set is used to allow other portions of the IO system to * allocate their own private memory pools for bio and iovec structures. diff --git a/include/linux/blk_types.h b/include/linux/blk_types.h index 4053cbd4490e..0edb65dd8edd 100644 --- a/include/linux/blk_types.h +++ b/include/linux/blk_types.h @@ -14,6 +14,8 @@ struct bio; struct bio_integrity_payload; struct page; struct block_device; +struct io_context; +struct cgroup_subsys_state; typedef void (bio_end_io_t) (struct bio *, int); typedef void (bio_destructor_t) (struct bio *); @@ -66,6 +68,14 @@ struct bio { bio_end_io_t *bi_end_io; void *bi_private; +#ifdef CONFIG_BLK_CGROUP + /* + * Optional ioc and css associated with this bio. Put on bio + * release. Read comment on top of bio_associate_current(). 
+ */ + struct io_context *bi_ioc; + struct cgroup_subsys_state *bi_css; +#endif #if defined(CONFIG_BLK_DEV_INTEGRITY) struct bio_integrity_payload *bi_integrity; /* data integrity */ #endif diff --git a/include/linux/elevator.h b/include/linux/elevator.h index 97fb2557a18c..c03af7687bb4 100644 --- a/include/linux/elevator.h +++ b/include/linux/elevator.h @@ -28,7 +28,8 @@ typedef int (elevator_may_queue_fn) (struct request_queue *, int); typedef void (elevator_init_icq_fn) (struct io_cq *); typedef void (elevator_exit_icq_fn) (struct io_cq *); -typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, gfp_t); +typedef int (elevator_set_req_fn) (struct request_queue *, struct request *, + struct bio *, gfp_t); typedef void (elevator_put_req_fn) (struct request *); typedef void (elevator_activate_req_fn) (struct request_queue *, struct request *); typedef void (elevator_deactivate_req_fn) (struct request_queue *, struct request *); @@ -129,7 +130,8 @@ extern void elv_unregister_queue(struct request_queue *q); extern int elv_may_queue(struct request_queue *, int); extern void elv_abort_queue(struct request_queue *); extern void elv_completed_request(struct request_queue *, struct request *); -extern int elv_set_request(struct request_queue *, struct request *, gfp_t); +extern int elv_set_request(struct request_queue *q, struct request *rq, + struct bio *bio, gfp_t gfp_mask); extern void elv_put_request(struct request_queue *, struct request *); extern void elv_drain_elevator(struct request_queue *); -- cgit v1.2.3 From 900771a483ef28915a48066d7895d8252315607a Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Mon, 12 Mar 2012 14:55:14 +0530 Subject: uprobes/core: Make macro names consistent Rename macros that refer to individual uprobe to start with UPROBE_ instead of UPROBES_. This is pure cleanup, no functional change intended. Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120312092514.5379.36595.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uprobes.h | 6 +++--- arch/x86/kernel/uprobes.c | 18 +++++++++--------- include/linux/uprobes.h | 4 ++-- kernel/events/uprobes.c | 18 +++++++++--------- 4 files changed, 23 insertions(+), 23 deletions(-) (limited to 'include') diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index f7ce310a429d..5c399e446512 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -26,10 +26,10 @@ typedef u8 uprobe_opcode_t; #define MAX_UINSN_BYTES 16 -#define UPROBES_XOL_SLOT_BYTES 128 /* to keep it cache aligned */ +#define UPROBE_XOL_SLOT_BYTES 128 /* to keep it cache aligned */ -#define UPROBES_BKPT_INSN 0xcc -#define UPROBES_BKPT_INSN_SIZE 1 +#define UPROBE_BKPT_INSN 0xcc +#define UPROBE_BKPT_INSN_SIZE 1 struct arch_uprobe { u16 fixups; diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 04dfcef2d028..6dfa89e6f24a 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -31,14 +31,14 @@ /* Post-execution fixups. 
*/ /* No fixup needed */ -#define UPROBES_FIX_NONE 0x0 +#define UPROBE_FIX_NONE 0x0 /* Adjust IP back to vicinity of actual insn */ -#define UPROBES_FIX_IP 0x1 +#define UPROBE_FIX_IP 0x1 /* Adjust the return address of a call insn */ -#define UPROBES_FIX_CALL 0x2 +#define UPROBE_FIX_CALL 0x2 -#define UPROBES_FIX_RIP_AX 0x8000 -#define UPROBES_FIX_RIP_CX 0x4000 +#define UPROBE_FIX_RIP_AX 0x8000 +#define UPROBE_FIX_RIP_CX 0x4000 /* Adaptations for mhiramat x86 decoder v14. */ #define OPCODE1(insn) ((insn)->opcode.bytes[0]) @@ -269,9 +269,9 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn) break; } if (fix_ip) - auprobe->fixups |= UPROBES_FIX_IP; + auprobe->fixups |= UPROBE_FIX_IP; if (fix_call) - auprobe->fixups |= UPROBES_FIX_CALL; + auprobe->fixups |= UPROBE_FIX_CALL; } #ifdef CONFIG_X86_64 @@ -341,12 +341,12 @@ static void handle_riprel_insn(struct mm_struct *mm, struct arch_uprobe *auprobe * is NOT the register operand, so we use %rcx (register * #1) for the scratch register. */ - auprobe->fixups = UPROBES_FIX_RIP_CX; + auprobe->fixups = UPROBE_FIX_RIP_CX; /* Change modrm from 00 000 101 to 00 000 001. */ *cursor = 0x1; } else { /* Use %rax (register #0) for the scratch register. */ - auprobe->fixups = UPROBES_FIX_RIP_AX; + auprobe->fixups = UPROBE_FIX_RIP_AX; /* Change modrm from 00 xxx 101 to 00 xxx 000 */ *cursor = (reg << 3); } diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index f85797e1ccd4..838fb312926a 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -35,10 +35,10 @@ struct vm_area_struct; /* flags that denote/change uprobes behaviour */ /* Have a copy of original instruction */ -#define UPROBES_COPY_INSN 0x1 +#define UPROBE_COPY_INSN 0x1 /* Dont run handlers when first register/ last unregister in progress*/ -#define UPROBES_RUN_HANDLER 0x2 +#define UPROBE_RUN_HANDLER 0x2 struct uprobe_consumer { int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs); diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 5ce32e3ae9e9..0d36bf3920ba 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -177,7 +177,7 @@ out: */ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) { - return *insn == UPROBES_BKPT_INSN; + return *insn == UPROBE_BKPT_INSN; } /* @@ -259,8 +259,8 @@ static int write_opcode(struct mm_struct *mm, struct arch_uprobe *auprobe, /* poke the new insn in, ASSUMES we don't cross page boundary */ vaddr &= ~PAGE_MASK; - BUG_ON(vaddr + UPROBES_BKPT_INSN_SIZE > PAGE_SIZE); - memcpy(vaddr_new + vaddr, &opcode, UPROBES_BKPT_INSN_SIZE); + BUG_ON(vaddr + UPROBE_BKPT_INSN_SIZE > PAGE_SIZE); + memcpy(vaddr_new + vaddr, &opcode, UPROBE_BKPT_INSN_SIZE); kunmap_atomic(vaddr_new); kunmap_atomic(vaddr_old); @@ -308,7 +308,7 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_ lock_page(page); vaddr_new = kmap_atomic(page); vaddr &= ~PAGE_MASK; - memcpy(opcode, vaddr_new + vaddr, UPROBES_BKPT_INSN_SIZE); + memcpy(opcode, vaddr_new + vaddr, UPROBE_BKPT_INSN_SIZE); kunmap_atomic(vaddr_new); unlock_page(page); @@ -352,7 +352,7 @@ int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned if (result) return result; - return write_opcode(mm, auprobe, vaddr, UPROBES_BKPT_INSN); + return write_opcode(mm, auprobe, vaddr, UPROBE_BKPT_INSN); } /** @@ -635,7 +635,7 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, addr = (unsigned long)vaddr; - if (!(uprobe->flags & UPROBES_COPY_INSN)) { + if (!(uprobe->flags & 
UPROBE_COPY_INSN)) { ret = copy_insn(uprobe, vma, addr); if (ret) return ret; @@ -647,7 +647,7 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, if (ret) return ret; - uprobe->flags |= UPROBES_COPY_INSN; + uprobe->flags |= UPROBE_COPY_INSN; } ret = set_bkpt(mm, &uprobe->arch, addr); @@ -857,7 +857,7 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * uprobe->consumers = NULL; __uprobe_unregister(uprobe); } else { - uprobe->flags |= UPROBES_RUN_HANDLER; + uprobe->flags |= UPROBE_RUN_HANDLER; } } @@ -889,7 +889,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume if (consumer_del(uprobe, consumer)) { if (!uprobe->consumers) { __uprobe_unregister(uprobe); - uprobe->flags &= ~UPROBES_RUN_HANDLER; + uprobe->flags &= ~UPROBE_RUN_HANDLER; } } -- cgit v1.2.3 From e3343e6a2819ff5d0dfc4bb5c9fb7f9a4d04da73 Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Mon, 12 Mar 2012 14:55:30 +0530 Subject: uprobes/core: Make order of function parameters consistent across functions If a function takes struct uprobe or struct arch_uprobe, then it is passed as the first parameter. This is pure cleanup, no functional change intended. Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120312092530.5379.18394.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uprobes.h | 2 +- arch/x86/kernel/uprobes.c | 15 +++---- include/linux/uprobes.h | 12 +++--- kernel/events/uprobes.c | 93 ++++++++++++++++++++++-------------------- 4 files changed, 63 insertions(+), 59 deletions(-) (limited to 'include') diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index 5c399e446512..384f1bebf884 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -39,5 +39,5 @@ struct arch_uprobe { #endif }; -extern int arch_uprobes_analyze_insn(struct mm_struct *mm, struct arch_uprobe *arch_uprobe); +extern int arch_uprobes_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm); #endif /* _ASM_UPROBES_H */ diff --git a/arch/x86/kernel/uprobes.c b/arch/x86/kernel/uprobes.c index 6dfa89e6f24a..851a11b0d38c 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -297,7 +297,8 @@ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn) * - There's never a SIB byte. * - The displacement is always 4 bytes. 
*/ -static void handle_riprel_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) +static void +handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn) { u8 *cursor; u8 reg; @@ -381,19 +382,19 @@ static int validate_insn_64bits(struct arch_uprobe *auprobe, struct insn *insn) return -ENOTSUPP; } -static int validate_insn_bits(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) +static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn) { if (mm->context.ia32_compat) return validate_insn_32bits(auprobe, insn); return validate_insn_64bits(auprobe, insn); } #else /* 32-bit: */ -static void handle_riprel_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) +static void handle_riprel_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn) { /* No RIP-relative addressing on 32-bit */ } -static int validate_insn_bits(struct mm_struct *mm, struct arch_uprobe *auprobe, struct insn *insn) +static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm, struct insn *insn) { return validate_insn_32bits(auprobe, insn); } @@ -405,17 +406,17 @@ static int validate_insn_bits(struct mm_struct *mm, struct arch_uprobe *auprobe, * @arch_uprobe: the probepoint information. * Return 0 on success or a -ve number on error. */ -int arch_uprobes_analyze_insn(struct mm_struct *mm, struct arch_uprobe *auprobe) +int arch_uprobes_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm) { int ret; struct insn insn; auprobe->fixups = 0; - ret = validate_insn_bits(mm, auprobe, &insn); + ret = validate_insn_bits(auprobe, mm, &insn); if (ret != 0) return ret; - handle_riprel_insn(mm, auprobe, &insn); + handle_riprel_insn(auprobe, mm, &insn); prepare_fixups(auprobe, &insn); return 0; diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 838fb312926a..58699182e9a7 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -52,20 +52,20 @@ struct uprobe_consumer { }; #ifdef CONFIG_UPROBES -extern int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr); -extern int __weak set_orig_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr, bool verify); +extern int __weak set_bkpt(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); +extern int __weak set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr, bool verify); extern bool __weak is_bkpt_insn(uprobe_opcode_t *insn); -extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); -extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer); +extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); +extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); #else /* CONFIG_UPROBES is not defined */ static inline int -uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) +uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { return -ENOSYS; } static inline void -uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) +uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { } static inline int uprobe_mmap(struct vm_area_struct *vma) diff --git 
a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 0d36bf3920ba..9c5ddff1c8da 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -192,8 +192,8 @@ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) /* * write_opcode - write the opcode at a given virtual address. + * @auprobe: arch breakpointing information. * @mm: the probed process address space. - * @arch_uprobe: the breakpointing information. * @vaddr: the virtual address to store the opcode. * @opcode: opcode to be written at @vaddr. * @@ -203,7 +203,7 @@ bool __weak is_bkpt_insn(uprobe_opcode_t *insn) * For mm @mm, write the opcode at @vaddr. * Return 0 (success) or a negative errno. */ -static int write_opcode(struct mm_struct *mm, struct arch_uprobe *auprobe, +static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_t opcode) { struct page *old_page, *new_page; @@ -334,14 +334,14 @@ static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) /** * set_bkpt - store breakpoint at a given address. + * @auprobe: arch specific probepoint information. * @mm: the probed process address space. - * @uprobe: the probepoint information. * @vaddr: the virtual address to insert the opcode. * * For mm @mm, store the breakpoint instruction at @vaddr. * Return 0 (success) or a negative errno. */ -int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr) +int __weak set_bkpt(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { int result; @@ -352,13 +352,13 @@ int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned if (result) return result; - return write_opcode(mm, auprobe, vaddr, UPROBE_BKPT_INSN); + return write_opcode(auprobe, mm, vaddr, UPROBE_BKPT_INSN); } /** * set_orig_insn - Restore the original instruction. * @mm: the probed process address space. - * @uprobe: the probepoint information. + * @auprobe: arch specific probepoint information. * @vaddr: the virtual address to insert the opcode. * @verify: if true, verify existance of breakpoint instruction. * @@ -366,7 +366,7 @@ int __weak set_bkpt(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned * Return 0 (success) or a negative errno. */ int __weak -set_orig_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long vaddr, bool verify) +set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr, bool verify) { if (verify) { int result; @@ -378,7 +378,7 @@ set_orig_insn(struct mm_struct *mm, struct arch_uprobe *auprobe, unsigned long v if (result != 1) return result; } - return write_opcode(mm, auprobe, vaddr, *(uprobe_opcode_t *)auprobe->insn); + return write_opcode(auprobe, mm, vaddr, *(uprobe_opcode_t *)auprobe->insn); } static int match_uprobe(struct uprobe *l, struct uprobe *r) @@ -525,30 +525,30 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset) /* Returns the previous consumer */ static struct uprobe_consumer * -consumer_add(struct uprobe *uprobe, struct uprobe_consumer *consumer) +consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc) { down_write(&uprobe->consumer_rwsem); - consumer->next = uprobe->consumers; - uprobe->consumers = consumer; + uc->next = uprobe->consumers; + uprobe->consumers = uc; up_write(&uprobe->consumer_rwsem); - return consumer->next; + return uc->next; } /* - * For uprobe @uprobe, delete the consumer @consumer. - * Return true if the @consumer is deleted successfully + * For uprobe @uprobe, delete the consumer @uc. 
+ * Return true if the @uc is deleted successfully * or return false. */ -static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *consumer) +static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *uc) { struct uprobe_consumer **con; bool ret = false; down_write(&uprobe->consumer_rwsem); for (con = &uprobe->consumers; *con; con = &(*con)->next) { - if (*con == consumer) { - *con = consumer->next; + if (*con == uc) { + *con = uc->next; ret = true; break; } @@ -558,8 +558,8 @@ static bool consumer_del(struct uprobe *uprobe, struct uprobe_consumer *consumer return ret; } -static int __copy_insn(struct address_space *mapping, - struct vm_area_struct *vma, char *insn, +static int +__copy_insn(struct address_space *mapping, struct vm_area_struct *vma, char *insn, unsigned long nbytes, unsigned long offset) { struct file *filp = vma->vm_file; @@ -590,7 +590,8 @@ static int __copy_insn(struct address_space *mapping, return 0; } -static int copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr) +static int +copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr) { struct address_space *mapping; unsigned long nbytes; @@ -617,8 +618,9 @@ static int copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned return __copy_insn(mapping, vma, uprobe->arch.insn, bytes, uprobe->offset); } -static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, - struct vm_area_struct *vma, loff_t vaddr) +static int +install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, + struct vm_area_struct *vma, loff_t vaddr) { unsigned long addr; int ret; @@ -643,20 +645,21 @@ static int install_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, if (is_bkpt_insn((uprobe_opcode_t *)uprobe->arch.insn)) return -EEXIST; - ret = arch_uprobes_analyze_insn(mm, &uprobe->arch); + ret = arch_uprobes_analyze_insn(&uprobe->arch, mm); if (ret) return ret; uprobe->flags |= UPROBE_COPY_INSN; } - ret = set_bkpt(mm, &uprobe->arch, addr); + ret = set_bkpt(&uprobe->arch, mm, addr); return ret; } -static void remove_breakpoint(struct mm_struct *mm, struct uprobe *uprobe, loff_t vaddr) +static void +remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr) { - set_orig_insn(mm, &uprobe->arch, (unsigned long)vaddr, true); + set_orig_insn(&uprobe->arch, mm, (unsigned long)vaddr, true); } static void delete_uprobe(struct uprobe *uprobe) @@ -671,9 +674,9 @@ static void delete_uprobe(struct uprobe *uprobe) atomic_dec(&uprobe_events); } -static struct vma_info *__find_next_vma_info(struct list_head *head, - loff_t offset, struct address_space *mapping, - struct vma_info *vi, bool is_register) +static struct vma_info * +__find_next_vma_info(struct address_space *mapping, struct list_head *head, + struct vma_info *vi, loff_t offset, bool is_register) { struct prio_tree_iter iter; struct vm_area_struct *vma; @@ -719,8 +722,8 @@ static struct vma_info *__find_next_vma_info(struct list_head *head, * yet been inserted. 
*/ static struct vma_info * -find_next_vma_info(struct list_head *head, loff_t offset, struct address_space *mapping, - bool is_register) +find_next_vma_info(struct address_space *mapping, struct list_head *head, + loff_t offset, bool is_register) { struct vma_info *vi, *retvi; @@ -729,7 +732,7 @@ find_next_vma_info(struct list_head *head, loff_t offset, struct address_space * return ERR_PTR(-ENOMEM); mutex_lock(&mapping->i_mmap_mutex); - retvi = __find_next_vma_info(head, offset, mapping, vi, is_register); + retvi = __find_next_vma_info(mapping, head, vi, offset, is_register); mutex_unlock(&mapping->i_mmap_mutex); if (!retvi) @@ -754,7 +757,7 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register) ret = 0; for (;;) { - vi = find_next_vma_info(&try_list, uprobe->offset, mapping, is_register); + vi = find_next_vma_info(mapping, &try_list, uprobe->offset, is_register); if (!vi) break; @@ -784,9 +787,9 @@ static int register_for_each_vma(struct uprobe *uprobe, bool is_register) } if (is_register) - ret = install_breakpoint(mm, uprobe, vma, vi->vaddr); + ret = install_breakpoint(uprobe, mm, vma, vi->vaddr); else - remove_breakpoint(mm, uprobe, vi->vaddr); + remove_breakpoint(uprobe, mm, vi->vaddr); up_read(&mm->mmap_sem); mmput(mm); @@ -823,25 +826,25 @@ static void __uprobe_unregister(struct uprobe *uprobe) * uprobe_register - register a probe * @inode: the file in which the probe has to be placed. * @offset: offset from the start of the file. - * @consumer: information on howto handle the probe.. + * @uc: information on howto handle the probe.. * * Apart from the access refcount, uprobe_register() takes a creation * refcount (thro alloc_uprobe) if and only if this @uprobe is getting * inserted into the rbtree (i.e first consumer for a @inode:@offset * tuple). Creation refcount stops uprobe_unregister from freeing the * @uprobe even before the register operation is complete. Creation - * refcount is released when the last @consumer for the @uprobe + * refcount is released when the last @uc for the @uprobe * unregisters. * * Return errno if it cannot successully install probes * else return 0 (success) */ -int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) +int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { struct uprobe *uprobe; int ret; - if (!inode || !consumer || consumer->next) + if (!inode || !uc || uc->next) return -EINVAL; if (offset > i_size_read(inode)) @@ -851,7 +854,7 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * mutex_lock(uprobes_hash(inode)); uprobe = alloc_uprobe(inode, offset); - if (uprobe && !consumer_add(uprobe, consumer)) { + if (uprobe && !consumer_add(uprobe, uc)) { ret = __uprobe_register(uprobe); if (ret) { uprobe->consumers = NULL; @@ -871,13 +874,13 @@ int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer * * uprobe_unregister - unregister a already registered probe. * @inode: the file in which the probe has to be removed. * @offset: offset from the start of the file. - * @consumer: identify which probe if multiple probes are colocated. + * @uc: identify which probe if multiple probes are colocated. 
*/ -void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *consumer) +void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { struct uprobe *uprobe; - if (!inode || !consumer) + if (!inode || !uc) return; uprobe = find_uprobe(inode, offset); @@ -886,7 +889,7 @@ void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consume mutex_lock(uprobes_hash(inode)); - if (consumer_del(uprobe, consumer)) { + if (consumer_del(uprobe, uc)) { if (!uprobe->consumers) { __uprobe_unregister(uprobe); uprobe->flags &= ~UPROBE_RUN_HANDLER; @@ -993,7 +996,7 @@ int uprobe_mmap(struct vm_area_struct *vma) if (!ret) { vaddr = vma_address(vma, uprobe->offset); if (vaddr >= vma->vm_start && vaddr < vma->vm_end) { - ret = install_breakpoint(vma->vm_mm, uprobe, vma, vaddr); + ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); /* Ignore double add: */ if (ret == -EEXIST) ret = 0; -- cgit v1.2.3
From 5cb4ac3a583d4ee18c8682ab857e093c4a0d0895 Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Mon, 12 Mar 2012 14:55:45 +0530 Subject: uprobes/core: Rename bkpt to swbp
bkpt doesn't seem to be a correct abbreviation for breakpoint. The choice was between bp and breakpoint. Since bp can refer to things other than a breakpoint, use swbp to refer to breakpoints. This is pure cleanup, no functional change intended.
Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120312092545.5379.91251.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- arch/x86/include/asm/uprobes.h | 4 ++-- include/linux/uprobes.h | 4 ++-- kernel/events/uprobes.c | 34 +++++++++++++++++----------------- 3 files changed, 21 insertions(+), 21 deletions(-) (limited to 'include')
diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index 384f1bebf884..0500391f57d0 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -28,8 +28,8 @@ typedef u8 uprobe_opcode_t; #define MAX_UINSN_BYTES 16 #define UPROBE_XOL_SLOT_BYTES 128 /* to keep it cache aligned */ -#define UPROBE_BKPT_INSN 0xcc -#define UPROBE_BKPT_INSN_SIZE 1 +#define UPROBE_SWBP_INSN 0xcc +#define UPROBE_SWBP_INSN_SIZE 1 struct arch_uprobe { u16 fixups;
diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 58699182e9a7..eac525f41b94 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -52,9 +52,9 @@ struct uprobe_consumer { }; #ifdef CONFIG_UPROBES -extern int __weak set_bkpt(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); +extern int __weak set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int __weak set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr, bool verify); -extern bool __weak is_bkpt_insn(uprobe_opcode_t *insn); +extern bool __weak is_swbp_insn(uprobe_opcode_t *insn); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma);
diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index 9c5ddff1c8da..e56e56aa7535 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -170,14 +170,14 @@ out: } /** - *
is_bkpt_insn - check if instruction is breakpoint instruction. + * is_swbp_insn - check if instruction is breakpoint instruction. * @insn: instruction to be checked. - * Default implementation of is_bkpt_insn + * Default implementation of is_swbp_insn * Returns true if @insn is a breakpoint instruction. */ -bool __weak is_bkpt_insn(uprobe_opcode_t *insn) +bool __weak is_swbp_insn(uprobe_opcode_t *insn) { - return *insn == UPROBE_BKPT_INSN; + return *insn == UPROBE_SWBP_INSN; } /* @@ -227,7 +227,7 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, * adding probes in write mapped pages since the breakpoints * might end up in the file copy. */ - if (!valid_vma(vma, is_bkpt_insn(&opcode))) + if (!valid_vma(vma, is_swbp_insn(&opcode))) goto put_out; uprobe = container_of(auprobe, struct uprobe, arch); @@ -259,8 +259,8 @@ static int write_opcode(struct arch_uprobe *auprobe, struct mm_struct *mm, /* poke the new insn in, ASSUMES we don't cross page boundary */ vaddr &= ~PAGE_MASK; - BUG_ON(vaddr + UPROBE_BKPT_INSN_SIZE > PAGE_SIZE); - memcpy(vaddr_new + vaddr, &opcode, UPROBE_BKPT_INSN_SIZE); + BUG_ON(vaddr + UPROBE_SWBP_INSN_SIZE > PAGE_SIZE); + memcpy(vaddr_new + vaddr, &opcode, UPROBE_SWBP_INSN_SIZE); kunmap_atomic(vaddr_new); kunmap_atomic(vaddr_old); @@ -308,7 +308,7 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_ lock_page(page); vaddr_new = kmap_atomic(page); vaddr &= ~PAGE_MASK; - memcpy(opcode, vaddr_new + vaddr, UPROBE_BKPT_INSN_SIZE); + memcpy(opcode, vaddr_new + vaddr, UPROBE_SWBP_INSN_SIZE); kunmap_atomic(vaddr_new); unlock_page(page); @@ -317,7 +317,7 @@ static int read_opcode(struct mm_struct *mm, unsigned long vaddr, uprobe_opcode_ return 0; } -static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) +static int is_swbp_at_addr(struct mm_struct *mm, unsigned long vaddr) { uprobe_opcode_t opcode; int result; @@ -326,14 +326,14 @@ static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) if (result) return result; - if (is_bkpt_insn(&opcode)) + if (is_swbp_insn(&opcode)) return 1; return 0; } /** - * set_bkpt - store breakpoint at a given address. + * set_swbp - store breakpoint at a given address. * @auprobe: arch specific probepoint information. * @mm: the probed process address space. * @vaddr: the virtual address to insert the opcode. @@ -341,18 +341,18 @@ static int is_bkpt_at_addr(struct mm_struct *mm, unsigned long vaddr) * For mm @mm, store the breakpoint instruction at @vaddr. * Return 0 (success) or a negative errno. 
*/ -int __weak set_bkpt(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) +int __weak set_swbp(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long vaddr) { int result; - result = is_bkpt_at_addr(mm, vaddr); + result = is_swbp_at_addr(mm, vaddr); if (result == 1) return -EEXIST; if (result) return result; - return write_opcode(auprobe, mm, vaddr, UPROBE_BKPT_INSN); + return write_opcode(auprobe, mm, vaddr, UPROBE_SWBP_INSN); } /** @@ -371,7 +371,7 @@ set_orig_insn(struct arch_uprobe *auprobe, struct mm_struct *mm, unsigned long v if (verify) { int result; - result = is_bkpt_at_addr(mm, vaddr); + result = is_swbp_at_addr(mm, vaddr); if (!result) return -EINVAL; @@ -642,7 +642,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, if (ret) return ret; - if (is_bkpt_insn((uprobe_opcode_t *)uprobe->arch.insn)) + if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn)) return -EEXIST; ret = arch_uprobes_analyze_insn(&uprobe->arch, mm); @@ -651,7 +651,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, uprobe->flags |= UPROBE_COPY_INSN; } - ret = set_bkpt(&uprobe->arch, mm, addr); + ret = set_swbp(&uprobe->arch, mm, addr); return ret; } -- cgit v1.2.3
From 0326f5a94ddea33fa331b2519f4172f4fb387baa Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Tue, 13 Mar 2012 23:30:11 +0530 Subject: uprobes/core: Handle breakpoint and singlestep exceptions
Uprobes uses exception notifiers to get to know if a thread hit a breakpoint or a singlestep exception. When a thread hits a uprobe or is singlestepping post a uprobe hit, the uprobe exception notifier sets its TIF_UPROBE bit, which is then checked on its return-to-userspace path (do_notify_resume() -> uprobe_notify_resume()), where the consumers' handlers are run (in task context) based on the defined filters.
Uprobe hits are thread specific, hence we need to maintain per-thread information about whether a task hit a uprobe, which uprobe was hit, and the slot where the original instruction was copied for xol so that it can be singlestepped with appropriate fixups.
In some cases, special care is needed for instructions that are executed out of line (xol). These are architecture specific artefacts, such as handling RIP-relative instructions on x86_64. Since the instruction at which the uprobe was inserted is executed out of line, architecture specific fixups are added so that the thread continues normal execution in the presence of a uprobe.
Signals are postponed until we execute the probed insn; the post_xol() path does a recalc_sigpending() before returning to user-mode, which ensures the signal can't be lost.
Uprobes relies on the DIE_DEBUG notification to learn when a singlestep is complete.
Adds x86 specific uprobe exception notifiers and the hooks needed to determine a uprobe hit and do the subsequent post processing. Adds the requisite x86 fixups for xol for uprobes. Specific cases needing fixups include relative jumps (x86_64), calls, etc.
Where possible, we check and skip singlestepping the breakpointed instructions. For now we skip single-byte as well as a few multibyte nop instructions, but this can be extended to other instructions too.
Credits to Oleg Nesterov for suggestions/patches related to the signal, breakpoint and singlestep handling code.
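As an illustrative aside (not part of this patch), the consumer-side flow that ends in these notifiers looks roughly like the sketch below. The handler runs in task context from uprobe_notify_resume() once TIF_UPROBE is noticed on the return-to-userspace path; the inode/offset arguments and the message printed are assumptions for the example, while uprobe_register()/uprobe_unregister() and the uprobe_consumer layout come from earlier patches in this series.

	#include <linux/fs.h>
	#include <linux/kernel.h>
	#include <linux/ptrace.h>
	#include <linux/uprobes.h>

	/* Runs in task context via uprobe_notify_resume(), not in the trap handler. */
	static int sample_handler(struct uprobe_consumer *self, struct pt_regs *regs)
	{
		pr_info("uprobe hit, ip=%lx\n", instruction_pointer(regs));
		return 0;
	}

	static struct uprobe_consumer sample_consumer = {
		.handler = sample_handler,
		/* .filter left NULL: handler_chain() runs the handler for every task */
	};

	/* @inode:@offset identifies the probed instruction in the mapped file */
	static int sample_attach(struct inode *inode, loff_t offset)
	{
		return uprobe_register(inode, offset, &sample_consumer);
	}

	static void sample_detach(struct inode *inode, loff_t offset)
	{
		uprobe_unregister(inode, offset, &sample_consumer);
	}

handler_chain() in kernel/events/uprobes.c is what walks the consumer list and applies the optional filter before invoking each handler.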
Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120313180011.29771.89027.sendpatchset@srdronam.in.ibm.com [ Performed various cleanliness edits ] Signed-off-by: Ingo Molnar --- arch/x86/include/asm/thread_info.h | 2 + arch/x86/include/asm/uprobes.h | 16 +- arch/x86/kernel/signal.c | 6 + arch/x86/kernel/uprobes.c | 265 +++++++++++++++++++++++++++++- include/linux/sched.h | 4 + include/linux/uprobes.h | 55 ++++++- kernel/events/uprobes.c | 323 ++++++++++++++++++++++++++++++++++++- kernel/fork.c | 4 + kernel/signal.c | 4 + 9 files changed, 664 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h index ad6df8ccd715..0710c11305d4 100644 --- a/arch/x86/include/asm/thread_info.h +++ b/arch/x86/include/asm/thread_info.h @@ -85,6 +85,7 @@ struct thread_info { #define TIF_SECCOMP 8 /* secure computing */ #define TIF_MCE_NOTIFY 10 /* notify userspace of an MCE */ #define TIF_USER_RETURN_NOTIFY 11 /* notify kernel of userspace return */ +#define TIF_UPROBE 12 /* breakpointed or singlestepping */ #define TIF_NOTSC 16 /* TSC is not accessible in userland */ #define TIF_IA32 17 /* IA32 compatibility process */ #define TIF_FORK 18 /* ret_from_fork */ @@ -109,6 +110,7 @@ struct thread_info { #define _TIF_SECCOMP (1 << TIF_SECCOMP) #define _TIF_MCE_NOTIFY (1 << TIF_MCE_NOTIFY) #define _TIF_USER_RETURN_NOTIFY (1 << TIF_USER_RETURN_NOTIFY) +#define _TIF_UPROBE (1 << TIF_UPROBE) #define _TIF_NOTSC (1 << TIF_NOTSC) #define _TIF_IA32 (1 << TIF_IA32) #define _TIF_FORK (1 << TIF_FORK) diff --git a/arch/x86/include/asm/uprobes.h b/arch/x86/include/asm/uprobes.h index 0500391f57d0..1e9bed14f7ae 100644 --- a/arch/x86/include/asm/uprobes.h +++ b/arch/x86/include/asm/uprobes.h @@ -23,6 +23,8 @@ * Jim Keniston */ +#include + typedef u8 uprobe_opcode_t; #define MAX_UINSN_BYTES 16 @@ -39,5 +41,17 @@ struct arch_uprobe { #endif }; -extern int arch_uprobes_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm); +struct arch_uprobe_task { + unsigned long saved_trap_nr; +#ifdef CONFIG_X86_64 + unsigned long saved_scratch_register; +#endif +}; + +extern int arch_uprobe_analyze_insn(struct arch_uprobe *aup, struct mm_struct *mm); +extern int arch_uprobe_pre_xol(struct arch_uprobe *aup, struct pt_regs *regs); +extern int arch_uprobe_post_xol(struct arch_uprobe *aup, struct pt_regs *regs); +extern bool arch_uprobe_xol_was_trapped(struct task_struct *tsk); +extern int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data); +extern void arch_uprobe_abort_xol(struct arch_uprobe *aup, struct pt_regs *regs); #endif /* _ASM_UPROBES_H */ diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c index 9c73acc1c860..b3cd6913ceea 100644 --- a/arch/x86/kernel/signal.c +++ b/arch/x86/kernel/signal.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include @@ -823,6 +824,11 @@ do_notify_resume(struct pt_regs *regs, void *unused, __u32 thread_info_flags) mce_notify_process(); #endif /* CONFIG_X86_64 && CONFIG_X86_MCE */ + if (thread_info_flags & _TIF_UPROBE) { + clear_thread_flag(TIF_UPROBE); + uprobe_notify_resume(regs); + } + /* deal with pending signal delivery */ if (thread_info_flags & _TIF_SIGPENDING) do_signal(regs); diff --git a/arch/x86/kernel/uprobes.c 
b/arch/x86/kernel/uprobes.c index 851a11b0d38c..dc4e910a7d96 100644 --- a/arch/x86/kernel/uprobes.c +++ b/arch/x86/kernel/uprobes.c @@ -24,22 +24,28 @@ #include #include #include +#include #include +#include #include /* Post-execution fixups. */ /* No fixup needed */ -#define UPROBE_FIX_NONE 0x0 +#define UPROBE_FIX_NONE 0x0 + /* Adjust IP back to vicinity of actual insn */ #define UPROBE_FIX_IP 0x1 + /* Adjust the return address of a call insn */ #define UPROBE_FIX_CALL 0x2 #define UPROBE_FIX_RIP_AX 0x8000 #define UPROBE_FIX_RIP_CX 0x4000 +#define UPROBE_TRAP_NR UINT_MAX + /* Adaptations for mhiramat x86 decoder v14. */ #define OPCODE1(insn) ((insn)->opcode.bytes[0]) #define OPCODE2(insn) ((insn)->opcode.bytes[1]) @@ -221,10 +227,9 @@ static int validate_insn_32bits(struct arch_uprobe *auprobe, struct insn *insn) } /* - * Figure out which fixups post_xol() will need to perform, and annotate - * arch_uprobe->fixups accordingly. To start with, - * arch_uprobe->fixups is either zero or it reflects rip-related - * fixups. + * Figure out which fixups arch_uprobe_post_xol() will need to perform, and + * annotate arch_uprobe->fixups accordingly. To start with, + * arch_uprobe->fixups is either zero or it reflects rip-related fixups. */ static void prepare_fixups(struct arch_uprobe *auprobe, struct insn *insn) { @@ -401,12 +406,12 @@ static int validate_insn_bits(struct arch_uprobe *auprobe, struct mm_struct *mm, #endif /* CONFIG_X86_64 */ /** - * arch_uprobes_analyze_insn - instruction analysis including validity and fixups. + * arch_uprobe_analyze_insn - instruction analysis including validity and fixups. * @mm: the probed address space. * @arch_uprobe: the probepoint information. * Return 0 on success or a -ve number on error. */ -int arch_uprobes_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm) +int arch_uprobe_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm) { int ret; struct insn insn; @@ -421,3 +426,249 @@ int arch_uprobes_analyze_insn(struct arch_uprobe *auprobe, struct mm_struct *mm) return 0; } + +#ifdef CONFIG_X86_64 +/* + * If we're emulating a rip-relative instruction, save the contents + * of the scratch register and store the target address in that register. + */ +static void +pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs, + struct arch_uprobe_task *autask) +{ + if (auprobe->fixups & UPROBE_FIX_RIP_AX) { + autask->saved_scratch_register = regs->ax; + regs->ax = current->utask->vaddr; + regs->ax += auprobe->rip_rela_target_address; + } else if (auprobe->fixups & UPROBE_FIX_RIP_CX) { + autask->saved_scratch_register = regs->cx; + regs->cx = current->utask->vaddr; + regs->cx += auprobe->rip_rela_target_address; + } +} +#else +static void +pre_xol_rip_insn(struct arch_uprobe *auprobe, struct pt_regs *regs, + struct arch_uprobe_task *autask) +{ + /* No RIP-relative addressing on 32-bit */ +} +#endif + +/* + * arch_uprobe_pre_xol - prepare to execute out of line. + * @auprobe: the probepoint information. + * @regs: reflects the saved user state of current task. 
+ */ +int arch_uprobe_pre_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) +{ + struct arch_uprobe_task *autask; + + autask = ¤t->utask->autask; + autask->saved_trap_nr = current->thread.trap_nr; + current->thread.trap_nr = UPROBE_TRAP_NR; + regs->ip = current->utask->xol_vaddr; + pre_xol_rip_insn(auprobe, regs, autask); + + return 0; +} + +/* + * This function is called by arch_uprobe_post_xol() to adjust the return + * address pushed by a call instruction executed out of line. + */ +static int adjust_ret_addr(unsigned long sp, long correction) +{ + int rasize, ncopied; + long ra = 0; + + if (is_ia32_task()) + rasize = 4; + else + rasize = 8; + + ncopied = copy_from_user(&ra, (void __user *)sp, rasize); + if (unlikely(ncopied)) + return -EFAULT; + + ra += correction; + ncopied = copy_to_user((void __user *)sp, &ra, rasize); + if (unlikely(ncopied)) + return -EFAULT; + + return 0; +} + +#ifdef CONFIG_X86_64 +static bool is_riprel_insn(struct arch_uprobe *auprobe) +{ + return ((auprobe->fixups & (UPROBE_FIX_RIP_AX | UPROBE_FIX_RIP_CX)) != 0); +} + +static void +handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction) +{ + if (is_riprel_insn(auprobe)) { + struct arch_uprobe_task *autask; + + autask = ¤t->utask->autask; + if (auprobe->fixups & UPROBE_FIX_RIP_AX) + regs->ax = autask->saved_scratch_register; + else + regs->cx = autask->saved_scratch_register; + + /* + * The original instruction includes a displacement, and so + * is 4 bytes longer than what we've just single-stepped. + * Fall through to handle stuff like "jmpq *...(%rip)" and + * "callq *...(%rip)". + */ + if (correction) + *correction += 4; + } +} +#else +static void +handle_riprel_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs, long *correction) +{ + /* No RIP-relative addressing on 32-bit */ +} +#endif + +/* + * If xol insn itself traps and generates a signal(Say, + * SIGILL/SIGSEGV/etc), then detect the case where a singlestepped + * instruction jumps back to its own address. It is assumed that anything + * like do_page_fault/do_trap/etc sets thread.trap_nr != -1. + * + * arch_uprobe_pre_xol/arch_uprobe_post_xol save/restore thread.trap_nr, + * arch_uprobe_xol_was_trapped() simply checks that ->trap_nr is not equal to + * UPROBE_TRAP_NR == -1 set by arch_uprobe_pre_xol(). + */ +bool arch_uprobe_xol_was_trapped(struct task_struct *t) +{ + if (t->thread.trap_nr != UPROBE_TRAP_NR) + return true; + + return false; +} + +/* + * Called after single-stepping. To avoid the SMP problems that can + * occur when we temporarily put back the original opcode to + * single-step, we single-stepped a copy of the instruction. + * + * This function prepares to resume execution after the single-step. + * We have to fix things up as follows: + * + * Typically, the new ip is relative to the copied instruction. We need + * to make it relative to the original instruction (FIX_IP). Exceptions + * are return instructions and absolute or indirect jump or call instructions. + * + * If the single-stepped instruction was a call, the return address that + * is atop the stack is the address following the copied instruction. We + * need to make it the address following the original instruction (FIX_CALL). + * + * If the original instruction was a rip-relative instruction such as + * "movl %edx,0xnnnn(%rip)", we have instead executed an equivalent + * instruction using a scratch register -- e.g., "movl %edx,(%rax)". 
+ * We need to restore the contents of the scratch register and adjust + * the ip, keeping in mind that the instruction we executed is 4 bytes + * shorter than the original instruction (since we squeezed out the offset + * field). (FIX_RIP_AX or FIX_RIP_CX) + */ +int arch_uprobe_post_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) +{ + struct uprobe_task *utask; + long correction; + int result = 0; + + WARN_ON_ONCE(current->thread.trap_nr != UPROBE_TRAP_NR); + + utask = current->utask; + current->thread.trap_nr = utask->autask.saved_trap_nr; + correction = (long)(utask->vaddr - utask->xol_vaddr); + handle_riprel_post_xol(auprobe, regs, &correction); + if (auprobe->fixups & UPROBE_FIX_IP) + regs->ip += correction; + + if (auprobe->fixups & UPROBE_FIX_CALL) + result = adjust_ret_addr(regs->sp, correction); + + return result; +} + +/* callback routine for handling exceptions. */ +int arch_uprobe_exception_notify(struct notifier_block *self, unsigned long val, void *data) +{ + struct die_args *args = data; + struct pt_regs *regs = args->regs; + int ret = NOTIFY_DONE; + + /* We are only interested in userspace traps */ + if (regs && !user_mode_vm(regs)) + return NOTIFY_DONE; + + switch (val) { + case DIE_INT3: + if (uprobe_pre_sstep_notifier(regs)) + ret = NOTIFY_STOP; + + break; + + case DIE_DEBUG: + if (uprobe_post_sstep_notifier(regs)) + ret = NOTIFY_STOP; + + default: + break; + } + + return ret; +} + +/* + * This function gets called when XOL instruction either gets trapped or + * the thread has a fatal signal, so reset the instruction pointer to its + * probed address. + */ +void arch_uprobe_abort_xol(struct arch_uprobe *auprobe, struct pt_regs *regs) +{ + struct uprobe_task *utask = current->utask; + + current->thread.trap_nr = utask->autask.saved_trap_nr; + handle_riprel_post_xol(auprobe, regs, NULL); + instruction_pointer_set(regs, utask->vaddr); +} + +/* + * Skip these instructions as per the currently known x86 ISA. + * 0x66* { 0x90 | 0x0f 0x1f | 0x0f 0x19 | 0x87 0xc0 } + */ +bool arch_uprobe_skip_sstep(struct arch_uprobe *auprobe, struct pt_regs *regs) +{ + int i; + + for (i = 0; i < MAX_UINSN_BYTES; i++) { + if ((auprobe->insn[i] == 0x66)) + continue; + + if (auprobe->insn[i] == 0x90) + return true; + + if (i == (MAX_UINSN_BYTES - 1)) + break; + + if ((auprobe->insn[i] == 0x0f) && (auprobe->insn[i+1] == 0x1f)) + return true; + + if ((auprobe->insn[i] == 0x0f) && (auprobe->insn[i+1] == 0x19)) + return true; + + if ((auprobe->insn[i] == 0x87) && (auprobe->insn[i+1] == 0xc0)) + return true; + + break; + } + return false; +} diff --git a/include/linux/sched.h b/include/linux/sched.h index 7d379a6bfd88..8379e3771690 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1590,6 +1590,10 @@ struct task_struct { #ifdef CONFIG_HAVE_HW_BREAKPOINT atomic_t ptrace_bp_refcnt; #endif +#ifdef CONFIG_UPROBES + struct uprobe_task *utask; + int uprobe_srcu_id; +#endif }; /* Future-safe accessor for struct task_struct's cpus_allowed. 
*/ diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index eac525f41b94..5ec778fdce6f 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -28,8 +28,9 @@ #include struct vm_area_struct; + #ifdef CONFIG_ARCH_SUPPORTS_UPROBES -#include +# include #endif /* flags that denote/change uprobes behaviour */ @@ -39,6 +40,8 @@ struct vm_area_struct; /* Dont run handlers when first register/ last unregister in progress*/ #define UPROBE_RUN_HANDLER 0x2 +/* Can skip singlestep */ +#define UPROBE_SKIP_SSTEP 0x4 struct uprobe_consumer { int (*handler)(struct uprobe_consumer *self, struct pt_regs *regs); @@ -52,13 +55,42 @@ struct uprobe_consumer { }; #ifdef CONFIG_UPROBES +enum uprobe_task_state { + UTASK_RUNNING, + UTASK_BP_HIT, + UTASK_SSTEP, + UTASK_SSTEP_ACK, + UTASK_SSTEP_TRAPPED, +}; + +/* + * uprobe_task: Metadata of a task while it singlesteps. + */ +struct uprobe_task { + enum uprobe_task_state state; + struct arch_uprobe_task autask; + + struct uprobe *active_uprobe; + + unsigned long xol_vaddr; + unsigned long vaddr; +}; + extern int __weak set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int __weak set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr, bool verify); extern bool __weak is_swbp_insn(uprobe_opcode_t *insn); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); -#else /* CONFIG_UPROBES is not defined */ +extern void uprobe_free_utask(struct task_struct *t); +extern void uprobe_copy_process(struct task_struct *t); +extern unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs); +extern int uprobe_post_sstep_notifier(struct pt_regs *regs); +extern int uprobe_pre_sstep_notifier(struct pt_regs *regs); +extern void uprobe_notify_resume(struct pt_regs *regs); +extern bool uprobe_deny_signal(void); +extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs); +#else /* !CONFIG_UPROBES */ static inline int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { @@ -72,5 +104,22 @@ static inline int uprobe_mmap(struct vm_area_struct *vma) { return 0; } -#endif /* CONFIG_UPROBES */ +static inline void uprobe_notify_resume(struct pt_regs *regs) +{ +} +static inline bool uprobe_deny_signal(void) +{ + return false; +} +static inline unsigned long uprobe_get_swbp_addr(struct pt_regs *regs) +{ + return 0; +} +static inline void uprobe_free_utask(struct task_struct *t) +{ +} +static inline void uprobe_copy_process(struct task_struct *t) +{ +} +#endif /* !CONFIG_UPROBES */ #endif /* _LINUX_UPROBES_H */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index e56e56aa7535..b807d1566b64 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -30,9 +30,12 @@ #include /* anon_vma_prepare */ #include /* set_pte_at_notify */ #include /* try_to_free_swap */ +#include /* user_enable_single_step */ +#include /* notifier mechanism */ #include +static struct srcu_struct uprobes_srcu; static struct rb_root uprobes_tree = RB_ROOT; static DEFINE_SPINLOCK(uprobes_treelock); /* serialize rbtree access */ @@ -486,6 +489,9 @@ static struct uprobe *insert_uprobe(struct uprobe *uprobe) u = __insert_uprobe(uprobe); spin_unlock_irqrestore(&uprobes_treelock, flags); + /* For now assume that the instruction need not be single-stepped */ + uprobe->flags |= 
UPROBE_SKIP_SSTEP; + return u; } @@ -523,6 +529,21 @@ static struct uprobe *alloc_uprobe(struct inode *inode, loff_t offset) return uprobe; } +static void handler_chain(struct uprobe *uprobe, struct pt_regs *regs) +{ + struct uprobe_consumer *uc; + + if (!(uprobe->flags & UPROBE_RUN_HANDLER)) + return; + + down_read(&uprobe->consumer_rwsem); + for (uc = uprobe->consumers; uc; uc = uc->next) { + if (!uc->filter || uc->filter(uc, current)) + uc->handler(uc, regs); + } + up_read(&uprobe->consumer_rwsem); +} + /* Returns the previous consumer */ static struct uprobe_consumer * consumer_add(struct uprobe *uprobe, struct uprobe_consumer *uc) @@ -645,7 +666,7 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, if (is_swbp_insn((uprobe_opcode_t *)uprobe->arch.insn)) return -EEXIST; - ret = arch_uprobes_analyze_insn(&uprobe->arch, mm); + ret = arch_uprobe_analyze_insn(&uprobe->arch, mm); if (ret) return ret; @@ -662,10 +683,21 @@ remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr) set_orig_insn(&uprobe->arch, mm, (unsigned long)vaddr, true); } +/* + * There could be threads that have hit the breakpoint and are entering the + * notifier code and trying to acquire the uprobes_treelock. The thread + * calling delete_uprobe() that is removing the uprobe from the rb_tree can + * race with these threads and might acquire the uprobes_treelock compared + * to some of the breakpoint hit threads. In such a case, the breakpoint + * hit threads will not find the uprobe. The current unregistering thread + * waits till all other threads have hit a breakpoint, to acquire the + * uprobes_treelock before the uprobe is removed from the rbtree. + */ static void delete_uprobe(struct uprobe *uprobe) { unsigned long flags; + synchronize_srcu(&uprobes_srcu); spin_lock_irqsave(&uprobes_treelock, flags); rb_erase(&uprobe->rb_node, &uprobes_tree); spin_unlock_irqrestore(&uprobes_treelock, flags); @@ -1010,6 +1042,288 @@ int uprobe_mmap(struct vm_area_struct *vma) return ret; } +/** + * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs + * @regs: Reflects the saved state of the task after it has hit a breakpoint + * instruction. + * Return the address of the breakpoint instruction. + */ +unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs) +{ + return instruction_pointer(regs) - UPROBE_SWBP_INSN_SIZE; +} + +/* + * Called with no locks held. + * Called in context of a exiting or a exec-ing thread. + */ +void uprobe_free_utask(struct task_struct *t) +{ + struct uprobe_task *utask = t->utask; + + if (t->uprobe_srcu_id != -1) + srcu_read_unlock_raw(&uprobes_srcu, t->uprobe_srcu_id); + + if (!utask) + return; + + if (utask->active_uprobe) + put_uprobe(utask->active_uprobe); + + kfree(utask); + t->utask = NULL; +} + +/* + * Called in context of a new clone/fork from copy_process. + */ +void uprobe_copy_process(struct task_struct *t) +{ + t->utask = NULL; + t->uprobe_srcu_id = -1; +} + +/* + * Allocate a uprobe_task object for the task. + * Called when the thread hits a breakpoint for the first time. + * + * Returns: + * - pointer to new uprobe_task on success + * - NULL otherwise + */ +static struct uprobe_task *add_utask(void) +{ + struct uprobe_task *utask; + + utask = kzalloc(sizeof *utask, GFP_KERNEL); + if (unlikely(!utask)) + return NULL; + + utask->active_uprobe = NULL; + current->utask = utask; + return utask; +} + +/* Prepare to single-step probed instruction out of line. 
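+ * (The instruction will be copied into an "execution out of line" slot and
+ * single-stepped from there; slot allocation arrives in a later patch of
+ * this series, so for now this helper is a stub that always fails, and the
+ * breakpoint path falls back to re-executing the original instruction.)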
*/ +static int +pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long vaddr) +{ + return -EFAULT; +} + +/* + * If we are singlestepping, then ensure this thread is not connected to + * non-fatal signals until completion of singlestep. When xol insn itself + * triggers the signal, restart the original insn even if the task is + * already SIGKILL'ed (since coredump should report the correct ip). This + * is even more important if the task has a handler for SIGSEGV/etc, The + * _same_ instruction should be repeated again after return from the signal + * handler, and SSTEP can never finish in this case. + */ +bool uprobe_deny_signal(void) +{ + struct task_struct *t = current; + struct uprobe_task *utask = t->utask; + + if (likely(!utask || !utask->active_uprobe)) + return false; + + WARN_ON_ONCE(utask->state != UTASK_SSTEP); + + if (signal_pending(t)) { + spin_lock_irq(&t->sighand->siglock); + clear_tsk_thread_flag(t, TIF_SIGPENDING); + spin_unlock_irq(&t->sighand->siglock); + + if (__fatal_signal_pending(t) || arch_uprobe_xol_was_trapped(t)) { + utask->state = UTASK_SSTEP_TRAPPED; + set_tsk_thread_flag(t, TIF_UPROBE); + set_tsk_thread_flag(t, TIF_NOTIFY_RESUME); + } + } + + return true; +} + +/* + * Avoid singlestepping the original instruction if the original instruction + * is a NOP or can be emulated. + */ +static bool can_skip_sstep(struct uprobe *uprobe, struct pt_regs *regs) +{ + if (arch_uprobe_skip_sstep(&uprobe->arch, regs)) + return true; + + uprobe->flags &= ~UPROBE_SKIP_SSTEP; + return false; +} + +/* + * Run handler and ask thread to singlestep. + * Ensure all non-fatal signals cannot interrupt thread while it singlesteps. + */ +static void handle_swbp(struct pt_regs *regs) +{ + struct vm_area_struct *vma; + struct uprobe_task *utask; + struct uprobe *uprobe; + struct mm_struct *mm; + unsigned long bp_vaddr; + + uprobe = NULL; + bp_vaddr = uprobe_get_swbp_addr(regs); + mm = current->mm; + down_read(&mm->mmap_sem); + vma = find_vma(mm, bp_vaddr); + + if (vma && vma->vm_start <= bp_vaddr && valid_vma(vma, false)) { + struct inode *inode; + loff_t offset; + + inode = vma->vm_file->f_mapping->host; + offset = bp_vaddr - vma->vm_start; + offset += (vma->vm_pgoff << PAGE_SHIFT); + uprobe = find_uprobe(inode, offset); + } + + srcu_read_unlock_raw(&uprobes_srcu, current->uprobe_srcu_id); + current->uprobe_srcu_id = -1; + up_read(&mm->mmap_sem); + + if (!uprobe) { + /* No matching uprobe; signal SIGTRAP. */ + send_sig(SIGTRAP, current, 0); + return; + } + + utask = current->utask; + if (!utask) { + utask = add_utask(); + /* Cannot allocate; re-execute the instruction. */ + if (!utask) + goto cleanup_ret; + } + utask->active_uprobe = uprobe; + handler_chain(uprobe, regs); + if (uprobe->flags & UPROBE_SKIP_SSTEP && can_skip_sstep(uprobe, regs)) + goto cleanup_ret; + + utask->state = UTASK_SSTEP; + if (!pre_ssout(uprobe, regs, bp_vaddr)) { + user_enable_single_step(current); + return; + } + +cleanup_ret: + if (utask) { + utask->active_uprobe = NULL; + utask->state = UTASK_RUNNING; + } + if (uprobe) { + if (!(uprobe->flags & UPROBE_SKIP_SSTEP)) + + /* + * cannot singlestep; cannot skip instruction; + * re-execute the instruction. + */ + instruction_pointer_set(regs, bp_vaddr); + + put_uprobe(uprobe); + } +} + +/* + * Perform required fix-ups and disable singlestep. + * Allow pending signals to take effect. 
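+ * Called from uprobe_notify_resume() once the singlestep has been
+ * acknowledged (UTASK_SSTEP_ACK) or the stepped instruction trapped
+ * (UTASK_SSTEP_TRAPPED).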
+ */
+static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs)
+{
+	struct uprobe *uprobe;
+
+	uprobe = utask->active_uprobe;
+	if (utask->state == UTASK_SSTEP_ACK)
+		arch_uprobe_post_xol(&uprobe->arch, regs);
+	else if (utask->state == UTASK_SSTEP_TRAPPED)
+		arch_uprobe_abort_xol(&uprobe->arch, regs);
+	else
+		WARN_ON_ONCE(1);
+
+	put_uprobe(uprobe);
+	utask->active_uprobe = NULL;
+	utask->state = UTASK_RUNNING;
+	user_disable_single_step(current);
+
+	spin_lock_irq(&current->sighand->siglock);
+	recalc_sigpending(); /* see uprobe_deny_signal() */
+	spin_unlock_irq(&current->sighand->siglock);
+}
+
+/*
+ * On a breakpoint hit, the breakpoint notifier sets the TIF_UPROBE flag
+ * (and, on subsequent probe hits on the thread, sets the state to
+ * UTASK_BP_HIT), then allows the thread to return from interrupt.
+ *
+ * On a singlestep exception, the singlestep notifier sets the TIF_UPROBE
+ * flag, also sets the state to UTASK_SSTEP_ACK and allows the thread to
+ * return from interrupt.
+ *
+ * While returning to userspace, the thread notices the TIF_UPROBE flag and
+ * calls uprobe_notify_resume().
+ */
+void uprobe_notify_resume(struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+
+	utask = current->utask;
+	if (!utask || utask->state == UTASK_BP_HIT)
+		handle_swbp(regs);
+	else
+		handle_singlestep(utask, regs);
+}
+
+/*
+ * uprobe_pre_sstep_notifier gets called from interrupt context as part of
+ * the notifier mechanism. Set TIF_UPROBE flag and indicate breakpoint hit.
+ */
+int uprobe_pre_sstep_notifier(struct pt_regs *regs)
+{
+	struct uprobe_task *utask;
+
+	if (!current->mm)
+		return 0;
+
+	utask = current->utask;
+	if (utask)
+		utask->state = UTASK_BP_HIT;
+
+	set_thread_flag(TIF_UPROBE);
+	current->uprobe_srcu_id = srcu_read_lock_raw(&uprobes_srcu);
+
+	return 1;
+}
+
+/*
+ * uprobe_post_sstep_notifier gets called in interrupt context as part of the
+ * notifier mechanism. Set TIF_UPROBE flag and indicate completion of singlestep.
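+ * Returns 1 and sets UTASK_SSTEP_ACK when the trap belongs to an active
+ * uprobe singlestep, 0 otherwise.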
+ */ +int uprobe_post_sstep_notifier(struct pt_regs *regs) +{ + struct uprobe_task *utask = current->utask; + + if (!current->mm || !utask || !utask->active_uprobe) + /* task is currently not uprobed */ + return 0; + + utask->state = UTASK_SSTEP_ACK; + set_thread_flag(TIF_UPROBE); + return 1; +} + +static struct notifier_block uprobe_exception_nb = { + .notifier_call = arch_uprobe_exception_notify, + .priority = INT_MAX-1, /* notified after kprobes, kgdb */ +}; + static int __init init_uprobes(void) { int i; @@ -1018,12 +1332,13 @@ static int __init init_uprobes(void) mutex_init(&uprobes_mutex[i]); mutex_init(&uprobes_mmap_mutex[i]); } - return 0; + init_srcu_struct(&uprobes_srcu); + + return register_die_notifier(&uprobe_exception_nb); } +module_init(init_uprobes); static void __exit exit_uprobes(void) { } - -module_init(init_uprobes); module_exit(exit_uprobes); diff --git a/kernel/fork.c b/kernel/fork.c index e2cd3e2a5ae8..eb7b63334009 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -67,6 +67,7 @@ #include #include #include +#include #include #include @@ -701,6 +702,8 @@ void mm_release(struct task_struct *tsk, struct mm_struct *mm) exit_pi_state_list(tsk); #endif + uprobe_free_utask(tsk); + /* Get rid of any cached register state */ deactivate_mm(tsk, mm); @@ -1295,6 +1298,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, INIT_LIST_HEAD(&p->pi_state_list); p->pi_state_cache = NULL; #endif + uprobe_copy_process(p); /* * sigaltstack should be cleared when sharing the same VM */ diff --git a/kernel/signal.c b/kernel/signal.c index 8511e39813c7..e93ff0a719a0 100644 --- a/kernel/signal.c +++ b/kernel/signal.c @@ -29,6 +29,7 @@ #include #include #include +#include #define CREATE_TRACE_POINTS #include @@ -2192,6 +2193,9 @@ int get_signal_to_deliver(siginfo_t *info, struct k_sigaction *return_ka, struct signal_struct *signal = current->signal; int signr; + if (unlikely(uprobe_deny_signal())) + return 0; + relock: /* * We'll jump back here after any time we were stopped in TASK_STOPPED. -- cgit v1.2.3 From 598971bfbdfdc8701337dc1636c7919c44699914 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Mar 2012 15:10:58 -0700 Subject: cfq: don't use icq_get_changed() cfq caches the associated cfqq's for a given cic. The cache needs to be flushed if the cic's ioprio or blkcg has changed. It is currently done by requiring the changing action to set the respective ICQ_*_CHANGED bit in the icq and testing it from cfq_set_request(), which involves iterating through all the affected icqs. All cfq wants to know is whether ioprio and/or blkcg have changed since the last flush and can be easily achieved by just remembering the current ioprio and blkcg ID in cic. This patch adds cic->{ioprio|blkcg_id}, updates all ioprio users to use the remembered value instead, and updates cfq_set_request() path such that, instead of using icq_get_changed(), the current values are compared against the remembered ones and trigger appropriate flush action if not. Condition tests are moved inside both _changed functions which are now named check_ioprio_changed() and check_blkcg_changed(). ioprio.h::task_ioprio*() can't be used anymore and replaced with open-coded IOPRIO_CLASS_NONE case in cfq_async_queue_prio(). 
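For reference, the encoding this relies on (quoting include/linux/ioprio.h
as it exists at this point, not part of the diff below): an ioprio value
packs the class and the data into a single integer, so remembering one int
in the cic catches a change to either field with a single compare.

	#define IOPRIO_CLASS_SHIFT	(13)
	#define IOPRIO_PRIO_MASK	((1UL << IOPRIO_CLASS_SHIFT) - 1)

	#define IOPRIO_PRIO_CLASS(mask)	((mask) >> IOPRIO_CLASS_SHIFT)
	#define IOPRIO_PRIO_DATA(mask)	((mask) & IOPRIO_PRIO_MASK)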
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/cfq-iosched.c | 63 ++++++++++++++++++++++++++++++++------------------ include/linux/ioprio.h | 22 ++++-------------- 2 files changed, 45 insertions(+), 40 deletions(-) (limited to 'include') diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 9e8624e9e246..7c3893d4447a 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -218,6 +218,10 @@ struct cfq_io_cq { struct io_cq icq; /* must be the first member */ struct cfq_queue *cfqq[2]; struct cfq_ttime ttime; + int ioprio; /* the current ioprio */ +#ifdef CONFIG_CFQ_GROUP_IOSCHED + uint64_t blkcg_id; /* the current blkcg ID */ +#endif }; /* @@ -2568,7 +2572,7 @@ static void cfq_init_prio_data(struct cfq_queue *cfqq, struct cfq_io_cq *cic) if (!cfq_cfqq_prio_changed(cfqq)) return; - ioprio_class = IOPRIO_PRIO_CLASS(cic->icq.ioc->ioprio); + ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio); switch (ioprio_class) { default: printk(KERN_ERR "cfq: bad prio %x\n", ioprio_class); @@ -2580,11 +2584,11 @@ static void cfq_init_prio_data(struct cfq_queue *cfqq, struct cfq_io_cq *cic) cfqq->ioprio_class = task_nice_ioclass(tsk); break; case IOPRIO_CLASS_RT: - cfqq->ioprio = task_ioprio(cic->icq.ioc); + cfqq->ioprio = IOPRIO_PRIO_DATA(cic->ioprio); cfqq->ioprio_class = IOPRIO_CLASS_RT; break; case IOPRIO_CLASS_BE: - cfqq->ioprio = task_ioprio(cic->icq.ioc); + cfqq->ioprio = IOPRIO_PRIO_DATA(cic->ioprio); cfqq->ioprio_class = IOPRIO_CLASS_BE; break; case IOPRIO_CLASS_IDLE: @@ -2602,12 +2606,17 @@ static void cfq_init_prio_data(struct cfq_queue *cfqq, struct cfq_io_cq *cic) cfq_clear_cfqq_prio_changed(cfqq); } -static void changed_ioprio(struct cfq_io_cq *cic, struct bio *bio) +static void check_ioprio_changed(struct cfq_io_cq *cic, struct bio *bio) { + int ioprio = cic->icq.ioc->ioprio; struct cfq_data *cfqd = cic_to_cfqd(cic); struct cfq_queue *cfqq; - if (unlikely(!cfqd)) + /* + * Check whether ioprio has changed. The condition may trigger + * spuriously on a newly created cic but there's no harm. + */ + if (unlikely(!cfqd) || likely(cic->ioprio == ioprio)) return; cfqq = cic->cfqq[BLK_RW_ASYNC]; @@ -2624,6 +2633,8 @@ static void changed_ioprio(struct cfq_io_cq *cic, struct bio *bio) cfqq = cic->cfqq[BLK_RW_SYNC]; if (cfqq) cfq_mark_cfqq_prio_changed(cfqq); + + cic->ioprio = ioprio; } static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq, @@ -2647,17 +2658,24 @@ static void cfq_init_cfqq(struct cfq_data *cfqd, struct cfq_queue *cfqq, } #ifdef CONFIG_CFQ_GROUP_IOSCHED -static void changed_cgroup(struct cfq_io_cq *cic) +static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) { - struct cfq_queue *sync_cfqq = cic_to_cfqq(cic, 1); struct cfq_data *cfqd = cic_to_cfqd(cic); - struct request_queue *q; + struct cfq_queue *sync_cfqq; + uint64_t id; - if (unlikely(!cfqd)) - return; + rcu_read_lock(); + id = bio_blkio_cgroup(bio)->id; + rcu_read_unlock(); - q = cfqd->queue; + /* + * Check whether blkcg has changed. The condition may trigger + * spuriously on a newly created cic but there's no harm. + */ + if (unlikely(!cfqd) || likely(cic->blkcg_id == id)) + return; + sync_cfqq = cic_to_cfqq(cic, 1); if (sync_cfqq) { /* * Drop reference to sync queue. 
A new sync queue will be @@ -2667,7 +2685,11 @@ static void changed_cgroup(struct cfq_io_cq *cic) cic_set_cfqq(cic, NULL, 1); cfq_put_queue(sync_cfqq); } + + cic->blkcg_id = id; } +#else +static inline void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) { } #endif /* CONFIG_CFQ_GROUP_IOSCHED */ static struct cfq_queue * @@ -2731,6 +2753,9 @@ cfq_async_queue_prio(struct cfq_data *cfqd, int ioprio_class, int ioprio) switch (ioprio_class) { case IOPRIO_CLASS_RT: return &cfqd->async_cfqq[0][ioprio]; + case IOPRIO_CLASS_NONE: + ioprio = IOPRIO_NORM; + /* fall through */ case IOPRIO_CLASS_BE: return &cfqd->async_cfqq[1][ioprio]; case IOPRIO_CLASS_IDLE: @@ -2744,8 +2769,8 @@ static struct cfq_queue * cfq_get_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic, struct bio *bio, gfp_t gfp_mask) { - const int ioprio = task_ioprio(cic->icq.ioc); - const int ioprio_class = task_ioprio_class(cic->icq.ioc); + const int ioprio_class = IOPRIO_PRIO_CLASS(cic->ioprio); + const int ioprio = IOPRIO_PRIO_DATA(cic->ioprio); struct cfq_queue **async_cfqq = NULL; struct cfq_queue *cfqq = NULL; @@ -3303,21 +3328,13 @@ cfq_set_request(struct request_queue *q, struct request *rq, struct bio *bio, const int rw = rq_data_dir(rq); const bool is_sync = rq_is_sync(rq); struct cfq_queue *cfqq; - unsigned int changed; might_sleep_if(gfp_mask & __GFP_WAIT); spin_lock_irq(q->queue_lock); - /* handle changed notifications */ - changed = icq_get_changed(&cic->icq); - if (unlikely(changed & ICQ_IOPRIO_CHANGED)) - changed_ioprio(cic, bio); -#ifdef CONFIG_CFQ_GROUP_IOSCHED - if (unlikely(changed & ICQ_CGROUP_CHANGED)) - changed_cgroup(cic); -#endif - + check_ioprio_changed(cic, bio); + check_blkcg_changed(cic, bio); new_queue: cfqq = cic_to_cfqq(cic, is_sync); if (!cfqq || cfqq == &cfqd->oom_cfqq) { diff --git a/include/linux/ioprio.h b/include/linux/ioprio.h index 76dad4808847..beb9ce1c2c23 100644 --- a/include/linux/ioprio.h +++ b/include/linux/ioprio.h @@ -42,26 +42,14 @@ enum { }; /* - * if process has set io priority explicitly, use that. if not, convert - * the cpu scheduler nice value to an io priority + * Fallback BE priority */ #define IOPRIO_NORM (4) -static inline int task_ioprio(struct io_context *ioc) -{ - if (ioprio_valid(ioc->ioprio)) - return IOPRIO_PRIO_DATA(ioc->ioprio); - - return IOPRIO_NORM; -} - -static inline int task_ioprio_class(struct io_context *ioc) -{ - if (ioprio_valid(ioc->ioprio)) - return IOPRIO_PRIO_CLASS(ioc->ioprio); - - return IOPRIO_CLASS_BE; -} +/* + * if process has set io priority explicitly, use that. if not, convert + * the cpu scheduler nice value to an io priority + */ static inline int task_nice_ioprio(struct task_struct *task) { return (task_nice(task) + 20) / 5; -- cgit v1.2.3 From 2b566fa55b9a94b53217c2818e6c5e5756eeb1a1 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 19 Mar 2012 15:10:59 -0700 Subject: block: remove ioc_*_changed() After the previous patch to cfq, there's no ioc_get_changed() user left. This patch yanks out ioc_{ioprio|cgroup|get}_changed() and all related stuff. 
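Sketch of the before/after for a writer such as set_task_ioprio() (the
flush_cfqq_cache() name is made up for illustration; the real reader-side
logic is check_ioprio_changed() from the previous patch):

	/* before: flag every icq, readers fetch-and-clear the bits */
	ioc_ioprio_changed(ioc, ioprio);	/* sets ICQ_IOPRIO_CHANGED */
	changed = icq_get_changed(&cic->icq);	/* reader, per request */

	/* after: plain store, reader compares remembered vs. current */
	ioc->ioprio = ioprio;
	if (cic->ioprio != cic->icq.ioc->ioprio)
		flush_cfqq_cache(cic);		/* hypothetical helper */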
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 19 ------------- block/blk-ioc.c | 68 ----------------------------------------------- fs/ioprio.c | 2 +- include/linux/iocontext.h | 7 ----- 4 files changed, 1 insertion(+), 95 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 30e07308db24..a74019b67311 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -47,8 +47,6 @@ static struct cgroup_subsys_state *blkiocg_create(struct cgroup_subsys *, struct cgroup *); static int blkiocg_can_attach(struct cgroup_subsys *, struct cgroup *, struct cgroup_taskset *); -static void blkiocg_attach(struct cgroup_subsys *, struct cgroup *, - struct cgroup_taskset *); static int blkiocg_pre_destroy(struct cgroup_subsys *, struct cgroup *); static void blkiocg_destroy(struct cgroup_subsys *, struct cgroup *); static int blkiocg_populate(struct cgroup_subsys *, struct cgroup *); @@ -63,7 +61,6 @@ struct cgroup_subsys blkio_subsys = { .name = "blkio", .create = blkiocg_create, .can_attach = blkiocg_can_attach, - .attach = blkiocg_attach, .pre_destroy = blkiocg_pre_destroy, .destroy = blkiocg_destroy, .populate = blkiocg_populate, @@ -1729,22 +1726,6 @@ static int blkiocg_can_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, return ret; } -static void blkiocg_attach(struct cgroup_subsys *ss, struct cgroup *cgrp, - struct cgroup_taskset *tset) -{ - struct task_struct *task; - struct io_context *ioc; - - cgroup_taskset_for_each(task, cgrp, tset) { - /* we don't lose anything even if ioc allocation fails */ - ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE); - if (ioc) { - ioc_cgroup_changed(ioc); - put_io_context(ioc); - } - } -} - static void blkcg_bypass_start(void) __acquires(&all_q_mutex) { diff --git a/block/blk-ioc.c b/block/blk-ioc.c index 439ec21fd787..3f3dd51a1280 100644 --- a/block/blk-ioc.c +++ b/block/blk-ioc.c @@ -388,74 +388,6 @@ struct io_cq *ioc_create_icq(struct io_context *ioc, struct request_queue *q, return icq; } -void ioc_set_icq_flags(struct io_context *ioc, unsigned int flags) -{ - struct io_cq *icq; - struct hlist_node *n; - - hlist_for_each_entry(icq, n, &ioc->icq_list, ioc_node) - icq->flags |= flags; -} - -/** - * ioc_ioprio_changed - notify ioprio change - * @ioc: io_context of interest - * @ioprio: new ioprio - * - * @ioc's ioprio has changed to @ioprio. Set %ICQ_IOPRIO_CHANGED for all - * icq's. iosched is responsible for checking the bit and applying it on - * request issue path. - */ -void ioc_ioprio_changed(struct io_context *ioc, int ioprio) -{ - unsigned long flags; - - spin_lock_irqsave(&ioc->lock, flags); - ioc->ioprio = ioprio; - ioc_set_icq_flags(ioc, ICQ_IOPRIO_CHANGED); - spin_unlock_irqrestore(&ioc->lock, flags); -} - -/** - * ioc_cgroup_changed - notify cgroup change - * @ioc: io_context of interest - * - * @ioc's cgroup has changed. Set %ICQ_CGROUP_CHANGED for all icq's. - * iosched is responsible for checking the bit and applying it on request - * issue path. - */ -void ioc_cgroup_changed(struct io_context *ioc) -{ - unsigned long flags; - - spin_lock_irqsave(&ioc->lock, flags); - ioc_set_icq_flags(ioc, ICQ_CGROUP_CHANGED); - spin_unlock_irqrestore(&ioc->lock, flags); -} -EXPORT_SYMBOL(ioc_cgroup_changed); - -/** - * icq_get_changed - fetch and clear icq changed mask - * @icq: icq of interest - * - * Fetch and clear ICQ_*_CHANGED bits from @icq. Grabs and releases - * @icq->ioc->lock. 
- */ -unsigned icq_get_changed(struct io_cq *icq) -{ - unsigned int changed = 0; - unsigned long flags; - - if (unlikely(icq->flags & ICQ_CHANGED_MASK)) { - spin_lock_irqsave(&icq->ioc->lock, flags); - changed = icq->flags & ICQ_CHANGED_MASK; - icq->flags &= ~ICQ_CHANGED_MASK; - spin_unlock_irqrestore(&icq->ioc->lock, flags); - } - return changed; -} -EXPORT_SYMBOL(icq_get_changed); - static int __init blk_ioc_init(void) { iocontext_cachep = kmem_cache_create("blkdev_ioc", diff --git a/fs/ioprio.c b/fs/ioprio.c index 0f1b9515213b..48644373de58 100644 --- a/fs/ioprio.c +++ b/fs/ioprio.c @@ -50,7 +50,7 @@ int set_task_ioprio(struct task_struct *task, int ioprio) ioc = get_task_io_context(task, GFP_ATOMIC, NUMA_NO_NODE); if (ioc) { - ioc_ioprio_changed(ioc, ioprio); + ioc->ioprio = ioprio; put_io_context(ioc); } diff --git a/include/linux/iocontext.h b/include/linux/iocontext.h index 6f1a2608e91f..df38db2ef45b 100644 --- a/include/linux/iocontext.h +++ b/include/linux/iocontext.h @@ -6,11 +6,7 @@ #include enum { - ICQ_IOPRIO_CHANGED = 1 << 0, - ICQ_CGROUP_CHANGED = 1 << 1, ICQ_EXITED = 1 << 2, - - ICQ_CHANGED_MASK = ICQ_IOPRIO_CHANGED | ICQ_CGROUP_CHANGED, }; /* @@ -152,9 +148,6 @@ void put_io_context_active(struct io_context *ioc); void exit_io_context(struct task_struct *task); struct io_context *get_task_io_context(struct task_struct *task, gfp_t gfp_flags, int node); -void ioc_ioprio_changed(struct io_context *ioc, int ioprio); -void ioc_cgroup_changed(struct io_context *ioc); -unsigned int icq_get_changed(struct io_cq *icq); #else struct io_context; static inline void put_io_context(struct io_context *ioc) { } -- cgit v1.2.3 From 777ee96f50d8c3ac4ff3dde9ad69c22779ac88cb Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Wed, 15 Feb 2012 23:50:25 +0100 Subject: drm/i915: add HAS_ALIASING_PPGTT parameter for userspace On Sanybridge a few MI read/write commands only work when ppgtt is enabled. Userspace therefore needs to be able to check whether ppgtt is enabled. For added hilarity, you need to reset the "use global GTT" bit on snb when ppgtt is enabled, otherwise it won't work. Despite what bspec says about automatically using ppgtt ... Luckily PIPE_CONTROL (the only write cmd current userspace uses) is not affected by all this, as tested by tests/gem_pipe_control_store_loop. Reviewed-and-tested-by: Chris Wilson Signed-Off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_dma.c | 3 +++ include/drm/i915_drm.h | 3 ++- 2 files changed, 5 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c index 3c086d707a91..fdff0097cf2b 100644 --- a/drivers/gpu/drm/i915/i915_dma.c +++ b/drivers/gpu/drm/i915/i915_dma.c @@ -790,6 +790,9 @@ static int i915_getparam(struct drm_device *dev, void *data, case I915_PARAM_HAS_LLC: value = HAS_LLC(dev); break; + case I915_PARAM_HAS_ALIASING_PPGTT: + value = dev_priv->mm.aliasing_ppgtt ? 
1 : 0;
+		break;
 	default:
 		DRM_DEBUG_DRIVER("Unknown parameter %d\n", param->param);
diff --git a/include/drm/i915_drm.h b/include/drm/i915_drm.h
index da929bb5b788..f3f82242bf1d 100644
--- a/include/drm/i915_drm.h
+++ b/include/drm/i915_drm.h
@@ -296,7 +296,8 @@ typedef struct drm_i915_irq_wait {
 #define I915_PARAM_HAS_EXEC_CONSTANTS 14
 #define I915_PARAM_HAS_RELAXED_DELTA 15
 #define I915_PARAM_HAS_GEN7_SOL_RESET 16
-#define I915_PARAM_HAS_LLC 17
+#define I915_PARAM_HAS_LLC 17
+#define I915_PARAM_HAS_ALIASING_PPGTT 18
 typedef struct drm_i915_getparam {
 	int param;
-- cgit v1.2.3

From 6d5cd9cb1e32e4f4e4468704430b26bcb0bfb129 Mon Sep 17 00:00:00 2001
From: Daniel Vetter
Date: Sun, 25 Mar 2012 19:47:30 +0200
Subject: drm: add helper to clflush a virtual address range

Useful when the page is already mapped to copy data in/out.

For -stable because the next patch (fixing phys obj pwrite) needs this
little helper function.

Tested-by: Chris Wilson
Reviewed-by: Chris Wilson
Cc: dri-devel@lists.freedesktop.org
Signed-off-by: Daniel Vetter
---
 drivers/gpu/drm/drm_cache.c | 23 +++++++++++++++++++++++
 include/drm/drmP.h | 1 +
 2 files changed, 24 insertions(+)
(limited to 'include')

diff --git a/drivers/gpu/drm/drm_cache.c b/drivers/gpu/drm/drm_cache.c
index 592865381c6e..c7c8f6b5786f 100644
--- a/drivers/gpu/drm/drm_cache.c
+++ b/drivers/gpu/drm/drm_cache.c
@@ -98,3 +98,26 @@ drm_clflush_pages(struct page *pages[], unsigned long num_pages)
 #endif
 }
 EXPORT_SYMBOL(drm_clflush_pages);
+
+void
+drm_clflush_virt_range(char *addr, unsigned long length)
+{
+#if defined(CONFIG_X86)
+	if (cpu_has_clflush) {
+		char *end = addr + length;
+		mb();
+		for (; addr < end; addr += boot_cpu_data.x86_clflush_size)
+			clflush(addr);
+		clflush(end - 1);
+		mb();
+		return;
+	}
+
+	if (on_each_cpu(drm_clflush_ipi_handler, NULL, 1) != 0)
+		printk(KERN_ERR "Timed out waiting for cache flush.\n");
+#else
+	printk(KERN_ERR "Architecture has no drm_cache.c support\n");
+	WARN_ON_ONCE(1);
+#endif
+}
+EXPORT_SYMBOL(drm_clflush_virt_range);
diff --git a/include/drm/drmP.h b/include/drm/drmP.h
index 92f0981b5fb8..d33597bcc77c 100644
--- a/include/drm/drmP.h
+++ b/include/drm/drmP.h
@@ -1332,6 +1332,7 @@ extern int drm_remove_magic(struct drm_master *master, drm_magic_t magic);
 /* Cache management (drm_cache.c) */
 void drm_clflush_pages(struct page *pages[], unsigned long num_pages);
+void drm_clflush_virt_range(char *addr, unsigned long length);
 /* Locking IOCTL support (drm_lock.h) */
 extern int drm_lock(struct drm_device *dev, void *data,
-- cgit v1.2.3

From f56f821feb7b36223f309e0ec05986bb137ce418 Mon Sep 17 00:00:00 2001
From: Daniel Vetter
Date: Sun, 25 Mar 2012 19:47:41 +0200
Subject: mm: extend prefault helpers to fault in more than PAGE_SIZE

drm/i915 wants to read/write more than one page in its fastpath and
hence needs to prefault more than PAGE_SIZE bytes.

Add new functions in filemap.h to make that possible. Also kill a
copy&pasted spurious space in both functions while at it.

v2: As suggested by Andrew Morton, add a multipage parameter to both
functions to avoid the additional branch for the pagemap.c hotpath. My
gcc 4.6 here seems to dtrt and indeed reap these branches where not
needed.

v3: Because I couldn't find a way around adding a uaddr += PAGE_SIZE to
the filemap.c hotpaths (that the compiler couldn't remove again), let's
go with separate new functions for the multipage use-case.

v4: Adjust comment to CodingStyle and fix spelling.
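Typical call pattern for the new helpers (a sketch mirroring the i915
conversion in the diff below, not additional patch content):

	/* Prefault the whole user buffer up front; afterwards we may take
	 * locks under which faulting is not allowed, and the later copy is
	 * expected to succeed without touching unmapped pages. */
	if (fault_in_multipages_readable((char __user *)ptr, size))
		return -EFAULT;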
Acked-by: Andrew Morton Signed-off-by: Daniel Vetter --- drivers/gpu/drm/i915/i915_gem.c | 6 +-- drivers/gpu/drm/i915/i915_gem_execbuffer.c | 2 +- include/linux/pagemap.h | 64 +++++++++++++++++++++++++++++- 3 files changed, 66 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index e9cac478cced..6dc832902f53 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -416,7 +416,7 @@ i915_gem_shmem_pread(struct drm_device *dev, mutex_unlock(&dev->struct_mutex); if (!prefaulted) { - ret = fault_in_pages_writeable(user_data, remain); + ret = fault_in_multipages_writeable(user_data, remain); /* Userspace is tricking us, but we've already clobbered * its pages with the prefault and promised to write the * data up to the first fault. Hence ignore any errors @@ -809,8 +809,8 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data, args->size)) return -EFAULT; - ret = fault_in_pages_readable((char __user *)(uintptr_t)args->data_ptr, - args->size); + ret = fault_in_multipages_readable((char __user *)(uintptr_t)args->data_ptr, + args->size); if (ret) return -EFAULT; diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c index eb85860001ec..8e0b686d3afb 100644 --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c @@ -997,7 +997,7 @@ validate_exec_list(struct drm_i915_gem_exec_object2 *exec, if (!access_ok(VERIFY_WRITE, ptr, length)) return -EFAULT; - if (fault_in_pages_readable(ptr, length)) + if (fault_in_multipages_readable(ptr, length)) return -EFAULT; } diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index cfaaa6949b8b..c93a9a9bcd35 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -426,7 +426,7 @@ static inline int fault_in_pages_writeable(char __user *uaddr, int size) */ if (((unsigned long)uaddr & PAGE_MASK) != ((unsigned long)end & PAGE_MASK)) - ret = __put_user(0, end); + ret = __put_user(0, end); } return ret; } @@ -445,13 +445,73 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) if (((unsigned long)uaddr & PAGE_MASK) != ((unsigned long)end & PAGE_MASK)) { - ret = __get_user(c, end); + ret = __get_user(c, end); (void)c; } } return ret; } +/* + * Multipage variants of the above prefault helpers, useful if more than + * PAGE_SIZE of data needs to be prefaulted. These are separate from the above + * functions (which only handle up to PAGE_SIZE) to avoid clobbering the + * filemap.c hotpaths. + */ +static inline int fault_in_multipages_writeable(char __user *uaddr, int size) +{ + int ret; + const char __user *end = uaddr + size - 1; + + if (unlikely(size == 0)) + return 0; + + /* + * Writing zeroes into userspace here is OK, because we know that if + * the zero gets there, we'll be overwriting it. + */ + while (uaddr <= end) { + ret = __put_user(0, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } + + /* Check whether the range spilled into the next page. 
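+	 * (The loop above advances uaddr in PAGE_SIZE steps, so at most the
+	 * page holding the final byte can have been skipped; touch it here.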
*/ + if (((unsigned long)uaddr & PAGE_MASK) == + ((unsigned long)end & PAGE_MASK)) + ret = __put_user(0, end); + + return ret; +} + +static inline int fault_in_multipages_readable(const char __user *uaddr, + int size) +{ + volatile char c; + int ret; + const char __user *end = uaddr + size - 1; + + if (unlikely(size == 0)) + return 0; + + while (uaddr <= end) { + ret = __get_user(c, uaddr); + if (ret != 0) + return ret; + uaddr += PAGE_SIZE; + } + + /* Check whether the range spilled into the next page. */ + if (((unsigned long)uaddr & PAGE_MASK) == + ((unsigned long)end & PAGE_MASK)) { + ret = __get_user(c, end); + (void)c; + } + + return ret; +} + int add_to_page_cache_locked(struct page *page, struct address_space *mapping, pgoff_t index, gfp_t gfp_mask); int add_to_page_cache_lru(struct page *page, struct address_space *mapping, -- cgit v1.2.3 From d4b3b6384f98f8692ad0209891ccdbc7e78bbefe Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Fri, 30 Mar 2012 23:56:31 +0530 Subject: uprobes/core: Allocate XOL slots for uprobes use Uprobes executes the original instruction at a probed location out of line. For this, we allocate a page (per mm) upon the first uprobe hit, in the process user address space, divide it into slots that are used to store the actual instructions to be singlestepped. These slots are known as xol (execution out of line) slots. Care is taken to ensure that the allocation is in an unmapped area as close to the top of the user address space as possible, with appropriate permission settings to keep selinux like frameworks happy. Upon a uprobe hit, a free slot is acquired, and is released after the singlestep completes. Lots of improvements courtesy suggestions/inputs from Peter and Oleg. [ Folded a fix for build issue on powerpc fixed and reported by Stephen Rothwell. ] Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Anton Arapov Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120330182631.10018.48175.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- include/linux/mm_types.h | 2 + include/linux/uprobes.h | 34 ++++++++ kernel/events/uprobes.c | 215 +++++++++++++++++++++++++++++++++++++++++++++++ kernel/fork.c | 2 + 4 files changed, 253 insertions(+) (limited to 'include') diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 3cc3062b3767..26574c726121 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -388,6 +389,7 @@ struct mm_struct { #ifdef CONFIG_CPUMASK_OFFSTACK struct cpumask cpumask_allocation; #endif + struct uprobes_state uprobes_state; }; static inline void mm_init_cpumask(struct mm_struct *mm) diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index 5ec778fdce6f..a111460c07d5 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -28,6 +28,8 @@ #include struct vm_area_struct; +struct mm_struct; +struct inode; #ifdef CONFIG_ARCH_SUPPORTS_UPROBES # include @@ -76,6 +78,28 @@ struct uprobe_task { unsigned long vaddr; }; +/* + * On a breakpoint hit, thread contests for a slot. It frees the + * slot after singlestep. Currently a fixed number of slots are + * allocated. 
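+ * (A single page per mm is used, divided into UINSNS_PER_PAGE ==
+ * PAGE_SIZE / UPROBE_XOL_SLOT_BYTES slots; see the defines added to
+ * kernel/events/uprobes.c below.)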
+ */ +struct xol_area { + wait_queue_head_t wq; /* if all slots are busy */ + atomic_t slot_count; /* number of in-use slots */ + unsigned long *bitmap; /* 0 = free slot */ + struct page *page; + + /* + * We keep the vma's vm_start rather than a pointer to the vma + * itself. The probed process or a naughty kernel module could make + * the vma go away, and we must handle that reasonably gracefully. + */ + unsigned long vaddr; /* Page(s) of instruction slots */ +}; + +struct uprobes_state { + struct xol_area *xol_area; +}; extern int __weak set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int __weak set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr, bool verify); extern bool __weak is_swbp_insn(uprobe_opcode_t *insn); @@ -90,7 +114,11 @@ extern int uprobe_pre_sstep_notifier(struct pt_regs *regs); extern void uprobe_notify_resume(struct pt_regs *regs); extern bool uprobe_deny_signal(void); extern bool __weak arch_uprobe_skip_sstep(struct arch_uprobe *aup, struct pt_regs *regs); +extern void uprobe_clear_state(struct mm_struct *mm); +extern void uprobe_reset_state(struct mm_struct *mm); #else /* !CONFIG_UPROBES */ +struct uprobes_state { +}; static inline int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc) { @@ -121,5 +149,11 @@ static inline void uprobe_free_utask(struct task_struct *t) static inline void uprobe_copy_process(struct task_struct *t) { } +static inline void uprobe_clear_state(struct mm_struct *mm) +{ +} +static inline void uprobe_reset_state(struct mm_struct *mm) +{ +} #endif /* !CONFIG_UPROBES */ #endif /* _LINUX_UPROBES_H */ diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index b807d1566b64..b395edb97f53 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -35,6 +35,9 @@ #include +#define UINSNS_PER_PAGE (PAGE_SIZE/UPROBE_XOL_SLOT_BYTES) +#define MAX_UPROBE_XOL_SLOTS UINSNS_PER_PAGE + static struct srcu_struct uprobes_srcu; static struct rb_root uprobes_tree = RB_ROOT; @@ -1042,6 +1045,213 @@ int uprobe_mmap(struct vm_area_struct *vma) return ret; } +/* Slot allocation for XOL */ +static int xol_add_vma(struct xol_area *area) +{ + struct mm_struct *mm; + int ret; + + area->page = alloc_page(GFP_HIGHUSER); + if (!area->page) + return -ENOMEM; + + ret = -EALREADY; + mm = current->mm; + + down_write(&mm->mmap_sem); + if (mm->uprobes_state.xol_area) + goto fail; + + ret = -ENOMEM; + + /* Try to map as high as possible, this is only a hint. */ + area->vaddr = get_unmapped_area(NULL, TASK_SIZE - PAGE_SIZE, PAGE_SIZE, 0, 0); + if (area->vaddr & ~PAGE_MASK) { + ret = area->vaddr; + goto fail; + } + + ret = install_special_mapping(mm, area->vaddr, PAGE_SIZE, + VM_EXEC|VM_MAYEXEC|VM_DONTCOPY|VM_IO, &area->page); + if (ret) + goto fail; + + smp_wmb(); /* pairs with get_xol_area() */ + mm->uprobes_state.xol_area = area; + ret = 0; + +fail: + up_write(&mm->mmap_sem); + if (ret) + __free_page(area->page); + + return ret; +} + +static struct xol_area *get_xol_area(struct mm_struct *mm) +{ + struct xol_area *area; + + area = mm->uprobes_state.xol_area; + smp_read_barrier_depends(); /* pairs with wmb in xol_add_vma() */ + + return area; +} + +/* + * xol_alloc_area - Allocate process's xol_area. + * This area will be used for storing instructions for execution out of + * line. + * + * Returns the allocated area or NULL. 
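+ * (If another thread installed the area first, xol_add_vma() fails with
+ * -EALREADY and the existing area is returned via get_xol_area().)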
+ */ +static struct xol_area *xol_alloc_area(void) +{ + struct xol_area *area; + + area = kzalloc(sizeof(*area), GFP_KERNEL); + if (unlikely(!area)) + return NULL; + + area->bitmap = kzalloc(BITS_TO_LONGS(UINSNS_PER_PAGE) * sizeof(long), GFP_KERNEL); + + if (!area->bitmap) + goto fail; + + init_waitqueue_head(&area->wq); + if (!xol_add_vma(area)) + return area; + +fail: + kfree(area->bitmap); + kfree(area); + + return get_xol_area(current->mm); +} + +/* + * uprobe_clear_state - Free the area allocated for slots. + */ +void uprobe_clear_state(struct mm_struct *mm) +{ + struct xol_area *area = mm->uprobes_state.xol_area; + + if (!area) + return; + + put_page(area->page); + kfree(area->bitmap); + kfree(area); +} + +/* + * uprobe_reset_state - Free the area allocated for slots. + */ +void uprobe_reset_state(struct mm_struct *mm) +{ + mm->uprobes_state.xol_area = NULL; +} + +/* + * - search for a free slot. + */ +static unsigned long xol_take_insn_slot(struct xol_area *area) +{ + unsigned long slot_addr; + int slot_nr; + + do { + slot_nr = find_first_zero_bit(area->bitmap, UINSNS_PER_PAGE); + if (slot_nr < UINSNS_PER_PAGE) { + if (!test_and_set_bit(slot_nr, area->bitmap)) + break; + + slot_nr = UINSNS_PER_PAGE; + continue; + } + wait_event(area->wq, (atomic_read(&area->slot_count) < UINSNS_PER_PAGE)); + } while (slot_nr >= UINSNS_PER_PAGE); + + slot_addr = area->vaddr + (slot_nr * UPROBE_XOL_SLOT_BYTES); + atomic_inc(&area->slot_count); + + return slot_addr; +} + +/* + * xol_get_insn_slot - If was not allocated a slot, then + * allocate a slot. + * Returns the allocated slot address or 0. + */ +static unsigned long xol_get_insn_slot(struct uprobe *uprobe, unsigned long slot_addr) +{ + struct xol_area *area; + unsigned long offset; + void *vaddr; + + area = get_xol_area(current->mm); + if (!area) { + area = xol_alloc_area(); + if (!area) + return 0; + } + current->utask->xol_vaddr = xol_take_insn_slot(area); + + /* + * Initialize the slot if xol_vaddr points to valid + * instruction slot. + */ + if (unlikely(!current->utask->xol_vaddr)) + return 0; + + current->utask->vaddr = slot_addr; + offset = current->utask->xol_vaddr & ~PAGE_MASK; + vaddr = kmap_atomic(area->page); + memcpy(vaddr + offset, uprobe->arch.insn, MAX_UINSN_BYTES); + kunmap_atomic(vaddr); + + return current->utask->xol_vaddr; +} + +/* + * xol_free_insn_slot - If slot was earlier allocated by + * @xol_get_insn_slot(), make the slot available for + * subsequent requests. 
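+ * (Clears the slot's bit in the bitmap and wakes any thread sleeping in
+ * xol_take_insn_slot() waiting for a free slot.)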
+ */ +static void xol_free_insn_slot(struct task_struct *tsk) +{ + struct xol_area *area; + unsigned long vma_end; + unsigned long slot_addr; + + if (!tsk->mm || !tsk->mm->uprobes_state.xol_area || !tsk->utask) + return; + + slot_addr = tsk->utask->xol_vaddr; + + if (unlikely(!slot_addr || IS_ERR_VALUE(slot_addr))) + return; + + area = tsk->mm->uprobes_state.xol_area; + vma_end = area->vaddr + PAGE_SIZE; + if (area->vaddr <= slot_addr && slot_addr < vma_end) { + unsigned long offset; + int slot_nr; + + offset = slot_addr - area->vaddr; + slot_nr = offset / UPROBE_XOL_SLOT_BYTES; + if (slot_nr >= UINSNS_PER_PAGE) + return; + + clear_bit(slot_nr, area->bitmap); + atomic_dec(&area->slot_count); + if (waitqueue_active(&area->wq)) + wake_up(&area->wq); + + tsk->utask->xol_vaddr = 0; + } +} + /** * uprobe_get_swbp_addr - compute address of swbp given post-swbp regs * @regs: Reflects the saved state of the task after it has hit a breakpoint @@ -1070,6 +1280,7 @@ void uprobe_free_utask(struct task_struct *t) if (utask->active_uprobe) put_uprobe(utask->active_uprobe); + xol_free_insn_slot(t); kfree(utask); t->utask = NULL; } @@ -1108,6 +1319,9 @@ static struct uprobe_task *add_utask(void) static int pre_ssout(struct uprobe *uprobe, struct pt_regs *regs, unsigned long vaddr) { + if (xol_get_insn_slot(uprobe, vaddr) && !arch_uprobe_pre_xol(&uprobe->arch, regs)) + return 0; + return -EFAULT; } @@ -1252,6 +1466,7 @@ static void handle_singlestep(struct uprobe_task *utask, struct pt_regs *regs) utask->active_uprobe = NULL; utask->state = UTASK_RUNNING; user_disable_single_step(current); + xol_free_insn_slot(current); spin_lock_irq(¤t->sighand->siglock); recalc_sigpending(); /* see uprobe_deny_signal() */ diff --git a/kernel/fork.c b/kernel/fork.c index eb7b63334009..3133b9da59d5 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -554,6 +554,7 @@ void mmput(struct mm_struct *mm) might_sleep(); if (atomic_dec_and_test(&mm->mm_users)) { + uprobe_clear_state(mm); exit_aio(mm); ksm_exit(mm); khugepaged_exit(mm); /* must run before exit_mmap */ @@ -760,6 +761,7 @@ struct mm_struct *dup_mm(struct task_struct *tsk) #ifdef CONFIG_TRANSPARENT_HUGEPAGE mm->pmd_huge_pte = NULL; #endif + uprobe_reset_state(mm); if (!mm_init(mm, tsk)) goto fail_nomem; -- cgit v1.2.3 From 682968e0c425c60f0dde37977e5beb2b12ddc4cc Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Fri, 30 Mar 2012 23:56:46 +0530 Subject: uprobes/core: Optimize probe hits with the help of a counter Maintain a per-mm counter: number of uprobes that are inserted on this process address space. This counter can be used at probe hit time to determine if we need a lookup in the uprobes rbtree. Everytime a probe gets inserted successfully, the probe count is incremented and everytime a probe gets removed, the probe count is decremented. The new uprobe_munmap hook ensures the count is correct on a unmap or remap of a region. We expect that once a uprobe_munmap() is called, the vma goes away. So uprobe_unregister() finding a probe to unregister would either mean unmap event hasnt occurred yet or a mmap event on the same executable file occured after a unmap event. Additionally, uprobe_mmap hook now also gets called: a. on every executable vma that is COWed at fork. b. a vma of interest is newly mapped; breakpoint insertion also happens at the required address. On process creation, make sure the probes count in the child is set correctly. Special cases that are taken care include: a. mremap b. VM_DONTCOPY vmas on fork() c. 
insertion/removal races in the parent during fork(). Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Anton Arapov Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120330182646.10018.85805.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- include/linux/uprobes.h | 5 ++ kernel/events/uprobes.c | 119 ++++++++++++++++++++++++++++++++++++++++++++---- kernel/fork.c | 3 ++ mm/mmap.c | 10 +++- 4 files changed, 128 insertions(+), 9 deletions(-) (limited to 'include') diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index a111460c07d5..d594d3b3ad4c 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -99,6 +99,7 @@ struct xol_area { struct uprobes_state { struct xol_area *xol_area; + atomic_t count; }; extern int __weak set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr); extern int __weak set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, unsigned long vaddr, bool verify); @@ -106,6 +107,7 @@ extern bool __weak is_swbp_insn(uprobe_opcode_t *insn); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); +extern void uprobe_munmap(struct vm_area_struct *vma); extern void uprobe_free_utask(struct task_struct *t); extern void uprobe_copy_process(struct task_struct *t); extern unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs); @@ -132,6 +134,9 @@ static inline int uprobe_mmap(struct vm_area_struct *vma) { return 0; } +static inline void uprobe_munmap(struct vm_area_struct *vma) +{ +} static inline void uprobe_notify_resume(struct pt_regs *regs) { } diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index b395edb97f53..29e881b0137d 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -642,6 +642,29 @@ copy_insn(struct uprobe *uprobe, struct vm_area_struct *vma, unsigned long addr) return __copy_insn(mapping, vma, uprobe->arch.insn, bytes, uprobe->offset); } +/* + * How mm->uprobes_state.count gets updated + * uprobe_mmap() increments the count if + * - it successfully adds a breakpoint. + * - it cannot add a breakpoint, but sees that there is a underlying + * breakpoint (via a is_swbp_at_addr()). + * + * uprobe_munmap() decrements the count if + * - it sees a underlying breakpoint, (via is_swbp_at_addr) + * (Subsequent uprobe_unregister wouldnt find the breakpoint + * unless a uprobe_mmap kicks in, since the old vma would be + * dropped just after uprobe_munmap.) + * + * uprobe_register increments the count if: + * - it successfully adds a breakpoint. + * + * uprobe_unregister decrements the count if: + * - it sees a underlying breakpoint and removes successfully. + * (via is_swbp_at_addr) + * (Subsequent uprobe_munmap wouldnt find the breakpoint + * since there is no underlying breakpoint after the + * breakpoint removal.) + */ static int install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, struct vm_area_struct *vma, loff_t vaddr) @@ -675,7 +698,19 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, uprobe->flags |= UPROBE_COPY_INSN; } + + /* + * Ideally, should be updating the probe count after the breakpoint + * has been successfully inserted. 
However a thread could hit the + * breakpoint we just inserted even before the probe count is + * incremented. If this is the first breakpoint placed, breakpoint + * notifier might ignore uprobes and pass the trap to the thread. + * Hence increment before and decrement on failure. + */ + atomic_inc(&mm->uprobes_state.count); ret = set_swbp(&uprobe->arch, mm, addr); + if (ret) + atomic_dec(&mm->uprobes_state.count); return ret; } @@ -683,7 +718,8 @@ install_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, static void remove_breakpoint(struct uprobe *uprobe, struct mm_struct *mm, loff_t vaddr) { - set_orig_insn(&uprobe->arch, mm, (unsigned long)vaddr, true); + if (!set_orig_insn(&uprobe->arch, mm, (unsigned long)vaddr, true)) + atomic_dec(&mm->uprobes_state.count); } /* @@ -1009,7 +1045,7 @@ int uprobe_mmap(struct vm_area_struct *vma) struct list_head tmp_list; struct uprobe *uprobe, *u; struct inode *inode; - int ret; + int ret, count; if (!atomic_read(&uprobe_events) || !valid_vma(vma, true)) return 0; @@ -1023,6 +1059,7 @@ int uprobe_mmap(struct vm_area_struct *vma) build_probe_list(inode, &tmp_list); ret = 0; + count = 0; list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) { loff_t vaddr; @@ -1030,21 +1067,85 @@ int uprobe_mmap(struct vm_area_struct *vma) list_del(&uprobe->pending_list); if (!ret) { vaddr = vma_address(vma, uprobe->offset); - if (vaddr >= vma->vm_start && vaddr < vma->vm_end) { - ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); - /* Ignore double add: */ - if (ret == -EEXIST) - ret = 0; + + if (vaddr < vma->vm_start || vaddr >= vma->vm_end) { + put_uprobe(uprobe); + continue; } + + ret = install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); + + /* Ignore double add: */ + if (ret == -EEXIST) { + ret = 0; + + if (!is_swbp_at_addr(vma->vm_mm, vaddr)) + continue; + + /* + * Unable to insert a breakpoint, but + * breakpoint lies underneath. Increment the + * probe count. + */ + atomic_inc(&vma->vm_mm->uprobes_state.count); + } + + if (!ret) + count++; } put_uprobe(uprobe); } mutex_unlock(uprobes_mmap_hash(inode)); + if (ret) + atomic_sub(count, &vma->vm_mm->uprobes_state.count); + return ret; } +/* + * Called in context of a munmap of a vma. + */ +void uprobe_munmap(struct vm_area_struct *vma) +{ + struct list_head tmp_list; + struct uprobe *uprobe, *u; + struct inode *inode; + + if (!atomic_read(&uprobe_events) || !valid_vma(vma, false)) + return; + + if (!atomic_read(&vma->vm_mm->uprobes_state.count)) + return; + + inode = vma->vm_file->f_mapping->host; + if (!inode) + return; + + INIT_LIST_HEAD(&tmp_list); + mutex_lock(uprobes_mmap_hash(inode)); + build_probe_list(inode, &tmp_list); + + list_for_each_entry_safe(uprobe, u, &tmp_list, pending_list) { + loff_t vaddr; + + list_del(&uprobe->pending_list); + vaddr = vma_address(vma, uprobe->offset); + + if (vaddr >= vma->vm_start && vaddr < vma->vm_end) { + /* + * An unregister could have removed the probe before + * unmap. So check before we decrement the count. 
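+ * (is_swbp_at_addr() returning 1 means the breakpoint instruction is
+ * still present at the probed address.)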
+ */ + if (is_swbp_at_addr(vma->vm_mm, vaddr) == 1) + atomic_dec(&vma->vm_mm->uprobes_state.count); + } + put_uprobe(uprobe); + } + mutex_unlock(uprobes_mmap_hash(inode)); +} + /* Slot allocation for XOL */ static int xol_add_vma(struct xol_area *area) { @@ -1150,6 +1251,7 @@ void uprobe_clear_state(struct mm_struct *mm) void uprobe_reset_state(struct mm_struct *mm) { mm->uprobes_state.xol_area = NULL; + atomic_set(&mm->uprobes_state.count, 0); } /* @@ -1504,7 +1606,8 @@ int uprobe_pre_sstep_notifier(struct pt_regs *regs) { struct uprobe_task *utask; - if (!current->mm) + if (!current->mm || !atomic_read(¤t->mm->uprobes_state.count)) + /* task is currently not uprobed */ return 0; utask = current->utask; diff --git a/kernel/fork.c b/kernel/fork.c index 3133b9da59d5..26a8f5c25805 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -421,6 +421,9 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) if (retval) goto out; + + if (file && uprobe_mmap(tmp)) + goto out; } /* a new mm has just been created */ arch_dup_mmap(oldmm, mm); diff --git a/mm/mmap.c b/mm/mmap.c index 5a863d328a44..7c112fbca405 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -218,6 +218,7 @@ void unlink_file_vma(struct vm_area_struct *vma) mutex_lock(&mapping->i_mmap_mutex); __remove_shared_vm_struct(vma, file, mapping); mutex_unlock(&mapping->i_mmap_mutex); + uprobe_munmap(vma); } } @@ -546,8 +547,14 @@ again: remove_next = 1 + (end > next->vm_end); if (file) { mapping = file->f_mapping; - if (!(vma->vm_flags & VM_NONLINEAR)) + if (!(vma->vm_flags & VM_NONLINEAR)) { root = &mapping->i_mmap; + uprobe_munmap(vma); + + if (adjust_next) + uprobe_munmap(next); + } + mutex_lock(&mapping->i_mmap_mutex); if (insert) { /* @@ -626,6 +633,7 @@ again: remove_next = 1 + (end > next->vm_end); if (remove_next) { if (file) { + uprobe_munmap(next); fput(file); if (next->vm_flags & VM_EXECUTABLE) removed_exe_file_vma(mm); -- cgit v1.2.3 From 0bf25a45386f284d591530ef174eaa9e44d84956 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 3 Apr 2012 13:39:44 -0700 Subject: Input: add support for LM8333 keypads This driver adds support for the keypad part of the LM8333 and is prepared for possible GPIO/PWM drivers. Note that this is not a MFD because you cannot disable the keypad functionality which, thus, has to be handled by the core anyhow. Signed-off-by: Wolfram Sang Signed-off-by: Dmitry Torokhov --- drivers/input/keyboard/Kconfig | 10 ++ drivers/input/keyboard/Makefile | 1 + drivers/input/keyboard/lm8333.c | 236 ++++++++++++++++++++++++++++++++++++++++ include/linux/input/lm8333.h | 24 ++++ 4 files changed, 271 insertions(+) create mode 100644 drivers/input/keyboard/lm8333.c create mode 100644 include/linux/input/lm8333.h (limited to 'include') diff --git a/drivers/input/keyboard/Kconfig b/drivers/input/keyboard/Kconfig index f354813a13e8..7eaf93fe5128 100644 --- a/drivers/input/keyboard/Kconfig +++ b/drivers/input/keyboard/Kconfig @@ -309,6 +309,16 @@ config KEYBOARD_LM8323 To compile this driver as a module, choose M here: the module will be called lm8323. +config KEYBOARD_LM8333 + tristate "LM8333 keypad chip" + depends on I2C + help + If you say yes here you get support for the National Semiconductor + LM8333 keypad controller. + + To compile this driver as a module, choose M here: the + module will be called lm8333. 
+ config KEYBOARD_LOCOMO tristate "LoCoMo Keyboard Support" depends on SHARP_LOCOMO diff --git a/drivers/input/keyboard/Makefile b/drivers/input/keyboard/Makefile index df7061f12918..b03b02456a82 100644 --- a/drivers/input/keyboard/Makefile +++ b/drivers/input/keyboard/Makefile @@ -24,6 +24,7 @@ obj-$(CONFIG_KEYBOARD_HP6XX) += jornada680_kbd.o obj-$(CONFIG_KEYBOARD_HP7XX) += jornada720_kbd.o obj-$(CONFIG_KEYBOARD_LKKBD) += lkkbd.o obj-$(CONFIG_KEYBOARD_LM8323) += lm8323.o +obj-$(CONFIG_KEYBOARD_LM8333) += lm8333.o obj-$(CONFIG_KEYBOARD_LOCOMO) += locomokbd.o obj-$(CONFIG_KEYBOARD_MAPLE) += maple_keyb.o obj-$(CONFIG_KEYBOARD_MATRIX) += matrix_keypad.o diff --git a/drivers/input/keyboard/lm8333.c b/drivers/input/keyboard/lm8333.c new file mode 100644 index 000000000000..9a8c4a6cf5c6 --- /dev/null +++ b/drivers/input/keyboard/lm8333.c @@ -0,0 +1,236 @@ +/* + * LM8333 keypad driver + * Copyright (C) 2012 Wolfram Sang, Pengutronix + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License. + */ + +#include +#include +#include +#include +#include +#include +#include + +#define LM8333_FIFO_READ 0x20 +#define LM8333_DEBOUNCE 0x22 +#define LM8333_READ_INT 0xD0 +#define LM8333_ACTIVE 0xE4 +#define LM8333_READ_ERROR 0xF0 + +#define LM8333_KEYPAD_IRQ (1 << 0) +#define LM8333_ERROR_IRQ (1 << 3) + +#define LM8333_ERROR_KEYOVR 0x04 +#define LM8333_ERROR_FIFOOVR 0x40 + +#define LM8333_FIFO_TRANSFER_SIZE 16 + +#define LM8333_ROW_SHIFT 4 +#define LM8333_NUM_ROWS 8 + + +struct lm8333 { + struct i2c_client *client; + struct input_dev *input; + unsigned short keycodes[LM8333_NUM_ROWS << LM8333_ROW_SHIFT]; +}; + +/* The accessors try twice because the first access may be needed for wakeup */ +#define LM8333_READ_RETRIES 2 + +int lm8333_read8(struct lm8333 *lm8333, u8 cmd) +{ + int retries = 0, ret; + + do { + ret = i2c_smbus_read_byte_data(lm8333->client, cmd); + } while (ret < 0 && retries++ < LM8333_READ_RETRIES); + + return ret; +} + +int lm8333_write8(struct lm8333 *lm8333, u8 cmd, u8 val) +{ + int retries = 0, ret; + + do { + ret = i2c_smbus_write_byte_data(lm8333->client, cmd, val); + } while (ret < 0 && retries++ < LM8333_READ_RETRIES); + + return ret; +} + +int lm8333_read_block(struct lm8333 *lm8333, u8 cmd, u8 len, u8 *buf) +{ + int retries = 0, ret; + + do { + ret = i2c_smbus_read_i2c_block_data(lm8333->client, + cmd, len, buf); + } while (ret < 0 && retries++ < LM8333_READ_RETRIES); + + return ret; +} + +static void lm8333_key_handler(struct lm8333 *lm8333) +{ + struct input_dev *input = lm8333->input; + u8 keys[LM8333_FIFO_TRANSFER_SIZE]; + u8 code, pressed; + int i, ret; + + ret = lm8333_read_block(lm8333, LM8333_FIFO_READ, + LM8333_FIFO_TRANSFER_SIZE, keys); + if (ret != LM8333_FIFO_TRANSFER_SIZE) { + dev_err(&lm8333->client->dev, + "Error %d while reading FIFO\n", ret); + return; + } + + for (i = 0; keys[i] && i < LM8333_FIFO_TRANSFER_SIZE; i++) { + pressed = keys[i] & 0x80; + code = keys[i] & 0x7f; + + input_event(input, EV_MSC, MSC_SCAN, code); + input_report_key(input, lm8333->keycodes[code], pressed); + } + + input_sync(input); +} + +static irqreturn_t lm8333_irq_thread(int irq, void *data) +{ + struct lm8333 *lm8333 = data; + u8 status = lm8333_read8(lm8333, LM8333_READ_INT); + + if (!status) + return IRQ_NONE; + + if (status & LM8333_ERROR_IRQ) { + u8 err = lm8333_read8(lm8333, LM8333_READ_ERROR); + + if (err & (LM8333_ERROR_KEYOVR | 
LM8333_ERROR_FIFOOVR)) { + u8 dummy[LM8333_FIFO_TRANSFER_SIZE]; + + lm8333_read_block(lm8333, LM8333_FIFO_READ, + LM8333_FIFO_TRANSFER_SIZE, dummy); + } + dev_err(&lm8333->client->dev, "Got error %02x\n", err); + } + + if (status & LM8333_KEYPAD_IRQ) + lm8333_key_handler(lm8333); + + return IRQ_HANDLED; +} + +static int __devinit lm8333_probe(struct i2c_client *client, + const struct i2c_device_id *id) +{ + const struct lm8333_platform_data *pdata = client->dev.platform_data; + struct lm8333 *lm8333; + struct input_dev *input; + int err, active_time; + + if (!pdata) + return -EINVAL; + + active_time = pdata->active_time ?: 500; + if (active_time / 3 <= pdata->debounce_time / 3) { + dev_err(&client->dev, "Active time not big enough!\n"); + return -EINVAL; + } + + lm8333 = kzalloc(sizeof(*lm8333), GFP_KERNEL); + input = input_allocate_device(); + if (!lm8333 || !input) { + err = -ENOMEM; + goto free_mem; + } + + lm8333->client = client; + lm8333->input = input; + + input->name = client->name; + input->dev.parent = &client->dev; + input->id.bustype = BUS_I2C; + + input->keycode = lm8333->keycodes; + input->keycodesize = sizeof(lm8333->keycodes[0]); + input->keycodemax = ARRAY_SIZE(lm8333->keycodes); + input->evbit[0] = BIT_MASK(EV_KEY); + input_set_capability(input, EV_MSC, MSC_SCAN); + + matrix_keypad_build_keymap(pdata->matrix_data, LM8333_ROW_SHIFT, + input->keycode, input->keybit); + + if (pdata->debounce_time) { + err = lm8333_write8(lm8333, LM8333_DEBOUNCE, + pdata->debounce_time / 3); + if (err) + dev_warn(&client->dev, "Unable to set debounce time\n"); + } + + if (pdata->active_time) { + err = lm8333_write8(lm8333, LM8333_ACTIVE, + pdata->active_time / 3); + if (err) + dev_warn(&client->dev, "Unable to set active time\n"); + } + + err = request_threaded_irq(client->irq, NULL, lm8333_irq_thread, + IRQF_TRIGGER_FALLING | IRQF_ONESHOT, + "lm8333", lm8333); + if (err) + goto free_mem; + + err = input_register_device(input); + if (err) + goto free_irq; + + i2c_set_clientdata(client, lm8333); + return 0; + + free_irq: + free_irq(client->irq, lm8333); + free_mem: + input_free_device(input); + kfree(lm8333); + return err; +} + +static int __devexit lm8333_remove(struct i2c_client *client) +{ + struct lm8333 *lm8333 = i2c_get_clientdata(client); + + free_irq(client->irq, lm8333); + input_unregister_device(lm8333->input); + kfree(lm8333); + + return 0; +} + +static const struct i2c_device_id lm8333_id[] = { + { "lm8333", 0 }, + { } +}; +MODULE_DEVICE_TABLE(i2c, lm8333_id); + +static struct i2c_driver lm8333_driver = { + .driver = { + .name = "lm8333", + .owner = THIS_MODULE, + }, + .probe = lm8333_probe, + .remove = __devexit_p(lm8333_remove), + .id_table = lm8333_id, +}; +module_i2c_driver(lm8333_driver); + +MODULE_AUTHOR("Wolfram Sang "); +MODULE_DESCRIPTION("LM8333 keyboard driver"); +MODULE_LICENSE("GPL v2"); diff --git a/include/linux/input/lm8333.h b/include/linux/input/lm8333.h new file mode 100644 index 000000000000..79f918c6e8c5 --- /dev/null +++ b/include/linux/input/lm8333.h @@ -0,0 +1,24 @@ +/* + * public include for LM8333 keypad driver - same license as driver + * Copyright (C) 2012 Wolfram Sang, Pengutronix + */ + +#ifndef _LM8333_H +#define _LM8333_H + +struct lm8333; + +struct lm8333_platform_data { + /* Keymap data */ + const struct matrix_keymap_data *matrix_data; + /* Active timeout before enter HALT mode in microseconds */ + unsigned active_time; + /* Debounce interval in microseconds */ + unsigned debounce_time; +}; + +extern int lm8333_read8(struct lm8333 *lm8333, u8 
cmd); +extern int lm8333_write8(struct lm8333 *lm8333, u8 cmd, u8 val); +extern int lm8333_read_block(struct lm8333 *lm8333, u8 cmd, u8 len, u8 *buf); + +#endif /* _LM8333_H */ -- cgit v1.2.3 From fa7f86d157781515b74d658120552eafd890f4de Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Tue, 3 Apr 2012 23:50:15 -0700 Subject: Input: serio - add helper macro for serio_driver boilerplate This patch introduces the module_serio_driver macro, a convenience macro for serio driver modules similar to module_platform_driver. It is intended to be used by drivers whose init/exit sections do nothing but register/unregister the serio driver. By using this macro it is possible to eliminate a few lines of boilerplate code per serio driver. Based on work done by Lars-Peter Clausen for other buses (i2c and spi). Signed-off-by: Axel Lin Signed-off-by: Dmitry Torokhov --- include/linux/serio.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include') diff --git a/include/linux/serio.h b/include/linux/serio.h index ca82861b0e46..6d6cfd3e94a3 100644 --- a/include/linux/serio.h +++ b/include/linux/serio.h @@ -96,6 +96,19 @@ int __must_check __serio_register_driver(struct serio_driver *drv, void serio_unregister_driver(struct serio_driver *drv); +/** + * module_serio_driver() - Helper macro for registering a serio driver + * @__serio_driver: serio_driver struct + * + * Helper macro for serio drivers which do not do anything special in + * module init/exit. This eliminates a lot of boilerplate. Each module + * may only use this macro once, and calling it replaces module_init() + * and module_exit(). + */ +#define module_serio_driver(__serio_driver) \ + module_driver(__serio_driver, serio_register_driver, \ + serio_unregister_driver) + static inline int serio_write(struct serio *serio, unsigned char data) { if (serio->write) -- cgit v1.2.3 From 45b2604eaaa105223ce60117b0482ca8a488f9c4 Mon Sep 17 00:00:00 2001 From: Axel Lin Date: Tue, 3 Apr 2012 23:51:08 -0700 Subject: Input: gameport - add helper macro for gameport_driver boilerplate This patch introduces the module_gameport_driver macro, a convenience macro for gameport driver modules similar to module_platform_driver. It is intended to be used by drivers whose init/exit sections do nothing but register/unregister the gameport driver. By using this macro it is possible to eliminate a few lines of boilerplate code per gameport driver. Based on work done by Lars-Peter Clausen for other buses (i2c and spi). Signed-off-by: Axel Lin Signed-off-by: Dmitry Torokhov --- include/linux/gameport.h | 13 +++++++++++++ 1 file changed, 13 insertions(+) (limited to 'include') diff --git a/include/linux/gameport.h b/include/linux/gameport.h index b456b08d70ed..b986be513406 100644 --- a/include/linux/gameport.h +++ b/include/linux/gameport.h @@ -153,6 +153,19 @@ int __must_check __gameport_register_driver(struct gameport_driver *drv, void gameport_unregister_driver(struct gameport_driver *drv); +/** + * module_gameport_driver() - Helper macro for registering a gameport driver + * @__gameport_driver: gameport_driver struct + * + * Helper macro for gameport drivers which do not do anything special in + * module init/exit. This eliminates a lot of boilerplate. Each module may + * only use this macro once, and calling it replaces module_init() and + * module_exit().
+ */ +#define module_gameport_driver(__gameport_driver) \ + module_driver(__gameport_driver, gameport_register_driver, \ + gameport_unregister_driver) + #endif /* __KERNEL__ */ #define GAMEPORT_MODE_DISABLED 0 -- cgit v1.2.3 From fc3a1f04f5040255cbc086c419e4237f29f89f88 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Tue, 13 Dec 2011 18:34:01 +0100 Subject: gpio: add flags to export GPIOs when requesting Introduce new flags to automatically export GPIOs when using the convenience functions gpio_request_one() or gpio_request_array(). This eases support for custom boards where lots of GPIOs need to be exported for customer applications. Signed-off-by: Wolfram Sang Signed-off-by: Grant Likely --- Documentation/gpio.txt | 3 +++ drivers/gpio/gpiolib.c | 12 +++++++++++- include/linux/gpio.h | 5 +++++ 3 files changed, 19 insertions(+), 1 deletion(-) (limited to 'include') diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 620a07844e8c..e08a883de36e 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -322,6 +322,9 @@ where 'flags' is currently defined to specify the following properties: * GPIOF_OPEN_DRAIN - gpio pin is open drain type. * GPIOF_OPEN_SOURCE - gpio pin is open source type. + * GPIOF_EXPORT_DIR_FIXED - export gpio to sysfs, keep direction + * GPIOF_EXPORT_DIR_CHANGEABLE - also export, allow changing direction + since GPIOF_INIT_* are only valid when configured as output, so group valid combinations as: diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 5a75510d66bb..566d0122d832 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -1302,8 +1302,18 @@ int gpio_request_one(unsigned gpio, unsigned long flags, const char *label) (flags & GPIOF_INIT_HIGH) ? 1 : 0); if (err) - gpio_free(gpio); + goto free_gpio; + + if (flags & GPIOF_EXPORT) { + err = gpio_export(gpio, flags & GPIOF_EXPORT_CHANGEABLE); + if (err) + goto free_gpio; + } + + return 0; + free_gpio: + gpio_free(gpio); return err; } EXPORT_SYMBOL_GPL(gpio_request_one); diff --git a/include/linux/gpio.h b/include/linux/gpio.h index 6155ecf192b0..af511a682925 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -20,6 +20,11 @@ /* Gpio pin is open source */ #define GPIOF_OPEN_SOURCE (1 << 3) +#define GPIOF_EXPORT (1 << 2) +#define GPIOF_EXPORT_CHANGEABLE (1 << 3) +#define GPIOF_EXPORT_DIR_FIXED (GPIOF_EXPORT) +#define GPIOF_EXPORT_DIR_CHANGEABLE (GPIOF_EXPORT | GPIOF_EXPORT_CHANGEABLE) + /** * struct gpio - a structure describing a GPIO with configuration * @gpio: the GPIO number -- cgit v1.2.3 From 2c96922ae3f0bfb7324a7a433d96d319fe6de729 Mon Sep 17 00:00:00 2001 From: Mark Brown Date: Wed, 4 Apr 2012 16:14:48 +0100 Subject: gpiolib: Add !CONFIG_GPIOLIB definitions of devm_ functions Currently the managed gpio_request() and gpio_free() are not stubbed out for configurations not using gpiolib - do that to aid use in drivers. 
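Example usage of the new stubs (an illustrative sketch, not part of the patch; the GPIO number, label and probe function are hypothetical):

#include <linux/device.h>
#include <linux/gpio.h>

/* Hypothetical driver probe: with the stubs above this compiles even when
 * CONFIG_GPIOLIB is disabled; the call then simply returns -ENOSYS. */
static int example_probe(struct device *dev)
{
	return devm_gpio_request(dev, 42, "example");
}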
Signed-off-by: Mark Brown Signed-off-by: Grant Likely --- include/linux/gpio.h | 14 ++++++++++++++ 1 file changed, 14 insertions(+) (limited to 'include') diff --git a/include/linux/gpio.h b/include/linux/gpio.h index af511a682925..d1890d46b6ce 100644 --- a/include/linux/gpio.h +++ b/include/linux/gpio.h @@ -60,6 +60,12 @@ static inline int gpio_request(unsigned gpio, const char *label) return -ENOSYS; } +static inline int devm_gpio_request(struct device *dev, unsigned gpio, + const char *label) +{ + return -ENOSYS; +} + static inline int gpio_request_one(unsigned gpio, unsigned long flags, const char *label) { @@ -79,6 +85,14 @@ static inline void gpio_free(unsigned gpio) WARN_ON(1); } +static inline void devm_gpio_free(struct device *dev, unsigned gpio) +{ + might_sleep(); + + /* GPIO can never have been requested */ + WARN_ON(1); +} + static inline void gpio_free_array(const struct gpio *array, size_t num) { might_sleep(); -- cgit v1.2.3 From a13007160f1b9ec7c67e28ec9254f197c5c08d7d Mon Sep 17 00:00:00 2001 From: Amos Kong Date: Fri, 9 Mar 2012 12:17:32 +0800 Subject: KVM: resize kvm_io_range array dynamically This patch makes it possible to resize the kvm_io_range array dynamically. Signed-off-by: Amos Kong Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- include/linux/kvm_host.h | 5 +++-- virt/kvm/kvm_main.c | 38 ++++++++++++++++++-------------------- 2 files changed, 21 insertions(+), 22 deletions(-) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 665a260c7e09..ba9fb4a9762d 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -68,10 +68,11 @@ struct kvm_io_range { struct kvm_io_device *dev; }; +#define NR_IOBUS_DEVS 300 + struct kvm_io_bus { int dev_count; -#define NR_IOBUS_DEVS 300 - struct kvm_io_range range[NR_IOBUS_DEVS]; + struct kvm_io_range range[]; }; enum kvm_bus { diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index 42b73930a6de..a9565e240636 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -2393,9 +2393,6 @@ int kvm_io_bus_sort_cmp(const void *p1, const void *p2) int kvm_io_bus_insert_dev(struct kvm_io_bus *bus, struct kvm_io_device *dev, gpa_t addr, int len) { - if (bus->dev_count == NR_IOBUS_DEVS) - return -ENOSPC; - bus->range[bus->dev_count++] = (struct kvm_io_range) { .addr = addr, .len = len, @@ -2495,12 +2492,15 @@ int kvm_io_bus_register_dev(struct kvm *kvm, enum kvm_bus bus_idx, gpa_t addr, struct kvm_io_bus *new_bus, *bus; bus = kvm->buses[bus_idx]; - if (bus->dev_count > NR_IOBUS_DEVS-1) + if (bus->dev_count > NR_IOBUS_DEVS - 1) return -ENOSPC; - new_bus = kmemdup(bus, sizeof(struct kvm_io_bus), GFP_KERNEL); + new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count + 1) * + sizeof(struct kvm_io_range)), GFP_KERNEL); if (!new_bus) return -ENOMEM; + memcpy(new_bus, bus, sizeof(*bus) + (bus->dev_count * + sizeof(struct kvm_io_range))); kvm_io_bus_insert_dev(new_bus, dev, addr, len); rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu); @@ -2517,27 +2517,25 @@ int kvm_io_bus_unregister_dev(struct kvm *kvm, enum kvm_bus bus_idx, struct kvm_io_bus *new_bus, *bus; bus = kvm->buses[bus_idx]; - - new_bus = kmemdup(bus, sizeof(*bus), GFP_KERNEL); - if (!new_bus) - return -ENOMEM; - r = -ENOENT; - for (i = 0; i < new_bus->dev_count; i++) - if (new_bus->range[i].dev == dev) { + for (i = 0; i < bus->dev_count; i++) + if (bus->range[i].dev == dev) { r = 0; - new_bus->dev_count--; - new_bus->range[i] = new_bus->range[new_bus->dev_count]; -
sort(new_bus->range, new_bus->dev_count, - sizeof(struct kvm_io_range), - kvm_io_bus_sort_cmp, NULL); break; } - if (r) { - kfree(new_bus); + if (r) return r; - } + + new_bus = kzalloc(sizeof(*bus) + ((bus->dev_count - 1) * + sizeof(struct kvm_io_range)), GFP_KERNEL); + if (!new_bus) + return -ENOMEM; + + memcpy(new_bus, bus, sizeof(*bus) + i * sizeof(struct kvm_io_range)); + new_bus->dev_count--; + memcpy(new_bus->range + i, bus->range + i + 1, + (new_bus->dev_count - i) * sizeof(struct kvm_io_range)); rcu_assign_pointer(kvm->buses[bus_idx], new_bus); synchronize_srcu_expedited(&kvm->srcu); -- cgit v1.2.3 From 786a9f888bfbe70a36d0592b26037ca1e8c8da7f Mon Sep 17 00:00:00 2001 From: Amos Kong Date: Fri, 9 Mar 2012 12:17:40 +0800 Subject: KVM: set upper bounds for iobus dev to limit userspace kvm_io_bus devices are used for ioevent, pit, pic, ioapic, coalesced_mmio. Currently Qemu only emulates one PCI bus; it contains 32 slots, each slot contains 8 functions, so the maximum number of supported PCI devices is 1 * 32 * 8 = 256. One virtio-blk takes one iobus device, one virtio-net(vhost=on) takes two iobus devices. The maximum number of coalesced mmio zones is 100, and each zone has an iobus device. So 300 io_bus devices are not enough. Set an upper bound for kvm_io_range to limit userspace. 1000 is a very large limit and will not bloat the typical user. Signed-off-by: Amos Kong Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- include/linux/kvm_host.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index ba9fb4a9762d..3a2cea616283 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -68,7 +68,7 @@ struct kvm_io_range { struct kvm_io_device *dev; }; -#define NR_IOBUS_DEVS 300 +#define NR_IOBUS_DEVS 1000 struct kvm_io_bus { int dev_count; -- cgit v1.2.3 From b6d33834bd4e8bdf4a199812e31b3e36da53c794 Mon Sep 17 00:00:00 2001 From: Christoffer Dall Date: Thu, 8 Mar 2012 16:44:24 -0500 Subject: KVM: Factor out kvm_vcpu_kick to arch-generic code The kvm_vcpu_kick function performs roughly the same function on almost all architectures, so we shouldn't have separate copies. PowerPC keeps a pointer to an interchangeable waitqueue on the vcpu_arch structure, and to accommodate this special need a __KVM_HAVE_ARCH_VCPU_GET_WQ define and an accompanying function kvm_arch_vcpu_wq have been defined.
For all other architectures this is a generic inline that just returns &vcpu->wq; Acked-by: Scott Wood Signed-off-by: Christoffer Dall Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- arch/ia64/include/asm/kvm_host.h | 1 + arch/ia64/kvm/kvm-ia64.c | 20 +++++--------------- arch/powerpc/include/asm/kvm_host.h | 6 ++++++ arch/powerpc/kvm/powerpc.c | 21 ++++++--------------- arch/s390/kvm/kvm-s390.c | 8 ++++++++ arch/x86/kvm/x86.c | 16 ++-------------- include/linux/kvm_host.h | 9 +++++++++ virt/kvm/kvm_main.c | 22 ++++++++++++++++++++++ 8 files changed, 59 insertions(+), 44 deletions(-) (limited to 'include') diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index e35b3a84a40b..c4b4bac3d09e 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -365,6 +365,7 @@ struct thash_cb { }; struct kvm_vcpu_stat { + u32 halt_wakeup; }; struct kvm_vcpu_arch { diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index f5104b7c52cd..9d80ff8d9eff 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -1872,21 +1872,6 @@ void kvm_arch_hardware_unsetup(void) { } -void kvm_vcpu_kick(struct kvm_vcpu *vcpu) -{ - int me; - int cpu = vcpu->cpu; - - if (waitqueue_active(&vcpu->wq)) - wake_up_interruptible(&vcpu->wq); - - me = get_cpu(); - if (cpu != me && (unsigned) cpu < nr_cpu_ids && cpu_online(cpu)) - if (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)) - smp_send_reschedule(cpu); - put_cpu(); -} - int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq) { return __apic_accept_irq(vcpu, irq->vector); @@ -1956,6 +1941,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) (kvm_highest_pending_irq(vcpu) != -1); } +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) +{ + return (!test_and_set_bit(KVM_REQ_KICK, &vcpu->requests)); +} + int kvm_arch_vcpu_ioctl_get_mpstate(struct kvm_vcpu *vcpu, struct kvm_mp_state *mp_state) { diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 52eb9c1f4fe0..889383735e73 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -498,4 +498,10 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_QPR 0x0040 #define KVM_MMIO_REG_FQPR 0x0060 +#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1 +static inline wait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +{ + return vcpu->arch.wqp; +} + #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/arch/powerpc/kvm/powerpc.c b/arch/powerpc/kvm/powerpc.c index 00d7e345b3fe..b5e9046462fd 100644 --- a/arch/powerpc/kvm/powerpc.c +++ b/arch/powerpc/kvm/powerpc.c @@ -43,6 +43,11 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *v) v->requests; } +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) +{ + return 1; +} + int kvmppc_kvm_pv(struct kvm_vcpu *vcpu) { int nr = kvmppc_get_gpr(vcpu, 11); @@ -588,21 +593,6 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run) return r; } -void kvm_vcpu_kick(struct kvm_vcpu *vcpu) -{ - int me; - int cpu = vcpu->cpu; - - me = get_cpu(); - if (waitqueue_active(vcpu->arch.wqp)) { - wake_up_interruptible(vcpu->arch.wqp); - vcpu->stat.halt_wakeup++; - } else if (cpu != me && cpu != -1) { - smp_send_reschedule(vcpu->cpu); - } - put_cpu(); -} - int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) { if (irq->irq == KVM_INTERRUPT_UNSET) { @@ -611,6 +601,7 @@ int kvm_vcpu_ioctl_interrupt(struct kvm_vcpu *vcpu, struct kvm_interrupt *irq) } kvmppc_core_queue_external(vcpu, irq); + kvm_vcpu_kick(vcpu); 
return 0; diff --git a/arch/s390/kvm/kvm-s390.c b/arch/s390/kvm/kvm-s390.c index 217ce44395a4..d30c8350b949 100644 --- a/arch/s390/kvm/kvm-s390.c +++ b/arch/s390/kvm/kvm-s390.c @@ -423,6 +423,14 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) return 0; } +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) +{ + /* kvm common code refers to this, but never calls it */ + BUG(); + return 0; +} + + static int kvm_arch_vcpu_ioctl_initial_reset(struct kvm_vcpu *vcpu) { kvm_s390_vcpu_initial_reset(vcpu); diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 4044ce0bf7c1..511031dcb9cc 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -6403,21 +6403,9 @@ int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu) kvm_cpu_has_interrupt(vcpu)); } -void kvm_vcpu_kick(struct kvm_vcpu *vcpu) +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu) { - int me; - int cpu = vcpu->cpu; - - if (waitqueue_active(&vcpu->wq)) { - wake_up_interruptible(&vcpu->wq); - ++vcpu->stat.halt_wakeup; - } - - me = get_cpu(); - if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) - if (kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE) - smp_send_reschedule(cpu); - put_cpu(); + return kvm_vcpu_exiting_guest_mode(vcpu) == IN_GUEST_MODE; } int kvm_arch_interrupt_allowed(struct kvm_vcpu *vcpu) diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 3a2cea616283..5b624e1ff814 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -439,6 +439,7 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, gfn_t gfn); void kvm_vcpu_block(struct kvm_vcpu *vcpu); +void kvm_vcpu_kick(struct kvm_vcpu *vcpu); void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu); void kvm_resched(struct kvm_vcpu *vcpu); void kvm_load_guest_fpu(struct kvm_vcpu *vcpu); @@ -507,6 +508,7 @@ int kvm_arch_hardware_setup(void); void kvm_arch_hardware_unsetup(void); void kvm_arch_check_processor_compat(void *rtn); int kvm_arch_vcpu_runnable(struct kvm_vcpu *vcpu); +int kvm_arch_vcpu_should_kick(struct kvm_vcpu *vcpu); void kvm_free_physmem(struct kvm *kvm); @@ -522,6 +524,13 @@ static inline void kvm_arch_free_vm(struct kvm *kvm) } #endif +#ifndef __KVM_HAVE_ARCH_VCPU_GET_WQ +static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) +{ + return &vcpu->wq; +} +#endif + int kvm_arch_init_vm(struct kvm *kvm, unsigned long type); void kvm_arch_destroy_vm(struct kvm *kvm); void kvm_free_all_assigned_devices(struct kvm *kvm); diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a9565e240636..7149a2e65524 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -1514,6 +1514,28 @@ void kvm_vcpu_block(struct kvm_vcpu *vcpu) finish_wait(&vcpu->wq, &wait); } +/* + * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode. 
+ */ +void kvm_vcpu_kick(struct kvm_vcpu *vcpu) +{ + int me; + int cpu = vcpu->cpu; + wait_queue_head_t *wqp; + + wqp = kvm_arch_vcpu_wq(vcpu); + if (waitqueue_active(wqp)) { + wake_up_interruptible(wqp); + ++vcpu->stat.halt_wakeup; + } + + me = get_cpu(); + if (cpu != me && (unsigned)cpu < nr_cpu_ids && cpu_online(cpu)) + if (kvm_arch_vcpu_should_kick(vcpu)) + smp_send_reschedule(cpu); + put_cpu(); +} + void kvm_resched(struct kvm_vcpu *vcpu) { if (!need_resched()) -- cgit v1.2.3 From 2246f8b56315befa30f3d3d2800e0734c774f70e Mon Sep 17 00:00:00 2001 From: Alexander Graf Date: Tue, 13 Mar 2012 22:35:01 +0100 Subject: KVM: PPC: Rework wqp conditional code On PowerPC, we sometimes use a waitqueue per core, not per thread, so we can't always use the vcpu internal waitqueue. This code has been generalized by Christoffer Dall recently, but unfortunately broke compilation for PowerPC. At the time the helper function is defined, struct kvm_vcpu is not declared yet, so we can't dereference it. This patch moves all logic into the generic inline function, at which time we have all information necessary. Signed-off-by: Alexander Graf Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- arch/powerpc/include/asm/kvm_host.h | 6 +----- include/linux/kvm_host.h | 6 ++++-- 2 files changed, 5 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/arch/powerpc/include/asm/kvm_host.h b/arch/powerpc/include/asm/kvm_host.h index 889383735e73..20ab5b2dbd0f 100644 --- a/arch/powerpc/include/asm/kvm_host.h +++ b/arch/powerpc/include/asm/kvm_host.h @@ -498,10 +498,6 @@ struct kvm_vcpu_arch { #define KVM_MMIO_REG_QPR 0x0040 #define KVM_MMIO_REG_FQPR 0x0060 -#define __KVM_HAVE_ARCH_VCPU_GET_WQ 1 -static inline wait_queue_head *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) -{ - return vcpu->arch.wqp; -} +#define __KVM_HAVE_ARCH_WQP #endif /* __POWERPC_KVM_HOST_H__ */ diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5b624e1ff814..5184817e714a 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -524,12 +524,14 @@ static inline void kvm_arch_free_vm(struct kvm *kvm) } #endif -#ifndef __KVM_HAVE_ARCH_VCPU_GET_WQ static inline wait_queue_head_t *kvm_arch_vcpu_wq(struct kvm_vcpu *vcpu) { +#ifdef __KVM_HAVE_ARCH_WQP + return vcpu->arch.wqp; +#else return &vcpu->wq; -} #endif +} int kvm_arch_init_vm(struct kvm *kvm, unsigned long type); void kvm_arch_destroy_vm(struct kvm *kvm); -- cgit v1.2.3 From 3b5d56b9317fa7b5407dff1aa7b115bf6cdbd494 Mon Sep 17 00:00:00 2001 From: Eric B Munson Date: Sat, 10 Mar 2012 14:37:26 -0500 Subject: kvmclock: Add functions to check if the host has stopped the vm When a host stops or suspends a VM it will set a flag to show this. The watchdog will use these functions to determine if a softlockup is real, or the result of a suspended VM. 
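For context, a minimal sketch of the intended call site (editor's illustration, not from the patch; the function name and message are hypothetical):

#include <linux/kernel.h>
#include <linux/kvm_para.h>

/* Hypothetical stall check: if the host paused the VM, the apparent stall
 * is an artifact of suspension, not a real soft lockup. On bare metal the
 * generic stub always returns false, so the check is effectively free. */
static void example_check_stall(unsigned long duration_secs)
{
	if (kvm_check_and_clear_guest_paused())
		return;	/* host stopped us; skip the warning */

	pr_warn("BUG: soft lockup - CPU stalled for %lus\n", duration_secs);
}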
Signed-off-by: Eric B Munson asm-generic changes Acked-by: Arnd Bergmann Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- arch/alpha/include/asm/kvm_para.h | 1 + arch/arm/include/asm/kvm_para.h | 1 + arch/avr32/include/asm/kvm_para.h | 1 + arch/blackfin/include/asm/kvm_para.h | 1 + arch/c6x/include/asm/kvm_para.h | 1 + arch/frv/include/asm/kvm_para.h | 1 + arch/h8300/include/asm/kvm_para.h | 1 + arch/hexagon/include/asm/kvm_para.h | 1 + arch/ia64/include/asm/kvm_para.h | 5 +++++ arch/m68k/include/asm/kvm_para.h | 1 + arch/microblaze/include/asm/kvm_para.h | 1 + arch/mips/include/asm/kvm_para.h | 1 + arch/mn10300/include/asm/kvm_para.h | 1 + arch/openrisc/include/asm/kvm_para.h | 1 + arch/parisc/include/asm/kvm_para.h | 1 + arch/powerpc/include/asm/kvm_para.h | 5 +++++ arch/s390/include/asm/kvm_para.h | 5 +++++ arch/score/include/asm/kvm_para.h | 1 + arch/sh/include/asm/kvm_para.h | 1 + arch/sparc/include/asm/kvm_para.h | 1 + arch/tile/include/asm/kvm_para.h | 1 + arch/um/include/asm/kvm_para.h | 1 + arch/unicore32/include/asm/kvm_para.h | 1 + arch/x86/include/asm/kvm_para.h | 8 ++++++++ arch/x86/kernel/kvmclock.c | 21 +++++++++++++++++++++ arch/xtensa/include/asm/kvm_para.h | 1 + include/asm-generic/kvm_para.h | 14 ++++++++++++++ 27 files changed, 79 insertions(+) create mode 100644 arch/alpha/include/asm/kvm_para.h create mode 100644 arch/arm/include/asm/kvm_para.h create mode 100644 arch/avr32/include/asm/kvm_para.h create mode 100644 arch/blackfin/include/asm/kvm_para.h create mode 100644 arch/c6x/include/asm/kvm_para.h create mode 100644 arch/frv/include/asm/kvm_para.h create mode 100644 arch/h8300/include/asm/kvm_para.h create mode 100644 arch/hexagon/include/asm/kvm_para.h create mode 100644 arch/m68k/include/asm/kvm_para.h create mode 100644 arch/microblaze/include/asm/kvm_para.h create mode 100644 arch/mips/include/asm/kvm_para.h create mode 100644 arch/mn10300/include/asm/kvm_para.h create mode 100644 arch/openrisc/include/asm/kvm_para.h create mode 100644 arch/parisc/include/asm/kvm_para.h create mode 100644 arch/score/include/asm/kvm_para.h create mode 100644 arch/sh/include/asm/kvm_para.h create mode 100644 arch/sparc/include/asm/kvm_para.h create mode 100644 arch/tile/include/asm/kvm_para.h create mode 100644 arch/um/include/asm/kvm_para.h create mode 100644 arch/unicore32/include/asm/kvm_para.h create mode 100644 arch/xtensa/include/asm/kvm_para.h create mode 100644 include/asm-generic/kvm_para.h (limited to 'include') diff --git a/arch/alpha/include/asm/kvm_para.h b/arch/alpha/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/alpha/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/arm/include/asm/kvm_para.h b/arch/arm/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/arm/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/avr32/include/asm/kvm_para.h b/arch/avr32/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/avr32/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/blackfin/include/asm/kvm_para.h b/arch/blackfin/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/blackfin/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/c6x/include/asm/kvm_para.h b/arch/c6x/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/c6x/include/asm/kvm_para.h @@ -0,0 +1 @@ 
+#include diff --git a/arch/frv/include/asm/kvm_para.h b/arch/frv/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/frv/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/h8300/include/asm/kvm_para.h b/arch/h8300/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/h8300/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/hexagon/include/asm/kvm_para.h b/arch/hexagon/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/hexagon/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/ia64/include/asm/kvm_para.h b/arch/ia64/include/asm/kvm_para.h index 1588aee781a2..2019cb99335e 100644 --- a/arch/ia64/include/asm/kvm_para.h +++ b/arch/ia64/include/asm/kvm_para.h @@ -26,6 +26,11 @@ static inline unsigned int kvm_arch_para_features(void) return 0; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif #endif diff --git a/arch/m68k/include/asm/kvm_para.h b/arch/m68k/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/m68k/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/microblaze/include/asm/kvm_para.h b/arch/microblaze/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/microblaze/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/mips/include/asm/kvm_para.h b/arch/mips/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/mips/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/mn10300/include/asm/kvm_para.h b/arch/mn10300/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/mn10300/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/openrisc/include/asm/kvm_para.h b/arch/openrisc/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/openrisc/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/parisc/include/asm/kvm_para.h b/arch/parisc/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/parisc/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/powerpc/include/asm/kvm_para.h b/arch/powerpc/include/asm/kvm_para.h index 7b754e743003..c18916bff689 100644 --- a/arch/powerpc/include/asm/kvm_para.h +++ b/arch/powerpc/include/asm/kvm_para.h @@ -206,6 +206,11 @@ static inline unsigned int kvm_arch_para_features(void) return r; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif /* __KERNEL__ */ #endif /* __POWERPC_KVM_PARA_H__ */ diff --git a/arch/s390/include/asm/kvm_para.h b/arch/s390/include/asm/kvm_para.h index 6964db226f83..a98832961035 100644 --- a/arch/s390/include/asm/kvm_para.h +++ b/arch/s390/include/asm/kvm_para.h @@ -149,6 +149,11 @@ static inline unsigned int kvm_arch_para_features(void) return 0; } +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + #endif #endif /* __S390_KVM_PARA_H */ diff --git a/arch/score/include/asm/kvm_para.h b/arch/score/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/score/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/sh/include/asm/kvm_para.h b/arch/sh/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 
--- /dev/null +++ b/arch/sh/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/sparc/include/asm/kvm_para.h b/arch/sparc/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/sparc/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/tile/include/asm/kvm_para.h b/arch/tile/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/tile/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/um/include/asm/kvm_para.h b/arch/um/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/um/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/unicore32/include/asm/kvm_para.h b/arch/unicore32/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/unicore32/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h index 734c3767cfac..99c4bbe0cca2 100644 --- a/arch/x86/include/asm/kvm_para.h +++ b/arch/x86/include/asm/kvm_para.h @@ -95,6 +95,14 @@ struct kvm_vcpu_pv_apf_data { extern void kvmclock_init(void); extern int kvm_register_clock(char *txt); +#ifdef CONFIG_KVM_CLOCK +bool kvm_check_and_clear_guest_paused(void); +#else +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} +#endif /* CONFIG_KVMCLOCK */ /* This instruction is vmcall. On non-VT architectures, it will generate a * trap that we will then rewrite to the appropriate instruction. diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index f8492da65bfc..4ba090ca689d 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include @@ -114,6 +115,26 @@ static void kvm_get_preset_lpj(void) preset_lpj = lpj; } +bool kvm_check_and_clear_guest_paused(void) +{ + bool ret = false; + struct pvclock_vcpu_time_info *src; + + /* + * per_cpu() is safe here because this function is only called from + * timer functions where preemption is already disabled. + */ + WARN_ON(!in_atomic()); + src = &__get_cpu_var(hv_clock); + if ((src->flags & PVCLOCK_GUEST_STOPPED) != 0) { + __this_cpu_and(hv_clock.flags, ~PVCLOCK_GUEST_STOPPED); + ret = true; + } + + return ret; +} +EXPORT_SYMBOL_GPL(kvm_check_and_clear_guest_paused); + static struct clocksource kvm_clock = { .name = "kvm-clock", .read = kvm_clock_get_cycles, diff --git a/arch/xtensa/include/asm/kvm_para.h b/arch/xtensa/include/asm/kvm_para.h new file mode 100644 index 000000000000..14fab8f0b957 --- /dev/null +++ b/arch/xtensa/include/asm/kvm_para.h @@ -0,0 +1 @@ +#include diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h new file mode 100644 index 000000000000..05ef7e705939 --- /dev/null +++ b/include/asm-generic/kvm_para.h @@ -0,0 +1,14 @@ +#ifndef _ASM_GENERIC_KVM_PARA_H +#define _ASM_GENERIC_KVM_PARA_H + + +/* + * This function is used by architectures that support kvm to avoid issuing + * false soft lockup messages. + */ +static inline bool kvm_check_and_clear_guest_paused(void) +{ + return false; +} + +#endif -- cgit v1.2.3 From 1c0b28c2a46d98cd258d96b8c222144b22876c46 Mon Sep 17 00:00:00 2001 From: Eric B Munson Date: Sat, 10 Mar 2012 14:37:27 -0500 Subject: KVM: x86: Add ioctl for KVM_KVMCLOCK_CTRL Now that we have a flag that will tell the guest it was suspended, create an interface for that communication using a KVM ioctl. 
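To show the userspace side, a sketch of how a VMM might use the new ioctl (the helper and vcpu_fd are hypothetical; only the ioctl itself comes from the patch):

#include <stdio.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

/* Hypothetical VMM helper: call after pausing a vcpu and before resuming
 * it, so the guest's watchdog can tell suspension from a real lockup. */
static int mark_vcpu_paused(int vcpu_fd)
{
	if (ioctl(vcpu_fd, KVM_KVMCLOCK_CTRL, 0) < 0) {
		perror("KVM_KVMCLOCK_CTRL");	/* e.g. guest lacks pvclock */
		return -1;
	}
	return 0;
}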
Signed-off-by: Eric B Munson Signed-off-by: Marcelo Tosatti Signed-off-by: Avi Kivity --- Documentation/virtual/kvm/api.txt | 20 ++++++++++++++++++++ Documentation/virtual/kvm/msr.txt | 4 ++++ arch/x86/kvm/x86.c | 22 ++++++++++++++++++++++ include/linux/kvm.h | 3 +++ 4 files changed, 49 insertions(+) (limited to 'include') diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt index 6386f8c0482e..81ff39f6248d 100644 --- a/Documentation/virtual/kvm/api.txt +++ b/Documentation/virtual/kvm/api.txt @@ -1669,6 +1669,26 @@ at the memory location pointed to by "addr". The list of registers accessible using this interface is identical to the list in 4.64. +4.70 KVM_KVMCLOCK_CTRL + +Capability: KVM_CAP_KVMCLOCK_CTRL +Architectures: Any that implement pvclocks (currently x86 only) +Type: vcpu ioctl +Parameters: None +Returns: 0 on success, -1 on error + +This signals to the host kernel that the specified guest is being paused by +userspace. The host will set a flag in the pvclock structure that is checked +by the soft lockup watchdog. The flag is part of the pvclock structure that +is shared between guest and host, specifically the second bit of the flags +field of the pvclock_vcpu_time_info structure. It will be set exclusively by +the host and read/cleared exclusively by the guest. The guest operation of +checking and clearing the flag must be an atomic operation, so +load-link/store-conditional or an equivalent must be used. There are two cases +where the guest will clear the flag: when the soft lockup watchdog timer resets +itself or when a soft lockup is detected. This ioctl can be called any time +after pausing the vcpu, but before it is resumed. + 5. The kvm_run structure Application code obtains a pointer to the kvm_run structure by diff --git a/Documentation/virtual/kvm/msr.txt b/Documentation/virtual/kvm/msr.txt index 50317809113d..96b41bd97523 100644 --- a/Documentation/virtual/kvm/msr.txt +++ b/Documentation/virtual/kvm/msr.txt @@ -108,6 +108,10 @@ MSR_KVM_SYSTEM_TIME_NEW: 0x4b564d01 | | time measures taken across 0 | 24 | multiple cpus are guaranteed to | | be monotonic + ------------------------------------------------------------- + | | guest vcpu has been paused by + 1 | N/A | the host + | | See 4.70 in api.txt ------------------------------------------------------------- Availability of this MSR must be checked via bit 3 in 0x4000001 cpuid diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 511031dcb9cc..99b738028fc0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -2147,6 +2147,7 @@ int kvm_dev_ioctl_check_extension(long ext) case KVM_CAP_ASYNC_PF: case KVM_CAP_GET_TSC_KHZ: case KVM_CAP_PCI_2_3: + case KVM_CAP_KVMCLOCK_CTRL: r = 1; break; case KVM_CAP_COALESCED_MMIO: @@ -2597,6 +2598,23 @@ static int kvm_vcpu_ioctl_x86_set_xcrs(struct kvm_vcpu *vcpu, return r; } +/* + * kvm_set_guest_paused() indicates to the guest kernel that it has been + * stopped by the hypervisor. This function will be called from the host only. + * EINVAL is returned when the host attempts to set the flag for a guest that + * does not support pv clocks.
+ */ +static int kvm_set_guest_paused(struct kvm_vcpu *vcpu) +{ + struct pvclock_vcpu_time_info *src = &vcpu->arch.hv_clock; + if (!vcpu->arch.time_page) + return -EINVAL; + src->flags |= PVCLOCK_GUEST_STOPPED; + mark_page_dirty(vcpu->kvm, vcpu->arch.time >> PAGE_SHIFT); + kvm_make_request(KVM_REQ_CLOCK_UPDATE, vcpu); + return 0; +} + long kvm_arch_vcpu_ioctl(struct file *filp, unsigned int ioctl, unsigned long arg) { @@ -2873,6 +2891,10 @@ long kvm_arch_vcpu_ioctl(struct file *filp, r = vcpu->arch.virtual_tsc_khz; goto out; } + case KVM_KVMCLOCK_CTRL: { + r = kvm_set_guest_paused(vcpu); + goto out; + } default: r = -EINVAL; } diff --git a/include/linux/kvm.h b/include/linux/kvm.h index 6c322a90b92f..7a9dd4b3dede 100644 --- a/include/linux/kvm.h +++ b/include/linux/kvm.h @@ -589,6 +589,7 @@ struct kvm_ppc_pvinfo { #define KVM_CAP_S390_UCONTROL 73 #define KVM_CAP_SYNC_REGS 74 #define KVM_CAP_PCI_2_3 75 +#define KVM_CAP_KVMCLOCK_CTRL 76 #ifdef KVM_CAP_IRQ_ROUTING @@ -859,6 +860,8 @@ struct kvm_s390_ucas_mapping { /* Available with KVM_CAP_ONE_REG */ #define KVM_GET_ONE_REG _IOW(KVMIO, 0xab, struct kvm_one_reg) #define KVM_SET_ONE_REG _IOW(KVMIO, 0xac, struct kvm_one_reg) +/* VM is being stopped by host */ +#define KVM_KVMCLOCK_CTRL _IO(KVMIO, 0xad) #define KVM_DEV_ASSIGN_ENABLE_IOMMU (1 << 0) #define KVM_DEV_ASSIGN_PCI_2_3 (1 << 1) -- cgit v1.2.3 From 93474b25af1eabf5b14743793156e8d307bfcd6b Mon Sep 17 00:00:00 2001 From: Takuya Yoshikawa Date: Thu, 1 Mar 2012 19:34:45 +0900 Subject: KVM: Remove unused dirty_bitmap_head and nr_dirty_pages Now that we do neither double buffering nor heuristic selection of the write protection method these are not needed anymore. Note: some drivers have their own implementation of set_bit_le() and making it generic needs a bit of work; so we use test_and_set_bit_le() and will later replace it with generic set_bit_le(). Signed-off-by: Takuya Yoshikawa Signed-off-by: Avi Kivity --- include/linux/kvm_host.h | 2 -- virt/kvm/kvm_main.c | 14 +++++--------- 2 files changed, 5 insertions(+), 11 deletions(-) (limited to 'include') diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index 5184817e714a..49c2f2fd281f 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -179,8 +179,6 @@ struct kvm_memory_slot { unsigned long flags; unsigned long *rmap; unsigned long *dirty_bitmap; - unsigned long *dirty_bitmap_head; - unsigned long nr_dirty_pages; struct kvm_arch_memory_slot arch; unsigned long userspace_addr; int user_alloc; diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c index a612bc8c921c..6bd34a6ecca1 100644 --- a/virt/kvm/kvm_main.c +++ b/virt/kvm/kvm_main.c @@ -522,12 +522,11 @@ static void kvm_destroy_dirty_bitmap(struct kvm_memory_slot *memslot) return; if (2 * kvm_dirty_bitmap_bytes(memslot) > PAGE_SIZE) - vfree(memslot->dirty_bitmap_head); + vfree(memslot->dirty_bitmap); else - kfree(memslot->dirty_bitmap_head); + kfree(memslot->dirty_bitmap); memslot->dirty_bitmap = NULL; - memslot->dirty_bitmap_head = NULL; } /* @@ -611,8 +610,7 @@ static int kvm_vm_release(struct inode *inode, struct file *filp) /* * Allocation size is twice as large as the actual dirty bitmap size. - * This makes it possible to do double buffering: see x86's - * kvm_vm_ioctl_get_dirty_log(). + * See x86's kvm_vm_ioctl_get_dirty_log() why this is needed. 
*/ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot) { @@ -627,8 +625,6 @@ static int kvm_create_dirty_bitmap(struct kvm_memory_slot *memslot) if (!memslot->dirty_bitmap) return -ENOMEM; - memslot->dirty_bitmap_head = memslot->dirty_bitmap; - memslot->nr_dirty_pages = 0; #endif /* !CONFIG_S390 */ return 0; } @@ -1476,8 +1472,8 @@ void mark_page_dirty_in_slot(struct kvm *kvm, struct kvm_memory_slot *memslot, if (memslot && memslot->dirty_bitmap) { unsigned long rel_gfn = gfn - memslot->base_gfn; - if (!test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap)) - memslot->nr_dirty_pages++; + /* TODO: introduce set_bit_le() and use it */ + test_and_set_bit_le(rel_gfn, memslot->dirty_bitmap); } } -- cgit v1.2.3 From ce580fe5190dec4d872e7925946b0aec1f694370 Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Thu, 4 Aug 2011 13:51:11 -0300 Subject: [media] v4l: Introduce integer menu controls Create a new control type called V4L2_CTRL_TYPE_INTEGER_MENU. Integer menu controls are just like menu controls but the menu items are 64-bit integers rather than strings. Signed-off-by: Sakari Ailus Acked-by: Laurent Pinchart Tested-by: Sylwester Nawrocki Signed-off-by: Mauro Carvalho Chehab --- drivers/media/video/v4l2-ctrls.c | 74 +++++++++++++++++++++++++++++++--------- include/linux/videodev2.h | 6 +++- include/media/v4l2-ctrls.h | 6 +++- 3 files changed, 67 insertions(+), 19 deletions(-) (limited to 'include') diff --git a/drivers/media/video/v4l2-ctrls.c b/drivers/media/video/v4l2-ctrls.c index 18015c0a8d31..3e0a72dec994 100644 --- a/drivers/media/video/v4l2-ctrls.c +++ b/drivers/media/video/v4l2-ctrls.c @@ -852,7 +852,8 @@ static void fill_event(struct v4l2_event *ev, struct v4l2_ctrl *ctrl, u32 change ev->u.ctrl.value64 = ctrl->cur.val64; ev->u.ctrl.minimum = ctrl->minimum; ev->u.ctrl.maximum = ctrl->maximum; - if (ctrl->type == V4L2_CTRL_TYPE_MENU) + if (ctrl->type == V4L2_CTRL_TYPE_MENU + || ctrl->type == V4L2_CTRL_TYPE_INTEGER_MENU) ev->u.ctrl.step = 1; else ev->u.ctrl.step = ctrl->step; @@ -1083,10 +1084,13 @@ static int validate_new_int(const struct v4l2_ctrl *ctrl, s32 *pval) return 0; case V4L2_CTRL_TYPE_MENU: + case V4L2_CTRL_TYPE_INTEGER_MENU: if (val < ctrl->minimum || val > ctrl->maximum) return -ERANGE; - if (ctrl->qmenu[val][0] == '\0' || - (ctrl->menu_skip_mask & (1 << val))) + if (ctrl->menu_skip_mask & (1 << val)) + return -EINVAL; + if (ctrl->type == V4L2_CTRL_TYPE_MENU && + ctrl->qmenu[val][0] == '\0') return -EINVAL; return 0; @@ -1114,6 +1118,7 @@ static int validate_new(const struct v4l2_ctrl *ctrl, struct v4l2_ext_control *c case V4L2_CTRL_TYPE_INTEGER: case V4L2_CTRL_TYPE_BOOLEAN: case V4L2_CTRL_TYPE_MENU: + case V4L2_CTRL_TYPE_INTEGER_MENU: case V4L2_CTRL_TYPE_BITMASK: case V4L2_CTRL_TYPE_BUTTON: case V4L2_CTRL_TYPE_CTRL_CLASS: @@ -1343,7 +1348,8 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl, const struct v4l2_ctrl_ops *ops, u32 id, const char *name, enum v4l2_ctrl_type type, s32 min, s32 max, u32 step, s32 def, - u32 flags, const char * const *qmenu, void *priv) + u32 flags, const char * const *qmenu, + const s64 *qmenu_int, void *priv) { struct v4l2_ctrl *ctrl; unsigned sz_extra = 0; @@ -1356,6 +1362,7 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl, (type == V4L2_CTRL_TYPE_INTEGER && step == 0) || (type == V4L2_CTRL_TYPE_BITMASK && max == 0) || (type == V4L2_CTRL_TYPE_MENU && qmenu == NULL) || + (type == V4L2_CTRL_TYPE_INTEGER_MENU && qmenu_int == NULL) || (type == V4L2_CTRL_TYPE_STRING && max == 0)) { 
handler_set_err(hdl, -ERANGE); return NULL; @@ -1366,6 +1373,7 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl, } if ((type == V4L2_CTRL_TYPE_INTEGER || type == V4L2_CTRL_TYPE_MENU || + type == V4L2_CTRL_TYPE_INTEGER_MENU || type == V4L2_CTRL_TYPE_BOOLEAN) && (def < min || def > max)) { handler_set_err(hdl, -ERANGE); @@ -1400,7 +1408,10 @@ static struct v4l2_ctrl *v4l2_ctrl_new(struct v4l2_ctrl_handler *hdl, ctrl->minimum = min; ctrl->maximum = max; ctrl->step = step; - ctrl->qmenu = qmenu; + if (type == V4L2_CTRL_TYPE_MENU) + ctrl->qmenu = qmenu; + else if (type == V4L2_CTRL_TYPE_INTEGER_MENU) + ctrl->qmenu_int = qmenu_int; ctrl->priv = priv; ctrl->cur.val = ctrl->val = ctrl->default_value = def; @@ -1427,6 +1438,7 @@ struct v4l2_ctrl *v4l2_ctrl_new_custom(struct v4l2_ctrl_handler *hdl, struct v4l2_ctrl *ctrl; const char *name = cfg->name; const char * const *qmenu = cfg->qmenu; + const s64 *qmenu_int = cfg->qmenu_int; enum v4l2_ctrl_type type = cfg->type; u32 flags = cfg->flags; s32 min = cfg->min; @@ -1438,18 +1450,24 @@ struct v4l2_ctrl *v4l2_ctrl_new_custom(struct v4l2_ctrl_handler *hdl, v4l2_ctrl_fill(cfg->id, &name, &type, &min, &max, &step, &def, &flags); - is_menu = (cfg->type == V4L2_CTRL_TYPE_MENU); + is_menu = (cfg->type == V4L2_CTRL_TYPE_MENU || + cfg->type == V4L2_CTRL_TYPE_INTEGER_MENU); if (is_menu) WARN_ON(step); else WARN_ON(cfg->menu_skip_mask); - if (is_menu && qmenu == NULL) + if (cfg->type == V4L2_CTRL_TYPE_MENU && qmenu == NULL) qmenu = v4l2_ctrl_get_menu(cfg->id); + else if (cfg->type == V4L2_CTRL_TYPE_INTEGER_MENU && + qmenu_int == NULL) { + handler_set_err(hdl, -EINVAL); + return NULL; + } ctrl = v4l2_ctrl_new(hdl, cfg->ops, cfg->id, name, type, min, max, is_menu ? cfg->menu_skip_mask : step, - def, flags, qmenu, priv); + def, flags, qmenu, qmenu_int, priv); if (ctrl) ctrl->is_private = cfg->is_private; return ctrl; @@ -1466,12 +1484,13 @@ struct v4l2_ctrl *v4l2_ctrl_new_std(struct v4l2_ctrl_handler *hdl, u32 flags; v4l2_ctrl_fill(id, &name, &type, &min, &max, &step, &def, &flags); - if (type == V4L2_CTRL_TYPE_MENU) { + if (type == V4L2_CTRL_TYPE_MENU + || type == V4L2_CTRL_TYPE_INTEGER_MENU) { handler_set_err(hdl, -EINVAL); return NULL; } return v4l2_ctrl_new(hdl, ops, id, name, type, - min, max, step, def, flags, NULL, NULL); + min, max, step, def, flags, NULL, NULL, NULL); } EXPORT_SYMBOL(v4l2_ctrl_new_std); @@ -1493,7 +1512,7 @@ struct v4l2_ctrl *v4l2_ctrl_new_std_menu(struct v4l2_ctrl_handler *hdl, return NULL; } return v4l2_ctrl_new(hdl, ops, id, name, type, - 0, max, mask, def, flags, qmenu, NULL); + 0, max, mask, def, flags, qmenu, NULL, NULL); } EXPORT_SYMBOL(v4l2_ctrl_new_std_menu); @@ -1659,6 +1678,9 @@ static void log_ctrl(const struct v4l2_ctrl *ctrl, case V4L2_CTRL_TYPE_MENU: printk(KERN_CONT "%s", ctrl->qmenu[ctrl->cur.val]); break; + case V4L2_CTRL_TYPE_INTEGER_MENU: + printk(KERN_CONT "%lld", ctrl->qmenu_int[ctrl->cur.val]); + break; case V4L2_CTRL_TYPE_BITMASK: printk(KERN_CONT "0x%08x", ctrl->cur.val); break; @@ -1795,7 +1817,8 @@ int v4l2_queryctrl(struct v4l2_ctrl_handler *hdl, struct v4l2_queryctrl *qc) qc->minimum = ctrl->minimum; qc->maximum = ctrl->maximum; qc->default_value = ctrl->default_value; - if (ctrl->type == V4L2_CTRL_TYPE_MENU) + if (ctrl->type == V4L2_CTRL_TYPE_MENU + || ctrl->type == V4L2_CTRL_TYPE_INTEGER_MENU) qc->step = 1; else qc->step = ctrl->step; @@ -1825,16 +1848,33 @@ int v4l2_querymenu(struct v4l2_ctrl_handler *hdl, struct v4l2_querymenu *qm) qm->reserved = 0; /* Sanity checks */ - if 
(ctrl->qmenu == NULL || - i < ctrl->minimum || i > ctrl->maximum) + switch (ctrl->type) { + case V4L2_CTRL_TYPE_MENU: + if (ctrl->qmenu == NULL) + return -EINVAL; + break; + case V4L2_CTRL_TYPE_INTEGER_MENU: + if (ctrl->qmenu_int == NULL) + return -EINVAL; + break; + default: + return -EINVAL; + } + + if (i < ctrl->minimum || i > ctrl->maximum) return -EINVAL; + /* Use mask to see if this menu item should be skipped */ if (ctrl->menu_skip_mask & (1 << i)) return -EINVAL; /* Empty menu items should also be skipped */ - if (ctrl->qmenu[i] == NULL || ctrl->qmenu[i][0] == '\0') - return -EINVAL; - strlcpy(qm->name, ctrl->qmenu[i], sizeof(qm->name)); + if (ctrl->type == V4L2_CTRL_TYPE_MENU) { + if (ctrl->qmenu[i] == NULL || ctrl->qmenu[i][0] == '\0') + return -EINVAL; + strlcpy(qm->name, ctrl->qmenu[i], sizeof(qm->name)); + } else { + qm->value = ctrl->qmenu_int[i]; + } return 0; } EXPORT_SYMBOL(v4l2_querymenu); diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index c9c9a4680cc5..e69cacc9e9ea 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -1151,6 +1151,7 @@ enum v4l2_ctrl_type { V4L2_CTRL_TYPE_CTRL_CLASS = 6, V4L2_CTRL_TYPE_STRING = 7, V4L2_CTRL_TYPE_BITMASK = 8, + V4L2_CTRL_TYPE_INTEGER_MENU = 9, }; /* Used in the VIDIOC_QUERYCTRL ioctl for querying controls */ @@ -1170,7 +1171,10 @@ struct v4l2_queryctrl { struct v4l2_querymenu { __u32 id; __u32 index; - __u8 name[32]; /* Whatever */ + union { + __u8 name[32]; /* Whatever */ + __s64 value; + }; __u32 reserved; }; diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h index 3dbd06638506..533315bd74e0 100644 --- a/include/media/v4l2-ctrls.h +++ b/include/media/v4l2-ctrls.h @@ -130,7 +130,10 @@ struct v4l2_ctrl { u32 step; u32 menu_skip_mask; }; - const char * const *qmenu; + union { + const char * const *qmenu; + const s64 *qmenu_int; + }; unsigned long flags; union { s32 val; @@ -220,6 +223,7 @@ struct v4l2_ctrl_config { u32 flags; u32 menu_skip_mask; const char * const *qmenu; + const s64 *qmenu_int; unsigned int is_private:1; }; -- cgit v1.2.3 From ae184cda8d0eebfea6cf217abc3f94a7cfffe24d Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Fri, 14 Oct 2011 14:14:26 -0300 Subject: [media] v4l: VIDIOC_SUBDEV_S_SELECTION and VIDIOC_SUBDEV_G_SELECTION IOCTLs Add support for VIDIOC_SUBDEV_S_SELECTION and VIDIOC_SUBDEV_G_SELECTION IOCTLs. They replace functionality provided by VIDIOC_SUBDEV_S_CROP and VIDIOC_SUBDEV_G_CROP IOCTLs and also add new functionality (composing). VIDIOC_SUBDEV_G_CROP and VIDIOC_SUBDEV_S_CROP continue to be supported. 
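A sketch of how userspace might call the new ioctl (illustrative only; the pad number, target choice and helper name are arbitrary):

#include <string.h>
#include <sys/ioctl.h>
#include <linux/videodev2.h>
#include <linux/v4l2-subdev.h>

/* Hypothetical helper: read the active crop rectangle of pad 0 from an
 * opened subdev node (fd). */
static int example_get_active_crop(int fd, struct v4l2_rect *r)
{
	struct v4l2_subdev_selection sel;

	memset(&sel, 0, sizeof(sel));
	sel.which = V4L2_SUBDEV_FORMAT_ACTIVE;
	sel.pad = 0;
	sel.target = V4L2_SUBDEV_SEL_TGT_CROP_ACTUAL;

	if (ioctl(fd, VIDIOC_SUBDEV_G_SELECTION, &sel) < 0)
		return -1;

	*r = sel.r;
	return 0;
}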
Signed-off-by: Sakari Ailus Acked-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab --- drivers/media/video/v4l2-subdev.c | 42 +++++++++++++++++++++++++++++---------- include/linux/v4l2-subdev.h | 41 ++++++++++++++++++++++++++++++++++++++ include/media/v4l2-subdev.h | 21 ++++++++++++++++---- 3 files changed, 90 insertions(+), 14 deletions(-) (limited to 'include') diff --git a/drivers/media/video/v4l2-subdev.c b/drivers/media/video/v4l2-subdev.c index 6fe88e965a8c..7d225389bfb1 100644 --- a/drivers/media/video/v4l2-subdev.c +++ b/drivers/media/video/v4l2-subdev.c @@ -35,14 +35,9 @@ static int subdev_fh_init(struct v4l2_subdev_fh *fh, struct v4l2_subdev *sd) { #if defined(CONFIG_VIDEO_V4L2_SUBDEV_API) - /* Allocate try format and crop in the same memory block */ - fh->try_fmt = kzalloc((sizeof(*fh->try_fmt) + sizeof(*fh->try_crop)) - * sd->entity.num_pads, GFP_KERNEL); - if (fh->try_fmt == NULL) + fh->pad = kzalloc(sizeof(*fh->pad) * sd->entity.num_pads, GFP_KERNEL); + if (fh->pad == NULL) return -ENOMEM; - - fh->try_crop = (struct v4l2_rect *) - (fh->try_fmt + sd->entity.num_pads); #endif return 0; } @@ -50,9 +45,8 @@ static int subdev_fh_init(struct v4l2_subdev_fh *fh, struct v4l2_subdev *sd) static void subdev_fh_free(struct v4l2_subdev_fh *fh) { #if defined(CONFIG_VIDEO_V4L2_SUBDEV_API) - kfree(fh->try_fmt); - fh->try_fmt = NULL; - fh->try_crop = NULL; + kfree(fh->pad); + fh->pad = NULL; #endif } @@ -293,6 +287,34 @@ static long subdev_do_ioctl(struct file *file, unsigned int cmd, void *arg) return v4l2_subdev_call(sd, pad, enum_frame_interval, subdev_fh, fie); } + + case VIDIOC_SUBDEV_G_SELECTION: { + struct v4l2_subdev_selection *sel = arg; + + if (sel->which != V4L2_SUBDEV_FORMAT_TRY && + sel->which != V4L2_SUBDEV_FORMAT_ACTIVE) + return -EINVAL; + + if (sel->pad >= sd->entity.num_pads) + return -EINVAL; + + return v4l2_subdev_call( + sd, pad, get_selection, subdev_fh, sel); + } + + case VIDIOC_SUBDEV_S_SELECTION: { + struct v4l2_subdev_selection *sel = arg; + + if (sel->which != V4L2_SUBDEV_FORMAT_TRY && + sel->which != V4L2_SUBDEV_FORMAT_ACTIVE) + return -EINVAL; + + if (sel->pad >= sd->entity.num_pads) + return -EINVAL; + + return v4l2_subdev_call( + sd, pad, set_selection, subdev_fh, sel); + } #endif default: return v4l2_subdev_call(sd, core, ioctl, cmd, arg); diff --git a/include/linux/v4l2-subdev.h b/include/linux/v4l2-subdev.h index ed29cbbebfef..812019ee1e06 100644 --- a/include/linux/v4l2-subdev.h +++ b/include/linux/v4l2-subdev.h @@ -123,6 +123,43 @@ struct v4l2_subdev_frame_interval_enum { __u32 reserved[9]; }; +#define V4L2_SUBDEV_SEL_FLAG_SIZE_GE (1 << 0) +#define V4L2_SUBDEV_SEL_FLAG_SIZE_LE (1 << 1) +#define V4L2_SUBDEV_SEL_FLAG_KEEP_CONFIG (1 << 2) + +/* active cropping area */ +#define V4L2_SUBDEV_SEL_TGT_CROP_ACTUAL 0x0000 +/* cropping bounds */ +#define V4L2_SUBDEV_SEL_TGT_CROP_BOUNDS 0x0002 +/* current composing area */ +#define V4L2_SUBDEV_SEL_TGT_COMPOSE_ACTUAL 0x0100 +/* composing bounds */ +#define V4L2_SUBDEV_SEL_TGT_COMPOSE_BOUNDS 0x0102 + + +/** + * struct v4l2_subdev_selection - selection info + * + * @which: either V4L2_SUBDEV_FORMAT_ACTIVE or V4L2_SUBDEV_FORMAT_TRY + * @pad: pad number, as reported by the media API + * @target: selection target, used to choose one of possible rectangles + * @flags: constraint flags + * @r: coordinates of the selection window + * @reserved: for future use, set to zero for now + * + * Hardware may use multiple helper windows to process a video stream. 
+ * The structure is used to exchange these selection areas between + * an application and a driver. + */ +struct v4l2_subdev_selection { + __u32 which; + __u32 pad; + __u32 target; + __u32 flags; + struct v4l2_rect r; + __u32 reserved[8]; +}; + #define VIDIOC_SUBDEV_G_FMT _IOWR('V', 4, struct v4l2_subdev_format) #define VIDIOC_SUBDEV_S_FMT _IOWR('V', 5, struct v4l2_subdev_format) #define VIDIOC_SUBDEV_G_FRAME_INTERVAL \ @@ -137,5 +174,9 @@ struct v4l2_subdev_frame_interval_enum { _IOWR('V', 75, struct v4l2_subdev_frame_interval_enum) #define VIDIOC_SUBDEV_G_CROP _IOWR('V', 59, struct v4l2_subdev_crop) #define VIDIOC_SUBDEV_S_CROP _IOWR('V', 60, struct v4l2_subdev_crop) +#define VIDIOC_SUBDEV_G_SELECTION \ + _IOWR('V', 61, struct v4l2_subdev_selection) +#define VIDIOC_SUBDEV_S_SELECTION \ + _IOWR('V', 62, struct v4l2_subdev_selection) #endif diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h index f0f3358d1b1b..feab950bc8ab 100644 --- a/include/media/v4l2-subdev.h +++ b/include/media/v4l2-subdev.h @@ -466,6 +466,10 @@ struct v4l2_subdev_pad_ops { struct v4l2_subdev_crop *crop); int (*get_crop)(struct v4l2_subdev *sd, struct v4l2_subdev_fh *fh, struct v4l2_subdev_crop *crop); + int (*get_selection)(struct v4l2_subdev *sd, struct v4l2_subdev_fh *fh, + struct v4l2_subdev_selection *sel); + int (*set_selection)(struct v4l2_subdev *sd, struct v4l2_subdev_fh *fh, + struct v4l2_subdev_selection *sel); }; struct v4l2_subdev_ops { @@ -549,8 +553,11 @@ struct v4l2_subdev { struct v4l2_subdev_fh { struct v4l2_fh vfh; #if defined(CONFIG_VIDEO_V4L2_SUBDEV_API) - struct v4l2_mbus_framefmt *try_fmt; - struct v4l2_rect *try_crop; + struct { + struct v4l2_mbus_framefmt try_fmt; + struct v4l2_rect try_crop; + struct v4l2_rect try_compose; + } *pad; #endif }; @@ -561,13 +568,19 @@ struct v4l2_subdev_fh { static inline struct v4l2_mbus_framefmt * v4l2_subdev_get_try_format(struct v4l2_subdev_fh *fh, unsigned int pad) { - return &fh->try_fmt[pad]; + return &fh->pad[pad].try_fmt; } static inline struct v4l2_rect * v4l2_subdev_get_try_crop(struct v4l2_subdev_fh *fh, unsigned int pad) { - return &fh->try_crop[pad]; + return &fh->pad[pad].try_crop; +} + +static inline struct v4l2_rect * +v4l2_subdev_get_try_compose(struct v4l2_subdev_fh *fh, unsigned int pad) +{ + return &fh->pad[pad].try_compose; } #endif -- cgit v1.2.3 From c5a766ceb497078459115fcbd1412917083aa4a5 Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Wed, 15 Feb 2012 22:58:12 -0300 Subject: [media] v4l: vdev_to_v4l2_subdev() should have return type "struct v4l2_subdev *" vdev_to_v4l2_subdev() should return struct v4l2_subdev *, not void *. Fix this.
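A two-line sketch of what the stronger return type buys, where vdev is an arbitrary struct video_device pointer; with the old void * return the first, wrong assignment compiled silently, while with the cast in place the compiler flags it:

  struct media_entity *ent = vdev_to_v4l2_subdev(vdev); /* now warns */
  struct v4l2_subdev *sd = vdev_to_v4l2_subdev(vdev);   /* intended use */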
Signed-off-by: Sakari Ailus Acked-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab --- include/media/v4l2-subdev.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h index feab950bc8ab..bcaf6b80bb20 100644 --- a/include/media/v4l2-subdev.h +++ b/include/media/v4l2-subdev.h @@ -545,7 +545,7 @@ struct v4l2_subdev { #define media_entity_to_v4l2_subdev(ent) \ container_of(ent, struct v4l2_subdev, entity) #define vdev_to_v4l2_subdev(vdev) \ - video_get_drvdata(vdev) + ((struct v4l2_subdev *)video_get_drvdata(vdev)) /* * Used for storing subdev information per file handle -- cgit v1.2.3 From 5e6ff7c17bf468b8bc012e49174771e5f718e72c Mon Sep 17 00:00:00 2001 From: Sakari Ailus Date: Wed, 15 Feb 2012 22:57:22 -0300 Subject: [media] v4l: Check pad number in get try pointer functions Unify functions to get try pointers and validate the pad number accessed by the user. Signed-off-by: Sakari Ailus Acked-by: Laurent Pinchart Signed-off-by: Mauro Carvalho Chehab --- include/media/v4l2-subdev.h | 30 +++++++++++++----------------- 1 file changed, 13 insertions(+), 17 deletions(-) (limited to 'include') diff --git a/include/media/v4l2-subdev.h b/include/media/v4l2-subdev.h index bcaf6b80bb20..7e850355a6f0 100644 --- a/include/media/v4l2-subdev.h +++ b/include/media/v4l2-subdev.h @@ -565,23 +565,19 @@ struct v4l2_subdev_fh { container_of(fh, struct v4l2_subdev_fh, vfh) #if defined(CONFIG_VIDEO_V4L2_SUBDEV_API) -static inline struct v4l2_mbus_framefmt * -v4l2_subdev_get_try_format(struct v4l2_subdev_fh *fh, unsigned int pad) -{ - return &fh->pad[pad].try_fmt; -} - -static inline struct v4l2_rect * -v4l2_subdev_get_try_crop(struct v4l2_subdev_fh *fh, unsigned int pad) -{ - return &fh->pad[pad].try_crop; -} - -static inline struct v4l2_rect * -v4l2_subdev_get_try_compose(struct v4l2_subdev_fh *fh, unsigned int pad) -{ - return &fh->pad[pad].try_compose; -} +#define __V4L2_SUBDEV_MK_GET_TRY(rtype, fun_name, field_name) \ + static inline struct rtype * \ + v4l2_subdev_get_try_##fun_name(struct v4l2_subdev_fh *fh, \ + unsigned int pad) \ + { \ + BUG_ON(unlikely(pad >= vdev_to_v4l2_subdev( \ + fh->vfh.vdev)->entity.num_pads)); \ + return &fh->pad[pad].field_name; \ + } + +__V4L2_SUBDEV_MK_GET_TRY(v4l2_mbus_framefmt, format, try_fmt) +__V4L2_SUBDEV_MK_GET_TRY(v4l2_rect, crop, try_crop) +__V4L2_SUBDEV_MK_GET_TRY(v4l2_rect, compose, try_compose) #endif extern const struct v4l2_file_operations v4l2_subdev_fops; -- cgit v1.2.3 From 9d454d48ebcd9938ac60a245fa545d9db1035f1a Mon Sep 17 00:00:00 2001 From: Anssi Hannula Date: Sun, 1 Apr 2012 16:41:46 -0300 Subject: [media] ati_remote: add support for Medion X10 Digitainer remote Add support for another Medion X10 remote. This was apparently originally used with the Medion Digitainer box, but is now sold separately without any Digitainer labeling. A peculiarity of this remote is a scrollwheel in place of up/down buttons. Each direction is mapped to 8 different scancodes, each corresponding to 1..8 notches, allowing multiple notches in the same direction to be transmitted in a single scancode. The driver transforms the multi-notch scancodes to multiple events of the single-notch scancode.
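A sketch of that transformation, assuming a hypothetical report_key() helper (the actual driver change is in the diff below):

  /* Bits 0-2 of the scancode carry the notch count minus one and
   * bit 3 carries the direction; replay the base scancode once
   * per notch. */
  int count = (scancode & 0x07) + 1;  /* 1..8 notches */
  u8 base = scancode & 0x78;          /* 0x70 = down, 0x78 = up */

  while (count--)
          report_key(base);

The scancode layout, for reference: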
(0x70..0x77 = 1..8 notches down, 0x78..0x7f = 1..8 notches up) Since the scrollwheel scancodes are the same that are used for mouse on some other X10 (ati_remote) remotes, the driver will now check whether the active keymap has a keycode defined for the single-notch scancode when a mouse/scrollwheel scancode (0x70..0x7f) is received. If set, scrollwheel is assumed, otherwise mouse is assumed. This remote ships with a different receiver than the already supported Medion X10 remote, but they share the same USB ID. The only difference in the USB descriptors is that the Digitainer receiver has the Remote Wakeup bit set in bmAttributes of the Configuration Descriptor. Therefore that is used to select the default keymap. Thanks to Stephan Raue from OpenELEC (www.openelec.tv) for providing me both a Medion X10 Digitainer remote+receiver and an already supported Medion X10 remote+receiver. Thanks to Martin Beyss for providing some useful information about the remote (including the "Digitainer" name). This patch has been tested by both of them and myself. Signed-off-by: Anssi Hannula Tested-by: Stephan Raue Tested-by: Martin Beyss Signed-off-by: Mauro Carvalho Chehab --- drivers/media/rc/ati_remote.c | 90 +++++++++++----- drivers/media/rc/keymaps/Makefile | 1 + .../media/rc/keymaps/rc-medion-x10-digitainer.c | 115 +++++++++++++++++++++ include/media/rc-map.h | 1 + 4 files changed, 179 insertions(+), 28 deletions(-) create mode 100644 drivers/media/rc/keymaps/rc-medion-x10-digitainer.c (limited to 'include') diff --git a/drivers/media/rc/ati_remote.c b/drivers/media/rc/ati_remote.c index 7a35f7afad50..26fa043d3de7 100644 --- a/drivers/media/rc/ati_remote.c +++ b/drivers/media/rc/ati_remote.c @@ -1,7 +1,7 @@ /* * USB ATI Remote support * - * Copyright (c) 2011 Anssi Hannula + * Copyright (c) 2011, 2012 Anssi Hannula * Version 2.2.0 Copyright (c) 2004 Torrey Hoffman * Version 2.1.1 Copyright (c) 2002 Vladimir Dergachev * @@ -157,8 +157,20 @@ struct ati_receiver_type { const char *(*get_default_keymap)(struct usb_interface *interface); }; +static const char *get_medion_keymap(struct usb_interface *interface) +{ + struct usb_device *udev = interface_to_usbdev(interface); + + /* The receiver shipped with the "Digitainer" variant helpfully has + * a single additional bit set in its descriptor. */ + if (udev->actconfig->desc.bmAttributes & USB_CONFIG_ATT_WAKEUP) + return RC_MAP_MEDION_X10_DIGITAINER; + + return RC_MAP_MEDION_X10; +} + static const struct ati_receiver_type type_ati = { .default_keymap = RC_MAP_ATI_X10 }; -static const struct ati_receiver_type type_medion = { .default_keymap = RC_MAP_MEDION_X10 }; +static const struct ati_receiver_type type_medion = { .get_default_keymap = get_medion_keymap }; static const struct ati_receiver_type type_firefly = { .default_keymap = RC_MAP_SNAPSTREAM_FIREFLY }; static struct usb_device_id ati_remote_table[] = { @@ -455,6 +467,7 @@ static void ati_remote_input_report(struct urb *urb) int acc; int remote_num; unsigned char scancode; + u32 wheel_keycode = KEY_RESERVED; int i; /* @@ -494,26 +507,33 @@ static void ati_remote_input_report(struct urb *urb) */ scancode = data[2] & 0x7f; - /* Look up event code index in the mouse translation table. 
*/ - for (i = 0; ati_remote_tbl[i].kind != KIND_END; i++) { - if (scancode == ati_remote_tbl[i].data) { - index = i; - break; + dbginfo(&ati_remote->interface->dev, + "channel 0x%02x; key data %02x, scancode %02x\n", + remote_num, data[2], scancode); + + if (scancode >= 0x70) { + /* + * This is either a mouse or scrollwheel event, depending on + * the remote/keymap. + * Get the keycode assigned to scancode 0x78/0x70. If it is + * set, assume this is a scrollwheel up/down event. + */ + wheel_keycode = rc_g_keycode_from_table(ati_remote->rdev, + scancode & 0x78); + + if (wheel_keycode == KEY_RESERVED) { + /* scrollwheel was not mapped, assume mouse */ + + /* Look up event code index in the mouse translation table. */ + for (i = 0; ati_remote_tbl[i].kind != KIND_END; i++) { + if (scancode == ati_remote_tbl[i].data) { + index = i; + break; + } + } } } - if (index >= 0) { - dbginfo(&ati_remote->interface->dev, - "channel 0x%02x; mouse data %02x; index %d; keycode %d\n", - remote_num, data[2], index, ati_remote_tbl[index].code); - if (!dev) - return; /* no mouse device */ - } else - dbginfo(&ati_remote->interface->dev, - "channel 0x%02x; key data %02x, scancode %02x\n", - remote_num, data[2], scancode); - - if (index >= 0 && ati_remote_tbl[index].kind == KIND_LITERAL) { input_event(dev, ati_remote_tbl[index].type, ati_remote_tbl[index].code, @@ -552,15 +572,29 @@ static void ati_remote_input_report(struct urb *urb) if (index < 0) { /* Not a mouse event, hand it to rc-core. */ - - /* - * We don't use the rc-core repeat handling yet as - * it would cause ghost repeats which would be a - * regression for this driver. - */ - rc_keydown_notimeout(ati_remote->rdev, scancode, - data[2]); - rc_keyup(ati_remote->rdev); + int count = 1; + + if (wheel_keycode != KEY_RESERVED) { + /* + * This is a scrollwheel event, send the + * scroll up (0x78) / down (0x70) scancode + * repeatedly as many times as indicated by + * rest of the scancode. + */ + count = (scancode & 0x07) + 1; + scancode &= 0x78; + } + + while (count--) { + /* + * We don't use the rc-core repeat handling yet as + * it would cause ghost repeats which would be a + * regression for this driver. + */ + rc_keydown_notimeout(ati_remote->rdev, scancode, + data[2]); + rc_keyup(ati_remote->rdev); + } return; } diff --git a/drivers/media/rc/keymaps/Makefile b/drivers/media/rc/keymaps/Makefile index 49ce2662f56b..38ff6e0e099a 100644 --- a/drivers/media/rc/keymaps/Makefile +++ b/drivers/media/rc/keymaps/Makefile @@ -52,6 +52,7 @@ obj-$(CONFIG_RC_MAP) += rc-adstech-dvb-t-pci.o \ rc-lme2510.o \ rc-manli.o \ rc-medion-x10.o \ + rc-medion-x10-digitainer.o \ rc-msi-digivox-ii.o \ rc-msi-digivox-iii.o \ rc-msi-tvanywhere.o \ diff --git a/drivers/media/rc/keymaps/rc-medion-x10-digitainer.c b/drivers/media/rc/keymaps/rc-medion-x10-digitainer.c new file mode 100644 index 000000000000..0a5ce84d9fd8 --- /dev/null +++ b/drivers/media/rc/keymaps/rc-medion-x10-digitainer.c @@ -0,0 +1,115 @@ +/* + * Medion X10 RF remote keytable (Digitainer variant) + * + * Copyright (C) 2012 Anssi Hannula + * + * This keymap is for a variant that has a distinctive scrollwheel instead of + * up/down buttons (tested with P/N 40009936 / 20018268), reportedly + * originally shipped with Medion Digitainer but now sold separately simply as + * an "X10" remote. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License along + * with this program; if not, write to the Free Software Foundation, Inc., + * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + */ + +#include +#include + +static struct rc_map_table medion_x10_digitainer[] = { + { 0x02, KEY_POWER }, + + { 0x2c, KEY_TV }, + { 0x2d, KEY_VIDEO }, + { 0x04, KEY_DVD }, /* CD/DVD */ + { 0x16, KEY_TEXT }, /* "teletext" icon, i.e. a screen with lines */ + { 0x06, KEY_AUDIO }, + { 0x2e, KEY_RADIO }, + { 0x31, KEY_EPG }, /* a screen with an open book */ + { 0x05, KEY_IMAGES }, /* Photo */ + { 0x2f, KEY_INFO }, + + { 0x78, KEY_UP }, /* scrollwheel up 1 notch */ + /* 0x79..0x7f: 2-8 notches, driver repeats 0x78 entry */ + + { 0x70, KEY_DOWN }, /* scrollwheel down 1 notch */ + /* 0x71..0x77: 2-8 notches, driver repeats 0x70 entry */ + + { 0x19, KEY_MENU }, + { 0x1d, KEY_LEFT }, + { 0x1e, KEY_OK }, /* scrollwheel press */ + { 0x1f, KEY_RIGHT }, + { 0x20, KEY_BACK }, + + { 0x09, KEY_VOLUMEUP }, + { 0x08, KEY_VOLUMEDOWN }, + { 0x00, KEY_MUTE }, + + { 0x1b, KEY_SELECT }, /* also has "U" rotated 90 degrees CCW */ + + { 0x0b, KEY_CHANNELUP }, + { 0x0c, KEY_CHANNELDOWN }, + { 0x1c, KEY_LAST }, + + { 0x32, KEY_RED }, /* also Audio */ + { 0x33, KEY_GREEN }, /* also Subtitle */ + { 0x34, KEY_YELLOW }, /* also Angle */ + { 0x35, KEY_BLUE }, /* also Title */ + + { 0x28, KEY_STOP }, + { 0x29, KEY_PAUSE }, + { 0x25, KEY_PLAY }, + { 0x21, KEY_PREVIOUS }, + { 0x18, KEY_CAMERA }, + { 0x23, KEY_NEXT }, + { 0x24, KEY_REWIND }, + { 0x27, KEY_RECORD }, + { 0x26, KEY_FORWARD }, + + { 0x0d, KEY_1 }, + { 0x0e, KEY_2 }, + { 0x0f, KEY_3 }, + { 0x10, KEY_4 }, + { 0x11, KEY_5 }, + { 0x12, KEY_6 }, + { 0x13, KEY_7 }, + { 0x14, KEY_8 }, + { 0x15, KEY_9 }, + { 0x17, KEY_0 }, +}; + +static struct rc_map_list medion_x10_digitainer_map = { + .map = { + .scan = medion_x10_digitainer, + .size = ARRAY_SIZE(medion_x10_digitainer), + .rc_type = RC_TYPE_OTHER, + .name = RC_MAP_MEDION_X10_DIGITAINER, + } +}; + +static int __init init_rc_map_medion_x10_digitainer(void) +{ + return rc_map_register(&medion_x10_digitainer_map); +} + +static void __exit exit_rc_map_medion_x10_digitainer(void) +{ + rc_map_unregister(&medion_x10_digitainer_map); +} + +module_init(init_rc_map_medion_x10_digitainer) +module_exit(exit_rc_map_medion_x10_digitainer) + +MODULE_DESCRIPTION("Medion X10 RF remote keytable (Digitainer variant)"); +MODULE_AUTHOR("Anssi Hannula "); +MODULE_LICENSE("GPL"); diff --git a/include/media/rc-map.h b/include/media/rc-map.h index 8db6741c1256..88583a6ff7f2 100644 --- a/include/media/rc-map.h +++ b/include/media/rc-map.h @@ -113,6 +113,7 @@ void rc_map_init(void); #define RC_MAP_LME2510 "rc-lme2510" #define RC_MAP_MANLI "rc-manli" #define RC_MAP_MEDION_X10 "rc-medion-x10" +#define RC_MAP_MEDION_X10_DIGITAINER "rc-medion-x10-digitainer" #define RC_MAP_MSI_DIGIVOX_II "rc-msi-digivox-ii" #define RC_MAP_MSI_DIGIVOX_III "rc-msi-digivox-iii" #define RC_MAP_MSI_TVANYWHERE_PLUS "rc-msi-tvanywhere-plus" -- cgit 
v1.2.3 From 2db938bee32e7469ca8ed9bfb3a05535f28c680d Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 21 Feb 2011 17:25:37 +0100 Subject: jbd: Refine commit writeout logic Currently we write out all journal buffers in WRITE_SYNC mode. This improves performance for fsync heavy workloads but hinders performance when writes are mostly asynchronous, most noticeably it slows down readers and users complain about slow desktop response etc. So submit writes as asynchronous in the normal case and only submit writes as WRITE_SYNC if we detect someone is waiting for the current transaction commit. I've gathered some numbers to back this change. The first is the read latency test. It measures time to read 1 MB after several seconds of sleeping in the presence of streaming writes. Top 10 times (out of 90) in us:

  Before     After
  2131586    697473
  1709932    557487
  1564598    535642
  1480462    347573
  1478579    323153
  1408496    222181
  1388960    181273
  1329565    181070
  1252486    172832
  1223265    172278

  Average:
  619377     82180

So the improvement in both maximum and average latency is massive. I've measured fsync throughput by: fs_mark -n 100 -t 1 -s 16384 -d /mnt/fsync/ -S 1 -L 4 in the presence of a streaming reader. The numbers (fsyncs/s) are:

  Before   After
  9.9      6.3
  6.8      6.0
  6.3      6.2
  5.8      6.1

So fsync performance seems unharmed by this change. Signed-off-by: Jan Kara --- fs/jbd/commit.c | 10 +++++++--- fs/jbd/journal.c | 2 ++ fs/jbd/transaction.c | 2 -- include/linux/jbd.h | 15 +++++++++------ include/trace/events/jbd.h | 24 ++++++++---------------- 5 files changed, 26 insertions(+), 27 deletions(-) (limited to 'include') diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c index f2b9a571f4cf..9d31e6a39205 100644 --- a/fs/jbd/commit.c +++ b/fs/jbd/commit.c @@ -298,6 +298,7 @@ void journal_commit_transaction(journal_t *journal) int tag_flag; int i; struct blk_plug plug; + int write_op = WRITE; /* * First job: lock down the current transaction and wait for @@ -413,13 +414,16 @@ void journal_commit_transaction(journal_t *journal) jbd_debug (3, "JBD: commit phase 2\n"); + if (tid_geq(journal->j_commit_waited, commit_transaction->t_tid)) + write_op = WRITE_SYNC; + /* * Now start flushing things to disk, in the order they appear * on the transaction lists. Data blocks go first.
*/ blk_start_plug(&plug); err = journal_submit_data_buffers(journal, commit_transaction, - WRITE_SYNC); + write_op); blk_finish_plug(&plug); /* @@ -478,7 +482,7 @@ void journal_commit_transaction(journal_t *journal) blk_start_plug(&plug); - journal_write_revoke_records(journal, commit_transaction, WRITE_SYNC); + journal_write_revoke_records(journal, commit_transaction, write_op); /* * If we found any dirty or locked buffers, then we should have @@ -649,7 +653,7 @@ start_journal_io: clear_buffer_dirty(bh); set_buffer_uptodate(bh); bh->b_end_io = journal_end_buffer_io_sync; - submit_bh(WRITE_SYNC, bh); + submit_bh(write_op, bh); } cond_resched(); diff --git a/fs/jbd/journal.c b/fs/jbd/journal.c index 0971e9217808..2047fd77bf38 100644 --- a/fs/jbd/journal.c +++ b/fs/jbd/journal.c @@ -563,6 +563,8 @@ int log_wait_commit(journal_t *journal, tid_t tid) spin_unlock(&journal->j_state_lock); #endif spin_lock(&journal->j_state_lock); + if (!tid_geq(journal->j_commit_waited, tid)) + journal->j_commit_waited = tid; while (tid_gt(tid, journal->j_commit_sequence)) { jbd_debug(1, "JBD: want %d, j_commit_sequence=%d\n", tid, journal->j_commit_sequence); diff --git a/fs/jbd/transaction.c b/fs/jbd/transaction.c index b2a7e5244e39..febc10db5ced 100644 --- a/fs/jbd/transaction.c +++ b/fs/jbd/transaction.c @@ -1433,8 +1433,6 @@ int journal_stop(handle_t *handle) } } - if (handle->h_sync) - transaction->t_synchronous_commit = 1; current->journal_info = NULL; spin_lock(&journal->j_state_lock); spin_lock(&transaction->t_handle_lock); diff --git a/include/linux/jbd.h b/include/linux/jbd.h index d211732b9e99..f265682ae134 100644 --- a/include/linux/jbd.h +++ b/include/linux/jbd.h @@ -479,12 +479,6 @@ struct transaction_s * How many handles used this transaction? [t_handle_lock] */ int t_handle_count; - - /* - * This transaction is being forced and some process is - * waiting for it to finish. - */ - unsigned int t_synchronous_commit:1; }; /** @@ -531,6 +525,8 @@ struct transaction_s * transaction * @j_commit_request: Sequence number of the most recent transaction wanting * commit + * @j_commit_waited: Sequence number of the most recent transaction someone + * is waiting for to commit. * @j_uuid: Uuid of client object. * @j_task: Pointer to the current commit thread for this journal * @j_max_transaction_buffers: Maximum number of metadata buffers to allow in a @@ -695,6 +691,13 @@ struct journal_s */ tid_t j_commit_request; + /* + * Sequence number of the most recent transaction someone is waiting + * for to commit. + * [j_state_lock] + */ + tid_t j_commit_waited; + /* * Journal uuid: identifies the object (filesystem, LVM volume etc) * backed by this journal. 
This will eventually be replaced by an array diff --git a/include/trace/events/jbd.h b/include/trace/events/jbd.h index aff64d82d713..9305e1b5edc3 100644 --- a/include/trace/events/jbd.h +++ b/include/trace/events/jbd.h @@ -36,19 +36,17 @@ DECLARE_EVENT_CLASS(jbd_commit, TP_STRUCT__entry( __field( dev_t, dev ) - __field( char, sync_commit ) __field( int, transaction ) ), TP_fast_assign( __entry->dev = journal->j_fs_dev->bd_dev; - __entry->sync_commit = commit_transaction->t_synchronous_commit; __entry->transaction = commit_transaction->t_tid; ), - TP_printk("dev %d,%d transaction %d sync %d", + TP_printk("dev %d,%d transaction %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->transaction, __entry->sync_commit) + __entry->transaction) ); DEFINE_EVENT(jbd_commit, jbd_start_commit, @@ -87,19 +85,17 @@ TRACE_EVENT(jbd_drop_transaction, TP_STRUCT__entry( __field( dev_t, dev ) - __field( char, sync_commit ) __field( int, transaction ) ), TP_fast_assign( __entry->dev = journal->j_fs_dev->bd_dev; - __entry->sync_commit = commit_transaction->t_synchronous_commit; __entry->transaction = commit_transaction->t_tid; ), - TP_printk("dev %d,%d transaction %d sync %d", + TP_printk("dev %d,%d transaction %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->transaction, __entry->sync_commit) + __entry->transaction) ); TRACE_EVENT(jbd_end_commit, @@ -109,21 +105,19 @@ TRACE_EVENT(jbd_end_commit, TP_STRUCT__entry( __field( dev_t, dev ) - __field( char, sync_commit ) __field( int, transaction ) __field( int, head ) ), TP_fast_assign( __entry->dev = journal->j_fs_dev->bd_dev; - __entry->sync_commit = commit_transaction->t_synchronous_commit; __entry->transaction = commit_transaction->t_tid; __entry->head = journal->j_tail_sequence; ), - TP_printk("dev %d,%d transaction %d sync %d head %d", + TP_printk("dev %d,%d transaction %d head %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->transaction, __entry->sync_commit, __entry->head) + __entry->transaction, __entry->head) ); TRACE_EVENT(jbd_do_submit_data, @@ -133,19 +127,17 @@ TRACE_EVENT(jbd_do_submit_data, TP_STRUCT__entry( __field( dev_t, dev ) - __field( char, sync_commit ) __field( int, transaction ) ), TP_fast_assign( __entry->dev = journal->j_fs_dev->bd_dev; - __entry->sync_commit = commit_transaction->t_synchronous_commit; __entry->transaction = commit_transaction->t_tid; ), - TP_printk("dev %d,%d transaction %d sync %d", + TP_printk("dev %d,%d transaction %d", MAJOR(__entry->dev), MINOR(__entry->dev), - __entry->transaction, __entry->sync_commit) + __entry->transaction) ); TRACE_EVENT(jbd_cleanup_journal_tail, -- cgit v1.2.3 From a9aa53df6e6c768fc0f25a7c80ba586b0290720a Mon Sep 17 00:00:00 2001 From: Simo Sorce Date: Thu, 29 Mar 2012 19:18:19 -0400 Subject: svcauth: remove unused define Signed-off-by: Simo Sorce --- include/linux/sunrpc/svcauth.h | 1 - 1 file changed, 1 deletion(-) (limited to 'include') diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 548790e9113b..2e2af101b59c 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -16,7 +16,6 @@ #include #include -#define SVC_CRED_NGROUPS 32 struct svc_cred { uid_t cr_uid; gid_t cr_gid; -- cgit v1.2.3 From db3a35326362624dd4d8473e676d63afa52bedcc Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Wed, 28 Mar 2012 19:09:08 +0400 Subject: nfsd: add link to owner cache detail to svc_export structure Without info about the owner cache detail it won't be possible to find out which per-net cache detail has to be used.
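In effect, the new back-pointer lets helpers drop references against the right per-net cache instead of the global svc_export_cache symbol; a minimal sketch of the idea (exp_put_sketch is an illustrative name, and the follow-up patch below makes exactly this change to exp_put()):

  static inline void exp_put_sketch(struct svc_export *exp)
  {
          /* per-net cache detail recorded at parse/init time */
          cache_put(&exp->h, exp->cd);
  }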
Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. Bruce Fields --- fs/nfsd/export.c | 10 +++++----- include/linux/nfsd/export.h | 1 + 2 files changed, 6 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 71c5ce35a1a5..99ea4c00240c 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -525,6 +525,7 @@ static int svc_export_parse(struct cache_detail *cd, char *mesg, int mlen) goto out1; exp.ex_client = dom; + exp.cd = cd; /* expiry */ err = -EINVAL; @@ -672,6 +673,7 @@ static void svc_export_init(struct cache_head *cnew, struct cache_head *citem) new->ex_fslocs.locations = NULL; new->ex_fslocs.locations_count = 0; new->ex_fslocs.migrated = 0; + new->cd = item->cd; } static void export_update(struct cache_head *cnew, struct cache_head *citem) @@ -739,8 +741,7 @@ svc_export_lookup(struct svc_export *exp) struct cache_head *ch; int hash = svc_export_hash(exp); - ch = sunrpc_cache_lookup(&svc_export_cache, &exp->h, - hash); + ch = sunrpc_cache_lookup(exp->cd, &exp->h, hash); if (ch) return container_of(ch, struct svc_export, h); else @@ -753,9 +754,7 @@ svc_export_update(struct svc_export *new, struct svc_export *old) struct cache_head *ch; int hash = svc_export_hash(old); - ch = sunrpc_cache_update(&svc_export_cache, &new->h, - &old->h, - hash); + ch = sunrpc_cache_update(old->cd, &new->h, &old->h, hash); if (ch) return container_of(ch, struct svc_export, h); else @@ -797,6 +796,7 @@ static svc_export *exp_get_by_name(svc_client *clp, const struct path *path, key.ex_client = clp; key.ex_path = *path; + key.cd = &svc_export_cache; exp = svc_export_lookup(&key); if (exp == NULL) diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h index f85308e688fd..64455292bbba 100644 --- a/include/linux/nfsd/export.h +++ b/include/linux/nfsd/export.h @@ -103,6 +103,7 @@ struct svc_export { struct nfsd4_fs_locations ex_fslocs; int ex_nflavors; struct exp_flavor_info ex_flavors[MAX_SECINFO_LIST]; + struct cache_detail *cd; }; /* an "export key" (expkey) maps a filehandle fragment to an -- cgit v1.2.3 From 71234978e81ee515c8025d087a197561b311c183 Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Wed, 28 Mar 2012 19:09:15 +0400 Subject: nfsd: use cache detail pointer from svc_export structure on cache put The hard-coded pointer is now redundant and can be replaced. Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. Bruce Fields --- include/linux/nfsd/export.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h index 64455292bbba..485c2afa96f7 100644 --- a/include/linux/nfsd/export.h +++ b/include/linux/nfsd/export.h @@ -147,7 +147,7 @@ extern struct cache_detail svc_export_cache; static inline void exp_put(struct svc_export *exp) { - cache_put(&exp->h, &svc_export_cache); + cache_put(&exp->h, exp->cd); } static inline void exp_get(struct svc_export *exp) -- cgit v1.2.3 From e3f70eadb7dddfb5a2bb9afff7abfc6ee17a29d0 Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Thu, 29 Mar 2012 18:54:33 +0400 Subject: Lockd: pass network namespace to creation and destruction routines v2: removed a dereference of a most probably already released nlm_host in nlmclnt_done() and reclaimer(). These routines are called from the locks reclaimer() kernel thread. This thread works in the "init_net" network context and currently relies on the presence of the lockd thread and its per-net resources. Thus lockd_up() and lockd_down() can't rely on the current network context. So let's pass the correct one into them.
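A minimal sketch of the resulting calling convention, using an illustrative caller (start_lockd_for() is not a real function):

  static int start_lockd_for(struct nlm_host *host)
  {
          /* Callers hand lockd the namespace they already know about,
           * instead of lockd guessing from current->nsproxy. */
          int error = lockd_up(host->net);

          if (error < 0)
                  return error;
          /* ... use the lock manager ... */
          lockd_down(host->net);  /* matching put, same namespace */
          return 0;
  }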
Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. Bruce Fields --- fs/lockd/clntlock.c | 13 ++++++++----- fs/lockd/svc.c | 7 +++---- fs/nfsd/nfssvc.c | 6 +++--- include/linux/lockd/bind.h | 4 ++-- 4 files changed, 16 insertions(+), 14 deletions(-) (limited to 'include') diff --git a/fs/lockd/clntlock.c b/fs/lockd/clntlock.c index ba1dc2eebd1e..ca0a08001449 100644 --- a/fs/lockd/clntlock.c +++ b/fs/lockd/clntlock.c @@ -56,7 +56,7 @@ struct nlm_host *nlmclnt_init(const struct nlmclnt_initdata *nlm_init) u32 nlm_version = (nlm_init->nfs_version == 2) ? 1 : 4; int status; - status = lockd_up(); + status = lockd_up(nlm_init->net); if (status < 0) return ERR_PTR(status); @@ -65,7 +65,7 @@ struct nlm_host *nlmclnt_init(const struct nlmclnt_initdata *nlm_init) nlm_init->hostname, nlm_init->noresvport, nlm_init->net); if (host == NULL) { - lockd_down(); + lockd_down(nlm_init->net); return ERR_PTR(-ENOLCK); } @@ -80,8 +80,10 @@ EXPORT_SYMBOL_GPL(nlmclnt_init); */ void nlmclnt_done(struct nlm_host *host) { + struct net *net = host->net; + nlmclnt_release_host(host); - lockd_down(); + lockd_down(net); } EXPORT_SYMBOL_GPL(nlmclnt_done); @@ -220,11 +222,12 @@ reclaimer(void *ptr) struct nlm_wait *block; struct file_lock *fl, *next; u32 nsmstate; + struct net *net = host->net; allow_signal(SIGKILL); down_write(&host->h_rwsem); - lockd_up(); /* note: this cannot fail as lockd is already running */ + lockd_up(net); /* note: this cannot fail as lockd is already running */ dprintk("lockd: reclaiming locks for host %s\n", host->h_name); @@ -275,6 +278,6 @@ restart: /* Release host handle after use */ nlmclnt_release_host(host); - lockd_down(); + lockd_down(net); return 0; } diff --git a/fs/lockd/svc.c b/fs/lockd/svc.c index f49b9afc4436..1ead0750cdbb 100644 --- a/fs/lockd/svc.c +++ b/fs/lockd/svc.c @@ -295,11 +295,10 @@ static void lockd_down_net(struct net *net) /* * Bring up the lockd process if it's not already up. */ -int lockd_up(void) +int lockd_up(struct net *net) { struct svc_serv *serv; int error = 0; - struct net *net = current->nsproxy->net_ns; mutex_lock(&nlmsvc_mutex); /* @@ -378,12 +377,12 @@ EXPORT_SYMBOL_GPL(lockd_up); * Decrement the user count and bring down lockd if we're the last.
*/ void -lockd_down(void) +lockd_down(struct net *net) { mutex_lock(&nlmsvc_mutex); if (nlmsvc_users) { if (--nlmsvc_users) { - lockd_down_net(current->nsproxy->net_ns); + lockd_down_net(net); goto out; } } else { diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 28dfad39f0c5..78e521392df1 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -220,7 +220,7 @@ static int nfsd_startup(unsigned short port, int nrservs) ret = nfsd_init_socks(port); if (ret) goto out_racache; - ret = lockd_up(); + ret = lockd_up(&init_net); if (ret) goto out_racache; ret = nfs4_state_start(); @@ -229,7 +229,7 @@ static int nfsd_startup(unsigned short port, int nrservs) nfsd_up = true; return 0; out_lockd: - lockd_down(); + lockd_down(&init_net); out_racache: nfsd_racache_shutdown(); return ret; @@ -246,7 +246,7 @@ static void nfsd_shutdown(void) if (!nfsd_up) return; nfs4_state_shutdown(); - lockd_down(); + lockd_down(&init_net); nfsd_racache_shutdown(); nfsd_up = false; } diff --git a/include/linux/lockd/bind.h b/include/linux/lockd/bind.h index 11a966e5f829..4d24d64578c4 100644 --- a/include/linux/lockd/bind.h +++ b/include/linux/lockd/bind.h @@ -54,7 +54,7 @@ extern void nlmclnt_done(struct nlm_host *host); extern int nlmclnt_proc(struct nlm_host *host, int cmd, struct file_lock *fl); -extern int lockd_up(void); -extern void lockd_down(void); +extern int lockd_up(struct net *net); +extern void lockd_down(struct net *net); #endif /* LINUX_LOCKD_BIND_H */ -- cgit v1.2.3 From b89109bef4a6a4a8ab5788778ee0addca0787870 Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Wed, 11 Apr 2012 15:13:14 +0400 Subject: nfsd: pass network context to export caches init/shutdown routines These functions will be called from per-net operations. Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. Bruce Fields --- fs/nfsd/export.c | 20 ++++++++++---------- fs/nfsd/nfsctl.c | 6 +++--- include/linux/nfsd/export.h | 4 ++-- 3 files changed, 15 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 84723bc37c59..6453669dcef7 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -1228,17 +1228,17 @@ const struct seq_operations nfs_exports_op = { * Initialize the exports module. */ int -nfsd_export_init(void) +nfsd_export_init(struct net *net) { int rv; - dprintk("nfsd: initializing export module.\n"); + dprintk("nfsd: initializing export module (net: %p).\n", net); - rv = cache_register_net(&svc_export_cache, &init_net); + rv = cache_register_net(&svc_export_cache, net); if (rv) return rv; - rv = cache_register_net(&svc_expkey_cache, &init_net); + rv = cache_register_net(&svc_expkey_cache, net); if (rv) - cache_unregister_net(&svc_export_cache, &init_net); + cache_unregister_net(&svc_export_cache, net); return rv; } @@ -1257,14 +1257,14 @@ nfsd_export_flush(void) * Shutdown the exports module. 
*/ void -nfsd_export_shutdown(void) +nfsd_export_shutdown(struct net *net) { - dprintk("nfsd: shutting down export module.\n"); + dprintk("nfsd: shutting down export module (net: %p).\n", net); - cache_unregister_net(&svc_expkey_cache, &init_net); - cache_unregister_net(&svc_export_cache, &init_net); + cache_unregister_net(&svc_expkey_cache, net); + cache_unregister_net(&svc_export_cache, net); svcauth_unix_purge(); - dprintk("nfsd: export shutdown complete.\n"); + dprintk("nfsd: export shutdown complete (net: %p).\n", net); } diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index ae19293e68df..bc76f8ebbe5e 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -1163,7 +1163,7 @@ static int __init init_nfsd(void) retval = nfsd_reply_cache_init(); if (retval) goto out_free_stat; - retval = nfsd_export_init(); + retval = nfsd_export_init(&init_net); if (retval) goto out_free_cache; nfsd_lockd_init(); /* lockd->nfsd callbacks */ @@ -1184,7 +1184,7 @@ out_free_idmap: nfsd_idmap_shutdown(); out_free_lockd: nfsd_lockd_shutdown(); - nfsd_export_shutdown(); + nfsd_export_shutdown(&init_net); out_free_cache: nfsd_reply_cache_shutdown(); out_free_stat: @@ -1201,7 +1201,7 @@ out_unregister_notifier: static void __exit exit_nfsd(void) { - nfsd_export_shutdown(); + nfsd_export_shutdown(&init_net); nfsd_reply_cache_shutdown(); remove_proc_entry("fs/nfs/exports", NULL); remove_proc_entry("fs/nfs", NULL); diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h index 485c2afa96f7..375096c083d3 100644 --- a/include/linux/nfsd/export.h +++ b/include/linux/nfsd/export.h @@ -130,8 +130,8 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp); /* * Function declarations */ -int nfsd_export_init(void); -void nfsd_export_shutdown(void); +int nfsd_export_init(struct net *); +void nfsd_export_shutdown(struct net *); void nfsd_export_flush(void); struct svc_export * rqst_exp_get_by_name(struct svc_rqst *, struct path *); -- cgit v1.2.3 From b3853e0ea1f2ef58f7e7c03e47819e2ae3766dea Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Wed, 11 Apr 2012 15:13:21 +0400 Subject: nfsd: make export cache allocated per network namespace context This patch also changes prototypes of nfsd_export_flush() and exp_rootfh(): network namespace parameter added. Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. 
Bruce Fields --- fs/nfsd/export.c | 47 ++++++++++++++++++++++++++++++--------------- fs/nfsd/netns.h | 2 ++ fs/nfsd/nfsctl.c | 2 +- fs/nfsd/nfssvc.c | 2 +- include/linux/nfsd/export.h | 4 ++-- 5 files changed, 38 insertions(+), 19 deletions(-) (limited to 'include') diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 688264b55a3a..84d020fc0e37 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -15,11 +15,13 @@ #include #include #include +#include #include #include "nfsd.h" #include "nfsfh.h" +#include "netns.h" #define NFSDDBG_FACILITY NFSDDBG_EXPORT @@ -298,8 +300,6 @@ svc_expkey_update(struct cache_detail *cd, struct svc_expkey *new, #define EXPORT_HASHBITS 8 #define EXPORT_HASHMAX (1<< EXPORT_HASHBITS) -static struct cache_head *export_table[EXPORT_HASHMAX]; - static void nfsd4_fslocs_free(struct nfsd4_fs_locations *fsloc) { int i; @@ -708,10 +708,9 @@ static struct cache_head *svc_export_alloc(void) return NULL; } -struct cache_detail svc_export_cache = { +struct cache_detail svc_export_cache_template = { .owner = THIS_MODULE, .hash_size = EXPORT_HASHMAX, - .hash_table = export_table, .name = "nfsd.export", .cache_put = svc_export_put, .cache_upcall = svc_export_upcall, @@ -835,7 +834,7 @@ static struct svc_export *exp_parent(struct cache_detail *cd, svc_client *clp, * since its harder to fool a kernel module than a user space program. */ int -exp_rootfh(svc_client *clp, char *name, +exp_rootfh(struct net *net, svc_client *clp, char *name, struct knfsd_fh *f, int maxsize) { struct svc_export *exp; @@ -843,7 +842,8 @@ exp_rootfh(svc_client *clp, char *name, struct inode *inode; struct svc_fh fh; int err; - struct cache_detail *cd = &svc_export_cache; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + struct cache_detail *cd = nn->svc_export_cache; err = -EPERM; /* NB: we probably ought to check that it's NUL-terminated */ @@ -930,7 +930,8 @@ struct svc_export * rqst_exp_get_by_name(struct svc_rqst *rqstp, struct path *path) { struct svc_export *gssexp, *exp = ERR_PTR(-ENOENT); - struct cache_detail *cd = &svc_export_cache; + struct nfsd_net *nn = net_generic(rqstp->rq_xprt->xpt_net, nfsd_net_id); + struct cache_detail *cd = nn->svc_export_cache; if (rqstp->rq_client == NULL) goto gss; @@ -960,7 +961,8 @@ struct svc_export * rqst_exp_find(struct svc_rqst *rqstp, int fsid_type, u32 *fsidv) { struct svc_export *gssexp, *exp = ERR_PTR(-ENOENT); - struct cache_detail *cd = &svc_export_cache; + struct nfsd_net *nn = net_generic(rqstp->rq_xprt->xpt_net, nfsd_net_id); + struct cache_detail *cd = nn->svc_export_cache; if (rqstp->rq_client == NULL) goto gss; @@ -1238,26 +1240,39 @@ int nfsd_export_init(struct net *net) { int rv; + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + dprintk("nfsd: initializing export module (net: %p).\n", net); - rv = cache_register_net(&svc_export_cache, net); + nn->svc_export_cache = cache_create_net(&svc_export_cache_template, net); + if (IS_ERR(nn->svc_export_cache)) + return PTR_ERR(nn->svc_export_cache); + rv = cache_register_net(nn->svc_export_cache, net); if (rv) - return rv; + goto destroy_export_cache; + rv = cache_register_net(&svc_expkey_cache, net); if (rv) - cache_unregister_net(&svc_export_cache, net); - return rv; + goto unregister_export_cache; + return 0; +unregister_export_cache: + cache_unregister_net(nn->svc_export_cache, net); +destroy_export_cache: + cache_destroy_net(nn->svc_export_cache, net); + return rv; } /* * Flush exports table - called when last nfsd thread is killed */ void -nfsd_export_flush(void) 
+nfsd_export_flush(struct net *net) { + struct nfsd_net *nn = net_generic(net, nfsd_net_id); + cache_purge(&svc_expkey_cache); - cache_purge(&svc_export_cache); + cache_purge(nn->svc_export_cache); } /* @@ -1266,11 +1281,13 @@ nfsd_export_flush(void) void nfsd_export_shutdown(struct net *net) { + struct nfsd_net *nn = net_generic(net, nfsd_net_id); dprintk("nfsd: shutting down export module (net: %p).\n", net); cache_unregister_net(&svc_expkey_cache, net); - cache_unregister_net(&svc_export_cache, net); + cache_unregister_net(nn->svc_export_cache, net); + cache_destroy_net(nn->svc_export_cache, net); svcauth_unix_purge(); dprintk("nfsd: export shutdown complete (net: %p).\n", net); diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index 12e0cff435b4..c1c6242942a9 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -28,6 +28,8 @@ struct cld_net; struct nfsd_net { struct cld_net *cld_net; + + struct cache_detail *svc_export_cache; }; extern int nfsd_net_id; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index bc76f8ebbe5e..ddb9f8787379 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -354,7 +354,7 @@ static ssize_t write_filehandle(struct file *file, char *buf, size_t size) if (!dom) return -ENOMEM; - len = exp_rootfh(dom, path, &fh, maxsize); + len = exp_rootfh(&init_net, dom, path, &fh, maxsize); auth_domain_put(dom); if (len) return len; diff --git a/fs/nfsd/nfssvc.c b/fs/nfsd/nfssvc.c index 78e521392df1..cb4d51d8cbdb 100644 --- a/fs/nfsd/nfssvc.c +++ b/fs/nfsd/nfssvc.c @@ -261,7 +261,7 @@ static void nfsd_last_thread(struct svc_serv *serv, struct net *net) printk(KERN_WARNING "nfsd: last server has exited, flushing export " "cache\n"); - nfsd_export_flush(); + nfsd_export_flush(net); } void nfsd_reset_versions(void) diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h index 375096c083d3..565c2122993f 100644 --- a/include/linux/nfsd/export.h +++ b/include/linux/nfsd/export.h @@ -132,13 +132,13 @@ __be32 check_nfsd_access(struct svc_export *exp, struct svc_rqst *rqstp); */ int nfsd_export_init(struct net *); void nfsd_export_shutdown(struct net *); -void nfsd_export_flush(void); +void nfsd_export_flush(struct net *); struct svc_export * rqst_exp_get_by_name(struct svc_rqst *, struct path *); struct svc_export * rqst_exp_parent(struct svc_rqst *, struct path *); struct svc_export * rqst_find_fsidzero_export(struct svc_rqst *); -int exp_rootfh(struct auth_domain *, +int exp_rootfh(struct net *, struct auth_domain *, char *path, struct knfsd_fh *, int maxsize); __be32 exp_pseudoroot(struct svc_rqst *, struct svc_fh *); __be32 nfserrno(int errno); -- cgit v1.2.3 From e5f06f720eff24e32f1cc08ec03bcc8c4b2d2934 Mon Sep 17 00:00:00 2001 From: Stanislav Kinsbursky Date: Wed, 11 Apr 2012 15:13:28 +0400 Subject: nfsd: make expkey cache allocated per network namespace context This patch also changes svcauth_unix_purge() function: added network namespace as a parameter and thus loop over all networks was replaced by only one call for ip map cache purge. Signed-off-by: Stanislav Kinsbursky Signed-off-by: J. 
Bruce Fields --- fs/nfsd/export.c | 28 +++++++++++++++++----------- fs/nfsd/netns.h | 1 + fs/nfsd/nfsctl.c | 3 ++- include/linux/nfsd/export.h | 2 -- include/linux/sunrpc/svcauth.h | 2 +- net/sunrpc/svcauth_unix.c | 13 ++++--------- 6 files changed, 25 insertions(+), 24 deletions(-) (limited to 'include') diff --git a/fs/nfsd/export.c b/fs/nfsd/export.c index 84d020fc0e37..dcb52b884519 100644 --- a/fs/nfsd/export.c +++ b/fs/nfsd/export.c @@ -40,7 +40,6 @@ typedef struct svc_export svc_export; #define EXPKEY_HASHBITS 8 #define EXPKEY_HASHMAX (1 << EXPKEY_HASHBITS) #define EXPKEY_HASHMASK (EXPKEY_HASHMAX -1) -static struct cache_head *expkey_table[EXPKEY_HASHMAX]; static void expkey_put(struct kref *ref) { @@ -241,10 +240,9 @@ static struct cache_head *expkey_alloc(void) return NULL; } -static struct cache_detail svc_expkey_cache = { +static struct cache_detail svc_expkey_cache_template = { .owner = THIS_MODULE, .hash_size = EXPKEY_HASHMAX, - .hash_table = expkey_table, .name = "nfsd.fh", .cache_put = expkey_put, .cache_upcall = expkey_upcall, @@ -883,12 +881,13 @@ static struct svc_export *exp_find(struct cache_detail *cd, u32 *fsidv, struct cache_req *reqp) { struct svc_export *exp; - struct svc_expkey *ek = exp_find_key(&svc_expkey_cache, clp, fsid_type, fsidv, reqp); + struct nfsd_net *nn = net_generic(cd->net, nfsd_net_id); + struct svc_expkey *ek = exp_find_key(nn->svc_expkey_cache, clp, fsid_type, fsidv, reqp); if (IS_ERR(ek)) return ERR_CAST(ek); exp = exp_get_by_name(cd, clp, &ek->ek_path, reqp); - cache_put(&ek->h, &svc_expkey_cache); + cache_put(&ek->h, nn->svc_expkey_cache); if (IS_ERR(exp)) return ERR_CAST(exp); @@ -1232,7 +1231,6 @@ const struct seq_operations nfs_exports_op = { .show = e_show, }; - /* * Initialize the exports module. */ @@ -1251,11 +1249,18 @@ nfsd_export_init(struct net *net) if (rv) goto destroy_export_cache; - rv = cache_register_net(&svc_expkey_cache, net); - if (rv) + nn->svc_expkey_cache = cache_create_net(&svc_expkey_cache_template, net); + if (IS_ERR(nn->svc_expkey_cache)) { + rv = PTR_ERR(nn->svc_expkey_cache); goto unregister_export_cache; + } + rv = cache_register_net(nn->svc_expkey_cache, net); + if (rv) + goto destroy_expkey_cache; return 0; +destroy_expkey_cache: + cache_destroy_net(nn->svc_expkey_cache, net); unregister_export_cache: cache_unregister_net(nn->svc_export_cache, net); destroy_export_cache: @@ -1271,7 +1276,7 @@ nfsd_export_flush(struct net *net) { struct nfsd_net *nn = net_generic(net, nfsd_net_id); - cache_purge(&svc_expkey_cache); + cache_purge(nn->svc_expkey_cache); cache_purge(nn->svc_export_cache); } @@ -1285,10 +1290,11 @@ nfsd_export_shutdown(struct net *net) dprintk("nfsd: shutting down export module (net: %p).\n", net); - cache_unregister_net(&svc_expkey_cache, net); + cache_unregister_net(nn->svc_expkey_cache, net); cache_unregister_net(nn->svc_export_cache, net); + cache_destroy_net(nn->svc_expkey_cache, net); cache_destroy_net(nn->svc_export_cache, net); - svcauth_unix_purge(); + svcauth_unix_purge(net); dprintk("nfsd: export shutdown complete (net: %p).\n", net); } diff --git a/fs/nfsd/netns.h b/fs/nfsd/netns.h index c1c6242942a9..9794c6c7d133 100644 --- a/fs/nfsd/netns.h +++ b/fs/nfsd/netns.h @@ -29,6 +29,7 @@ struct cld_net; struct nfsd_net { struct cld_net *cld_net; + struct cache_detail *svc_expkey_cache; struct cache_detail *svc_export_cache; }; diff --git a/fs/nfsd/nfsctl.c b/fs/nfsd/nfsctl.c index ddb9f8787379..b14417740816 100644 --- a/fs/nfsd/nfsctl.c +++ b/fs/nfsd/nfsctl.c @@ -129,13 +129,14 @@ static int 
exports_open(struct inode *inode, struct file *file) { int err; struct seq_file *seq; + struct nfsd_net *nn = net_generic(&init_net, nfsd_net_id); err = seq_open(file, &nfs_exports_op); if (err) return err; seq = file->private_data; - seq->private = &svc_export_cache; + seq->private = nn->svc_export_cache; return 0; } diff --git a/include/linux/nfsd/export.h b/include/linux/nfsd/export.h index 565c2122993f..e33f747b173c 100644 --- a/include/linux/nfsd/export.h +++ b/include/linux/nfsd/export.h @@ -143,8 +143,6 @@ int exp_rootfh(struct net *, struct auth_domain *, __be32 exp_pseudoroot(struct svc_rqst *, struct svc_fh *); __be32 nfserrno(int errno); -extern struct cache_detail svc_export_cache; - static inline void exp_put(struct svc_export *exp) { cache_put(&exp->h, exp->cd); diff --git a/include/linux/sunrpc/svcauth.h b/include/linux/sunrpc/svcauth.h index 2e2af101b59c..2c54683b91de 100644 --- a/include/linux/sunrpc/svcauth.h +++ b/include/linux/sunrpc/svcauth.h @@ -130,7 +130,7 @@ extern struct auth_domain *auth_domain_lookup(char *name, struct auth_domain *ne extern struct auth_domain *auth_domain_find(char *name); extern struct auth_domain *auth_unix_lookup(struct net *net, struct in6_addr *addr); extern int auth_unix_forget_old(struct auth_domain *dom); -extern void svcauth_unix_purge(void); +extern void svcauth_unix_purge(struct net *net); extern void svcauth_unix_info_release(struct svc_xprt *xpt); extern int svcauth_unix_set_client(struct svc_rqst *rqstp); diff --git a/net/sunrpc/svcauth_unix.c b/net/sunrpc/svcauth_unix.c index 521d8f7dc833..9c3b9f014468 100644 --- a/net/sunrpc/svcauth_unix.c +++ b/net/sunrpc/svcauth_unix.c @@ -346,17 +346,12 @@ static inline int ip_map_update(struct net *net, struct ip_map *ipm, return __ip_map_update(sn->ip_map_cache, ipm, udom, expiry); } - -void svcauth_unix_purge(void) +void svcauth_unix_purge(struct net *net) { - struct net *net; - - for_each_net(net) { - struct sunrpc_net *sn; + struct sunrpc_net *sn; - sn = net_generic(net, sunrpc_net_id); - cache_purge(sn->ip_map_cache); - } + sn = net_generic(net, sunrpc_net_id); + cache_purge(sn->ip_map_cache); } EXPORT_SYMBOL_GPL(svcauth_unix_purge); -- cgit v1.2.3 From d57a4282d04810417c4ed2a49cbbeda8b3569b18 Mon Sep 17 00:00:00 2001 From: Grant Likely Date: Sat, 7 Apr 2012 14:16:53 -0600 Subject: spi/devicetree: Move devicetree support code into spi directory The SPI device tree support code isn't shared by any other subsystem. It can be moved into the core drivers/spi directory and the exported symbol can be removed. 
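For reference, a device tree fragment of the kind this code parses; the node names, compatible string and frequency are made-up examples, while the property names ("reg", "spi-max-frequency", "spi-cs-high", and friends) are the ones read by of_register_spi_devices():

  spi@e0100000 {
          #address-cells = <1>;
          #size-cells = <0>;

          flash@0 {
                  compatible = "spansion,m25p80"; /* hypothetical part */
                  reg = <0>;                      /* chip select */
                  spi-max-frequency = <25000000>;
                  spi-cs-high;
          };
  };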
Signed-off-by: Grant Likely Cc: Rob Herring --- drivers/of/Kconfig | 6 --- drivers/of/Makefile | 1 - drivers/of/of_spi.c | 99 ---------------------------------------------- drivers/spi/spi-fsl-espi.c | 1 - drivers/spi/spi-fsl-lib.c | 2 +- drivers/spi/spi-ppc4xx.c | 1 - drivers/spi/spi.c | 92 +++++++++++++++++++++++++++++++++++++++++- include/linux/of_spi.h | 23 ----------- 8 files changed, 92 insertions(+), 133 deletions(-) delete mode 100644 drivers/of/of_spi.c delete mode 100644 include/linux/of_spi.h (limited to 'include') diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig index 8e84ce9765a9..f623f17a0b9f 100644 --- a/drivers/of/Kconfig +++ b/drivers/of/Kconfig @@ -67,12 +67,6 @@ config OF_NET depends on NETDEVICES def_bool y -config OF_SPI - def_tristate SPI - depends on SPI && !SPARC - help - OpenFirmware SPI accessors - config OF_MDIO def_tristate PHYLIB depends on PHYLIB diff --git a/drivers/of/Makefile b/drivers/of/Makefile index aa90e602c8a7..0040d1858665 100644 --- a/drivers/of/Makefile +++ b/drivers/of/Makefile @@ -7,7 +7,6 @@ obj-$(CONFIG_OF_DEVICE) += device.o platform.o obj-$(CONFIG_OF_GPIO) += gpio.o obj-$(CONFIG_OF_I2C) += of_i2c.o obj-$(CONFIG_OF_NET) += of_net.o -obj-$(CONFIG_OF_SPI) += of_spi.o obj-$(CONFIG_OF_SELFTEST) += selftest.o obj-$(CONFIG_OF_MDIO) += of_mdio.o obj-$(CONFIG_OF_PCI) += of_pci.o diff --git a/drivers/of/of_spi.c b/drivers/of/of_spi.c deleted file mode 100644 index 6dbc074e4876..000000000000 --- a/drivers/of/of_spi.c +++ /dev/null @@ -1,99 +0,0 @@ -/* - * SPI OF support routines - * Copyright (C) 2008 Secret Lab Technologies Ltd. - * - * Support routines for deriving SPI device attachments from the device - * tree. - */ - -#include -#include -#include -#include -#include -#include - -/** - * of_register_spi_devices - Register child devices onto the SPI bus - * @master: Pointer to spi_master device - * - * Registers an spi_device for each child node of master node which has a 'reg' - * property. - */ -void of_register_spi_devices(struct spi_master *master) -{ - struct spi_device *spi; - struct device_node *nc; - const __be32 *prop; - int rc; - int len; - - if (!master->dev.of_node) - return; - - for_each_child_of_node(master->dev.of_node, nc) { - /* Alloc an spi_device */ - spi = spi_alloc_device(master); - if (!spi) { - dev_err(&master->dev, "spi_device alloc error for %s\n", - nc->full_name); - spi_dev_put(spi); - continue; - } - - /* Select device driver */ - if (of_modalias_node(nc, spi->modalias, - sizeof(spi->modalias)) < 0) { - dev_err(&master->dev, "cannot find modalias for %s\n", - nc->full_name); - spi_dev_put(spi); - continue; - } - - /* Device address */ - prop = of_get_property(nc, "reg", &len); - if (!prop || len < sizeof(*prop)) { - dev_err(&master->dev, "%s has no 'reg' property\n", - nc->full_name); - spi_dev_put(spi); - continue; - } - spi->chip_select = be32_to_cpup(prop); - - /* Mode (clock phase/polarity/etc.) 
*/ - if (of_find_property(nc, "spi-cpha", NULL)) - spi->mode |= SPI_CPHA; - if (of_find_property(nc, "spi-cpol", NULL)) - spi->mode |= SPI_CPOL; - if (of_find_property(nc, "spi-cs-high", NULL)) - spi->mode |= SPI_CS_HIGH; - - /* Device speed */ - prop = of_get_property(nc, "spi-max-frequency", &len); - if (!prop || len < sizeof(*prop)) { - dev_err(&master->dev, "%s has no 'spi-max-frequency' property\n", - nc->full_name); - spi_dev_put(spi); - continue; - } - spi->max_speed_hz = be32_to_cpup(prop); - - /* IRQ */ - spi->irq = irq_of_parse_and_map(nc, 0); - - /* Store a pointer to the node in the device structure */ - of_node_get(nc); - spi->dev.of_node = nc; - - /* Register the new device */ - request_module(spi->modalias); - rc = spi_add_device(spi); - if (rc) { - dev_err(&master->dev, "spi_device register error %s\n", - nc->full_name); - spi_dev_put(spi); - } - - } -} -EXPORT_SYMBOL(of_register_spi_devices); diff --git a/drivers/spi/spi-fsl-espi.c b/drivers/spi/spi-fsl-espi.c index 7523a2429d09..27bdc47b5250 100644 --- a/drivers/spi/spi-fsl-espi.c +++ b/drivers/spi/spi-fsl-espi.c @@ -17,7 +17,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/spi/spi-fsl-lib.c b/drivers/spi/spi-fsl-lib.c index 2674fad7f68a..1503574b215a 100644 --- a/drivers/spi/spi-fsl-lib.c +++ b/drivers/spi/spi-fsl-lib.c @@ -22,7 +22,7 @@ #include #include #include -#include +#include #include #include "spi-fsl-lib.h" diff --git a/drivers/spi/spi-ppc4xx.c b/drivers/spi/spi-ppc4xx.c index 98ec53285fc7..d95d307a1100 100644 --- a/drivers/spi/spi-ppc4xx.c +++ b/drivers/spi/spi-ppc4xx.c @@ -30,7 +30,6 @@ #include #include #include -#include #include #include #include diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c index 3d8f662e4fe9..37c555ec59ab 100644 --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -2,6 +2,7 @@ * SPI init/core code * * Copyright (C) 2005 David Brownell + * Copyright (C) 2008 Secret Lab Technologies Ltd. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -19,15 +20,16 @@ */ #include +#include #include #include #include #include #include +#include #include #include #include -#include #include #include #include @@ -798,6 +800,94 @@ err_init_queue: /*-------------------------------------------------------------------------*/ +#if defined(CONFIG_OF) && !defined(CONFIG_SPARC) +/** + * of_register_spi_devices() - Register child devices onto the SPI bus + * @master: Pointer to spi_master device + * + * Registers an spi_device for each child node of master node which has a 'reg' + * property. 
+ */ +static void of_register_spi_devices(struct spi_master *master) +{ + struct spi_device *spi; + struct device_node *nc; + const __be32 *prop; + int rc; + int len; + + if (!master->dev.of_node) + return; + + for_each_child_of_node(master->dev.of_node, nc) { + /* Alloc an spi_device */ + spi = spi_alloc_device(master); + if (!spi) { + dev_err(&master->dev, "spi_device alloc error for %s\n", + nc->full_name); + spi_dev_put(spi); + continue; + } + + /* Select device driver */ + if (of_modalias_node(nc, spi->modalias, + sizeof(spi->modalias)) < 0) { + dev_err(&master->dev, "cannot find modalias for %s\n", + nc->full_name); + spi_dev_put(spi); + continue; + } + + /* Device address */ + prop = of_get_property(nc, "reg", &len); + if (!prop || len < sizeof(*prop)) { + dev_err(&master->dev, "%s has no 'reg' property\n", + nc->full_name); + spi_dev_put(spi); + continue; + } + spi->chip_select = be32_to_cpup(prop); + + /* Mode (clock phase/polarity/etc.) */ + if (of_find_property(nc, "spi-cpha", NULL)) + spi->mode |= SPI_CPHA; + if (of_find_property(nc, "spi-cpol", NULL)) + spi->mode |= SPI_CPOL; + if (of_find_property(nc, "spi-cs-high", NULL)) + spi->mode |= SPI_CS_HIGH; + + /* Device speed */ + prop = of_get_property(nc, "spi-max-frequency", &len); + if (!prop || len < sizeof(*prop)) { + dev_err(&master->dev, "%s has no 'spi-max-frequency' property\n", + nc->full_name); + spi_dev_put(spi); + continue; + } + spi->max_speed_hz = be32_to_cpup(prop); + + /* IRQ */ + spi->irq = irq_of_parse_and_map(nc, 0); + + /* Store a pointer to the node in the device structure */ + of_node_get(nc); + spi->dev.of_node = nc; + + /* Register the new device */ + request_module(spi->modalias); + rc = spi_add_device(spi); + if (rc) { + dev_err(&master->dev, "spi_device register error %s\n", + nc->full_name); + spi_dev_put(spi); + } + + } +} +#else +static void of_register_spi_devices(struct spi_master *master) { } +#endif + static void spi_master_release(struct device *dev) { struct spi_master *master; diff --git a/include/linux/of_spi.h b/include/linux/of_spi.h deleted file mode 100644 index 9e3e70f78ae6..000000000000 --- a/include/linux/of_spi.h +++ /dev/null @@ -1,23 +0,0 @@ -/* - * OpenFirmware SPI support routines - * Copyright (C) 2008 Secret Lab Technologies Ltd. - * - * Support routines for deriving SPI device attachments from the device - * tree. - */ - -#ifndef __LINUX_OF_SPI_H -#define __LINUX_OF_SPI_H - -#include - -#if defined(CONFIG_OF_SPI) || defined(CONFIG_OF_SPI_MODULE) -extern void of_register_spi_devices(struct spi_master *master); -#else -static inline void of_register_spi_devices(struct spi_master *master) -{ - return; -} -#endif /* CONFIG_OF_SPI */ - -#endif /* __LINUX_OF_SPI */ -- cgit v1.2.3 From cbc91f71b51b8335f1fc7ccfca8011f31a717367 Mon Sep 17 00:00:00 2001 From: Srikar Dronamraju Date: Wed, 11 Apr 2012 16:05:27 +0530 Subject: uprobes/core: Decrement uprobe count before the pages are unmapped Uprobes has a callback (uprobe_munmap()) in the unmap path to maintain the uprobes count. In the exit path this callback gets called in unlink_file_vma(). However by the time unlink_file_vma() is called, the pages would have been unmapped (in unmap_vmas()) and the task->rss_stat counts accounted (in zap_pte_range()). If the exiting process has probepoints, uprobe_munmap() checks if the breakpoint instruction was around before decrementing the probe count. This results in a file backed page being reread by uprobe_munmap() and hence it does not find the breakpoint. 
This patch fixes this problem by moving the callback to unmap_single_vma(). Since unmap_single_vma() may not unmap the complete vma, add start and end parameters to uprobe_munmap(). This bug became apparent courtesy of commit c3f0327f8e9d ("mm: add rss counters consistency check"). Signed-off-by: Srikar Dronamraju Cc: Linus Torvalds Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Linux-mm Cc: Oleg Nesterov Cc: Andi Kleen Cc: Christoph Hellwig Cc: Steven Rostedt Cc: Arnaldo Carvalho de Melo Cc: Masami Hiramatsu Cc: Anton Arapov Cc: Peter Zijlstra Link: http://lkml.kernel.org/r/20120411103527.23245.9835.sendpatchset@srdronam.in.ibm.com Signed-off-by: Ingo Molnar --- include/linux/uprobes.h | 5 +++-- kernel/events/uprobes.c | 4 ++-- mm/memory.c | 3 +++ mm/mmap.c | 8 ++++---- 4 files changed, 12 insertions(+), 8 deletions(-) (limited to 'include') diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h index d594d3b3ad4c..efe4b3308c74 100644 --- a/include/linux/uprobes.h +++ b/include/linux/uprobes.h @@ -107,7 +107,7 @@ extern bool __weak is_swbp_insn(uprobe_opcode_t *insn); extern int uprobe_register(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern void uprobe_unregister(struct inode *inode, loff_t offset, struct uprobe_consumer *uc); extern int uprobe_mmap(struct vm_area_struct *vma); -extern void uprobe_munmap(struct vm_area_struct *vma); +extern void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end); extern void uprobe_free_utask(struct task_struct *t); extern void uprobe_copy_process(struct task_struct *t); extern unsigned long __weak uprobe_get_swbp_addr(struct pt_regs *regs); @@ -134,7 +134,8 @@ static inline int uprobe_mmap(struct vm_area_struct *vma) { return 0; } -static inline void uprobe_munmap(struct vm_area_struct *vma) +static inline void +uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end) { } static inline void uprobe_notify_resume(struct pt_regs *regs) diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c index c5caeecea1dc..985be4d80fe8 100644 --- a/kernel/events/uprobes.c +++ b/kernel/events/uprobes.c @@ -1112,7 +1112,7 @@ int uprobe_mmap(struct vm_area_struct *vma) /* * Called in context of a munmap of a vma. */ -void uprobe_munmap(struct vm_area_struct *vma) +void uprobe_munmap(struct vm_area_struct *vma, unsigned long start, unsigned long end) { struct list_head tmp_list; struct uprobe *uprobe, *u; @@ -1138,7 +1138,7 @@ void uprobe_munmap(struct vm_area_struct *vma) list_del(&uprobe->pending_list); vaddr = vma_address(vma, uprobe->offset); - if (vaddr >= vma->vm_start && vaddr < vma->vm_end) { + if (vaddr >= start && vaddr < end) { /* * An unregister could have removed the probe before * unmap. So check before we decrement the count. 
diff --git a/mm/memory.c b/mm/memory.c index 6105f475fa86..bf8b4035277d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1307,6 +1307,9 @@ static void unmap_single_vma(struct mmu_gather *tlb, if (end <= vma->vm_start) return; + if (vma->vm_file) + uprobe_munmap(vma, start, end); + if (vma->vm_flags & VM_ACCOUNT) *nr_accounted += (end - start) >> PAGE_SHIFT; diff --git a/mm/mmap.c b/mm/mmap.c index b17a39f31a5e..15c21a150402 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -218,7 +218,6 @@ void unlink_file_vma(struct vm_area_struct *vma) mutex_lock(&mapping->i_mmap_mutex); __remove_shared_vm_struct(vma, file, mapping); mutex_unlock(&mapping->i_mmap_mutex); - uprobe_munmap(vma); } } @@ -548,10 +547,11 @@ again: remove_next = 1 + (end > next->vm_end); mapping = file->f_mapping; if (!(vma->vm_flags & VM_NONLINEAR)) { root = &mapping->i_mmap; - uprobe_munmap(vma); + uprobe_munmap(vma, vma->vm_start, vma->vm_end); if (adjust_next) - uprobe_munmap(next); + uprobe_munmap(next, next->vm_start, + next->vm_end); } mutex_lock(&mapping->i_mmap_mutex); @@ -632,7 +632,7 @@ again: remove_next = 1 + (end > next->vm_end); if (remove_next) { if (file) { - uprobe_munmap(next); + uprobe_munmap(next, next->vm_start, next->vm_end); fput(file); if (next->vm_flags & VM_EXECUTABLE) removed_exe_file_vma(mm); -- cgit v1.2.3 From 9fe2a7015393dc0203ac39242ae9c89038994f3c Mon Sep 17 00:00:00 2001 From: Srivatsa Vaddagiri Date: Fri, 23 Mar 2012 13:36:28 +0530 Subject: debugfs: Add support to print u32 array in debugfs Move the code from Xen to debugfs to make the code common for other users as well. Acked-by: Greg Kroah-Hartman Signed-off-by: Srivatsa Vaddagiri Signed-off-by: Suzuki Poulose [v1: Fixed rebase issues] [v2: Fixed PPC compile issues] Signed-off-by: Raghavendra K T Signed-off-by: Konrad Rzeszutek Wilk --- arch/x86/xen/debugfs.c | 104 --------------------------------------- arch/x86/xen/debugfs.h | 4 -- arch/x86/xen/spinlock.c | 12 ++--- fs/debugfs/file.c | 128 ++++++++++++++++++++++++++++++++++++++++++++++++ include/linux/debugfs.h | 11 +++++ 5 files changed, 145 insertions(+), 114 deletions(-) (limited to 'include') diff --git a/arch/x86/xen/debugfs.c b/arch/x86/xen/debugfs.c index ef1db1900d86..c8377fb26cdf 100644 --- a/arch/x86/xen/debugfs.c +++ b/arch/x86/xen/debugfs.c @@ -19,107 +19,3 @@ struct dentry * __init xen_init_debugfs(void) return d_xen_debug; } -struct array_data -{ - void *array; - unsigned elements; -}; - -static int u32_array_open(struct inode *inode, struct file *file) -{ - file->private_data = NULL; - return nonseekable_open(inode, file); } - -static size_t format_array(char *buf, size_t bufsize, const char *fmt, - u32 *array, unsigned array_size) -{ - size_t ret = 0; - unsigned i; - - for(i = 0; i < array_size; i++) { - size_t len; - - len = snprintf(buf, bufsize, fmt, array[i]); - len++; /* ' ' or '\n' */ - ret += len; - - if (buf) { - buf += len; - bufsize -= len; - buf[-1] = (i == array_size-1) ?
'\n' : ' '; - } - } - - ret++; /* \0 */ - if (buf) - *buf = '\0'; - - return ret; -} - -static char *format_array_alloc(const char *fmt, u32 *array, unsigned array_size) -{ - size_t len = format_array(NULL, 0, fmt, array, array_size); - char *ret; - - ret = kmalloc(len, GFP_KERNEL); - if (ret == NULL) - return NULL; - - format_array(ret, len, fmt, array, array_size); - return ret; -} - -static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len, - loff_t *ppos) -{ - struct inode *inode = file->f_path.dentry->d_inode; - struct array_data *data = inode->i_private; - size_t size; - - if (*ppos == 0) { - if (file->private_data) { - kfree(file->private_data); - file->private_data = NULL; - } - - file->private_data = format_array_alloc("%u", data->array, data->elements); - } - - size = 0; - if (file->private_data) - size = strlen(file->private_data); - - return simple_read_from_buffer(buf, len, ppos, file->private_data, size); -} - -static int xen_array_release(struct inode *inode, struct file *file) -{ - kfree(file->private_data); - - return 0; -} - -static const struct file_operations u32_array_fops = { - .owner = THIS_MODULE, - .open = u32_array_open, - .release= xen_array_release, - .read = u32_array_read, - .llseek = no_llseek, -}; - -struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode, - struct dentry *parent, - u32 *array, unsigned elements) -{ - struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL); - - if (data == NULL) - return NULL; - - data->array = array; - data->elements = elements; - - return debugfs_create_file(name, mode, parent, data, &u32_array_fops); -} diff --git a/arch/x86/xen/debugfs.h b/arch/x86/xen/debugfs.h index 78d25499be5b..12ebf3325c7b 100644 --- a/arch/x86/xen/debugfs.h +++ b/arch/x86/xen/debugfs.h @@ -3,8 +3,4 @@ struct dentry * __init xen_init_debugfs(void); -struct dentry *xen_debugfs_create_u32_array(const char *name, umode_t mode, - struct dentry *parent, - u32 *array, unsigned elements); - #endif /* _XEN_DEBUGFS_H */ diff --git a/arch/x86/xen/spinlock.c b/arch/x86/xen/spinlock.c index d69cc6c3f808..83e866d714ce 100644 --- a/arch/x86/xen/spinlock.c +++ b/arch/x86/xen/spinlock.c @@ -440,12 +440,12 @@ static int __init xen_spinlock_debugfs(void) debugfs_create_u64("time_total", 0444, d_spin_debug, &spinlock_stats.time_total); - xen_debugfs_create_u32_array("histo_total", 0444, d_spin_debug, - spinlock_stats.histo_spin_total, HISTO_BUCKETS + 1); - xen_debugfs_create_u32_array("histo_spinning", 0444, d_spin_debug, - spinlock_stats.histo_spin_spinning, HISTO_BUCKETS + 1); - xen_debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, - spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1); + debugfs_create_u32_array("histo_total", 0444, d_spin_debug, + spinlock_stats.histo_spin_total, HISTO_BUCKETS + 1); + debugfs_create_u32_array("histo_spinning", 0444, d_spin_debug, + spinlock_stats.histo_spin_spinning, HISTO_BUCKETS + 1); + debugfs_create_u32_array("histo_blocked", 0444, d_spin_debug, + spinlock_stats.histo_spin_blocked, HISTO_BUCKETS + 1); return 0; } diff --git a/fs/debugfs/file.c b/fs/debugfs/file.c index 5dfafdd1dbd3..2340f6978d6e 100644 --- a/fs/debugfs/file.c +++ b/fs/debugfs/file.c @@ -20,6 +20,7 @@ #include #include #include +#include static ssize_t default_read_file(struct file *file, char __user *buf, size_t count, loff_t *ppos) @@ -520,6 +521,133 @@ struct dentry *debugfs_create_blob(const char *name, umode_t mode, } EXPORT_SYMBOL_GPL(debugfs_create_blob); +struct array_data { + void *array; + u32 
elements; +}; + +static int u32_array_open(struct inode *inode, struct file *file) +{ + file->private_data = NULL; + return nonseekable_open(inode, file); +} + +static size_t format_array(char *buf, size_t bufsize, const char *fmt, + u32 *array, u32 array_size) +{ + size_t ret = 0; + u32 i; + + for (i = 0; i < array_size; i++) { + size_t len; + + len = snprintf(buf, bufsize, fmt, array[i]); + len++; /* ' ' or '\n' */ + ret += len; + + if (buf) { + buf += len; + bufsize -= len; + buf[-1] = (i == array_size-1) ? '\n' : ' '; + } + } + + ret++; /* \0 */ + if (buf) + *buf = '\0'; + + return ret; +} + +static char *format_array_alloc(const char *fmt, u32 *array, + u32 array_size) +{ + size_t len = format_array(NULL, 0, fmt, array, array_size); + char *ret; + + ret = kmalloc(len, GFP_KERNEL); + if (ret == NULL) + return NULL; + + format_array(ret, len, fmt, array, array_size); + return ret; +} + +static ssize_t u32_array_read(struct file *file, char __user *buf, size_t len, + loff_t *ppos) +{ + struct inode *inode = file->f_path.dentry->d_inode; + struct array_data *data = inode->i_private; + size_t size; + + if (*ppos == 0) { + if (file->private_data) { + kfree(file->private_data); + file->private_data = NULL; + } + + file->private_data = format_array_alloc("%u", data->array, + data->elements); + } + + size = 0; + if (file->private_data) + size = strlen(file->private_data); + + return simple_read_from_buffer(buf, len, ppos, + file->private_data, size); +} + +static int u32_array_release(struct inode *inode, struct file *file) +{ + kfree(file->private_data); + + return 0; +} + +static const struct file_operations u32_array_fops = { + .owner = THIS_MODULE, + .open = u32_array_open, + .release = u32_array_release, + .read = u32_array_read, + .llseek = no_llseek, +}; + +/** + * debugfs_create_u32_array - create a debugfs file that is used to read u32 + * array. + * @name: a pointer to a string containing the name of the file to create. + * @mode: the permission that the file should have. + * @parent: a pointer to the parent dentry for this file. This should be a + * directory dentry if set. If this parameter is %NULL, then the + * file will be created in the root of the debugfs filesystem. + * @array: u32 array that provides data. + * @elements: total number of elements in the array. + * + * This function creates a file in debugfs with the given name that exports + * @array as data. If the @mode variable is so set it can be read from. + * Writing is not supported. Seek within the file is also not supported. + * Once array is created its size can not be changed. + * + * The function returns a pointer to dentry on success. If debugfs is not + * enabled in the kernel, the value -%ENODEV will be returned. 
+ */ +struct dentry *debugfs_create_u32_array(const char *name, umode_t mode, + struct dentry *parent, + u32 *array, u32 elements) +{ + struct array_data *data = kmalloc(sizeof(*data), GFP_KERNEL); + + if (data == NULL) + return NULL; + + data->array = array; + data->elements = elements; + + return debugfs_create_file(name, mode, parent, data, &u32_array_fops); +} +EXPORT_SYMBOL_GPL(debugfs_create_u32_array); + #ifdef CONFIG_HAS_IOMEM /* diff --git a/include/linux/debugfs.h b/include/linux/debugfs.h index ae36b72c22f3..66c434f5dd1e 100644 --- a/include/linux/debugfs.h +++ b/include/linux/debugfs.h @@ -93,6 +93,10 @@ struct dentry *debugfs_create_regset32(const char *name, umode_t mode, int debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs, int nregs, void __iomem *base, char *prefix); +struct dentry *debugfs_create_u32_array(const char *name, umode_t mode, + struct dentry *parent, + u32 *array, u32 elements); + bool debugfs_initialized(void); #else @@ -219,6 +223,13 @@ static inline bool debugfs_initialized(void) return false; } +static inline struct dentry *debugfs_create_u32_array(const char *name, umode_t mode, + struct dentry *parent, + u32 *array, u32 elements) +{ + return ERR_PTR(-ENODEV); +} + #endif #endif -- cgit v1.2.3 From 0b6c4857f7684f6d3f59e0506f62953575346978 Mon Sep 17 00:00:00 2001 From: Stefan Richter Date: Mon, 9 Apr 2012 20:51:18 +0200 Subject: firewire: core: fix DMA mapping direction Seen with recent libdc1394: If a client mmap()s the buffer of an isochronous reception buffer with PROT_READ|PROT_WRITE instead of just PROT_READ, firewire-core sets the wrong DMA mapping direction during buffer initialization. The fix is to split fw_iso_buffer_init() into allocation and DMA mapping and to perform the latter after both buffer and DMA context were allocated. Buffer allocation and context allocation may happen in any order, but we need the context type (reception or transmission) in order to set the DMA direction of the buffer. 
Signed-off-by: Stefan Richter --- drivers/firewire/core-cdev.c | 51 ++++++++++++++++++++++------ drivers/firewire/core-iso.c | 80 ++++++++++++++++++++++++++++---------------- drivers/firewire/core.h | 7 +++- include/linux/firewire.h | 1 + 4 files changed, 100 insertions(+), 39 deletions(-) (limited to 'include') diff --git a/drivers/firewire/core-cdev.c b/drivers/firewire/core-cdev.c index 2e6b24547e2a..2783f69dada6 100644 --- a/drivers/firewire/core-cdev.c +++ b/drivers/firewire/core-cdev.c @@ -22,6 +22,7 @@ #include #include #include +#include #include #include #include @@ -70,6 +71,7 @@ struct client { u64 iso_closure; struct fw_iso_buffer buffer; unsigned long vm_start; + bool buffer_is_mapped; struct list_head phy_receiver_link; u64 phy_receiver_closure; @@ -959,11 +961,20 @@ static void iso_mc_callback(struct fw_iso_context *context, sizeof(e->interrupt), NULL, 0); } +static enum dma_data_direction iso_dma_direction(struct fw_iso_context *context) +{ + if (context->type == FW_ISO_CONTEXT_TRANSMIT) + return DMA_TO_DEVICE; + else + return DMA_FROM_DEVICE; +} + static int ioctl_create_iso_context(struct client *client, union ioctl_arg *arg) { struct fw_cdev_create_iso_context *a = &arg->create_iso_context; struct fw_iso_context *context; fw_iso_callback_t cb; + int ret; BUILD_BUG_ON(FW_CDEV_ISO_CONTEXT_TRANSMIT != FW_ISO_CONTEXT_TRANSMIT || FW_CDEV_ISO_CONTEXT_RECEIVE != FW_ISO_CONTEXT_RECEIVE || @@ -1004,8 +1015,21 @@ static int ioctl_create_iso_context(struct client *client, union ioctl_arg *arg) if (client->iso_context != NULL) { spin_unlock_irq(&client->lock); fw_iso_context_destroy(context); + return -EBUSY; } + if (!client->buffer_is_mapped) { + ret = fw_iso_buffer_map_dma(&client->buffer, + client->device->card, + iso_dma_direction(context)); + if (ret < 0) { + spin_unlock_irq(&client->lock); + fw_iso_context_destroy(context); + + return ret; + } + client->buffer_is_mapped = true; + } client->iso_closure = a->closure; client->iso_context = context; spin_unlock_irq(&client->lock); @@ -1651,7 +1675,6 @@ static long fw_device_op_compat_ioctl(struct file *file, static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma) { struct client *client = file->private_data; - enum dma_data_direction direction; unsigned long size; int page_count, ret; @@ -1674,20 +1697,28 @@ static int fw_device_op_mmap(struct file *file, struct vm_area_struct *vma) if (size & ~PAGE_MASK) return -EINVAL; - if (vma->vm_flags & VM_WRITE) - direction = DMA_TO_DEVICE; - else - direction = DMA_FROM_DEVICE; - - ret = fw_iso_buffer_init(&client->buffer, client->device->card, - page_count, direction); + ret = fw_iso_buffer_alloc(&client->buffer, page_count); if (ret < 0) return ret; - ret = fw_iso_buffer_map(&client->buffer, vma); + spin_lock_irq(&client->lock); + if (client->iso_context) { + ret = fw_iso_buffer_map_dma(&client->buffer, + client->device->card, + iso_dma_direction(client->iso_context)); + client->buffer_is_mapped = (ret == 0); + } + spin_unlock_irq(&client->lock); if (ret < 0) - fw_iso_buffer_destroy(&client->buffer, client->device->card); + goto fail; + ret = fw_iso_buffer_map_vma(&client->buffer, vma); + if (ret < 0) + goto fail; + + return 0; + fail: + fw_iso_buffer_destroy(&client->buffer, client->device->card); return ret; } diff --git a/drivers/firewire/core-iso.c b/drivers/firewire/core-iso.c index d1565828ae2c..8382e27e9a27 100644 --- a/drivers/firewire/core-iso.c +++ b/drivers/firewire/core-iso.c @@ -39,52 +39,73 @@ * Isochronous DMA context management */ -int 
fw_iso_buffer_init(struct fw_iso_buffer *buffer, struct fw_card *card, - int page_count, enum dma_data_direction direction) +int fw_iso_buffer_alloc(struct fw_iso_buffer *buffer, int page_count) { - int i, j; - dma_addr_t address; - - buffer->page_count = page_count; - buffer->direction = direction; + int i; + buffer->page_count = 0; + buffer->page_count_mapped = 0; buffer->pages = kmalloc(page_count * sizeof(buffer->pages[0]), GFP_KERNEL); if (buffer->pages == NULL) - goto out; + return -ENOMEM; - for (i = 0; i < buffer->page_count; i++) { + for (i = 0; i < page_count; i++) { buffer->pages[i] = alloc_page(GFP_KERNEL | GFP_DMA32 | __GFP_ZERO); if (buffer->pages[i] == NULL) - goto out_pages; + break; + } + buffer->page_count = i; + if (i < page_count) { + fw_iso_buffer_destroy(buffer, NULL); + return -ENOMEM; + } + return 0; +} + +int fw_iso_buffer_map_dma(struct fw_iso_buffer *buffer, struct fw_card *card, + enum dma_data_direction direction) +{ + dma_addr_t address; + int i; + + buffer->direction = direction; + + for (i = 0; i < buffer->page_count; i++) { address = dma_map_page(card->device, buffer->pages[i], 0, PAGE_SIZE, direction); - if (dma_mapping_error(card->device, address)) { - __free_page(buffer->pages[i]); - goto out_pages; - } + if (dma_mapping_error(card->device, address)) + break; + set_page_private(buffer->pages[i], address); } + buffer->page_count_mapped = i; + if (i < buffer->page_count) + return -ENOMEM; return 0; +} - out_pages: - for (j = 0; j < i; j++) { - address = page_private(buffer->pages[j]); - dma_unmap_page(card->device, address, - PAGE_SIZE, direction); - __free_page(buffer->pages[j]); - } - kfree(buffer->pages); - out: - buffer->pages = NULL; +int fw_iso_buffer_init(struct fw_iso_buffer *buffer, struct fw_card *card, + int page_count, enum dma_data_direction direction) +{ + int ret; + + ret = fw_iso_buffer_alloc(buffer, page_count); + if (ret < 0) + return ret; + + ret = fw_iso_buffer_map_dma(buffer, card, direction); + if (ret < 0) + fw_iso_buffer_destroy(buffer, card); - return -ENOMEM; + return ret; } EXPORT_SYMBOL(fw_iso_buffer_init); -int fw_iso_buffer_map(struct fw_iso_buffer *buffer, struct vm_area_struct *vma) +int fw_iso_buffer_map_vma(struct fw_iso_buffer *buffer, + struct vm_area_struct *vma) { unsigned long uaddr; int i, err; @@ -107,15 +128,18 @@ void fw_iso_buffer_destroy(struct fw_iso_buffer *buffer, int i; dma_addr_t address; - for (i = 0; i < buffer->page_count; i++) { + for (i = 0; i < buffer->page_count_mapped; i++) { address = page_private(buffer->pages[i]); dma_unmap_page(card->device, address, PAGE_SIZE, buffer->direction); - __free_page(buffer->pages[i]); } + for (i = 0; i < buffer->page_count; i++) + __free_page(buffer->pages[i]); kfree(buffer->pages); buffer->pages = NULL; + buffer->page_count = 0; + buffer->page_count_mapped = 0; } EXPORT_SYMBOL(fw_iso_buffer_destroy); diff --git a/drivers/firewire/core.h b/drivers/firewire/core.h index 9047f5547d98..94257aecd054 100644 --- a/drivers/firewire/core.h +++ b/drivers/firewire/core.h @@ -3,6 +3,7 @@ #include #include +#include #include #include #include @@ -169,7 +170,11 @@ void fw_node_event(struct fw_card *card, struct fw_node *node, int event); /* -iso */ -int fw_iso_buffer_map(struct fw_iso_buffer *buffer, struct vm_area_struct *vma); +int fw_iso_buffer_alloc(struct fw_iso_buffer *buffer, int page_count); +int fw_iso_buffer_map_dma(struct fw_iso_buffer *buffer, struct fw_card *card, + enum dma_data_direction direction); +int fw_iso_buffer_map_vma(struct fw_iso_buffer *buffer, + struct 
vm_area_struct *vma); /* -topology */ diff --git a/include/linux/firewire.h b/include/linux/firewire.h index cdc9b719e9c7..0a1905719f6f 100644 --- a/include/linux/firewire.h +++ b/include/linux/firewire.h @@ -391,6 +391,7 @@ struct fw_iso_buffer { enum dma_data_direction direction; struct page **pages; int page_count; + int page_count_mapped; }; int fw_iso_buffer_init(struct fw_iso_buffer *buffer, struct fw_card *card, -- cgit v1.2.3 From 7bdbff6762a573b911e4ee5715779d8ee6a62631 Mon Sep 17 00:00:00 2001 From: Clemens Ladisch Date: Wed, 11 Apr 2012 17:38:10 +0200 Subject: firewire: move rcode_string() to core There is nothing audio-specific about the rcode_string() helper, so move it from snd-firewire-lib into firewire-core to allow other code to use it. Signed-off-by: Clemens Ladisch Signed-off-by: Stefan Richter (fixed sound/firewire/cmp.c) --- drivers/firewire/core-transaction.c | 26 ++++++++++++++++++++++++++ include/linux/firewire.h | 1 + sound/firewire/cmp.c | 2 +- sound/firewire/lib.c | 28 +--------------------------- sound/firewire/lib.h | 1 - 5 files changed, 29 insertions(+), 29 deletions(-) (limited to 'include') diff --git a/drivers/firewire/core-transaction.c b/drivers/firewire/core-transaction.c index dea2dcc9310d..1c4980c32f90 100644 --- a/drivers/firewire/core-transaction.c +++ b/drivers/firewire/core-transaction.c @@ -994,6 +994,32 @@ void fw_core_handle_response(struct fw_card *card, struct fw_packet *p) } EXPORT_SYMBOL(fw_core_handle_response); +/** + * fw_rcode_string - convert a firewire result code to an error description + * @rcode: the result code + */ +const char *fw_rcode_string(int rcode) +{ + static const char *const names[] = { + [RCODE_COMPLETE] = "no error", + [RCODE_CONFLICT_ERROR] = "conflict error", + [RCODE_DATA_ERROR] = "data error", + [RCODE_TYPE_ERROR] = "type error", + [RCODE_ADDRESS_ERROR] = "address error", + [RCODE_SEND_ERROR] = "send error", + [RCODE_CANCELLED] = "timeout", + [RCODE_BUSY] = "busy", + [RCODE_GENERATION] = "bus reset", + [RCODE_NO_ACK] = "no ack", + }; + + if ((unsigned int)rcode < ARRAY_SIZE(names) && names[rcode]) + return names[rcode]; + else + return "unknown"; +} +EXPORT_SYMBOL(fw_rcode_string); + static const struct fw_address_region topology_map_region = { .start = CSR_REGISTER_BASE | CSR_TOPOLOGY_MAP, .end = CSR_REGISTER_BASE | CSR_TOPOLOGY_MAP_END, }; diff --git a/include/linux/firewire.h b/include/linux/firewire.h index 0a1905719f6f..584826ba2eb7 100644 --- a/include/linux/firewire.h +++ b/include/linux/firewire.h @@ -334,6 +334,7 @@ int fw_cancel_transaction(struct fw_card *card, int fw_run_transaction(struct fw_card *card, int tcode, int destination_id, int generation, int speed, unsigned long long offset, void *payload, size_t length); +const char *fw_rcode_string(int rcode); static inline int fw_stream_packet_destination_id(int tag, int channel, int sy) { diff --git a/sound/firewire/cmp.c b/sound/firewire/cmp.c index 76294f2ae47f..645cb0ba4293 100644 --- a/sound/firewire/cmp.c +++ b/sound/firewire/cmp.c @@ -84,7 +84,7 @@ static int pcr_modify(struct cmp_connection *c, return 0; io_error: - cmp_error(c, "transaction failed: %s\n", rcode_string(rcode)); + cmp_error(c, "transaction failed: %s\n", fw_rcode_string(rcode)); return -EIO; bus_reset: diff --git a/sound/firewire/lib.c b/sound/firewire/lib.c index 4750cea2210e..14eb41498372 100644 --- a/sound/firewire/lib.c +++ b/sound/firewire/lib.c @@ -13,32 +13,6 @@ #define ERROR_RETRY_DELAY_MS 5 -/** - * rcode_string - convert a firewire result code to a string - * @rcode: the 
result - */ -const char *rcode_string(unsigned int rcode) -{ - static const char *const names[] = { - [RCODE_COMPLETE] = "complete", - [RCODE_CONFLICT_ERROR] = "conflict error", - [RCODE_DATA_ERROR] = "data error", - [RCODE_TYPE_ERROR] = "type error", - [RCODE_ADDRESS_ERROR] = "address error", - [RCODE_SEND_ERROR] = "send error", - [RCODE_CANCELLED] = "cancelled", - [RCODE_BUSY] = "busy", - [RCODE_GENERATION] = "generation", - [RCODE_NO_ACK] = "no ack", - }; - - if (rcode < ARRAY_SIZE(names) && names[rcode]) - return names[rcode]; - else - return "unknown"; -} -EXPORT_SYMBOL(rcode_string); - /** * snd_fw_transaction - send a request and wait for its completion * @unit: the driver's unit on the target device @@ -71,7 +45,7 @@ int snd_fw_transaction(struct fw_unit *unit, int tcode, if (rcode_is_permanent_error(rcode) || ++tries >= 3) { dev_err(&unit->device, "transaction failed: %s\n", - rcode_string(rcode)); + fw_rcode_string(rcode)); return -EIO; } diff --git a/sound/firewire/lib.h b/sound/firewire/lib.h index 064f3fd9ab06..aef301476ea9 100644 --- a/sound/firewire/lib.h +++ b/sound/firewire/lib.h @@ -8,7 +8,6 @@ struct fw_unit; int snd_fw_transaction(struct fw_unit *unit, int tcode, u64 offset, void *buffer, size_t length); -const char *rcode_string(unsigned int rcode); /* returns true if retrying the transaction would not make sense */ static inline bool rcode_is_permanent_error(int rcode) -- cgit v1.2.3 From 766644d2df254934d656a0a0628b636212c24f9e Mon Sep 17 00:00:00 2001 From: Thomas Abraham Date: Sun, 25 Mar 2012 20:32:49 +0530 Subject: of/irq: add empty irq_of_parse_and_map() for non-dt builds Add a empty irq_of_parse_and_map() function that returns 0 for non-dt builds and avoid having #ifdef CONFIG_OF around all calls to irq_of_parse_and_map(). In addition to that, the irq_of_parse_and_map() function declaration is made available only if CONFIG_OF_IRQ is defined, which is the same config option that makes the irq_of_parse_and_map() function definition available. While at it, fix a typo as well. Changes since v1: - Moved irq_of_parse_and_map() function declaration under CONFIG_OF_IRQ. - Fix a minor typo in comments. Suggested-by: Grant Likely Signed-off-by: Thomas Abraham Acked-by: Rob Herring [grant.likely: fix bug causing SPARC to break] Signed-off-by: Grant Likely --- include/linux/of_irq.h | 12 ++++++++++-- 1 file changed, 10 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/linux/of_irq.h b/include/linux/of_irq.h index d229ad3edee0..1717cd935e1c 100644 --- a/include/linux/of_irq.h +++ b/include/linux/of_irq.h @@ -11,7 +11,7 @@ struct of_irq; #include /* - * irq_of_parse_and_map() is used ba all OF enabled platforms; but SPARC + * irq_of_parse_and_map() is used by all OF enabled platforms; but SPARC * implements it differently. However, the prototype is the same for all, * so declare it here regardless of the CONFIG_OF_IRQ setting. 
*/ @@ -76,5 +76,13 @@ extern struct device_node *of_irq_find_parent(struct device_node *child); extern void of_irq_init(const struct of_device_id *matches); #endif /* CONFIG_OF_IRQ */ -#endif /* CONFIG_OF */ + +#else /* !CONFIG_OF */ +static inline unsigned int irq_of_parse_and_map(struct device_node *dev, + int index) +{ + return 0; +} +#endif /* !CONFIG_OF */ + #endif /* __OF_IRQ_H */ -- cgit v1.2.3 From e245afe984b120704f15bc8d391fdb6cf96cfe0c Mon Sep 17 00:00:00 2001 From: Hans Verkuil Date: Tue, 17 Apr 2012 08:41:58 -0300 Subject: [media] videodev2.h: Fix VIDIOC_QUERYMENU ioctl regression Fixes a regression in VIDIOC_QUERYMENU introduced when the __s64 value field was added to the union. On a 64-bit system this will change the size of this v4l2_querymenu structure from 44 to 48 bytes, thus breaking the ABI. By adding the packed attribute it is working again. Tested on both 64 and 32 bit systems. Signed-off-by: Hans Verkuil Acked-by: Sakari Ailus Signed-off-by: Mauro Carvalho Chehab --- include/linux/videodev2.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/videodev2.h b/include/linux/videodev2.h index e69cacc9e9ea..5a09ac3f7683 100644 --- a/include/linux/videodev2.h +++ b/include/linux/videodev2.h @@ -1176,7 +1176,7 @@ struct v4l2_querymenu { __s64 value; }; __u32 reserved; -}; +} __attribute__ ((packed)); /* Control flags */ #define V4L2_CTRL_FLAG_DISABLED 0x0001 -- cgit v1.2.3 From b72d66770953c2177d70a7a5d24521a447d2b443 Mon Sep 17 00:00:00 2001 From: Guennadi Liakhovetski Date: Wed, 18 Apr 2012 03:59:58 -0300 Subject: [media] V4L: fix a compiler warning Fix the warning: In file included from /home/lyakh/software/project/24/src/linux-2.6/drivers/media/video/v4l2-subdev.c:29: linux-2.6/include/media/v4l2-ctrls.h:497: warning: 'struct file' declared inside parameter list linux-2.6/include/media/v4l2-ctrls.h:497: warning: its scope is only this definition or declaration, which is probably not what you want linux-2.6/include/media/v4l2-ctrls.h:505: warning: 'struct file' declared inside parameter list Signed-off-by: Guennadi Liakhovetski Acked-by: Hans Verkuil Signed-off-by: Mauro Carvalho Chehab --- include/media/v4l2-ctrls.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include') diff --git a/include/media/v4l2-ctrls.h b/include/media/v4l2-ctrls.h index 33907a969752..8920f8210eab 100644 --- a/include/media/v4l2-ctrls.h +++ b/include/media/v4l2-ctrls.h @@ -496,6 +496,7 @@ void v4l2_ctrl_add_event(struct v4l2_ctrl *ctrl, void v4l2_ctrl_del_event(struct v4l2_ctrl *ctrl, struct v4l2_subscribed_event *sev); +struct file; /* Can be used as a vidioc_log_status function that just dumps all controls associated with the filehandle. */ int v4l2_ctrl_log_status(struct file *file, void *fh); -- cgit v1.2.3 From f78146b0f9230765c6315b2e14f56112513389ad Mon Sep 17 00:00:00 2001 From: Avi Kivity Date: Wed, 18 Apr 2012 19:22:47 +0300 Subject: KVM: Fix page-crossing MMIO MMIO that are split across a page boundary are currently broken - the code does not expect to be aborted by the exit to userspace for the first MMIO fragment. This patch fixes the problem by generalizing the current code for handling 16-byte MMIOs to handle a number of "fragments", and changes the MMIO code to create those fragments. 
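The splitting itself is plain arithmetic and can be modelled outside the kernel. The sketch below is illustrative: the 8-byte per-exit limit corresponds to KVM_USER_MMIO_SIZE, while the structure layout and constants are assumptions of the model rather than KVM's exact code. A single access is first cut at the page boundary and then into at-most-8-byte fragments, each of which would cost one exit to userspace:

#include <stdio.h>

#define PAGE_SIZE 4096UL
#define PAGE_MASK (~(PAGE_SIZE - 1))
#define MAX_FRAGS 4

struct mmio_fragment {
	unsigned long gpa; /* guest physical address of the piece */
	unsigned int len;
};

/* Cut one guest access at the page boundary and then into at-most-8-byte
 * chunks; each resulting fragment would be one exit to userspace. */
static int split_mmio(unsigned long gpa, unsigned int len,
		      struct mmio_fragment *frags)
{
	int n = 0;

	while (len) {
		unsigned long to_page_end = PAGE_SIZE - (gpa & ~PAGE_MASK);
		unsigned int now = len;

		if (now > to_page_end)
			now = to_page_end;
		if (now > 8) /* userspace sees at most 8 bytes per exit */
			now = 8;
		frags[n].gpa = gpa;
		frags[n].len = now;
		n++;
		gpa += now;
		len -= now;
	}
	return n;
}

int main(void)
{
	struct mmio_fragment f[MAX_FRAGS];
	/* An 8-byte access that starts 4 bytes before a page boundary. */
	int i, n = split_mmio(3 * PAGE_SIZE - 4, 8, f);

	for (i = 0; i < n; i++)
		printf("fragment %d: gpa=%#lx len=%u\n", i, f[i].gpa, f[i].len);
	return 0;
}

For this page-crossing access the model yields two 4-byte fragments, which is exactly the case the old single-buffer code mishandled.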
Signed-off-by: Avi Kivity Signed-off-by: Marcelo Tosatti --- arch/ia64/include/asm/kvm_host.h | 2 + arch/ia64/kvm/kvm-ia64.c | 10 ++-- arch/x86/kvm/x86.c | 114 +++++++++++++++++++++++++++------------ include/linux/kvm_host.h | 31 +++++++++-- 4 files changed, 115 insertions(+), 42 deletions(-) (limited to 'include') diff --git a/arch/ia64/include/asm/kvm_host.h b/arch/ia64/include/asm/kvm_host.h index c4b4bac3d09e..6d6a5ac48d85 100644 --- a/arch/ia64/include/asm/kvm_host.h +++ b/arch/ia64/include/asm/kvm_host.h @@ -449,6 +449,8 @@ struct kvm_vcpu_arch { char log_buf[VMM_LOG_LEN]; union context host; union context guest; + + char mmio_data[8]; }; struct kvm_vm_stat { diff --git a/arch/ia64/kvm/kvm-ia64.c b/arch/ia64/kvm/kvm-ia64.c index 9d80ff8d9eff..882ab21a8dcd 100644 --- a/arch/ia64/kvm/kvm-ia64.c +++ b/arch/ia64/kvm/kvm-ia64.c @@ -232,12 +232,12 @@ static int handle_mmio(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) if ((p->addr & PAGE_MASK) == IOAPIC_DEFAULT_BASE_ADDRESS) goto mmio; vcpu->mmio_needed = 1; - vcpu->mmio_phys_addr = kvm_run->mmio.phys_addr = p->addr; - vcpu->mmio_size = kvm_run->mmio.len = p->size; + vcpu->mmio_fragments[0].gpa = kvm_run->mmio.phys_addr = p->addr; + vcpu->mmio_fragments[0].len = kvm_run->mmio.len = p->size; vcpu->mmio_is_write = kvm_run->mmio.is_write = !p->dir; if (vcpu->mmio_is_write) - memcpy(vcpu->mmio_data, &p->data, p->size); + memcpy(vcpu->arch.mmio_data, &p->data, p->size); memcpy(kvm_run->mmio.data, &p->data, p->size); kvm_run->exit_reason = KVM_EXIT_MMIO; return 0; @@ -719,7 +719,7 @@ static void kvm_set_mmio_data(struct kvm_vcpu *vcpu) struct kvm_mmio_req *p = kvm_get_vcpu_ioreq(vcpu); if (!vcpu->mmio_is_write) - memcpy(&p->data, vcpu->mmio_data, 8); + memcpy(&p->data, vcpu->arch.mmio_data, 8); p->state = STATE_IORESP_READY; } @@ -739,7 +739,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run) } if (vcpu->mmio_needed) { - memcpy(vcpu->mmio_data, kvm_run->mmio.data, 8); + memcpy(vcpu->arch.mmio_data, kvm_run->mmio.data, 8); kvm_set_mmio_data(vcpu); vcpu->mmio_read_completed = 1; vcpu->mmio_needed = 0; diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 0d9a57875f0b..4de705cdcafd 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -3718,9 +3718,8 @@ struct read_write_emulator_ops { static int read_prepare(struct kvm_vcpu *vcpu, void *val, int bytes) { if (vcpu->mmio_read_completed) { - memcpy(val, vcpu->mmio_data, bytes); trace_kvm_mmio(KVM_TRACE_MMIO_READ, bytes, - vcpu->mmio_phys_addr, *(u64 *)val); + vcpu->mmio_fragments[0].gpa, *(u64 *)val); vcpu->mmio_read_completed = 0; return 1; } @@ -3756,8 +3755,9 @@ static int read_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, static int write_exit_mmio(struct kvm_vcpu *vcpu, gpa_t gpa, void *val, int bytes) { - memcpy(vcpu->mmio_data, val, bytes); - memcpy(vcpu->run->mmio.data, vcpu->mmio_data, 8); + struct kvm_mmio_fragment *frag = &vcpu->mmio_fragments[0]; + + memcpy(vcpu->run->mmio.data, frag->data, frag->len); return X86EMUL_CONTINUE; } @@ -3784,10 +3784,7 @@ static int emulator_read_write_onepage(unsigned long addr, void *val, gpa_t gpa; int handled, ret; bool write = ops->write; - - if (ops->read_write_prepare && - ops->read_write_prepare(vcpu, val, bytes)) - return X86EMUL_CONTINUE; + struct kvm_mmio_fragment *frag; ret = vcpu_mmio_gva_to_gpa(vcpu, addr, &gpa, exception, write); @@ -3813,15 +3810,19 @@ mmio: bytes -= handled; val += handled; - vcpu->mmio_needed = 1; - vcpu->run->exit_reason = KVM_EXIT_MMIO; - vcpu->run->mmio.phys_addr = 
vcpu->mmio_phys_addr = gpa; - vcpu->mmio_size = bytes; - vcpu->run->mmio.len = min(vcpu->mmio_size, 8); - vcpu->run->mmio.is_write = vcpu->mmio_is_write = write; - vcpu->mmio_index = 0; + while (bytes) { + unsigned now = min(bytes, 8U); - return ops->read_write_exit_mmio(vcpu, gpa, val, bytes); + frag = &vcpu->mmio_fragments[vcpu->mmio_nr_fragments++]; + frag->gpa = gpa; + frag->data = val; + frag->len = now; + + gpa += now; + val += now; + bytes -= now; + } + return X86EMUL_CONTINUE; } int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, @@ -3830,10 +3831,18 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, struct read_write_emulator_ops *ops) { struct kvm_vcpu *vcpu = emul_to_vcpu(ctxt); + gpa_t gpa; + int rc; + + if (ops->read_write_prepare && + ops->read_write_prepare(vcpu, val, bytes)) + return X86EMUL_CONTINUE; + + vcpu->mmio_nr_fragments = 0; /* Crossing a page boundary? */ if (((addr + bytes - 1) ^ addr) & PAGE_MASK) { - int rc, now; + int now; now = -addr & ~PAGE_MASK; rc = emulator_read_write_onepage(addr, val, now, exception, @@ -3846,8 +3855,25 @@ int emulator_read_write(struct x86_emulate_ctxt *ctxt, unsigned long addr, bytes -= now; } - return emulator_read_write_onepage(addr, val, bytes, exception, - vcpu, ops); + rc = emulator_read_write_onepage(addr, val, bytes, exception, + vcpu, ops); + if (rc != X86EMUL_CONTINUE) + return rc; + + if (!vcpu->mmio_nr_fragments) + return rc; + + gpa = vcpu->mmio_fragments[0].gpa; + + vcpu->mmio_needed = 1; + vcpu->mmio_cur_fragment = 0; + + vcpu->run->mmio.len = vcpu->mmio_fragments[0].len; + vcpu->run->mmio.is_write = vcpu->mmio_is_write = ops->write; + vcpu->run->exit_reason = KVM_EXIT_MMIO; + vcpu->run->mmio.phys_addr = gpa; + + return ops->read_write_exit_mmio(vcpu, gpa, val, bytes); } static int emulator_read_emulated(struct x86_emulate_ctxt *ctxt, @@ -5446,33 +5472,55 @@ static int __vcpu_run(struct kvm_vcpu *vcpu) return r; } +/* + * Implements the following, as a state machine: + * + * read: + * for each fragment + * write gpa, len + * exit + * copy data + * execute insn + * + * write: + * for each fragment + * write gpa, len + * copy data + * exit + */ static int complete_mmio(struct kvm_vcpu *vcpu) { struct kvm_run *run = vcpu->run; + struct kvm_mmio_fragment *frag; int r; if (!(vcpu->arch.pio.count || vcpu->mmio_needed)) return 1; if (vcpu->mmio_needed) { - vcpu->mmio_needed = 0; + /* Complete previous fragment */ + frag = &vcpu->mmio_fragments[vcpu->mmio_cur_fragment++]; if (!vcpu->mmio_is_write) - memcpy(vcpu->mmio_data + vcpu->mmio_index, - run->mmio.data, 8); - vcpu->mmio_index += 8; - if (vcpu->mmio_index < vcpu->mmio_size) { - run->exit_reason = KVM_EXIT_MMIO; - run->mmio.phys_addr = vcpu->mmio_phys_addr + vcpu->mmio_index; - memcpy(run->mmio.data, vcpu->mmio_data + vcpu->mmio_index, 8); - run->mmio.len = min(vcpu->mmio_size - vcpu->mmio_index, 8); - run->mmio.is_write = vcpu->mmio_is_write; - vcpu->mmio_needed = 1; - return 0; + memcpy(frag->data, run->mmio.data, frag->len); + if (vcpu->mmio_cur_fragment == vcpu->mmio_nr_fragments) { + vcpu->mmio_needed = 0; + if (vcpu->mmio_is_write) + return 1; + vcpu->mmio_read_completed = 1; + goto done; } + /* Initiate next fragment */ + ++frag; + run->exit_reason = KVM_EXIT_MMIO; + run->mmio.phys_addr = frag->gpa; if (vcpu->mmio_is_write) - return 1; - vcpu->mmio_read_completed = 1; + memcpy(run->mmio.data, frag->data, frag->len); + run->mmio.len = frag->len; + run->mmio.is_write = vcpu->mmio_is_write; + return 0; + } +done: 
vcpu->srcu_idx = srcu_read_lock(&vcpu->kvm->srcu); r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE); srcu_read_unlock(&vcpu->kvm->srcu, vcpu->srcu_idx); diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h index a2d00b1bbf54..186ffab0b9f0 100644 --- a/include/linux/kvm_host.h +++ b/include/linux/kvm_host.h @@ -34,6 +34,20 @@ #define KVM_MMIO_SIZE 8 #endif +/* + * If we support unaligned MMIO, at most one fragment will be split into two: + */ +#ifdef KVM_UNALIGNED_MMIO +# define KVM_EXTRA_MMIO_FRAGMENTS 1 +#else +# define KVM_EXTRA_MMIO_FRAGMENTS 0 +#endif + +#define KVM_USER_MMIO_SIZE 8 + +#define KVM_MAX_MMIO_FRAGMENTS \ + (KVM_MMIO_SIZE / KVM_USER_MMIO_SIZE + KVM_EXTRA_MMIO_FRAGMENTS) + /* * vcpu->requests bit members */ @@ -117,6 +131,16 @@ enum { EXITING_GUEST_MODE }; +/* + * Sometimes a large or cross-page mmio needs to be broken up into separate + * exits for userspace servicing. + */ +struct kvm_mmio_fragment { + gpa_t gpa; + void *data; + unsigned len; +}; + struct kvm_vcpu { struct kvm *kvm; #ifdef CONFIG_PREEMPT_NOTIFIERS @@ -144,10 +168,9 @@ struct kvm_vcpu { int mmio_needed; int mmio_read_completed; int mmio_is_write; - int mmio_size; - int mmio_index; - unsigned char mmio_data[KVM_MMIO_SIZE]; - gpa_t mmio_phys_addr; + int mmio_cur_fragment; + int mmio_nr_fragments; + struct kvm_mmio_fragment mmio_fragments[KVM_MAX_MMIO_FRAGMENTS]; #endif #ifdef CONFIG_KVM_ASYNC_PF -- cgit v1.2.3 From 8bd435b30ecacb69bbb8b2d3e251f770b807c5b2 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 13 Apr 2012 13:11:28 -0700 Subject: blkcg: remove static policy ID enums Remove BLKIO_POLICY_* enums and let blkio_policy_register() allocate @pol->plid dynamically on registration. The maximum number of blkcg policies which can be registered at the same time is defined by BLKCG_MAX_POLS constant added to include/linux/blkdev.h. Note that blkio_policy_register() now may fail. Policy init functions updated accordingly and unnecessary ifdefs removed from cfq_init(). 
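A compact model of the dynamic allocation follows; the names loosely mirror the patch, but the model itself is an illustrative assumption, not the block layer's code. Registration scans a fixed table of BLKCG_MAX_POLS slots, takes the first free index as the policy's plid, and fails with -ENOSPC once the table is full, which is why callers must now check the return value:

#include <errno.h>
#include <stdio.h>

#define BLKCG_MAX_POLS 2 /* mirrors the constant added to blkdev.h */

struct policy {
	const char *name;
	int plid; /* assigned at registration, not hardcoded */
};

static struct policy *policies[BLKCG_MAX_POLS];

static int policy_register(struct policy *pol)
{
	int i;

	for (i = 0; i < BLKCG_MAX_POLS; i++) {
		if (!policies[i]) {
			pol->plid = i;
			policies[i] = pol;
			return 0;
		}
	}
	return -ENOSPC; /* no free slot: registration may now fail */
}

static void policy_unregister(struct policy *pol)
{
	if (policies[pol->plid] == pol)
		policies[pol->plid] = NULL;
}

int main(void)
{
	struct policy prop = { "proportional", -1 };
	struct policy throtl = { "throttle", -1 };
	struct policy extra = { "one-too-many", -1 };

	printf("prop:   %d (plid %d)\n", policy_register(&prop), prop.plid);
	printf("throtl: %d (plid %d)\n", policy_register(&throtl), throtl.plid);
	printf("extra:  %d\n", policy_register(&extra)); /* -ENOSPC */
	policy_unregister(&throtl);
	policy_unregister(&prop);
	return 0;
}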
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 59 +++++++++++++++++++++++++++++++++++++------------- block/blk-cgroup.h | 15 ++++--------- block/blk-throttle.c | 4 +--- block/cfq-iosched.c | 25 +++++++++++---------- include/linux/blkdev.h | 7 +++++- 5 files changed, 69 insertions(+), 41 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index b1231524a097..2d4d7d6d9ae9 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -31,7 +31,7 @@ static LIST_HEAD(all_q_list); struct blkio_cgroup blkio_root_cgroup = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT }; EXPORT_SYMBOL_GPL(blkio_root_cgroup); -static struct blkio_policy_type *blkio_policy[BLKIO_NR_POLICIES]; +static struct blkio_policy_type *blkio_policy[BLKCG_MAX_POLS]; struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) { @@ -67,7 +67,7 @@ static void blkg_free(struct blkio_group *blkg) if (!blkg) return; - for (i = 0; i < BLKIO_NR_POLICIES; i++) { + for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; struct blkg_policy_data *pd = blkg->pd[i]; @@ -107,7 +107,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, blkg->refcnt = 1; cgroup_path(blkcg->css.cgroup, blkg->path, sizeof(blkg->path)); - for (i = 0; i < BLKIO_NR_POLICIES; i++) { + for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; struct blkg_policy_data *pd; @@ -127,7 +127,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, } /* invoke per-policy init */ - for (i = 0; i < BLKIO_NR_POLICIES; i++) { + for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; if (pol) @@ -320,7 +320,7 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val) * anyway. If you get hit by a race, retry. */ hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) { - for (i = 0; i < BLKIO_NR_POLICIES; i++) { + for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; if (pol && pol->ops.blkio_reset_group_stats_fn) @@ -729,46 +729,75 @@ struct cgroup_subsys blkio_subsys = { }; EXPORT_SYMBOL_GPL(blkio_subsys); -void blkio_policy_register(struct blkio_policy_type *blkiop) +/** + * blkio_policy_register - register a blkcg policy + * @blkiop: blkcg policy to register + * + * Register @blkiop with blkcg core. Might sleep and @blkiop may be + * modified on successful registration. Returns 0 on success and -errno on + * failure. + */ +int blkio_policy_register(struct blkio_policy_type *blkiop) { struct request_queue *q; + int i, ret; mutex_lock(&blkcg_pol_mutex); - blkcg_bypass_start(); + /* find an empty slot */ + ret = -ENOSPC; + for (i = 0; i < BLKCG_MAX_POLS; i++) + if (!blkio_policy[i]) + break; + if (i >= BLKCG_MAX_POLS) + goto out_unlock; - BUG_ON(blkio_policy[blkiop->plid]); - blkio_policy[blkiop->plid] = blkiop; + /* register and update blkgs */ + blkiop->plid = i; + blkio_policy[i] = blkiop; + + blkcg_bypass_start(); list_for_each_entry(q, &all_q_list, all_q_node) update_root_blkg_pd(q, blkiop); - blkcg_bypass_end(); + /* everything is in place, add intf files for the new policy */ if (blkiop->cftypes) WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes)); - + ret = 0; +out_unlock: mutex_unlock(&blkcg_pol_mutex); + return ret; } EXPORT_SYMBOL_GPL(blkio_policy_register); +/** + * blkiop_policy_unregister - unregister a blkcg policy + * @blkiop: blkcg policy to unregister + * + * Undo blkio_policy_register(@blkiop). 
Might sleep. + */ void blkio_policy_unregister(struct blkio_policy_type *blkiop) { struct request_queue *q; mutex_lock(&blkcg_pol_mutex); + if (WARN_ON(blkio_policy[blkiop->plid] != blkiop)) + goto out_unlock; + + /* kill the intf files first */ if (blkiop->cftypes) cgroup_rm_cftypes(&blkio_subsys, blkiop->cftypes); - blkcg_bypass_start(); - - BUG_ON(blkio_policy[blkiop->plid] != blkiop); + /* unregister and update blkgs */ blkio_policy[blkiop->plid] = NULL; + blkcg_bypass_start(); list_for_each_entry(q, &all_q_list, all_q_node) update_root_blkg_pd(q, blkiop); blkcg_bypass_end(); - +out_unlock: mutex_unlock(&blkcg_pol_mutex); } EXPORT_SYMBOL_GPL(blkio_policy_unregister); diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index 26949731108f..be80d6eb6531 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -17,13 +17,6 @@ #include #include -enum blkio_policy_id { - BLKIO_POLICY_PROP = 0, /* Proportional Bandwidth division */ - BLKIO_POLICY_THROTL, /* Throttling */ - - BLKIO_NR_POLICIES, -}; - /* Max limits for throttle policy */ #define THROTL_IOPS_MAX UINT_MAX @@ -86,7 +79,7 @@ struct blkio_group { /* reference count */ int refcnt; - struct blkg_policy_data *pd[BLKIO_NR_POLICIES]; + struct blkg_policy_data *pd[BLKCG_MAX_POLS]; struct rcu_head rcu_head; }; @@ -103,7 +96,7 @@ struct blkio_policy_ops { struct blkio_policy_type { struct blkio_policy_ops ops; - enum blkio_policy_id plid; + int plid; size_t pdata_size; /* policy specific private data size */ struct cftype *cftypes; /* cgroup files for the policy */ }; @@ -113,7 +106,7 @@ extern void blkcg_drain_queue(struct request_queue *q); extern void blkcg_exit_queue(struct request_queue *q); /* Blkio controller policy registration */ -extern void blkio_policy_register(struct blkio_policy_type *); +extern int blkio_policy_register(struct blkio_policy_type *); extern void blkio_policy_unregister(struct blkio_policy_type *); extern void blkg_destroy_all(struct request_queue *q, bool destroy_root); extern void update_root_blkg_pd(struct request_queue *q, @@ -329,7 +322,7 @@ struct blkio_policy_type { static inline int blkcg_init_queue(struct request_queue *q) { return 0; } static inline void blkcg_drain_queue(struct request_queue *q) { } static inline void blkcg_exit_queue(struct request_queue *q) { } -static inline void blkio_policy_register(struct blkio_policy_type *blkiop) { } +static inline int blkio_policy_register(struct blkio_policy_type *blkiop) { return 0; } static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } static inline void blkg_destroy_all(struct request_queue *q, bool destory_root) { } diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 07c17c27a628..0dc4645aa7fe 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -1089,7 +1089,6 @@ static struct blkio_policy_type blkio_policy_throtl = { .blkio_exit_group_fn = throtl_exit_blkio_group, .blkio_reset_group_stats_fn = throtl_reset_group_stats, }, - .plid = BLKIO_POLICY_THROTL, .pdata_size = sizeof(struct throtl_grp), .cftypes = throtl_files, }; @@ -1271,8 +1270,7 @@ static int __init throtl_init(void) if (!kthrotld_workqueue) panic("Failed to create kthrotld\n"); - blkio_policy_register(&blkio_policy_throtl); - return 0; + return blkio_policy_register(&blkio_policy_throtl); } module_init(throtl_init); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index d02f0ae9637f..08db2fc70c29 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -4157,7 +4157,6 @@ static struct blkio_policy_type blkio_policy_cfq = { 
.blkio_init_group_fn = cfq_init_blkio_group, .blkio_reset_group_stats_fn = cfqg_stats_reset, }, - .plid = BLKIO_POLICY_PROP, .pdata_size = sizeof(struct cfq_group), .cftypes = cfq_blkcg_files, }; @@ -4181,27 +4180,31 @@ static int __init cfq_init(void) #else cfq_group_idle = 0; #endif + + ret = blkio_policy_register(&blkio_policy_cfq); + if (ret) + return ret; + cfq_pool = KMEM_CACHE(cfq_queue, 0); if (!cfq_pool) - return -ENOMEM; + goto err_pol_unreg; ret = elv_register(&iosched_cfq); - if (ret) { - kmem_cache_destroy(cfq_pool); - return ret; - } + if (ret) + goto err_free_pool; -#ifdef CONFIG_CFQ_GROUP_IOSCHED - blkio_policy_register(&blkio_policy_cfq); -#endif return 0; + +err_free_pool: + kmem_cache_destroy(cfq_pool); +err_pol_unreg: + blkio_policy_unregister(&blkio_policy_cfq); + return ret; } static void __exit cfq_exit(void) { -#ifdef CONFIG_CFQ_GROUP_IOSCHED blkio_policy_unregister(&blkio_policy_cfq); -#endif elv_unregister(&iosched_cfq); kmem_cache_destroy(cfq_pool); } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 33f1b29e53f4..d2c69f8c188a 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -35,6 +35,12 @@ struct bsg_job; #define BLKDEV_MIN_RQ 4 #define BLKDEV_MAX_RQ 128 /* Default maximum */ +/* + * Maximum number of blkcg policies allowed to be registered concurrently. + * Defined here to simplify include dependency. + */ +#define BLKCG_MAX_POLS 2 + struct request; typedef void (rq_end_io_fn)(struct request *, int); @@ -363,7 +369,6 @@ struct request_queue { struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP - /* XXX: array size hardcoded to avoid include dependency (temporary) */ struct list_head blkg_list; #endif -- cgit v1.2.3 From 03d8e11142a893ad322285d3c8a08e88b570cda1 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 13 Apr 2012 13:11:32 -0700 Subject: blkcg: add request_queue->root_blkg With per-queue policy activation, root blkg creation will be moved to blkcg core. Add q->root_blkg in preparation. For blk-throtl, this replaces throtl_data->root_tg; however, cfq needs to keep cfqd->root_group for !CONFIG_CFQ_GROUP_IOSCHED. This is to prepare for per-queue policy activation and doesn't cause any functional difference. 
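The indirection can be shown with a few lines of self-contained C. The struct layouts here are illustrative assumptions; in the kernel the per-policy data is reached through blkg->pd[plid]->pdata rather than a bare void pointer. The root group hangs off the queue, and a helper equivalent to td_root_tg() derives the policy's private data from it:

#include <stdio.h>

struct blkio_group;

struct request_queue {
	struct blkio_group *root_blkg; /* root group now owned by the queue */
};

struct blkio_group {
	struct request_queue *q;
	void *pd; /* stand-in for per-policy private data */
};

struct throtl_grp {
	int some_limit;
};

/* Equivalent of the patch's td_root_tg(): reach the root group through
 * the queue instead of caching it in the policy's own data. */
static struct throtl_grp *root_tg(struct request_queue *q)
{
	return q->root_blkg->pd;
}

int main(void)
{
	struct throtl_grp tg = { .some_limit = 42 };
	struct blkio_group blkg = { .pd = &tg };
	struct request_queue q = { .root_blkg = &blkg };

	blkg.q = &q;
	printf("root tg limit = %d\n", root_tg(&q)->some_limit);
	return 0;
}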
Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-throttle.c | 16 ++++++++++------ block/cfq-iosched.c | 4 +++- include/linux/blkdev.h | 2 ++ 3 files changed, 15 insertions(+), 7 deletions(-) (limited to 'include') diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 6f1bfdf9a1b7..8c520fad6885 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -97,7 +97,6 @@ struct throtl_data /* service tree for active throtl groups */ struct throtl_rb_root tg_service_tree; - struct throtl_grp *root_tg; struct request_queue *queue; /* Total Number of queued bios on READ and WRITE lists */ @@ -131,6 +130,11 @@ static inline struct blkio_group *tg_to_blkg(struct throtl_grp *tg) return pdata_to_blkg(tg); } +static inline struct throtl_grp *td_root_tg(struct throtl_data *td) +{ + return blkg_to_tg(td->queue->root_blkg); +} + enum tg_state_flags { THROTL_TG_FLAG_on_rr = 0, /* on round-robin busy list */ }; @@ -261,7 +265,7 @@ throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg) * Avoid lookup in this case */ if (blkcg == &blkio_root_cgroup) - return td->root_tg; + return td_root_tg(td); return blkg_to_tg(blkg_lookup(blkcg, td->queue)); } @@ -277,7 +281,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td, * Avoid lookup in this case */ if (blkcg == &blkio_root_cgroup) { - tg = td->root_tg; + tg = td_root_tg(td); } else { struct blkio_group *blkg; @@ -287,7 +291,7 @@ static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td, if (!IS_ERR(blkg)) tg = blkg_to_tg(blkg); else if (!blk_queue_dead(q)) - tg = td->root_tg; + tg = td_root_tg(td); } return tg; @@ -1245,12 +1249,12 @@ int blk_throtl_init(struct request_queue *q) blkg = blkg_lookup_create(&blkio_root_cgroup, q, true); if (!IS_ERR(blkg)) - td->root_tg = blkg_to_tg(blkg); + q->root_blkg = blkg; spin_unlock_irq(q->queue_lock); rcu_read_unlock(); - if (!td->root_tg) { + if (!q->root_blkg) { kfree(td); return -ENOMEM; } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index de95f9a2acf8..86440e04f3ee 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -3964,8 +3964,10 @@ static int cfq_init_queue(struct request_queue *q) spin_lock_irq(q->queue_lock); blkg = blkg_lookup_create(&blkio_root_cgroup, q, true); - if (!IS_ERR(blkg)) + if (!IS_ERR(blkg)) { + q->root_blkg = blkg; cfqd->root_group = blkg_to_cfqg(blkg); + } spin_unlock_irq(q->queue_lock); rcu_read_unlock(); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index d2c69f8c188a..b01c377fd739 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -31,6 +31,7 @@ struct blk_trace; struct request; struct sg_io_hdr; struct bsg_job; +struct blkio_group; #define BLKDEV_MIN_RQ 4 #define BLKDEV_MAX_RQ 128 /* Default maximum */ @@ -369,6 +370,7 @@ struct request_queue { struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP + struct blkio_group *root_blkg; struct list_head blkg_list; #endif -- cgit v1.2.3 From a2b1693bac45ea3fe3ba612fd22c45f17449f610 Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Fri, 13 Apr 2012 13:11:33 -0700 Subject: blkcg: implement per-queue policy activation All blkcg policies were assumed to be enabled on all request_queues. Due to various implementation obstacles, during the recent blkcg core updates, this was temporarily implemented as shooting down all !root blkgs on elevator switch and policy [de]registration combined with half-broken in-place root blkg updates. 
In addition to being buggy and racy, this meant losing all blkcg configurations across those events. Now that blkcg is cleaned up enough, this patch replaces the temporary implementation with proper per-queue policy activation. Each blkcg policy should call the new blkcg_[de]activate_policy() to enable and disable the policy on a specific queue. blkcg_activate_policy() allocates and installs policy data for the policy for all existing blkgs. blkcg_deactivate_policy() does the reverse. If a policy is not enabled for a given queue, blkg printing / config functions skip the respective blkg for the queue. blkcg_activate_policy() also takes care of root blkg creation, and cfq_init_queue() and blk_throtl_init() are updated accordingly. This makes blkcg_bypass_{start|end}() and update_root_blkg_pd() unnecessary. Both are dropped. v2: cfq_init_queue() was returning uninitialized @ret on root_group alloc failure if !CONFIG_CFQ_GROUP_IOSCHED. Fixed. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 228 +++++++++++++++++++++++++++++++++---------------- block/blk-cgroup.h | 15 +++- block/blk-throttle.c | 52 +++++------ block/cfq-iosched.c | 37 ++++---- block/elevator.c | 2 - include/linux/blkdev.h | 1 + 6 files changed, 201 insertions(+), 134 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index d6e4555c982f..d6d59ad105b4 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -54,6 +54,17 @@ struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio) } EXPORT_SYMBOL_GPL(bio_blkio_cgroup); +static bool blkcg_policy_enabled(struct request_queue *q, + const struct blkio_policy_type *pol) +{ + return pol && test_bit(pol->plid, q->blkcg_pols); +} + +static size_t blkg_pd_size(const struct blkio_policy_type *pol) +{ + return sizeof(struct blkg_policy_data) + pol->pdata_size; +} + /** * blkg_free - free a blkg * @blkg: blkg to free @@ -111,12 +122,11 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, struct blkio_policy_type *pol = blkio_policy[i]; struct blkg_policy_data *pd; - if (!pol) + if (!blkcg_policy_enabled(q, pol)) continue; /* alloc per-policy data and attach it to blkg */ - pd = kzalloc_node(sizeof(*pd) + pol->pdata_size, GFP_ATOMIC, - q->node); + pd = kzalloc_node(blkg_pd_size(pol), GFP_ATOMIC, q->node); if (!pd) { blkg_free(blkg); return NULL; @@ -130,7 +140,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; - if (pol) + if (blkcg_policy_enabled(blkg->q, pol)) pol->ops.blkio_init_group_fn(blkg); } @@ -236,36 +246,6 @@ static void blkg_destroy(struct blkio_group *blkg) blkg_put(blkg); } -/* - * XXX: This updates blkg policy data in-place for root blkg, which is - * necessary across elevator switch and policy registration as root blkgs - * aren't shot down. This broken and racy implementation is temporary. - * Eventually, blkg shoot down will be replaced by proper in-place update.
- */ -void update_root_blkg_pd(struct request_queue *q, - const struct blkio_policy_type *pol) -{ - struct blkio_group *blkg = blkg_lookup(&blkio_root_cgroup, q); - struct blkg_policy_data *pd; - - if (!blkg) - return; - - kfree(blkg->pd[pol->plid]); - blkg->pd[pol->plid] = NULL; - - if (!pol) - return; - - pd = kzalloc(sizeof(*pd) + pol->pdata_size, GFP_KERNEL); - WARN_ON_ONCE(!pd); - - blkg->pd[pol->plid] = pd; - pd->blkg = blkg; - pol->ops.blkio_init_group_fn(blkg); -} -EXPORT_SYMBOL_GPL(update_root_blkg_pd); - /** * blkg_destroy_all - destroy all blkgs associated with a request_queue * @q: request_queue of interest @@ -339,7 +319,8 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val) for (i = 0; i < BLKCG_MAX_POLS; i++) { struct blkio_policy_type *pol = blkio_policy[i]; - if (pol && pol->ops.blkio_reset_group_stats_fn) + if (blkcg_policy_enabled(blkg->q, pol) && + pol->ops.blkio_reset_group_stats_fn) pol->ops.blkio_reset_group_stats_fn(blkg); } } @@ -385,7 +366,7 @@ void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg, spin_lock_irq(&blkcg->lock); hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) - if (blkg->pd[pol->plid]) + if (blkcg_policy_enabled(blkg->q, pol)) total += prfill(sf, blkg->pd[pol->plid]->pdata, data); spin_unlock_irq(&blkcg->lock); @@ -510,7 +491,10 @@ int blkg_conf_prep(struct blkio_cgroup *blkcg, rcu_read_lock(); spin_lock_irq(disk->queue->queue_lock); - blkg = blkg_lookup_create(blkcg, disk->queue, false); + if (blkcg_policy_enabled(disk->queue, pol)) + blkg = blkg_lookup_create(blkcg, disk->queue, false); + else + blkg = ERR_PTR(-EINVAL); if (IS_ERR(blkg)) { ret = PTR_ERR(blkg); @@ -712,30 +696,6 @@ static int blkiocg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) return ret; } -static void blkcg_bypass_start(void) - __acquires(&all_q_mutex) -{ - struct request_queue *q; - - mutex_lock(&all_q_mutex); - - list_for_each_entry(q, &all_q_list, all_q_node) { - blk_queue_bypass_start(q); - blkg_destroy_all(q, false); - } -} - -static void blkcg_bypass_end(void) - __releases(&all_q_mutex) -{ - struct request_queue *q; - - list_for_each_entry(q, &all_q_list, all_q_node) - blk_queue_bypass_end(q); - - mutex_unlock(&all_q_mutex); -} - struct cgroup_subsys blkio_subsys = { .name = "blkio", .create = blkiocg_create, @@ -748,6 +708,139 @@ struct cgroup_subsys blkio_subsys = { }; EXPORT_SYMBOL_GPL(blkio_subsys); +/** + * blkcg_activate_policy - activate a blkcg policy on a request_queue + * @q: request_queue of interest + * @pol: blkcg policy to activate + * + * Activate @pol on @q. Requires %GFP_KERNEL context. @q goes through + * bypass mode to populate its blkgs with policy_data for @pol. + * + * Activation happens with @q bypassed, so nobody would be accessing blkgs + * from IO path. Update of each blkg is protected by both queue and blkcg + * locks so that holding either lock and testing blkcg_policy_enabled() is + * always enough for dereferencing policy data. + * + * The caller is responsible for synchronizing [de]activations and policy + * [un]registerations. Returns 0 on success, -errno on failure. 
+ */ +int blkcg_activate_policy(struct request_queue *q, + const struct blkio_policy_type *pol) +{ + LIST_HEAD(pds); + struct blkio_group *blkg; + struct blkg_policy_data *pd, *n; + int cnt = 0, ret; + + if (blkcg_policy_enabled(q, pol)) + return 0; + + blk_queue_bypass_start(q); + + /* make sure the root blkg exists and count the existing blkgs */ + spin_lock_irq(q->queue_lock); + + rcu_read_lock(); + blkg = blkg_lookup_create(&blkio_root_cgroup, q, true); + rcu_read_unlock(); + + if (IS_ERR(blkg)) { + ret = PTR_ERR(blkg); + goto out_unlock; + } + q->root_blkg = blkg; + + list_for_each_entry(blkg, &q->blkg_list, q_node) + cnt++; + + spin_unlock_irq(q->queue_lock); + + /* allocate policy_data for all existing blkgs */ + while (cnt--) { + pd = kzalloc_node(blkg_pd_size(pol), GFP_KERNEL, q->node); + if (!pd) { + ret = -ENOMEM; + goto out_free; + } + list_add_tail(&pd->alloc_node, &pds); + } + + /* + * Install the allocated pds. With @q bypassing, no new blkg + * should have been created while the queue lock was dropped. + */ + spin_lock_irq(q->queue_lock); + + list_for_each_entry(blkg, &q->blkg_list, q_node) { + if (WARN_ON(list_empty(&pds))) { + /* umm... this shouldn't happen, just abort */ + ret = -ENOMEM; + goto out_unlock; + } + pd = list_first_entry(&pds, struct blkg_policy_data, alloc_node); + list_del_init(&pd->alloc_node); + + /* grab blkcg lock too while installing @pd on @blkg */ + spin_lock(&blkg->blkcg->lock); + + blkg->pd[pol->plid] = pd; + pd->blkg = blkg; + pol->ops.blkio_init_group_fn(blkg); + + spin_unlock(&blkg->blkcg->lock); + } + + __set_bit(pol->plid, q->blkcg_pols); + ret = 0; +out_unlock: + spin_unlock_irq(q->queue_lock); +out_free: + blk_queue_bypass_end(q); + list_for_each_entry_safe(pd, n, &pds, alloc_node) + kfree(pd); + return ret; +} +EXPORT_SYMBOL_GPL(blkcg_activate_policy); + +/** + * blkcg_deactivate_policy - deactivate a blkcg policy on a request_queue + * @q: request_queue of interest + * @pol: blkcg policy to deactivate + * + * Deactivate @pol on @q. Follows the same synchronization rules as + * blkcg_activate_policy(). 
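+ *
+ * The usual pairing is in the queue / elevator exit path, e.g. (sketch,
+ * with the same made-up policy name as above):
+ *
+ *	blkcg_deactivate_policy(q, &blkio_policy_foo);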
+ */ +void blkcg_deactivate_policy(struct request_queue *q, + const struct blkio_policy_type *pol) +{ + struct blkio_group *blkg; + + if (!blkcg_policy_enabled(q, pol)) + return; + + blk_queue_bypass_start(q); + spin_lock_irq(q->queue_lock); + + __clear_bit(pol->plid, q->blkcg_pols); + + list_for_each_entry(blkg, &q->blkg_list, q_node) { + /* grab blkcg lock too while removing @pd from @blkg */ + spin_lock(&blkg->blkcg->lock); + + if (pol->ops.blkio_exit_group_fn) + pol->ops.blkio_exit_group_fn(blkg); + + kfree(blkg->pd[pol->plid]); + blkg->pd[pol->plid] = NULL; + + spin_unlock(&blkg->blkcg->lock); + } + + spin_unlock_irq(q->queue_lock); + blk_queue_bypass_end(q); +} +EXPORT_SYMBOL_GPL(blkcg_deactivate_policy); + /** * blkio_policy_register - register a blkcg policy * @blkiop: blkcg policy to register @@ -758,7 +851,6 @@ EXPORT_SYMBOL_GPL(blkio_subsys); */ int blkio_policy_register(struct blkio_policy_type *blkiop) { - struct request_queue *q; int i, ret; mutex_lock(&blkcg_pol_mutex); @@ -775,11 +867,6 @@ int blkio_policy_register(struct blkio_policy_type *blkiop) blkiop->plid = i; blkio_policy[i] = blkiop; - blkcg_bypass_start(); - list_for_each_entry(q, &all_q_list, all_q_node) - update_root_blkg_pd(q, blkiop); - blkcg_bypass_end(); - /* everything is in place, add intf files for the new policy */ if (blkiop->cftypes) WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes)); @@ -798,8 +885,6 @@ EXPORT_SYMBOL_GPL(blkio_policy_register); */ void blkio_policy_unregister(struct blkio_policy_type *blkiop) { - struct request_queue *q; - mutex_lock(&blkcg_pol_mutex); if (WARN_ON(blkio_policy[blkiop->plid] != blkiop)) @@ -811,11 +896,6 @@ void blkio_policy_unregister(struct blkio_policy_type *blkiop) /* unregister and update blkgs */ blkio_policy[blkiop->plid] = NULL; - - blkcg_bypass_start(); - list_for_each_entry(q, &all_q_list, all_q_node) - update_root_blkg_pd(q, blkiop); - blkcg_bypass_end(); out_unlock: mutex_unlock(&blkcg_pol_mutex); } diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index df1c7b290c22..66253a7c8ff4 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -64,6 +64,9 @@ struct blkg_policy_data { /* the blkg this per-policy data belongs to */ struct blkio_group *blkg; + /* used during policy activation */ + struct list_head alloc_node; + /* pol->pdata_size bytes of private data used by policy impl */ char pdata[] __aligned(__alignof__(unsigned long long)); }; @@ -108,9 +111,11 @@ extern void blkcg_exit_queue(struct request_queue *q); /* Blkio controller policy registration */ extern int blkio_policy_register(struct blkio_policy_type *); extern void blkio_policy_unregister(struct blkio_policy_type *); +extern int blkcg_activate_policy(struct request_queue *q, + const struct blkio_policy_type *pol); +extern void blkcg_deactivate_policy(struct request_queue *q, + const struct blkio_policy_type *pol); extern void blkg_destroy_all(struct request_queue *q, bool destroy_root); -extern void update_root_blkg_pd(struct request_queue *q, - const struct blkio_policy_type *pol); void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg, u64 (*prfill)(struct seq_file *, void *, int), @@ -325,10 +330,12 @@ static inline void blkcg_drain_queue(struct request_queue *q) { } static inline void blkcg_exit_queue(struct request_queue *q) { } static inline int blkio_policy_register(struct blkio_policy_type *blkiop) { return 0; } static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } +static inline int blkcg_activate_policy(struct request_queue *q, 
+ const struct blkio_policy_type *pol) { return 0; } +static inline void blkcg_deactivate_policy(struct request_queue *q, + const struct blkio_policy_type *pol) { } static inline void blkg_destroy_all(struct request_queue *q, bool destory_root) { } -static inline void update_root_blkg_pd(struct request_queue *q, - const struct blkio_policy_type *pol) { } static inline void *blkg_to_pdata(struct blkio_group *blkg, struct blkio_policy_type *pol) { return NULL; } diff --git a/block/blk-throttle.c b/block/blk-throttle.c index 8c520fad6885..2fc964e06ea4 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -995,35 +995,31 @@ static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf, struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); struct blkg_conf_ctx ctx; struct throtl_grp *tg; + struct throtl_data *td; int ret; ret = blkg_conf_prep(blkcg, &blkio_policy_throtl, buf, &ctx); if (ret) return ret; - ret = -EINVAL; tg = blkg_to_tg(ctx.blkg); - if (tg) { - struct throtl_data *td = ctx.blkg->q->td; - - if (!ctx.v) - ctx.v = -1; + td = ctx.blkg->q->td; - if (is_u64) - *(u64 *)((void *)tg + cft->private) = ctx.v; - else - *(unsigned int *)((void *)tg + cft->private) = ctx.v; + if (!ctx.v) + ctx.v = -1; - /* XXX: we don't need the following deferred processing */ - xchg(&tg->limits_changed, true); - xchg(&td->limits_changed, true); - throtl_schedule_delayed_work(td, 0); + if (is_u64) + *(u64 *)((void *)tg + cft->private) = ctx.v; + else + *(unsigned int *)((void *)tg + cft->private) = ctx.v; - ret = 0; - } + /* XXX: we don't need the following deferred processing */ + xchg(&tg->limits_changed, true); + xchg(&td->limits_changed, true); + throtl_schedule_delayed_work(td, 0); blkg_conf_finish(&ctx); - return ret; + return 0; } static int tg_set_conf_u64(struct cgroup *cgrp, struct cftype *cft, @@ -1230,7 +1226,7 @@ void blk_throtl_drain(struct request_queue *q) int blk_throtl_init(struct request_queue *q) { struct throtl_data *td; - struct blkio_group *blkg; + int ret; td = kzalloc_node(sizeof(*td), GFP_KERNEL, q->node); if (!td) @@ -1243,28 +1239,18 @@ int blk_throtl_init(struct request_queue *q) q->td = td; td->queue = q; - /* alloc and init root group. 
*/ - rcu_read_lock(); - spin_lock_irq(q->queue_lock); - - blkg = blkg_lookup_create(&blkio_root_cgroup, q, true); - if (!IS_ERR(blkg)) - q->root_blkg = blkg; - - spin_unlock_irq(q->queue_lock); - rcu_read_unlock(); - - if (!q->root_blkg) { + /* activate policy */ + ret = blkcg_activate_policy(q, &blkio_policy_throtl); + if (ret) kfree(td); - return -ENOMEM; - } - return 0; + return ret; } void blk_throtl_exit(struct request_queue *q) { BUG_ON(!q->td); throtl_shutdown_wq(q); + blkcg_deactivate_policy(q, &blkio_policy_throtl); kfree(q->td); } diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 86440e04f3ee..0203652e1f34 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -1406,8 +1406,7 @@ static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft, ret = -EINVAL; cfqg = blkg_to_cfqg(ctx.blkg); - if (cfqg && (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && - ctx.v <= CFQ_WEIGHT_MAX))) { + if (!ctx.v || (ctx.v >= CFQ_WEIGHT_MIN && ctx.v <= CFQ_WEIGHT_MAX)) { cfqg->dev_weight = ctx.v; cfqg->new_weight = cfqg->dev_weight ?: blkcg->cfq_weight; ret = 0; @@ -3938,7 +3937,7 @@ static void cfq_exit_queue(struct elevator_queue *e) #ifndef CONFIG_CFQ_GROUP_IOSCHED kfree(cfqd->root_group); #endif - update_root_blkg_pd(q, &blkio_policy_cfq); + blkcg_deactivate_policy(q, &blkio_policy_cfq); kfree(cfqd); } @@ -3946,7 +3945,7 @@ static int cfq_init_queue(struct request_queue *q) { struct cfq_data *cfqd; struct blkio_group *blkg __maybe_unused; - int i; + int i, ret; cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node); if (!cfqd) @@ -3960,28 +3959,20 @@ static int cfq_init_queue(struct request_queue *q) /* Init root group and prefer root group over other groups by default */ #ifdef CONFIG_CFQ_GROUP_IOSCHED - rcu_read_lock(); - spin_lock_irq(q->queue_lock); - - blkg = blkg_lookup_create(&blkio_root_cgroup, q, true); - if (!IS_ERR(blkg)) { - q->root_blkg = blkg; - cfqd->root_group = blkg_to_cfqg(blkg); - } + ret = blkcg_activate_policy(q, &blkio_policy_cfq); + if (ret) + goto out_free; - spin_unlock_irq(q->queue_lock); - rcu_read_unlock(); + cfqd->root_group = blkg_to_cfqg(q->root_blkg); #else + ret = -ENOMEM; cfqd->root_group = kzalloc_node(sizeof(*cfqd->root_group), GFP_KERNEL, cfqd->queue->node); - if (cfqd->root_group) - cfq_init_cfqg_base(cfqd->root_group); -#endif - if (!cfqd->root_group) { - kfree(cfqd); - return -ENOMEM; - } + if (!cfqd->root_group) + goto out_free; + cfq_init_cfqg_base(cfqd->root_group); +#endif cfqd->root_group->weight = 2 * CFQ_WEIGHT_DEFAULT; /* @@ -4031,6 +4022,10 @@ static int cfq_init_queue(struct request_queue *q) */ cfqd->last_delayed_sync = jiffies - HZ; return 0; + +out_free: + kfree(cfqd); + return ret; } /* diff --git a/block/elevator.c b/block/elevator.c index be3ab6df0fea..6a55d418896f 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -896,8 +896,6 @@ static int elevator_switch(struct request_queue *q, struct elevator_type *new_e) ioc_clear_queue(q); spin_unlock_irq(q->queue_lock); - blkg_destroy_all(q, false); - /* allocate, init and register new elevator */ err = -ENOMEM; q->elevator = elevator_alloc(q, new_e); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b01c377fd739..68720ab275d4 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -370,6 +370,7 @@ struct request_queue { struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP + DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS); struct blkio_group *root_blkg; struct list_head blkg_list; #endif -- cgit v1.2.3 From 
3c798398e393e5f9502dbab2b51e6c25e2e8f2ac Mon Sep 17 00:00:00 2001 From: Tejun Heo Date: Mon, 16 Apr 2012 13:57:25 -0700 Subject: blkcg: mass rename of blkcg API During the recent blkcg cleanup, most of the blkcg API has changed to such an extent that mass renaming wouldn't cause any noticeable pain. Take the chance and clean up the naming. * Rename blkio_cgroup to blkcg. * Drop blkio / blkiocg prefixes and consistently use blkcg. * Rename blkio_group to blkcg_gq, which is consistent with io_cq, but keep the blkg prefix / variable name. * Rename policy method type and field names to signify they're dealing with policy data. * Rename blkio_policy_type to blkcg_policy. This patch doesn't cause any functional change. Signed-off-by: Tejun Heo Cc: Vivek Goyal Signed-off-by: Jens Axboe --- block/blk-cgroup.c | 202 ++++++++++++++++++++++++------------------- block/blk-cgroup.h | 109 +++++++++++++------------- block/blk-throttle.c | 72 +++++++++--------- block/cfq-iosched.c | 78 +++++++++---------- include/linux/blkdev.h | 4 +- 5 files changed, 230 insertions(+), 235 deletions(-) (limited to 'include') diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c index 63337024e4d7..997570329517 100644 --- a/block/blk-cgroup.c +++ b/block/blk-cgroup.c @@ -26,39 +26,39 @@ static DEFINE_MUTEX(blkcg_pol_mutex); -struct blkio_cgroup blkio_root_cgroup = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT }; -EXPORT_SYMBOL_GPL(blkio_root_cgroup); +struct blkcg blkcg_root = { .cfq_weight = 2 * CFQ_WEIGHT_DEFAULT }; +EXPORT_SYMBOL_GPL(blkcg_root); -static struct blkio_policy_type *blkio_policy[BLKCG_MAX_POLS]; +static struct blkcg_policy *blkcg_policy[BLKCG_MAX_POLS]; -struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) +struct blkcg *cgroup_to_blkcg(struct cgroup *cgroup) { return container_of(cgroup_subsys_state(cgroup, blkio_subsys_id), - struct blkio_cgroup, css); + struct blkcg, css); } -EXPORT_SYMBOL_GPL(cgroup_to_blkio_cgroup); +EXPORT_SYMBOL_GPL(cgroup_to_blkcg); -static struct blkio_cgroup *task_blkio_cgroup(struct task_struct *tsk) +static struct blkcg *task_blkcg(struct task_struct *tsk) { return container_of(task_subsys_state(tsk, blkio_subsys_id), - struct blkio_cgroup, css); + struct blkcg, css); } -struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio) +struct blkcg *bio_blkcg(struct bio *bio) { if (bio && bio->bi_css) - return container_of(bio->bi_css, struct blkio_cgroup, css); - return task_blkio_cgroup(current); + return container_of(bio->bi_css, struct blkcg, css); + return task_blkcg(current); } -EXPORT_SYMBOL_GPL(bio_blkio_cgroup); +EXPORT_SYMBOL_GPL(bio_blkcg); static bool blkcg_policy_enabled(struct request_queue *q, - const struct blkio_policy_type *pol) + const struct blkcg_policy *pol) { return pol && test_bit(pol->plid, q->blkcg_pols); } -static size_t blkg_pd_size(const struct blkio_policy_type *pol) +static size_t blkg_pd_size(const struct blkcg_policy *pol) { return sizeof(struct blkg_policy_data) + pol->pdata_size; } @@ -69,7 +69,7 @@ static size_t blkg_pd_size(const struct blkcg_policy *pol) * * Free @blkg which may be partially allocated.
*/ -static void blkg_free(struct blkio_group *blkg) +static void blkg_free(struct blkcg_gq *blkg) { int i; @@ -77,14 +77,14 @@ static void blkg_free(struct blkio_group *blkg) return; for (i = 0; i < BLKCG_MAX_POLS; i++) { - struct blkio_policy_type *pol = blkio_policy[i]; + struct blkcg_policy *pol = blkcg_policy[i]; struct blkg_policy_data *pd = blkg->pd[i]; if (!pd) continue; - if (pol && pol->ops.blkio_exit_group_fn) - pol->ops.blkio_exit_group_fn(blkg); + if (pol && pol->ops.pd_exit_fn) + pol->ops.pd_exit_fn(blkg); kfree(pd); } @@ -99,10 +99,9 @@ static void blkg_free(struct blkio_group *blkg) * * Allocate a new blkg assocating @blkcg and @q. */ -static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, - struct request_queue *q) +static struct blkcg_gq *blkg_alloc(struct blkcg *blkcg, struct request_queue *q) { - struct blkio_group *blkg; + struct blkcg_gq *blkg; int i; /* alloc and init base part */ @@ -116,7 +115,7 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, blkg->refcnt = 1; for (i = 0; i < BLKCG_MAX_POLS; i++) { - struct blkio_policy_type *pol = blkio_policy[i]; + struct blkcg_policy *pol = blkcg_policy[i]; struct blkg_policy_data *pd; if (!blkcg_policy_enabled(q, pol)) @@ -135,19 +134,19 @@ static struct blkio_group *blkg_alloc(struct blkio_cgroup *blkcg, /* invoke per-policy init */ for (i = 0; i < BLKCG_MAX_POLS; i++) { - struct blkio_policy_type *pol = blkio_policy[i]; + struct blkcg_policy *pol = blkcg_policy[i]; if (blkcg_policy_enabled(blkg->q, pol)) - pol->ops.blkio_init_group_fn(blkg); + pol->ops.pd_init_fn(blkg); } return blkg; } -static struct blkio_group *__blkg_lookup(struct blkio_cgroup *blkcg, - struct request_queue *q) +static struct blkcg_gq *__blkg_lookup(struct blkcg *blkcg, + struct request_queue *q) { - struct blkio_group *blkg; + struct blkcg_gq *blkg; struct hlist_node *n; hlist_for_each_entry_rcu(blkg, n, &blkcg->blkg_list, blkcg_node) @@ -165,8 +164,7 @@ static struct blkio_group *__blkg_lookup(struct blkio_cgroup *blkcg, * under RCU read lock and is guaranteed to return %NULL if @q is bypassing * - see blk_queue_bypass_start() for details. 
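 *
 * A minimal calling sketch (illustrative only):
 *
 *	rcu_read_lock();
 *	blkg = blkg_lookup(blkcg, q);
 *	rcu_read_unlock();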
*/ -struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg, - struct request_queue *q) +struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, struct request_queue *q) { WARN_ON_ONCE(!rcu_read_lock_held()); @@ -176,11 +174,11 @@ struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg, } EXPORT_SYMBOL_GPL(blkg_lookup); -static struct blkio_group *__blkg_lookup_create(struct blkio_cgroup *blkcg, - struct request_queue *q) +static struct blkcg_gq *__blkg_lookup_create(struct blkcg *blkcg, + struct request_queue *q) __releases(q->queue_lock) __acquires(q->queue_lock) { - struct blkio_group *blkg; + struct blkcg_gq *blkg; WARN_ON_ONCE(!rcu_read_lock_held()); lockdep_assert_held(q->queue_lock); @@ -213,8 +211,8 @@ out: return blkg; } -struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg, - struct request_queue *q) +struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, + struct request_queue *q) { /* * This could be the first entry point of blkcg implementation and @@ -226,10 +224,10 @@ struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg, } EXPORT_SYMBOL_GPL(blkg_lookup_create); -static void blkg_destroy(struct blkio_group *blkg) +static void blkg_destroy(struct blkcg_gq *blkg) { struct request_queue *q = blkg->q; - struct blkio_cgroup *blkcg = blkg->blkcg; + struct blkcg *blkcg = blkg->blkcg; lockdep_assert_held(q->queue_lock); lockdep_assert_held(&blkcg->lock); @@ -255,12 +253,12 @@ static void blkg_destroy(struct blkio_group *blkg) */ static void blkg_destroy_all(struct request_queue *q) { - struct blkio_group *blkg, *n; + struct blkcg_gq *blkg, *n; lockdep_assert_held(q->queue_lock); list_for_each_entry_safe(blkg, n, &q->blkg_list, q_node) { - struct blkio_cgroup *blkcg = blkg->blkcg; + struct blkcg *blkcg = blkg->blkcg; spin_lock(&blkcg->lock); blkg_destroy(blkg); @@ -270,10 +268,10 @@ static void blkg_destroy_all(struct request_queue *q) static void blkg_rcu_free(struct rcu_head *rcu_head) { - blkg_free(container_of(rcu_head, struct blkio_group, rcu_head)); + blkg_free(container_of(rcu_head, struct blkcg_gq, rcu_head)); } -void __blkg_release(struct blkio_group *blkg) +void __blkg_release(struct blkcg_gq *blkg) { /* release the extra blkcg reference this blkg has been holding */ css_put(&blkg->blkcg->css); @@ -291,11 +289,11 @@ void __blkg_release(struct blkio_group *blkg) } EXPORT_SYMBOL_GPL(__blkg_release); -static int -blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val) +static int blkcg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, + u64 val) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup); - struct blkio_group *blkg; + struct blkcg *blkcg = cgroup_to_blkcg(cgroup); + struct blkcg_gq *blkg; struct hlist_node *n; int i; @@ -309,11 +307,11 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val) */ hlist_for_each_entry(blkg, n, &blkcg->blkg_list, blkcg_node) { for (i = 0; i < BLKCG_MAX_POLS; i++) { - struct blkio_policy_type *pol = blkio_policy[i]; + struct blkcg_policy *pol = blkcg_policy[i]; if (blkcg_policy_enabled(blkg->q, pol) && - pol->ops.blkio_reset_group_stats_fn) - pol->ops.blkio_reset_group_stats_fn(blkg); + pol->ops.pd_reset_stats_fn) + pol->ops.pd_reset_stats_fn(blkg); } } @@ -322,7 +320,7 @@ blkiocg_reset_stats(struct cgroup *cgroup, struct cftype *cftype, u64 val) return 0; } -static const char *blkg_dev_name(struct blkio_group *blkg) +static const char *blkg_dev_name(struct blkcg_gq *blkg) { /* some drivers (floppy) instantiate a queue w/o disk registered */ if 
(blkg->q->backing_dev_info.dev) @@ -347,12 +345,12 @@ static const char *blkg_dev_name(struct blkio_group *blkg) * This is to be used to construct print functions for * cftype->read_seq_string method. */ -void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg, +void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, u64 (*prfill)(struct seq_file *, void *, int), - const struct blkio_policy_type *pol, int data, + const struct blkcg_policy *pol, int data, bool show_total) { - struct blkio_group *blkg; + struct blkcg_gq *blkg; struct hlist_node *n; u64 total = 0; @@ -462,13 +460,12 @@ EXPORT_SYMBOL_GPL(blkg_prfill_rwstat); * value. This function returns with RCU read lock and queue lock held and * must be paired with blkg_conf_finish(). */ -int blkg_conf_prep(struct blkio_cgroup *blkcg, - const struct blkio_policy_type *pol, const char *input, - struct blkg_conf_ctx *ctx) +int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, + const char *input, struct blkg_conf_ctx *ctx) __acquires(rcu) __acquires(disk->queue->queue_lock) { struct gendisk *disk; - struct blkio_group *blkg; + struct blkcg_gq *blkg; unsigned int major, minor; unsigned long long v; int part, ret; @@ -529,16 +526,16 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx) } EXPORT_SYMBOL_GPL(blkg_conf_finish); -struct cftype blkio_files[] = { +struct cftype blkcg_files[] = { { .name = "reset_stats", - .write_u64 = blkiocg_reset_stats, + .write_u64 = blkcg_reset_stats, }, { } /* terminate */ }; /** - * blkiocg_pre_destroy - cgroup pre_destroy callback + * blkcg_pre_destroy - cgroup pre_destroy callback * @cgroup: cgroup of interest * * This function is called when @cgroup is about to go away and responsible @@ -548,15 +545,15 @@ struct cftype blkio_files[] = { * * This is the blkcg counterpart of ioc_release_fn(). */ -static int blkiocg_pre_destroy(struct cgroup *cgroup) +static int blkcg_pre_destroy(struct cgroup *cgroup) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup); + struct blkcg *blkcg = cgroup_to_blkcg(cgroup); spin_lock_irq(&blkcg->lock); while (!hlist_empty(&blkcg->blkg_list)) { - struct blkio_group *blkg = hlist_entry(blkcg->blkg_list.first, - struct blkio_group, blkcg_node); + struct blkcg_gq *blkg = hlist_entry(blkcg->blkg_list.first, + struct blkcg_gq, blkcg_node); struct request_queue *q = blkg->q; if (spin_trylock(q->queue_lock)) { @@ -573,22 +570,22 @@ static int blkiocg_pre_destroy(struct cgroup *cgroup) return 0; } -static void blkiocg_destroy(struct cgroup *cgroup) +static void blkcg_destroy(struct cgroup *cgroup) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgroup); + struct blkcg *blkcg = cgroup_to_blkcg(cgroup); - if (blkcg != &blkio_root_cgroup) + if (blkcg != &blkcg_root) kfree(blkcg); } -static struct cgroup_subsys_state *blkiocg_create(struct cgroup *cgroup) +static struct cgroup_subsys_state *blkcg_create(struct cgroup *cgroup) { static atomic64_t id_seq = ATOMIC64_INIT(0); - struct blkio_cgroup *blkcg; + struct blkcg *blkcg; struct cgroup *parent = cgroup->parent; if (!parent) { - blkcg = &blkio_root_cgroup; + blkcg = &blkcg_root; goto done; } @@ -656,7 +653,7 @@ void blkcg_exit_queue(struct request_queue *q) * of the main cic data structures. For now we allow a task to change * its cgroup only if it's the only owner of its ioc. 
*/ -static int blkiocg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) +static int blkcg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) { struct task_struct *task; struct io_context *ioc; @@ -677,12 +674,12 @@ static int blkiocg_can_attach(struct cgroup *cgrp, struct cgroup_taskset *tset) struct cgroup_subsys blkio_subsys = { .name = "blkio", - .create = blkiocg_create, - .can_attach = blkiocg_can_attach, - .pre_destroy = blkiocg_pre_destroy, - .destroy = blkiocg_destroy, + .create = blkcg_create, + .can_attach = blkcg_can_attach, + .pre_destroy = blkcg_pre_destroy, + .destroy = blkcg_destroy, .subsys_id = blkio_subsys_id, - .base_cftypes = blkio_files, + .base_cftypes = blkcg_files, .module = THIS_MODULE, }; EXPORT_SYMBOL_GPL(blkio_subsys); @@ -704,10 +701,10 @@ EXPORT_SYMBOL_GPL(blkio_subsys); * [un]registerations. Returns 0 on success, -errno on failure. */ int blkcg_activate_policy(struct request_queue *q, - const struct blkio_policy_type *pol) + const struct blkcg_policy *pol) { LIST_HEAD(pds); - struct blkio_group *blkg; + struct blkcg_gq *blkg; struct blkg_policy_data *pd, *n; int cnt = 0, ret; @@ -720,7 +717,7 @@ int blkcg_activate_policy(struct request_queue *q, spin_lock_irq(q->queue_lock); rcu_read_lock(); - blkg = __blkg_lookup_create(&blkio_root_cgroup, q); + blkg = __blkg_lookup_create(&blkcg_root, q); rcu_read_unlock(); if (IS_ERR(blkg)) { @@ -764,7 +761,7 @@ int blkcg_activate_policy(struct request_queue *q, blkg->pd[pol->plid] = pd; pd->blkg = blkg; - pol->ops.blkio_init_group_fn(blkg); + pol->ops.pd_init_fn(blkg); spin_unlock(&blkg->blkcg->lock); } @@ -790,9 +787,9 @@ EXPORT_SYMBOL_GPL(blkcg_activate_policy); * blkcg_activate_policy(). */ void blkcg_deactivate_policy(struct request_queue *q, - const struct blkio_policy_type *pol) + const struct blkcg_policy *pol) { - struct blkio_group *blkg; + struct blkcg_gq *blkg; if (!blkcg_policy_enabled(q, pol)) return; @@ -810,8 +807,8 @@ void blkcg_deactivate_policy(struct request_queue *q, /* grab blkcg lock too while removing @pd from @blkg */ spin_lock(&blkg->blkcg->lock); - if (pol->ops.blkio_exit_group_fn) - pol->ops.blkio_exit_group_fn(blkg); + if (pol->ops.pd_exit_fn) + pol->ops.pd_exit_fn(blkg); kfree(blkg->pd[pol->plid]); blkg->pd[pol->plid] = NULL; @@ -825,14 +822,13 @@ void blkcg_deactivate_policy(struct request_queue *q, EXPORT_SYMBOL_GPL(blkcg_deactivate_policy); /** - * blkio_policy_register - register a blkcg policy - * @blkiop: blkcg policy to register + * blkcg_policy_register - register a blkcg policy + * @pol: blkcg policy to register * - * Register @blkiop with blkcg core. Might sleep and @blkiop may be - * modified on successful registration. Returns 0 on success and -errno on - * failure. + * Register @pol with blkcg core. Might sleep and @pol may be modified on + * successful registration. Returns 0 on success and -errno on failure. 
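+ *
+ * Registration typically happens from module init; a sketch, with
+ * "blkcg_policy_foo" again a placeholder name:
+ *
+ *	static int __init foo_init(void)
+ *	{
+ *		return blkcg_policy_register(&blkcg_policy_foo);
+ *	}
+ *	module_init(foo_init);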
*/ -int blkio_policy_register(struct blkio_policy_type *blkiop) +int blkcg_policy_register(struct blkcg_policy *pol) { int i, ret; @@ -841,45 +837,45 @@ int blkio_policy_register(struct blkio_policy_type *blkiop) /* find an empty slot */ ret = -ENOSPC; for (i = 0; i < BLKCG_MAX_POLS; i++) - if (!blkio_policy[i]) + if (!blkcg_policy[i]) break; if (i >= BLKCG_MAX_POLS) goto out_unlock; /* register and update blkgs */ - blkiop->plid = i; - blkio_policy[i] = blkiop; + pol->plid = i; + blkcg_policy[i] = pol; /* everything is in place, add intf files for the new policy */ - if (blkiop->cftypes) - WARN_ON(cgroup_add_cftypes(&blkio_subsys, blkiop->cftypes)); + if (pol->cftypes) + WARN_ON(cgroup_add_cftypes(&blkio_subsys, pol->cftypes)); ret = 0; out_unlock: mutex_unlock(&blkcg_pol_mutex); return ret; } -EXPORT_SYMBOL_GPL(blkio_policy_register); +EXPORT_SYMBOL_GPL(blkcg_policy_register); /** - * blkiop_policy_unregister - unregister a blkcg policy - * @blkiop: blkcg policy to unregister + * blkcg_policy_unregister - unregister a blkcg policy + * @pol: blkcg policy to unregister * - * Undo blkio_policy_register(@blkiop). Might sleep. + * Undo blkcg_policy_register(@pol). Might sleep. */ -void blkio_policy_unregister(struct blkio_policy_type *blkiop) +void blkcg_policy_unregister(struct blkcg_policy *pol) { mutex_lock(&blkcg_pol_mutex); - if (WARN_ON(blkio_policy[blkiop->plid] != blkiop)) + if (WARN_ON(blkcg_policy[pol->plid] != pol)) goto out_unlock; /* kill the intf files first */ - if (blkiop->cftypes) - cgroup_rm_cftypes(&blkio_subsys, blkiop->cftypes); + if (pol->cftypes) + cgroup_rm_cftypes(&blkio_subsys, pol->cftypes); /* unregister and update blkgs */ - blkio_policy[blkiop->plid] = NULL; + blkcg_policy[pol->plid] = NULL; out_unlock: mutex_unlock(&blkcg_pol_mutex); } -EXPORT_SYMBOL_GPL(blkio_policy_unregister); +EXPORT_SYMBOL_GPL(blkcg_policy_unregister); diff --git a/block/blk-cgroup.h b/block/blk-cgroup.h index b347aa08d166..a443b84d2c16 100644 --- a/block/blk-cgroup.h +++ b/block/blk-cgroup.h @@ -37,7 +37,7 @@ enum blkg_rwstat_type { BLKG_RWSTAT_TOTAL = BLKG_RWSTAT_NR, }; -struct blkio_cgroup { +struct blkcg { struct cgroup_subsys_state css; spinlock_t lock; struct hlist_head blkg_list; @@ -45,7 +45,7 @@ struct blkio_cgroup { /* for policies to test whether associated blkcg has changed */ uint64_t id; - /* TODO: per-policy storage in blkio_cgroup */ + /* TODO: per-policy storage in blkcg */ unsigned int cfq_weight; /* belongs to cfq */ }; @@ -62,7 +62,7 @@ struct blkg_rwstat { /* per-blkg per-policy data */ struct blkg_policy_data { /* the blkg this per-policy data belongs to */ - struct blkio_group *blkg; + struct blkcg_gq *blkg; /* used during policy activation */ struct list_head alloc_node; @@ -71,12 +71,13 @@ struct blkg_policy_data { char pdata[] __aligned(__alignof__(unsigned long long)); }; -struct blkio_group { +/* association between a blk cgroup and a request queue */ +struct blkcg_gq { /* Pointer to the associated request_queue */ struct request_queue *q; struct list_head q_node; struct hlist_node blkcg_node; - struct blkio_cgroup *blkcg; + struct blkcg *blkcg; /* reference count */ int refcnt; @@ -85,18 +86,18 @@ struct blkio_group { struct rcu_head rcu_head; }; -typedef void (blkio_init_group_fn)(struct blkio_group *blkg); -typedef void (blkio_exit_group_fn)(struct blkio_group *blkg); -typedef void (blkio_reset_group_stats_fn)(struct blkio_group *blkg); +typedef void (blkcg_pol_init_pd_fn)(struct blkcg_gq *blkg); +typedef void (blkcg_pol_exit_pd_fn)(struct blkcg_gq *blkg); 
+typedef void (blkcg_pol_reset_pd_stats_fn)(struct blkcg_gq *blkg); -struct blkio_policy_ops { - blkio_init_group_fn *blkio_init_group_fn; - blkio_exit_group_fn *blkio_exit_group_fn; - blkio_reset_group_stats_fn *blkio_reset_group_stats_fn; +struct blkcg_policy_ops { + blkcg_pol_init_pd_fn *pd_init_fn; + blkcg_pol_exit_pd_fn *pd_exit_fn; + blkcg_pol_reset_pd_stats_fn *pd_reset_stats_fn; }; -struct blkio_policy_type { - struct blkio_policy_ops ops; +struct blkcg_policy { + struct blkcg_policy_ops ops; int plid; /* policy specific private data size */ size_t pdata_size; @@ -104,29 +105,28 @@ struct blkio_policy_type { struct cftype *cftypes; }; -extern struct blkio_cgroup blkio_root_cgroup; +extern struct blkcg blkcg_root; -struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup); -struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio); -struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg, - struct request_queue *q); -struct blkio_group *blkg_lookup_create(struct blkio_cgroup *blkcg, - struct request_queue *q); +struct blkcg *cgroup_to_blkcg(struct cgroup *cgroup); +struct blkcg *bio_blkcg(struct bio *bio); +struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, struct request_queue *q); +struct blkcg_gq *blkg_lookup_create(struct blkcg *blkcg, + struct request_queue *q); int blkcg_init_queue(struct request_queue *q); void blkcg_drain_queue(struct request_queue *q); void blkcg_exit_queue(struct request_queue *q); /* Blkio controller policy registration */ -int blkio_policy_register(struct blkio_policy_type *); -void blkio_policy_unregister(struct blkio_policy_type *); +int blkcg_policy_register(struct blkcg_policy *pol); +void blkcg_policy_unregister(struct blkcg_policy *pol); int blkcg_activate_policy(struct request_queue *q, - const struct blkio_policy_type *pol); + const struct blkcg_policy *pol); void blkcg_deactivate_policy(struct request_queue *q, - const struct blkio_policy_type *pol); + const struct blkcg_policy *pol); -void blkcg_print_blkgs(struct seq_file *sf, struct blkio_cgroup *blkcg, +void blkcg_print_blkgs(struct seq_file *sf, struct blkcg *blkcg, u64 (*prfill)(struct seq_file *, void *, int), - const struct blkio_policy_type *pol, int data, + const struct blkcg_policy *pol, int data, bool show_total); u64 __blkg_prfill_u64(struct seq_file *sf, void *pdata, u64 v); u64 __blkg_prfill_rwstat(struct seq_file *sf, void *pdata, @@ -136,13 +136,12 @@ u64 blkg_prfill_rwstat(struct seq_file *sf, void *pdata, int off); struct blkg_conf_ctx { struct gendisk *disk; - struct blkio_group *blkg; + struct blkcg_gq *blkg; u64 v; }; -int blkg_conf_prep(struct blkio_cgroup *blkcg, - const struct blkio_policy_type *pol, const char *input, - struct blkg_conf_ctx *ctx); +int blkg_conf_prep(struct blkcg *blkcg, const struct blkcg_policy *pol, + const char *input, struct blkg_conf_ctx *ctx); void blkg_conf_finish(struct blkg_conf_ctx *ctx); @@ -153,8 +152,8 @@ void blkg_conf_finish(struct blkg_conf_ctx *ctx); * * Return pointer to private data associated with the @blkg-@pol pair. */ -static inline void *blkg_to_pdata(struct blkio_group *blkg, - struct blkio_policy_type *pol) +static inline void *blkg_to_pdata(struct blkcg_gq *blkg, + struct blkcg_policy *pol) { return blkg ? blkg->pd[pol->plid]->pdata : NULL; } @@ -165,7 +164,7 @@ static inline void *blkg_to_pdata(struct blkio_group *blkg, * * @pdata is policy private data. Determine the blkg it's associated with. 
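 *
 * Policies normally wrap the two conversion directions, as blk-throttle
 * does with blkg_to_tg()/tg_to_blkg(); a sketch:
 *
 *	tg   = blkg_to_pdata(blkg, &blkcg_policy_throtl);
 *	blkg = pdata_to_blkg(tg);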
*/ -static inline struct blkio_group *pdata_to_blkg(void *pdata) +static inline struct blkcg_gq *pdata_to_blkg(void *pdata) { if (pdata) { struct blkg_policy_data *pd = @@ -183,7 +182,7 @@ static inline struct blkio_group *pdata_to_blkg(void *pdata) * * Format the path of the cgroup of @blkg into @buf. */ -static inline int blkg_path(struct blkio_group *blkg, char *buf, int buflen) +static inline int blkg_path(struct blkcg_gq *blkg, char *buf, int buflen) { int ret; @@ -201,14 +200,14 @@ static inline int blkg_path(struct blkio_group *blkg, char *buf, int buflen) * * The caller should be holding queue_lock and an existing reference. */ -static inline void blkg_get(struct blkio_group *blkg) +static inline void blkg_get(struct blkcg_gq *blkg) { lockdep_assert_held(blkg->q->queue_lock); WARN_ON_ONCE(!blkg->refcnt); blkg->refcnt++; } -void __blkg_release(struct blkio_group *blkg); +void __blkg_release(struct blkcg_gq *blkg); /** * blkg_put - put a blkg reference @@ -216,7 +215,7 @@ void __blkg_release(struct blkio_group *blkg); * * The caller should be holding queue_lock. */ -static inline void blkg_put(struct blkio_group *blkg) +static inline void blkg_put(struct blkcg_gq *blkg) { lockdep_assert_held(blkg->q->queue_lock); WARN_ON_ONCE(blkg->refcnt <= 0); @@ -343,32 +342,32 @@ static inline void blkg_rwstat_reset(struct blkg_rwstat *rwstat) struct cgroup; -struct blkio_group { +struct blkcg_gq { }; -struct blkio_policy_type { +struct blkcg_policy { }; -static inline struct blkio_cgroup *cgroup_to_blkio_cgroup(struct cgroup *cgroup) { return NULL; } -static inline struct blkio_cgroup *bio_blkio_cgroup(struct bio *bio) { return NULL; } -static inline struct blkio_group *blkg_lookup(struct blkio_cgroup *blkcg, void *key) { return NULL; } +static inline struct blkcg *cgroup_to_blkcg(struct cgroup *cgroup) { return NULL; } +static inline struct blkcg *bio_blkcg(struct bio *bio) { return NULL; } +static inline struct blkcg_gq *blkg_lookup(struct blkcg *blkcg, void *key) { return NULL; } static inline int blkcg_init_queue(struct request_queue *q) { return 0; } static inline void blkcg_drain_queue(struct request_queue *q) { } static inline void blkcg_exit_queue(struct request_queue *q) { } -static inline int blkio_policy_register(struct blkio_policy_type *blkiop) { return 0; } -static inline void blkio_policy_unregister(struct blkio_policy_type *blkiop) { } +static inline int blkcg_policy_register(struct blkcg_policy *pol) { return 0; } +static inline void blkcg_policy_unregister(struct blkcg_policy *pol) { } static inline int blkcg_activate_policy(struct request_queue *q, - const struct blkio_policy_type *pol) { return 0; } + const struct blkcg_policy *pol) { return 0; } static inline void blkcg_deactivate_policy(struct request_queue *q, - const struct blkio_policy_type *pol) { } - -static inline void *blkg_to_pdata(struct blkio_group *blkg, - struct blkio_policy_type *pol) { return NULL; } -static inline struct blkio_group *pdata_to_blkg(void *pdata, - struct blkio_policy_type *pol) { return NULL; } -static inline char *blkg_path(struct blkio_group *blkg) { return NULL; } -static inline void blkg_get(struct blkio_group *blkg) { } -static inline void blkg_put(struct blkio_group *blkg) { } + const struct blkcg_policy *pol) { } + +static inline void *blkg_to_pdata(struct blkcg_gq *blkg, + struct blkcg_policy *pol) { return NULL; } +static inline struct blkcg_gq *pdata_to_blkg(void *pdata, + struct blkcg_policy *pol) { return NULL; } +static inline char *blkg_path(struct blkcg_gq *blkg) { return NULL; } 
+static inline void blkg_get(struct blkcg_gq *blkg) { } +static inline void blkg_put(struct blkcg_gq *blkg) { } #endif /* CONFIG_BLK_CGROUP */ #endif /* _BLK_CGROUP_H */ diff --git a/block/blk-throttle.c b/block/blk-throttle.c index e9b7a47f6da0..00c7eff66ecf 100644 --- a/block/blk-throttle.c +++ b/block/blk-throttle.c @@ -21,7 +21,7 @@ static int throtl_quantum = 32; /* Throttling is performed over 100ms slice and after that slice is renewed */ static unsigned long throtl_slice = HZ/10; /* 100 ms */ -static struct blkio_policy_type blkio_policy_throtl; +static struct blkcg_policy blkcg_policy_throtl; /* A workqueue to queue throttle related work */ static struct workqueue_struct *kthrotld_workqueue; @@ -120,12 +120,12 @@ static LIST_HEAD(tg_stats_alloc_list); static void tg_stats_alloc_fn(struct work_struct *); static DECLARE_DELAYED_WORK(tg_stats_alloc_work, tg_stats_alloc_fn); -static inline struct throtl_grp *blkg_to_tg(struct blkio_group *blkg) +static inline struct throtl_grp *blkg_to_tg(struct blkcg_gq *blkg) { - return blkg_to_pdata(blkg, &blkio_policy_throtl); + return blkg_to_pdata(blkg, &blkcg_policy_throtl); } -static inline struct blkio_group *tg_to_blkg(struct throtl_grp *tg) +static inline struct blkcg_gq *tg_to_blkg(struct throtl_grp *tg) { return pdata_to_blkg(tg); } @@ -208,7 +208,7 @@ alloc_stats: goto alloc_stats; } -static void throtl_init_blkio_group(struct blkio_group *blkg) +static void throtl_pd_init(struct blkcg_gq *blkg) { struct throtl_grp *tg = blkg_to_tg(blkg); @@ -233,7 +233,7 @@ static void throtl_init_blkio_group(struct blkio_group *blkg) spin_unlock(&tg_stats_alloc_lock); } -static void throtl_exit_blkio_group(struct blkio_group *blkg) +static void throtl_pd_exit(struct blkcg_gq *blkg) { struct throtl_grp *tg = blkg_to_tg(blkg); @@ -244,7 +244,7 @@ static void throtl_exit_blkio_group(struct blkio_group *blkg) free_percpu(tg->stats_cpu); } -static void throtl_reset_group_stats(struct blkio_group *blkg) +static void throtl_pd_reset_stats(struct blkcg_gq *blkg) { struct throtl_grp *tg = blkg_to_tg(blkg); int cpu; @@ -260,33 +260,33 @@ static void throtl_reset_group_stats(struct blkio_group *blkg) } } -static struct -throtl_grp *throtl_lookup_tg(struct throtl_data *td, struct blkio_cgroup *blkcg) +static struct throtl_grp *throtl_lookup_tg(struct throtl_data *td, + struct blkcg *blkcg) { /* - * This is the common case when there are no blkio cgroups. - * Avoid lookup in this case + * This is the common case when there are no blkcgs. Avoid lookup + * in this case */ - if (blkcg == &blkio_root_cgroup) + if (blkcg == &blkcg_root) return td_root_tg(td); return blkg_to_tg(blkg_lookup(blkcg, td->queue)); } static struct throtl_grp *throtl_lookup_create_tg(struct throtl_data *td, - struct blkio_cgroup *blkcg) + struct blkcg *blkcg) { struct request_queue *q = td->queue; struct throtl_grp *tg = NULL; /* - * This is the common case when there are no blkio cgroups. - * Avoid lookup in this case + * This is the common case when there are no blkcgs. 
Avoid lookup + * in this case */ - if (blkcg == &blkio_root_cgroup) { + if (blkcg == &blkcg_root) { tg = td_root_tg(td); } else { - struct blkio_group *blkg; + struct blkcg_gq *blkg; blkg = blkg_lookup_create(blkcg, q); @@ -665,7 +665,7 @@ static bool tg_may_dispatch(struct throtl_data *td, struct throtl_grp *tg, return 0; } -static void throtl_update_dispatch_stats(struct blkio_group *blkg, u64 bytes, +static void throtl_update_dispatch_stats(struct blkcg_gq *blkg, u64 bytes, int rw) { struct throtl_grp *tg = blkg_to_tg(blkg); @@ -822,7 +822,7 @@ static int throtl_select_dispatch(struct throtl_data *td, struct bio_list *bl) static void throtl_process_limit_change(struct throtl_data *td) { struct request_queue *q = td->queue; - struct blkio_group *blkg, *n; + struct blkcg_gq *blkg, *n; if (!td->limits_changed) return; @@ -951,9 +951,9 @@ static u64 tg_prfill_cpu_rwstat(struct seq_file *sf, void *pdata, int off) static int tg_print_cpu_rwstat(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); - blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat, &blkio_policy_throtl, + blkcg_print_blkgs(sf, blkcg, tg_prfill_cpu_rwstat, &blkcg_policy_throtl, cft->private, true); return 0; } @@ -979,29 +979,29 @@ static u64 tg_prfill_conf_uint(struct seq_file *sf, void *pdata, int off) static int tg_print_conf_u64(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_u64, - &blkio_policy_throtl, cft->private, false); + blkcg_print_blkgs(sf, cgroup_to_blkcg(cgrp), tg_prfill_conf_u64, + &blkcg_policy_throtl, cft->private, false); return 0; } static int tg_print_conf_uint(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), tg_prfill_conf_uint, - &blkio_policy_throtl, cft->private, false); + blkcg_print_blkgs(sf, cgroup_to_blkcg(cgrp), tg_prfill_conf_uint, + &blkcg_policy_throtl, cft->private, false); return 0; } static int tg_set_conf(struct cgroup *cgrp, struct cftype *cft, const char *buf, bool is_u64) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); struct blkg_conf_ctx ctx; struct throtl_grp *tg; struct throtl_data *td; int ret; - ret = blkg_conf_prep(blkcg, &blkio_policy_throtl, buf, &ctx); + ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx); if (ret) return ret; @@ -1086,11 +1086,11 @@ static void throtl_shutdown_wq(struct request_queue *q) cancel_delayed_work_sync(&td->throtl_work); } -static struct blkio_policy_type blkio_policy_throtl = { +static struct blkcg_policy blkcg_policy_throtl = { .ops = { - .blkio_init_group_fn = throtl_init_blkio_group, - .blkio_exit_group_fn = throtl_exit_blkio_group, - .blkio_reset_group_stats_fn = throtl_reset_group_stats, + .pd_init_fn = throtl_pd_init, + .pd_exit_fn = throtl_pd_exit, + .pd_reset_stats_fn = throtl_pd_reset_stats, }, .pdata_size = sizeof(struct throtl_grp), .cftypes = throtl_files, @@ -1101,7 +1101,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio) struct throtl_data *td = q->td; struct throtl_grp *tg; bool rw = bio_data_dir(bio), update_disptime = true; - struct blkio_cgroup *blkcg; + struct blkcg *blkcg; bool throttled = false; if (bio->bi_rw & REQ_THROTTLED) { @@ -1118,7 +1118,7 @@ bool blk_throtl_bio(struct request_queue *q, struct bio *bio) * just update the dispatch stats in lockless manner and return. 
*/ rcu_read_lock(); - blkcg = bio_blkio_cgroup(bio); + blkcg = bio_blkcg(bio); tg = throtl_lookup_tg(td, blkcg); if (tg) { if (tg_no_rule_group(tg, rw)) { @@ -1243,7 +1243,7 @@ int blk_throtl_init(struct request_queue *q) td->queue = q; /* activate policy */ - ret = blkcg_activate_policy(q, &blkio_policy_throtl); + ret = blkcg_activate_policy(q, &blkcg_policy_throtl); if (ret) kfree(td); return ret; @@ -1253,7 +1253,7 @@ void blk_throtl_exit(struct request_queue *q) { BUG_ON(!q->td); throtl_shutdown_wq(q); - blkcg_deactivate_policy(q, &blkio_policy_throtl); + blkcg_deactivate_policy(q, &blkcg_policy_throtl); kfree(q->td); } @@ -1263,7 +1263,7 @@ static int __init throtl_init(void) if (!kthrotld_workqueue) panic("Failed to create kthrotld\n"); - return blkio_policy_register(&blkio_policy_throtl); + return blkcg_policy_register(&blkcg_policy_throtl); } module_init(throtl_init); diff --git a/block/cfq-iosched.c b/block/cfq-iosched.c index 901286b5f5cb..792218281d91 100644 --- a/block/cfq-iosched.c +++ b/block/cfq-iosched.c @@ -17,7 +17,7 @@ #include "blk.h" #include "blk-cgroup.h" -static struct blkio_policy_type blkio_policy_cfq __maybe_unused; +static struct blkcg_policy blkcg_policy_cfq __maybe_unused; /* * tunables @@ -202,7 +202,7 @@ struct cfqg_stats { struct blkg_stat dequeue; /* total time spent waiting for it to be assigned a timeslice. */ struct blkg_stat group_wait_time; - /* time spent idling for this blkio_group */ + /* time spent idling for this blkcg_gq */ struct blkg_stat idle_time; /* total time with empty current active q with other requests queued */ struct blkg_stat empty_time; @@ -553,12 +553,12 @@ static inline void cfqg_stats_update_avg_queue_size(struct cfq_group *cfqg) { } #ifdef CONFIG_CFQ_GROUP_IOSCHED -static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg) +static inline struct cfq_group *blkg_to_cfqg(struct blkcg_gq *blkg) { - return blkg_to_pdata(blkg, &blkio_policy_cfq); + return blkg_to_pdata(blkg, &blkcg_policy_cfq); } -static inline struct blkio_group *cfqg_to_blkg(struct cfq_group *cfqg) +static inline struct blkcg_gq *cfqg_to_blkg(struct cfq_group *cfqg) { return pdata_to_blkg(cfqg); } @@ -637,7 +637,7 @@ static inline void cfqg_stats_update_completion(struct cfq_group *cfqg, io_start_time - start_time); } -static void cfqg_stats_reset(struct blkio_group *blkg) +static void cfq_pd_reset_stats(struct blkcg_gq *blkg) { struct cfq_group *cfqg = blkg_to_cfqg(blkg); struct cfqg_stats *stats = &cfqg->stats; @@ -662,8 +662,8 @@ static void cfqg_stats_reset(struct blkio_group *blkg) #else /* CONFIG_CFQ_GROUP_IOSCHED */ -static inline struct cfq_group *blkg_to_cfqg(struct blkio_group *blkg) { return NULL; } -static inline struct blkio_group *cfqg_to_blkg(struct cfq_group *cfqg) { return NULL; } +static inline struct cfq_group *blkg_to_cfqg(struct blkcg_gq *blkg) { return NULL; } +static inline struct blkcg_gq *cfqg_to_blkg(struct cfq_group *cfqg) { return NULL; } static inline void cfqg_get(struct cfq_group *cfqg) { } static inline void cfqg_put(struct cfq_group *cfqg) { } @@ -1331,7 +1331,7 @@ static void cfq_init_cfqg_base(struct cfq_group *cfqg) } #ifdef CONFIG_CFQ_GROUP_IOSCHED -static void cfq_init_blkio_group(struct blkio_group *blkg) +static void cfq_pd_init(struct blkcg_gq *blkg) { struct cfq_group *cfqg = blkg_to_cfqg(blkg); @@ -1344,16 +1344,16 @@ static void cfq_init_blkio_group(struct blkio_group *blkg) * be held. 
*/ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd, - struct blkio_cgroup *blkcg) + struct blkcg *blkcg) { struct request_queue *q = cfqd->queue; struct cfq_group *cfqg = NULL; - /* avoid lookup for the common case where there's no blkio cgroup */ - if (blkcg == &blkio_root_cgroup) { + /* avoid lookup for the common case where there's no blkcg */ + if (blkcg == &blkcg_root) { cfqg = cfqd->root_group; } else { - struct blkio_group *blkg; + struct blkcg_gq *blkg; blkg = blkg_lookup_create(blkcg, q); if (!IS_ERR(blkg)) @@ -1386,8 +1386,8 @@ static u64 cfqg_prfill_weight_device(struct seq_file *sf, void *pdata, int off) static int cfqg_print_weight_device(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - blkcg_print_blkgs(sf, cgroup_to_blkio_cgroup(cgrp), - cfqg_prfill_weight_device, &blkio_policy_cfq, 0, + blkcg_print_blkgs(sf, cgroup_to_blkcg(cgrp), + cfqg_prfill_weight_device, &blkcg_policy_cfq, 0, false); return 0; } @@ -1395,19 +1395,19 @@ static int cfqg_print_weight_device(struct cgroup *cgrp, struct cftype *cft, static int cfq_print_weight(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - seq_printf(sf, "%u\n", cgroup_to_blkio_cgroup(cgrp)->cfq_weight); + seq_printf(sf, "%u\n", cgroup_to_blkcg(cgrp)->cfq_weight); return 0; } static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft, const char *buf) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); struct blkg_conf_ctx ctx; struct cfq_group *cfqg; int ret; - ret = blkg_conf_prep(blkcg, &blkio_policy_cfq, buf, &ctx); + ret = blkg_conf_prep(blkcg, &blkcg_policy_cfq, buf, &ctx); if (ret) return ret; @@ -1425,8 +1425,8 @@ static int cfqg_set_weight_device(struct cgroup *cgrp, struct cftype *cft, static int cfq_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); - struct blkio_group *blkg; + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); + struct blkcg_gq *blkg; struct hlist_node *n; if (val < CFQ_WEIGHT_MIN || val > CFQ_WEIGHT_MAX) @@ -1449,9 +1449,9 @@ static int cfq_set_weight(struct cgroup *cgrp, struct cftype *cft, u64 val) static int cfqg_print_stat(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); - blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat, &blkio_policy_cfq, + blkcg_print_blkgs(sf, blkcg, blkg_prfill_stat, &blkcg_policy_cfq, cft->private, false); return 0; } @@ -1459,9 +1459,9 @@ static int cfqg_print_stat(struct cgroup *cgrp, struct cftype *cft, static int cfqg_print_rwstat(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); - blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat, &blkio_policy_cfq, + blkcg_print_blkgs(sf, blkcg, blkg_prfill_rwstat, &blkcg_policy_cfq, cft->private, true); return 0; } @@ -1485,10 +1485,10 @@ static u64 cfqg_prfill_avg_queue_size(struct seq_file *sf, void *pdata, int off) static int cfqg_print_avg_queue_size(struct cgroup *cgrp, struct cftype *cft, struct seq_file *sf) { - struct blkio_cgroup *blkcg = cgroup_to_blkio_cgroup(cgrp); + struct blkcg *blkcg = cgroup_to_blkcg(cgrp); blkcg_print_blkgs(sf, blkcg, cfqg_prfill_avg_queue_size, - &blkio_policy_cfq, 0, false); + &blkcg_policy_cfq, 0, false); return 0; } #endif /* CONFIG_DEBUG_BLK_CGROUP */ @@ -1580,7 +1580,7 @@ 
static struct cftype cfq_blkcg_files[] = { }; #else /* GROUP_IOSCHED */ static struct cfq_group *cfq_lookup_create_cfqg(struct cfq_data *cfqd, - struct blkio_cgroup *blkcg) + struct blkcg *blkcg) { return cfqd->root_group; } @@ -3135,7 +3135,7 @@ static void check_blkcg_changed(struct cfq_io_cq *cic, struct bio *bio) uint64_t id; rcu_read_lock(); - id = bio_blkio_cgroup(bio)->id; + id = bio_blkcg(bio)->id; rcu_read_unlock(); /* @@ -3166,14 +3166,14 @@ static struct cfq_queue * cfq_find_alloc_queue(struct cfq_data *cfqd, bool is_sync, struct cfq_io_cq *cic, struct bio *bio, gfp_t gfp_mask) { - struct blkio_cgroup *blkcg; + struct blkcg *blkcg; struct cfq_queue *cfqq, *new_cfqq = NULL; struct cfq_group *cfqg; retry: rcu_read_lock(); - blkcg = bio_blkio_cgroup(bio); + blkcg = bio_blkcg(bio); cfqg = cfq_lookup_create_cfqg(cfqd, blkcg); cfqq = cic_to_cfqq(cic, is_sync); @@ -3944,14 +3944,14 @@ static void cfq_exit_queue(struct elevator_queue *e) #ifndef CONFIG_CFQ_GROUP_IOSCHED kfree(cfqd->root_group); #endif - blkcg_deactivate_policy(q, &blkio_policy_cfq); + blkcg_deactivate_policy(q, &blkcg_policy_cfq); kfree(cfqd); } static int cfq_init_queue(struct request_queue *q) { struct cfq_data *cfqd; - struct blkio_group *blkg __maybe_unused; + struct blkcg_gq *blkg __maybe_unused; int i, ret; cfqd = kmalloc_node(sizeof(*cfqd), GFP_KERNEL | __GFP_ZERO, q->node); @@ -3966,7 +3966,7 @@ static int cfq_init_queue(struct request_queue *q) /* Init root group and prefer root group over other groups by default */ #ifdef CONFIG_CFQ_GROUP_IOSCHED - ret = blkcg_activate_policy(q, &blkio_policy_cfq); + ret = blkcg_activate_policy(q, &blkcg_policy_cfq); if (ret) goto out_free; @@ -4156,10 +4156,10 @@ static struct elevator_type iosched_cfq = { }; #ifdef CONFIG_CFQ_GROUP_IOSCHED -static struct blkio_policy_type blkio_policy_cfq = { +static struct blkcg_policy blkcg_policy_cfq = { .ops = { - .blkio_init_group_fn = cfq_init_blkio_group, - .blkio_reset_group_stats_fn = cfqg_stats_reset, + .pd_init_fn = cfq_pd_init, + .pd_reset_stats_fn = cfq_pd_reset_stats, }, .pdata_size = sizeof(struct cfq_group), .cftypes = cfq_blkcg_files, @@ -4185,7 +4185,7 @@ static int __init cfq_init(void) cfq_group_idle = 0; #endif - ret = blkio_policy_register(&blkio_policy_cfq); + ret = blkcg_policy_register(&blkcg_policy_cfq); if (ret) return ret; @@ -4202,13 +4202,13 @@ static int __init cfq_init(void) err_free_pool: kmem_cache_destroy(cfq_pool); err_pol_unreg: - blkio_policy_unregister(&blkio_policy_cfq); + blkcg_policy_unregister(&blkcg_policy_cfq); return ret; } static void __exit cfq_exit(void) { - blkio_policy_unregister(&blkio_policy_cfq); + blkcg_policy_unregister(&blkcg_policy_cfq); elv_unregister(&iosched_cfq); kmem_cache_destroy(cfq_pool); } diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 68720ab275d4..af33fb1adfee 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -31,7 +31,7 @@ struct blk_trace; struct request; struct sg_io_hdr; struct bsg_job; -struct blkio_group; +struct blkcg_gq; #define BLKDEV_MIN_RQ 4 #define BLKDEV_MAX_RQ 128 /* Default maximum */ @@ -371,7 +371,7 @@ struct request_queue { struct list_head icq_list; #ifdef CONFIG_BLK_CGROUP DECLARE_BITMAP (blkcg_pols, BLKCG_MAX_POLS); - struct blkio_group *root_blkg; + struct blkcg_gq *root_blkg; struct list_head blkg_list; #endif -- cgit v1.2.3 From 141670e9b4356b59b5b39a99e10ac0118d12b16d Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Thu, 5 Apr 2012 21:35:15 +0300 Subject: drm: Move drm_format_num_planes() to drm_crtc.c 
MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit There will be a need for this function in drm_crtc.c later. This avoids making drm_crtc.c depend on drm_crtc_helper.c. Signed-off-by: Ville Syrjälä Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_crtc.c | 32 ++++++++++++++++++++++++++++++++ drivers/gpu/drm/drm_crtc_helper.c | 33 --------------------------------- include/drm/drm_crtc.h | 2 ++ include/drm/drm_crtc_helper.h | 2 -- 4 files changed, 34 insertions(+), 35 deletions(-) (limited to 'include') diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index d3aaeb6ae236..32ab669f4aed 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -3466,3 +3466,35 @@ void drm_fb_get_bpp_depth(uint32_t format, unsigned int *depth, } } EXPORT_SYMBOL(drm_fb_get_bpp_depth); + +/** + * drm_format_num_planes - get the number of planes for format + * @format: pixel format (DRM_FORMAT_*) + * + * RETURNS: + * The number of planes used by the specified pixel format. + */ +int drm_format_num_planes(uint32_t format) +{ + switch (format) { + case DRM_FORMAT_YUV410: + case DRM_FORMAT_YVU410: + case DRM_FORMAT_YUV411: + case DRM_FORMAT_YVU411: + case DRM_FORMAT_YUV420: + case DRM_FORMAT_YVU420: + case DRM_FORMAT_YUV422: + case DRM_FORMAT_YVU422: + case DRM_FORMAT_YUV444: + case DRM_FORMAT_YVU444: + return 3; + case DRM_FORMAT_NV12: + case DRM_FORMAT_NV21: + case DRM_FORMAT_NV16: + case DRM_FORMAT_NV61: + return 2; + default: + return 1; + } +} +EXPORT_SYMBOL(drm_format_num_planes); diff --git a/drivers/gpu/drm/drm_crtc_helper.c b/drivers/gpu/drm/drm_crtc_helper.c index 81118893264c..974196ab7b22 100644 --- a/drivers/gpu/drm/drm_crtc_helper.c +++ b/drivers/gpu/drm/drm_crtc_helper.c @@ -1023,36 +1023,3 @@ void drm_helper_hpd_irq_event(struct drm_device *dev) queue_delayed_work(system_nrt_wq, &dev->mode_config.output_poll_work, 0); } EXPORT_SYMBOL(drm_helper_hpd_irq_event); - - -/** - * drm_format_num_planes - get the number of planes for format - * @format: pixel format (DRM_FORMAT_*) - * - * RETURNS: - * The number of planes used by the specified pixel format. 
- */ -int drm_format_num_planes(uint32_t format) -{ - switch (format) { - case DRM_FORMAT_YUV410: - case DRM_FORMAT_YVU410: - case DRM_FORMAT_YUV411: - case DRM_FORMAT_YVU411: - case DRM_FORMAT_YUV420: - case DRM_FORMAT_YVU420: - case DRM_FORMAT_YUV422: - case DRM_FORMAT_YVU422: - case DRM_FORMAT_YUV444: - case DRM_FORMAT_YVU444: - return 3; - case DRM_FORMAT_NV12: - case DRM_FORMAT_NV21: - case DRM_FORMAT_NV16: - case DRM_FORMAT_NV61: - return 2; - default: - return 1; - } -} -EXPORT_SYMBOL(drm_format_num_planes); diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index e250eda4e3a8..9dd3ed85547d 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -1026,4 +1026,6 @@ extern int drm_mode_destroy_dumb_ioctl(struct drm_device *dev, extern void drm_fb_get_bpp_depth(uint32_t format, unsigned int *depth, int *bpp); +extern int drm_format_num_planes(uint32_t format); + #endif /* __DRM_CRTC_H__ */ diff --git a/include/drm/drm_crtc_helper.h b/include/drm/drm_crtc_helper.h index 37515d1afab3..3add00e03388 100644 --- a/include/drm/drm_crtc_helper.h +++ b/include/drm/drm_crtc_helper.h @@ -145,6 +145,4 @@ extern void drm_helper_hpd_irq_event(struct drm_device *dev); extern void drm_kms_helper_poll_disable(struct drm_device *dev); extern void drm_kms_helper_poll_enable(struct drm_device *dev); -extern int drm_format_num_planes(uint32_t format); - #endif -- cgit v1.2.3 From 5a86bd552407bd6b3e0df4e88636797484d06430 Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Thu, 5 Apr 2012 21:35:16 +0300 Subject: drm: Add drm_format_plane_cpp() utility function MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This function returns the bytes per pixel value based on the pixel format and plane index. Signed-off-by: Ville Syrjälä Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_crtc.c | 45 +++++++++++++++++++++++++++++++++++++++++++++ include/drm/drm_crtc.h | 1 + 2 files changed, 46 insertions(+) (limited to 'include') diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 32ab669f4aed..2c4e9cf2a1d2 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -3498,3 +3498,48 @@ int drm_format_num_planes(uint32_t format) } } EXPORT_SYMBOL(drm_format_num_planes); + +/** + * drm_format_plane_cpp - determine the bytes per pixel value + * @format: pixel format (DRM_FORMAT_*) + * @plane: plane index + * + * RETURNS: + * The bytes per pixel value for the specified plane. + */ +int drm_format_plane_cpp(uint32_t format, int plane) +{ + unsigned int depth; + int bpp; + + if (plane >= drm_format_num_planes(format)) + return 0; + + switch (format) { + case DRM_FORMAT_YUYV: + case DRM_FORMAT_YVYU: + case DRM_FORMAT_UYVY: + case DRM_FORMAT_VYUY: + return 2; + case DRM_FORMAT_NV12: + case DRM_FORMAT_NV21: + case DRM_FORMAT_NV16: + case DRM_FORMAT_NV61: + return plane ? 
2 : 1; + case DRM_FORMAT_YUV410: + case DRM_FORMAT_YVU410: + case DRM_FORMAT_YUV411: + case DRM_FORMAT_YVU411: + case DRM_FORMAT_YUV420: + case DRM_FORMAT_YVU420: + case DRM_FORMAT_YUV422: + case DRM_FORMAT_YVU422: + case DRM_FORMAT_YUV444: + case DRM_FORMAT_YVU444: + return 1; + default: + drm_fb_get_bpp_depth(format, &depth, &bpp); + return bpp >> 3; + } +} +EXPORT_SYMBOL(drm_format_plane_cpp); diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index 9dd3ed85547d..2d128eb4293f 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -1027,5 +1027,6 @@ extern int drm_mode_destroy_dumb_ioctl(struct drm_device *dev, extern void drm_fb_get_bpp_depth(uint32_t format, unsigned int *depth, int *bpp); extern int drm_format_num_planes(uint32_t format); +extern int drm_format_plane_cpp(uint32_t format, int plane); #endif /* __DRM_CRTC_H__ */ -- cgit v1.2.3 From 01b68b0483627631c738dcfca0dee7e22892c420 Mon Sep 17 00:00:00 2001 From: Ville Syrjälä Date: Thu, 5 Apr 2012 21:35:17 +0300 Subject: drm: Add drm_format_{horz, vert}_chroma_subsampling() utility functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit These functions return the chroma subsampling factors for the specified pixel format. Signed-off-by: Ville Syrjälä Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_crtc.c | 60 ++++++++++++++++++++++++++++++++++++++++++++++ include/drm/drm_crtc.h | 2 ++ 2 files changed, 62 insertions(+) (limited to 'include') diff --git a/drivers/gpu/drm/drm_crtc.c b/drivers/gpu/drm/drm_crtc.c index 2c4e9cf2a1d2..1b79c953b4cc 100644 --- a/drivers/gpu/drm/drm_crtc.c +++ b/drivers/gpu/drm/drm_crtc.c @@ -3543,3 +3543,63 @@ int drm_format_plane_cpp(uint32_t format, int plane) } } EXPORT_SYMBOL(drm_format_plane_cpp); + +/** + * drm_format_horz_chroma_subsampling - get the horizontal chroma subsampling factor + * @format: pixel format (DRM_FORMAT_*) + * + * RETURNS: + * The horizontal chroma subsampling factor for the + * specified pixel format. + */ +int drm_format_horz_chroma_subsampling(uint32_t format) +{ + switch (format) { + case DRM_FORMAT_YUV411: + case DRM_FORMAT_YVU411: + case DRM_FORMAT_YUV410: + case DRM_FORMAT_YVU410: + return 4; + case DRM_FORMAT_YUYV: + case DRM_FORMAT_YVYU: + case DRM_FORMAT_UYVY: + case DRM_FORMAT_VYUY: + case DRM_FORMAT_NV12: + case DRM_FORMAT_NV21: + case DRM_FORMAT_NV16: + case DRM_FORMAT_NV61: + case DRM_FORMAT_YUV422: + case DRM_FORMAT_YVU422: + case DRM_FORMAT_YUV420: + case DRM_FORMAT_YVU420: + return 2; + default: + return 1; + } +} +EXPORT_SYMBOL(drm_format_horz_chroma_subsampling); + +/** + * drm_format_vert_chroma_subsampling - get the vertical chroma subsampling factor + * @format: pixel format (DRM_FORMAT_*) + * + * RETURNS: + * The vertical chroma subsampling factor for the + * specified pixel format. 
+ */ +int drm_format_vert_chroma_subsampling(uint32_t format) +{ + switch (format) { + case DRM_FORMAT_YUV410: + case DRM_FORMAT_YVU410: + return 4; + case DRM_FORMAT_YUV420: + case DRM_FORMAT_YVU420: + case DRM_FORMAT_NV12: + case DRM_FORMAT_NV21: + return 2; + default: + return 1; + } +} +EXPORT_SYMBOL(drm_format_vert_chroma_subsampling); diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index 2d128eb4293f..2d63a02571ff 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -1028,5 +1028,7 @@ extern void drm_fb_get_bpp_depth(uint32_t format, unsigned int *depth, int *bpp); extern int drm_format_num_planes(uint32_t format); extern int drm_format_plane_cpp(uint32_t format, int plane); +extern int drm_format_horz_chroma_subsampling(uint32_t format); +extern int drm_format_vert_chroma_subsampling(uint32_t format); #endif /* __DRM_CRTC_H__ */ -- cgit v1.2.3 From f6e252bac45cab5edc30c2ede971def51e272c9b Mon Sep 17 00:00:00 2001 From: Adam Jackson Date: Fri, 13 Apr 2012 16:33:31 -0400 Subject: drm/edid: Allow drm_mode_find_dmt to hunt for reduced-blanking modes It won't find any, yet. Fix up callers to match: standard mode codes will prefer r-b modes for a given size if present, EST3 mode codes will look for exactly the r-b-ness mentioned in the mode code. This might mean fewer modes matched for EST3 mode codes between now and when the DMT mode list regrows the r-b modes, but practically speaking EST3 codes don't exist in the wild. Signed-off-by: Adam Jackson Tested-by: Takashi Iwai Reviewed-by: Rodrigo Vivi Signed-off-by: Dave Airlie --- drivers/gpu/drm/drm_edid.c | 37 ++++++++++++++++++++++++------------- drivers/gpu/drm/drm_fb_helper.c | 2 +- include/drm/drm_crtc.h | 3 ++- 3 files changed, 27 insertions(+), 15 deletions(-) (limited to 'include') diff --git a/drivers/gpu/drm/drm_edid.c b/drivers/gpu/drm/drm_edid.c index 9c8fa8860f6b..ec0464c91847 100644 --- a/drivers/gpu/drm/drm_edid.c +++ b/drivers/gpu/drm/drm_edid.c @@ -486,18 +486,29 @@ static void edid_fixup_preferred(struct drm_connector *connector, preferred_mode->type |= DRM_MODE_TYPE_PREFERRED; } +static bool +mode_is_rb(const struct drm_display_mode *mode) +{ + return (mode->htotal - mode->hdisplay == 160) && + (mode->hsync_end - mode->hdisplay == 80) && + (mode->hsync_end - mode->hsync_start == 32) && + (mode->vsync_start - mode->vdisplay == 3); +} + /* * drm_mode_find_dmt - Create a copy of a mode if present in DMT * @dev: Device to duplicate against * @hsize: Mode width * @vsize: Mode height * @fresh: Mode refresh rate + * @rb: Mode reduced-blanking-ness * * Walk the DMT mode list looking for a match for the given parameters. * Return a newly allocated copy of the mode, or NULL if not found. 
*/ struct drm_display_mode *drm_mode_find_dmt(struct drm_device *dev, - int hsize, int vsize, int fresh) + int hsize, int vsize, int fresh, + bool rb) { int i; @@ -509,6 +520,8 @@ struct drm_display_mode *drm_mode_find_dmt(struct drm_device *dev, continue; if (fresh != drm_mode_vrefresh(ptr)) continue; + if (rb != mode_is_rb(ptr)) + continue; return drm_mode_duplicate(dev, ptr); } @@ -742,10 +755,17 @@ drm_mode_std(struct drm_connector *connector, struct edid *edid, } /* check whether it can be found in default mode table */ - mode = drm_mode_find_dmt(dev, hsize, vsize, vrefresh_rate); + if (drm_monitor_supports_rb(edid)) { + mode = drm_mode_find_dmt(dev, hsize, vsize, vrefresh_rate, + true); + if (mode) + return mode; + } + mode = drm_mode_find_dmt(dev, hsize, vsize, vrefresh_rate, false); if (mode) return mode; + /* okay, generate it */ switch (timing_level) { case LEVEL_DMT: break; @@ -919,15 +939,6 @@ static struct drm_display_mode *drm_mode_detailed(struct drm_device *dev, return mode; } -static bool -mode_is_rb(const struct drm_display_mode *mode) -{ - return (mode->htotal - mode->hdisplay == 160) && - (mode->hsync_end - mode->hdisplay == 80) && - (mode->hsync_end - mode->hsync_start == 32) && - (mode->vsync_start - mode->vdisplay == 3); -} - static bool mode_in_hsync_range(const struct drm_display_mode *mode, struct edid *edid, u8 *t) @@ -1073,8 +1084,8 @@ drm_est3_modes(struct drm_connector *connector, struct detailed_timing *timing) mode = drm_mode_find_dmt(connector->dev, est3_modes[m].w, est3_modes[m].h, - est3_modes[m].r - /*, est3_modes[m].rb */); + est3_modes[m].r, + est3_modes[m].rb); if (mode) { drm_mode_probed_add(connector, mode); modes++; diff --git a/drivers/gpu/drm/drm_fb_helper.c b/drivers/gpu/drm/drm_fb_helper.c index a0d6e894d97c..6e19dd156be0 100644 --- a/drivers/gpu/drm/drm_fb_helper.c +++ b/drivers/gpu/drm/drm_fb_helper.c @@ -1083,7 +1083,7 @@ static bool drm_target_cloned(struct drm_fb_helper *fb_helper, /* try and find a 1024x768 mode on each connector */ can_clone = true; - dmt_mode = drm_mode_find_dmt(fb_helper->dev, 1024, 768, 60); + dmt_mode = drm_mode_find_dmt(fb_helper->dev, 1024, 768, 60, false); for (i = 0; i < fb_helper->connector_count; i++) { diff --git a/include/drm/drm_crtc.h b/include/drm/drm_crtc.h index 2d63a02571ff..6f5faf669959 100644 --- a/include/drm/drm_crtc.h +++ b/include/drm/drm_crtc.h @@ -1015,7 +1015,8 @@ extern int drm_edid_header_is_valid(const u8 *raw_edid); extern bool drm_edid_block_valid(u8 *raw_edid); extern bool drm_edid_is_valid(struct edid *edid); struct drm_display_mode *drm_mode_find_dmt(struct drm_device *dev, - int hsize, int vsize, int fresh); + int hsize, int vsize, int fresh, + bool rb); extern int drm_mode_create_dumb_ioctl(struct drm_device *dev, void *data, struct drm_file *file_priv); -- cgit v1.2.3 From eeefa4bea1af34207c5299f989fffe03628ea164 Mon Sep 17 00:00:00 2001 From: Adam Jackson Date: Fri, 13 Apr 2012 16:33:37 -0400 Subject: drm/edid: Update range descriptor struct for EDID 1.4 Signed-off-by: Adam Jackson Tested-by: Takashi Iwai Reviewed-by: Rodrigo Vivi Signed-off-by: Dave Airlie --- include/drm/drm_edid.h | 26 ++++++++++++++++++++------ 1 file changed, 20 insertions(+), 6 deletions(-) (limited to 'include') diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h index bcb9a66baa8c..8cefbbee996e 100644 --- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -90,12 +90,26 @@ struct detailed_data_monitor_range { u8 min_hfreq_khz; u8 max_hfreq_khz; u8 pixel_clock_mhz; /* need to multiply by 10 */ - 
__le16 sec_gtf_toggle; /* A000=use above, 20=use below */ - u8 hfreq_start_khz; /* need to multiply by 2 */ - u8 c; /* need to divide by 2 */ - __le16 m; - u8 k; - u8 j; /* need to divide by 2 */ + u8 flags; + union { + struct { + u8 reserved; + u8 hfreq_start_khz; /* need to multiply by 2 */ + u8 c; /* need to divide by 2 */ + __le16 m; + u8 k; + u8 j; /* need to divide by 2 */ + } gtf2; + struct { + u8 version; + u8 data1; /* high 6 bits: extra clock resolution */ + u8 data2; /* plus low 2 of above: max hactive */ + u8 supported_aspects; + u8 flags; /* preferred aspect and blanking support */ + u8 supported_scalings; + u8 preferred_refresh; + } cvt; + } formula; } __attribute__((packed)); struct detailed_data_wpindex { -- cgit v1.2.3 From 1f15d10984c854e077da5aa1a23f901496b49773 Mon Sep 17 00:00:00 2001 From: Marcelo Tosatti Date: Fri, 20 Apr 2012 18:21:46 -0300 Subject: KVM: add kvm_arch_para_features stub to asm-generic/kvm_para.h Needed by kvm_para_has_feature(). Reported-by: Stephen Rothwell Signed-off-by: Marcelo Tosatti --- include/asm-generic/kvm_para.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include') diff --git a/include/asm-generic/kvm_para.h b/include/asm-generic/kvm_para.h index 05ef7e705939..9a7bbadb688d 100644 --- a/include/asm-generic/kvm_para.h +++ b/include/asm-generic/kvm_para.h @@ -11,4 +11,9 @@ static inline bool kvm_check_and_clear_guest_paused(void) return false; } +static inline unsigned int kvm_arch_para_features(void) +{ + return 0; +} + #endif -- cgit v1.2.3 From 4ccf4beab8c447f8cd33d46afb6e10e1aa3befc6 Mon Sep 17 00:00:00 2001 From: Wolfram Sang Date: Wed, 31 Aug 2011 20:35:40 +0200 Subject: lib: add support for stmp-style devices MX23/28 use IP cores which follow a register layout I first saw on STMP3xxx SoCs. In this layout, every register actually has four u32: 1.) to store a value directly 2.) a SET register where every 1-bit sets the corresponding bit, others are unaffected 3.) same with a CLR register 4.) same with a TOG (toggle) register Also, the 2 MSBs in register 0 are always the same and can be used to reset the IP core. All this is strictly speaking not mach-specific (but IP core specific) and, thus, doesn't need to be in mach-mxs/include. At least mx6 also uses IP cores following this stmp-style. So: Introduce a stmp-style device, put the code and defines for that in a public place (lib/), and let drivers for stmp-style devices select that code. To avoid regressions and ease reviewing, the actual code is simply copied from mach-mxs. It definitely wants updates, but those need a separate patch series. Voila, mach dependency gone, reusable code introduced. Note that I didn't remove the duplicated code from mach-mxs yet; first the drivers have to be converted. Signed-off-by: Wolfram Sang Acked-by: Shawn Guo Acked-by: Dong Aisheng --- include/linux/stmp_device.h | 20 ++++++++++++ lib/Kconfig | 3 ++ lib/Makefile | 2 ++ lib/stmp_device.c | 80 +++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 105 insertions(+) create mode 100644 include/linux/stmp_device.h create mode 100644 lib/stmp_device.c (limited to 'include') diff --git a/include/linux/stmp_device.h b/include/linux/stmp_device.h new file mode 100644 index 000000000000..6cf7ec9547cf --- /dev/null +++ b/include/linux/stmp_device.h @@ -0,0 +1,20 @@ +/* + * basic functions for devices following the "stmp" style register layout + * + * Copyright (C) 2011 Wolfram Sang, Pengutronix e.K. 
+ * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#ifndef __STMP_DEVICE_H__ +#define __STMP_DEVICE_H__ + +#define STMP_OFFSET_REG_SET 0x4 +#define STMP_OFFSET_REG_CLR 0x8 +#define STMP_OFFSET_REG_TOG 0xc + +extern int stmp_reset_block(void __iomem *); +#endif /* __STMP_DEVICE_H__ */ diff --git a/lib/Kconfig b/lib/Kconfig index 4a8aba2e5cc0..c5da1548b964 100644 --- a/lib/Kconfig +++ b/lib/Kconfig @@ -33,6 +33,9 @@ config GENERIC_IO boolean default n +config STMP_DEVICE + bool + config CRC_CCITT tristate "CRC-CCITT functions" help diff --git a/lib/Makefile b/lib/Makefile index 18515f0267c4..f78dbcdc7e3d 100644 --- a/lib/Makefile +++ b/lib/Makefile @@ -123,6 +123,8 @@ obj-$(CONFIG_SIGNATURE) += digsig.o obj-$(CONFIG_CLZ_TAB) += clz_tab.o +obj-$(CONFIG_STMP_DEVICE) += stmp_device.o + hostprogs-y := gen_crc32table clean-files := crc32table.h diff --git a/lib/stmp_device.c b/lib/stmp_device.c new file mode 100644 index 000000000000..8ac9bcc4289a --- /dev/null +++ b/lib/stmp_device.c @@ -0,0 +1,80 @@ +/* + * Copyright (C) 1999 ARM Limited + * Copyright (C) 2000 Deep Blue Solutions Ltd + * Copyright 2006-2007,2010 Freescale Semiconductor, Inc. All Rights Reserved. + * Copyright 2008 Juergen Beisert, kernel@pengutronix.de + * Copyright 2009 Ilya Yanok, Emcraft Systems Ltd, yanok@emcraft.com + * Copyright (C) 2011 Wolfram Sang, Pengutronix e.K. + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + */ + +#include +#include +#include +#include +#include + +#define STMP_MODULE_CLKGATE (1 << 30) +#define STMP_MODULE_SFTRST (1 << 31) + +/* + * Clear the bit and poll it cleared. This is usually called with + * a reset address and mask being either SFTRST(bit 31) or CLKGATE + * (bit 30). 
+ */ +static int stmp_clear_poll_bit(void __iomem *addr, u32 mask) +{ + int timeout = 0x400; + + writel(mask, addr + STMP_OFFSET_REG_CLR); + udelay(1); + while ((readl(addr) & mask) && --timeout) + /* nothing */; + + return !timeout; +} + +int stmp_reset_block(void __iomem *reset_addr) +{ + int ret; + int timeout = 0x400; + + /* clear and poll SFTRST */ + ret = stmp_clear_poll_bit(reset_addr, STMP_MODULE_SFTRST); + if (unlikely(ret)) + goto error; + + /* clear CLKGATE */ + writel(STMP_MODULE_CLKGATE, reset_addr + STMP_OFFSET_REG_CLR); + + /* set SFTRST to reset the block */ + writel(STMP_MODULE_SFTRST, reset_addr + STMP_OFFSET_REG_SET); + udelay(1); + + /* poll CLKGATE becoming set */ + while ((!(readl(reset_addr) & STMP_MODULE_CLKGATE)) && --timeout) + /* nothing */; + if (unlikely(!timeout)) + goto error; + + /* clear and poll SFTRST */ + ret = stmp_clear_poll_bit(reset_addr, STMP_MODULE_SFTRST); + if (unlikely(ret)) + goto error; + + /* clear and poll CLKGATE */ + ret = stmp_clear_poll_bit(reset_addr, STMP_MODULE_CLKGATE); + if (unlikely(ret)) + goto error; + + return 0; + +error: + pr_err("%s(%p): module reset timeout\n", __func__, reset_addr); + return -ETIMEDOUT; +} +EXPORT_SYMBOL(stmp_reset_block); -- cgit v1.2.3 From bbbc4c4d8c5face097d695f9bf3a39647ba6b7e7 Mon Sep 17 00:00:00 2001 From: Nicolas Pitre Date: Mon, 16 Apr 2012 19:16:54 -0400 Subject: mmc: sdio: avoid spurious calls to interrupt handlers Commit 06e8935feb ("optimized SDIO IRQ handling for single irq") introduced some spurious calls to SDIO function interrupt handlers, such as when the SDIO IRQ thread is started, or the safety check performed upon a system resume. Let's add a flag to perform the optimization only when a real interrupt is signaled by the host driver and we know there is no point confirming it. Reported-by: Sujit Reddy Thumma Signed-off-by: Nicolas Pitre Cc: stable Signed-off-by: Chris Ball --- drivers/mmc/core/sdio.c | 2 +- drivers/mmc/core/sdio_irq.c | 11 +++++++---- include/linux/mmc/host.h | 2 ++ 3 files changed, 10 insertions(+), 5 deletions(-) (limited to 'include') diff --git a/drivers/mmc/core/sdio.c b/drivers/mmc/core/sdio.c index 2c7c83f832d2..13d0e95380ab 100644 --- a/drivers/mmc/core/sdio.c +++ b/drivers/mmc/core/sdio.c @@ -947,7 +947,7 @@ static int mmc_sdio_resume(struct mmc_host *host) } if (!err && host->sdio_irqs) - mmc_signal_sdio_irq(host); + wake_up_process(host->sdio_irq_thread); mmc_release_host(host); /* diff --git a/drivers/mmc/core/sdio_irq.c b/drivers/mmc/core/sdio_irq.c index f573e7f9f740..3d8ceb4084de 100644 --- a/drivers/mmc/core/sdio_irq.c +++ b/drivers/mmc/core/sdio_irq.c @@ -28,18 +28,20 @@ #include "sdio_ops.h" -static int process_sdio_pending_irqs(struct mmc_card *card) +static int process_sdio_pending_irqs(struct mmc_host *host) { + struct mmc_card *card = host->card; int i, ret, count; unsigned char pending; struct sdio_func *func; /* * Optimization, if there is only 1 function interrupt registered - * call irq handler directly + * and we know an IRQ was signaled then call irq handler directly. + * Otherwise do the full probe. 
*/ func = card->sdio_single_irq; - if (func) { + if (func && host->sdio_irq_pending) { func->irq_handler(func); return 1; } @@ -116,7 +118,8 @@ static int sdio_irq_thread(void *_host) ret = __mmc_claim_host(host, &host->sdio_irq_thread_abort); if (ret) break; - ret = process_sdio_pending_irqs(host->card); + ret = process_sdio_pending_irqs(host); + host->sdio_irq_pending = false; mmc_release_host(host); /* diff --git a/include/linux/mmc/host.h b/include/linux/mmc/host.h index cbde4b7e675e..0707d228d7f1 100644 --- a/include/linux/mmc/host.h +++ b/include/linux/mmc/host.h @@ -297,6 +297,7 @@ struct mmc_host { unsigned int sdio_irqs; struct task_struct *sdio_irq_thread; + bool sdio_irq_pending; atomic_t sdio_irq_thread_abort; mmc_pm_flag_t pm_flags; /* requested pm features */ @@ -352,6 +353,7 @@ extern int mmc_cache_ctrl(struct mmc_host *, u8); static inline void mmc_signal_sdio_irq(struct mmc_host *host) { host->ops->enable_sdio_irq(host, 0); + host->sdio_irq_pending = true; wake_up_process(host->sdio_irq_thread); } -- cgit v1.2.3 From 62c1dcfc7451a8e42104776705a317e06a8e24a3 Mon Sep 17 00:00:00 2001 From: Tomi Valkeinen Date: Thu, 8 Mar 2012 12:37:58 +0200 Subject: OMAPDSS: add set_min_bus_tput pointer to omapdss's platform data The omapdss driver needs to use omap_pm_set_min_bus_tput(), so add a new entry for that in omapdss's platform data, and set it. Signed-off-by: Tomi Valkeinen Cc: Paul Walmsley Acked-by: Kevin Hilman --- arch/arm/mach-omap2/display.c | 6 ++++++ include/video/omapdss.h | 1 + 2 files changed, 7 insertions(+) (limited to 'include') diff --git a/arch/arm/mach-omap2/display.c b/arch/arm/mach-omap2/display.c index db5a88a36c63..60cded4738a0 100644 --- a/arch/arm/mach-omap2/display.c +++ b/arch/arm/mach-omap2/display.c @@ -180,6 +180,11 @@ static void omap_dsi_disable_pads(int dsi_id, unsigned lane_mask) omap4_dsi_mux_pads(dsi_id, 0); } +static int omap_dss_set_min_bus_tput(struct device *dev, unsigned long tput) +{ + return omap_pm_set_min_bus_tput(dev, OCP_INITIATOR_AGENT, tput); +} + int __init omap_display_init(struct omap_dss_board_info *board_data) { int r = 0; @@ -210,6 +215,7 @@ int __init omap_display_init(struct omap_dss_board_info *board_data) pdata.board_data = board_data; pdata.board_data->get_context_loss_count = omap_pm_get_dev_context_loss_count; + pdata.board_data->set_min_bus_tput = omap_dss_set_min_bus_tput; for (i = 0; i < oh_count; i++) { oh = omap_hwmod_lookup(curr_dss_hwmod[i].oh_name); diff --git a/include/video/omapdss.h b/include/video/omapdss.h index 483f67caa7ad..7aecadbb1d9c 100644 --- a/include/video/omapdss.h +++ b/include/video/omapdss.h @@ -309,6 +309,7 @@ struct omap_dss_board_info { struct omap_dss_device *default_device; int (*dsi_enable_pads)(int dsi_id, unsigned lane_mask); void (*dsi_disable_pads)(int dsi_id, unsigned lane_mask); + int (*set_min_bus_tput)(struct device *dev, unsigned long r); }; /* Init with the board info */ -- cgit v1.2.3 From 4b6430fc98cfe051eab69f4696a608bba14ebd6c Mon Sep 17 00:00:00 2001 From: Grazvydas Ignotas Date: Thu, 15 Mar 2012 20:00:23 +0200 Subject: OMAPDSS: provide default get_timings function for panels With this we can eliminate some duplicate code in panel drivers. Also lgphilips-lb035q02, nec-nl8048hl11-01b, picodlp and tpo-td043mtea1 gain support of reading timings over sysfs. 
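As an illustrative sketch of what this buys us (not part of this patch; the example_panel names are invented), a panel driver that is happy with the default can now drop its get_timings callback entirely and rely on the one the core installs at registration time:

/*
 * Hypothetical panel driver fragment -- illustration only, not from
 * this patch.  Leaving .get_timings unset means omap_dss_register_driver()
 * fills it in with omapdss_default_get_timings(), which simply returns
 * dssdev->panel.timings.
 */
static struct omap_dss_driver example_panel_driver = {
	.enable		= example_panel_enable,
	.disable	= example_panel_disable,
	.set_timings	= example_panel_set_timings,
	.check_timings	= example_panel_check_timings,
	/* no .get_timings: the core default is used */
	.driver = {
		.name	= "example_panel",
		.owner	= THIS_MODULE,
	},
};
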
Signed-off-by: Grazvydas Ignotas Signed-off-by: Tomi Valkeinen --- drivers/video/omap2/displays/panel-acx565akm.c | 7 ------- drivers/video/omap2/displays/panel-generic-dpi.c | 7 ------- drivers/video/omap2/displays/panel-n8x0.c | 8 -------- drivers/video/omap2/displays/panel-taal.c | 8 -------- drivers/video/omap2/dss/core.c | 2 ++ drivers/video/omap2/dss/display.c | 7 +++++++ drivers/video/omap2/dss/venc.c | 7 ------- include/video/omapdss.h | 2 ++ 8 files changed, 11 insertions(+), 37 deletions(-) (limited to 'include') diff --git a/drivers/video/omap2/displays/panel-acx565akm.c b/drivers/video/omap2/displays/panel-acx565akm.c index d26f37ac69d8..c98f2c16f744 100644 --- a/drivers/video/omap2/displays/panel-acx565akm.c +++ b/drivers/video/omap2/displays/panel-acx565akm.c @@ -738,12 +738,6 @@ static void acx_panel_set_timings(struct omap_dss_device *dssdev, } } -static void acx_panel_get_timings(struct omap_dss_device *dssdev, - struct omap_video_timings *timings) -{ - *timings = dssdev->panel.timings; -} - static int acx_panel_check_timings(struct omap_dss_device *dssdev, struct omap_video_timings *timings) { @@ -761,7 +755,6 @@ static struct omap_dss_driver acx_panel_driver = { .resume = acx_panel_resume, .set_timings = acx_panel_set_timings, - .get_timings = acx_panel_get_timings, .check_timings = acx_panel_check_timings, .get_recommended_bpp = acx_get_recommended_bpp, diff --git a/drivers/video/omap2/displays/panel-generic-dpi.c b/drivers/video/omap2/displays/panel-generic-dpi.c index 30fe4dfeb227..e2b21c511fcf 100644 --- a/drivers/video/omap2/displays/panel-generic-dpi.c +++ b/drivers/video/omap2/displays/panel-generic-dpi.c @@ -549,12 +549,6 @@ static void generic_dpi_panel_set_timings(struct omap_dss_device *dssdev, dpi_set_timings(dssdev, timings); } -static void generic_dpi_panel_get_timings(struct omap_dss_device *dssdev, - struct omap_video_timings *timings) -{ - *timings = dssdev->panel.timings; -} - static int generic_dpi_panel_check_timings(struct omap_dss_device *dssdev, struct omap_video_timings *timings) { @@ -571,7 +565,6 @@ static struct omap_dss_driver dpi_driver = { .resume = generic_dpi_panel_resume, .set_timings = generic_dpi_panel_set_timings, - .get_timings = generic_dpi_panel_get_timings, .check_timings = generic_dpi_panel_check_timings, .driver = { diff --git a/drivers/video/omap2/displays/panel-n8x0.c b/drivers/video/omap2/displays/panel-n8x0.c index dc9408dc93d1..4a34cdc1371b 100644 --- a/drivers/video/omap2/displays/panel-n8x0.c +++ b/drivers/video/omap2/displays/panel-n8x0.c @@ -610,12 +610,6 @@ static int n8x0_panel_resume(struct omap_dss_device *dssdev) return 0; } -static void n8x0_panel_get_timings(struct omap_dss_device *dssdev, - struct omap_video_timings *timings) -{ - *timings = dssdev->panel.timings; -} - static void n8x0_panel_get_resolution(struct omap_dss_device *dssdev, u16 *xres, u16 *yres) { @@ -678,8 +672,6 @@ static struct omap_dss_driver n8x0_panel_driver = { .get_resolution = n8x0_panel_get_resolution, .get_recommended_bpp = omapdss_default_get_recommended_bpp, - .get_timings = n8x0_panel_get_timings, - .driver = { .name = "n8x0_panel", .owner = THIS_MODULE, diff --git a/drivers/video/omap2/displays/panel-taal.c b/drivers/video/omap2/displays/panel-taal.c index 72d63076ab19..3053399faf9a 100644 --- a/drivers/video/omap2/displays/panel-taal.c +++ b/drivers/video/omap2/displays/panel-taal.c @@ -507,12 +507,6 @@ static const struct backlight_ops taal_bl_ops = { .update_status = taal_bl_update_status, }; -static void taal_get_timings(struct 
omap_dss_device *dssdev, - struct omap_video_timings *timings) -{ - *timings = dssdev->panel.timings; -} - static void taal_get_resolution(struct omap_dss_device *dssdev, u16 *xres, u16 *yres) { @@ -1807,8 +1801,6 @@ static struct omap_dss_driver taal_driver = { .run_test = taal_run_test, .memory_read = taal_memory_read, - .get_timings = taal_get_timings, - .driver = { .name = "taal", .owner = THIS_MODULE, diff --git a/drivers/video/omap2/dss/core.c b/drivers/video/omap2/dss/core.c index 5ad8cc798235..64cb8aa49b26 100644 --- a/drivers/video/omap2/dss/core.c +++ b/drivers/video/omap2/dss/core.c @@ -391,6 +391,8 @@ int omap_dss_register_driver(struct omap_dss_driver *dssdriver) if (dssdriver->get_recommended_bpp == NULL) dssdriver->get_recommended_bpp = omapdss_default_get_recommended_bpp; + if (dssdriver->get_timings == NULL) + dssdriver->get_timings = omapdss_default_get_timings; return driver_register(&dssdriver->driver); } diff --git a/drivers/video/omap2/dss/display.c b/drivers/video/omap2/dss/display.c index 4424c198dbcd..e688d10f061a 100644 --- a/drivers/video/omap2/dss/display.c +++ b/drivers/video/omap2/dss/display.c @@ -308,6 +308,13 @@ int omapdss_default_get_recommended_bpp(struct omap_dss_device *dssdev) } EXPORT_SYMBOL(omapdss_default_get_recommended_bpp); +void omapdss_default_get_timings(struct omap_dss_device *dssdev, + struct omap_video_timings *timings) +{ + *timings = dssdev->panel.timings; +} +EXPORT_SYMBOL(omapdss_default_get_timings); + /* Checks if replication logic should be used. Only use for active matrix, * when overlay is in RGB12U or RGB16 mode, and LCD interface is * 18bpp or 24bpp */ diff --git a/drivers/video/omap2/dss/venc.c b/drivers/video/omap2/dss/venc.c index abfbd4ac3e22..13a20da8ea91 100644 --- a/drivers/video/omap2/dss/venc.c +++ b/drivers/video/omap2/dss/venc.c @@ -579,12 +579,6 @@ static int venc_panel_resume(struct omap_dss_device *dssdev) return venc_panel_enable(dssdev); } -static void venc_get_timings(struct omap_dss_device *dssdev, - struct omap_video_timings *timings) -{ - *timings = dssdev->panel.timings; -} - static void venc_set_timings(struct omap_dss_device *dssdev, struct omap_video_timings *timings) { @@ -663,7 +657,6 @@ static struct omap_dss_driver venc_driver = { .get_resolution = omapdss_default_get_resolution, .get_recommended_bpp = omapdss_default_get_recommended_bpp, - .get_timings = venc_get_timings, .set_timings = venc_set_timings, .check_timings = venc_check_timings, diff --git a/include/video/omapdss.h b/include/video/omapdss.h index 7aecadbb1d9c..5f36ddd0e295 100644 --- a/include/video/omapdss.h +++ b/include/video/omapdss.h @@ -667,6 +667,8 @@ struct omap_overlay *omap_dss_get_overlay(int num); void omapdss_default_get_resolution(struct omap_dss_device *dssdev, u16 *xres, u16 *yres); int omapdss_default_get_recommended_bpp(struct omap_dss_device *dssdev); +void omapdss_default_get_timings(struct omap_dss_device *dssdev, + struct omap_video_timings *timings); typedef void (*omap_dispc_isr_t) (void *arg, u32 mask); int omap_dispc_register_isr(omap_dispc_isr_t isr, void *arg, u32 mask); -- cgit v1.2.3 From 8353e6c632aeaea1470a286b83e68ca233073068 Mon Sep 17 00:00:00 2001 From: Takashi Iwai Date: Mon, 23 Apr 2012 17:40:49 +0100 Subject: drm/edid: Add packed attribute to new gtf2 and cvt structs The new structs added in struct detailed_data_monitor_range must be marked with packed attribute although the outer struct itself is already marked as packed. 
Otherwise these 7-byte structs may be padded for alignment, giving the wrong position and size for the data. Signed-off-by: Takashi Iwai Acked-by: Adam Jackson Signed-off-by: Dave Airlie --- include/drm/drm_edid.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include') diff --git a/include/drm/drm_edid.h b/include/drm/drm_edid.h index 8cefbbee996e..0cac551c5347 100644 --- a/include/drm/drm_edid.h +++ b/include/drm/drm_edid.h @@ -99,7 +99,7 @@ struct detailed_data_monitor_range { __le16 m; u8 k; u8 j; /* need to divide by 2 */ - } gtf2; + } __attribute__((packed)) gtf2; struct { u8 version; u8 data1; /* high 6 bits: extra clock resolution */ @@ -108,7 +108,7 @@ struct detailed_data_monitor_range { u8 flags; /* preferred aspect and blanking support */ u8 supported_scalings; u8 preferred_refresh; - } cvt; + } __attribute__((packed)) cvt; } formula; } __attribute__((packed)); -- cgit v1.2.3 From 9923777dff4543050fdf938cf6b19f6d4376b7c5 Mon Sep 17 00:00:00 2001 From: Daniel Vetter Date: Sat, 14 Apr 2012 18:03:10 +0200 Subject: mm: fixup compilation error due to an asm write through a const pointer This regression was introduced in commit f56f821feb7b36223f309e0ec05986bb137ce418 Author: Daniel Vetter Date: Sun Mar 25 19:47:41 2012 +0200 mm: extend prefault helpers to fault in more than PAGE_SIZE I failed to notice this because x86 asm seems to compile things happily as-is. Reported-by: Geert Uytterhoeven Signed-off-by: Dave Airlie --- include/linux/pagemap.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'include') diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index c93a9a9bcd35..efa26b4da8d2 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -461,7 +461,7 @@ static inline int fault_in_pages_readable(const char __user *uaddr, int size) static inline int fault_in_multipages_writeable(char __user *uaddr, int size) { int ret; - const char __user *end = uaddr + size - 1; + char __user *end = uaddr + size - 1; if (unlikely(size == 0)) return 0; -- cgit v1.2.3 From 1a39b310e920bb7098067d96411b31e459ae8f32 Mon Sep 17 00:00:00 2001 From: Matthew Garrett Date: Mon, 16 Apr 2012 16:26:02 -0400 Subject: vgaarb: Add support for setting the default video device (v2) The default VGA device is a somewhat fluid concept on platforms with multiple GPUs. Add support for setting it so switching code can update things appropriately, and make sure that the sysfs code returns the right device if it's changed. v2: Updated to fix builds when __ARCH_HAS_VGA_DEFAULT_DEVICE is false. Signed-off-by: Matthew Garrett Acked-by: H. 
Peter Anvin Acked-by: benh@kernel.crashing.org Cc: airlied@redhat.com Signed-off-by: Dave Airlie --- drivers/gpu/vga/vgaarb.c | 7 +++++++ drivers/pci/pci-sysfs.c | 5 +++++ include/linux/vgaarb.h | 2 ++ 3 files changed, 14 insertions(+) (limited to 'include') diff --git a/drivers/gpu/vga/vgaarb.c b/drivers/gpu/vga/vgaarb.c index 111d956d8e7d..e223b96fa6a0 100644 --- a/drivers/gpu/vga/vgaarb.c +++ b/drivers/gpu/vga/vgaarb.c @@ -136,6 +136,11 @@ struct pci_dev *vga_default_device(void) { return vga_default; } + +void vga_set_default_device(struct pci_dev *pdev) +{ + vga_default = pdev; +} #endif static inline void vga_irq_set_state(struct vga_device *vgadev, bool state) @@ -605,10 +610,12 @@ static bool vga_arbiter_del_pci_device(struct pci_dev *pdev) goto bail; } +#ifndef __ARCH_HAS_VGA_DEFAULT_DEVICE if (vga_default == pdev) { pci_dev_put(vga_default); vga_default = NULL; } +#endif if (vgadev->decodes & (VGA_RSRC_LEGACY_IO | VGA_RSRC_LEGACY_MEM)) vga_decode_count--; diff --git a/drivers/pci/pci-sysfs.c b/drivers/pci/pci-sysfs.c index a55e248618cd..86c63fe45d11 100644 --- a/drivers/pci/pci-sysfs.c +++ b/drivers/pci/pci-sysfs.c @@ -27,6 +27,7 @@ #include #include #include +#include #include "pci.h" static int sysfs_initialized; /* = 0 */ @@ -417,6 +418,10 @@ static ssize_t boot_vga_show(struct device *dev, struct device_attribute *attr, char *buf) { struct pci_dev *pdev = to_pci_dev(dev); + struct pci_dev *vga_dev = vga_default_device(); + + if (vga_dev) + return sprintf(buf, "%u\n", (pdev == vga_dev)); return sprintf(buf, "%u\n", !!(pdev->resource[PCI_ROM_RESOURCE].flags & diff --git a/include/linux/vgaarb.h b/include/linux/vgaarb.h index 9c3120dca294..759a25ba0539 100644 --- a/include/linux/vgaarb.h +++ b/include/linux/vgaarb.h @@ -31,6 +31,7 @@ #ifndef LINUX_VGA_H #define LINUX_VGA_H +#include
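As a closing illustration (a minimal sketch, not taken from any patch above; compute_plane_size() is an invented helper, assuming the drm_format_* declarations from include/drm/drm_crtc.h), the format helpers added earlier in this series combine to size each plane of a framebuffer:

/*
 * Sketch only -- invented helper, not kernel API.  Assumes
 * #include <drm/drm_crtc.h> for the drm_format_* helpers and ignores
 * any driver-specific pitch alignment requirements.
 */
static int compute_plane_size(uint32_t format, int plane,
			      unsigned int width, unsigned int height)
{
	int cpp;

	if (plane >= drm_format_num_planes(format))
		return 0;

	cpp = drm_format_plane_cpp(format, plane);

	/* Chroma planes are subsampled; plane 0 never is. */
	if (plane > 0) {
		width /= drm_format_horz_chroma_subsampling(format);
		height /= drm_format_vert_chroma_subsampling(format);
	}

	return cpp * width * height;
}

/*
 * e.g. DRM_FORMAT_NV12 at 1920x1080: plane 0 is 1920*1080 bytes of Y,
 * plane 1 is (1920/2)*(1080/2)*2 bytes of interleaved CbCr.
 */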