From 227c3311786dbe64cb221e63d53817f98240e587 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 02:19:18 -0700 Subject: sparc64: Fix wedged irq regression. Kernel bugzilla 10273 As reported by Jos van der Ende, ever since commit 5a606b72a4309a656cd1a19ad137dc5557c4b8ea ("[SPARC64]: Do not ACK an INO if it is disabled or inprogress.") sun4u interrupts can get stuck. What this changeset did was add the following conditional to the various IRQ chip ->end() handlers on sparc64: if (unlikely(desc->status & (IRQ_DISABLED|IRQ_INPROGRESS))) return; which is correct; however, it means that special care is needed in the ->enable() method. Specifically, we must put the interrupt into IDLE state during an enable, or else it might never be sent out again. Setting the INO interrupt state to IDLE resets the state machine: the hardware retests the interrupt input to the INO, and if the device is signalling an interrupt, the INO moves back into TRANSMIT state and an interrupt vector is sent to the CPU. The two sun4v IRQ chip handlers were already doing this properly; only sun4u got it wrong. Signed-off-by: David S. Miller --- arch/sparc64/kernel/irq.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/arch/sparc64/kernel/irq.c b/arch/sparc64/kernel/irq.c index eb88bd6e674e..b441a26b73b0 100644 --- a/arch/sparc64/kernel/irq.c +++ b/arch/sparc64/kernel/irq.c @@ -1,6 +1,6 @@ /* irq.c: UltraSparc IRQ handling/init/registry. * - * Copyright (C) 1997, 2007 David S. Miller (davem@davemloft.net) + * Copyright (C) 1997, 2007, 2008 David S. Miller (davem@davemloft.net) * Copyright (C) 1998 Eddie C.
Dost (ecd@skynet.be) * Copyright (C) 1998 Jakub Jelinek (jj@ultra.linux.cz) */ @@ -308,6 +308,7 @@ static void sun4u_irq_enable(unsigned int virt_irq) IMAP_AID_SAFARI | IMAP_NID_SAFARI); val |= tid | IMAP_VALID; upa_writeq(val, imap); + upa_writeq(ICLR_IDLE, data->iclr); } } -- cgit v1.2.3 From 05d515ef3d14eb95ffe9239ec1b8a03b24fa8469 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 03:07:34 -0700 Subject: sparc64: Cleanups and corrections for arch/sparc64/Kconfig Refer to the chip as "SPARC" throughout. Say 32-bit SPARC and 64-bit SPARC rather than mentioning specific chips such as UltraSPARC, as appropriate. Remove nonsensical help text referring to things that will never appear on a SPARC system, such as EISA buses etc. Use "help" instead of "---help---". Signed-off-by: David S. Miller --- arch/sparc64/Kconfig | 82 +++++++++++----------------------------------------- 1 file changed, 17 insertions(+), 65 deletions(-) diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig index 8acc5cc38621..d3fa0f8fc613 100644 --- a/arch/sparc64/Kconfig +++ b/arch/sparc64/Kconfig @@ -1,9 +1,5 @@ -# $Id: config.in,v 1.158 2002/01/24 22:14:44 davem Exp $ -# For a description of the syntax of this configuration file, -# see the Configure script. -# - -mainmenu "Linux/UltraSPARC Kernel Configuration" +# sparc64 configuration +mainmenu "Linux Kernel Configuration for 64-bit SPARC" config SPARC bool @@ -17,12 +13,6 @@ config SPARC64 default y select HAVE_IDE select HAVE_LMB - help - SPARC is a family of RISC microprocessors designed and marketed by - Sun Microsystems, incorporated. This port covers the newer 64-bit - UltraSPARC. The UltraLinux project maintains both the SPARC32 and - SPARC64 ports; its web page is available at - . config GENERIC_TIME bool @@ -97,7 +87,7 @@ config SPARC64_PAGE_SIZE_8KB help This lets you select the page size of the kernel.
- 8KB and 64KB work quite well, since Sparc ELF sections + 8KB and 64KB work quite well, since SPARC ELF sections provide for up to 64KB alignment. Therefore, 512KB and 4MB are for expert hackers only. @@ -138,7 +128,7 @@ config HOTPLUG_CPU bool "Support for hot-pluggable CPUs" depends on SMP select HOTPLUG - ---help--- + help Say Y here to experiment with turning CPUs off and on. CPUs can be controlled through /sys/devices/system/cpu/cpu#. Say N if you want to disable CPU hotplug. @@ -155,23 +145,16 @@ source "kernel/time/Kconfig" config SMP bool "Symmetric multi-processing support" - ---help--- + help This enables support for systems with more than one CPU. If you have a system with only one CPU, say N. If you have a system with more than one CPU, say Y. If you say N here, the kernel will run on single and multiprocessor machines, but will use only one CPU of a multiprocessor machine. If - you say Y here, the kernel will run on many, but not all, - singleprocessor machines. On a singleprocessor machine, the kernel - will run faster if you say N here. - - People using multiprocessor machines who say Y here should also say - Y to "Enhanced Real Time Clock Support", below. The "Advanced Power - Management" code will be disabled if you say Y here. - - See also and the SMP-HOWTO - available at . + you say Y here, the kernel will run on single-processor machines. + On a single-processor machine, the kernel will run faster if you say + N here. If you don't know what to do here, say N. @@ -284,50 +267,19 @@ source "mm/Kconfig" config ISA bool - help - Find out whether you have ISA slots on your motherboard. ISA is the - name of a bus system, i.e. the way the CPU talks to the other stuff - inside your box. Other bus systems are PCI, EISA, MicroChannel - (MCA) or VESA. ISA is an older system, now being displaced by PCI; - newer boards don't support it. If you have ISA, say Y, otherwise N. 
config ISAPNP bool - help - Say Y here if you would like support for ISA Plug and Play devices. - Some information is in . - - To compile this driver as a module, choose M here: the - module will be called isapnp. - - If unsure, say Y. config EISA bool - ---help--- - The Extended Industry Standard Architecture (EISA) bus was - developed as an open alternative to the IBM MicroChannel bus. - - The EISA bus provided some of the features of the IBM MicroChannel - bus while maintaining backward compatibility with cards made for - the older ISA bus. The EISA bus saw limited use between 1988 and - 1995 when it was made obsolete by the PCI bus. - - Say Y here if you are building a kernel for an EISA-based machine. - - Otherwise, say N. config MCA bool - help - MicroChannel Architecture is found in some IBM PS/2 machines and - laptops. It is a bus system similar to PCI or ISA. See - (and especially the web page given - there) before attempting to build an MCA bus kernel. config PCMCIA tristate - ---help--- + help Say Y here if you want to attach PCMCIA- or PC-cards to your Linux computer. These are credit-card size devices such as network cards, modems or hard drives often used with laptops computers. There are @@ -369,10 +321,10 @@ config PCI bool "PCI support" select ARCH_SUPPORTS_MSI help - Find out whether you have a PCI motherboard. PCI is the name of a - bus system, i.e. the way the CPU talks to the other stuff inside - your box. Other bus systems are ISA, EISA, MicroChannel (MCA) or - VESA. If you have PCI, say Y, otherwise N. + Find out whether your system includes a PCI bus. PCI is the name of + a bus system, i.e. the way the CPU talks to the other stuff inside + your box. If you say Y here, the kernel will include drivers and + infrastructure code to support PCI bus devices. 
config PCI_DOMAINS def_bool PCI @@ -397,9 +349,9 @@ menu "Executable file formats" source "fs/Kconfig.binfmt" config SPARC32_COMPAT - bool "Kernel support for Linux/Sparc 32bit binary compatibility" + bool "Kernel support for 32-bit SPARC binary compatibility" help - This allows you to run 32-bit binaries on your Ultra. + This allows you to run 32-bit binaries on your 64-bit SPARC system. Everybody wants this; say Y. config COMPAT @@ -421,8 +373,8 @@ config SCHED_SMT default y help SMT scheduler support improves the CPU scheduler's decision making - when dealing with UltraSPARC cpus at a cost of slightly increased - overhead in some places. If unsure say N here. + when dealing with SPARC cpus at a cost of slightly increased overhead + in some places. If unsure say N here. config SCHED_MC bool "Multi-core scheduler support" -- cgit v1.2.3 From 09337f501ebdd224cd69df6d168a5c4fe75d86fa Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 03:17:12 -0700 Subject: sparc64: Kill CONFIG_SPARC32_COMPAT It's completely superfluous; CONFIG_COMPAT is sufficient. This option used to be an umbrella for enabling code shared by all 32-bit compat binary support types. But with the removal of SunOS and Solaris support, the only one left is Linux 32-bit ELF. Update defconfig. Signed-off-by: David S. Miller --- arch/sparc64/Kconfig | 7 ------- arch/sparc64/defconfig | 23 +++++++++-------------- arch/sparc64/kernel/Makefile | 4 ++-- arch/sparc64/kernel/audit.c | 6 +++--- arch/sparc64/kernel/signal.c | 4 ++-- init/Kconfig | 2 +- 6 files changed, 17 insertions(+), 29 deletions(-) diff --git a/arch/sparc64/Kconfig b/arch/sparc64/Kconfig index d3fa0f8fc613..edbe71e3fab9 100644 --- a/arch/sparc64/Kconfig +++ b/arch/sparc64/Kconfig @@ -348,15 +348,8 @@ menu "Executable file formats" source "fs/Kconfig.binfmt" -config SPARC32_COMPAT - bool "Kernel support for 32-bit SPARC binary compatibility" - help - This allows you to run 32-bit binaries on your 64-bit SPARC system.
- Everybody wants this; say Y. - config COMPAT bool - depends on SPARC32_COMPAT default y select COMPAT_BINFMT_ELF diff --git a/arch/sparc64/defconfig b/arch/sparc64/defconfig index 92f79680f70d..aff93c9d13f4 100644 --- a/arch/sparc64/defconfig +++ b/arch/sparc64/defconfig @@ -1,7 +1,7 @@ # # Automatically generated make config: don't edit -# Linux kernel version: 2.6.25-numa -# Wed Apr 23 04:49:08 2008 +# Linux kernel version: 2.6.25 +# Sat Apr 26 03:11:06 2008 # CONFIG_SPARC=y CONFIG_SPARC64=y @@ -152,7 +152,9 @@ CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_HUGETLB_PAGE_SIZE_4MB=y # CONFIG_HUGETLB_PAGE_SIZE_512K is not set # CONFIG_HUGETLB_PAGE_SIZE_64K is not set -# CONFIG_NUMA is not set +CONFIG_NUMA=y +CONFIG_NODES_SHIFT=4 +CONFIG_NODES_SPAN_OTHER_NODES=y CONFIG_ARCH_POPULATES_NODE_MAP=y CONFIG_ARCH_SELECT_MEMORY_MODEL=y CONFIG_ARCH_SPARSEMEM_ENABLE=y @@ -162,12 +164,14 @@ CONFIG_SELECT_MEMORY_MODEL=y # CONFIG_DISCONTIGMEM_MANUAL is not set CONFIG_SPARSEMEM_MANUAL=y CONFIG_SPARSEMEM=y +CONFIG_NEED_MULTIPLE_NODES=y CONFIG_HAVE_MEMORY_PRESENT=y # CONFIG_SPARSEMEM_STATIC is not set CONFIG_SPARSEMEM_EXTREME=y CONFIG_SPARSEMEM_VMEMMAP_ENABLE=y CONFIG_SPARSEMEM_VMEMMAP=y CONFIG_SPLIT_PTLOCK_CPUS=4 +CONFIG_MIGRATION=y CONFIG_RESOURCES_64BIT=y CONFIG_ZONE_DMA_FLAG=0 CONFIG_NR_QUICK=1 @@ -191,7 +195,6 @@ CONFIG_SUN_OPENPROMFS=m CONFIG_BINFMT_ELF=y CONFIG_COMPAT_BINFMT_ELF=y CONFIG_BINFMT_MISC=m -CONFIG_SPARC32_COMPAT=y CONFIG_COMPAT=y CONFIG_SYSVIPC_COMPAT=y CONFIG_SCHED_SMT=y @@ -746,13 +749,7 @@ CONFIG_DEVPORT=y CONFIG_I2C=y CONFIG_I2C_BOARDINFO=y # CONFIG_I2C_CHARDEV is not set - -# -# I2C Algorithms -# CONFIG_I2C_ALGOBIT=y -# CONFIG_I2C_ALGOPCF is not set -# CONFIG_I2C_ALGOPCA is not set # # I2C Hardware Bus support @@ -780,6 +777,7 @@ CONFIG_I2C_ALGOBIT=y # CONFIG_I2C_VIA is not set # CONFIG_I2C_VIAPRO is not set # CONFIG_I2C_VOODOO3 is not set +# CONFIG_I2C_PCA_PLATFORM is not set # # Miscellaneous I2C Chip support @@ -1026,6 +1024,7 @@ CONFIG_SND_ALI5451=m # 
CONFIG_SND_AU8810 is not set # CONFIG_SND_AU8820 is not set # CONFIG_SND_AU8830 is not set +# CONFIG_SND_AW2 is not set # CONFIG_SND_AZT3328 is not set # CONFIG_SND_BT87X is not set # CONFIG_SND_CA0106 is not set @@ -1096,10 +1095,6 @@ CONFIG_SND_SUN_CS4231=m # # CONFIG_SND_SOC is not set -# -# SoC Audio support for SuperH -# - # # ALSA SoC audio for Freescale SOCs # diff --git a/arch/sparc64/kernel/Makefile b/arch/sparc64/kernel/Makefile index 63c6ae0dd273..029558222c8f 100644 --- a/arch/sparc64/kernel/Makefile +++ b/arch/sparc64/kernel/Makefile @@ -20,12 +20,12 @@ obj-$(CONFIG_PCI) += ebus.o isa.o pci_common.o \ pci_sun4v.o pci_sun4v_asm.o pci_fire.o obj-$(CONFIG_PCI_MSI) += pci_msi.o obj-$(CONFIG_SMP) += smp.o trampoline.o hvtramp.o -obj-$(CONFIG_SPARC32_COMPAT) += sys32.o sys_sparc32.o signal32.o +obj-$(CONFIG_COMPAT) += sys32.o sys_sparc32.o signal32.o obj-$(CONFIG_MODULES) += module.o obj-$(CONFIG_US3_FREQ) += us3_cpufreq.o obj-$(CONFIG_US2E_FREQ) += us2e_cpufreq.o obj-$(CONFIG_KPROBES) += kprobes.o obj-$(CONFIG_SUN_LDOMS) += ldc.o vio.o viohs.o ds.o obj-$(CONFIG_AUDIT) += audit.o -obj-$(CONFIG_AUDIT)$(CONFIG_SPARC32_COMPAT) += compat_audit.o +obj-$(CONFIG_AUDIT)$(CONFIG_COMPAT) += compat_audit.o obj-y += $(obj-yy) diff --git a/arch/sparc64/kernel/audit.c b/arch/sparc64/kernel/audit.c index 24d7f4b4178a..8fff0ac63d56 100644 --- a/arch/sparc64/kernel/audit.c +++ b/arch/sparc64/kernel/audit.c @@ -30,7 +30,7 @@ static unsigned signal_class[] = { int audit_classify_arch(int arch) { -#ifdef CONFIG_SPARC32_COMPAT +#ifdef CONFIG_COMPAT if (arch == AUDIT_ARCH_SPARC) return 1; #endif @@ -39,7 +39,7 @@ int audit_classify_arch(int arch) int audit_classify_syscall(int abi, unsigned syscall) { -#ifdef CONFIG_SPARC32_COMPAT +#ifdef CONFIG_COMPAT extern int sparc32_classify_syscall(unsigned); if (abi == AUDIT_ARCH_SPARC) return sparc32_classify_syscall(syscall); @@ -60,7 +60,7 @@ int audit_classify_syscall(int abi, unsigned syscall) static int __init 
audit_classes_init(void) { -#ifdef CONFIG_SPARC32_COMPAT +#ifdef CONFIG_COMPAT extern __u32 sparc32_dir_class[]; extern __u32 sparc32_write_class[]; extern __u32 sparc32_read_class[]; diff --git a/arch/sparc64/kernel/signal.c b/arch/sparc64/kernel/signal.c index 77a3e8592cbc..6afa5ef536eb 100644 --- a/arch/sparc64/kernel/signal.c +++ b/arch/sparc64/kernel/signal.c @@ -8,7 +8,7 @@ * Copyright (C) 1997,1998 Jakub Jelinek (jj@sunsite.mff.cuni.cz) */ -#ifdef CONFIG_SPARC32_COMPAT +#ifdef CONFIG_COMPAT #include /* for compat_old_sigset_t */ #endif #include @@ -531,7 +531,7 @@ static void do_signal(struct pt_regs *regs, unsigned long orig_i0) else oldset = &current->blocked; -#ifdef CONFIG_SPARC32_COMPAT +#ifdef CONFIG_COMPAT if (test_thread_flag(TIF_32BIT)) { extern void do_signal32(sigset_t *, struct pt_regs *, struct signal_deliver_cookie *); diff --git a/init/Kconfig b/init/Kconfig index ba3a389fab94..f1f22db74d5a 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -521,7 +521,7 @@ menuconfig EMBEDDED config UID16 bool "Enable 16-bit UID system calls" if EMBEDDED - depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && SPARC32_COMPAT) || UML || (X86_64 && IA32_EMULATION) + depends on ARM || BLACKFIN || CRIS || FRV || H8300 || X86_32 || M68K || (S390 && !64BIT) || SUPERH || SPARC32 || (SPARC64 && COMPAT) || UML || (X86_64 && IA32_EMULATION) default y help This enables the legacy 16-bit UID syscall wrappers. -- cgit v1.2.3 From 0eb78f0b1a0f61b292380028b0debd5af7b3838a Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 03:35:02 -0700 Subject: sparc64: Kill ISA_FLOPPY_WORKS code. This was never enabled, and I could never get it working. If anyone wants to try to get it working, it's very easy to reference this code in the history. It's the only thing referencing the silly ISA device layer in the sparc64 tree. OF device layer infrastructure is what should be used for these things.
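For illustration only: a simplified userspace model of how an OF-style match table selects a driver for a device node. The names model_node, model_match and model_match_node are hypothetical, and this is not the kernel's of_match_device code (the real helper also matches on device type, which this sketch omits). Each non-empty field of an entry must match the node, and the table ends at an all-empty sentinel. The "beep" name and the "SUNW,bbc-beep" / "SUNW,smbus-beep" compatible strings are the ones the sparcspkr rewrite later in this series puts in its match tables.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified userspace model; NOT the kernel's OF API. */
#define MODEL_MAX_COMPAT 4

struct model_node {
	const char *name;
	const char *compatible[MODEL_MAX_COMPAT]; /* unused slots are NULL */
};

/* One match-table entry; an empty string means "don't care". */
struct model_match {
	const char *name;
	const char *compatible;
};

/* Return 1 if any string in the node's compatible list equals compat. */
static int model_is_compatible(const struct model_node *np, const char *compat)
{
	size_t i;

	for (i = 0; i < MODEL_MAX_COMPAT && np->compatible[i]; i++)
		if (strcmp(np->compatible[i], compat) == 0)
			return 1;
	return 0;
}

/* Scan the table up to the all-empty sentinel and return the first
 * entry whose non-empty fields all match the node, or NULL. */
static const struct model_match *
model_match_node(const struct model_match *m, const struct model_node *np)
{
	for (; m->name[0] || m->compatible[0]; m++) {
		int ok = 1;

		if (m->name[0])
			ok &= np->name != NULL && strcmp(m->name, np->name) == 0;
		if (m->compatible[0])
			ok &= model_is_compatible(np, m->compatible);
		if (ok)
			return m;
	}
	return NULL;
}

static const struct model_match bbc_table[] = {
	{ "beep", "SUNW,bbc-beep" },
	{ "", "" },	/* sentinel */
};

static const struct model_match grover_table[] = {
	{ "beep", "SUNW,smbus-beep" },
	{ "", "" },	/* sentinel */
};

/* Both nodes are named "beep"; only "compatible" tells them apart. */
static void model_selftest(void)
{
	struct model_node bbc = { "beep", { "SUNW,bbc-beep", NULL } };
	struct model_node grover = { "beep", { "SUNW,smbus-beep", NULL } };

	assert(model_match_node(bbc_table, &bbc) == &bbc_table[0]);
	assert(model_match_node(bbc_table, &grover) == NULL);
	assert(model_match_node(grover_table, &grover) == &grover_table[0]);
	assert(model_match_node(grover_table, &bbc) == NULL);
}
```

Because selection depends on node properties rather than on which bus layer enumerated the device, two drivers can sit on the same generic bus type and still bind to different "beep" nodes, which is the approach the sparcspkr rewrite takes when it registers both of its drivers on of_platform_bus_type.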
Signed-off-by: David S. Miller --- include/asm-sparc64/floppy.h | 83 +------------------------------------------- 1 file changed, 1 insertion(+), 82 deletions(-) diff --git a/include/asm-sparc64/floppy.h b/include/asm-sparc64/floppy.h index c47f58d6c15c..040d7962c5a3 100644 --- a/include/asm-sparc64/floppy.h +++ b/include/asm-sparc64/floppy.h @@ -558,82 +558,6 @@ static int __init ebus_fdthree_p(struct linux_ebus_device *edev) } #endif -#ifdef CONFIG_PCI -#undef ISA_FLOPPY_WORKS - -#ifdef ISA_FLOPPY_WORKS -static unsigned long __init isa_floppy_init(void) -{ - struct sparc_isa_bridge *isa_br; - struct sparc_isa_device *isa_dev = NULL; - - for_each_isa(isa_br) { - for_each_isadev(isa_dev, isa_br) { - if (!strcmp(isa_dev->prom_node->name, "dma")) { - struct sparc_isa_device *child = - isa_dev->child; - - while (child) { - if (!strcmp(child->prom_node->name, - "floppy")) { - isa_dev = child; - goto isa_done; - } - child = child->next; - } - } - } - } -isa_done: - if (!isa_dev) - return 0; - - /* We could use DMA on devices behind the ISA bridge, but... - * - * There is a slight problem. Normally on x86 kit the x86 processor - * delays I/O port instructions when the ISA bus "dma in progress" - * signal is active. Well, sparc64 systems do not monitor this - * signal thus we would need to block all I/O port accesses in software - * when a dma transfer is active for some device. 
- */ - - sun_fdc = (struct sun_flpy_controller *)isa_dev->resource.start; - FLOPPY_IRQ = isa_dev->irq; - - sun_fdops.fd_inb = sun_pci_fd_inb; - sun_fdops.fd_outb = sun_pci_fd_outb; - - can_use_virtual_dma = use_virtual_dma = 1; - sun_fdops.fd_enable_dma = sun_fd_enable_dma; - sun_fdops.fd_disable_dma = sun_fd_disable_dma; - sun_fdops.fd_set_dma_mode = sun_fd_set_dma_mode; - sun_fdops.fd_set_dma_addr = sun_fd_set_dma_addr; - sun_fdops.fd_set_dma_count = sun_fd_set_dma_count; - sun_fdops.get_dma_residue = sun_get_dma_residue; - - sun_fdops.fd_request_irq = sun_fd_request_irq; - sun_fdops.fd_free_irq = sun_fd_free_irq; - - /* Floppy eject is manual. Actually, could determine this - * via presence of 'manual' property in OBP node. - */ - sun_fdops.fd_eject = sun_pci_fd_eject; - - fdc_status = (unsigned long) &sun_fdc->status_82077; - - allowed_drive_mask = 0; - sun_floppy_types[0] = 0; - sun_floppy_types[1] = 4; - - sun_pci_broken_drive = 1; - sun_fdops.fd_outb = sun_pci_fd_broken_outb; - - return sun_floppy_types[0]; -} -#endif /* ISA_FLOPPY_WORKS */ - -#endif - static unsigned long __init sun_floppy_init(void) { char state[128]; @@ -667,13 +591,8 @@ static unsigned long __init sun_floppy_init(void) } } ebus_done: - if (!edev) { -#ifdef ISA_FLOPPY_WORKS - return isa_floppy_init(); -#else + if (!edev) return 0; -#endif - } state_prop = of_get_property(edev->prom_node, "status", NULL); if (state_prop && !strncmp(state_prop, "disabled", 8)) -- cgit v1.2.3 From dc8ca2a111c10f031346f6f8f82640d6bc0dd347 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 20:59:52 -0700 Subject: sparc64: Do not ignore 'pmu' device ranges. I must have disabled this due to other bugs which were fixed over time. And this is needed in order for child devices of "pmu" to get proper resource values. Signed-off-by: David S. 
Miller --- arch/sparc64/kernel/of_device.c | 6 ------ 1 file changed, 6 deletions(-) diff --git a/arch/sparc64/kernel/of_device.c b/arch/sparc64/kernel/of_device.c index 9e58e8cba1c3..d569f60c24b8 100644 --- a/arch/sparc64/kernel/of_device.c +++ b/arch/sparc64/kernel/of_device.c @@ -412,12 +412,6 @@ static int __init build_one_resource(struct device_node *parent, static int __init use_1to1_mapping(struct device_node *pp) { - /* If this is on the PMU bus, don't try to translate it even - * if a ranges property exists. - */ - if (!strcmp(pp->name, "pmu")) - return 1; - /* If we have a ranges property in the parent, use it. */ if (of_find_property(pp, "ranges", NULL) != NULL) return 0; -- cgit v1.2.3 From 9c1a5077fdca99356c891af37931e537dea874f5 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 21:02:21 -0700 Subject: input: Rewrite sparcspkr device probing. Remove all dependencies on the EBUS and ISA bus layers, which we'd like to remove as they are superfluous. While we're here, add support for proper frequency changing on BBC beep devices. Contrary to the comments that were here, this device does in fact support a programmable frequency. Signed-off-by: David S. Miller --- drivers/input/misc/sparcspkr.c | 262 +++++++++++++++++++++++++++++------------ 1 file changed, 184 insertions(+), 78 deletions(-) diff --git a/drivers/input/misc/sparcspkr.c b/drivers/input/misc/sparcspkr.c index fed3c375ccf3..d8765cc93d27 100644 --- a/drivers/input/misc/sparcspkr.c +++ b/drivers/input/misc/sparcspkr.c @@ -2,33 +2,69 @@ * Driver for PC-speaker like devices found on various Sparc systems. * * Copyright (c) 2002 Vojtech Pavlik - * Copyright (c) 2002, 2006 David S. Miller (davem@davemloft.net) + * Copyright (c) 2002, 2006, 2008 David S. Miller (davem@davemloft.net) */ #include #include #include #include -#include +#include #include -#include -#include MODULE_AUTHOR("David S.
Miller "); MODULE_DESCRIPTION("Sparc Speaker beeper driver"); MODULE_LICENSE("GPL"); +struct grover_beep_info { + void __iomem *freq_regs; + void __iomem *enable_reg; +}; + +struct bbc_beep_info { + u32 clock_freq; + void __iomem *regs; +}; + struct sparcspkr_state { const char *name; - unsigned long iobase; int (*event)(struct input_dev *dev, unsigned int type, unsigned int code, int value); spinlock_t lock; struct input_dev *input_dev; + union { + struct grover_beep_info grover; + struct bbc_beep_info bbc; + } u; }; -static int ebus_spkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) +static u32 bbc_count_to_reg(struct bbc_beep_info *info, unsigned int count) +{ + u32 val, clock_freq = info->clock_freq; + int i; + + if (!count) + return 0; + + if (count <= clock_freq >> 20) + return 1 << 18; + + if (count >= clock_freq >> 12) + return 1 << 10; + + val = 1 << 18; + for (i = 19; i >= 11; i--) { + val >>= 1; + if (count <= clock_freq >> i) + break; + } + + return val; +} + +static int bbc_spkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) { struct sparcspkr_state *state = dev_get_drvdata(dev->dev.parent); + struct bbc_beep_info *info = &state->u.bbc; unsigned int count = 0; unsigned long flags; @@ -44,24 +80,29 @@ static int ebus_spkr_event(struct input_dev *dev, unsigned int type, unsigned in if (value > 20 && value < 32767) count = 1193182 / value; + count = bbc_count_to_reg(info, count); + spin_lock_irqsave(&state->lock, flags); - /* EBUS speaker only has on/off state, the frequency does not - * appear to be programmable. 
- */ - if (state->iobase & 0x2UL) - outb(!!count, state->iobase); - else - outl(!!count, state->iobase); + if (count) { + outb(0x01, info->regs + 0); + outb(0x00, info->regs + 2); + outb((count >> 16) & 0xff, info->regs + 3); + outb((count >> 8) & 0xff, info->regs + 4); + outb(0x00, info->regs + 5); + } else { + outb(0x00, info->regs + 0); + } spin_unlock_irqrestore(&state->lock, flags); return 0; } -static int isa_spkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) +static int grover_spkr_event(struct input_dev *dev, unsigned int type, unsigned int code, int value) { struct sparcspkr_state *state = dev_get_drvdata(dev->dev.parent); + struct grover_beep_info *info = &state->u.grover; unsigned int count = 0; unsigned long flags; @@ -81,15 +122,15 @@ static int isa_spkr_event(struct input_dev *dev, unsigned int type, unsigned int if (count) { /* enable counter 2 */ - outb(inb(state->iobase + 0x61) | 3, state->iobase + 0x61); + outb(inb(info->enable_reg) | 3, info->enable_reg); /* set command for counter 2, 2 byte write */ - outb(0xB6, state->iobase + 0x43); + outb(0xB6, info->freq_regs + 1); /* select desired HZ */ - outb(count & 0xff, state->iobase + 0x42); - outb((count >> 8) & 0xff, state->iobase + 0x42); + outb(count & 0xff, info->freq_regs + 0); + outb((count >> 8) & 0xff, info->freq_regs + 0); } else { /* disable counter 2 */ - outb(inb_p(state->iobase + 0x61) & 0xFC, state->iobase + 0x61); + outb(inb_p(info->enable_reg) & 0xFC, info->enable_reg); } spin_unlock_irqrestore(&state->lock, flags); @@ -131,7 +172,7 @@ static int __devinit sparcspkr_probe(struct device *dev) return 0; } -static int __devexit sparcspkr_remove(struct of_device *dev) +static int sparcspkr_shutdown(struct of_device *dev) { struct sparcspkr_state *state = dev_get_drvdata(&dev->dev); struct input_dev *input_dev = state->input_dev; @@ -139,115 +180,180 @@ static int __devexit sparcspkr_remove(struct of_device *dev) /* turn off the speaker */ 
state->event(input_dev, EV_SND, SND_BELL, 0); - input_unregister_device(input_dev); - - dev_set_drvdata(&dev->dev, NULL); - kfree(state); - return 0; } -static int sparcspkr_shutdown(struct of_device *dev) +static int __devinit bbc_beep_probe(struct of_device *op, const struct of_device_id *match) { - struct sparcspkr_state *state = dev_get_drvdata(&dev->dev); - struct input_dev *input_dev = state->input_dev; + struct sparcspkr_state *state; + struct bbc_beep_info *info; + struct device_node *dp; + int err = -ENOMEM; - /* turn off the speaker */ - state->event(input_dev, EV_SND, SND_BELL, 0); + state = kzalloc(sizeof(*state), GFP_KERNEL); + if (!state) + goto out_err; + + state->name = "Sparc BBC Speaker"; + state->event = bbc_spkr_event; + spin_lock_init(&state->lock); + + dp = of_find_node_by_path("/"); + err = -ENODEV; + if (!dp) + goto out_free; + + info = &state->u.bbc; + info->clock_freq = of_getintprop_default(dp, "clock-frequency", 0); + if (!info->clock_freq) + goto out_free; + + info->regs = of_ioremap(&op->resource[0], 0, 6, "bbc beep"); + if (!info->regs) + goto out_free; + + dev_set_drvdata(&op->dev, state); + + err = sparcspkr_probe(&op->dev); + if (err) + goto out_clear_drvdata; return 0; + +out_clear_drvdata: + dev_set_drvdata(&op->dev, NULL); + of_iounmap(&op->resource[0], info->regs, 6); + +out_free: + kfree(state); +out_err: + return err; } -static int __devinit ebus_beep_probe(struct of_device *dev, const struct of_device_id *match) +static int bbc_remove(struct of_device *op) { - struct linux_ebus_device *edev = to_ebus_device(&dev->dev); - struct sparcspkr_state *state; - int err; + struct sparcspkr_state *state = dev_get_drvdata(&op->dev); + struct input_dev *input_dev = state->input_dev; + struct bbc_beep_info *info = &state->u.bbc; - state = kzalloc(sizeof(*state), GFP_KERNEL); - if (!state) - return -ENOMEM; + /* turn off the speaker */ + state->event(input_dev, EV_SND, SND_BELL, 0); - state->name = "Sparc EBUS Speaker"; - state->iobase = 
edev->resource[0].start; - state->event = ebus_spkr_event; - spin_lock_init(&state->lock); + input_unregister_device(input_dev); - dev_set_drvdata(&dev->dev, state); + of_iounmap(&op->resource[0], info->regs, 6); - err = sparcspkr_probe(&dev->dev); - if (err) { - dev_set_drvdata(&dev->dev, NULL); - kfree(state); - } + dev_set_drvdata(&op->dev, NULL); + kfree(state); return 0; } -static struct of_device_id ebus_beep_match[] = { +static struct of_device_id bbc_beep_match[] = { { .name = "beep", + .compatible = "SUNW,bbc-beep", }, {}, }; -static struct of_platform_driver ebus_beep_driver = { - .name = "beep", - .match_table = ebus_beep_match, - .probe = ebus_beep_probe, - .remove = __devexit_p(sparcspkr_remove), +static struct of_platform_driver bbc_beep_driver = { + .name = "bbcbeep", + .match_table = bbc_beep_match, + .probe = bbc_beep_probe, + .remove = __devexit_p(bbc_remove), .shutdown = sparcspkr_shutdown, }; -static int __devinit isa_beep_probe(struct of_device *dev, const struct of_device_id *match) +static int __devinit grover_beep_probe(struct of_device *op, const struct of_device_id *match) { - struct sparc_isa_device *idev = to_isa_device(&dev->dev); struct sparcspkr_state *state; - int err; + struct grover_beep_info *info; + int err = -ENOMEM; state = kzalloc(sizeof(*state), GFP_KERNEL); if (!state) - return -ENOMEM; + goto out_err; - state->name = "Sparc ISA Speaker"; - state->iobase = idev->resource.start; - state->event = isa_spkr_event; + state->name = "Sparc Grover Speaker"; + state->event = grover_spkr_event; spin_lock_init(&state->lock); - dev_set_drvdata(&dev->dev, state); + info = &state->u.grover; + info->freq_regs = of_ioremap(&op->resource[2], 0, 2, "grover beep freq"); + if (!info->freq_regs) + goto out_free; - err = sparcspkr_probe(&dev->dev); - if (err) { - dev_set_drvdata(&dev->dev, NULL); - kfree(state); - } + info->enable_reg = of_ioremap(&op->resource[3], 0, 1, "grover beep enable"); + if (!info->enable_reg) + goto out_unmap_freq_regs; 
+ + dev_set_drvdata(&op->dev, state); + + err = sparcspkr_probe(&op->dev); + if (err) + goto out_clear_drvdata; + + return 0; + +out_clear_drvdata: + dev_set_drvdata(&op->dev, NULL); + of_iounmap(&op->resource[3], info->enable_reg, 1); + +out_unmap_freq_regs: + of_iounmap(&op->resource[2], info->freq_regs, 2); +out_free: + kfree(state); +out_err: + return err; +} + +static int grover_remove(struct of_device *op) +{ + struct sparcspkr_state *state = dev_get_drvdata(&op->dev); + struct grover_beep_info *info = &state->u.grover; + struct input_dev *input_dev = state->input_dev; + + /* turn off the speaker */ + state->event(input_dev, EV_SND, SND_BELL, 0); + + input_unregister_device(input_dev); + + of_iounmap(&op->resource[3], info->enable_reg, 1); + of_iounmap(&op->resource[2], info->freq_regs, 2); + + dev_set_drvdata(&op->dev, NULL); + kfree(state); return 0; } -static struct of_device_id isa_beep_match[] = { +static struct of_device_id grover_beep_match[] = { { - .name = "dma", + .name = "beep", + .compatible = "SUNW,smbus-beep", }, {}, }; -static struct of_platform_driver isa_beep_driver = { - .name = "beep", - .match_table = isa_beep_match, - .probe = isa_beep_probe, - .remove = __devexit_p(sparcspkr_remove), +static struct of_platform_driver grover_beep_driver = { + .name = "groverbeep", + .match_table = grover_beep_match, + .probe = grover_beep_probe, + .remove = __devexit_p(grover_remove), .shutdown = sparcspkr_shutdown, }; static int __init sparcspkr_init(void) { - int err = of_register_driver(&ebus_beep_driver, &ebus_bus_type); + int err = of_register_driver(&bbc_beep_driver, + &of_platform_bus_type); if (!err) { - err = of_register_driver(&isa_beep_driver, &isa_bus_type); + err = of_register_driver(&grover_beep_driver, + &of_platform_bus_type); if (err) - of_unregister_driver(&ebus_beep_driver); + of_unregister_driver(&bbc_beep_driver); } return err; @@ -255,8 +361,8 @@ static int __init sparcspkr_init(void) static void __exit sparcspkr_exit(void) { - 
of_unregister_driver(&ebus_beep_driver); - of_unregister_driver(&isa_beep_driver); + of_unregister_driver(&bbc_beep_driver); + of_unregister_driver(&grover_beep_driver); } module_init(sparcspkr_init); -- cgit v1.2.3 From 5da496e4b90626784a82803682e186a8e67222a0 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 21:07:35 -0700 Subject: sparc64: Kill unused local ISA bus layer. No more drivers use this, and therefore it can die. Signed-off-by: David S. Miller --- arch/sparc64/kernel/Makefile | 2 +- arch/sparc64/kernel/isa.c | 191 ------------------------------------ arch/sparc64/kernel/pci.c | 2 - arch/sparc64/kernel/sparc64_ksyms.c | 2 - include/asm-sparc64/floppy.h | 1 - include/asm-sparc64/isa.h | 47 --------- 6 files changed, 1 insertion(+), 244 deletions(-) delete mode 100644 arch/sparc64/kernel/isa.c delete mode 100644 include/asm-sparc64/isa.h diff --git a/arch/sparc64/kernel/Makefile b/arch/sparc64/kernel/Makefile index 029558222c8f..2bd0340b743d 100644 --- a/arch/sparc64/kernel/Makefile +++ b/arch/sparc64/kernel/Makefile @@ -15,7 +15,7 @@ obj-y := process.o setup.o cpu.o idprom.o \ visemul.o prom.o of_device.o hvapi.o sstate.o mdesc.o obj-$(CONFIG_STACKTRACE) += stacktrace.o -obj-$(CONFIG_PCI) += ebus.o isa.o pci_common.o \ +obj-$(CONFIG_PCI) += ebus.o pci_common.o \ pci_psycho.o pci_sabre.o pci_schizo.o \ pci_sun4v.o pci_sun4v_asm.o pci_fire.o obj-$(CONFIG_PCI_MSI) += pci_msi.o diff --git a/arch/sparc64/kernel/isa.c b/arch/sparc64/kernel/isa.c deleted file mode 100644 index a2af5ed784c9..000000000000 --- a/arch/sparc64/kernel/isa.c +++ /dev/null @@ -1,191 +0,0 @@ -#include -#include -#include -#include -#include -#include -#include -#include - -struct sparc_isa_bridge *isa_chain; - -static void __init fatal_err(const char *reason) -{ - prom_printf("ISA: fatal error, %s.\n", reason); -} - -static void __init report_dev(struct sparc_isa_device *isa_dev, int child) -{ - if (child) - printk(" (%s)", isa_dev->prom_node->name); - else - 
printk(" [%s", isa_dev->prom_node->name); -} - -static void __init isa_dev_get_resource(struct sparc_isa_device *isa_dev) -{ - struct of_device *op = of_find_device_by_node(isa_dev->prom_node); - - memcpy(&isa_dev->resource, &op->resource[0], sizeof(struct resource)); -} - -static void __init isa_dev_get_irq(struct sparc_isa_device *isa_dev) -{ - struct of_device *op = of_find_device_by_node(isa_dev->prom_node); - - if (!op || !op->num_irqs) { - isa_dev->irq = PCI_IRQ_NONE; - } else { - isa_dev->irq = op->irqs[0]; - } -} - -static void __init isa_fill_children(struct sparc_isa_device *parent_isa_dev) -{ - struct device_node *dp = parent_isa_dev->prom_node->child; - - if (!dp) - return; - - printk(" ->"); - while (dp) { - struct sparc_isa_device *isa_dev; - - isa_dev = kzalloc(sizeof(*isa_dev), GFP_KERNEL); - if (!isa_dev) { - fatal_err("cannot allocate child isa_dev"); - prom_halt(); - } - - /* Link it in to parent. */ - isa_dev->next = parent_isa_dev->child; - parent_isa_dev->child = isa_dev; - - isa_dev->bus = parent_isa_dev->bus; - isa_dev->prom_node = dp; - - isa_dev_get_resource(isa_dev); - isa_dev_get_irq(isa_dev); - - report_dev(isa_dev, 1); - - dp = dp->sibling; - } -} - -static void __init isa_fill_devices(struct sparc_isa_bridge *isa_br) -{ - struct device_node *dp = isa_br->prom_node->child; - - while (dp) { - struct sparc_isa_device *isa_dev; - struct dev_archdata *sd; - - isa_dev = kzalloc(sizeof(*isa_dev), GFP_KERNEL); - if (!isa_dev) { - printk(KERN_DEBUG "ISA: cannot allocate isa_dev"); - return; - } - - sd = &isa_dev->ofdev.dev.archdata; - sd->prom_node = dp; - sd->op = &isa_dev->ofdev; - sd->iommu = isa_br->ofdev.dev.parent->archdata.iommu; - sd->stc = isa_br->ofdev.dev.parent->archdata.stc; - sd->numa_node = isa_br->ofdev.dev.parent->archdata.numa_node; - - isa_dev->ofdev.node = dp; - isa_dev->ofdev.dev.parent = &isa_br->ofdev.dev; - isa_dev->ofdev.dev.bus = &isa_bus_type; - sprintf(isa_dev->ofdev.dev.bus_id, "isa[%08x]", dp->node); - - /* 
Register with core */ - if (of_device_register(&isa_dev->ofdev) != 0) { - printk(KERN_DEBUG "isa: device registration error for %s!\n", - dp->path_component_name); - kfree(isa_dev); - goto next_sibling; - } - - /* Link it in. */ - isa_dev->next = NULL; - if (isa_br->devices == NULL) { - isa_br->devices = isa_dev; - } else { - struct sparc_isa_device *tmp = isa_br->devices; - - while (tmp->next) - tmp = tmp->next; - - tmp->next = isa_dev; - } - - isa_dev->bus = isa_br; - isa_dev->prom_node = dp; - - isa_dev_get_resource(isa_dev); - isa_dev_get_irq(isa_dev); - - report_dev(isa_dev, 0); - - isa_fill_children(isa_dev); - - printk("]"); - - next_sibling: - dp = dp->sibling; - } -} - -void __init isa_init(void) -{ - struct pci_dev *pdev; - unsigned short vendor, device; - int index = 0; - - vendor = PCI_VENDOR_ID_AL; - device = PCI_DEVICE_ID_AL_M1533; - - pdev = NULL; - while ((pdev = pci_get_device(vendor, device, pdev)) != NULL) { - struct sparc_isa_bridge *isa_br; - struct device_node *dp; - - dp = pci_device_to_OF_node(pdev); - - isa_br = kzalloc(sizeof(*isa_br), GFP_KERNEL); - if (!isa_br) { - printk(KERN_DEBUG "isa: cannot allocate sparc_isa_bridge"); - pci_dev_put(pdev); - return; - } - - isa_br->ofdev.node = dp; - isa_br->ofdev.dev.parent = &pdev->dev; - isa_br->ofdev.dev.bus = &isa_bus_type; - sprintf(isa_br->ofdev.dev.bus_id, "isa%d", index); - - /* Register with core */ - if (of_device_register(&isa_br->ofdev) != 0) { - printk(KERN_DEBUG "isa: device registration error for %s!\n", - dp->path_component_name); - kfree(isa_br); - pci_dev_put(pdev); - return; - } - - /* Link it in. 
*/ - isa_br->next = isa_chain; - isa_chain = isa_br; - - isa_br->self = pdev; - isa_br->index = index++; - isa_br->prom_node = dp; - - printk("isa%d:", isa_br->index); - - isa_fill_devices(isa_br); - - printk("\n"); - } -} diff --git a/arch/sparc64/kernel/pci.c b/arch/sparc64/kernel/pci.c index 49f912766519..dbf2fc2f4d87 100644 --- a/arch/sparc64/kernel/pci.c +++ b/arch/sparc64/kernel/pci.c @@ -23,7 +23,6 @@ #include #include #include -#include #include #include @@ -885,7 +884,6 @@ static int __init pcibios_init(void) pci_scan_each_controller_bus(); - isa_init(); ebus_init(); power_init(); diff --git a/arch/sparc64/kernel/sparc64_ksyms.c b/arch/sparc64/kernel/sparc64_ksyms.c index 66336590e830..8ac0b99f2c55 100644 --- a/arch/sparc64/kernel/sparc64_ksyms.c +++ b/arch/sparc64/kernel/sparc64_ksyms.c @@ -49,7 +49,6 @@ #endif #ifdef CONFIG_PCI #include -#include #endif #include #include @@ -187,7 +186,6 @@ EXPORT_SYMBOL(insw); EXPORT_SYMBOL(insl); #ifdef CONFIG_PCI EXPORT_SYMBOL(ebus_chain); -EXPORT_SYMBOL(isa_chain); EXPORT_SYMBOL(pci_alloc_consistent); EXPORT_SYMBOL(pci_free_consistent); EXPORT_SYMBOL(pci_map_single); diff --git a/include/asm-sparc64/floppy.h b/include/asm-sparc64/floppy.h index 040d7962c5a3..ca19f80a9b7d 100644 --- a/include/asm-sparc64/floppy.h +++ b/include/asm-sparc64/floppy.h @@ -293,7 +293,6 @@ static int sun_fd_eject(int drive) #ifdef CONFIG_PCI #include -#include #include static struct ebus_dma_info sun_pci_fd_ebus_dma; diff --git a/include/asm-sparc64/isa.h b/include/asm-sparc64/isa.h deleted file mode 100644 index ecd9290f78d4..000000000000 --- a/include/asm-sparc64/isa.h +++ /dev/null @@ -1,47 +0,0 @@ -/* $Id: isa.h,v 1.1 2001/05/11 04:31:55 davem Exp $ - * isa.h: Sparc64 layer for PCI to ISA bridge devices. - * - * Copyright (C) 2001 David S. 
Miller (davem@redhat.com) - */ - -#ifndef __SPARC64_ISA_H -#define __SPARC64_ISA_H - -#include -#include -#include - -struct sparc_isa_bridge; - -struct sparc_isa_device { - struct of_device ofdev; - struct sparc_isa_device *next; - struct sparc_isa_device *child; - struct sparc_isa_bridge *bus; - struct device_node *prom_node; - struct resource resource; - unsigned int irq; -}; -#define to_isa_device(d) container_of(d, struct sparc_isa_device, ofdev.dev) - -struct sparc_isa_bridge { - struct of_device ofdev; - struct sparc_isa_bridge *next; - struct sparc_isa_device *devices; - struct pci_dev *self; - int index; - struct device_node *prom_node; -}; -#define to_isa_bridge(d) container_of(d, struct sparc_isa_bridge, ofdev.dev) - -extern struct sparc_isa_bridge *isa_chain; - -extern void isa_init(void); - -#define for_each_isa(bus) \ - for((bus) = isa_chain; (bus); (bus) = (bus)->next) - -#define for_each_isadev(dev, bus) \ - for((dev) = (bus)->devices; (dev); (dev) = (dev)->next) - -#endif /* !(__SPARC64_ISA_H) */ -- cgit v1.2.3 From 403ae52ac047eb339f2b7e8cdf93a3b8077914db Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sat, 26 Apr 2008 22:29:43 -0700 Subject: sparc: fix drivers/video/tcx.c warning MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Fix compile warning: CC drivers/video/tcx.o drivers/video/tcx.c: In function ‘tcx_init_one’: drivers/video/tcx.c:477: warning: format ‘%lx’ expects type ‘long unsigned int’, but argument 4 has type ‘resource_size_t’ This was the only sparc driver to use the resource directly in the printk so I changed it to physbase like the other drivers. Boot tested on SS4. Signed-off-by: Robert Reif Signed-off-by: David S. 
Miller --- drivers/video/tcx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/tcx.c b/drivers/video/tcx.c index e5a9ddb3c8be..fd94dfbab44b 100644 --- a/drivers/video/tcx.c +++ b/drivers/video/tcx.c @@ -419,7 +419,7 @@ static int __devinit tcx_init_one(struct of_device *op) par->mmap_map[6].size = SBUS_MMAP_EMPTY; } - par->physbase = 0; + par->physbase = op->resource[0].start; par->which_io = op->resource[0].flags & IORESOURCE_BITS; for (i = 0; i < TCX_MMAP_ENTRIES; i++) { @@ -473,7 +473,7 @@ static int __devinit tcx_init_one(struct of_device *op) printk("%s: TCX at %lx:%lx, %s\n", dp->full_name, par->which_io, - op->resource[0].start, + par->physbase, par->lowdepth ? "8-bit only" : "24-bit depth"); return 0; -- cgit v1.2.3 From 3ade11601f4a3a38d6cd3675ccc87bf11e251915 Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sat, 26 Apr 2008 23:10:19 -0700 Subject: sparc: sunzilog.c remove unused argument Remove unused argument in function call. Signed-off-by: Robert Reif Signed-off-by: David S. 
Miller --- drivers/serial/sunzilog.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/serial/sunzilog.c b/drivers/serial/sunzilog.c index 3271379a36db..90a20a152ebf 100644 --- a/drivers/serial/sunzilog.c +++ b/drivers/serial/sunzilog.c @@ -1231,7 +1231,7 @@ static inline struct console *SUNZILOG_CONSOLE(void) #define SUNZILOG_CONSOLE() (NULL) #endif -static void __devinit sunzilog_init_kbdms(struct uart_sunzilog_port *up, int channel) +static void __devinit sunzilog_init_kbdms(struct uart_sunzilog_port *up) { int baud, brg; @@ -1305,7 +1305,7 @@ static void __devinit sunzilog_init_hw(struct uart_sunzilog_port *up) up->curregs[R7] = 0x7E; /* SDLC Flag */ up->curregs[R9] = NV; up->curregs[R7p] = 0x00; - sunzilog_init_kbdms(up, up->port.line); + sunzilog_init_kbdms(up); /* Only enable interrupts if an ISR handler available */ if (up->flags & SUNZILOG_FLAG_ISR_HANDLER) up->curregs[R9] |= MIE; -- cgit v1.2.3 From 7cf069955f2f0b95fed6a8b1a61ef598a3df0f4e Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sun, 27 Apr 2008 00:25:30 -0700 Subject: sparc64: Kill bogus RT_ALIGNEDSZ macro from signal.c The structure has to be 8-byte aligned in size, so this macro is just noise. Signed-off-by: David S. 
Miller --- arch/sparc64/kernel/signal.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/sparc64/kernel/signal.c b/arch/sparc64/kernel/signal.c index 6afa5ef536eb..314f51aefa0f 100644 --- a/arch/sparc64/kernel/signal.c +++ b/arch/sparc64/kernel/signal.c @@ -236,9 +236,6 @@ struct rt_signal_frame { __siginfo_fpu_t fpu_state; }; -/* Align macros */ -#define RT_ALIGNEDSZ (((sizeof(struct rt_signal_frame) + 7) & (~7))) - static long _sigpause_common(old_sigset_t set) { set &= _BLOCKABLE; @@ -400,7 +397,7 @@ setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs, synchronize_user_stack(); save_and_clear_fpu(); - sigframe_size = RT_ALIGNEDSZ; + sigframe_size = sizeof(struct rt_signal_frame); if (!(current_thread_info()->fpsaved[0] & FPRS_FEF)) sigframe_size -= sizeof(__siginfo_fpu_t); -- cgit v1.2.3 From 5526b7e4513c66bc1c91f661dcd518d5199f8934 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sun, 27 Apr 2008 02:26:36 -0700 Subject: sparc: Remove old style signal frame support. Back around the same time we were bootstrapping the first 32-bit sparc Linux kernel with a SunOS userland, we made the signal frame match that of SunOS. By the time we even started putting together a native Linux userland for 32-bit Sparc we realized this layout wasn't sufficient for Linux's needs. Therefore we changed the layout, yet kept support for the old style signal frame layout in there. The detection mechanism is that we had sys_sigaction() start passing in a negative signal number to indicate "new style signal frames please". Anyways, no binaries exist in the world that use the old stuff. In fact, I bet Jakub Jelinek and myself are the only two people who ever had such binaries to be honest. So let's get rid of this stuff. I added an assertion using WARN_ON_ONCE() that makes sure 32-bit applications are passing in that negative signal number still. Signed-off-by: David S. 
Miller --- arch/sparc/kernel/process.c | 2 - arch/sparc/kernel/signal.c | 260 +++--------------------------------- arch/sparc/kernel/sys_sparc.c | 14 +- arch/sparc64/kernel/process.c | 6 +- arch/sparc64/kernel/signal32.c | 272 +++----------------------------------- arch/sparc64/kernel/sys_sparc32.c | 11 +- include/asm-sparc/processor.h | 4 +- include/asm-sparc64/thread_info.h | 6 +- 8 files changed, 47 insertions(+), 528 deletions(-) diff --git a/arch/sparc/kernel/process.c b/arch/sparc/kernel/process.c index 70c0dd22491d..e7f35198ae34 100644 --- a/arch/sparc/kernel/process.c +++ b/arch/sparc/kernel/process.c @@ -357,8 +357,6 @@ void flush_thread(void) { current_thread_info()->w_saved = 0; - /* No new signal delivery by default */ - current->thread.new_signal = 0; #ifndef CONFIG_SMP if(last_task_used_math == current) { #else diff --git a/arch/sparc/kernel/signal.c b/arch/sparc/kernel/signal.c index 3e849e8e3480..3c312290c3c2 100644 --- a/arch/sparc/kernel/signal.c +++ b/arch/sparc/kernel/signal.c @@ -1,5 +1,4 @@ -/* $Id: signal.c,v 1.110 2002/02/08 03:57:14 davem Exp $ - * linux/arch/sparc/kernel/signal.c +/* linux/arch/sparc/kernel/signal.c * * Copyright (C) 1991, 1992 Linus Torvalds * Copyright (C) 1995 David S. Miller (davem@caip.rutgers.edu) @@ -32,37 +31,7 @@ extern void fpsave(unsigned long *fpregs, unsigned long *fsr, void *fpqueue, unsigned long *fpqdepth); extern void fpload(unsigned long *fpregs, unsigned long *fsr); -/* Signal frames: the original one (compatible with SunOS): - * - * Set up a signal frame... 
Make the stack look the way SunOS - * expects it to look which is basically: - * - * ---------------------------------- <-- %sp at signal time - * Struct sigcontext - * Signal address - * Ptr to sigcontext area above - * Signal code - * The signal number itself - * One register window - * ---------------------------------- <-- New %sp - */ -struct signal_sframe { - struct reg_window sig_window; - int sig_num; - int sig_code; - struct sigcontext __user *sig_scptr; - int sig_address; - struct sigcontext sig_context; - unsigned int extramask[_NSIG_WORDS - 1]; -}; - -/* - * And the new one, intended to be used for Linux applications only - * (we have enough in there to work with clone). - * All the interesting bits are in the info field. - */ - -struct new_signal_frame { +struct signal_frame { struct sparc_stackf ss; __siginfo_t info; __siginfo_fpu_t __user *fpu_save; @@ -85,8 +54,7 @@ struct rt_signal_frame { }; /* Align macros */ -#define SF_ALIGNEDSZ (((sizeof(struct signal_sframe) + 7) & (~7))) -#define NF_ALIGNEDSZ (((sizeof(struct new_signal_frame) + 7) & (~7))) +#define SF_ALIGNEDSZ (((sizeof(struct signal_frame) + 7) & (~7))) #define RT_ALIGNEDSZ (((sizeof(struct rt_signal_frame) + 7) & (~7))) static int _sigpause_common(old_sigset_t set) @@ -141,15 +109,20 @@ restore_fpu_state(struct pt_regs *regs, __siginfo_fpu_t __user *fpu) return err; } -static inline void do_new_sigreturn (struct pt_regs *regs) +asmlinkage void do_sigreturn(struct pt_regs *regs) { - struct new_signal_frame __user *sf; + struct signal_frame __user *sf; unsigned long up_psr, pc, npc; sigset_t set; __siginfo_fpu_t __user *fpu_save; int err; - sf = (struct new_signal_frame __user *) regs->u_regs[UREG_FP]; + /* Always make any pending restarted system calls return -EINTR */ + current_thread_info()->restart_block.fn = do_no_restart_syscall; + + synchronize_user_stack(); + + sf = (struct signal_frame __user *) regs->u_regs[UREG_FP]; /* 1. 
Make sure we are not getting garbage from the user */ if (!access_ok(VERIFY_READ, sf, sizeof(*sf))) @@ -198,73 +171,6 @@ segv_and_exit: force_sig(SIGSEGV, current); } -asmlinkage void do_sigreturn(struct pt_regs *regs) -{ - struct sigcontext __user *scptr; - unsigned long pc, npc, psr; - sigset_t set; - int err; - - /* Always make any pending restarted system calls return -EINTR */ - current_thread_info()->restart_block.fn = do_no_restart_syscall; - - synchronize_user_stack(); - - if (current->thread.new_signal) { - do_new_sigreturn(regs); - return; - } - - scptr = (struct sigcontext __user *) regs->u_regs[UREG_I0]; - - /* Check sanity of the user arg. */ - if (!access_ok(VERIFY_READ, scptr, sizeof(struct sigcontext)) || - (((unsigned long) scptr) & 3)) - goto segv_and_exit; - - err = __get_user(pc, &scptr->sigc_pc); - err |= __get_user(npc, &scptr->sigc_npc); - - if ((pc | npc) & 3) - goto segv_and_exit; - - /* This is pretty much atomic, no amount locking would prevent - * the races which exist anyways. - */ - err |= __get_user(set.sig[0], &scptr->sigc_mask); - /* Note that scptr + 1 points to extramask */ - err |= __copy_from_user(&set.sig[1], scptr + 1, - (_NSIG_WORDS - 1) * sizeof(unsigned int)); - - if (err) - goto segv_and_exit; - - sigdelsetmask(&set, ~_BLOCKABLE); - spin_lock_irq(&current->sighand->siglock); - current->blocked = set; - recalc_sigpending(); - spin_unlock_irq(&current->sighand->siglock); - - regs->pc = pc; - regs->npc = npc; - - err = __get_user(regs->u_regs[UREG_FP], &scptr->sigc_sp); - err |= __get_user(regs->u_regs[UREG_I0], &scptr->sigc_o0); - err |= __get_user(regs->u_regs[UREG_G1], &scptr->sigc_g1); - - /* User can only change condition codes in %psr.
*/ - err |= __get_user(psr, &scptr->sigc_psr); - if (err) - goto segv_and_exit; - - regs->psr &= ~(PSR_ICC); - regs->psr |= (psr & PSR_ICC); - return; - -segv_and_exit: - force_sig(SIGSEGV, current); -} - asmlinkage void do_rt_sigreturn(struct pt_regs *regs) { struct rt_signal_frame __user *sf; @@ -351,128 +257,6 @@ static inline void __user *get_sigframe(struct sigaction *sa, struct pt_regs *re return (void __user *)(sp - framesize); } -static inline void -setup_frame(struct sigaction *sa, struct pt_regs *regs, int signr, sigset_t *oldset, siginfo_t *info) -{ - struct signal_sframe __user *sframep; - struct sigcontext __user *sc; - int window = 0, err; - unsigned long pc = regs->pc; - unsigned long npc = regs->npc; - struct thread_info *tp = current_thread_info(); - void __user *sig_address; - int sig_code; - - synchronize_user_stack(); - sframep = (struct signal_sframe __user *) - get_sigframe(sa, regs, SF_ALIGNEDSZ); - if (invalid_frame_pointer(sframep, sizeof(*sframep))){ - /* Don't change signal code and address, so that - * post mortem debuggers can have a look. - */ - goto sigill_and_return; - } - - sc = &sframep->sig_context; - - /* We've already made sure frame pointer isn't in kernel space... 
*/ - err = __put_user((sas_ss_flags(regs->u_regs[UREG_FP]) == SS_ONSTACK), - &sc->sigc_onstack); - err |= __put_user(oldset->sig[0], &sc->sigc_mask); - err |= __copy_to_user(sframep->extramask, &oldset->sig[1], - (_NSIG_WORDS - 1) * sizeof(unsigned int)); - err |= __put_user(regs->u_regs[UREG_FP], &sc->sigc_sp); - err |= __put_user(pc, &sc->sigc_pc); - err |= __put_user(npc, &sc->sigc_npc); - err |= __put_user(regs->psr, &sc->sigc_psr); - err |= __put_user(regs->u_regs[UREG_G1], &sc->sigc_g1); - err |= __put_user(regs->u_regs[UREG_I0], &sc->sigc_o0); - err |= __put_user(tp->w_saved, &sc->sigc_oswins); - if (tp->w_saved) - for (window = 0; window < tp->w_saved; window++) { - put_user((char *)tp->rwbuf_stkptrs[window], - &sc->sigc_spbuf[window]); - err |= __copy_to_user(&sc->sigc_wbuf[window], - &tp->reg_window[window], - sizeof(struct reg_window)); - } - else - err |= __copy_to_user(sframep, (char *) regs->u_regs[UREG_FP], - sizeof(struct reg_window)); - - tp->w_saved = 0; /* So process is allowed to execute. 
*/ - - err |= __put_user(signr, &sframep->sig_num); - sig_address = NULL; - sig_code = 0; - if (SI_FROMKERNEL (info) && (info->si_code & __SI_MASK) == __SI_FAULT) { - sig_address = info->si_addr; - switch (signr) { - case SIGSEGV: - switch (info->si_code) { - case SEGV_MAPERR: sig_code = SUBSIG_NOMAPPING; break; - default: sig_code = SUBSIG_PROTECTION; break; - } - break; - case SIGILL: - switch (info->si_code) { - case ILL_ILLOPC: sig_code = SUBSIG_ILLINST; break; - case ILL_PRVOPC: sig_code = SUBSIG_PRIVINST; break; - case ILL_ILLTRP: sig_code = SUBSIG_BADTRAP(info->si_trapno); break; - default: sig_code = SUBSIG_STACK; break; - } - break; - case SIGFPE: - switch (info->si_code) { - case FPE_INTDIV: sig_code = SUBSIG_IDIVZERO; break; - case FPE_INTOVF: sig_code = SUBSIG_FPINTOVFL; break; - case FPE_FLTDIV: sig_code = SUBSIG_FPDIVZERO; break; - case FPE_FLTOVF: sig_code = SUBSIG_FPOVFLOW; break; - case FPE_FLTUND: sig_code = SUBSIG_FPUNFLOW; break; - case FPE_FLTRES: sig_code = SUBSIG_FPINEXACT; break; - case FPE_FLTINV: sig_code = SUBSIG_FPOPERROR; break; - default: sig_code = SUBSIG_FPERROR; break; - } - break; - case SIGBUS: - switch (info->si_code) { - case BUS_ADRALN: sig_code = SUBSIG_ALIGNMENT; break; - case BUS_ADRERR: sig_code = SUBSIG_MISCERROR; break; - default: sig_code = SUBSIG_BUSTIMEOUT; break; - } - break; - case SIGEMT: - switch (info->si_code) { - case EMT_TAGOVF: sig_code = SUBSIG_TAG; break; - } - break; - case SIGSYS: - if (info->si_code == (__SI_FAULT|0x100)) { - sig_code = info->si_trapno; - break; - } - default: - sig_address = NULL; - } - } - err |= __put_user((unsigned long)sig_address, &sframep->sig_address); - err |= __put_user(sig_code, &sframep->sig_code); - err |= __put_user(sc, &sframep->sig_scptr); - if (err) - goto sigsegv; - - regs->u_regs[UREG_FP] = (unsigned long) sframep; - regs->pc = (unsigned long) sa->sa_handler; - regs->npc = (regs->pc + 4); - return; - -sigill_and_return: - do_exit(SIGILL); -sigsegv: - 
force_sigsegv(signr, current); -} - - static inline int save_fpu_state(struct pt_regs *regs, __siginfo_fpu_t __user *fpu) { @@ -508,21 +292,20 @@ save_fpu_state(struct pt_regs *regs, __siginfo_fpu_t __user *fpu) return err; } -static inline void -new_setup_frame(struct k_sigaction *ka, struct pt_regs *regs, - int signo, sigset_t *oldset) +static void setup_frame(struct k_sigaction *ka, struct pt_regs *regs, + int signo, sigset_t *oldset) { - struct new_signal_frame __user *sf; + struct signal_frame __user *sf; int sigframe_size, err; /* 1. Make sure everything is clean */ synchronize_user_stack(); - sigframe_size = NF_ALIGNEDSZ; + sigframe_size = SF_ALIGNEDSZ; if (!used_math()) sigframe_size -= sizeof(__siginfo_fpu_t); - sf = (struct new_signal_frame __user *) + sf = (struct signal_frame __user *) get_sigframe(&ka->sa, regs, sigframe_size); if (invalid_frame_pointer(sf, sigframe_size)) @@ -586,9 +369,8 @@ sigsegv: force_sigsegv(signo, current); } -static inline void -new_setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs, - int signo, sigset_t *oldset, siginfo_t *info) +static void setup_rt_frame(struct k_sigaction *ka, struct pt_regs *regs, + int signo, sigset_t *oldset, siginfo_t *info) { struct rt_signal_frame __user *sf; int sigframe_size; @@ -674,11 +456,9 @@ handle_signal(unsigned long signr, struct k_sigaction *ka, siginfo_t *info, sigset_t *oldset, struct pt_regs *regs) { if (ka->sa.sa_flags & SA_SIGINFO) - new_setup_rt_frame(ka, regs, signr, oldset, info); - else if (current->thread.new_signal) - new_setup_frame(ka, regs, signr, oldset); + setup_rt_frame(ka, regs, signr, oldset, info); else - setup_frame(&ka->sa, regs, signr, oldset, info); + setup_frame(ka, regs, signr, oldset); spin_lock_irq(&current->sighand->siglock); sigorsets(&current->blocked,&current->blocked,&ka->sa.sa_mask); diff --git a/arch/sparc/kernel/sys_sparc.c b/arch/sparc/kernel/sys_sparc.c index 42bf09db9a81..f188b5dc9fd0 100644 --- a/arch/sparc/kernel/sys_sparc.c +++
b/arch/sparc/kernel/sys_sparc.c @@ -1,5 +1,4 @@ -/* $Id: sys_sparc.c,v 1.70 2001/04/14 01:12:02 davem Exp $ - * linux/arch/sparc/kernel/sys_sparc.c +/* linux/arch/sparc/kernel/sys_sparc.c * * This file contains various random system calls that * have a non-standard calling sequence on the Linux/sparc @@ -395,10 +394,8 @@ sparc_sigaction (int sig, const struct old_sigaction __user *act, struct k_sigaction new_ka, old_ka; int ret; - if (sig < 0) { - current->thread.new_signal = 1; - sig = -sig; - } + WARN_ON_ONCE(sig >= 0); + sig = -sig; if (act) { unsigned long mask; @@ -446,11 +443,6 @@ sys_rt_sigaction(int sig, if (sigsetsize != sizeof(sigset_t)) return -EINVAL; - /* All tasks which use RT signals (effectively) use - * new style signals. - */ - current->thread.new_signal = 1; - if (act) { new_ka.ka_restorer = restorer; if (copy_from_user(&new_ka.sa, act, sizeof(*act))) diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c index acf8c5250aa9..500ac6d483a0 100644 --- a/arch/sparc64/kernel/process.c +++ b/arch/sparc64/kernel/process.c @@ -1,5 +1,4 @@ -/* $Id: process.c,v 1.131 2002/02/09 19:49:30 davem Exp $ - * arch/sparc64/kernel/process.c +/* arch/sparc64/kernel/process.c * * Copyright (C) 1995, 1996 David S. Miller (davem@caip.rutgers.edu) * Copyright (C) 1996 Eddie C. Dost (ecd@skynet.be) @@ -368,9 +367,6 @@ void flush_thread(void) if (get_thread_current_ds() != ASI_AIUS) set_fs(USER_DS); - - /* Init new signal delivery disposition. */ - clear_thread_flag(TIF_NEWSIGNALS); } /* It's a bit more tricky when 64-bit tasks are involved... */ diff --git a/arch/sparc64/kernel/signal32.c b/arch/sparc64/kernel/signal32.c index 43cdec64d9c9..91f8d0826db1 100644 --- a/arch/sparc64/kernel/signal32.c +++ b/arch/sparc64/kernel/signal32.c @@ -1,5 +1,4 @@ -/* $Id: signal32.c,v 1.74 2002/02/09 19:49:30 davem Exp $ - * arch/sparc64/kernel/signal32.c +/* arch/sparc64/kernel/signal32.c * * Copyright (C) 1991, 1992 Linus Torvalds * Copyright (C) 1995 David S. 
Miller (davem@caip.rutgers.edu) @@ -31,30 +30,6 @@ #define _BLOCKABLE (~(sigmask(SIGKILL) | sigmask(SIGSTOP))) -/* Signal frames: the original one (compatible with SunOS): - * - * Set up a signal frame... Make the stack look the way SunOS - * expects it to look which is basically: - * - * ---------------------------------- <-- %sp at signal time - * Struct sigcontext - * Signal address - * Ptr to sigcontext area above - * Signal code - * The signal number itself - * One register window - * ---------------------------------- <-- New %sp - */ -struct signal_sframe32 { - struct reg_window32 sig_window; - int sig_num; - int sig_code; - /* struct sigcontext32 * */ u32 sig_scptr; - int sig_address; - struct sigcontext32 sig_context; - unsigned int extramask[_COMPAT_NSIG_WORDS - 1]; -}; - /* This magic should be in g_upper[0] for all upper parts * to be valid. */ @@ -65,12 +40,7 @@ typedef struct { unsigned int asi; } siginfo_extra_v8plus_t; -/* - * And the new one, intended to be used for Linux applications only - * (we have enough in there to work with clone). - * All the interesting bits are in the info field. 
- */ -struct new_signal_frame32 { +struct signal_frame32 { struct sparc_stackf32 ss; __siginfo32_t info; /* __siginfo_fpu32_t * */ u32 fpu_save; @@ -149,8 +119,7 @@ struct rt_signal_frame32 { }; /* Align macros */ -#define SF_ALIGNEDSZ (((sizeof(struct signal_sframe32) + 7) & (~7))) -#define NF_ALIGNEDSZ (((sizeof(struct new_signal_frame32) + 7) & (~7))) +#define SF_ALIGNEDSZ (((sizeof(struct signal_frame32) + 7) & (~7))) #define RT_ALIGNEDSZ (((sizeof(struct rt_signal_frame32) + 7) & (~7))) int copy_siginfo_to_user32(compat_siginfo_t __user *to, siginfo_t *from) @@ -241,17 +210,22 @@ static int restore_fpu_state32(struct pt_regs *regs, __siginfo_fpu_t __user *fpu return err; } -void do_new_sigreturn32(struct pt_regs *regs) +void do_sigreturn32(struct pt_regs *regs) { - struct new_signal_frame32 __user *sf; + struct signal_frame32 __user *sf; unsigned int psr; unsigned pc, npc, fpu_save; sigset_t set; unsigned seta[_COMPAT_NSIG_WORDS]; int err, i; + /* Always make any pending restarted system calls return -EINTR */ + current_thread_info()->restart_block.fn = do_no_restart_syscall; + + synchronize_user_stack(); + regs->u_regs[UREG_FP] &= 0x00000000ffffffffUL; - sf = (struct new_signal_frame32 __user *) regs->u_regs[UREG_FP]; + sf = (struct signal_frame32 __user *) regs->u_regs[UREG_FP]; /* 1. 
Make sure we are not getting garbage from the user */ if (!access_ok(VERIFY_READ, sf, sizeof(*sf)) || @@ -319,76 +293,6 @@ segv: force_sig(SIGSEGV, current); } -asmlinkage void do_sigreturn32(struct pt_regs *regs) -{ - struct sigcontext32 __user *scptr; - unsigned int pc, npc, psr; - sigset_t set; - unsigned int seta[_COMPAT_NSIG_WORDS]; - int err; - - /* Always make any pending restarted system calls return -EINTR */ - current_thread_info()->restart_block.fn = do_no_restart_syscall; - - synchronize_user_stack(); - if (test_thread_flag(TIF_NEWSIGNALS)) { - do_new_sigreturn32(regs); - return; - } - - scptr = (struct sigcontext32 __user *) - (regs->u_regs[UREG_I0] & 0x00000000ffffffffUL); - /* Check sanity of the user arg. */ - if (!access_ok(VERIFY_READ, scptr, sizeof(struct sigcontext32)) || - (((unsigned long) scptr) & 3)) - goto segv; - - err = __get_user(pc, &scptr->sigc_pc); - err |= __get_user(npc, &scptr->sigc_npc); - - if ((pc | npc) & 3) - goto segv; /* Nice try. */ - - err |= __get_user(seta[0], &scptr->sigc_mask); - /* Note that scptr + 1 points to extramask */ - err |= copy_from_user(seta+1, scptr + 1, - (_COMPAT_NSIG_WORDS - 1) * sizeof(unsigned int)); - if (err) - goto segv; - switch (_NSIG_WORDS) { - case 4: set.sig[3] = seta[6] + (((long)seta[7]) << 32); - case 3: set.sig[2] = seta[4] + (((long)seta[5]) << 32); - case 2: set.sig[1] = seta[2] + (((long)seta[3]) << 32); - case 1: set.sig[0] = seta[0] + (((long)seta[1]) << 32); - } - sigdelsetmask(&set, ~_BLOCKABLE); - spin_lock_irq(&current->sighand->siglock); - current->blocked = set; - recalc_sigpending(); - spin_unlock_irq(&current->sighand->siglock); - - if (test_thread_flag(TIF_32BIT)) { - pc &= 0xffffffff; - npc &= 0xffffffff; - } - regs->tpc = pc; - regs->tnpc = npc; - err = __get_user(regs->u_regs[UREG_FP], &scptr->sigc_sp); - err |= __get_user(regs->u_regs[UREG_I0], &scptr->sigc_o0); - err |= __get_user(regs->u_regs[UREG_G1], &scptr->sigc_g1); - - /* User can only change condition codes in %tstate.
*/ - err |= __get_user(psr, &scptr->sigc_psr); - if (err) - goto segv; - regs->tstate &= ~(TSTATE_ICC|TSTATE_XCC); - regs->tstate |= psr_to_tstate_icc(psr); - return; - -segv: - force_sig(SIGSEGV, current); -} - asmlinkage void do_rt_sigreturn32(struct pt_regs *regs) { struct rt_signal_frame32 __user *sf; @@ -504,145 +408,6 @@ static void __user *get_sigframe(struct sigaction *sa, struct pt_regs *regs, uns return (void __user *)(sp - framesize); } -static void -setup_frame32(struct sigaction *sa, struct pt_regs *regs, int signr, sigset_t *oldset, siginfo_t *info) -{ - struct signal_sframe32 __user *sframep; - struct sigcontext32 __user *sc; - unsigned int seta[_COMPAT_NSIG_WORDS]; - int err = 0; - void __user *sig_address; - int sig_code; - unsigned long pc = regs->tpc; - unsigned long npc = regs->tnpc; - unsigned int psr; - - if (test_thread_flag(TIF_32BIT)) { - pc &= 0xffffffff; - npc &= 0xffffffff; - } - - synchronize_user_stack(); - save_and_clear_fpu(); - - sframep = (struct signal_sframe32 __user *) - get_sigframe(sa, regs, SF_ALIGNEDSZ); - if (invalid_frame_pointer(sframep, sizeof(*sframep))){ - /* Don't change signal code and address, so that - * post mortem debuggers can have a look. - */ - do_exit(SIGILL); - } - - sc = &sframep->sig_context; - - /* We've already made sure frame pointer isn't in kernel space... 
*/ - err = __put_user((sas_ss_flags(regs->u_regs[UREG_FP]) == SS_ONSTACK), - &sc->sigc_onstack); - - switch (_NSIG_WORDS) { - case 4: seta[7] = (oldset->sig[3] >> 32); - seta[6] = oldset->sig[3]; - case 3: seta[5] = (oldset->sig[2] >> 32); - seta[4] = oldset->sig[2]; - case 2: seta[3] = (oldset->sig[1] >> 32); - seta[2] = oldset->sig[1]; - case 1: seta[1] = (oldset->sig[0] >> 32); - seta[0] = oldset->sig[0]; - } - err |= __put_user(seta[0], &sc->sigc_mask); - err |= __copy_to_user(sframep->extramask, seta + 1, - (_COMPAT_NSIG_WORDS - 1) * sizeof(unsigned int)); - err |= __put_user(regs->u_regs[UREG_FP], &sc->sigc_sp); - err |= __put_user(pc, &sc->sigc_pc); - err |= __put_user(npc, &sc->sigc_npc); - psr = tstate_to_psr(regs->tstate); - if (current_thread_info()->fpsaved[0] & FPRS_FEF) - psr |= PSR_EF; - err |= __put_user(psr, &sc->sigc_psr); - err |= __put_user(regs->u_regs[UREG_G1], &sc->sigc_g1); - err |= __put_user(regs->u_regs[UREG_I0], &sc->sigc_o0); - err |= __put_user(get_thread_wsaved(), &sc->sigc_oswins); - - err |= copy_in_user((u32 __user *)sframep, - (u32 __user *)(regs->u_regs[UREG_FP]), - sizeof(struct reg_window32)); - - set_thread_wsaved(0); /* So process is allowed to execute. 
*/ - err |= __put_user(signr, &sframep->sig_num); - sig_address = NULL; - sig_code = 0; - if (SI_FROMKERNEL (info) && (info->si_code & __SI_MASK) == __SI_FAULT) { - sig_address = info->si_addr; - switch (signr) { - case SIGSEGV: - switch (info->si_code) { - case SEGV_MAPERR: sig_code = SUBSIG_NOMAPPING; break; - default: sig_code = SUBSIG_PROTECTION; break; - } - break; - case SIGILL: - switch (info->si_code) { - case ILL_ILLOPC: sig_code = SUBSIG_ILLINST; break; - case ILL_PRVOPC: sig_code = SUBSIG_PRIVINST; break; - case ILL_ILLTRP: sig_code = SUBSIG_BADTRAP(info->si_trapno); break; - default: sig_code = SUBSIG_STACK; break; - } - break; - case SIGFPE: - switch (info->si_code) { - case FPE_INTDIV: sig_code = SUBSIG_IDIVZERO; break; - case FPE_INTOVF: sig_code = SUBSIG_FPINTOVFL; break; - case FPE_FLTDIV: sig_code = SUBSIG_FPDIVZERO; break; - case FPE_FLTOVF: sig_code = SUBSIG_FPOVFLOW; break; - case FPE_FLTUND: sig_code = SUBSIG_FPUNFLOW; break; - case FPE_FLTRES: sig_code = SUBSIG_FPINEXACT; break; - case FPE_FLTINV: sig_code = SUBSIG_FPOPERROR; break; - default: sig_code = SUBSIG_FPERROR; break; - } - break; - case SIGBUS: - switch (info->si_code) { - case BUS_ADRALN: sig_code = SUBSIG_ALIGNMENT; break; - case BUS_ADRERR: sig_code = SUBSIG_MISCERROR; break; - default: sig_code = SUBSIG_BUSTIMEOUT; break; - } - break; - case SIGEMT: - switch (info->si_code) { - case EMT_TAGOVF: sig_code = SUBSIG_TAG; break; - } - break; - case SIGSYS: - if (info->si_code == (__SI_FAULT|0x100)) { - /* See sys_sunos32.c */ - sig_code = info->si_trapno; - break; - } - default: - sig_address = NULL; - } - } - err |= __put_user(ptr_to_compat(sig_address), &sframep->sig_address); - err |= __put_user(sig_code, &sframep->sig_code); - err |= __put_user(ptr_to_compat(sc), &sframep->sig_scptr); - if (err) - goto sigsegv; - - regs->u_regs[UREG_FP] = (unsigned long) sframep; - regs->tpc = (unsigned long) sa->sa_handler; - regs->tnpc = (regs->tpc + 4); - if (test_thread_flag(TIF_32BIT)) { - 
regs->tpc &= 0xffffffff; - regs->tnpc &= 0xffffffff; - } - return; - -sigsegv: - force_sigsegv(signr, current); -} - - static int save_fpu_state32(struct pt_regs *regs, __siginfo_fpu_t __user *fpu) { unsigned long *fpregs = current_thread_info()->fpregs; @@ -663,10 +428,10 @@ static int save_fpu_state32(struct pt_regs *regs, __siginfo_fpu_t __user *fpu) return err; } -static void new_setup_frame32(struct k_sigaction *ka, struct pt_regs *regs, - int signo, sigset_t *oldset) +static void setup_frame32(struct k_sigaction *ka, struct pt_regs *regs, + int signo, sigset_t *oldset) { - struct new_signal_frame32 __user *sf; + struct signal_frame32 __user *sf; int sigframe_size; u32 psr; int i, err; @@ -676,11 +441,11 @@ static void new_setup_frame32(struct k_sigaction *ka, struct pt_regs *regs, synchronize_user_stack(); save_and_clear_fpu(); - sigframe_size = NF_ALIGNEDSZ; + sigframe_size = SF_ALIGNEDSZ; if (!(current_thread_info()->fpsaved[0] & FPRS_FEF)) sigframe_size -= sizeof(__siginfo_fpu_t); - sf = (struct new_signal_frame32 __user *) + sf = (struct signal_frame32 __user *) get_sigframe(&ka->sa, regs, sigframe_size); if (invalid_frame_pointer(sf, sigframe_size)) @@ -944,10 +709,9 @@ static inline void handle_signal32(unsigned long signr, struct k_sigaction *ka, { if (ka->sa.sa_flags & SA_SIGINFO) setup_rt_frame32(ka, regs, signr, oldset, info); - else if (test_thread_flag(TIF_NEWSIGNALS)) - new_setup_frame32(ka, regs, signr, oldset); else - setup_frame32(&ka->sa, regs, signr, oldset, info); + setup_frame32(ka, regs, signr, oldset); + spin_lock_irq(&current->sighand->siglock); sigorsets(&current->blocked,&current->blocked,&ka->sa.sa_mask); if (!(ka->sa.sa_flags & SA_NOMASK)) diff --git a/arch/sparc64/kernel/sys_sparc32.c b/arch/sparc64/kernel/sys_sparc32.c index c1a61e98899a..161ce4710fe7 100644 --- a/arch/sparc64/kernel/sys_sparc32.c +++ b/arch/sparc64/kernel/sys_sparc32.c @@ -554,10 +554,8 @@ asmlinkage long compat_sys_sigaction(int sig, struct old_sigaction32 __user *act struct
k_sigaction new_ka, old_ka; int ret; - if (sig < 0) { - set_thread_flag(TIF_NEWSIGNALS); - sig = -sig; - } + WARN_ON_ONCE(sig >= 0); + sig = -sig; if (act) { compat_old_sigset_t mask; @@ -601,11 +599,6 @@ asmlinkage long compat_sys_rt_sigaction(int sig, if (sigsetsize != sizeof(compat_sigset_t)) return -EINVAL; - /* All tasks which use RT signals (effectively) use - * new style signals. - */ - set_thread_flag(TIF_NEWSIGNALS); - if (act) { u32 u_handler, u_restorer; diff --git a/include/asm-sparc/processor.h b/include/asm-sparc/processor.h index e3006979709b..8898efbbbe07 100644 --- a/include/asm-sparc/processor.h +++ b/include/asm-sparc/processor.h @@ -1,5 +1,4 @@ -/* $Id: processor.h,v 1.83 2001/10/08 09:32:13 davem Exp $ - * include/asm-sparc/processor.h +/* include/asm-sparc/processor.h * * Copyright (C) 1994 David S. Miller (davem@caip.rutgers.edu) */ @@ -65,7 +64,6 @@ struct thread_struct { struct fpq fpqueue[16]; unsigned long flags; mm_segment_t current_ds; - int new_signal; }; #define SPARC_FLAG_KTHREAD 0x1 /* task is a kernel thread */ diff --git a/include/asm-sparc64/thread_info.h b/include/asm-sparc64/thread_info.h index 98252cd44dd6..71e42d1a80d9 100644 --- a/include/asm-sparc64/thread_info.h +++ b/include/asm-sparc64/thread_info.h @@ -1,5 +1,4 @@ -/* $Id: thread_info.h,v 1.1 2002/02/10 00:00:58 davem Exp $ - * thread_info.h: sparc64 low-level thread information +/* thread_info.h: sparc64 low-level thread information * * Copyright (C) 2002 David S. 
Miller (davem@redhat.com) */ @@ -223,7 +222,7 @@ register struct thread_info *current_thread_info_reg asm("g6"); #define TIF_NEED_RESCHED 3 /* rescheduling necessary */ #define TIF_PERFCTR 4 /* performance counters active */ #define TIF_UNALIGNED 5 /* allowed to do unaligned accesses */ -#define TIF_NEWSIGNALS 6 /* wants new-style signals */ +/* flag bit 6 is available */ #define TIF_32BIT 7 /* 32-bit binary */ /* flag bit 8 is available */ #define TIF_SECCOMP 9 /* secure computing */ @@ -242,7 +241,6 @@ register struct thread_info *current_thread_info_reg asm("g6"); #define _TIF_NEED_RESCHED (1< Date: Sat, 26 Apr 2008 14:10:16 -0700 Subject: hrtimer: timeout too long when using HRTIMER_CB_SOFTIRQ When using hrtimer with timer->cb_mode == HRTIMER_CB_SOFTIRQ in some cases the clockevent is not programmed. This happens if: - a timer is rearmed while its state is HRTIMER_STATE_CALLBACK - hrtimer_reprogram() returns -ETIME, when it is called after CALLBACK is finished. This occurs if the new timer->expires is in the past when CALLBACK is done. In this case, the timer needs to be removed from the tree and put onto the pending list again. The patch is against 2.6.22.5, but AFAICS, it is relevant for 2.6.25 also (in run_hrtimer_pending()). Signed-off-by: Bodo Stroesser Cc: stable@kernel.org Signed-off-by: Thomas Gleixner --- kernel/hrtimer.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/kernel/hrtimer.c b/kernel/hrtimer.c index f78777abe769..e379ef0e9c20 100644 --- a/kernel/hrtimer.c +++ b/kernel/hrtimer.c @@ -1080,8 +1080,19 @@ static void run_hrtimer_pending(struct hrtimer_cpu_base *cpu_base) * If the timer was rearmed on another CPU, reprogram * the event device. */ - if (timer->base->first == &timer->node) - hrtimer_reprogram(timer, timer->base); + struct hrtimer_clock_base *base = timer->base; + + if (base->first == &timer->node && + hrtimer_reprogram(timer, base)) { + /* + * Timer is expired.
Thus move it from tree to + * pending list again. + */ + __remove_hrtimer(timer, base, + HRTIMER_STATE_PENDING, 0); + list_add_tail(&timer->cb_entry, + &base->cpu_base->cb_pending); + } } } spin_unlock_irq(&cpu_base->lock); -- cgit v1.2.3 From 86cf02f8eaea1b09e102e0f432fc137dc5cf4407 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Sun, 27 Apr 2008 11:59:30 -0700 Subject: x86 PAT: tone down debugging messages some more Ingo already fixed one of these at my request (in "x86 PAT: tone down debugging messages", commit 1ebcc654f010d4a63f3ebf8ddd2cab5a709b1824), but there was another one he missed. Signed-off-by: Linus Torvalds --- arch/x86/mm/pat.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index e7ca7fc48d12..b17cdf64e41e 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -387,8 +387,8 @@ int reserve_memtype(u64 start, u64 end, unsigned long req_type, break; } - printk(KERN_INFO "Overlap at 0x%Lx-0x%Lx\n", - saved_ptr->start, saved_ptr->end); + pr_debug(KERN_INFO "Overlap at 0x%Lx-0x%Lx\n", + saved_ptr->start, saved_ptr->end); /* No conflict. Go ahead and add this new entry */ list_add(&new_entry->nd, &saved_ptr->nd); new_entry = NULL; -- cgit v1.2.3 From 5e2c433d9f84dd9b0e01ef8607380d53a7f64d69 Mon Sep 17 00:00:00 2001 From: YOSHIFUJI Hideaki Date: Sat, 26 Apr 2008 22:24:10 -0700 Subject: [XFRM] AUDIT: Fix flowlabel text format ambiguity. Flowlabel text format was not correct and thus ambiguous. For example, 0x00123 and 0x01203 are both formatted as 0x0123. This is not what audit tools want. Signed-off-by: YOSHIFUJI Hideaki Signed-off-by: David S.
Miller --- net/xfrm/xfrm_state.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/xfrm/xfrm_state.c b/net/xfrm/xfrm_state.c index 5dcc10b93c86..fac27ce770d5 100644 --- a/net/xfrm/xfrm_state.c +++ b/net/xfrm/xfrm_state.c @@ -2112,7 +2112,7 @@ static void xfrm_audit_helper_pktinfo(struct sk_buff *skb, u16 family, iph6 = ipv6_hdr(skb); audit_log_format(audit_buf, " src=" NIP6_FMT " dst=" NIP6_FMT - " flowlbl=0x%x%x%x", + " flowlbl=0x%x%02x%02x", NIP6(iph6->saddr), NIP6(iph6->daddr), iph6->flow_lbl[0] & 0x0f, -- cgit v1.2.3 From ec6b486fa9f6d20bfbaebba1db88bfe9d390ab1a Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 26 Apr 2008 22:28:58 -0700 Subject: ipv6: result of csum_fold() is already 16bit, no need to cast Signed-off-by: Al Viro Signed-off-by: David S. Miller --- net/ipv6/ip6mr.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/ip6mr.c b/net/ipv6/ip6mr.c index c8c6e33d1163..2de3c464fe75 100644 --- a/net/ipv6/ip6mr.c +++ b/net/ipv6/ip6mr.c @@ -358,7 +358,7 @@ static int pim6_rcv(struct sk_buff *skb) if (pim->type != ((PIM_VERSION << 4) | PIM_REGISTER) || (pim->flags & PIM_NULL_REGISTER) || (ip_compute_csum((void *)pim, sizeof(*pim)) != 0 && - (u16)csum_fold(skb_checksum(skb, 0, skb->len, 0)))) + csum_fold(skb_checksum(skb, 0, skb->len, 0)))) goto drop; /* check if the inner packet is destined to mcast group */ -- cgit v1.2.3 From becf3da20eca82e06607b9813cc3b592b359bae5 Mon Sep 17 00:00:00 2001 From: Al Viro Date: Sat, 26 Apr 2008 22:42:14 -0700 Subject: tipc: endianness annotations Signed-off-by: Al Viro Signed-off-by: David S. 
Miller --- net/tipc/msg.h | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/net/tipc/msg.h b/net/tipc/msg.h index 6ad070d87702..ad487e8abcc2 100644 --- a/net/tipc/msg.h +++ b/net/tipc/msg.h @@ -70,10 +70,9 @@ static inline void msg_set_bits(struct tipc_msg *m, u32 w, u32 pos, u32 mask, u32 val) { val = (val & mask) << pos; - val = htonl(val); - mask = htonl(mask << pos); - m->hdr[w] &= ~mask; - m->hdr[w] |= val; + mask = mask << pos; + m->hdr[w] &= ~htonl(mask); + m->hdr[w] |= htonl(val); } /* -- cgit v1.2.3 From 5c5d6dabb7aac9d0ea7aa76b909bbd28efa99065 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sat, 26 Apr 2008 22:50:57 -0700 Subject: Revert "ipv6: Fix typo in net/ipv6/Kconfig" This reverts commit 5b3f129c5592ca35b3fe8916767c58b98710478c. As requested by Maciej W. Rozycki. Signed-off-by: David S. Miller --- net/ipv6/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/ipv6/Kconfig b/net/ipv6/Kconfig index b2c9becc02e8..42814a2ec9d7 100644 --- a/net/ipv6/Kconfig +++ b/net/ipv6/Kconfig @@ -167,7 +167,7 @@ config IPV6_SIT Tunneling means encapsulating data of one protocol type within another protocol and sending it over a channel that understands the encapsulating protocol. This driver implements encapsulation of IPv6 - into IPv4 packets. This is useful if you want to connect to IPv6 + into IPv4 packets. This is useful if you want to connect two IPv6 networks over an IPv4-only path. Saying M here will produce a module called sit.ko. If unsure, say Y. -- cgit v1.2.3 From 3f91bd420a955803421f2db17b2e04aacfbb2bb8 Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Sat, 26 Apr 2008 22:57:25 -0700 Subject: can: Fix copy_from_user() results interpretation Both copy_to_user() and copy_from_user() return the number of bytes that failed to reach their destination, not 0/-EXXX values. Based on a patch from Pavel Emelyanov. Signed-off-by: Sam Ravnborg Acked-by: Oliver Hartkopp Signed-off-by: David S.
Miller --- net/can/raw.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/net/can/raw.c b/net/can/raw.c index 201cbfc6b9ec..69877b8e7e9c 100644 --- a/net/can/raw.c +++ b/net/can/raw.c @@ -435,15 +435,13 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, if (!filter) return -ENOMEM; - err = copy_from_user(filter, optval, optlen); - if (err) { + if (copy_from_user(filter, optval, optlen)) { kfree(filter); - return err; + return -EFAULT; } } else if (count == 1) { - err = copy_from_user(&sfilter, optval, optlen); - if (err) - return err; + if (copy_from_user(&sfilter, optval, optlen)) + return -EFAULT; } lock_sock(sk); @@ -493,9 +491,8 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, if (optlen != sizeof(err_mask)) return -EINVAL; - err = copy_from_user(&err_mask, optval, optlen); - if (err) - return err; + if (copy_from_user(&err_mask, optval, optlen)) + return -EFAULT; err_mask &= CAN_ERR_MASK; @@ -531,7 +528,8 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, if (optlen != sizeof(ro->loopback)) return -EINVAL; - err = copy_from_user(&ro->loopback, optval, optlen); + if (copy_from_user(&ro->loopback, optval, optlen)) + return -EFAULT; break; @@ -539,7 +537,8 @@ static int raw_setsockopt(struct socket *sock, int level, int optname, if (optlen != sizeof(ro->recv_own_msgs)) return -EINVAL; - err = copy_from_user(&ro->recv_own_msgs, optval, optlen); + if (copy_from_user(&ro->recv_own_msgs, optval, optlen)) + return -EFAULT; break; -- cgit v1.2.3 From 0b80ae4201e5128e16e5161825f5cd377a5d1fee Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Sat, 26 Apr 2008 22:59:02 -0700 Subject: sunrpc: fix missing kernel-doc Fix missing sunrpc kernel-doc: Warning(linux-2.6.25-git7//net/sunrpc/xprt.c:451): No description found for parameter 'action' Signed-off-by: Randy Dunlap Signed-off-by: David S. 
Miller --- net/sunrpc/xprt.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c index d5553b8179f9..61880cc90e86 100644 --- a/net/sunrpc/xprt.c +++ b/net/sunrpc/xprt.c @@ -445,7 +445,7 @@ EXPORT_SYMBOL_GPL(xprt_wake_pending_tasks); /** * xprt_wait_for_buffer_space - wait for transport output buffer to clear * @task: task to be put to sleep - * + * @action: function pointer to be executed after wait */ void xprt_wait_for_buffer_space(struct rpc_task *task) { -- cgit v1.2.3 From 01a2202c95989a4df48e9a5b5e013cb80c6b2d66 Mon Sep 17 00:00:00 2001 From: Herbert Xu Date: Sun, 27 Apr 2008 00:59:59 -0700 Subject: [IPSEC]: Use digest_null directly for auth Previously digest_null had no setkey function, which meant that we used hmac(digest_null) for IPsec since IPsec always calls setkey. Now that digest_null has a setkey we no longer need to do that. In fact when only confidentiality is specified for ESP we already use digest_null directly. However, when the null algorithm is explicitly specified by the user we still opt for hmac(digest_null). This patch removes this discrepancy. I have not added a new compat name for it because by chance it wasn't actually possible for the user to specify the name hmac(digest_null) due to a key length check in xfrm_user (which I found out when testing that compat name :) Signed-off-by: Herbert Xu Signed-off-by: David S.
Miller --- net/xfrm/xfrm_algo.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c index 8aa6440d689f..ac765dd9c7f5 100644 --- a/net/xfrm/xfrm_algo.c +++ b/net/xfrm/xfrm_algo.c @@ -129,8 +129,7 @@ static struct xfrm_algo_desc aead_list[] = { static struct xfrm_algo_desc aalg_list[] = { { - .name = "hmac(digest_null)", - .compat = "digest_null", + .name = "digest_null", .uinfo = { .auth = { -- cgit v1.2.3 From dae50295488f35d2d617b08a5fae43154c947eec Mon Sep 17 00:00:00 2001 From: David L Stevens Date: Sun, 27 Apr 2008 01:06:07 -0700 Subject: ipv4/ipv6 compat: Fix SSM applications on 64bit kernels. Add support on 64-bit kernels for setting 32-bit compatible MCAST* socket options. Signed-off-by: David L Stevens Signed-off-by: David S. Miller --- include/net/compat.h | 3 ++ net/compat.c | 117 +++++++++++++++++++++++++++++++++++++++++++++++ net/ipv4/ip_sockglue.c | 5 ++ net/ipv6/ipv6_sockglue.c | 5 ++ 4 files changed, 130 insertions(+) diff --git a/include/net/compat.h b/include/net/compat.h index 406db242f73a..05fa5d0254ab 100644 --- a/include/net/compat.h +++ b/include/net/compat.h @@ -40,4 +40,7 @@ extern int put_cmsg_compat(struct msghdr*, int, int, int, void *); extern int cmsghdr_from_user_compat_to_kern(struct msghdr *, struct sock *, unsigned char *, int); +extern int compat_mc_setsockopt(struct sock *, int, int, char __user *, int, + int (*)(struct sock *, int, int, char __user *, int)); + #endif /* NET_COMPAT_H */ diff --git a/net/compat.c b/net/compat.c index 80013fb69a61..01bf95d0832e 100644 --- a/net/compat.c +++ b/net/compat.c @@ -24,6 +24,8 @@ #include #include +#include +#include #include #include @@ -521,6 +523,121 @@ asmlinkage long compat_sys_getsockopt(int fd, int level, int optname, } return err; } + +struct compat_group_req { + __u32 gr_interface; + struct __kernel_sockaddr_storage gr_group + __attribute__ ((aligned(4))); +} __attribute__ ((packed)); + +struct compat_group_source_req { +
__u32 gsr_interface; + struct __kernel_sockaddr_storage gsr_group + __attribute__ ((aligned(4))); + struct __kernel_sockaddr_storage gsr_source + __attribute__ ((aligned(4))); +} __attribute__ ((packed)); + +struct compat_group_filter { + __u32 gf_interface; + struct __kernel_sockaddr_storage gf_group + __attribute__ ((aligned(4))); + __u32 gf_fmode; + __u32 gf_numsrc; + struct __kernel_sockaddr_storage gf_slist[1] + __attribute__ ((aligned(4))); +} __attribute__ ((packed)); + + +int compat_mc_setsockopt(struct sock *sock, int level, int optname, + char __user *optval, int optlen, + int (*setsockopt)(struct sock *,int,int,char __user *,int)) +{ + char __user *koptval = optval; + int koptlen = optlen; + + switch (optname) { + case MCAST_JOIN_GROUP: + case MCAST_LEAVE_GROUP: + { + struct compat_group_req __user *gr32 = (void *)optval; + struct group_req __user *kgr = + compat_alloc_user_space(sizeof(struct group_req)); + u32 interface; + + if (!access_ok(VERIFY_READ, gr32, sizeof(*gr32)) || + !access_ok(VERIFY_WRITE, kgr, sizeof(struct group_req)) || + __get_user(interface, &gr32->gr_interface) || + __put_user(interface, &kgr->gr_interface) || + copy_in_user(&kgr->gr_group, &gr32->gr_group, + sizeof(kgr->gr_group))) + return -EFAULT; + koptval = (char __user *)kgr; + koptlen = sizeof(struct group_req); + break; + } + case MCAST_JOIN_SOURCE_GROUP: + case MCAST_LEAVE_SOURCE_GROUP: + case MCAST_BLOCK_SOURCE: + case MCAST_UNBLOCK_SOURCE: + { + struct compat_group_source_req __user *gsr32 = (void *)optval; + struct group_source_req *kgsr = compat_alloc_user_space( + sizeof(struct group_source_req)); + u32 interface; + + if (!access_ok(VERIFY_READ, gsr32, sizeof(*gsr32)) || + !access_ok(VERIFY_WRITE, kgsr, + sizeof(struct group_source_req)) || + __get_user(interface, &gsr32->gsr_interface) || + __put_user(interface, &kgsr->gsr_interface) || + copy_in_user(&kgsr->gsr_group, &gsr32->gsr_group, + sizeof(kgsr->gsr_group)) || + copy_in_user(&kgsr->gsr_source, 
&gsr32->gsr_source, + sizeof(kgsr->gsr_source))) + return -EFAULT; + koptval = (char __user *)kgsr; + koptlen = sizeof(struct group_source_req); + break; + } + case MCAST_MSFILTER: + { + struct compat_group_filter __user *gf32 = (void *)optval; + struct group_filter *kgf; + u32 interface, fmode, numsrc; + + if (!access_ok(VERIFY_READ, gf32, sizeof(*gf32)) || + __get_user(interface, &gf32->gf_interface) || + __get_user(fmode, &gf32->gf_fmode) || + __get_user(numsrc, &gf32->gf_numsrc)) + return -EFAULT; + koptlen = optlen + sizeof(struct group_filter) - + sizeof(struct compat_group_filter); + if (koptlen < GROUP_FILTER_SIZE(numsrc)) + return -EINVAL; + kgf = compat_alloc_user_space(koptlen); + if (!access_ok(VERIFY_WRITE, kgf, koptlen) || + __put_user(interface, &kgf->gf_interface) || + __put_user(fmode, &kgf->gf_fmode) || + __put_user(numsrc, &kgf->gf_numsrc) || + copy_in_user(&kgf->gf_group, &gf32->gf_group, + sizeof(kgf->gf_group)) || + (numsrc && copy_in_user(&kgf->gf_slist, &gf32->gf_slist, + numsrc * sizeof(kgf->gf_slist[0])))) + return -EFAULT; + koptval = (char __user *)kgf; + break; + } + + default: + break; + } + return setsockopt(sock, level, optname, koptval, koptlen); +} + +EXPORT_SYMBOL(compat_mc_setsockopt); + + /* Argument list sizes for compat_sys_socketcall */ #define AL(x) ((x) * sizeof(u32)) static unsigned char nas[18]={AL(0),AL(3),AL(3),AL(3),AL(2),AL(3), diff --git a/net/ipv4/ip_sockglue.c b/net/ipv4/ip_sockglue.c index d8adfd4972e2..4d8d95404f45 100644 --- a/net/ipv4/ip_sockglue.c +++ b/net/ipv4/ip_sockglue.c @@ -36,6 +36,7 @@ #include #include #include +#include #if defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE) #include #endif @@ -923,6 +924,10 @@ int compat_ip_setsockopt(struct sock *sk, int level, int optname, if (level != SOL_IP) return -ENOPROTOOPT; + if (optname >= MCAST_JOIN_GROUP && optname <= MCAST_MSFILTER) + return compat_mc_setsockopt(sk, level, optname, optval, optlen, + ip_setsockopt); + err = do_ip_setsockopt(sk, level, 
optname, optval, optlen); #ifdef CONFIG_NETFILTER /* we need to exclude all possible ENOPROTOOPTs except default case */ diff --git a/net/ipv6/ipv6_sockglue.c b/net/ipv6/ipv6_sockglue.c index 06de9d0e1f6b..db6fdc1498aa 100644 --- a/net/ipv6/ipv6_sockglue.c +++ b/net/ipv6/ipv6_sockglue.c @@ -52,6 +52,7 @@ #include #include #include +#include #include @@ -779,6 +780,10 @@ int compat_ipv6_setsockopt(struct sock *sk, int level, int optname, if (level != SOL_IPV6) return -ENOPROTOOPT; + if (optname >= MCAST_JOIN_GROUP && optname <= MCAST_MSFILTER) + return compat_mc_setsockopt(sk, level, optname, optval, optlen, + ipv6_setsockopt); + err = do_ipv6_setsockopt(sk, level, optname, optval, optlen); #ifdef CONFIG_NETFILTER /* we need to exclude all possible ENOPROTOOPTs except default case */ -- cgit v1.2.3 From 90888816ba1bf1c4eff1e7e4220c1afc802f0fd3 Mon Sep 17 00:00:00 2001 From: "David S. Miller" Date: Sun, 27 Apr 2008 14:52:51 -0700 Subject: sparc64: Clean up handling of pt_regs trap type encoding. If we use this from more than one place, it's better to have helpers instead of twiddling magic constants all over. Add pt_regs_trap_type(), pt_regs_clear_trap_type(), and pt_regs_is_syscall(). Use them in do_signal(). Signed-off-by: David S. 
Miller --- arch/sparc64/kernel/signal.c | 7 +++---- include/asm-sparc64/ptrace.h | 23 ++++++++++++++++++++++- 2 files changed, 25 insertions(+), 5 deletions(-) diff --git a/arch/sparc64/kernel/signal.c b/arch/sparc64/kernel/signal.c index 314f51aefa0f..f2d88d8f7a42 100644 --- a/arch/sparc64/kernel/signal.c +++ b/arch/sparc64/kernel/signal.c @@ -513,11 +513,10 @@ static void do_signal(struct pt_regs *regs, unsigned long orig_i0) struct k_sigaction ka; sigset_t *oldset; siginfo_t info; - int signr, tt; + int signr; - tt = regs->magic & 0x1ff; - if (tt == 0x110 || tt == 0x111 || tt == 0x16d) { - regs->magic &= ~0x1ff; + if (pt_regs_is_syscall(regs)) { + pt_regs_clear_trap_type(regs); cookie.restart_syscall = 1; } else cookie.restart_syscall = 0; diff --git a/include/asm-sparc64/ptrace.h b/include/asm-sparc64/ptrace.h index b4b951d570bb..714b81956f32 100644 --- a/include/asm-sparc64/ptrace.h +++ b/include/asm-sparc64/ptrace.h @@ -1,4 +1,3 @@ -/* $Id: ptrace.h,v 1.14 2002/02/09 19:49:32 davem Exp $ */ #ifndef _SPARC64_PTRACE_H #define _SPARC64_PTRACE_H @@ -8,10 +7,15 @@ * stack during a system call and basically all traps. */ +/* This magic value must have the low 9 bits clear, + * as that is where we encode the %tt value, see below. + */ #define PT_REGS_MAGIC 0x57ac6c00 #ifndef __ASSEMBLY__ +#include + struct pt_regs { unsigned long u_regs[16]; /* globals and ins */ unsigned long tstate; @@ -33,6 +37,23 @@ struct pt_regs { unsigned int magic; }; +static inline int pt_regs_trap_type(struct pt_regs *regs) +{ + return regs->magic & 0x1ff; +} + +static inline int pt_regs_clear_trap_type(struct pt_regs *regs) +{ + return regs->magic &= ~0x1ff; +} + +static inline bool pt_regs_is_syscall(struct pt_regs *regs) +{ + int tt = pt_regs_trap_type(regs); + + return (tt == 0x110 || tt == 0x111 || tt == 0x16d); +} + struct pt_regs32 { unsigned int psr; unsigned int pc; -- cgit v1.2.3 From fd7354108aa5497d7177b95a6b157eaf8597d621 Mon Sep 17 00:00:00 2001 From: "David S. 
Miller" Date: Sun, 27 Apr 2008 14:54:02 -0700 Subject: sparc64: Fix accidental syscall restart on child return from clone/fork/vfork. This fixes a regression added by 238468b2ac76020c192a7402c92df5097916bf4a ("[SPARC64]: Use trap type stored in pt_regs to handle syscall restart.") Because we now encode the "returning from syscall" status in the pt_regs area, we have to be mindful to zap it out in the child of a fork. During a parallel kernel build I saw an accidental -EINTR return from vfork() in 'make' because of this bug. Signed-off-by: David S. Miller --- arch/sparc64/kernel/process.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/sparc64/kernel/process.c b/arch/sparc64/kernel/process.c index 500ac6d483a0..056013749157 100644 --- a/arch/sparc64/kernel/process.c +++ b/arch/sparc64/kernel/process.c @@ -591,6 +591,12 @@ int copy_thread(int nr, unsigned long clone_flags, unsigned long sp, if (clone_flags & CLONE_SETTLS) t->kregs->u_regs[UREG_G7] = regs->u_regs[UREG_I3]; + /* We do not want to accidently trigger system call restart + * handling in the new thread. Therefore, clear out the trap + * type, which will make pt_regs_regs_is_syscall() return false. + */ + pt_regs_clear_trap_type(t->kregs); + return 0; } -- cgit v1.2.3 From 2556bf1212c768f567401257582681aa117af4a9 Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sun, 27 Apr 2008 15:16:59 -0700 Subject: sparc: bw2.c fix bw2_exit Fix void function bw2_exit returning value. Signed-off-by: Robert Reif Signed-off-by: David S. 
Miller --- drivers/video/bw2.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/video/bw2.c b/drivers/video/bw2.c index 833b10c84064..bb0bb5532845 100644 --- a/drivers/video/bw2.c +++ b/drivers/video/bw2.c @@ -399,10 +399,9 @@ static int __init bw2_init(void) static void __exit bw2_exit(void) { - return of_unregister_driver(&bw2_driver); + of_unregister_driver(&bw2_driver); } - module_init(bw2_init); module_exit(bw2_exit); -- cgit v1.2.3 From 544330009bc5b879129593236aab29e458ec9fe4 Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sun, 27 Apr 2008 15:17:23 -0700 Subject: sparc: cg14.c make cg14_init and cg14_exit static Make cg14_init and cg14_exit static. Signed-off-by: Robert Reif Signed-off-by: David S. Miller --- drivers/video/cg14.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/cg14.c b/drivers/video/cg14.c index fdc9f43ec30a..359734e9bc13 100644 --- a/drivers/video/cg14.c +++ b/drivers/video/cg14.c @@ -605,7 +605,7 @@ static struct of_platform_driver cg14_driver = { .remove = __devexit_p(cg14_remove), }; -int __init cg14_init(void) +static int __init cg14_init(void) { if (fb_get_options("cg14fb", NULL)) return -ENODEV; @@ -613,7 +613,7 @@ int __init cg14_init(void) return of_register_driver(&cg14_driver, &of_bus_type); } -void __exit cg14_exit(void) +static void __exit cg14_exit(void) { of_unregister_driver(&cg14_driver); } -- cgit v1.2.3 From a2fb0ce7aeae9c38146df9c2d9b763e5981a0683 Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sun, 27 Apr 2008 15:17:49 -0700 Subject: sparc: ffb.c make ffb_init and ffb_exit static Make ffb_init and ffb_exit static. Remove unnecessary function prototype. Signed-off-by: Robert Reif Signed-off-by: David S.
Miller --- drivers/video/ffb.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/video/ffb.c b/drivers/video/ffb.c index d7e24889650e..02a61147799a 100644 --- a/drivers/video/ffb.c +++ b/drivers/video/ffb.c @@ -32,7 +32,6 @@ static int ffb_setcolreg(unsigned, unsigned, unsigned, unsigned, unsigned, struct fb_info *); static int ffb_blank(int, struct fb_info *); -static void ffb_init_fix(struct fb_info *); static void ffb_imageblit(struct fb_info *, const struct fb_image *); static void ffb_fillrect(struct fb_info *, const struct fb_fillrect *); @@ -1062,7 +1061,7 @@ static struct of_platform_driver ffb_driver = { .remove = __devexit_p(ffb_remove), }; -int __init ffb_init(void) +static int __init ffb_init(void) { if (fb_get_options("ffb", NULL)) return -ENODEV; @@ -1070,7 +1069,7 @@ int __init ffb_init(void) return of_register_driver(&ffb_driver, &of_bus_type); } -void __exit ffb_exit(void) +static void __exit ffb_exit(void) { of_unregister_driver(&ffb_driver); } -- cgit v1.2.3 From f36861d550e6f1a1a7a851b88938f52bdaed7682 Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sun, 27 Apr 2008 15:18:12 -0700 Subject: sparc: tcx.c make tcx_init and tcx_exit static Make tcx_init and tcx_exit static. Signed-off-by: Robert Reif Signed-off-by: David S. 
Miller --- drivers/video/tcx.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/video/tcx.c b/drivers/video/tcx.c index fd94dfbab44b..7dc33bfe2cb2 100644 --- a/drivers/video/tcx.c +++ b/drivers/video/tcx.c @@ -527,7 +527,7 @@ static struct of_platform_driver tcx_driver = { .remove = __devexit_p(tcx_remove), }; -int __init tcx_init(void) +static int __init tcx_init(void) { if (fb_get_options("tcxfb", NULL)) return -ENODEV; @@ -535,7 +535,7 @@ int __init tcx_init(void) return of_register_driver(&tcx_driver, &of_bus_type); } -void __exit tcx_exit(void) +static void __exit tcx_exit(void) { of_unregister_driver(&tcx_driver); } -- cgit v1.2.3 From 194f1a68b93e959ede6ec363db4714e630bdbb6a Mon Sep 17 00:00:00 2001 From: Robert Reif Date: Sun, 27 Apr 2008 15:18:57 -0700 Subject: sparc: video drivers: add facility level Add KERN_ facility level to sparc video drivers. Signed-off-by: Robert Reif Signed-off-by: David S. Miller --- drivers/video/bw2.c | 2 +- drivers/video/cg14.c | 2 +- drivers/video/cg3.c | 2 +- drivers/video/cg6.c | 2 +- drivers/video/ffb.c | 2 +- drivers/video/leo.c | 2 +- drivers/video/p9100.c | 2 +- drivers/video/tcx.c | 2 +- 8 files changed, 8 insertions(+), 8 deletions(-) diff --git a/drivers/video/bw2.c b/drivers/video/bw2.c index bb0bb5532845..275d9dab0c61 100644 --- a/drivers/video/bw2.c +++ b/drivers/video/bw2.c @@ -339,7 +339,7 @@ static int __devinit bw2_probe(struct of_device *op, const struct of_device_id * dev_set_drvdata(&op->dev, info); - printk("%s: bwtwo at %lx:%lx\n", + printk(KERN_INFO "%s: bwtwo at %lx:%lx\n", dp->full_name, par->which_io, par->physbase); return 0; diff --git a/drivers/video/cg14.c b/drivers/video/cg14.c index 359734e9bc13..0db0fecba93b 100644 --- a/drivers/video/cg14.c +++ b/drivers/video/cg14.c @@ -556,7 +556,7 @@ static int __devinit cg14_probe(struct of_device *op, const struct of_device_id dev_set_drvdata(&op->dev, info); - printk("%s: cgfourteen at %lx:%lx, %dMB\n", + printk(KERN_INFO 
"%s: cgfourteen at %lx:%lx, %dMB\n", dp->full_name, par->iospace, par->physbase, par->ramsize >> 20); diff --git a/drivers/video/cg3.c b/drivers/video/cg3.c index a5c7fb331527..010ea53978f8 100644 --- a/drivers/video/cg3.c +++ b/drivers/video/cg3.c @@ -419,7 +419,7 @@ static int __devinit cg3_probe(struct of_device *op, dev_set_drvdata(&op->dev, info); - printk("%s: cg3 at %lx:%lx\n", + printk(KERN_INFO "%s: cg3 at %lx:%lx\n", dp->full_name, par->which_io, par->physbase); return 0; diff --git a/drivers/video/cg6.c b/drivers/video/cg6.c index 549891d76ef5..fc90db6da65a 100644 --- a/drivers/video/cg6.c +++ b/drivers/video/cg6.c @@ -781,7 +781,7 @@ static int __devinit cg6_probe(struct of_device *op, dev_set_drvdata(&op->dev, info); - printk("%s: CGsix [%s] at %lx:%lx\n", + printk(KERN_INFO "%s: CGsix [%s] at %lx:%lx\n", dp->full_name, info->fix.id, par->which_io, par->physbase); diff --git a/drivers/video/ffb.c b/drivers/video/ffb.c index 02a61147799a..93dca3e2aa50 100644 --- a/drivers/video/ffb.c +++ b/drivers/video/ffb.c @@ -1000,7 +1000,7 @@ static int __devinit ffb_probe(struct of_device *op, dev_set_drvdata(&op->dev, info); - printk("%s: %s at %016lx, type %d, " + printk(KERN_INFO "%s: %s at %016lx, type %d, " "DAC pnum[%x] rev[%d] manuf_rev[%d]\n", dp->full_name, ((par->flags & FFB_FLAG_AFB) ? 
"AFB" : "FFB"), diff --git a/drivers/video/leo.c b/drivers/video/leo.c index 45b9a5d55dec..f3160fc29795 100644 --- a/drivers/video/leo.c +++ b/drivers/video/leo.c @@ -614,7 +614,7 @@ static int __devinit leo_probe(struct of_device *op, const struct of_device_id * dev_set_drvdata(&op->dev, info); - printk("%s: leo at %lx:%lx\n", + printk(KERN_INFO "%s: leo at %lx:%lx\n", dp->full_name, par->which_io, par->physbase); diff --git a/drivers/video/p9100.c b/drivers/video/p9100.c index 58496061142d..c95874fe9076 100644 --- a/drivers/video/p9100.c +++ b/drivers/video/p9100.c @@ -310,7 +310,7 @@ static int __devinit p9100_probe(struct of_device *op, const struct of_device_id dev_set_drvdata(&op->dev, info); - printk("%s: p9100 at %lx:%lx\n", + printk(KERN_INFO "%s: p9100 at %lx:%lx\n", dp->full_name, par->which_io, par->physbase); diff --git a/drivers/video/tcx.c b/drivers/video/tcx.c index 7dc33bfe2cb2..a71774305772 100644 --- a/drivers/video/tcx.c +++ b/drivers/video/tcx.c @@ -470,7 +470,7 @@ static int __devinit tcx_init_one(struct of_device *op) dev_set_drvdata(&op->dev, info); - printk("%s: TCX at %lx:%lx, %s\n", + printk(KERN_INFO "%s: TCX at %lx:%lx, %s\n", dp->full_name, par->which_io, par->physbase, -- cgit v1.2.3 From 9ae27e0adbf471c7a6b80102e38e1d5a346b3b38 Mon Sep 17 00:00:00 2001 From: Evgeniy Polyakov Date: Sun, 27 Apr 2008 15:27:30 -0700 Subject: tcp: Fix slab corruption with ipv6 and tcp6fuzz From: Evgeniy Polyakov This fixes a regression added by ec3c0982a2dd1e671bad8e9d26c28dcba0039d87 ("[TCP]: TCP_DEFER_ACCEPT updates - process as established") tcp_v6_do_rcv()->tcp_rcv_established(), the latter goes to step5, where eventually skb can be freed via tcp_data_queue() (drop: label), then if check for tcp_defer_accept_check() returns true and thus tcp_rcv_established() returns -1, which forces tcp_v6_do_rcv() to jump to reset: label, which in turn will pass through discard: label and free the same skb again. Tested by Eric Sesterhenn. Signed-off-by: David S. 
Miller Acked-By: Patrick McManus --- net/ipv4/tcp_input.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c index ac9b8482f702..0298f80681f2 100644 --- a/net/ipv4/tcp_input.c +++ b/net/ipv4/tcp_input.c @@ -4925,8 +4925,7 @@ step5: tcp_data_snd_check(sk); tcp_ack_snd_check(sk); - if (tcp_defer_accept_check(sk)) - return -1; + tcp_defer_accept_check(sk); return 0; csum_error: -- cgit v1.2.3 From bd8fd21dfddf51299d782f598cb776b15b7d64cc Mon Sep 17 00:00:00 2001 From: Luca Tettamanti Date: Sun, 27 Apr 2008 15:34:55 -0700 Subject: wireless: Fix compile error with wifi & leds Fix build error caused by commit e82404ad612ebabc65d15c3d59b971cb35c3ff36 ("iwlwifi: Select LEDS_CLASS.") from David Miller: Since MAC80211_LEDS is selected by wireless drivers it must select its own dependencies otherwise a build error may occur (kbuild will select the symbol regardless of "depends" constraints). Signed-off-By: Luca Tettamanti Signed-off-by: David S. Miller --- net/mac80211/Kconfig | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/net/mac80211/Kconfig b/net/mac80211/Kconfig index 520a5180a4f6..a24b459dd45a 100644 --- a/net/mac80211/Kconfig +++ b/net/mac80211/Kconfig @@ -73,7 +73,9 @@ config MAC80211_MESH config MAC80211_LEDS bool "Enable LED triggers" - depends on MAC80211 && LEDS_TRIGGERS + depends on MAC80211 + select NEW_LEDS + select LEDS_TRIGGERS ---help--- This option enables a few LED triggers for different packet receive/transmit events. -- cgit v1.2.3 From 358c12953b88c5a06a57c33eb27c753b2e7934d1 Mon Sep 17 00:00:00 2001 From: Jason Riedy Date: Sun, 27 Apr 2008 15:38:30 -0700 Subject: iwlwifi: Allow building iwl3945 without iwl4965. If IWL3945 ever depends on IWLCORE, the silent, user-invisible IWLWIFI option can go away. Signed-off-by: Jason Riedy Signed-off-by: David S. 
Miller --- drivers/net/wireless/Makefile | 2 +- drivers/net/wireless/iwlwifi/Kconfig | 6 ++++++ 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/net/wireless/Makefile b/drivers/net/wireless/Makefile index c2642bc1d49b..2c343aae38d4 100644 --- a/drivers/net/wireless/Makefile +++ b/drivers/net/wireless/Makefile @@ -56,7 +56,7 @@ obj-$(CONFIG_RTL8187) += rtl8187.o obj-$(CONFIG_ADM8211) += adm8211.o -obj-$(CONFIG_IWLCORE) += iwlwifi/ +obj-$(CONFIG_IWLWIFI) += iwlwifi/ obj-$(CONFIG_RT2X00) += rt2x00/ obj-$(CONFIG_P54_COMMON) += p54/ diff --git a/drivers/net/wireless/iwlwifi/Kconfig b/drivers/net/wireless/iwlwifi/Kconfig index c4e631d14bfe..9a25f550fd16 100644 --- a/drivers/net/wireless/iwlwifi/Kconfig +++ b/drivers/net/wireless/iwlwifi/Kconfig @@ -1,6 +1,11 @@ +config IWLWIFI + bool + default n + config IWLCORE tristate "Intel Wireless Wifi Core" depends on PCI && MAC80211 && WLAN_80211 && EXPERIMENTAL + select IWLWIFI config IWLWIFI_LEDS bool @@ -106,6 +111,7 @@ config IWL3945 tristate "Intel PRO/Wireless 3945ABG/BG Network Connection" depends on PCI && MAC80211 && WLAN_80211 && EXPERIMENTAL select FW_LOADER + select IWLWIFI ---help--- Select to build the driver supporting the: -- cgit v1.2.3 From e392febedb6e1050a1a81a7bd72456a32c88e710 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:08 -0400 Subject: SELinux: avc_ss.h whitespace, syntax, and other cleanups This patch changes avc_ss.h to fix whitespace and syntax issues. 
Things that are fixed may include (does not have to include) whitespace at end of lines spaces followed by tabs spaces used instead of tabs spacing around parentheses location of { around structs and else clauses location of * in pointer declarations removal of initialization of static data to keep it in the right section useless {} in if statements useless checking for NULL before kfree fixing of the indentation depth of switch statements no assignments in if statements and any number of other things I forgot to mention Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/include/avc_ss.h | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/security/selinux/include/avc_ss.h b/security/selinux/include/avc_ss.h index ff869e8b6f4a..c0d314d9f8e1 100644 --- a/security/selinux/include/avc_ss.h +++ b/security/selinux/include/avc_ss.h @@ -10,22 +10,19 @@ int avc_ss_reset(u32 seqno); -struct av_perm_to_string -{ +struct av_perm_to_string { u16 tclass; u32 value; const char *name; }; -struct av_inherit -{ +struct av_inherit { u16 tclass; const char **common_pts; u32 common_base; }; -struct selinux_class_perm -{ +struct selinux_class_perm { const struct av_perm_to_string *av_perm_to_string; u32 av_pts_len; const char **class_to_string; -- cgit v1.2.3 From cc03766aaf0b670581ec2bd5cba2b9051d14df8d Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:09 -0400 Subject: SELinux: netlabel.h whitespace, syntax, and other cleanups This patch changes netlabel.h to fix whitespace and syntax issues.
Things that are fixed may include (does not not have to include) spaces used instead of tabs Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/include/netlabel.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/selinux/include/netlabel.h b/security/selinux/include/netlabel.h index 9a9e7cd9a379..487a7d81fe20 100644 --- a/security/selinux/include/netlabel.h +++ b/security/selinux/include/netlabel.h @@ -64,7 +64,7 @@ static inline void selinux_netlbl_cache_invalidate(void) } static inline void selinux_netlbl_sk_security_reset( - struct sk_security_struct *ssec, + struct sk_security_struct *ssec, int family) { return; -- cgit v1.2.3 From a936b79bdf97285e0274eca7b656fc6350ca57ea Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:10 -0400 Subject: SELinux: objsec.h whitespace, syntax, and other cleanups This patch changes objsec.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) whitespace at end of lines spaces followed by tabs spaces used instead of tabs spacing around parenthesis location of { around structs and else clauses location of * in pointer declarations removal of initialization of static data to keep it in the right section useless {} in if statemetns useless checking for NULL before kfree fixing of the indentation depth of switch statements no assignments in if statements and any number of other things I forgot to mention Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/include/objsec.h | 60 +++++++++++++++++++-------------------- 1 file changed, 30 insertions(+), 30 deletions(-) diff --git a/security/selinux/include/objsec.h b/security/selinux/include/objsec.h index 300b61bad7b3..032c2357dad1 100644 --- a/security/selinux/include/objsec.h +++ b/security/selinux/include/objsec.h @@ -4,16 +4,16 @@ * This file contains the SELinux security data structures for kernel objects. 
* * Author(s): Stephen Smalley, - * Chris Vance, - * Wayne Salamon, - * James Morris + * Chris Vance, + * Wayne Salamon, + * James Morris * * Copyright (C) 2001,2002 Networks Associates Technology, Inc. * Copyright (C) 2003 Red Hat, Inc., James Morris * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License version 2, - * as published by the Free Software Foundation. + * as published by the Free Software Foundation. */ #ifndef _SELINUX_OBJSEC_H_ #define _SELINUX_OBJSEC_H_ @@ -28,58 +28,58 @@ #include "avc.h" struct task_security_struct { - u32 osid; /* SID prior to last execve */ - u32 sid; /* current SID */ - u32 exec_sid; /* exec SID */ - u32 create_sid; /* fscreate SID */ - u32 keycreate_sid; /* keycreate SID */ - u32 sockcreate_sid; /* fscreate SID */ + u32 osid; /* SID prior to last execve */ + u32 sid; /* current SID */ + u32 exec_sid; /* exec SID */ + u32 create_sid; /* fscreate SID */ + u32 keycreate_sid; /* keycreate SID */ + u32 sockcreate_sid; /* fscreate SID */ }; struct inode_security_struct { - struct inode *inode; /* back pointer to inode object */ - struct list_head list; /* list of inode_security_struct */ - u32 task_sid; /* SID of creating task */ - u32 sid; /* SID of this object */ - u16 sclass; /* security class of this object */ - unsigned char initialized; /* initialization flag */ + struct inode *inode; /* back pointer to inode object */ + struct list_head list; /* list of inode_security_struct */ + u32 task_sid; /* SID of creating task */ + u32 sid; /* SID of this object */ + u16 sclass; /* security class of this object */ + unsigned char initialized; /* initialization flag */ struct mutex lock; - unsigned char inherit; /* inherit SID from parent entry */ + unsigned char inherit; /* inherit SID from parent entry */ }; struct file_security_struct { - u32 sid; /* SID of open file description */ - u32 fown_sid; /* SID of file owner (for SIGIO) */ - u32 isid; /* SID of inode 
at the time of file open */ - u32 pseqno; /* Policy seqno at the time of file open */ + u32 sid; /* SID of open file description */ + u32 fown_sid; /* SID of file owner (for SIGIO) */ + u32 isid; /* SID of inode at the time of file open */ + u32 pseqno; /* Policy seqno at the time of file open */ }; struct superblock_security_struct { - struct super_block *sb; /* back pointer to sb object */ - struct list_head list; /* list of superblock_security_struct */ + struct super_block *sb; /* back pointer to sb object */ + struct list_head list; /* list of superblock_security_struct */ u32 sid; /* SID of file system superblock */ u32 def_sid; /* default SID for labeling */ u32 mntpoint_sid; /* SECURITY_FS_USE_MNTPOINT context for files */ - unsigned int behavior; /* labeling behavior */ - unsigned char initialized; /* initialization flag */ + unsigned int behavior; /* labeling behavior */ + unsigned char initialized; /* initialization flag */ unsigned char flags; /* which mount options were specified */ - unsigned char proc; /* proc fs */ + unsigned char proc; /* proc fs */ struct mutex lock; struct list_head isec_head; spinlock_t isec_lock; }; struct msg_security_struct { - u32 sid; /* SID of message */ + u32 sid; /* SID of message */ }; struct ipc_security_struct { u16 sclass; /* security class of this object */ - u32 sid; /* SID of IPC resource */ + u32 sid; /* SID of IPC resource */ }; struct bprm_security_struct { - u32 sid; /* SID for transformed process */ + u32 sid; /* SID for transformed process */ unsigned char set; /* @@ -123,7 +123,7 @@ struct sk_security_struct { }; struct key_security_struct { - u32 sid; /* SID of key */ + u32 sid; /* SID of key */ }; extern unsigned int selinux_checkreqprot; -- cgit v1.2.3 From b19d8eae99dae42bb747954fdbb2cd456922eb5f Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:11 -0400 Subject: SELinux: selinux/include/security.h whitespace, syntax, and other cleanups This patch changes selinux/include/security.h 
to fix whitespace and syntax issues. Things that are fixed may include (does not have to include) whitespace at end of lines spaces followed by tabs spaces used instead of tabs spacing around parentheses location of { around structs and else clauses location of * in pointer declarations removal of initialization of static data to keep it in the right section useless {} in if statements useless checking for NULL before kfree fixing of the indentation depth of switch statements no assignments in if statements and any number of other things I forgot to mention Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/include/security.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/selinux/include/security.h b/security/selinux/include/security.h index 1904c462a605..6445b6440648 100644 --- a/security/selinux/include/security.h +++ b/security/selinux/include/security.h @@ -62,7 +62,7 @@ enum { extern int selinux_policycap_netpeer; extern int selinux_policycap_openperm; -int security_load_policy(void * data, size_t len); +int security_load_policy(void *data, size_t len); int security_policycap_supported(unsigned int req_cap); @@ -110,7 +110,7 @@ int security_node_sid(u16 domain, void *addr, u32 addrlen, u32 *out_sid); int security_validate_transition(u32 oldsid, u32 newsid, u32 tasksid, - u16 tclass); + u16 tclass); int security_sid_mls_copy(u32 sid, u32 mls_sid, u32 *new_sid); -- cgit v1.2.3 From ccb3cbeb4f285a02103ded5298850a21e7028ba4 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:12 -0400 Subject: SELinux: ss/conditional.h whitespace, syntax, and other cleanups This patch changes ss/conditional.h to fix whitespace and syntax issues.
Things that are fixed may include (does not have to include) location of * in pointer declarations Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/conditional.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/security/selinux/ss/conditional.h b/security/selinux/ss/conditional.h index f3a1fc6e5d66..65b9f8366e9c 100644 --- a/security/selinux/ss/conditional.h +++ b/security/selinux/ss/conditional.h @@ -59,10 +59,10 @@ struct cond_node { struct cond_node *next; }; -int cond_policydb_init(struct policydb* p); -void cond_policydb_destroy(struct policydb* p); +int cond_policydb_init(struct policydb *p); +void cond_policydb_destroy(struct policydb *p); -int cond_init_bool_indexes(struct policydb* p); +int cond_init_bool_indexes(struct policydb *p); int cond_destroy_bool(void *key, void *datum, void *p); int cond_index_bool(void *key, void *datum, void *datap); -- cgit v1.2.3 From 81fa42df78511e3bdbc0ea545990bda6a5b3e7de Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:13 -0400 Subject: SELinux: context.h whitespace, syntax, and other cleanups This patch changes context.h to fix whitespace and syntax issues.
Things that are fixed may include (does not have to include) include spaces around , in function calls Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/context.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/selinux/ss/context.h b/security/selinux/ss/context.h index 2eee0dab524d..b9a6f7fc62fc 100644 --- a/security/selinux/ss/context.h +++ b/security/selinux/ss/context.h @@ -84,9 +84,9 @@ static inline int mls_context_cmp(struct context *c1, struct context *c2) return 1; return ((c1->range.level[0].sens == c2->range.level[0].sens) && - ebitmap_cmp(&c1->range.level[0].cat,&c2->range.level[0].cat) && + ebitmap_cmp(&c1->range.level[0].cat, &c2->range.level[0].cat) && (c1->range.level[1].sens == c2->range.level[1].sens) && - ebitmap_cmp(&c1->range.level[1].cat,&c2->range.level[1].cat)); + ebitmap_cmp(&c1->range.level[1].cat, &c2->range.level[1].cat)); } static inline void mls_context_destroy(struct context *c) -- cgit v1.2.3 From faff786ce2f7c14f25d29cf61b0634c8f6c4827f Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:14 -0400 Subject: SELinux: hashtab.h whitespace, syntax, and other cleanups This patch changes hashtab.h to fix whitespace and syntax issues. Things that are fixed may include (does not have to include) spaces used instead of tabs Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/hashtab.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/security/selinux/ss/hashtab.h b/security/selinux/ss/hashtab.h index 7e2ff3e3c6d2..953872cd84ab 100644 --- a/security/selinux/ss/hashtab.h +++ b/security/selinux/ss/hashtab.h @@ -40,8 +40,8 @@ struct hashtab_info { * the new hash table otherwise.
*/ struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void *key), - int (*keycmp)(struct hashtab *h, const void *key1, const void *key2), - u32 size); + int (*keycmp)(struct hashtab *h, const void *key1, const void *key2), + u32 size); /* * Inserts the specified (key, datum) pair into the specified hash table. @@ -49,7 +49,7 @@ struct hashtab *hashtab_create(u32 (*hash_value)(struct hashtab *h, const void * * Returns -ENOMEM on memory allocation error, * -EEXIST if there is already an entry with the same key, * -EINVAL for general errors or - * 0 otherwise. + 0 otherwise. */ int hashtab_insert(struct hashtab *h, void *k, void *d); -- cgit v1.2.3 From d497fc87c0e201194c3af75b787178cf4559f84b Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:15 -0400 Subject: SELinux: mls.h whitespace, syntax, and other cleanups This patch changes mls.h to fix whitespace and syntax issues. Things that are fixed may include (does not have to include) spaces used instead of tabs Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/mls.h | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/security/selinux/ss/mls.h b/security/selinux/ss/mls.h index ab53663d9f5f..0fdf6257ef64 100644 --- a/security/selinux/ss/mls.h +++ b/security/selinux/ss/mls.h @@ -13,7 +13,7 @@ /* * Updated: Hewlett-Packard * - * Added support to import/export the MLS label from NetLabel + * Added support to import/export the MLS label from NetLabel * * (c) Copyright Hewlett-Packard Development Company, L.P., 2006 */ @@ -31,7 +31,7 @@ int mls_range_isvalid(struct policydb *p, struct mls_range *r); int mls_level_isvalid(struct policydb *p, struct mls_level *l); int mls_context_to_sid(char oldc, - char **scontext, + char **scontext, struct context *context, struct sidtab *s, u32 def_sid); @@ -49,7 +49,7 @@ int mls_compute_sid(struct context *scontext, struct context *newcontext); int mls_setup_user_range(struct context *fromcon,
struct user_datum *user, - struct context *usercon); + struct context *usercon); #ifdef CONFIG_NETLABEL void mls_export_netlbl_lvl(struct context *context, -- cgit v1.2.3 From 8bf1f3a6c0f7e4092c0c041175a52734600490ba Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:16 -0400 Subject: SELinux: mls_types.h whitespace, syntax, and other cleanups This patch changes mls_types.h to fix whitespace and syntax issues. Things that are fixed may include (does not have to include) spaces used instead of tabs Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/mls_types.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/selinux/ss/mls_types.h b/security/selinux/ss/mls_types.h index 0c692d58d489..b6e943a21061 100644 --- a/security/selinux/ss/mls_types.h +++ b/security/selinux/ss/mls_types.h @@ -31,7 +31,7 @@ static inline int mls_level_eq(struct mls_level *l1, struct mls_level *l2) return 1; return ((l1->sens == l2->sens) && - ebitmap_cmp(&l1->cat, &l2->cat)); + ebitmap_cmp(&l1->cat, &l2->cat)); } static inline int mls_level_dom(struct mls_level *l1, struct mls_level *l2) @@ -40,7 +40,7 @@ static inline int mls_level_dom(struct mls_level *l1, struct mls_level *l2) return 1; return ((l1->sens >= l2->sens) && - ebitmap_contains(&l1->cat, &l2->cat)); + ebitmap_contains(&l1->cat, &l2->cat)); } #define mls_level_incomp(l1, l2) \ -- cgit v1.2.3 From 489a5fd7198d2d2368dd5cf697c841ea4d61ddd1 Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Tue, 22 Apr 2008 17:46:17 -0400 Subject: SELinux: policydb.h whitespace, syntax, and other cleanups This patch changes policydb.h to fix whitespace and syntax issues.
Things that are fixed may include (does not have to include) spaces followed by tabs spaces used instead of tabs location of * in pointer declarations Signed-off-by: Eric Paris Signed-off-by: James Morris --- security/selinux/ss/policydb.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/security/selinux/ss/policydb.h b/security/selinux/ss/policydb.h index ba593a3da877..4253370fda6a 100644 --- a/security/selinux/ss/policydb.h +++ b/security/selinux/ss/policydb.h @@ -12,12 +12,12 @@ * * Updated: Frank Mayer and Karl MacMillan * - * Added conditional policy language extensions + * Added conditional policy language extensions * * Copyright (C) 2004-2005 Trusted Computer Solutions, Inc. * Copyright (C) 2003 - 2004 Tresys Technology, LLC * This program is free software; you can redistribute it and/or modify - * it under the terms of the GNU General Public License as published by + * it under the terms of the GNU General Public License as published by * the Free Software Foundation, version 2. */ @@ -221,7 +221,7 @@ struct policydb { /* type enforcement conditional access vectors and transitions */ struct avtab te_cond_avtab; /* linked list indexing te_cond_avtab by conditional */ - struct cond_node* cond_list; + struct cond_node *cond_list; /* role allows */ struct role_allow *role_allow; @@ -230,10 +230,10 @@ struct policydb { TCP or UDP port numbers, network interfaces and nodes */ struct ocontext *ocontexts[OCON_NUM]; - /* security contexts for files in filesystems that cannot support + /* security contexts for files in filesystems that cannot support a persistent label mapping or use another fixed labeling behavior.
*/ - struct genfs *genfs; + struct genfs *genfs; /* range transitions */ struct range_trans *range_tr; -- cgit v1.2.3 From 7b41b1733ca1d3278c8eb891e17905d7d54f5bfa Mon Sep 17 00:00:00 2001 From: Eric Paris Date: Wed, 23 Apr 2008 14:10:25 -0400 Subject: SELinux: include/security.h whitespace, syntax, and other cleanups This patch changes include/security.h to fix whitespace and syntax issues. Things that are fixed may include (does not not have to include) whitespace at end of lines spaces followed by tabs spaces used instead of tabs spacing around parenthesis location of { around structs and else clauses location of * in pointer declarations removal of initialization of static data to keep it in the right section useless {} in if statemetns useless checking for NULL before kfree fixing of the indentation depth of switch statements no assignments in if statements include spaces around , in function calls and any number of other things I forgot to mention Signed-off-by: Eric Paris Signed-off-by: James Morris --- include/linux/security.h | 898 +++++++++++++++++++++++------------------------ 1 file changed, 449 insertions(+), 449 deletions(-) diff --git a/include/linux/security.h b/include/linux/security.h index 53a34539382a..a90c06376eec 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -46,25 +46,25 @@ struct audit_krule; * These functions are in security/capability.c and are used * as the default capabilities functions */ -extern int cap_capable (struct task_struct *tsk, int cap); -extern int cap_settime (struct timespec *ts, struct timezone *tz); -extern int cap_ptrace (struct task_struct *parent, struct task_struct *child); -extern int cap_capget (struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); -extern int cap_capset_check (struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); -extern void cap_capset_set (struct task_struct *target, 
kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); -extern int cap_bprm_set_security (struct linux_binprm *bprm); -extern void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe); +extern int cap_capable(struct task_struct *tsk, int cap); +extern int cap_settime(struct timespec *ts, struct timezone *tz); +extern int cap_ptrace(struct task_struct *parent, struct task_struct *child); +extern int cap_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); +extern int cap_capset_check(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); +extern void cap_capset_set(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); +extern int cap_bprm_set_security(struct linux_binprm *bprm); +extern void cap_bprm_apply_creds(struct linux_binprm *bprm, int unsafe); extern int cap_bprm_secureexec(struct linux_binprm *bprm); extern int cap_inode_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags); extern int cap_inode_removexattr(struct dentry *dentry, char *name); extern int cap_inode_need_killpriv(struct dentry *dentry); extern int cap_inode_killpriv(struct dentry *dentry); -extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags); -extern void cap_task_reparent_to_init (struct task_struct *p); -extern int cap_task_setscheduler (struct task_struct *p, int policy, struct sched_param *lp); -extern int cap_task_setioprio (struct task_struct *p, int ioprio); -extern int cap_task_setnice (struct task_struct *p, int nice); -extern int cap_syslog (int type); +extern int cap_task_post_setuid(uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags); +extern void cap_task_reparent_to_init(struct task_struct *p); +extern int cap_task_setscheduler(struct task_struct *p, int policy, struct sched_param *lp); +extern int 
cap_task_setioprio(struct task_struct *p, int ioprio); +extern int cap_task_setnice(struct task_struct *p, int nice); +extern int cap_syslog(int type); extern int cap_vm_enough_memory(struct mm_struct *mm, long pages); struct msghdr; @@ -128,7 +128,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) { int i; if (opts->mnt_opts) - for(i = 0; i < opts->num_mnt_opts; i++) + for (i = 0; i < opts->num_mnt_opts; i++) kfree(opts->mnt_opts[i]); kfree(opts->mnt_opts); opts->mnt_opts = NULL; @@ -190,21 +190,21 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @bprm contains the linux_binprm structure. * Return 0 if the hook is successful and permission is granted. * @bprm_check_security: - * This hook mediates the point when a search for a binary handler will - * begin. It allows a check the @bprm->security value which is set in - * the preceding set_security call. The primary difference from - * set_security is that the argv list and envp list are reliably - * available in @bprm. This hook may be called multiple times - * during a single execve; and in each pass set_security is called - * first. - * @bprm contains the linux_binprm structure. + * This hook mediates the point when a search for a binary handler will + * begin. It allows a check the @bprm->security value which is set in + * the preceding set_security call. The primary difference from + * set_security is that the argv list and envp list are reliably + * available in @bprm. This hook may be called multiple times + * during a single execve; and in each pass set_security is called + * first. + * @bprm contains the linux_binprm structure. * Return 0 if the hook is successful and permission is granted. * @bprm_secureexec: - * Return a boolean value (0 or 1) indicating whether a "secure exec" - * is required. The flag is passed in the auxiliary table - * on the initial stack to the ELF interpreter to indicate whether libc - * should enable secure mode. 
- * @bprm contains the linux_binprm structure. + * Return a boolean value (0 or 1) indicating whether a "secure exec" + * is required. The flag is passed in the auxiliary table + * on the initial stack to the ELF interpreter to indicate whether libc + * should enable secure mode. + * @bprm contains the linux_binprm structure. * * Security hooks for filesystem operations. * @@ -221,7 +221,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Check permission before obtaining filesystem statistics for the @mnt * mountpoint. * @dentry is a handle on the superblock for the filesystem. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @sb_mount: * Check permission before an object specified by @dev_name is mounted on * the mount point named by @nd. For an ordinary mount, @dev_name @@ -282,12 +282,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @sb_pivotroot: * Check permission before pivoting the root filesystem. * @old_path contains the path for the new location of the current root (put_old). - * @new_path contains the path for the new root (new_root). + * @new_path contains the path for the new root (new_root). * Return 0 if permission is granted. * @sb_post_pivotroot: * Update module state after a successful pivot. * @old_path contains the path for the old root. - * @new_path contains the path for the new root. + * @new_path contains the path for the new root. * @sb_get_mnt_opts: * Get the security relevant mount options used for a superblock * @sb the superblock to get security mount options from @@ -316,9 +316,9 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @inode_free_security: * @inode contains the inode structure. * Deallocate the inode security structure and set @inode->i_security to - * NULL. + * NULL. 
* @inode_init_security: - * Obtain the security attribute name suffix and value to set on a newly + * Obtain the security attribute name suffix and value to set on a newly * created inode and set up the incore security field for the new inode. * This hook is called by the fs code as part of the inode creation * transaction and provides for atomic labeling of the inode, unlike @@ -349,7 +349,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @new_dentry contains the dentry structure for the new link. * Return 0 if permission is granted. * @inode_unlink: - * Check the permission to remove a hard link to a file. + * Check the permission to remove a hard link to a file. * @dir contains the inode structure of parent directory of the file. * @dentry contains the dentry structure for file to be unlinked. * Return 0 if permission is granted. @@ -361,7 +361,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Return 0 if permission is granted. * @inode_mkdir: * Check permissions to create a new directory in the existing directory - * associated with inode strcture @dir. + * associated with inode strcture @dir. * @dir containst the inode structure of parent of the directory to be created. * @dentry contains the dentry structure of new directory. * @mode contains the mode of new directory. @@ -406,7 +406,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * called when the actual read/write operations are performed. * @inode contains the inode structure to check. * @mask contains the permission mask. - * @nd contains the nameidata (may be NULL). + * @nd contains the nameidata (may be NULL). * Return 0 if permission is granted. * @inode_setattr: * Check permission before setting file attributes. Note that the kernel @@ -428,24 +428,24 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * can use this hook to release any persistent label associated with the * inode. 
* @inode_setxattr: - * Check permission before setting the extended attributes - * @value identified by @name for @dentry. - * Return 0 if permission is granted. + * Check permission before setting the extended attributes + * @value identified by @name for @dentry. + * Return 0 if permission is granted. * @inode_post_setxattr: - * Update inode security field after successful setxattr operation. - * @value identified by @name for @dentry. + * Update inode security field after successful setxattr operation. + * @value identified by @name for @dentry. * @inode_getxattr: - * Check permission before obtaining the extended attributes - * identified by @name for @dentry. - * Return 0 if permission is granted. + * Check permission before obtaining the extended attributes + * identified by @name for @dentry. + * Return 0 if permission is granted. * @inode_listxattr: - * Check permission before obtaining the list of extended attribute - * names for @dentry. - * Return 0 if permission is granted. + * Check permission before obtaining the list of extended attribute + * names for @dentry. + * Return 0 if permission is granted. * @inode_removexattr: - * Check permission before removing the extended attribute - * identified by @name for @dentry. - * Return 0 if permission is granted. + * Check permission before removing the extended attribute + * identified by @name for @dentry. + * Return 0 if permission is granted. * @inode_getsecurity: * Retrieve a copy of the extended attribute representation of the * security label associated with @name for @inode via @buffer. Note that @@ -457,7 +457,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Set the security label associated with @name for @inode from the * extended attribute value @value. @size indicates the size of the * @value in bytes. @flags may be XATTR_CREATE, XATTR_REPLACE, or 0. 
- * Note that @name is the remainder of the attribute name after the + * Note that @name is the remainder of the attribute name after the * security. prefix has been removed. * Return 0 on success. * @inode_listsecurity: @@ -564,7 +564,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * struct file, so the file structure (and associated security information) * can always be obtained: * container_of(fown, struct file, f_owner) - * @tsk contains the structure of task receiving signal. + * @tsk contains the structure of task receiving signal. * @fown contains the file owner information. * @sig is the signal that will be sent. When 0, kernel sends SIGIO. * Return 0 if permission is granted. @@ -722,12 +722,12 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @arg5 contains a argument. * Return 0 if permission is granted. * @task_reparent_to_init: - * Set the security attributes in @p->security for a kernel thread that - * is being reparented to the init task. + * Set the security attributes in @p->security for a kernel thread that + * is being reparented to the init task. * @p contains the task_struct for the kernel thread. * @task_to_inode: - * Set the security attributes for an inode based on an associated task's - * security attributes, e.g. for /proc/pid inodes. + * Set the security attributes for an inode based on an associated task's + * security attributes, e.g. for /proc/pid inodes. * @p contains the task_struct for the task. * @inode contains the inode structure for the inode. * @@ -737,7 +737,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Save security information for a netlink message so that permission * checking can be performed when the message is processed. The security * information can be saved using the eff_cap field of the - * netlink_skb_parms structure. Also may be used to provide fine + * netlink_skb_parms structure. 
Also may be used to provide fine * grained control over message transmission. * @sk associated sock of task sending the message., * @skb contains the sk_buff structure for the netlink message. @@ -805,14 +805,14 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @sock contains the socket structure. * @address contains the address to bind to. * @addrlen contains the length of address. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @socket_connect: * Check permission before socket protocol layer connect operation * attempts to connect socket @sock to a remote address, @address. * @sock contains the socket structure. * @address contains the address of remote endpoint. * @addrlen contains the length of address. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @socket_listen: * Check permission before socket protocol layer listen operation. * @sock contains the socket structure. @@ -842,7 +842,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @msg contains the message structure. * @size contains the size of message structure. * @flags contains the operational flags. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @socket_getsockname: * Check permission before the local address (name) of the socket object * @sock is retrieved. @@ -866,7 +866,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @sock contains the socket structure. * @level contains the protocol level to set options for. * @optname contains the name of the option to set. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @socket_shutdown: * Checks permission before all or part of a connection on the socket * @sock is shut down. @@ -893,19 +893,19 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Return 0 if all is well, otherwise, typical getsockopt return * values. 
* @socket_getpeersec_dgram: - * This hook allows the security module to provide peer socket security - * state for udp sockets on a per-packet basis to userspace via - * getsockopt SO_GETPEERSEC. The application must first have indicated - * the IP_PASSSEC option via getsockopt. It can then retrieve the - * security state returned by this hook for a packet via the SCM_SECURITY - * ancillary message type. - * @skb is the skbuff for the packet being queried - * @secdata is a pointer to a buffer in which to copy the security data - * @seclen is the maximum length for @secdata - * Return 0 on success, error on failure. + * This hook allows the security module to provide peer socket security + * state for udp sockets on a per-packet basis to userspace via + * getsockopt SO_GETPEERSEC. The application must first have indicated + * the IP_PASSSEC option via getsockopt. It can then retrieve the + * security state returned by this hook for a packet via the SCM_SECURITY + * ancillary message type. + * @skb is the skbuff for the packet being queried + * @secdata is a pointer to a buffer in which to copy the security data + * @seclen is the maximum length for @secdata + * Return 0 on success, error on failure. * @sk_alloc_security: - * Allocate and attach a security structure to the sk->sk_security field, - * which is used to copy security attributes between local stream sockets. + * Allocate and attach a security structure to the sk->sk_security field, + * which is used to copy security attributes between local stream sockets. * @sk_free_security: * Deallocate security structure. * @sk_clone_security: @@ -920,7 +920,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @inet_csk_clone: * Sets the new child socket's sid to the openreq sid. * @inet_conn_established: - * Sets the connection's peersid to the secmark on skb. + * Sets the connection's peersid to the secmark on skb. * @req_classify_flow: * Sets the flow's sid to the openreq sid. 
* @@ -999,13 +999,13 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * No return value. * @key_permission: * See whether a specific operational right is granted to a process on a - * key. + * key. * @key_ref refers to the key (key pointer + possession attribute bit). * @context points to the process to provide the context against which to - * evaluate the security data on the key. + * evaluate the security data on the key. * @perm describes the combination of permissions required of this key. * Return 1 if permission granted, 0 if permission denied and -ve it the - * normal permissions model should be effected. + * normal permissions model should be effected. * * Security hooks affecting all System V IPC operations. * @@ -1056,7 +1056,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * The @msq may be NULL, e.g. for IPC_INFO or MSG_INFO. * @msq contains the message queue to act upon. May be NULL. * @cmd contains the operation to be performed. - * Return 0 if permission is granted. + * Return 0 if permission is granted. * @msg_queue_msgsnd: * Check permission before a message, @msg, is enqueued on the message * queue, @msq. @@ -1066,8 +1066,8 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Return 0 if permission is granted. * @msg_queue_msgrcv: * Check permission before a message, @msg, is removed from the message - * queue, @msq. The @target task structure contains a pointer to the - * process that will be receiving the message (not equal to the current + * queue, @msq. The @target task structure contains a pointer to the + * process that will be receiving the message (not equal to the current * process when inline receives are being performed). * @msq contains the message queue to retrieve message from. * @msg contains the message destination. @@ -1132,15 +1132,15 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * Return 0 if permission is granted. 
* @sem_semctl: * Check permission when a semaphore operation specified by @cmd is to be - * performed on the semaphore @sma. The @sma may be NULL, e.g. for + * performed on the semaphore @sma. The @sma may be NULL, e.g. for * IPC_INFO or SEM_INFO. * @sma contains the semaphore structure. May be NULL. * @cmd contains the operation to be performed. * Return 0 if permission is granted. * @sem_semop * Check permissions before performing operations on members of the - * semaphore set @sma. If the @alter flag is nonzero, the semaphore set - * may be modified. + * semaphore set @sma. If the @alter flag is nonzero, the semaphore set + * may be modified. * @sma contains the semaphore structure. * @sops contains the operations to perform. * @nsops contains the number of operations to perform. @@ -1211,7 +1211,7 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @syslog: * Check permission before accessing the kernel message ring or changing * logging to the console. - * See the syslog(2) manual page for an explanation of the @type values. + * See the syslog(2) manual page for an explanation of the @type values. * @type contains the type of action. * Return 0 if permission is granted. * @settime: @@ -1223,22 +1223,22 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @vm_enough_memory: * Check permissions for allocating a new virtual mapping. * @mm contains the mm struct it is being added to. - * @pages contains the number of pages. + * @pages contains the number of pages. * Return 0 if permission is granted. * * @register_security: - * allow module stacking. - * @name contains the name of the security module being stacked. - * @ops contains a pointer to the struct security_operations of the module to stack. - * + * allow module stacking. + * @name contains the name of the security module being stacked. + * @ops contains a pointer to the struct security_operations of the module to stack. 
+ * * @secid_to_secctx: * Convert secid to security context. * @secid contains the security ID. * @secdata contains the pointer that stores the converted security context. * @secctx_to_secid: - * Convert security context to secid. - * @secid contains the pointer to the generated security ID. - * @secdata contains the security context. + * Convert security context to secid. + * @secid contains the pointer to the generated security ID. + * @secdata contains the security context. * * @release_secctx: * Release the security context. @@ -1281,49 +1281,49 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) struct security_operations { char name[SECURITY_NAME_MAX + 1]; - int (*ptrace) (struct task_struct * parent, struct task_struct * child); - int (*capget) (struct task_struct * target, - kernel_cap_t * effective, - kernel_cap_t * inheritable, kernel_cap_t * permitted); - int (*capset_check) (struct task_struct * target, - kernel_cap_t * effective, - kernel_cap_t * inheritable, - kernel_cap_t * permitted); - void (*capset_set) (struct task_struct * target, - kernel_cap_t * effective, - kernel_cap_t * inheritable, - kernel_cap_t * permitted); - int (*capable) (struct task_struct * tsk, int cap); - int (*acct) (struct file * file); - int (*sysctl) (struct ctl_table * table, int op); - int (*quotactl) (int cmds, int type, int id, struct super_block * sb); - int (*quota_on) (struct dentry * dentry); + int (*ptrace) (struct task_struct *parent, struct task_struct *child); + int (*capget) (struct task_struct *target, + kernel_cap_t *effective, + kernel_cap_t *inheritable, kernel_cap_t *permitted); + int (*capset_check) (struct task_struct *target, + kernel_cap_t *effective, + kernel_cap_t *inheritable, + kernel_cap_t *permitted); + void (*capset_set) (struct task_struct *target, + kernel_cap_t *effective, + kernel_cap_t *inheritable, + kernel_cap_t *permitted); + int (*capable) (struct task_struct *tsk, int cap); + int (*acct) (struct file *file); + 
int (*sysctl) (struct ctl_table *table, int op); + int (*quotactl) (int cmds, int type, int id, struct super_block *sb); + int (*quota_on) (struct dentry *dentry); int (*syslog) (int type); int (*settime) (struct timespec *ts, struct timezone *tz); int (*vm_enough_memory) (struct mm_struct *mm, long pages); - int (*bprm_alloc_security) (struct linux_binprm * bprm); - void (*bprm_free_security) (struct linux_binprm * bprm); - void (*bprm_apply_creds) (struct linux_binprm * bprm, int unsafe); - void (*bprm_post_apply_creds) (struct linux_binprm * bprm); - int (*bprm_set_security) (struct linux_binprm * bprm); - int (*bprm_check_security) (struct linux_binprm * bprm); - int (*bprm_secureexec) (struct linux_binprm * bprm); - - int (*sb_alloc_security) (struct super_block * sb); - void (*sb_free_security) (struct super_block * sb); - int (*sb_copy_data)(char *orig, char *copy); + int (*bprm_alloc_security) (struct linux_binprm *bprm); + void (*bprm_free_security) (struct linux_binprm *bprm); + void (*bprm_apply_creds) (struct linux_binprm *bprm, int unsafe); + void (*bprm_post_apply_creds) (struct linux_binprm *bprm); + int (*bprm_set_security) (struct linux_binprm *bprm); + int (*bprm_check_security) (struct linux_binprm *bprm); + int (*bprm_secureexec) (struct linux_binprm *bprm); + + int (*sb_alloc_security) (struct super_block *sb); + void (*sb_free_security) (struct super_block *sb); + int (*sb_copy_data) (char *orig, char *copy); int (*sb_kern_mount) (struct super_block *sb, void *data); int (*sb_statfs) (struct dentry *dentry); int (*sb_mount) (char *dev_name, struct path *path, char *type, unsigned long flags, void *data); - int (*sb_check_sb) (struct vfsmount * mnt, struct path *path); - int (*sb_umount) (struct vfsmount * mnt, int flags); - void (*sb_umount_close) (struct vfsmount * mnt); - void (*sb_umount_busy) (struct vfsmount * mnt); - void (*sb_post_remount) (struct vfsmount * mnt, + int (*sb_check_sb) (struct vfsmount *mnt, struct path *path); + int 
(*sb_umount) (struct vfsmount *mnt, int flags); + void (*sb_umount_close) (struct vfsmount *mnt); + void (*sb_umount_busy) (struct vfsmount *mnt); + void (*sb_post_remount) (struct vfsmount *mnt, unsigned long flags, void *data); - void (*sb_post_addmount) (struct vfsmount * mnt, + void (*sb_post_addmount) (struct vfsmount *mnt, struct path *mountpoint); int (*sb_pivotroot) (struct path *old_path, struct path *new_path); @@ -1337,29 +1337,29 @@ struct security_operations { struct super_block *newsb); int (*sb_parse_opts_str) (char *options, struct security_mnt_opts *opts); - int (*inode_alloc_security) (struct inode *inode); + int (*inode_alloc_security) (struct inode *inode); void (*inode_free_security) (struct inode *inode); int (*inode_init_security) (struct inode *inode, struct inode *dir, char **name, void **value, size_t *len); int (*inode_create) (struct inode *dir, - struct dentry *dentry, int mode); + struct dentry *dentry, int mode); int (*inode_link) (struct dentry *old_dentry, - struct inode *dir, struct dentry *new_dentry); + struct inode *dir, struct dentry *new_dentry); int (*inode_unlink) (struct inode *dir, struct dentry *dentry); int (*inode_symlink) (struct inode *dir, - struct dentry *dentry, const char *old_name); + struct dentry *dentry, const char *old_name); int (*inode_mkdir) (struct inode *dir, struct dentry *dentry, int mode); int (*inode_rmdir) (struct inode *dir, struct dentry *dentry); int (*inode_mknod) (struct inode *dir, struct dentry *dentry, - int mode, dev_t dev); + int mode, dev_t dev); int (*inode_rename) (struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry); + struct inode *new_dir, struct dentry *new_dentry); int (*inode_readlink) (struct dentry *dentry); int (*inode_follow_link) (struct dentry *dentry, struct nameidata *nd); int (*inode_permission) (struct inode *inode, int mask, struct nameidata *nd); int (*inode_setattr) (struct dentry *dentry, struct iattr *attr); int 
(*inode_getattr) (struct vfsmount *mnt, struct dentry *dentry); - void (*inode_delete) (struct inode *inode); + void (*inode_delete) (struct inode *inode); int (*inode_setxattr) (struct dentry *dentry, char *name, void *value, size_t size, int flags); void (*inode_post_setxattr) (struct dentry *dentry, char *name, void *value, @@ -1369,145 +1369,145 @@ struct security_operations { int (*inode_removexattr) (struct dentry *dentry, char *name); int (*inode_need_killpriv) (struct dentry *dentry); int (*inode_killpriv) (struct dentry *dentry); - int (*inode_getsecurity)(const struct inode *inode, const char *name, void **buffer, bool alloc); - int (*inode_setsecurity)(struct inode *inode, const char *name, const void *value, size_t size, int flags); - int (*inode_listsecurity)(struct inode *inode, char *buffer, size_t buffer_size); - void (*inode_getsecid)(const struct inode *inode, u32 *secid); - - int (*file_permission) (struct file * file, int mask); - int (*file_alloc_security) (struct file * file); - void (*file_free_security) (struct file * file); - int (*file_ioctl) (struct file * file, unsigned int cmd, + int (*inode_getsecurity) (const struct inode *inode, const char *name, void **buffer, bool alloc); + int (*inode_setsecurity) (struct inode *inode, const char *name, const void *value, size_t size, int flags); + int (*inode_listsecurity) (struct inode *inode, char *buffer, size_t buffer_size); + void (*inode_getsecid) (const struct inode *inode, u32 *secid); + + int (*file_permission) (struct file *file, int mask); + int (*file_alloc_security) (struct file *file); + void (*file_free_security) (struct file *file); + int (*file_ioctl) (struct file *file, unsigned int cmd, unsigned long arg); - int (*file_mmap) (struct file * file, + int (*file_mmap) (struct file *file, unsigned long reqprot, unsigned long prot, unsigned long flags, unsigned long addr, unsigned long addr_only); - int (*file_mprotect) (struct vm_area_struct * vma, + int (*file_mprotect) (struct 
vm_area_struct *vma, unsigned long reqprot, unsigned long prot); - int (*file_lock) (struct file * file, unsigned int cmd); - int (*file_fcntl) (struct file * file, unsigned int cmd, + int (*file_lock) (struct file *file, unsigned int cmd); + int (*file_fcntl) (struct file *file, unsigned int cmd, unsigned long arg); - int (*file_set_fowner) (struct file * file); - int (*file_send_sigiotask) (struct task_struct * tsk, - struct fown_struct * fown, int sig); - int (*file_receive) (struct file * file); - int (*dentry_open) (struct file *file); + int (*file_set_fowner) (struct file *file); + int (*file_send_sigiotask) (struct task_struct *tsk, + struct fown_struct *fown, int sig); + int (*file_receive) (struct file *file); + int (*dentry_open) (struct file *file); int (*task_create) (unsigned long clone_flags); - int (*task_alloc_security) (struct task_struct * p); - void (*task_free_security) (struct task_struct * p); + int (*task_alloc_security) (struct task_struct *p); + void (*task_free_security) (struct task_struct *p); int (*task_setuid) (uid_t id0, uid_t id1, uid_t id2, int flags); int (*task_post_setuid) (uid_t old_ruid /* or fsuid */ , uid_t old_euid, uid_t old_suid, int flags); int (*task_setgid) (gid_t id0, gid_t id1, gid_t id2, int flags); - int (*task_setpgid) (struct task_struct * p, pid_t pgid); - int (*task_getpgid) (struct task_struct * p); - int (*task_getsid) (struct task_struct * p); - void (*task_getsecid) (struct task_struct * p, u32 * secid); + int (*task_setpgid) (struct task_struct *p, pid_t pgid); + int (*task_getpgid) (struct task_struct *p); + int (*task_getsid) (struct task_struct *p); + void (*task_getsecid) (struct task_struct *p, u32 *secid); int (*task_setgroups) (struct group_info *group_info); - int (*task_setnice) (struct task_struct * p, int nice); - int (*task_setioprio) (struct task_struct * p, int ioprio); - int (*task_getioprio) (struct task_struct * p); - int (*task_setrlimit) (unsigned int resource, struct rlimit * new_rlim); 
- int (*task_setscheduler) (struct task_struct * p, int policy, - struct sched_param * lp); - int (*task_getscheduler) (struct task_struct * p); - int (*task_movememory) (struct task_struct * p); - int (*task_kill) (struct task_struct * p, - struct siginfo * info, int sig, u32 secid); - int (*task_wait) (struct task_struct * p); + int (*task_setnice) (struct task_struct *p, int nice); + int (*task_setioprio) (struct task_struct *p, int ioprio); + int (*task_getioprio) (struct task_struct *p); + int (*task_setrlimit) (unsigned int resource, struct rlimit *new_rlim); + int (*task_setscheduler) (struct task_struct *p, int policy, + struct sched_param *lp); + int (*task_getscheduler) (struct task_struct *p); + int (*task_movememory) (struct task_struct *p); + int (*task_kill) (struct task_struct *p, + struct siginfo *info, int sig, u32 secid); + int (*task_wait) (struct task_struct *p); int (*task_prctl) (int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5); - void (*task_reparent_to_init) (struct task_struct * p); - void (*task_to_inode)(struct task_struct *p, struct inode *inode); + void (*task_reparent_to_init) (struct task_struct *p); + void (*task_to_inode) (struct task_struct *p, struct inode *inode); - int (*ipc_permission) (struct kern_ipc_perm * ipcp, short flag); + int (*ipc_permission) (struct kern_ipc_perm *ipcp, short flag); void (*ipc_getsecid) (struct kern_ipc_perm *ipcp, u32 *secid); - int (*msg_msg_alloc_security) (struct msg_msg * msg); - void (*msg_msg_free_security) (struct msg_msg * msg); - - int (*msg_queue_alloc_security) (struct msg_queue * msq); - void (*msg_queue_free_security) (struct msg_queue * msq); - int (*msg_queue_associate) (struct msg_queue * msq, int msqflg); - int (*msg_queue_msgctl) (struct msg_queue * msq, int cmd); - int (*msg_queue_msgsnd) (struct msg_queue * msq, - struct msg_msg * msg, int msqflg); - int (*msg_queue_msgrcv) (struct msg_queue * msq, - struct msg_msg * msg, - struct 
task_struct * target, + int (*msg_msg_alloc_security) (struct msg_msg *msg); + void (*msg_msg_free_security) (struct msg_msg *msg); + + int (*msg_queue_alloc_security) (struct msg_queue *msq); + void (*msg_queue_free_security) (struct msg_queue *msq); + int (*msg_queue_associate) (struct msg_queue *msq, int msqflg); + int (*msg_queue_msgctl) (struct msg_queue *msq, int cmd); + int (*msg_queue_msgsnd) (struct msg_queue *msq, + struct msg_msg *msg, int msqflg); + int (*msg_queue_msgrcv) (struct msg_queue *msq, + struct msg_msg *msg, + struct task_struct *target, long type, int mode); - int (*shm_alloc_security) (struct shmid_kernel * shp); - void (*shm_free_security) (struct shmid_kernel * shp); - int (*shm_associate) (struct shmid_kernel * shp, int shmflg); - int (*shm_shmctl) (struct shmid_kernel * shp, int cmd); - int (*shm_shmat) (struct shmid_kernel * shp, + int (*shm_alloc_security) (struct shmid_kernel *shp); + void (*shm_free_security) (struct shmid_kernel *shp); + int (*shm_associate) (struct shmid_kernel *shp, int shmflg); + int (*shm_shmctl) (struct shmid_kernel *shp, int cmd); + int (*shm_shmat) (struct shmid_kernel *shp, char __user *shmaddr, int shmflg); - int (*sem_alloc_security) (struct sem_array * sma); - void (*sem_free_security) (struct sem_array * sma); - int (*sem_associate) (struct sem_array * sma, int semflg); - int (*sem_semctl) (struct sem_array * sma, int cmd); - int (*sem_semop) (struct sem_array * sma, - struct sembuf * sops, unsigned nsops, int alter); + int (*sem_alloc_security) (struct sem_array *sma); + void (*sem_free_security) (struct sem_array *sma); + int (*sem_associate) (struct sem_array *sma, int semflg); + int (*sem_semctl) (struct sem_array *sma, int cmd); + int (*sem_semop) (struct sem_array *sma, + struct sembuf *sops, unsigned nsops, int alter); - int (*netlink_send) (struct sock * sk, struct sk_buff * skb); - int (*netlink_recv) (struct sk_buff * skb, int cap); + int (*netlink_send) (struct sock *sk, struct sk_buff *skb); 
+ int (*netlink_recv) (struct sk_buff *skb, int cap); /* allow module stacking */ int (*register_security) (const char *name, - struct security_operations *ops); + struct security_operations *ops); void (*d_instantiate) (struct dentry *dentry, struct inode *inode); - int (*getprocattr)(struct task_struct *p, char *name, char **value); - int (*setprocattr)(struct task_struct *p, char *name, void *value, size_t size); - int (*secid_to_secctx)(u32 secid, char **secdata, u32 *seclen); - int (*secctx_to_secid)(char *secdata, u32 seclen, u32 *secid); - void (*release_secctx)(char *secdata, u32 seclen); + int (*getprocattr) (struct task_struct *p, char *name, char **value); + int (*setprocattr) (struct task_struct *p, char *name, void *value, size_t size); + int (*secid_to_secctx) (u32 secid, char **secdata, u32 *seclen); + int (*secctx_to_secid) (char *secdata, u32 seclen, u32 *secid); + void (*release_secctx) (char *secdata, u32 seclen); #ifdef CONFIG_SECURITY_NETWORK - int (*unix_stream_connect) (struct socket * sock, - struct socket * other, struct sock * newsk); - int (*unix_may_send) (struct socket * sock, struct socket * other); + int (*unix_stream_connect) (struct socket *sock, + struct socket *other, struct sock *newsk); + int (*unix_may_send) (struct socket *sock, struct socket *other); int (*socket_create) (int family, int type, int protocol, int kern); - int (*socket_post_create) (struct socket * sock, int family, + int (*socket_post_create) (struct socket *sock, int family, int type, int protocol, int kern); - int (*socket_bind) (struct socket * sock, - struct sockaddr * address, int addrlen); - int (*socket_connect) (struct socket * sock, - struct sockaddr * address, int addrlen); - int (*socket_listen) (struct socket * sock, int backlog); - int (*socket_accept) (struct socket * sock, struct socket * newsock); - void (*socket_post_accept) (struct socket * sock, - struct socket * newsock); - int (*socket_sendmsg) (struct socket * sock, - struct msghdr * msg, 
int size); - int (*socket_recvmsg) (struct socket * sock, - struct msghdr * msg, int size, int flags); - int (*socket_getsockname) (struct socket * sock); - int (*socket_getpeername) (struct socket * sock); - int (*socket_getsockopt) (struct socket * sock, int level, int optname); - int (*socket_setsockopt) (struct socket * sock, int level, int optname); - int (*socket_shutdown) (struct socket * sock, int how); - int (*socket_sock_rcv_skb) (struct sock * sk, struct sk_buff * skb); + int (*socket_bind) (struct socket *sock, + struct sockaddr *address, int addrlen); + int (*socket_connect) (struct socket *sock, + struct sockaddr *address, int addrlen); + int (*socket_listen) (struct socket *sock, int backlog); + int (*socket_accept) (struct socket *sock, struct socket *newsock); + void (*socket_post_accept) (struct socket *sock, + struct socket *newsock); + int (*socket_sendmsg) (struct socket *sock, + struct msghdr *msg, int size); + int (*socket_recvmsg) (struct socket *sock, + struct msghdr *msg, int size, int flags); + int (*socket_getsockname) (struct socket *sock); + int (*socket_getpeername) (struct socket *sock); + int (*socket_getsockopt) (struct socket *sock, int level, int optname); + int (*socket_setsockopt) (struct socket *sock, int level, int optname); + int (*socket_shutdown) (struct socket *sock, int how); + int (*socket_sock_rcv_skb) (struct sock *sk, struct sk_buff *skb); int (*socket_getpeersec_stream) (struct socket *sock, char __user *optval, int __user *optlen, unsigned len); int (*socket_getpeersec_dgram) (struct socket *sock, struct sk_buff *skb, u32 *secid); int (*sk_alloc_security) (struct sock *sk, int family, gfp_t priority); void (*sk_free_security) (struct sock *sk); void (*sk_clone_security) (const struct sock *sk, struct sock *newsk); void (*sk_getsecid) (struct sock *sk, u32 *secid); - void (*sock_graft)(struct sock* sk, struct socket *parent); - int (*inet_conn_request)(struct sock *sk, struct sk_buff *skb, - struct request_sock 
*req); - void (*inet_csk_clone)(struct sock *newsk, const struct request_sock *req); - void (*inet_conn_established)(struct sock *sk, struct sk_buff *skb); - void (*req_classify_flow)(const struct request_sock *req, struct flowi *fl); + void (*sock_graft) (struct sock *sk, struct socket *parent); + int (*inet_conn_request) (struct sock *sk, struct sk_buff *skb, + struct request_sock *req); + void (*inet_csk_clone) (struct sock *newsk, const struct request_sock *req); + void (*inet_conn_established) (struct sock *sk, struct sk_buff *skb); + void (*req_classify_flow) (const struct request_sock *req, struct flowi *fl); #endif /* CONFIG_SECURITY_NETWORK */ #ifdef CONFIG_SECURITY_NETWORK_XFRM @@ -1521,57 +1521,57 @@ struct security_operations { u32 secid); void (*xfrm_state_free_security) (struct xfrm_state *x); int (*xfrm_state_delete_security) (struct xfrm_state *x); - int (*xfrm_policy_lookup)(struct xfrm_sec_ctx *ctx, u32 fl_secid, u8 dir); - int (*xfrm_state_pol_flow_match)(struct xfrm_state *x, - struct xfrm_policy *xp, struct flowi *fl); - int (*xfrm_decode_session)(struct sk_buff *skb, u32 *secid, int ckall); + int (*xfrm_policy_lookup) (struct xfrm_sec_ctx *ctx, u32 fl_secid, u8 dir); + int (*xfrm_state_pol_flow_match) (struct xfrm_state *x, + struct xfrm_policy *xp, + struct flowi *fl); + int (*xfrm_decode_session) (struct sk_buff *skb, u32 *secid, int ckall); #endif /* CONFIG_SECURITY_NETWORK_XFRM */ /* key management security hooks */ #ifdef CONFIG_KEYS - int (*key_alloc)(struct key *key, struct task_struct *tsk, unsigned long flags); - void (*key_free)(struct key *key); - int (*key_permission)(key_ref_t key_ref, - struct task_struct *context, - key_perm_t perm); + int (*key_alloc) (struct key *key, struct task_struct *tsk, unsigned long flags); + void (*key_free) (struct key *key); + int (*key_permission) (key_ref_t key_ref, + struct task_struct *context, + key_perm_t perm); #endif /* CONFIG_KEYS */ #ifdef CONFIG_AUDIT - int (*audit_rule_init)(u32 field, 
u32 op, char *rulestr, void **lsmrule); - int (*audit_rule_known)(struct audit_krule *krule); - int (*audit_rule_match)(u32 secid, u32 field, u32 op, void *lsmrule, - struct audit_context *actx); - void (*audit_rule_free)(void *lsmrule); + int (*audit_rule_init) (u32 field, u32 op, char *rulestr, void **lsmrule); + int (*audit_rule_known) (struct audit_krule *krule); + int (*audit_rule_match) (u32 secid, u32 field, u32 op, void *lsmrule, + struct audit_context *actx); + void (*audit_rule_free) (void *lsmrule); #endif /* CONFIG_AUDIT */ }; /* prototypes */ -extern int security_init (void); +extern int security_init(void); extern int security_module_enable(struct security_operations *ops); -extern int register_security (struct security_operations *ops); -extern int mod_reg_security (const char *name, struct security_operations *ops); +extern int register_security(struct security_operations *ops); +extern int mod_reg_security(const char *name, struct security_operations *ops); extern struct dentry *securityfs_create_file(const char *name, mode_t mode, struct dentry *parent, void *data, const struct file_operations *fops); extern struct dentry *securityfs_create_dir(const char *name, struct dentry *parent); extern void securityfs_remove(struct dentry *dentry); - /* Security operations */ int security_ptrace(struct task_struct *parent, struct task_struct *child); int security_capget(struct task_struct *target, - kernel_cap_t *effective, - kernel_cap_t *inheritable, - kernel_cap_t *permitted); + kernel_cap_t *effective, + kernel_cap_t *inheritable, + kernel_cap_t *permitted); int security_capset_check(struct task_struct *target, - kernel_cap_t *effective, - kernel_cap_t *inheritable, - kernel_cap_t *permitted); -void security_capset_set(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted); +void security_capset_set(struct task_struct *target, + kernel_cap_t *effective, + kernel_cap_t *inheritable, + kernel_cap_t 
*permitted); int security_capable(struct task_struct *tsk, int cap); int security_acct(struct file *file); int security_sysctl(struct ctl_table *table, int op); @@ -1594,7 +1594,7 @@ int security_sb_copy_data(char *orig, char *copy); int security_sb_kern_mount(struct super_block *sb, void *data); int security_sb_statfs(struct dentry *dentry); int security_sb_mount(char *dev_name, struct path *path, - char *type, unsigned long flags, void *data); + char *type, unsigned long flags, void *data); int security_sb_check_sb(struct vfsmount *mnt, struct path *path); int security_sb_umount(struct vfsmount *mnt, int flags); void security_sb_umount_close(struct vfsmount *mnt); @@ -1619,12 +1619,12 @@ int security_inode_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry); int security_inode_unlink(struct inode *dir, struct dentry *dentry); int security_inode_symlink(struct inode *dir, struct dentry *dentry, - const char *old_name); + const char *old_name); int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode); int security_inode_rmdir(struct inode *dir, struct dentry *dentry); int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev); int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, - struct inode *new_dir, struct dentry *new_dentry); + struct inode *new_dir, struct dentry *new_dentry); int security_inode_readlink(struct dentry *dentry); int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd); int security_inode_permission(struct inode *inode, int mask, struct nameidata *nd); @@ -1632,9 +1632,9 @@ int security_inode_setattr(struct dentry *dentry, struct iattr *attr); int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry); void security_inode_delete(struct inode *inode); int security_inode_setxattr(struct dentry *dentry, char *name, - void *value, size_t size, int flags); + void *value, size_t size, int flags); void 
security_inode_post_setxattr(struct dentry *dentry, char *name, - void *value, size_t size, int flags); + void *value, size_t size, int flags); int security_inode_getxattr(struct dentry *dentry, char *name); int security_inode_listxattr(struct dentry *dentry); int security_inode_removexattr(struct dentry *dentry, char *name); @@ -1652,12 +1652,12 @@ int security_file_mmap(struct file *file, unsigned long reqprot, unsigned long prot, unsigned long flags, unsigned long addr, unsigned long addr_only); int security_file_mprotect(struct vm_area_struct *vma, unsigned long reqprot, - unsigned long prot); + unsigned long prot); int security_file_lock(struct file *file, unsigned int cmd); int security_file_fcntl(struct file *file, unsigned int cmd, unsigned long arg); int security_file_set_fowner(struct file *file); int security_file_send_sigiotask(struct task_struct *tsk, - struct fown_struct *fown, int sig); + struct fown_struct *fown, int sig); int security_file_receive(struct file *file); int security_dentry_open(struct file *file); int security_task_create(unsigned long clone_flags); @@ -1665,7 +1665,7 @@ int security_task_alloc(struct task_struct *p); void security_task_free(struct task_struct *p); int security_task_setuid(uid_t id0, uid_t id1, uid_t id2, int flags); int security_task_post_setuid(uid_t old_ruid, uid_t old_euid, - uid_t old_suid, int flags); + uid_t old_suid, int flags); int security_task_setgid(gid_t id0, gid_t id1, gid_t id2, int flags); int security_task_setpgid(struct task_struct *p, pid_t pgid); int security_task_getpgid(struct task_struct *p); @@ -1696,9 +1696,9 @@ void security_msg_queue_free(struct msg_queue *msq); int security_msg_queue_associate(struct msg_queue *msq, int msqflg); int security_msg_queue_msgctl(struct msg_queue *msq, int cmd); int security_msg_queue_msgsnd(struct msg_queue *msq, - struct msg_msg *msg, int msqflg); + struct msg_msg *msg, int msqflg); int security_msg_queue_msgrcv(struct msg_queue *msq, struct msg_msg *msg, - 
struct task_struct *target, long type, int mode); + struct task_struct *target, long type, int mode); int security_shm_alloc(struct shmid_kernel *shp); void security_shm_free(struct shmid_kernel *shp); int security_shm_associate(struct shmid_kernel *shp, int shmflg); @@ -1710,7 +1710,7 @@ int security_sem_associate(struct sem_array *sma, int semflg); int security_sem_semctl(struct sem_array *sma, int cmd); int security_sem_semop(struct sem_array *sma, struct sembuf *sops, unsigned nsops, int alter); -void security_d_instantiate (struct dentry *dentry, struct inode *inode); +void security_d_instantiate(struct dentry *dentry, struct inode *inode); int security_getprocattr(struct task_struct *p, char *name, char **value); int security_setprocattr(struct task_struct *p, char *name, void *value, size_t size); int security_netlink_send(struct sock *sk, struct sk_buff *skb); @@ -1741,33 +1741,33 @@ static inline int security_init(void) return 0; } -static inline int security_ptrace (struct task_struct *parent, struct task_struct * child) +static inline int security_ptrace(struct task_struct *parent, struct task_struct *child) { - return cap_ptrace (parent, child); + return cap_ptrace(parent, child); } -static inline int security_capget (struct task_struct *target, +static inline int security_capget(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted) { - return cap_capget (target, effective, inheritable, permitted); + return cap_capget(target, effective, inheritable, permitted); } -static inline int security_capset_check (struct task_struct *target, +static inline int security_capset_check(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted) { - return cap_capset_check (target, effective, inheritable, permitted); + return cap_capset_check(target, effective, inheritable, permitted); } -static inline void security_capset_set (struct task_struct *target, +static inline 
void security_capset_set(struct task_struct *target, kernel_cap_t *effective, kernel_cap_t *inheritable, kernel_cap_t *permitted) { - cap_capset_set (target, effective, inheritable, permitted); + cap_capset_set(target, effective, inheritable, permitted); } static inline int security_capable(struct task_struct *tsk, int cap) @@ -1775,7 +1775,7 @@ static inline int security_capable(struct task_struct *tsk, int cap) return cap_capable(tsk, cap); } -static inline int security_acct (struct file *file) +static inline int security_acct(struct file *file) { return 0; } @@ -1785,13 +1785,13 @@ static inline int security_sysctl(struct ctl_table *table, int op) return 0; } -static inline int security_quotactl (int cmds, int type, int id, - struct super_block * sb) +static inline int security_quotactl(int cmds, int type, int id, + struct super_block *sb) { return 0; } -static inline int security_quota_on (struct dentry * dentry) +static inline int security_quota_on(struct dentry *dentry) { return 0; } @@ -1816,102 +1816,102 @@ static inline int security_vm_enough_memory_mm(struct mm_struct *mm, long pages) return cap_vm_enough_memory(mm, pages); } -static inline int security_bprm_alloc (struct linux_binprm *bprm) +static inline int security_bprm_alloc(struct linux_binprm *bprm) { return 0; } -static inline void security_bprm_free (struct linux_binprm *bprm) +static inline void security_bprm_free(struct linux_binprm *bprm) { } -static inline void security_bprm_apply_creds (struct linux_binprm *bprm, int unsafe) -{ - cap_bprm_apply_creds (bprm, unsafe); +static inline void security_bprm_apply_creds(struct linux_binprm *bprm, int unsafe) +{ + cap_bprm_apply_creds(bprm, unsafe); } -static inline void security_bprm_post_apply_creds (struct linux_binprm *bprm) +static inline void security_bprm_post_apply_creds(struct linux_binprm *bprm) { return; } -static inline int security_bprm_set (struct linux_binprm *bprm) +static inline int security_bprm_set(struct linux_binprm *bprm) { - 
return cap_bprm_set_security (bprm); + return cap_bprm_set_security(bprm); } -static inline int security_bprm_check (struct linux_binprm *bprm) +static inline int security_bprm_check(struct linux_binprm *bprm) { return 0; } -static inline int security_bprm_secureexec (struct linux_binprm *bprm) +static inline int security_bprm_secureexec(struct linux_binprm *bprm) { return cap_bprm_secureexec(bprm); } -static inline int security_sb_alloc (struct super_block *sb) +static inline int security_sb_alloc(struct super_block *sb) { return 0; } -static inline void security_sb_free (struct super_block *sb) +static inline void security_sb_free(struct super_block *sb) { } -static inline int security_sb_copy_data (char *orig, char *copy) +static inline int security_sb_copy_data(char *orig, char *copy) { return 0; } -static inline int security_sb_kern_mount (struct super_block *sb, void *data) +static inline int security_sb_kern_mount(struct super_block *sb, void *data) { return 0; } -static inline int security_sb_statfs (struct dentry *dentry) +static inline int security_sb_statfs(struct dentry *dentry) { return 0; } -static inline int security_sb_mount (char *dev_name, struct path *path, +static inline int security_sb_mount(char *dev_name, struct path *path, char *type, unsigned long flags, void *data) { return 0; } -static inline int security_sb_check_sb (struct vfsmount *mnt, - struct path *path) +static inline int security_sb_check_sb(struct vfsmount *mnt, + struct path *path) { return 0; } -static inline int security_sb_umount (struct vfsmount *mnt, int flags) +static inline int security_sb_umount(struct vfsmount *mnt, int flags) { return 0; } -static inline void security_sb_umount_close (struct vfsmount *mnt) +static inline void security_sb_umount_close(struct vfsmount *mnt) { } -static inline void security_sb_umount_busy (struct vfsmount *mnt) +static inline void security_sb_umount_busy(struct vfsmount *mnt) { } -static inline void security_sb_post_remount (struct 
vfsmount *mnt, +static inline void security_sb_post_remount(struct vfsmount *mnt, unsigned long flags, void *data) { } -static inline void security_sb_post_addmount (struct vfsmount *mnt, - struct path *mountpoint) +static inline void security_sb_post_addmount(struct vfsmount *mnt, + struct path *mountpoint) { } -static inline int security_sb_pivotroot (struct path *old_path, - struct path *new_path) +static inline int security_sb_pivotroot(struct path *old_path, + struct path *new_path) { return 0; } -static inline void security_sb_post_pivotroot (struct path *old_path, - struct path *new_path) +static inline void security_sb_post_pivotroot(struct path *old_path, + struct path *new_path) { } static inline int security_sb_get_mnt_opts(const struct super_block *sb, struct security_mnt_opts *opts) @@ -1935,15 +1935,15 @@ static inline int security_sb_parse_opts_str(char *options, struct security_mnt_ return 0; } -static inline int security_inode_alloc (struct inode *inode) +static inline int security_inode_alloc(struct inode *inode) { return 0; } -static inline void security_inode_free (struct inode *inode) +static inline void security_inode_free(struct inode *inode) { } -static inline int security_inode_init_security (struct inode *inode, +static inline int security_inode_init_security(struct inode *inode, struct inode *dir, char **name, void **value, @@ -1951,55 +1951,55 @@ static inline int security_inode_init_security (struct inode *inode, { return -EOPNOTSUPP; } - -static inline int security_inode_create (struct inode *dir, + +static inline int security_inode_create(struct inode *dir, struct dentry *dentry, int mode) { return 0; } -static inline int security_inode_link (struct dentry *old_dentry, +static inline int security_inode_link(struct dentry *old_dentry, struct inode *dir, struct dentry *new_dentry) { return 0; } -static inline int security_inode_unlink (struct inode *dir, +static inline int security_inode_unlink(struct inode *dir, struct dentry *dentry) 
{ return 0; } -static inline int security_inode_symlink (struct inode *dir, +static inline int security_inode_symlink(struct inode *dir, struct dentry *dentry, const char *old_name) { return 0; } -static inline int security_inode_mkdir (struct inode *dir, +static inline int security_inode_mkdir(struct inode *dir, struct dentry *dentry, int mode) { return 0; } -static inline int security_inode_rmdir (struct inode *dir, +static inline int security_inode_rmdir(struct inode *dir, struct dentry *dentry) { return 0; } -static inline int security_inode_mknod (struct inode *dir, +static inline int security_inode_mknod(struct inode *dir, struct dentry *dentry, int mode, dev_t dev) { return 0; } -static inline int security_inode_rename (struct inode *old_dir, +static inline int security_inode_rename(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry) @@ -2007,59 +2007,59 @@ static inline int security_inode_rename (struct inode *old_dir, return 0; } -static inline int security_inode_readlink (struct dentry *dentry) +static inline int security_inode_readlink(struct dentry *dentry) { return 0; } -static inline int security_inode_follow_link (struct dentry *dentry, +static inline int security_inode_follow_link(struct dentry *dentry, struct nameidata *nd) { return 0; } -static inline int security_inode_permission (struct inode *inode, int mask, +static inline int security_inode_permission(struct inode *inode, int mask, struct nameidata *nd) { return 0; } -static inline int security_inode_setattr (struct dentry *dentry, +static inline int security_inode_setattr(struct dentry *dentry, struct iattr *attr) { return 0; } -static inline int security_inode_getattr (struct vfsmount *mnt, +static inline int security_inode_getattr(struct vfsmount *mnt, struct dentry *dentry) { return 0; } -static inline void security_inode_delete (struct inode *inode) +static inline void security_inode_delete(struct inode *inode) { } -static inline int 
security_inode_setxattr (struct dentry *dentry, char *name, +static inline int security_inode_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags) { return cap_inode_setxattr(dentry, name, value, size, flags); } -static inline void security_inode_post_setxattr (struct dentry *dentry, char *name, +static inline void security_inode_post_setxattr(struct dentry *dentry, char *name, void *value, size_t size, int flags) { } -static inline int security_inode_getxattr (struct dentry *dentry, char *name) +static inline int security_inode_getxattr(struct dentry *dentry, char *name) { return 0; } -static inline int security_inode_listxattr (struct dentry *dentry) +static inline int security_inode_listxattr(struct dentry *dentry) { return 0; } -static inline int security_inode_removexattr (struct dentry *dentry, char *name) +static inline int security_inode_removexattr(struct dentry *dentry, char *name) { return cap_inode_removexattr(dentry, name); } @@ -2094,198 +2094,198 @@ static inline void security_inode_getsecid(const struct inode *inode, u32 *secid *secid = 0; } -static inline int security_file_permission (struct file *file, int mask) +static inline int security_file_permission(struct file *file, int mask) { return 0; } -static inline int security_file_alloc (struct file *file) +static inline int security_file_alloc(struct file *file) { return 0; } -static inline void security_file_free (struct file *file) +static inline void security_file_free(struct file *file) { } -static inline int security_file_ioctl (struct file *file, unsigned int cmd, - unsigned long arg) +static inline int security_file_ioctl(struct file *file, unsigned int cmd, + unsigned long arg) { return 0; } -static inline int security_file_mmap (struct file *file, unsigned long reqprot, - unsigned long prot, - unsigned long flags, - unsigned long addr, - unsigned long addr_only) +static inline int security_file_mmap(struct file *file, unsigned long reqprot, + unsigned long 
prot, + unsigned long flags, + unsigned long addr, + unsigned long addr_only) { return 0; } -static inline int security_file_mprotect (struct vm_area_struct *vma, - unsigned long reqprot, - unsigned long prot) +static inline int security_file_mprotect(struct vm_area_struct *vma, + unsigned long reqprot, + unsigned long prot) { return 0; } -static inline int security_file_lock (struct file *file, unsigned int cmd) +static inline int security_file_lock(struct file *file, unsigned int cmd) { return 0; } -static inline int security_file_fcntl (struct file *file, unsigned int cmd, - unsigned long arg) +static inline int security_file_fcntl(struct file *file, unsigned int cmd, + unsigned long arg) { return 0; } -static inline int security_file_set_fowner (struct file *file) +static inline int security_file_set_fowner(struct file *file) { return 0; } -static inline int security_file_send_sigiotask (struct task_struct *tsk, - struct fown_struct *fown, - int sig) +static inline int security_file_send_sigiotask(struct task_struct *tsk, + struct fown_struct *fown, + int sig) { return 0; } -static inline int security_file_receive (struct file *file) +static inline int security_file_receive(struct file *file) { return 0; } -static inline int security_dentry_open (struct file *file) +static inline int security_dentry_open(struct file *file) { return 0; } -static inline int security_task_create (unsigned long clone_flags) +static inline int security_task_create(unsigned long clone_flags) { return 0; } -static inline int security_task_alloc (struct task_struct *p) +static inline int security_task_alloc(struct task_struct *p) { return 0; } -static inline void security_task_free (struct task_struct *p) +static inline void security_task_free(struct task_struct *p) { } -static inline int security_task_setuid (uid_t id0, uid_t id1, uid_t id2, - int flags) +static inline int security_task_setuid(uid_t id0, uid_t id1, uid_t id2, + int flags) { return 0; } -static inline int 
security_task_post_setuid (uid_t old_ruid, uid_t old_euid, - uid_t old_suid, int flags) +static inline int security_task_post_setuid(uid_t old_ruid, uid_t old_euid, + uid_t old_suid, int flags) { - return cap_task_post_setuid (old_ruid, old_euid, old_suid, flags); + return cap_task_post_setuid(old_ruid, old_euid, old_suid, flags); } -static inline int security_task_setgid (gid_t id0, gid_t id1, gid_t id2, - int flags) +static inline int security_task_setgid(gid_t id0, gid_t id1, gid_t id2, + int flags) { return 0; } -static inline int security_task_setpgid (struct task_struct *p, pid_t pgid) +static inline int security_task_setpgid(struct task_struct *p, pid_t pgid) { return 0; } -static inline int security_task_getpgid (struct task_struct *p) +static inline int security_task_getpgid(struct task_struct *p) { return 0; } -static inline int security_task_getsid (struct task_struct *p) +static inline int security_task_getsid(struct task_struct *p) { return 0; } -static inline void security_task_getsecid (struct task_struct *p, u32 *secid) +static inline void security_task_getsecid(struct task_struct *p, u32 *secid) { *secid = 0; } -static inline int security_task_setgroups (struct group_info *group_info) +static inline int security_task_setgroups(struct group_info *group_info) { return 0; } -static inline int security_task_setnice (struct task_struct *p, int nice) +static inline int security_task_setnice(struct task_struct *p, int nice) { return cap_task_setnice(p, nice); } -static inline int security_task_setioprio (struct task_struct *p, int ioprio) +static inline int security_task_setioprio(struct task_struct *p, int ioprio) { return cap_task_setioprio(p, ioprio); } -static inline int security_task_getioprio (struct task_struct *p) +static inline int security_task_getioprio(struct task_struct *p) { return 0; } -static inline int security_task_setrlimit (unsigned int resource, - struct rlimit *new_rlim) +static inline int security_task_setrlimit(unsigned int 
resource, + struct rlimit *new_rlim) { return 0; } -static inline int security_task_setscheduler (struct task_struct *p, - int policy, - struct sched_param *lp) +static inline int security_task_setscheduler(struct task_struct *p, + int policy, + struct sched_param *lp) { return cap_task_setscheduler(p, policy, lp); } -static inline int security_task_getscheduler (struct task_struct *p) +static inline int security_task_getscheduler(struct task_struct *p) { return 0; } -static inline int security_task_movememory (struct task_struct *p) +static inline int security_task_movememory(struct task_struct *p) { return 0; } -static inline int security_task_kill (struct task_struct *p, - struct siginfo *info, int sig, - u32 secid) +static inline int security_task_kill(struct task_struct *p, + struct siginfo *info, int sig, + u32 secid) { return 0; } -static inline int security_task_wait (struct task_struct *p) +static inline int security_task_wait(struct task_struct *p) { return 0; } -static inline int security_task_prctl (int option, unsigned long arg2, - unsigned long arg3, - unsigned long arg4, - unsigned long arg5) +static inline int security_task_prctl(int option, unsigned long arg2, + unsigned long arg3, + unsigned long arg4, + unsigned long arg5) { return 0; } -static inline void security_task_reparent_to_init (struct task_struct *p) +static inline void security_task_reparent_to_init(struct task_struct *p) { - cap_task_reparent_to_init (p); + cap_task_reparent_to_init(p); } static inline void security_task_to_inode(struct task_struct *p, struct inode *inode) { } -static inline int security_ipc_permission (struct kern_ipc_perm *ipcp, - short flag) +static inline int security_ipc_permission(struct kern_ipc_perm *ipcp, + short flag) { return 0; } @@ -2295,98 +2295,98 @@ static inline void security_ipc_getsecid(struct kern_ipc_perm *ipcp, u32 *secid) *secid = 0; } -static inline int security_msg_msg_alloc (struct msg_msg * msg) +static inline int 
security_msg_msg_alloc(struct msg_msg *msg) { return 0; } -static inline void security_msg_msg_free (struct msg_msg * msg) +static inline void security_msg_msg_free(struct msg_msg *msg) { } -static inline int security_msg_queue_alloc (struct msg_queue *msq) +static inline int security_msg_queue_alloc(struct msg_queue *msq) { return 0; } -static inline void security_msg_queue_free (struct msg_queue *msq) +static inline void security_msg_queue_free(struct msg_queue *msq) { } -static inline int security_msg_queue_associate (struct msg_queue * msq, - int msqflg) +static inline int security_msg_queue_associate(struct msg_queue *msq, + int msqflg) { return 0; } -static inline int security_msg_queue_msgctl (struct msg_queue * msq, int cmd) +static inline int security_msg_queue_msgctl(struct msg_queue *msq, int cmd) { return 0; } -static inline int security_msg_queue_msgsnd (struct msg_queue * msq, - struct msg_msg * msg, int msqflg) +static inline int security_msg_queue_msgsnd(struct msg_queue *msq, + struct msg_msg *msg, int msqflg) { return 0; } -static inline int security_msg_queue_msgrcv (struct msg_queue * msq, - struct msg_msg * msg, - struct task_struct * target, - long type, int mode) +static inline int security_msg_queue_msgrcv(struct msg_queue *msq, + struct msg_msg *msg, + struct task_struct *target, + long type, int mode) { return 0; } -static inline int security_shm_alloc (struct shmid_kernel *shp) +static inline int security_shm_alloc(struct shmid_kernel *shp) { return 0; } -static inline void security_shm_free (struct shmid_kernel *shp) +static inline void security_shm_free(struct shmid_kernel *shp) { } -static inline int security_shm_associate (struct shmid_kernel * shp, - int shmflg) +static inline int security_shm_associate(struct shmid_kernel *shp, + int shmflg) { return 0; } -static inline int security_shm_shmctl (struct shmid_kernel * shp, int cmd) +static inline int security_shm_shmctl(struct shmid_kernel *shp, int cmd) { return 0; } -static inline 
int security_shm_shmat (struct shmid_kernel * shp, - char __user *shmaddr, int shmflg) +static inline int security_shm_shmat(struct shmid_kernel *shp, + char __user *shmaddr, int shmflg) { return 0; } -static inline int security_sem_alloc (struct sem_array *sma) +static inline int security_sem_alloc(struct sem_array *sma) { return 0; } -static inline void security_sem_free (struct sem_array *sma) +static inline void security_sem_free(struct sem_array *sma) { } -static inline int security_sem_associate (struct sem_array * sma, int semflg) +static inline int security_sem_associate(struct sem_array *sma, int semflg) { return 0; } -static inline int security_sem_semctl (struct sem_array * sma, int cmd) +static inline int security_sem_semctl(struct sem_array *sma, int cmd) { return 0; } -static inline int security_sem_semop (struct sem_array * sma, - struct sembuf * sops, unsigned nsops, - int alter) +static inline int security_sem_semop(struct sem_array *sma, + struct sembuf *sops, unsigned nsops, + int alter) { return 0; } -static inline void security_d_instantiate (struct dentry *dentry, struct inode *inode) +static inline void security_d_instantiate(struct dentry *dentry, struct inode *inode) { } static inline int security_getprocattr(struct task_struct *p, char *name, char **value) @@ -2399,14 +2399,14 @@ static inline int security_setprocattr(struct task_struct *p, char *name, void * return -EINVAL; } -static inline int security_netlink_send (struct sock *sk, struct sk_buff *skb) +static inline int security_netlink_send(struct sock *sk, struct sk_buff *skb) { - return cap_netlink_send (sk, skb); + return cap_netlink_send(sk, skb); } -static inline int security_netlink_recv (struct sk_buff *skb, int cap) +static inline int security_netlink_recv(struct sk_buff *skb, int cap) { - return cap_netlink_recv (skb, cap); + return cap_netlink_recv(skb, cap); } static inline struct dentry *securityfs_create_dir(const char *name, @@ -2484,26 +2484,26 @@ void 
security_inet_conn_established(struct sock *sk, struct sk_buff *skb); #else /* CONFIG_SECURITY_NETWORK */ -static inline int security_unix_stream_connect(struct socket * sock, - struct socket * other, - struct sock * newsk) +static inline int security_unix_stream_connect(struct socket *sock, + struct socket *other, + struct sock *newsk) { return 0; } -static inline int security_unix_may_send(struct socket * sock, - struct socket * other) +static inline int security_unix_may_send(struct socket *sock, + struct socket *other) { return 0; } -static inline int security_socket_create (int family, int type, - int protocol, int kern) +static inline int security_socket_create(int family, int type, + int protocol, int kern) { return 0; } -static inline int security_socket_post_create(struct socket * sock, +static inline int security_socket_post_create(struct socket *sock, int family, int type, int protocol, int kern) @@ -2511,77 +2511,77 @@ static inline int security_socket_post_create(struct socket * sock, return 0; } -static inline int security_socket_bind(struct socket * sock, - struct sockaddr * address, +static inline int security_socket_bind(struct socket *sock, + struct sockaddr *address, int addrlen) { return 0; } -static inline int security_socket_connect(struct socket * sock, - struct sockaddr * address, +static inline int security_socket_connect(struct socket *sock, + struct sockaddr *address, int addrlen) { return 0; } -static inline int security_socket_listen(struct socket * sock, int backlog) +static inline int security_socket_listen(struct socket *sock, int backlog) { return 0; } -static inline int security_socket_accept(struct socket * sock, - struct socket * newsock) +static inline int security_socket_accept(struct socket *sock, + struct socket *newsock) { return 0; } -static inline void security_socket_post_accept(struct socket * sock, - struct socket * newsock) +static inline void security_socket_post_accept(struct socket *sock, + struct socket *newsock) { 
} -static inline int security_socket_sendmsg(struct socket * sock, - struct msghdr * msg, int size) +static inline int security_socket_sendmsg(struct socket *sock, + struct msghdr *msg, int size) { return 0; } -static inline int security_socket_recvmsg(struct socket * sock, - struct msghdr * msg, int size, +static inline int security_socket_recvmsg(struct socket *sock, + struct msghdr *msg, int size, int flags) { return 0; } -static inline int security_socket_getsockname(struct socket * sock) +static inline int security_socket_getsockname(struct socket *sock) { return 0; } -static inline int security_socket_getpeername(struct socket * sock) +static inline int security_socket_getpeername(struct socket *sock) { return 0; } -static inline int security_socket_getsockopt(struct socket * sock, +static inline int security_socket_getsockopt(struct socket *sock, int level, int optname) { return 0; } -static inline int security_socket_setsockopt(struct socket * sock, +static inline int security_socket_setsockopt(struct socket *sock, int level, int optname) { return 0; } -static inline int security_socket_shutdown(struct socket * sock, int how) +static inline int security_socket_shutdown(struct socket *sock, int how) { return 0; } -static inline int security_sock_rcv_skb (struct sock * sk, - struct sk_buff * skb) +static inline int security_sock_rcv_skb(struct sock *sk, + struct sk_buff *skb) { return 0; } @@ -2618,7 +2618,7 @@ static inline void security_req_classify_flow(const struct request_sock *req, st { } -static inline void security_sock_graft(struct sock* sk, struct socket *parent) +static inline void security_sock_graft(struct sock *sk, struct socket *parent) { } -- cgit v1.2.3 From a639e7ca8e8282b75be2724a28bfc788aa3bb156 Mon Sep 17 00:00:00 2001 From: Paul Moore Date: Fri, 25 Apr 2008 15:03:34 -0400 Subject: SELinux: Made netnode cache adds faster When adding new entries to the network node cache we would walk the entire hash bucket to make sure we didn't cross a 
threshold (done to bound the cache size). This isn't a very quick or elegant solution for something which is supposed to be quick-ish so add a counter to each hash bucket to track the size of the bucket and eliminate the need to walk the entire bucket list on each add. Signed-off-by: Paul Moore Signed-off-by: James Morris --- security/selinux/netnode.c | 104 +++++++++++++++++++++------------------------ 1 file changed, 49 insertions(+), 55 deletions(-) diff --git a/security/selinux/netnode.c b/security/selinux/netnode.c index 2edc4c5e0c61..b6ccd09379f1 100644 --- a/security/selinux/netnode.c +++ b/security/selinux/netnode.c @@ -40,11 +40,17 @@ #include #include +#include "netnode.h" #include "objsec.h" #define SEL_NETNODE_HASH_SIZE 256 #define SEL_NETNODE_HASH_BKT_LIMIT 16 +struct sel_netnode_bkt { + unsigned int size; + struct list_head list; +}; + struct sel_netnode { struct netnode_security_struct nsec; @@ -60,7 +66,7 @@ struct sel_netnode { static LIST_HEAD(sel_netnode_list); static DEFINE_SPINLOCK(sel_netnode_lock); -static struct list_head sel_netnode_hash[SEL_NETNODE_HASH_SIZE]; +static struct sel_netnode_bkt sel_netnode_hash[SEL_NETNODE_HASH_SIZE]; /** * sel_netnode_free - Frees a node entry @@ -87,7 +93,7 @@ static void sel_netnode_free(struct rcu_head *p) * the bucket number for the given IP address. * */ -static u32 sel_netnode_hashfn_ipv4(__be32 addr) +static unsigned int sel_netnode_hashfn_ipv4(__be32 addr) { /* at some point we should determine if the mismatch in byte order * affects the hash function dramatically */ @@ -103,7 +109,7 @@ static u32 sel_netnode_hashfn_ipv4(__be32 addr) * the bucket number for the given IP address. 
* */ -static u32 sel_netnode_hashfn_ipv6(const struct in6_addr *addr) +static unsigned int sel_netnode_hashfn_ipv6(const struct in6_addr *addr) { /* just hash the least significant 32 bits to keep things fast (they * are the most likely to be different anyway), we can revisit this @@ -123,7 +129,7 @@ static u32 sel_netnode_hashfn_ipv6(const struct in6_addr *addr) */ static struct sel_netnode *sel_netnode_find(const void *addr, u16 family) { - u32 idx; + unsigned int idx; struct sel_netnode *node; switch (family) { @@ -137,7 +143,7 @@ static struct sel_netnode *sel_netnode_find(const void *addr, u16 family) BUG(); } - list_for_each_entry_rcu(node, &sel_netnode_hash[idx], list) + list_for_each_entry_rcu(node, &sel_netnode_hash[idx].list, list) if (node->nsec.family == family) switch (family) { case PF_INET: @@ -159,15 +165,12 @@ static struct sel_netnode *sel_netnode_find(const void *addr, u16 family) * @node: the new node record * * Description: - * Add a new node record to the network address hash table. Returns zero on - * success, negative values on failure. + * Add a new node record to the network address hash table. 
* */ -static int sel_netnode_insert(struct sel_netnode *node) +static void sel_netnode_insert(struct sel_netnode *node) { - u32 idx; - u32 count = 0; - struct sel_netnode *iter; + unsigned int idx; switch (node->nsec.family) { case PF_INET: @@ -179,32 +182,21 @@ static int sel_netnode_insert(struct sel_netnode *node) default: BUG(); } - list_add_rcu(&node->list, &sel_netnode_hash[idx]); + + INIT_RCU_HEAD(&node->rcu); /* we need to impose a limit on the growth of the hash table so check * this bucket to make sure it is within the specified bounds */ - list_for_each_entry(iter, &sel_netnode_hash[idx], list) - if (++count > SEL_NETNODE_HASH_BKT_LIMIT) { - list_del_rcu(&iter->list); - call_rcu(&iter->rcu, sel_netnode_free); - break; - } - - return 0; -} - -/** - * sel_netnode_destroy - Remove a node record from the table - * @node: the existing node record - * - * Description: - * Remove an existing node record from the network address table. - * - */ -static void sel_netnode_destroy(struct sel_netnode *node) -{ - list_del_rcu(&node->list); - call_rcu(&node->rcu, sel_netnode_free); + list_add_rcu(&node->list, &sel_netnode_hash[idx].list); + if (sel_netnode_hash[idx].size == SEL_NETNODE_HASH_BKT_LIMIT) { + struct sel_netnode *tail; + tail = list_entry( + rcu_dereference(sel_netnode_hash[idx].list.prev), + struct sel_netnode, list); + list_del_rcu(&tail->list); + call_rcu(&tail->rcu, sel_netnode_free); + } else + sel_netnode_hash[idx].size++; } /** @@ -222,7 +214,7 @@ static void sel_netnode_destroy(struct sel_netnode *node) */ static int sel_netnode_sid_slow(void *addr, u16 family, u32 *sid) { - int ret; + int ret = -ENOMEM; struct sel_netnode *node; struct sel_netnode *new = NULL; @@ -230,25 +222,21 @@ static int sel_netnode_sid_slow(void *addr, u16 family, u32 *sid) node = sel_netnode_find(addr, family); if (node != NULL) { *sid = node->nsec.sid; - ret = 0; - goto out; + spin_unlock_bh(&sel_netnode_lock); + return 0; } new = kzalloc(sizeof(*new), GFP_ATOMIC); - if 
(new == NULL) { - ret = -ENOMEM; + if (new == NULL) goto out; - } switch (family) { case PF_INET: ret = security_node_sid(PF_INET, - addr, sizeof(struct in_addr), - &new->nsec.sid); + addr, sizeof(struct in_addr), sid); new->nsec.addr.ipv4 = *(__be32 *)addr; break; case PF_INET6: ret = security_node_sid(PF_INET6, - addr, sizeof(struct in6_addr), - &new->nsec.sid); + addr, sizeof(struct in6_addr), sid); ipv6_addr_copy(&new->nsec.addr.ipv6, addr); break; default: @@ -256,11 +244,10 @@ static int sel_netnode_sid_slow(void *addr, u16 family, u32 *sid) } if (ret != 0) goto out; + new->nsec.family = family; - ret = sel_netnode_insert(new); - if (ret != 0) - goto out; - *sid = new->nsec.sid; + new->nsec.sid = *sid; + sel_netnode_insert(new); out: spin_unlock_bh(&sel_netnode_lock); @@ -312,13 +299,18 @@ int sel_netnode_sid(void *addr, u16 family, u32 *sid) */ static void sel_netnode_flush(void) { - u32 idx; - struct sel_netnode *node; + unsigned int idx; + struct sel_netnode *node, *node_tmp; spin_lock_bh(&sel_netnode_lock); - for (idx = 0; idx < SEL_NETNODE_HASH_SIZE; idx++) - list_for_each_entry(node, &sel_netnode_hash[idx], list) - sel_netnode_destroy(node); + for (idx = 0; idx < SEL_NETNODE_HASH_SIZE; idx++) { + list_for_each_entry_safe(node, node_tmp, + &sel_netnode_hash[idx].list, list) { + list_del_rcu(&node->list); + call_rcu(&node->rcu, sel_netnode_free); + } + sel_netnode_hash[idx].size = 0; + } spin_unlock_bh(&sel_netnode_lock); } @@ -340,8 +332,10 @@ static __init int sel_netnode_init(void) if (!selinux_enabled) return 0; - for (iter = 0; iter < SEL_NETNODE_HASH_SIZE; iter++) - INIT_LIST_HEAD(&sel_netnode_hash[iter]); + for (iter = 0; iter < SEL_NETNODE_HASH_SIZE; iter++) { + INIT_LIST_HEAD(&sel_netnode_hash[iter].list); + sel_netnode_hash[iter].size = 0; + } ret = avc_add_callback(sel_netnode_avc_callback, AVC_CALLBACK_RESET, SECSID_NULL, SECSID_NULL, SECCLASS_NULL, 0); -- cgit v1.2.3 From c9b7b9793764b171a118d049d4b721a7f5d8ac82 Mon Sep 17 00:00:00 2001 From: 
Paul Moore Date: Fri, 25 Apr 2008 15:03:39 -0400 Subject: SELinux: Fix a RCU free problem with the netport cache The netport cache doesn't free resources in a manner which is safe or orderly. This patch fixes this by adding in a missing call to rcu_dereference() in sel_netport_insert() as well as some general cleanup throughout the file. Signed-off-by: Paul Moore Signed-off-by: James Morris --- security/selinux/netport.c | 40 ++++++++++++++++++---------------------- 1 file changed, 18 insertions(+), 22 deletions(-) diff --git a/security/selinux/netport.c b/security/selinux/netport.c index 68ede3c498ab..90b4cff7c350 100644 --- a/security/selinux/netport.c +++ b/security/selinux/netport.c @@ -114,8 +114,7 @@ static struct sel_netport *sel_netport_find(u8 protocol, u16 pnum) idx = sel_netport_hashfn(pnum); list_for_each_entry_rcu(port, &sel_netport_hash[idx].list, list) - if (port->psec.port == pnum && - port->psec.protocol == protocol) + if (port->psec.port == pnum && port->psec.protocol == protocol) return port; return NULL; @@ -126,11 +125,10 @@ static struct sel_netport *sel_netport_find(u8 protocol, u16 pnum) * @port: the new port record * * Description: - * Add a new port record to the network address hash table. Returns zero on - * success, negative values on failure. + * Add a new port record to the network address hash table. 
* */ -static int sel_netport_insert(struct sel_netport *port) +static void sel_netport_insert(struct sel_netport *port) { unsigned int idx; @@ -140,13 +138,13 @@ static int sel_netport_insert(struct sel_netport *port) list_add_rcu(&port->list, &sel_netport_hash[idx].list); if (sel_netport_hash[idx].size == SEL_NETPORT_HASH_BKT_LIMIT) { struct sel_netport *tail; - tail = list_entry(port->list.prev, struct sel_netport, list); - list_del_rcu(port->list.prev); + tail = list_entry( + rcu_dereference(sel_netport_hash[idx].list.prev), + struct sel_netport, list); + list_del_rcu(&tail->list); call_rcu(&tail->rcu, sel_netport_free); } else sel_netport_hash[idx].size++; - - return 0; } /** @@ -163,7 +161,7 @@ static int sel_netport_insert(struct sel_netport *port) */ static int sel_netport_sid_slow(u8 protocol, u16 pnum, u32 *sid) { - int ret; + int ret = -ENOMEM; struct sel_netport *port; struct sel_netport *new = NULL; @@ -171,23 +169,20 @@ static int sel_netport_sid_slow(u8 protocol, u16 pnum, u32 *sid) port = sel_netport_find(protocol, pnum); if (port != NULL) { *sid = port->psec.sid; - ret = 0; - goto out; + spin_unlock_bh(&sel_netport_lock); + return 0; } new = kzalloc(sizeof(*new), GFP_ATOMIC); - if (new == NULL) { - ret = -ENOMEM; + if (new == NULL) goto out; - } - ret = security_port_sid(protocol, pnum, &new->psec.sid); + ret = security_port_sid(protocol, pnum, sid); if (ret != 0) goto out; + new->psec.port = pnum; new->psec.protocol = protocol; - ret = sel_netport_insert(new); - if (ret != 0) - goto out; - *sid = new->psec.sid; + new->psec.sid = *sid; + sel_netport_insert(new); out: spin_unlock_bh(&sel_netport_lock); @@ -239,11 +234,12 @@ int sel_netport_sid(u8 protocol, u16 pnum, u32 *sid) static void sel_netport_flush(void) { unsigned int idx; - struct sel_netport *port; + struct sel_netport *port, *port_tmp; spin_lock_bh(&sel_netport_lock); for (idx = 0; idx < SEL_NETPORT_HASH_SIZE; idx++) { - list_for_each_entry(port, &sel_netport_hash[idx].list, list) { + 
list_for_each_entry_safe(port, port_tmp, + &sel_netport_hash[idx].list, list) { list_del_rcu(&port->list); call_rcu(&port->rcu, sel_netport_free); } -- cgit v1.2.3 From f022bfd58253099102218db5249220a7f4787114 Mon Sep 17 00:00:00 2001 From: Ingo Molnar Date: Fri, 21 Mar 2008 15:42:28 +0100 Subject: x86: PAT fix Adrian Bunk noticed the following Coverity report: > Commit e7f260a276f2c9184fe753732d834b1f6fbe9f17 > (x86: PAT use reserve free memtype in mmap of /dev/mem) > added the following gem to arch/x86/mm/pat.c: > > <-- snip --> > > ... > int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, > unsigned long size, pgprot_t *vma_prot) > { > u64 offset = ((u64) pfn) << PAGE_SHIFT; > unsigned long flags = _PAGE_CACHE_UC_MINUS; > unsigned long ret_flags; > ... > ... (nothing that touches ret_flags) > ... > if (flags != _PAGE_CACHE_UC_MINUS) { > retval = reserve_memtype(offset, offset + size, flags, NULL); > } else { > retval = reserve_memtype(offset, offset + size, -1, &ret_flags); > } > > if (retval < 0) > return 0; > > flags = ret_flags; > > if (pfn <= max_pfn_mapped && > ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) { > free_memtype(offset, offset + size); > printk(KERN_INFO > "%s:%d /dev/mem ioremap_change_attr failed %s for %Lx-%Lx\n", > current->comm, current->pid, > cattr_name(flags), > offset, offset + size); > return 0; > } > > *vma_prot = __pgprot((pgprot_val(*vma_prot) & ~_PAGE_CACHE_MASK) | > flags); > return 1; > } > > <-- snip --> > > If (flags != _PAGE_CACHE_UC_MINUS) we pass garbage from the stack to > ioremap_change_attr() and/or __pgprot(). > > Spotted by the Coverity checker. the fix simplifies the code as we get rid of the 'ret_flags' complication. 
Signed-off-by: Ingo Molnar Signed-off-by: Linus Torvalds --- arch/x86/mm/pat.c | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/arch/x86/mm/pat.c b/arch/x86/mm/pat.c index b17cdf64e41e..277446cd30b6 100644 --- a/arch/x86/mm/pat.c +++ b/arch/x86/mm/pat.c @@ -510,7 +510,6 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, { u64 offset = ((u64) pfn) << PAGE_SHIFT; unsigned long flags = _PAGE_CACHE_UC_MINUS; - unsigned long ret_flags; int retval; if (!range_is_allowed(pfn, size)) @@ -549,14 +548,12 @@ int phys_mem_access_prot_allowed(struct file *file, unsigned long pfn, if (flags != _PAGE_CACHE_UC_MINUS) { retval = reserve_memtype(offset, offset + size, flags, NULL); } else { - retval = reserve_memtype(offset, offset + size, -1, &ret_flags); + retval = reserve_memtype(offset, offset + size, -1, &flags); } if (retval < 0) return 0; - flags = ret_flags; - if (pfn <= max_pfn_mapped && ioremap_change_attr((unsigned long)__va(offset), size, flags) < 0) { free_memtype(offset, offset + size); -- cgit v1.2.3 From 556637cdabcd5918c7d4a1a2679b8f86fc81e891 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 28 Apr 2008 02:11:47 -0700 Subject: mm: fix possible off-by-one in walk_pte_range() After the loop in walk_pte_range() pte might point to the first address after the pmd it walks. The pte_unmap() is then applied to something bad. Spotted by Roel Kluin and Andreas Schwab. 
Signed-off-by: Johannes Weiner Cc: Roel Kluin <12o3l@tiscali.nl> Cc: Andreas Schwab Acked-by: Matt Mackall Acked-by: Mikael Pettersson Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/pagewalk.c | 8 ++++++-- 1 file changed, 6 insertions(+), 2 deletions(-) diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 1cf1417ef8b7..0afd2387e507 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -9,11 +9,15 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, int err = 0; pte = pte_offset_map(pmd, addr); - do { + for (;;) { err = walk->pte_entry(pte, addr, addr + PAGE_SIZE, private); if (err) break; - } while (pte++, addr += PAGE_SIZE, addr != end); + addr += PAGE_SIZE; + if (addr == end) + break; + pte++; + } pte_unmap(pte); return err; -- cgit v1.2.3 From 1ecf0d0cd28a4bfed3009f752061998e52d14db2 Mon Sep 17 00:00:00 2001 From: Roel Kluin <12o3l@tiscali.nl> Date: Mon, 28 Apr 2008 02:11:50 -0700 Subject: dz: test after postfix decrement fails in dz_console_putchar() When loops reaches 0 the postfix decrement still subtracts, so the subsequent test fails. Signed-off-by: Roel Kluin <12o3l@tiscali.nl> Acked-by: Maciej W. Rozycki Cc: Johannes Weiner Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/dz.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/serial/dz.c b/drivers/serial/dz.c index 116211fcd36f..0dddd68b20d2 100644 --- a/drivers/serial/dz.c +++ b/drivers/serial/dz.c @@ -819,7 +819,7 @@ static void dz_console_putchar(struct uart_port *uport, int ch) dz_out(dport, DZ_TCR, mask); iob(); udelay(2); - } while (loops--); + } while (--loops); if (loops) /* Cannot send otherwise. */ dz_out(dport, DZ_TDR, ch); -- cgit v1.2.3 From 77459b059b02c16b2c8cbc39b524941a576ad36e Mon Sep 17 00:00:00 2001 From: David Brownell Date: Mon, 28 Apr 2008 02:11:51 -0700 Subject: rtc-pcf8583 build fix Fix bogus #include in rtc-pcf8583, so it compiles on platforms that don't support PC clone RTCs. 
(Original issue noted by Adrian Bunk.) Signed-off-by: David Brownell Cc: Adrian Bunk Acked-by: Alessandro Zummo Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-pcf8583.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-pcf8583.c b/drivers/rtc/rtc-pcf8583.c index 8b3997007506..3d09d8f0b1f0 100644 --- a/drivers/rtc/rtc-pcf8583.c +++ b/drivers/rtc/rtc-pcf8583.c @@ -15,7 +15,7 @@ #include #include #include -#include +#include #include #include #include -- cgit v1.2.3 From c750090085f260503d8beec1c73c4d2e4fe93628 Mon Sep 17 00:00:00 2001 From: David Brownell Date: Mon, 28 Apr 2008 02:11:52 -0700 Subject: rtc: avoid legacy drivers with generic framework Kconfig tweaks to help reduce RTC configuration bugs, by avoiding legacy RTC drivers when the generic RTC framework is enabled: - If rtc-cmos is selected, disable the legacy rtc driver; - When using generic RTC on x86, enable rtc-cmos by default; - In the old "chardev RTC" section of Kconfig, add a comment warning people off these (seven) legacy RTC drivers when the generic framework is in use. People can still use the legacy drivers if they want (or need) to. This doesn't fix the broken dependencies for the legacy "CMOS" RTC driver. Ideally it would be a full list of platforms where it works, not a partial list of ones where it won't. Or better yet, it would depend on a "HAVE_CMOS_RTC" flag defined by various platforms ... surely there's a Kconfig style guideline lurking there. 
Signed-off-by: David Brownell Acked-by: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/Kconfig | 11 ++++++++++- drivers/rtc/Kconfig | 5 +---- 2 files changed, 11 insertions(+), 5 deletions(-) diff --git a/drivers/char/Kconfig b/drivers/char/Kconfig index 2906ee7bd298..929d4fa73fd9 100644 --- a/drivers/char/Kconfig +++ b/drivers/char/Kconfig @@ -732,9 +732,16 @@ config NVRAM To compile this driver as a module, choose M here: the module will be called nvram. +# +# These legacy RTC drivers just cause too many conflicts with the generic +# RTC framework ... let's not even try to coexist any more. +# +if RTC_LIB=n + config RTC tristate "Enhanced Real Time Clock Support" - depends on !PPC && !PARISC && !IA64 && !M68K && !SPARC && !FRV && !ARM && !SUPERH && !S390 && !AVR32 + depends on !PPC && !PARISC && !IA64 && !M68K && !SPARC && !FRV \ + && !ARM && !SUPERH && !S390 && !AVR32 ---help--- If you say Y here and create a character special file /dev/rtc with major number 10 and minor number 135 using mknod ("man mknod"), you @@ -840,6 +847,8 @@ config DS1302 will get access to the real time clock (or hardware clock) built into your computer. +endif # RTC_LIB + config COBALT_LCD bool "Support for Cobalt LCD" depends on MIPS_COBALT diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 02a4c8cf2b2d..6cc2c0330230 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -20,10 +20,6 @@ menuconfig RTC_CLASS if RTC_CLASS -if GEN_RTC || RTC -comment "Conflicting RTC option has been selected, check GEN_RTC and RTC" -endif - config RTC_HCTOSYS bool "Set system time from RTC on startup and resume" depends on RTC_CLASS = y @@ -304,6 +300,7 @@ comment "Platform RTC drivers" config RTC_DRV_CMOS tristate "PC-style 'CMOS'" depends on X86 || ALPHA || ARM || M32R || ATARI || PPC || MIPS + default y if X86 help Say "yes" here to get direct support for the real time clock found in every PC or ACPI-based system, and some other boards. 
-- cgit v1.2.3 From 9edae7bcdcbac2dbf037b751ce1809eb2758cd8e Mon Sep 17 00:00:00 2001 From: Alessandro Zummo Date: Mon, 28 Apr 2008 02:11:53 -0700 Subject: rtc-isl1208: new style conversion and minor bug fixes [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alessandro Zummo Cc: Herbert Valerio Riedel Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-isl1208.c | 357 ++++++++++++++++++++++------------------------ 1 file changed, 170 insertions(+), 187 deletions(-) diff --git a/drivers/rtc/rtc-isl1208.c b/drivers/rtc/rtc-isl1208.c index 725b0c73c333..fb15e3fb4ce2 100644 --- a/drivers/rtc/rtc-isl1208.c +++ b/drivers/rtc/rtc-isl1208.c @@ -15,16 +15,15 @@ #include #include -#define DRV_NAME "isl1208" -#define DRV_VERSION "0.2" +#define DRV_VERSION "0.3" /* Register map */ /* rtc section */ #define ISL1208_REG_SC 0x00 #define ISL1208_REG_MN 0x01 #define ISL1208_REG_HR 0x02 -#define ISL1208_REG_HR_MIL (1<<7) /* 24h/12h mode */ -#define ISL1208_REG_HR_PM (1<<5) /* PM/AM bit in 12h mode */ +#define ISL1208_REG_HR_MIL (1<<7) /* 24h/12h mode */ +#define ISL1208_REG_HR_PM (1<<5) /* PM/AM bit in 12h mode */ #define ISL1208_REG_DT 0x03 #define ISL1208_REG_MO 0x04 #define ISL1208_REG_YR 0x05 @@ -33,14 +32,14 @@ /* control/status section */ #define ISL1208_REG_SR 0x07 -#define ISL1208_REG_SR_ARST (1<<7) /* auto reset */ -#define ISL1208_REG_SR_XTOSCB (1<<6) /* crystal oscillator */ -#define ISL1208_REG_SR_WRTC (1<<4) /* write rtc */ -#define ISL1208_REG_SR_ALM (1<<2) /* alarm */ -#define ISL1208_REG_SR_BAT (1<<1) /* battery */ -#define ISL1208_REG_SR_RTCF (1<<0) /* rtc fail */ +#define ISL1208_REG_SR_ARST (1<<7) /* auto reset */ +#define ISL1208_REG_SR_XTOSCB (1<<6) /* crystal oscillator */ +#define ISL1208_REG_SR_WRTC (1<<4) /* write rtc */ +#define ISL1208_REG_SR_ALM (1<<2) /* alarm */ +#define ISL1208_REG_SR_BAT (1<<1) /* battery */ +#define ISL1208_REG_SR_RTCF (1<<0) /* rtc fail */ #define ISL1208_REG_INT 0x08 
-#define ISL1208_REG_09 0x09 /* reserved */ +#define ISL1208_REG_09 0x09 /* reserved */ #define ISL1208_REG_ATR 0x0a #define ISL1208_REG_DTR 0x0b @@ -58,39 +57,21 @@ #define ISL1208_REG_USR2 0x13 #define ISL1208_USR_SECTION_LEN 2 -/* i2c configuration */ -#define ISL1208_I2C_ADDR 0xde - -static const unsigned short normal_i2c[] = { - ISL1208_I2C_ADDR>>1, I2C_CLIENT_END -}; -I2C_CLIENT_INSMOD; /* defines addr_data */ - -static int isl1208_attach_adapter(struct i2c_adapter *adapter); -static int isl1208_detach_client(struct i2c_client *client); - -static struct i2c_driver isl1208_driver = { - .driver = { - .name = DRV_NAME, - }, - .id = I2C_DRIVERID_ISL1208, - .attach_adapter = &isl1208_attach_adapter, - .detach_client = &isl1208_detach_client, -}; +static struct i2c_driver isl1208_driver; /* block read */ static int isl1208_i2c_read_regs(struct i2c_client *client, u8 reg, u8 buf[], - unsigned len) + unsigned len) { u8 reg_addr[1] = { reg }; struct i2c_msg msgs[2] = { - { client->addr, client->flags, sizeof(reg_addr), reg_addr }, - { client->addr, client->flags | I2C_M_RD, len, buf } + {client->addr, 0, sizeof(reg_addr), reg_addr} + , + {client->addr, I2C_M_RD, len, buf} }; int ret; - BUG_ON(len == 0); BUG_ON(reg > ISL1208_REG_USR2); BUG_ON(reg + len > ISL1208_REG_USR2 + 1); @@ -103,15 +84,14 @@ isl1208_i2c_read_regs(struct i2c_client *client, u8 reg, u8 buf[], /* block write */ static int isl1208_i2c_set_regs(struct i2c_client *client, u8 reg, u8 const buf[], - unsigned len) + unsigned len) { u8 i2c_buf[ISL1208_REG_USR2 + 2]; struct i2c_msg msgs[1] = { - { client->addr, client->flags, len + 1, i2c_buf } + {client->addr, 0, len + 1, i2c_buf} }; int ret; - BUG_ON(len == 0); BUG_ON(reg > ISL1208_REG_USR2); BUG_ON(reg + len > ISL1208_REG_USR2 + 1); @@ -125,7 +105,8 @@ isl1208_i2c_set_regs(struct i2c_client *client, u8 reg, u8 const buf[], } /* simple check to see wether we have a isl1208 */ -static int isl1208_i2c_validate_client(struct i2c_client *client) +static int 
+isl1208_i2c_validate_client(struct i2c_client *client) { u8 regs[ISL1208_RTC_SECTION_LEN] = { 0, }; u8 zero_mask[ISL1208_RTC_SECTION_LEN] = { @@ -139,24 +120,29 @@ static int isl1208_i2c_validate_client(struct i2c_client *client) return ret; for (i = 0; i < ISL1208_RTC_SECTION_LEN; ++i) { - if (regs[i] & zero_mask[i]) /* check if bits are cleared */ + if (regs[i] & zero_mask[i]) /* check if bits are cleared */ return -ENODEV; } return 0; } -static int isl1208_i2c_get_sr(struct i2c_client *client) +static int +isl1208_i2c_get_sr(struct i2c_client *client) { - return i2c_smbus_read_byte_data(client, ISL1208_REG_SR) == -1 ? -EIO:0; + int sr = i2c_smbus_read_byte_data(client, ISL1208_REG_SR); + if (sr < 0) + return -EIO; + + return sr; } -static int isl1208_i2c_get_atr(struct i2c_client *client) +static int +isl1208_i2c_get_atr(struct i2c_client *client) { int atr = i2c_smbus_read_byte_data(client, ISL1208_REG_ATR); - if (atr < 0) - return -EIO; + return atr; /* The 6bit value in the ATR register controls the load * capacitance C_load * in steps of 0.25pF @@ -169,51 +155,54 @@ static int isl1208_i2c_get_atr(struct i2c_client *client) * */ - atr &= 0x3f; /* mask out lsb */ - atr ^= 1<<5; /* invert 6th bit */ - atr += 2*9; /* add offset of 4.5pF; unit[atr] = 0.25pF */ + atr &= 0x3f; /* mask out lsb */ + atr ^= 1 << 5; /* invert 6th bit */ + atr += 2 * 9; /* add offset of 4.5pF; unit[atr] = 0.25pF */ return atr; } -static int isl1208_i2c_get_dtr(struct i2c_client *client) +static int +isl1208_i2c_get_dtr(struct i2c_client *client) { int dtr = i2c_smbus_read_byte_data(client, ISL1208_REG_DTR); - if (dtr < 0) return -EIO; /* dtr encodes adjustments of {-60,-40,-20,0,20,40,60} ppm */ - dtr = ((dtr & 0x3) * 20) * (dtr & (1<<2) ? -1 : 1); + dtr = ((dtr & 0x3) * 20) * (dtr & (1 << 2) ? 
-1 : 1); return dtr; } -static int isl1208_i2c_get_usr(struct i2c_client *client) +static int +isl1208_i2c_get_usr(struct i2c_client *client) { u8 buf[ISL1208_USR_SECTION_LEN] = { 0, }; int ret; - ret = isl1208_i2c_read_regs (client, ISL1208_REG_USR1, buf, - ISL1208_USR_SECTION_LEN); + ret = isl1208_i2c_read_regs(client, ISL1208_REG_USR1, buf, + ISL1208_USR_SECTION_LEN); if (ret < 0) return ret; return (buf[1] << 8) | buf[0]; } -static int isl1208_i2c_set_usr(struct i2c_client *client, u16 usr) +static int +isl1208_i2c_set_usr(struct i2c_client *client, u16 usr) { u8 buf[ISL1208_USR_SECTION_LEN]; buf[0] = usr & 0xff; buf[1] = (usr >> 8) & 0xff; - return isl1208_i2c_set_regs (client, ISL1208_REG_USR1, buf, - ISL1208_USR_SECTION_LEN); + return isl1208_i2c_set_regs(client, ISL1208_REG_USR1, buf, + ISL1208_USR_SECTION_LEN); } -static int isl1208_rtc_proc(struct device *dev, struct seq_file *seq) +static int +isl1208_rtc_proc(struct device *dev, struct seq_file *seq) { struct i2c_client *const client = to_i2c_client(dev); int sr, dtr, atr, usr; @@ -230,20 +219,19 @@ static int isl1208_rtc_proc(struct device *dev, struct seq_file *seq) (sr & ISL1208_REG_SR_ALM) ? " ALM" : "", (sr & ISL1208_REG_SR_WRTC) ? " WRTC" : "", (sr & ISL1208_REG_SR_XTOSCB) ? " XTOSCB" : "", - (sr & ISL1208_REG_SR_ARST) ? " ARST" : "", - sr); + (sr & ISL1208_REG_SR_ARST) ? " ARST" : "", sr); seq_printf(seq, "batt_status\t: %s\n", (sr & ISL1208_REG_SR_RTCF) ? 
"bad" : "okay"); dtr = isl1208_i2c_get_dtr(client); - if (dtr >= 0 -1) + if (dtr >= 0 - 1) seq_printf(seq, "digital_trim\t: %d ppm\n", dtr); atr = isl1208_i2c_get_atr(client); if (atr >= 0) seq_printf(seq, "analog_trim\t: %d.%.2d pF\n", - atr>>2, (atr&0x3)*25); + atr >> 2, (atr & 0x3) * 25); usr = isl1208_i2c_get_usr(client); if (usr >= 0) @@ -252,9 +240,8 @@ static int isl1208_rtc_proc(struct device *dev, struct seq_file *seq) return 0; } - -static int isl1208_i2c_read_time(struct i2c_client *client, - struct rtc_time *tm) +static int +isl1208_i2c_read_time(struct i2c_client *client, struct rtc_time *tm) { int sr; u8 regs[ISL1208_RTC_SECTION_LEN] = { 0, }; @@ -274,27 +261,30 @@ static int isl1208_i2c_read_time(struct i2c_client *client, tm->tm_sec = BCD2BIN(regs[ISL1208_REG_SC]); tm->tm_min = BCD2BIN(regs[ISL1208_REG_MN]); - { /* HR field has a more complex interpretation */ + + /* HR field has a more complex interpretation */ + { const u8 _hr = regs[ISL1208_REG_HR]; - if (_hr & ISL1208_REG_HR_MIL) /* 24h format */ + if (_hr & ISL1208_REG_HR_MIL) /* 24h format */ tm->tm_hour = BCD2BIN(_hr & 0x3f); - else { // 12h format + else { + /* 12h format */ tm->tm_hour = BCD2BIN(_hr & 0x1f); - if (_hr & ISL1208_REG_HR_PM) /* PM flag set */ + if (_hr & ISL1208_REG_HR_PM) /* PM flag set */ tm->tm_hour += 12; } } tm->tm_mday = BCD2BIN(regs[ISL1208_REG_DT]); - tm->tm_mon = BCD2BIN(regs[ISL1208_REG_MO]) - 1; /* rtc starts at 1 */ + tm->tm_mon = BCD2BIN(regs[ISL1208_REG_MO]) - 1; /* rtc starts at 1 */ tm->tm_year = BCD2BIN(regs[ISL1208_REG_YR]) + 100; tm->tm_wday = BCD2BIN(regs[ISL1208_REG_DW]); return 0; } -static int isl1208_i2c_read_alarm(struct i2c_client *client, - struct rtc_wkalrm *alarm) +static int +isl1208_i2c_read_alarm(struct i2c_client *client, struct rtc_wkalrm *alarm) { struct rtc_time *const tm = &alarm->time; u8 regs[ISL1208_ALARM_SECTION_LEN] = { 0, }; @@ -307,7 +297,7 @@ static int isl1208_i2c_read_alarm(struct i2c_client *client, } sr = 
isl1208_i2c_read_regs(client, ISL1208_REG_SCA, regs, - ISL1208_ALARM_SECTION_LEN); + ISL1208_ALARM_SECTION_LEN); if (sr < 0) { dev_err(&client->dev, "%s: reading alarm section failed\n", __func__); @@ -315,23 +305,25 @@ static int isl1208_i2c_read_alarm(struct i2c_client *client, } /* MSB of each alarm register is an enable bit */ - tm->tm_sec = BCD2BIN(regs[ISL1208_REG_SCA-ISL1208_REG_SCA] & 0x7f); - tm->tm_min = BCD2BIN(regs[ISL1208_REG_MNA-ISL1208_REG_SCA] & 0x7f); - tm->tm_hour = BCD2BIN(regs[ISL1208_REG_HRA-ISL1208_REG_SCA] & 0x3f); - tm->tm_mday = BCD2BIN(regs[ISL1208_REG_DTA-ISL1208_REG_SCA] & 0x3f); - tm->tm_mon = BCD2BIN(regs[ISL1208_REG_MOA-ISL1208_REG_SCA] & 0x1f)-1; - tm->tm_wday = BCD2BIN(regs[ISL1208_REG_DWA-ISL1208_REG_SCA] & 0x03); + tm->tm_sec = BCD2BIN(regs[ISL1208_REG_SCA - ISL1208_REG_SCA] & 0x7f); + tm->tm_min = BCD2BIN(regs[ISL1208_REG_MNA - ISL1208_REG_SCA] & 0x7f); + tm->tm_hour = BCD2BIN(regs[ISL1208_REG_HRA - ISL1208_REG_SCA] & 0x3f); + tm->tm_mday = BCD2BIN(regs[ISL1208_REG_DTA - ISL1208_REG_SCA] & 0x3f); + tm->tm_mon = + BCD2BIN(regs[ISL1208_REG_MOA - ISL1208_REG_SCA] & 0x1f) - 1; + tm->tm_wday = BCD2BIN(regs[ISL1208_REG_DWA - ISL1208_REG_SCA] & 0x03); return 0; } -static int isl1208_rtc_read_time(struct device *dev, struct rtc_time *tm) +static int +isl1208_rtc_read_time(struct device *dev, struct rtc_time *tm) { return isl1208_i2c_read_time(to_i2c_client(dev), tm); } -static int isl1208_i2c_set_time(struct i2c_client *client, - struct rtc_time const *tm) +static int +isl1208_i2c_set_time(struct i2c_client *client, struct rtc_time const *tm) { int sr; u8 regs[ISL1208_RTC_SECTION_LEN] = { 0, }; @@ -353,7 +345,7 @@ static int isl1208_i2c_set_time(struct i2c_client *client, } /* set WRTC */ - sr = i2c_smbus_write_byte_data (client, ISL1208_REG_SR, + sr = i2c_smbus_write_byte_data(client, ISL1208_REG_SR, sr | ISL1208_REG_SR_WRTC); if (sr < 0) { dev_err(&client->dev, "%s: writing SR failed\n", __func__); @@ -369,7 +361,7 @@ static int 
isl1208_i2c_set_time(struct i2c_client *client, } /* clear WRTC again */ - sr = i2c_smbus_write_byte_data (client, ISL1208_REG_SR, + sr = i2c_smbus_write_byte_data(client, ISL1208_REG_SR, sr & ~ISL1208_REG_SR_WRTC); if (sr < 0) { dev_err(&client->dev, "%s: writing SR failed\n", __func__); @@ -380,70 +372,69 @@ static int isl1208_i2c_set_time(struct i2c_client *client, } -static int isl1208_rtc_set_time(struct device *dev, struct rtc_time *tm) +static int +isl1208_rtc_set_time(struct device *dev, struct rtc_time *tm) { return isl1208_i2c_set_time(to_i2c_client(dev), tm); } -static int isl1208_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alarm) +static int +isl1208_rtc_read_alarm(struct device *dev, struct rtc_wkalrm *alarm) { return isl1208_i2c_read_alarm(to_i2c_client(dev), alarm); } static const struct rtc_class_ops isl1208_rtc_ops = { - .proc = isl1208_rtc_proc, - .read_time = isl1208_rtc_read_time, - .set_time = isl1208_rtc_set_time, - .read_alarm = isl1208_rtc_read_alarm, - //.set_alarm = isl1208_rtc_set_alarm, + .proc = isl1208_rtc_proc, + .read_time = isl1208_rtc_read_time, + .set_time = isl1208_rtc_set_time, + .read_alarm = isl1208_rtc_read_alarm, + /*.set_alarm = isl1208_rtc_set_alarm, */ }; /* sysfs interface */ -static ssize_t isl1208_sysfs_show_atrim(struct device *dev, - struct device_attribute *attr, - char *buf) +static ssize_t +isl1208_sysfs_show_atrim(struct device *dev, + struct device_attribute *attr, char *buf) { - int atr; - - atr = isl1208_i2c_get_atr(to_i2c_client(dev)); + int atr = isl1208_i2c_get_atr(to_i2c_client(dev)); if (atr < 0) return atr; - return sprintf(buf, "%d.%.2d pF\n", atr>>2, (atr&0x3)*25); + return sprintf(buf, "%d.%.2d pF\n", atr >> 2, (atr & 0x3) * 25); } + static DEVICE_ATTR(atrim, S_IRUGO, isl1208_sysfs_show_atrim, NULL); -static ssize_t isl1208_sysfs_show_dtrim(struct device *dev, - struct device_attribute *attr, - char *buf) +static ssize_t +isl1208_sysfs_show_dtrim(struct device *dev, + struct device_attribute 
*attr, char *buf) { - int dtr; - - dtr = isl1208_i2c_get_dtr(to_i2c_client(dev)); + int dtr = isl1208_i2c_get_dtr(to_i2c_client(dev)); if (dtr < 0) return dtr; return sprintf(buf, "%d ppm\n", dtr); } + static DEVICE_ATTR(dtrim, S_IRUGO, isl1208_sysfs_show_dtrim, NULL); -static ssize_t isl1208_sysfs_show_usr(struct device *dev, - struct device_attribute *attr, - char *buf) +static ssize_t +isl1208_sysfs_show_usr(struct device *dev, + struct device_attribute *attr, char *buf) { - int usr; - - usr = isl1208_i2c_get_usr(to_i2c_client(dev)); + int usr = isl1208_i2c_get_usr(to_i2c_client(dev)); if (usr < 0) return usr; return sprintf(buf, "0x%.4x\n", usr); } -static ssize_t isl1208_sysfs_store_usr(struct device *dev, - struct device_attribute *attr, - const char *buf, size_t count) +static ssize_t +isl1208_sysfs_store_usr(struct device *dev, + struct device_attribute *attr, + const char *buf, size_t count) { int usr = -1; @@ -460,124 +451,116 @@ static ssize_t isl1208_sysfs_store_usr(struct device *dev, return isl1208_i2c_set_usr(to_i2c_client(dev), usr) ? 
-EIO : count; } + static DEVICE_ATTR(usr, S_IRUGO | S_IWUSR, isl1208_sysfs_show_usr, isl1208_sysfs_store_usr); static int -isl1208_probe(struct i2c_adapter *adapter, int addr, int kind) +isl1208_sysfs_register(struct device *dev) { - int rc = 0; - struct i2c_client *new_client = NULL; - struct rtc_device *rtc = NULL; + int err; + + err = device_create_file(dev, &dev_attr_atrim); + if (err) + return err; - if (!i2c_check_functionality(adapter, I2C_FUNC_I2C)) { - rc = -ENODEV; - goto failout; + err = device_create_file(dev, &dev_attr_dtrim); + if (err) { + device_remove_file(dev, &dev_attr_atrim); + return err; } - new_client = kzalloc(sizeof(struct i2c_client), GFP_KERNEL); - if (new_client == NULL) { - rc = -ENOMEM; - goto failout; + err = device_create_file(dev, &dev_attr_usr); + if (err) { + device_remove_file(dev, &dev_attr_atrim); + device_remove_file(dev, &dev_attr_dtrim); } - new_client->addr = addr; - new_client->adapter = adapter; - new_client->driver = &isl1208_driver; - new_client->flags = 0; - strcpy(new_client->name, DRV_NAME); + return 0; +} - if (kind < 0) { - rc = isl1208_i2c_validate_client(new_client); - if (rc < 0) - goto failout; - } +static int +isl1208_sysfs_unregister(struct device *dev) +{ + device_remove_file(dev, &dev_attr_atrim); + device_remove_file(dev, &dev_attr_atrim); + device_remove_file(dev, &dev_attr_usr); + + return 0; +} + +static int +isl1208_probe(struct i2c_client *client) +{ + int rc = 0; + struct rtc_device *rtc; - rc = i2c_attach_client(new_client); - if (rc < 0) - goto failout; + if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) + return -ENODEV; - dev_info(&new_client->dev, + if (isl1208_i2c_validate_client(client) < 0) + return -ENODEV; + + dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); rtc = rtc_device_register(isl1208_driver.driver.name, - &new_client->dev, - &isl1208_rtc_ops, THIS_MODULE); - - if (IS_ERR(rtc)) { - rc = PTR_ERR(rtc); - goto failout_detach; - } + &client->dev, 
&isl1208_rtc_ops, + THIS_MODULE); + if (IS_ERR(rtc)) + return PTR_ERR(rtc); - i2c_set_clientdata(new_client, rtc); + i2c_set_clientdata(client, rtc); - rc = isl1208_i2c_get_sr(new_client); + rc = isl1208_i2c_get_sr(client); if (rc < 0) { - dev_err(&new_client->dev, "reading status failed\n"); - goto failout_unregister; + dev_err(&client->dev, "reading status failed\n"); + goto exit_unregister; } if (rc & ISL1208_REG_SR_RTCF) - dev_warn(&new_client->dev, "rtc power failure detected, " + dev_warn(&client->dev, "rtc power failure detected, " "please set clock.\n"); - rc = device_create_file(&new_client->dev, &dev_attr_atrim); - if (rc < 0) - goto failout_unregister; - rc = device_create_file(&new_client->dev, &dev_attr_dtrim); - if (rc < 0) - goto failout_atrim; - rc = device_create_file(&new_client->dev, &dev_attr_usr); - if (rc < 0) - goto failout_dtrim; + rc = isl1208_sysfs_register(&client->dev); + if (rc) + goto exit_unregister; return 0; - failout_dtrim: - device_remove_file(&new_client->dev, &dev_attr_dtrim); - failout_atrim: - device_remove_file(&new_client->dev, &dev_attr_atrim); - failout_unregister: +exit_unregister: rtc_device_unregister(rtc); - failout_detach: - i2c_detach_client(new_client); - failout: - kfree(new_client); - return rc; -} -static int -isl1208_attach_adapter (struct i2c_adapter *adapter) -{ - return i2c_probe(adapter, &addr_data, isl1208_probe); + return rc; } static int -isl1208_detach_client(struct i2c_client *client) +isl1208_remove(struct i2c_client *client) { - int rc; - struct rtc_device *const rtc = i2c_get_clientdata(client); - - if (rtc) - rtc_device_unregister(rtc); /* do we need to kfree? 
*/ - - rc = i2c_detach_client(client); - if (rc) - return rc; + struct rtc_device *rtc = i2c_get_clientdata(client); - kfree(client); + isl1208_sysfs_unregister(&client->dev); + rtc_device_unregister(rtc); return 0; } -/* module management */ +static struct i2c_driver isl1208_driver = { + .driver = { + .name = "rtc-isl1208", + }, + .probe = isl1208_probe, + .remove = isl1208_remove, +}; -static int __init isl1208_init(void) +static int __init +isl1208_init(void) { return i2c_add_driver(&isl1208_driver); } -static void __exit isl1208_exit(void) +static void __exit +isl1208_exit(void) { i2c_del_driver(&isl1208_driver); } -- cgit v1.2.3 From e5fc9cc0266e5babcf84c81908ec8843b7e3349f Mon Sep 17 00:00:00 2001 From: Alessandro Zummo Date: Mon, 28 Apr 2008 02:11:54 -0700 Subject: rtc-pcf8563: new style conversion [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alessandro Zummo Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-pcf8563.c | 112 ++++++++++++++-------------------------------- 1 file changed, 34 insertions(+), 78 deletions(-) diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c index b3317fcc16c3..a1e2f39521f8 100644 --- a/drivers/rtc/rtc-pcf8563.c +++ b/drivers/rtc/rtc-pcf8563.c @@ -18,17 +18,7 @@ #include #include -#define DRV_VERSION "0.4.2" - -/* Addresses to scan: none - * This chip cannot be reliably autodetected. An empty eeprom - * located at 0x51 will pass the validation routine due to - * the way the registers are implemented. 
- */ -static const unsigned short normal_i2c[] = { I2C_CLIENT_END }; - -/* Module parameters */ -I2C_CLIENT_INSMOD; +#define DRV_VERSION "0.4.3" #define PCF8563_REG_ST1 0x00 /* status */ #define PCF8563_REG_ST2 0x01 @@ -53,8 +43,10 @@ I2C_CLIENT_INSMOD; #define PCF8563_SC_LV 0x80 /* low voltage */ #define PCF8563_MO_C 0x80 /* century */ +static struct i2c_driver pcf8563_driver; + struct pcf8563 { - struct i2c_client client; + struct rtc_device *rtc; /* * The meaning of MO_C bit varies by the chip type. * From PCF8563 datasheet: this bit is toggled when the years @@ -72,16 +64,13 @@ struct pcf8563 { int c_polarity; /* 0: MO_C=1 means 19xx, otherwise MO_C=1 means 20xx */ }; -static int pcf8563_probe(struct i2c_adapter *adapter, int address, int kind); -static int pcf8563_detach(struct i2c_client *client); - /* * In the routines that deal directly with the pcf8563 hardware, we use * rtc_time -- month 0-11, hour 0-23, yr = calendar year-epoch. */ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) { - struct pcf8563 *pcf8563 = container_of(client, struct pcf8563, client); + struct pcf8563 *pcf8563 = i2c_get_clientdata(client); unsigned char buf[13] = { PCF8563_REG_ST1 }; struct i2c_msg msgs[] = { @@ -138,7 +127,7 @@ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) static int pcf8563_set_datetime(struct i2c_client *client, struct rtc_time *tm) { - struct pcf8563 *pcf8563 = container_of(client, struct pcf8563, client); + struct pcf8563 *pcf8563 = i2c_get_clientdata(client); int i, err; unsigned char buf[9]; @@ -257,100 +246,67 @@ static const struct rtc_class_ops pcf8563_rtc_ops = { .set_time = pcf8563_rtc_set_time, }; -static int pcf8563_attach(struct i2c_adapter *adapter) -{ - return i2c_probe(adapter, &addr_data, pcf8563_probe); -} - -static struct i2c_driver pcf8563_driver = { - .driver = { - .name = "pcf8563", - }, - .id = I2C_DRIVERID_PCF8563, - .attach_adapter = &pcf8563_attach, - .detach_client = 
&pcf8563_detach, -}; - -static int pcf8563_probe(struct i2c_adapter *adapter, int address, int kind) +static int pcf8563_probe(struct i2c_client *client) { struct pcf8563 *pcf8563; - struct i2c_client *client; - struct rtc_device *rtc; int err = 0; - dev_dbg(&adapter->dev, "%s\n", __FUNCTION__); + dev_dbg(&client->dev, "%s\n", __func__); - if (!i2c_check_functionality(adapter, I2C_FUNC_I2C)) { - err = -ENODEV; - goto exit; - } + if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) + return -ENODEV; - if (!(pcf8563 = kzalloc(sizeof(struct pcf8563), GFP_KERNEL))) { - err = -ENOMEM; - goto exit; - } - - client = &pcf8563->client; - client->addr = address; - client->driver = &pcf8563_driver; - client->adapter = adapter; - - strlcpy(client->name, pcf8563_driver.driver.name, I2C_NAME_SIZE); + pcf8563 = kzalloc(sizeof(struct pcf8563), GFP_KERNEL); + if (!pcf8563) + return -ENOMEM; /* Verify the chip is really an PCF8563 */ - if (kind < 0) { - if (pcf8563_validate_client(client) < 0) { - err = -ENODEV; - goto exit_kfree; - } - } - - /* Inform the i2c layer */ - if ((err = i2c_attach_client(client))) + if (pcf8563_validate_client(client) < 0) { + err = -ENODEV; goto exit_kfree; + } dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); - rtc = rtc_device_register(pcf8563_driver.driver.name, &client->dev, - &pcf8563_rtc_ops, THIS_MODULE); + pcf8563->rtc = rtc_device_register(pcf8563_driver.driver.name, + &client->dev, &pcf8563_rtc_ops, THIS_MODULE); - if (IS_ERR(rtc)) { - err = PTR_ERR(rtc); - goto exit_detach; + if (IS_ERR(pcf8563->rtc)) { + err = PTR_ERR(pcf8563->rtc); + goto exit_kfree; } - i2c_set_clientdata(client, rtc); + i2c_set_clientdata(client, pcf8563); return 0; -exit_detach: - i2c_detach_client(client); - exit_kfree: kfree(pcf8563); -exit: return err; } -static int pcf8563_detach(struct i2c_client *client) +static int pcf8563_remove(struct i2c_client *client) { - struct pcf8563 *pcf8563 = container_of(client, struct pcf8563, client); - 
int err; - struct rtc_device *rtc = i2c_get_clientdata(client); + struct pcf8563 *pcf8563 = i2c_get_clientdata(client); - if (rtc) - rtc_device_unregister(rtc); - - if ((err = i2c_detach_client(client))) - return err; + if (pcf8563->rtc) + rtc_device_unregister(pcf8563->rtc); kfree(pcf8563); return 0; } +static struct i2c_driver pcf8563_driver = { + .driver = { + .name = "rtc-pcf8563", + }, + .probe = pcf8563_probe, + .remove = pcf8563_remove, +}; + static int __init pcf8563_init(void) { return i2c_add_driver(&pcf8563_driver); -- cgit v1.2.3 From 4edac2b442d6176afb0ae431123993dc00882987 Mon Sep 17 00:00:00 2001 From: Alessandro Zummo Date: Mon, 28 Apr 2008 02:11:54 -0700 Subject: rtc-x1205: new style conversion [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Alessandro Zummo Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-x1205.c | 130 ++++++++++++++++-------------------------------- 1 file changed, 43 insertions(+), 87 deletions(-) diff --git a/drivers/rtc/rtc-x1205.c b/drivers/rtc/rtc-x1205.c index b90fb1866ce9..bb3290360091 100644 --- a/drivers/rtc/rtc-x1205.c +++ b/drivers/rtc/rtc-x1205.c @@ -22,20 +22,7 @@ #include #include -#define DRV_VERSION "1.0.7" - -/* Addresses to scan: none. This chip is located at - * 0x6f and uses a two bytes register addressing. - * Two bytes need to be written to read a single register, - * while most other chips just require one and take the second - * one as the data to be written. To prevent corrupting - * unknown chips, the user must explicitly set the probe parameter. 
- */ - -static const unsigned short normal_i2c[] = { I2C_CLIENT_END }; - -/* Insmod parameters */ -I2C_CLIENT_INSMOD; +#define DRV_VERSION "1.0.8" /* offsets into CCR area */ @@ -91,19 +78,7 @@ I2C_CLIENT_INSMOD; #define X1205_HR_MIL 0x80 /* Set in ccr.hour for 24 hr mode */ -/* Prototypes */ -static int x1205_attach(struct i2c_adapter *adapter); -static int x1205_detach(struct i2c_client *client); -static int x1205_probe(struct i2c_adapter *adapter, int address, int kind); - -static struct i2c_driver x1205_driver = { - .driver = { - .name = "x1205", - }, - .id = I2C_DRIVERID_X1205, - .attach_adapter = &x1205_attach, - .detach_client = &x1205_detach, -}; +static struct i2c_driver x1205_driver; /* * In the routines that deal directly with the x1205 hardware, we use @@ -497,58 +472,49 @@ static ssize_t x1205_sysfs_show_dtrim(struct device *dev, } static DEVICE_ATTR(dtrim, S_IRUGO, x1205_sysfs_show_dtrim, NULL); -static int x1205_attach(struct i2c_adapter *adapter) +static int x1205_sysfs_register(struct device *dev) +{ + int err; + + err = device_create_file(dev, &dev_attr_atrim); + if (err) + return err; + + err = device_create_file(dev, &dev_attr_dtrim); + if (err) + device_remove_file(dev, &dev_attr_atrim); + + return err; +} + +static void x1205_sysfs_unregister(struct device *dev) { - return i2c_probe(adapter, &addr_data, x1205_probe); + device_remove_file(dev, &dev_attr_atrim); + device_remove_file(dev, &dev_attr_dtrim); } -static int x1205_probe(struct i2c_adapter *adapter, int address, int kind) + +static int x1205_probe(struct i2c_client *client) { int err = 0; unsigned char sr; - struct i2c_client *client; struct rtc_device *rtc; - dev_dbg(&adapter->dev, "%s\n", __FUNCTION__); - - if (!i2c_check_functionality(adapter, I2C_FUNC_I2C)) { - err = -ENODEV; - goto exit; - } - - if (!(client = kzalloc(sizeof(struct i2c_client), GFP_KERNEL))) { - err = -ENOMEM; - goto exit; - } - - /* I2C client */ - client->addr = address; - client->driver = &x1205_driver; - 
client->adapter = adapter; + dev_dbg(&client->dev, "%s\n", __func__); - strlcpy(client->name, x1205_driver.driver.name, I2C_NAME_SIZE); - - /* Verify the chip is really an X1205 */ - if (kind < 0) { - if (x1205_validate_client(client) < 0) { - err = -ENODEV; - goto exit_kfree; - } - } + if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) + return -ENODEV; - /* Inform the i2c layer */ - if ((err = i2c_attach_client(client))) - goto exit_kfree; + if (x1205_validate_client(client) < 0) + return -ENODEV; dev_info(&client->dev, "chip found, driver version " DRV_VERSION "\n"); rtc = rtc_device_register(x1205_driver.driver.name, &client->dev, &x1205_rtc_ops, THIS_MODULE); - if (IS_ERR(rtc)) { - err = PTR_ERR(rtc); - goto exit_detach; - } + if (IS_ERR(rtc)) + return PTR_ERR(rtc); i2c_set_clientdata(client, rtc); @@ -565,45 +531,35 @@ static int x1205_probe(struct i2c_adapter *adapter, int address, int kind) else dev_err(&client->dev, "couldn't read status\n"); - err = device_create_file(&client->dev, &dev_attr_atrim); - if (err) goto exit_devreg; - err = device_create_file(&client->dev, &dev_attr_dtrim); - if (err) goto exit_atrim; + err = x1205_sysfs_register(&client->dev); + if (err) + goto exit_devreg; return 0; -exit_atrim: - device_remove_file(&client->dev, &dev_attr_atrim); - exit_devreg: rtc_device_unregister(rtc); -exit_detach: - i2c_detach_client(client); - -exit_kfree: - kfree(client); - -exit: return err; } -static int x1205_detach(struct i2c_client *client) +static int x1205_remove(struct i2c_client *client) { - int err; struct rtc_device *rtc = i2c_get_clientdata(client); - if (rtc) - rtc_device_unregister(rtc); - - if ((err = i2c_detach_client(client))) - return err; - - kfree(client); - + rtc_device_unregister(rtc); + x1205_sysfs_unregister(&client->dev); return 0; } +static struct i2c_driver x1205_driver = { + .driver = { + .name = "rtc-x1205", + }, + .probe = x1205_probe, + .remove = x1205_remove, +}; + static int __init x1205_init(void) { return 
i2c_add_driver(&x1205_driver); -- cgit v1.2.3 From c464652813fe128c346ce6e7ec8fb0d2b67de6fb Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Mon, 28 Apr 2008 02:11:55 -0700 Subject: rtc: silence section mismatch warning in rtc-test Fix the following warning: WARNING: vmlinux.o(.data+0x253e28): Section mismatch in reference from the variable test_drv to the function .devexit.text:test_remove() Fix by renaming the platform_driver variable from *_drv to *_driver so that modpost ignores the reference to an __devexit section. Signed-off-by: Sam Ravnborg Cc: Alessandro Zummo Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-test.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-test.c b/drivers/rtc/rtc-test.c index 254c9fce27da..bc930022004a 100644 --- a/drivers/rtc/rtc-test.c +++ b/drivers/rtc/rtc-test.c @@ -147,7 +147,7 @@ static int __devexit test_remove(struct platform_device *plat_dev) return 0; } -static struct platform_driver test_drv = { +static struct platform_driver test_driver = { .probe = test_probe, .remove = __devexit_p(test_remove), .driver = { @@ -160,7 +160,7 @@ static int __init test_init(void) { int err; - if ((err = platform_driver_register(&test_drv))) + if ((err = platform_driver_register(&test_driver))) return err; if ((test0 = platform_device_alloc("rtc-test", 0)) == NULL) { @@ -191,7 +191,7 @@ exit_free_test0: platform_device_put(test0); exit_driver_unregister: - platform_driver_unregister(&test_drv); + platform_driver_unregister(&test_driver); return err; } @@ -199,7 +199,7 @@ static void __exit test_exit(void) { platform_device_unregister(test0); platform_device_unregister(test1); - platform_driver_unregister(&test_drv); + platform_driver_unregister(&test_driver); } MODULE_AUTHOR("Alessandro Zummo "); -- cgit v1.2.3 From a3ed107e63b7cd4d1ba1567a69a1feec5f0eabc1 Mon Sep 17 00:00:00 2001 From: Adrian Bunk Date: Mon, 28 Apr 2008 02:11:55 -0700 Subject: make
ds1511_rtc_{read,set}_time() static Make the needlessly global ds1511_rtc_{read,set}_time() static. Signed-off-by: Adrian Bunk Cc: David Brownell Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-ds1511.c | 6 ++---- 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-ds1511.c b/drivers/rtc/rtc-ds1511.c index d08912f18ddd..a83a40b3ebaa 100644 --- a/drivers/rtc/rtc-ds1511.c +++ b/drivers/rtc/rtc-ds1511.c @@ -181,8 +181,7 @@ ds1511_wdog_disable(void) * stupidly, some callers call with year unmolested; * and some call with year = year - 1900. thanks. */ - int -ds1511_rtc_set_time(struct device *dev, struct rtc_time *rtc_tm) +static int ds1511_rtc_set_time(struct device *dev, struct rtc_time *rtc_tm) { u8 mon, day, dow, hrs, min, sec, yrs, cen; unsigned int flags; @@ -245,8 +244,7 @@ ds1511_rtc_set_time(struct device *dev, struct rtc_time *rtc_tm) return 0; } - int -ds1511_rtc_read_time(struct device *dev, struct rtc_time *rtc_tm) +static int ds1511_rtc_read_time(struct device *dev, struct rtc_time *rtc_tm) { unsigned int century; unsigned int flags; -- cgit v1.2.3 From e275ac477161a3df5c27e40c55f7af94cfb396cf Mon Sep 17 00:00:00 2001 From: David Brownell Date: Mon, 28 Apr 2008 02:11:56 -0700 Subject: kerneldoc for Add to the generated kerneldoc, with some overview to go along with those per-function descriptions. 
Signed-off-by: David Brownell Cc: Russell King Cc: Alessandro Zummo Cc: "Randy.Dunlap" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/DocBook/kernel-api.tmpl | 54 +++++++++++++++++++++++++++++++++++ 1 file changed, 54 insertions(+) diff --git a/Documentation/DocBook/kernel-api.tmpl b/Documentation/DocBook/kernel-api.tmpl index 488dd4a4945b..617c2d979975 100644 --- a/Documentation/DocBook/kernel-api.tmpl +++ b/Documentation/DocBook/kernel-api.tmpl @@ -645,4 +645,58 @@ X!Idrivers/video/console/fonts.c !Edrivers/i2c/i2c-core.c + + Clock Framework + + + The clock framework defines programming interfaces to support + software management of the system clock tree. + This framework is widely used with System-On-Chip (SOC) platforms + to support power management and various devices which may need + custom clock rates. + Note that these "clocks" don't relate to timekeeping or real + time clocks (RTCs), each of which have separate frameworks. + These struct clk instances may be used + to manage for example a 96 MHz signal that is used to shift bits + into and out of peripherals or busses, or otherwise trigger + synchronous state machine transitions in system hardware. + + + + Power management is supported by explicit software clock gating: + unused clocks are disabled, so the system doesn't waste power + changing the state of transistors that aren't in active use. + On some systems this may be backed by hardware clock gating, + where clocks are gated without being disabled in software. + Sections of chips that are powered but not clocked may be able + to retain their last state. + This low power state is often called a retention + mode. + This mode still incurs leakage currents, especially with finer + circuit geometries, but for CMOS circuits power is mostly used + by clocked state changes. + + + + Power-aware drivers only enable their clocks when the device + they manage is in active use. 
Also, system sleep states often + differ according to which clock domains are active: while a + "standby" state may allow wakeup from several active domains, a + "mem" (suspend-to-RAM) state may require a more wholesale shutdown + of clocks derived from higher speed PLLs and oscillators, limiting + the number of possible wakeup event sources. A driver's suspend + method may need to be aware of system-specific clock constraints + on the target sleep state. + + + + Some platforms support programmable clock generators. These + can be used by external chips of various kinds, such as other + CPUs, multimedia codecs, and devices with strict requirements + for interface clocking. + + +!Iinclude/linux/clk.h + + -- cgit v1.2.3 From e2bfe3424b368e977002fc58f81536d5d8ea9449 Mon Sep 17 00:00:00 2001 From: Paul Mundt Date: Mon, 28 Apr 2008 02:11:57 -0700 Subject: rtc: rtc-rs5c372: fix up NULL name in transfer error path rs5c_get_regs() currently uses rs5c->rtc->name for its debug printk when i2c_transfer() fails, though it is used several times before the rtc dev has been registered. The earliest we can get at the symbolic name is via the i2c client's struct device, which can be handled by moving the first rs5c_get_regs() until after the client pointer is assigned. Signed-off-by: Paul Mundt Cc: David Brownell Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-rs5c372.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/rtc/rtc-rs5c372.c b/drivers/rtc/rtc-rs5c372.c index 6b67b5097927..67d8768c1b64 100644 --- a/drivers/rtc/rtc-rs5c372.c +++ b/drivers/rtc/rtc-rs5c372.c @@ -99,7 +99,7 @@ static int rs5c_get_regs(struct rs5c372 *rs5c) * least 80219 chips; this works around that bug. 
*/ if ((i2c_transfer(client->adapter, msgs, 1)) != 1) { - pr_debug("%s: can't read registers\n", rs5c->rtc->name); + dev_warn(&client->dev, "can't read registers\n"); return -EIO; } @@ -512,12 +512,12 @@ static int rs5c372_probe(struct i2c_client *client) goto exit; } - /* we read registers 0x0f then 0x00-0x0f; skip the first one */ - rs5c372->regs=&rs5c372->buf[1]; - rs5c372->client = client; i2c_set_clientdata(client, rs5c372); + /* we read registers 0x0f then 0x00-0x0f; skip the first one */ + rs5c372->regs = &rs5c372->buf[1]; + err = rs5c_get_regs(rs5c372); if (err < 0) goto exit_kfree; -- cgit v1.2.3 From c116bc2ae516e9949d645bc75b1ee294ff15db23 Mon Sep 17 00:00:00 2001 From: Zhao Yakui Date: Mon, 28 Apr 2008 02:11:58 -0700 Subject: rtc: add the support for alarm time relative to current time in sysfs In the current kernel, to set the alarm time, the absolute time (the seconds relative to 1970-01-01 00:00:00) must be written into /sys/class/rtc/rtc0/wakealarm. This is not convenient. It is more reasonable to add support for an alarm time relative to the current RTC time (in seconds). For example, to make the RTC generate an alarm after 2 minutes, either of the following works: echo +120 > /sys/class/rtc/rtc0/wakealarm or echo +0x78 > /sys/class/rtc/rtc0/wakealarm Signed-off-by: Zhao Yakui Signed-off-by: Zhang Rui Signed-off-by: David Brownell Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-sysfs.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/drivers/rtc/rtc-sysfs.c b/drivers/rtc/rtc-sysfs.c index 4d27ccc4fc06..2531ce4c9db0 100644 --- a/drivers/rtc/rtc-sysfs.c +++ b/drivers/rtc/rtc-sysfs.c @@ -145,6 +145,8 @@ rtc_sysfs_set_wakealarm(struct device *dev, struct device_attribute *attr, unsigned long now, alarm; struct rtc_wkalrm alm; struct rtc_device *rtc = to_rtc_device(dev); + char *buf_ptr; + int adjust = 0; /* Only request alarms that trigger in the future.
Disable them * by writing another time, e.g. 0 meaning Jan 1 1970 UTC. @@ -154,7 +156,15 @@ rtc_sysfs_set_wakealarm(struct device *dev, struct device_attribute *attr, return retval; rtc_tm_to_time(&alm.time, &now); - alarm = simple_strtoul(buf, NULL, 0); + buf_ptr = (char *)buf; + if (*buf_ptr == '+') { + buf_ptr++; + adjust = 1; + } + alarm = simple_strtoul(buf_ptr, NULL, 0); + if (adjust) { + alarm += now; + } if (alarm > now) { /* Avoid accidentally clobbering active alarms; we can't * entirely prevent that here, without even the minimal -- cgit v1.2.3 From dca03a51549bc645685fb8a77efa64df531666c3 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Mon, 28 Apr 2008 02:11:59 -0700 Subject: drivers/char/rtc.c: use time_before, time_before_eq, etc The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. A simplified version of the semantic patch making this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @ change_compare_np @ expression E; @@ ( - jiffies <= E + time_before_eq(jiffies,E) | - jiffies >= E + time_after_eq(jiffies,E) | - jiffies < E + time_before(jiffies,E) | - jiffies > E + time_after(jiffies,E) ) @ include depends on change_compare_np @ @@ #include @ no_include depends on !include && change_compare_np @ @@ #include + #include // Signed-off-by: Julia Lawall Cc: Ingo Molnar Cc: Thomas Gleixner Cc: Alessandro Zummo Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/rtc.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/char/rtc.c b/drivers/char/rtc.c index 5c3142b6f1fc..e2ec2ee4cf79 100644 --- a/drivers/char/rtc.c +++ b/drivers/char/rtc.c @@ -88,6 +88,7 @@ #ifdef CONFIG_SPARC32 #include +#include #include static unsigned long rtc_port; @@ -1316,7 +1317,8 @@ void rtc_get_rtc_time(struct rtc_time *rtc_tm) * Once the read clears, read the RTC time (again via ioctl). Easy. 
*/ - while (rtc_is_updating() != 0 && jiffies - uip_watchdog < 2*HZ/100) + while (rtc_is_updating() != 0 && + time_before(jiffies, uip_watchdog + 2*HZ/100)) cpu_relax(); /* -- cgit v1.2.3 From 2a4e2b8780c6df42b19c053243dada7fa4d311ee Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:12:00 -0700 Subject: rtc: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison Cc: David Brownell Cc: Alessandro Zummo Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/rtc/rtc-at91rm9200.c | 12 ++++++------ drivers/rtc/rtc-at91sam9.c | 2 +- drivers/rtc/rtc-ds1302.c | 2 +- drivers/rtc/rtc-ds1672.c | 14 +++++++------- drivers/rtc/rtc-max6900.c | 6 +++--- drivers/rtc/rtc-max6902.c | 4 ++-- drivers/rtc/rtc-pcf8563.c | 14 +++++++------- drivers/rtc/rtc-rs5c313.c | 4 ++-- drivers/rtc/rtc-rs5c372.c | 10 +++++----- drivers/rtc/rtc-s3c.c | 6 +++--- drivers/rtc/rtc-sh.c | 2 +- drivers/rtc/rtc-v3020.c | 4 ++-- drivers/rtc/rtc-x1205.c | 40 ++++++++++++++++++++-------------------- 13 files changed, 60 insertions(+), 60 deletions(-) diff --git a/drivers/rtc/rtc-at91rm9200.c b/drivers/rtc/rtc-at91rm9200.c index 52abffc86bcd..39e64ab1ecb7 100644 --- a/drivers/rtc/rtc-at91rm9200.c +++ b/drivers/rtc/rtc-at91rm9200.c @@ -83,7 +83,7 @@ static int at91_rtc_readtime(struct device *dev, struct rtc_time *tm) tm->tm_yday = rtc_year_days(tm->tm_mday, tm->tm_mon, tm->tm_year); tm->tm_year = tm->tm_year - 1900; - pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __FUNCTION__, + pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__, 1900 + tm->tm_year, tm->tm_mon, tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec); @@ -97,7 +97,7 @@ static int at91_rtc_settime(struct device *dev, struct rtc_time *tm) { unsigned long cr; - pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __FUNCTION__, + pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__, 1900 + tm->tm_year, tm->tm_mon, tm->tm_mday, tm->tm_hour, 
tm->tm_min, tm->tm_sec); @@ -142,7 +142,7 @@ static int at91_rtc_readalarm(struct device *dev, struct rtc_wkalrm *alrm) alrm->enabled = (at91_sys_read(AT91_RTC_IMR) & AT91_RTC_ALARM) ? 1 : 0; - pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __FUNCTION__, + pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__, 1900 + tm->tm_year, tm->tm_mon, tm->tm_mday, tm->tm_hour, tm->tm_min, tm->tm_sec); @@ -178,7 +178,7 @@ static int at91_rtc_setalarm(struct device *dev, struct rtc_wkalrm *alrm) if (alrm->enabled) at91_sys_write(AT91_RTC_IER, AT91_RTC_ALARM); - pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __FUNCTION__, + pr_debug("%s(): %4d-%02d-%02d %02d:%02d:%02d\n", __func__, at91_alarm_year, tm.tm_mon, tm.tm_mday, tm.tm_hour, tm.tm_min, tm.tm_sec); @@ -193,7 +193,7 @@ static int at91_rtc_ioctl(struct device *dev, unsigned int cmd, { int ret = 0; - pr_debug("%s(): cmd=%08x, arg=%08lx.\n", __FUNCTION__, cmd, arg); + pr_debug("%s(): cmd=%08x, arg=%08lx.\n", __func__, cmd, arg); switch (cmd) { case RTC_AIE_OFF: /* alarm off */ @@ -265,7 +265,7 @@ static irqreturn_t at91_rtc_interrupt(int irq, void *dev_id) rtc_update_irq(rtc, 1, events); - pr_debug("%s(): num=%ld, events=0x%02lx\n", __FUNCTION__, + pr_debug("%s(): num=%ld, events=0x%02lx\n", __func__, events >> 8, events & 0x000000FF); return IRQ_HANDLED; diff --git a/drivers/rtc/rtc-at91sam9.c b/drivers/rtc/rtc-at91sam9.c index 56728a2a3385..38d8742a4bdf 100644 --- a/drivers/rtc/rtc-at91sam9.c +++ b/drivers/rtc/rtc-at91sam9.c @@ -288,7 +288,7 @@ static irqreturn_t at91_rtc_interrupt(int irq, void *_rtc) rtc_update_irq(rtc->rtcdev, 1, events); - pr_debug("%s: num=%ld, events=0x%02lx\n", __FUNCTION__, + pr_debug("%s: num=%ld, events=0x%02lx\n", __func__, events >> 8, events & 0x000000FF); return IRQ_HANDLED; diff --git a/drivers/rtc/rtc-ds1302.c b/drivers/rtc/rtc-ds1302.c index 7b002ceeaa7d..b9397818f73a 100644 --- a/drivers/rtc/rtc-ds1302.c +++ b/drivers/rtc/rtc-ds1302.c @@ -122,7 +122,7 @@ static int 
ds1302_rtc_read_time(struct device *dev, struct rtc_time *tm) dev_dbg(dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon + 1, tm->tm_year, tm->tm_wday); diff --git a/drivers/rtc/rtc-ds1672.c b/drivers/rtc/rtc-ds1672.c index e0900ca678ec..6fa4556f5f5c 100644 --- a/drivers/rtc/rtc-ds1672.c +++ b/drivers/rtc/rtc-ds1672.c @@ -50,13 +50,13 @@ static int ds1672_get_datetime(struct i2c_client *client, struct rtc_time *tm) /* read date registers */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } dev_dbg(&client->dev, "%s: raw read data - counters=%02x,%02x,%02x,%02x\n", - __FUNCTION__, buf[0], buf[1], buf[2], buf[3]); + __func__, buf[0], buf[1], buf[2], buf[3]); time = (buf[3] << 24) | (buf[2] << 16) | (buf[1] << 8) | buf[0]; @@ -64,7 +64,7 @@ static int ds1672_get_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, tm->tm_sec, tm->tm_min, tm->tm_hour, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); return 0; @@ -84,7 +84,7 @@ static int ds1672_set_mmss(struct i2c_client *client, unsigned long secs) xfer = i2c_master_send(client, buf, 6); if (xfer != 6) { - dev_err(&client->dev, "%s: send: %d\n", __FUNCTION__, xfer); + dev_err(&client->dev, "%s: send: %d\n", __func__, xfer); return -EIO; } @@ -98,7 +98,7 @@ static int ds1672_set_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -133,7 +133,7 @@ static int 
ds1672_get_control(struct i2c_client *client, u8 *status) /* read control register */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } @@ -199,7 +199,7 @@ static int ds1672_probe(struct i2c_adapter *adapter, int address, int kind) struct i2c_client *client; struct rtc_device *rtc; - dev_dbg(&adapter->dev, "%s\n", __FUNCTION__); + dev_dbg(&adapter->dev, "%s\n", __func__); if (!i2c_check_functionality(adapter, I2C_FUNC_I2C)) { err = -ENODEV; diff --git a/drivers/rtc/rtc-max6900.c b/drivers/rtc/rtc-max6900.c index 7683412970c4..ded3c0abad83 100644 --- a/drivers/rtc/rtc-max6900.c +++ b/drivers/rtc/rtc-max6900.c @@ -98,7 +98,7 @@ static int max6900_i2c_read_regs(struct i2c_client *client, u8 *buf) rc = i2c_transfer(client->adapter, msgs, ARRAY_SIZE(msgs)); if (rc != ARRAY_SIZE(msgs)) { dev_err(&client->dev, "%s: register read failed\n", - __FUNCTION__); + __func__); return -EIO; } return 0; @@ -150,7 +150,7 @@ static int max6900_i2c_write_regs(struct i2c_client *client, u8 const *buf) write_failed: dev_err(&client->dev, "%s: register write failed\n", - __FUNCTION__); + __func__); return -EIO; } @@ -214,7 +214,7 @@ static int max6900_i2c_clear_write_protect(struct i2c_client *client) rc = i2c_smbus_write_byte_data (client, MAX6900_REG_CONTROL_WRITE, 0); if (rc < 0) { dev_err(&client->dev, "%s: control register write failed\n", - __FUNCTION__); + __func__); return -EIO; } return 0; diff --git a/drivers/rtc/rtc-max6902.c b/drivers/rtc/rtc-max6902.c index 1f956dc5d56e..12f0310ae89c 100644 --- a/drivers/rtc/rtc-max6902.c +++ b/drivers/rtc/rtc-max6902.c @@ -140,7 +140,7 @@ static int max6902_get_datetime(struct device *dev, struct rtc_time *dt) dt->tm_year -= 1900; #ifdef MAX6902_DEBUG - printk("\n%s : Read RTC values\n",__FUNCTION__); + printk("\n%s : Read RTC values\n",__func__); printk("tm_hour: %i\n",dt->tm_hour); printk("tm_min : 
%i\n",dt->tm_min); printk("tm_sec : %i\n",dt->tm_sec); @@ -158,7 +158,7 @@ static int max6902_set_datetime(struct device *dev, struct rtc_time *dt) dt->tm_year = dt->tm_year+1900; #ifdef MAX6902_DEBUG - printk("\n%s : Setting RTC values\n",__FUNCTION__); + printk("\n%s : Setting RTC values\n",__func__); printk("tm_sec : %i\n",dt->tm_sec); printk("tm_min : %i\n",dt->tm_min); printk("tm_hour: %i\n",dt->tm_hour); diff --git a/drivers/rtc/rtc-pcf8563.c b/drivers/rtc/rtc-pcf8563.c index a1e2f39521f8..a41681d26eba 100644 --- a/drivers/rtc/rtc-pcf8563.c +++ b/drivers/rtc/rtc-pcf8563.c @@ -80,7 +80,7 @@ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) /* read registers */ if ((i2c_transfer(client->adapter, msgs, 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } @@ -91,7 +91,7 @@ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: raw data is st1=%02x, st2=%02x, sec=%02x, min=%02x, hr=%02x, " "mday=%02x, wday=%02x, mon=%02x, year=%02x\n", - __FUNCTION__, + __func__, buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7], buf[8]); @@ -112,7 +112,7 @@ static int pcf8563_get_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -133,7 +133,7 @@ static int pcf8563_set_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -163,7 +163,7 @@ static int pcf8563_set_datetime(struct i2c_client *client, struct rtc_time *tm) if (err != sizeof(data)) { 
dev_err(&client->dev, "%s: err=%d addr=%02x, data=%02x\n", - __FUNCTION__, err, data[0], data[1]); + __func__, err, data[0], data[1]); return -EIO; } }; @@ -208,7 +208,7 @@ static int pcf8563_validate_client(struct i2c_client *client) if (xfer != ARRAY_SIZE(msgs)) { dev_err(&client->dev, "%s: could not read register 0x%02X\n", - __FUNCTION__, pattern[i].reg); + __func__, pattern[i].reg); return -EIO; } @@ -220,7 +220,7 @@ static int pcf8563_validate_client(struct i2c_client *client) dev_dbg(&client->dev, "%s: pattern=%d, reg=%x, mask=0x%02x, min=%d, " "max=%d, value=%d, raw=0x%02X\n", - __FUNCTION__, i, pattern[i].reg, pattern[i].mask, + __func__, i, pattern[i].reg, pattern[i].mask, pattern[i].min, pattern[i].max, value, buf); diff --git a/drivers/rtc/rtc-rs5c313.c b/drivers/rtc/rtc-rs5c313.c index 664e89a817ed..1c14d4497c4d 100644 --- a/drivers/rtc/rtc-rs5c313.c +++ b/drivers/rtc/rtc-rs5c313.c @@ -228,7 +228,7 @@ static int rs5c313_rtc_read_time(struct device *dev, struct rtc_time *tm) ndelay(700); /* CE:L */ if (cnt++ > 100) { - dev_err(dev, "%s: timeout error\n", __FUNCTION__); + dev_err(dev, "%s: timeout error\n", __func__); return -EIO; } } @@ -289,7 +289,7 @@ static int rs5c313_rtc_set_time(struct device *dev, struct rtc_time *tm) ndelay(700); /* CE:L */ if (cnt++ > 100) { - dev_err(dev, "%s: timeout error\n", __FUNCTION__); + dev_err(dev, "%s: timeout error\n", __func__); return -EIO; } } diff --git a/drivers/rtc/rtc-rs5c372.c b/drivers/rtc/rtc-rs5c372.c index 67d8768c1b64..7e63074708eb 100644 --- a/drivers/rtc/rtc-rs5c372.c +++ b/drivers/rtc/rtc-rs5c372.c @@ -166,7 +166,7 @@ static int rs5c372_get_datetime(struct i2c_client *client, struct rtc_time *tm) dev_dbg(&client->dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -181,7 +181,7 @@ static int rs5c372_set_datetime(struct i2c_client *client, struct 
rtc_time *tm) dev_dbg(&client->dev, "%s: tm is secs=%d, mins=%d, hours=%d " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -195,7 +195,7 @@ static int rs5c372_set_datetime(struct i2c_client *client, struct rtc_time *tm) buf[7] = BIN2BCD(tm->tm_year - 100); if ((i2c_master_send(client, buf, 8)) != 8) { - dev_err(&client->dev, "%s: write error\n", __FUNCTION__); + dev_err(&client->dev, "%s: write error\n", __func__); return -EIO; } @@ -220,7 +220,7 @@ static int rs5c372_get_trim(struct i2c_client *client, int *osc, int *trim) *osc = (tmp & RS5C372_TRIM_XSL) ? 32000 : 32768; if (trim) { - dev_dbg(&client->dev, "%s: raw trim=%x\n", __FUNCTION__, tmp); + dev_dbg(&client->dev, "%s: raw trim=%x\n", __func__, tmp); tmp &= RS5C372_TRIM_MASK; if (tmp & 0x3e) { int t = tmp & 0x3f; @@ -500,7 +500,7 @@ static int rs5c372_probe(struct i2c_client *client) struct rs5c372 *rs5c372; struct rtc_time tm; - dev_dbg(&client->dev, "%s\n", __FUNCTION__); + dev_dbg(&client->dev, "%s\n", __func__); if (!i2c_check_functionality(client->adapter, I2C_FUNC_I2C)) { err = -ENODEV; diff --git a/drivers/rtc/rtc-s3c.c b/drivers/rtc/rtc-s3c.c index 9f4d5129a496..f26e0cad8f16 100644 --- a/drivers/rtc/rtc-s3c.c +++ b/drivers/rtc/rtc-s3c.c @@ -68,7 +68,7 @@ static void s3c_rtc_setaie(int to) { unsigned int tmp; - pr_debug("%s: aie=%d\n", __FUNCTION__, to); + pr_debug("%s: aie=%d\n", __func__, to); tmp = readb(s3c_rtc_base + S3C2410_RTCALM) & ~S3C2410_RTCALM_ALMEN; @@ -82,7 +82,7 @@ static void s3c_rtc_setpie(int to) { unsigned int tmp; - pr_debug("%s: pie=%d\n", __FUNCTION__, to); + pr_debug("%s: pie=%d\n", __func__, to); spin_lock_irq(&s3c_rtc_pie_lock); tmp = readb(s3c_rtc_base + S3C2410_TICNT) & ~S3C2410_TICNT_ENABLE; @@ -457,7 +457,7 @@ static int s3c_rtc_probe(struct platform_device *pdev) struct resource *res; int ret; - pr_debug("%s: probe=%p\n", __FUNCTION__, pdev); + pr_debug("%s: 
probe=%p\n", __func__, pdev); /* find the IRQs */ diff --git a/drivers/rtc/rtc-sh.c b/drivers/rtc/rtc-sh.c index c594b34c6767..110699bb4787 100644 --- a/drivers/rtc/rtc-sh.c +++ b/drivers/rtc/rtc-sh.c @@ -361,7 +361,7 @@ static int sh_rtc_read_time(struct device *dev, struct rtc_time *tm) dev_dbg(dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon + 1, tm->tm_year, tm->tm_wday); diff --git a/drivers/rtc/rtc-v3020.c b/drivers/rtc/rtc-v3020.c index 24203a06051a..10025d840268 100644 --- a/drivers/rtc/rtc-v3020.c +++ b/drivers/rtc/rtc-v3020.c @@ -107,7 +107,7 @@ static int v3020_read_time(struct device *dev, struct rtc_time *dt) dt->tm_year = BCD2BIN(tmp)+100; #ifdef DEBUG - printk("\n%s : Read RTC values\n",__FUNCTION__); + printk("\n%s : Read RTC values\n",__func__); printk("tm_hour: %i\n",dt->tm_hour); printk("tm_min : %i\n",dt->tm_min); printk("tm_sec : %i\n",dt->tm_sec); @@ -126,7 +126,7 @@ static int v3020_set_time(struct device *dev, struct rtc_time *dt) struct v3020 *chip = dev_get_drvdata(dev); #ifdef DEBUG - printk("\n%s : Setting RTC values\n",__FUNCTION__); + printk("\n%s : Setting RTC values\n",__func__); printk("tm_sec : %i\n",dt->tm_sec); printk("tm_min : %i\n",dt->tm_min); printk("tm_hour: %i\n",dt->tm_hour); diff --git a/drivers/rtc/rtc-x1205.c b/drivers/rtc/rtc-x1205.c index bb3290360091..095282f63523 100644 --- a/drivers/rtc/rtc-x1205.c +++ b/drivers/rtc/rtc-x1205.c @@ -99,14 +99,14 @@ static int x1205_get_datetime(struct i2c_client *client, struct rtc_time *tm, /* read date registers */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } dev_dbg(&client->dev, "%s: raw read data - sec=%02x, min=%02x, hr=%02x, " "mday=%02x, mon=%02x, year=%02x, wday=%02x, y2k=%02x\n", - __FUNCTION__, + __func__, 
buf[0], buf[1], buf[2], buf[3], buf[4], buf[5], buf[6], buf[7]); @@ -121,7 +121,7 @@ static int x1205_get_datetime(struct i2c_client *client, struct rtc_time *tm, dev_dbg(&client->dev, "%s: tm is secs=%d, mins=%d, hours=%d, " "mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); @@ -139,7 +139,7 @@ static int x1205_get_status(struct i2c_client *client, unsigned char *sr) /* read status register */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } @@ -162,7 +162,7 @@ static int x1205_set_datetime(struct i2c_client *client, struct rtc_time *tm, dev_dbg(&client->dev, "%s: secs=%d, mins=%d, hours=%d\n", - __FUNCTION__, + __func__, tm->tm_sec, tm->tm_min, tm->tm_hour); buf[CCR_SEC] = BIN2BCD(tm->tm_sec); @@ -175,7 +175,7 @@ static int x1205_set_datetime(struct i2c_client *client, struct rtc_time *tm, if (datetoo) { dev_dbg(&client->dev, "%s: mday=%d, mon=%d, year=%d, wday=%d\n", - __FUNCTION__, + __func__, tm->tm_mday, tm->tm_mon, tm->tm_year, tm->tm_wday); buf[CCR_MDAY] = BIN2BCD(tm->tm_mday); @@ -191,12 +191,12 @@ static int x1205_set_datetime(struct i2c_client *client, struct rtc_time *tm, /* this sequence is required to unlock the chip */ if ((xfer = i2c_master_send(client, wel, 3)) != 3) { - dev_err(&client->dev, "%s: wel - %d\n", __FUNCTION__, xfer); + dev_err(&client->dev, "%s: wel - %d\n", __func__, xfer); return -EIO; } if ((xfer = i2c_master_send(client, rwel, 3)) != 3) { - dev_err(&client->dev, "%s: rwel - %d\n", __FUNCTION__, xfer); + dev_err(&client->dev, "%s: rwel - %d\n", __func__, xfer); return -EIO; } @@ -208,7 +208,7 @@ static int x1205_set_datetime(struct i2c_client *client, struct rtc_time *tm, if (xfer != 3) { dev_err(&client->dev, "%s: xfer=%d addr=%02x, data=%02x\n", - __FUNCTION__, + __func__, xfer, rdata[1], 
rdata[2]); return -EIO; } @@ -216,7 +216,7 @@ static int x1205_set_datetime(struct i2c_client *client, struct rtc_time *tm, /* disable further writes */ if ((xfer = i2c_master_send(client, diswe, 3)) != 3) { - dev_err(&client->dev, "%s: diswe - %d\n", __FUNCTION__, xfer); + dev_err(&client->dev, "%s: diswe - %d\n", __func__, xfer); return -EIO; } @@ -249,11 +249,11 @@ static int x1205_get_dtrim(struct i2c_client *client, int *trim) /* read dtr register */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } - dev_dbg(&client->dev, "%s: raw dtr=%x\n", __FUNCTION__, dtr); + dev_dbg(&client->dev, "%s: raw dtr=%x\n", __func__, dtr); *trim = 0; @@ -281,11 +281,11 @@ static int x1205_get_atrim(struct i2c_client *client, int *trim) /* read atr register */ if ((i2c_transfer(client->adapter, &msgs[0], 2)) != 2) { - dev_err(&client->dev, "%s: read error\n", __FUNCTION__); + dev_err(&client->dev, "%s: read error\n", __func__); return -EIO; } - dev_dbg(&client->dev, "%s: raw atr=%x\n", __FUNCTION__, atr); + dev_dbg(&client->dev, "%s: raw atr=%x\n", __func__, atr); /* atr is a two's complement value on 6 bits, * perform sign extension. 
The formula is @@ -294,11 +294,11 @@ static int x1205_get_atrim(struct i2c_client *client, int *trim) if (atr & 0x20) atr |= 0xC0; - dev_dbg(&client->dev, "%s: raw atr=%x (%d)\n", __FUNCTION__, atr, atr); + dev_dbg(&client->dev, "%s: raw atr=%x (%d)\n", __func__, atr, atr); *trim = (atr * 250) + 11000; - dev_dbg(&client->dev, "%s: real=%d\n", __FUNCTION__, *trim); + dev_dbg(&client->dev, "%s: real=%d\n", __func__, *trim); return 0; } @@ -352,7 +352,7 @@ static int x1205_validate_client(struct i2c_client *client) if ((xfer = i2c_transfer(client->adapter, msgs, 2)) != 2) { dev_err(&client->dev, "%s: could not read register %x\n", - __FUNCTION__, probe_zero_pattern[i]); + __func__, probe_zero_pattern[i]); return -EIO; } @@ -360,7 +360,7 @@ static int x1205_validate_client(struct i2c_client *client) if ((buf & probe_zero_pattern[i+1]) != 0) { dev_err(&client->dev, "%s: register=%02x, zero pattern=%d, value=%x\n", - __FUNCTION__, probe_zero_pattern[i], i, buf); + __func__, probe_zero_pattern[i], i, buf); return -ENODEV; } @@ -380,7 +380,7 @@ static int x1205_validate_client(struct i2c_client *client) if ((xfer = i2c_transfer(client->adapter, msgs, 2)) != 2) { dev_err(&client->dev, "%s: could not read register %x\n", - __FUNCTION__, probe_limits_pattern[i].reg); + __func__, probe_limits_pattern[i].reg); return -EIO; } @@ -391,7 +391,7 @@ static int x1205_validate_client(struct i2c_client *client) value < probe_limits_pattern[i].min) { dev_dbg(&client->dev, "%s: register=%x, lim pattern=%d, value=%d\n", - __FUNCTION__, probe_limits_pattern[i].reg, + __func__, probe_limits_pattern[i].reg, i, value); return -ENODEV; -- cgit v1.2.3 From ea01ea937dcae2caa146dea1918cccf2f16ed3c4 Mon Sep 17 00:00:00 2001 From: Badari Pulavarty Date: Mon, 28 Apr 2008 02:12:01 -0700 Subject: hotplug memory remove: generic __remove_pages() support Generic helper function to remove section mappings and sysfs entries for the section of the memory we are removing. 
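The __remove_pages() helper added by this commit only accepts whole, section-aligned ranges (it BUG()s otherwise) and then walks the range one section at a time. The userspace sketch below models just that arithmetic. PAGES_PER_SECTION = 2^15 is an assumed value (the real one depends on the architecture's SECTION_SIZE_BITS and PAGE_SHIFT), and sections_to_remove() and section_start_pfn() are hypothetical helpers that return -1 where the kernel would BUG() instead.

```c
#include <assert.h>

/* Assumed geometry: 2^15 pages per section (128 MiB with 4 KiB pages).
 * The kernel derives PAGES_PER_SECTION from SECTION_SIZE_BITS - PAGE_SHIFT. */
#define PAGES_PER_SECTION (1UL << 15)
/* Same shape as the kernel macro: keeps only the section-number bits of a pfn. */
#define PAGE_SECTION_MASK (~(PAGES_PER_SECTION - 1))

/* Hypothetical helper: number of sections covered by
 * [start_pfn, start_pfn + nr_pages), or -1 where __remove_pages()
 * would trip one of its BUG_ON() checks. */
static long sections_to_remove(unsigned long start_pfn, unsigned long nr_pages)
{
	if (start_pfn & ~PAGE_SECTION_MASK)	/* start is not section aligned */
		return -1;
	if (nr_pages % PAGES_PER_SECTION)	/* would leave a partial section */
		return -1;
	return (long)(nr_pages / PAGES_PER_SECTION);
}

/* Hypothetical helper: first pfn of the i-th section, computed the same
 * way as the pfn passed to __pfn_to_section() in the removal loop. */
static unsigned long section_start_pfn(unsigned long start_pfn, unsigned long i)
{
	return start_pfn + i * PAGES_PER_SECTION;
}
```

The real function additionally calls release_mem_region() and __remove_section() for each section; only the range checks and the per-section pfn stepping are modeled here.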
offline_pages() correctly adjusted zone and marked the pages reserved. TODO: Yasunori Goto is working on patches to free up allocations from bootmem. Signed-off-by: Badari Pulavarty Acked-by: Yasunori Goto Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 6 ++++- mm/memory_hotplug.c | 55 ++++++++++++++++++++++++++++++++++++++++++ mm/sparse.c | 45 +++++++++++++++++++++++++++++++--- 3 files changed, 102 insertions(+), 4 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 8fee7a45736b..aca9c65f8d08 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -8,6 +8,7 @@ struct page; struct zone; struct pglist_data; +struct mem_section; #ifdef CONFIG_MEMORY_HOTPLUG /* @@ -64,9 +65,11 @@ extern int offline_pages(unsigned long, unsigned long, unsigned long); /* reasonably generic interface to expand the physical pages in a zone */ extern int __add_pages(struct zone *zone, unsigned long start_pfn, unsigned long nr_pages); +extern int __remove_pages(struct zone *zone, unsigned long start_pfn, + unsigned long nr_pages); /* - * Walk thorugh all memory which is registered as resource. + * Walk through all memory which is registered as resource. 
* arg is (start_pfn, nr_pages, private_arg_pointer) */ extern int walk_memory_resource(unsigned long start_pfn, @@ -176,5 +179,6 @@ extern int arch_add_memory(int nid, u64 start, u64 size); extern int remove_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); +extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms); #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 0fb330271271..d5094929766d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -101,6 +101,25 @@ static int __add_section(struct zone *zone, unsigned long phys_start_pfn) return register_new_memory(__pfn_to_section(phys_start_pfn)); } +static int __remove_section(struct zone *zone, struct mem_section *ms) +{ + unsigned long flags; + struct pglist_data *pgdat = zone->zone_pgdat; + int ret = -EINVAL; + + if (!valid_section(ms)) + return ret; + + ret = unregister_memory_section(ms); + if (ret) + return ret; + + pgdat_resize_lock(pgdat, &flags); + sparse_remove_one_section(zone, ms); + pgdat_resize_unlock(pgdat, &flags); + return 0; +} + /* * Reasonably generic function for adding memory. It is * expected that archs that support memory hotplug will @@ -134,6 +153,42 @@ int __add_pages(struct zone *zone, unsigned long phys_start_pfn, } EXPORT_SYMBOL_GPL(__add_pages); +/** + * __remove_pages() - remove sections of pages from a zone + * @zone: zone from which pages need to be removed + * @phys_start_pfn: starting pageframe (must be aligned to start of a section) + * @nr_pages: number of pages to remove (must be multiple of section size) + * + * Generic helper function to remove section mappings and sysfs entries + * for the section of the memory we are removing. Caller needs to make + * sure that pages are marked reserved and zones are adjust properly by + * calling offline_pages(). 
+ */ +int __remove_pages(struct zone *zone, unsigned long phys_start_pfn, + unsigned long nr_pages) +{ + unsigned long i, ret = 0; + int sections_to_remove; + + /* + * We can only remove entire sections + */ + BUG_ON(phys_start_pfn & ~PAGE_SECTION_MASK); + BUG_ON(nr_pages % PAGES_PER_SECTION); + + release_mem_region(phys_start_pfn << PAGE_SHIFT, nr_pages * PAGE_SIZE); + + sections_to_remove = nr_pages / PAGES_PER_SECTION; + for (i = 0; i < sections_to_remove; i++) { + unsigned long pfn = phys_start_pfn + i*PAGES_PER_SECTION; + ret = __remove_section(zone, __pfn_to_section(pfn)); + if (ret) + break; + } + return ret; +} +EXPORT_SYMBOL_GPL(__remove_pages); + static void grow_zone_span(struct zone *zone, unsigned long start_pfn, unsigned long end_pfn) { diff --git a/mm/sparse.c b/mm/sparse.c index 7e9191381f86..186a85bf7912 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -208,12 +208,13 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p } /* - * We need this if we ever free the mem_maps. While not implemented yet, - * this function is included for parity with its sibling. + * Decode mem_map from the coded memmap */ -static __attribute((unused)) +static struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum) { + /* mask off the extra low bits of information */ + coded_mem_map &= SECTION_MAP_MASK; return ((struct page *)coded_mem_map) + section_nr_to_pfn(pnum); } @@ -404,6 +405,28 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) } #endif /* CONFIG_SPARSEMEM_VMEMMAP */ +static void free_section_usemap(struct page *memmap, unsigned long *usemap) +{ + if (!usemap) + return; + + /* + * Check to see if allocation came from hot-plug-add + */ + if (PageSlab(virt_to_page(usemap))) { + kfree(usemap); + if (memmap) + __kfree_section_memmap(memmap, PAGES_PER_SECTION); + return; + } + + /* + * TODO: Allocations came from bootmem - how do I free up ? 
+ */ + printk(KERN_WARNING "Not freeing up allocations from bootmem " "- leaking memory\n"); +} + /* * returns the number of sections whose mem_maps were properly * set. If this is <=0, then that means that the passed-in @@ -456,4 +479,20 @@ out: } return ret; } + +void sparse_remove_one_section(struct zone *zone, struct mem_section *ms) +{ + struct page *memmap = NULL; + unsigned long *usemap = NULL; + + if (ms->section_mem_map) { + usemap = ms->pageblock_flags; + memmap = sparse_decode_mem_map(ms->section_mem_map, + __section_nr(ms)); + ms->section_mem_map = 0; + ms->pageblock_flags = NULL; + } + + free_section_usemap(memmap, usemap); +} #endif -- cgit v1.2.3 From 180c06efce691f2b721dd0d965079827bdd7ee03 Mon Sep 17 00:00:00 2001 From: Jeremy Fitzhardinge Date: Mon, 28 Apr 2008 02:12:03 -0700 Subject: hotplug-memory: make online_page() common All architectures use an effectively identical definition of online_page(), so just make it common code. x86-64, ia64, powerpc and sh are actually identical; x86-32 is slightly different. x86-32's differences arise because it puts its hotplug pages in the highmem zone. We can handle this in the generic code by inspecting the page to see if it's in highmem and updating the totalhigh_pages count appropriately. This leaves init_32.c:free_new_highpage with a single caller, so I folded it into add_one_highpage_init. I also removed an incorrect comment referring to the NUMA case; any NUMA details have already been dealt with by the time online_page() is called.
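The accounting that the common online_page() performs can be sketched in userspace as follows. This is a model under stated assumptions, not the kernel code: struct page_counters and struct fake_page are hypothetical stand-ins for the global counters and for struct page. In the kernel, the totalhigh_pages update is compiled only under CONFIG_HIGHMEM and the max_mapnr update only under CONFIG_FLATMEM, and the final __free_page() hand-off to the allocator is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's global page counters. */
struct page_counters {
	unsigned long totalram_pages;
	unsigned long num_physpages;
	unsigned long totalhigh_pages;	/* CONFIG_HIGHMEM only */
	unsigned long max_mapnr;	/* CONFIG_FLATMEM only */
};

/* Hypothetical stand-in for struct page: only what the accounting reads. */
struct fake_page {
	unsigned long pfn;	/* page_to_pfn(page) */
	bool highmem;		/* PageHighMem(page) */
	bool reserved;		/* PG_reserved */
};

static void model_online_page(struct page_counters *c, struct fake_page *p)
{
	c->totalram_pages++;
	c->num_physpages++;
	if (p->highmem)			/* the former x86-32-only case */
		c->totalhigh_pages++;
	if (p->pfn > c->max_mapnr)	/* max_mapnr = max(pfn, max_mapnr) */
		c->max_mapnr = p->pfn;
	p->reserved = false;		/* ClearPageReserved() */
}
```

Inspecting the page itself, rather than assuming all hotplugged memory is highmem, is what lets one definition serve both the x86-32 layout and the architectures that add memory to a normal zone.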
[akpm@linux-foundation.org: fix indenting] Signed-off-by: Jeremy Fitzhardinge Acked-by: Dave Hansen Reviewed-by: KAMEZAWA Hiroyuki Tested-by: KAMEZAWA Hiroyuki Cc: Yasunori Goto Cc: Christoph Lameter Acked-by: Ingo Molnar Acked-by: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/ia64/mm/init.c | 9 --------- arch/powerpc/mm/mem.c | 9 --------- arch/sh/mm/init.c | 9 --------- arch/x86/mm/init_32.c | 36 +++--------------------------------- arch/x86/mm/init_64.c | 9 --------- mm/memory_hotplug.c | 19 +++++++++++++++++++ 6 files changed, 22 insertions(+), 69 deletions(-) diff --git a/arch/ia64/mm/init.c b/arch/ia64/mm/init.c index 5c1de53c8c1c..fc6c6636ffda 100644 --- a/arch/ia64/mm/init.c +++ b/arch/ia64/mm/init.c @@ -682,15 +682,6 @@ mem_init (void) } #ifdef CONFIG_MEMORY_HOTPLUG -void online_page(struct page *page) -{ - ClearPageReserved(page); - init_page_count(page); - __free_page(page); - totalram_pages++; - num_physpages++; -} - int arch_add_memory(int nid, u64 start, u64 size) { pg_data_t *pgdat; diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 5ccb579b81e4..d9e37f365b54 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -110,15 +110,6 @@ EXPORT_SYMBOL(phys_mem_access_prot); #ifdef CONFIG_MEMORY_HOTPLUG -void online_page(struct page *page) -{ - ClearPageReserved(page); - init_page_count(page); - __free_page(page); - totalram_pages++; - num_physpages++; -} - #ifdef CONFIG_NUMA int memory_add_physaddr_to_nid(u64 start) { diff --git a/arch/sh/mm/init.c b/arch/sh/mm/init.c index 53dde0607362..d7df26bd1e54 100644 --- a/arch/sh/mm/init.c +++ b/arch/sh/mm/init.c @@ -307,15 +307,6 @@ void free_initrd_mem(unsigned long start, unsigned long end) #endif #ifdef CONFIG_MEMORY_HOTPLUG -void online_page(struct page *page) -{ - ClearPageReserved(page); - init_page_count(page); - __free_page(page); - totalram_pages++; - num_physpages++; -} - int arch_add_memory(int nid, u64 start, u64 size) { pg_data_t *pgdat; diff 
--git a/arch/x86/mm/init_32.c b/arch/x86/mm/init_32.c index 4a4761892951..de236e419cb5 100644 --- a/arch/x86/mm/init_32.c +++ b/arch/x86/mm/init_32.c @@ -287,47 +287,17 @@ static void __init permanent_kmaps_init(pgd_t *pgd_base) pkmap_page_table = pte; } -static void __meminit free_new_highpage(struct page *page) -{ - init_page_count(page); - __free_page(page); - totalhigh_pages++; -} - void __init add_one_highpage_init(struct page *page, int pfn, int bad_ppro) { if (page_is_ram(pfn) && !(bad_ppro && page_kills_ppro(pfn))) { ClearPageReserved(page); - free_new_highpage(page); + init_page_count(page); + __free_page(page); + totalhigh_pages++; } else SetPageReserved(page); } -static int __meminit -add_one_highpage_hotplug(struct page *page, unsigned long pfn) -{ - free_new_highpage(page); - totalram_pages++; -#ifdef CONFIG_FLATMEM - max_mapnr = max(pfn, max_mapnr); -#endif - num_physpages++; - - return 0; -} - -/* - * Not currently handling the NUMA case. - * Assuming single node and all memory that - * has been added dynamically that would be - * onlined here is in HIGHMEM. - */ -void __meminit online_page(struct page *page) -{ - ClearPageReserved(page); - add_one_highpage_hotplug(page, page_to_pfn(page)); -} - #ifndef CONFIG_NUMA static void __init set_highmem_pages_init(int bad_ppro) { diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c index 5fbb8652cf59..32ba13b0f818 100644 --- a/arch/x86/mm/init_64.c +++ b/arch/x86/mm/init_64.c @@ -620,15 +620,6 @@ void __init paging_init(void) /* * Memory hotplug specific functions */ -void online_page(struct page *page) -{ - ClearPageReserved(page); - init_page_count(page); - __free_page(page); - totalram_pages++; - num_physpages++; -} - #ifdef CONFIG_MEMORY_HOTPLUG /* * Memory is added always to NORMAL zone. 
This means you will never get diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index d5094929766d..c8b3ca79de2d 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -219,6 +219,25 @@ static void grow_pgdat_span(struct pglist_data *pgdat, pgdat->node_start_pfn; } +void online_page(struct page *page) +{ + totalram_pages++; + num_physpages++; + +#ifdef CONFIG_HIGHMEM + if (PageHighMem(page)) + totalhigh_pages++; +#endif + +#ifdef CONFIG_FLATMEM + max_mapnr = max(page_to_pfn(page), max_mapnr); +#endif + + ClearPageReserved(page); + init_page_count(page); + __free_page(page); +} + static int online_pages_range(unsigned long start_pfn, unsigned long nr_pages, void *arg) { -- cgit v1.2.3 From e92adcba261fd391591bb63c1703185a04a41554 Mon Sep 17 00:00:00 2001 From: Jeff Moyer Date: Mon, 28 Apr 2008 02:12:04 -0700 Subject: aio: io_getevents() should return if io_destroy() is invoked This patch wakes up a thread waiting in io_getevents if another thread destroys the context. This was tested using a small program that spawns a thread to wait in io_getevents while the parent thread destroys the io context and then waits for the getevents thread to exit. Without this patch, the program hangs indefinitely. With the patch, the program exits as expected. Signed-off-by: Jeff Moyer Cc: Zach Brown Cc: Christopher Smith Cc: Benjamin LaHaise Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/aio.c | 12 +++++++++++- 1 file changed, 11 insertions(+), 1 deletion(-) diff --git a/fs/aio.c b/fs/aio.c index 228368610dfa..ae94e1dea266 100644 --- a/fs/aio.c +++ b/fs/aio.c @@ -1166,7 +1166,10 @@ retry: break; if (min_nr <= i) break; - ret = 0; + if (unlikely(ctx->dead)) { + ret = -EINVAL; + break; + } if (to.timed_out) /* Only check after read evt */ break; /* Try to only show up in io wait if there are ops @@ -1231,6 +1234,13 @@ static void io_destroy(struct kioctx *ioctx) aio_cancel_all(ioctx); wait_for_all_aios(ioctx); + + /* + * Wake up any waiters. 
The setting of ctx->dead must be seen + * by other CPUs at this point. Right now, we rely on the + * locking done by the above calls to ensure this consistency. + */ + wake_up(&ioctx->wait); put_ioctx(ioctx); /* once for the lookup */ } -- cgit v1.2.3 From 488514d1798289f56f80ed018e246179fe500383 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:05 -0700 Subject: Remove set_migrateflags() Migrate flags must be set on slab creation as agreed upon when the antifrag logic was reviewed. Otherwise some slabs of a slab cache will end up in the unmovable and others in the reclaimable section depending on which flag was active when a new slab page was allocated. This likely slid in somehow when antifrag was merged. Remove it. The buffer_heads are always allocated with __GFP_RECLAIMABLE because the SLAB_RECLAIM_ACCOUNT option is set. The set_migrateflags() call never had any effect there. Radix tree allocations are not directly reclaimable but they are allocated with __GFP_RECLAIMABLE set on each allocation. We now set SLAB_RECLAIM_ACCOUNT on radix tree slab creation making sure that radix tree slabs are consistently placed in the reclaimable section. Radix tree slabs will also be accounted as such. That leaves no users of set_migrateflags(), so remove it.
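To see why the per-allocation helper was redundant for a cache created with SLAB_RECLAIM_ACCOUNT, here is a userspace model of the removed set_migrateflags() next to the cache-level flag. The DEMO_* bit values are illustrative assumptions, not authoritative copies of include/linux/gfp.h. model_slab_page_gfp() is hypothetical and simply encodes the commit's statement that such a cache allocates its slab pages with __GFP_RECLAIMABLE.

```c
#include <assert.h>

typedef unsigned int demo_gfp_t;

/* Illustrative bit values only; see include/linux/gfp.h for the real ones. */
#define DEMO_GFP_KERNEL       0xd0u
#define DEMO_GFP_RECLAIMABLE  0x80000u
#define DEMO_GFP_MOVABLE      0x100000u
#define DEMO_GFP_MOVABLE_MASK (DEMO_GFP_RECLAIMABLE | DEMO_GFP_MOVABLE)

/* Userspace copy of the removed helper: strip the caller's mobility bits
 * and substitute migrate_flags. The kernel used BUG_ON() where this
 * sketch uses assert(). */
static demo_gfp_t model_set_migrateflags(demo_gfp_t gfp, demo_gfp_t migrate_flags)
{
	assert((gfp & DEMO_GFP_MOVABLE_MASK) != DEMO_GFP_MOVABLE_MASK);
	return (gfp & ~DEMO_GFP_MOVABLE_MASK) | migrate_flags;
}

/* Hypothetical: flags used for a new slab page. A cache created with
 * SLAB_RECLAIM_ACCOUNT adds the reclaimable bit to every slab page. */
static demo_gfp_t model_slab_page_gfp(demo_gfp_t gfp, int reclaim_account)
{
	return reclaim_account ? (gfp | DEMO_GFP_RECLAIMABLE) : gfp;
}
```

Because the cache-level flag is applied to every slab page, all slabs of the cache land in the same (reclaimable) grouping regardless of which caller triggered the page allocation. That consistency is the property the commit message says per-allocation flags could not guarantee.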
Signed-off-by: Christoph Lameter Cc: Mel Gorman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/buffer.c | 3 +-- include/linux/gfp.h | 6 ------ lib/radix-tree.c | 9 ++++----- 3 files changed, 5 insertions(+), 13 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 39ff14403d13..8b9807523efe 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -3180,8 +3180,7 @@ static void recalc_bh_state(void) struct buffer_head *alloc_buffer_head(gfp_t gfp_flags) { - struct buffer_head *ret = kmem_cache_alloc(bh_cachep, - set_migrateflags(gfp_flags, __GFP_RECLAIMABLE)); + struct buffer_head *ret = kmem_cache_alloc(bh_cachep, gfp_flags); if (ret) { INIT_LIST_HEAD(&ret->b_assoc_buffers); get_cpu_var(bh_accounting).nr++; diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 164be9da3c1b..c17ba4945203 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -144,12 +144,6 @@ static inline enum zone_type gfp_zone(gfp_t flags) return base + ZONE_NORMAL; } -static inline gfp_t set_migrateflags(gfp_t gfp, gfp_t migrate_flags) -{ - BUG_ON((gfp & GFP_MOVABLE_MASK) == GFP_MOVABLE_MASK); - return (gfp & ~(GFP_MOVABLE_MASK)) | migrate_flags; -} - /* * There is only one page-allocator function, and two main namespaces to * it. 
The alloc_page*() variants return 'struct page *' and as such diff --git a/lib/radix-tree.c b/lib/radix-tree.c index 65f0e758ec38..bd521716ab1a 100644 --- a/lib/radix-tree.c +++ b/lib/radix-tree.c @@ -114,8 +114,7 @@ radix_tree_node_alloc(struct radix_tree_root *root) } } if (ret == NULL) - ret = kmem_cache_alloc(radix_tree_node_cachep, - set_migrateflags(gfp_mask, __GFP_RECLAIMABLE)); + ret = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask); BUG_ON(radix_tree_is_indirect_ptr(ret)); return ret; @@ -150,8 +149,7 @@ int radix_tree_preload(gfp_t gfp_mask) rtp = &__get_cpu_var(radix_tree_preloads); while (rtp->nr < ARRAY_SIZE(rtp->nodes)) { preempt_enable(); - node = kmem_cache_alloc(radix_tree_node_cachep, - set_migrateflags(gfp_mask, __GFP_RECLAIMABLE)); + node = kmem_cache_alloc(radix_tree_node_cachep, gfp_mask); if (node == NULL) goto out; preempt_disable(); @@ -1098,7 +1096,8 @@ void __init radix_tree_init(void) { radix_tree_node_cachep = kmem_cache_create("radix_tree_node", sizeof(struct radix_tree_node), 0, - SLAB_PANIC, radix_tree_node_ctor); + SLAB_PANIC | SLAB_RECLAIM_ACCOUNT, + radix_tree_node_ctor); radix_tree_init_maxindex(); hotcpu_notifier(radix_tree_callback, 0); } -- cgit v1.2.3 From ddc81ed2c5d47a078a3b02c5c3a4345bc2bc3c9b Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:12:07 -0700 Subject: remove sparse warning for mmzone.h include/linux/mmzone.h:640:22: warning: potentially expensive pointer subtraction Calculate the offset into the node_zones array rather than the index using casts to (char *) and comparing against the index * sizeof(struct zone). On X86_32 this saves a sar, but code size increases by one byte per is_highmem() use due to 32-bit cmps rather than 16 bit cmps. 
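The two forms of is_highmem() can be compared in a userspace sketch. The zone order (DMA, NORMAL, HIGHMEM, MOVABLE) and the 2048-byte zone size are assumptions. The size is chosen to match the stride implied by this commit's listing: sar $0xb divides by 2048, and 0x1000 and 0x1800 are 2 * 2048 and 3 * 2048. The demo_* types are hypothetical stand-ins for struct zone and pg_data_t.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed zone order for an x86-32 HIGHMEM kernel (no ZONE_DMA32). */
enum demo_zone_type {
	DEMO_ZONE_DMA, DEMO_ZONE_NORMAL, DEMO_ZONE_HIGHMEM,
	DEMO_ZONE_MOVABLE, DEMO_MAX_NR_ZONES
};

/* 2048-byte stand-in for struct zone, matching the listing's stride. */
struct demo_zone { char pad[2048]; };
struct demo_pgdat { struct demo_zone node_zones[DEMO_MAX_NR_ZONES]; };

/* Old form: pointer subtraction yields an index, which requires dividing
 * the byte distance by sizeof(struct zone) (a shift for a power of two). */
static int highmem_by_index(const struct demo_pgdat *pg,
			    const struct demo_zone *z, int movable_is_highmem)
{
	ptrdiff_t idx = z - pg->node_zones;
	return idx == DEMO_ZONE_HIGHMEM ||
	       (idx == DEMO_ZONE_MOVABLE && movable_is_highmem);
}

/* New form: compare the byte offset against index * sizeof(*z), which the
 * compiler folds into constants, so no division is needed. */
static int highmem_by_offset(const struct demo_pgdat *pg,
			     const struct demo_zone *z, int movable_is_highmem)
{
	ptrdiff_t off = (const char *)z - (const char *)pg->node_zones;
	return off == (ptrdiff_t)(DEMO_ZONE_HIGHMEM * sizeof(*z)) ||
	       (off == (ptrdiff_t)(DEMO_ZONE_MOVABLE * sizeof(*z)) &&
		movable_is_highmem);
}
```

Both functions return the same answer for every zone in the array. The trade-off the commit describes is that the constants become 32-bit immediates (cmp $0x1000) instead of small ones (cmp $0x2), which removes the shift but adds a byte per comparison.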
Before:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   c1 f8 0b                sar    $0xb,%eax
 210:   83 f8 02                cmp    $0x2,%eax
 213:   74 16                   je     22b
 215:   83 f8 03                cmp    $0x3,%eax
 218:   0f 85 8f 00 00 00       jne    2ad
 21e:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 225:   0f 85 82 00 00 00       jne    2ad
 22b:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

After:
 207:   2b 80 8c 07 00 00       sub    0x78c(%eax),%eax
 20d:   3d 00 10 00 00          cmp    $0x1000,%eax
 212:   74 18                   je     22c
 214:   3d 00 18 00 00          cmp    $0x1800,%eax
 219:   0f 85 8f 00 00 00       jne    2ae
 21f:   83 3d 00 00 00 00 02    cmpl   $0x2,0x0
 226:   0f 85 82 00 00 00       jne    2ae
 22c:   64 a1 00 00 00 00       mov    %fs:0x0,%eax

[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Harvey Harrison Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 9f274a687c7e..451eaa13bc28 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -637,9 +637,10 @@ static inline int is_normal_idx(enum zone_type idx) static inline int is_highmem(struct zone *zone) { #ifdef CONFIG_HIGHMEM - int zone_idx = zone - zone->zone_pgdat->node_zones; - return zone_idx == ZONE_HIGHMEM || - (zone_idx == ZONE_MOVABLE && zone_movable_is_highmem()); + int zone_off = (char *)zone - (char *)zone->zone_pgdat->node_zones; + return zone_off == ZONE_HIGHMEM * sizeof(*zone) || + (zone_off == ZONE_MOVABLE * sizeof(*zone) && + zone_movable_is_highmem()); #else return 0; #endif -- cgit v1.2.3 From 0dd1334faf7e075bfdb6f5284eed65210b296fc1 Mon Sep 17 00:00:00 2001 From: Hisashi Hifumi Date: Mon, 28 Apr 2008 02:12:08 -0700 Subject: fix invalidate_inode_pages2_range() to not clear ret DIO invalidates page cache through invalidate_inode_pages2_range(). invalidate_inode_pages2_range() sets ret=-EIO when invalidate_complete_page2() fails, but this ret is cleared if do_launder_page() succeeds on a page at the next index.
In this case, DIO is carried out even if invalidate_complete_page2() fails on some pages. This can cause inconsistency between memory and the blocks on disk, because the stale page cache pages are still present. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Hisashi Hifumi Cc: Badari Pulavarty Cc: Ken Chen Cc: Zach Brown Cc: Nick Piggin Cc: Trond Myklebust Cc: "J. Bruce Fields" Cc: Chuck Lever Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/truncate.c | 11 ++++++++--- 1 file changed, 8 insertions(+), 3 deletions(-) diff --git a/mm/truncate.c b/mm/truncate.c index 7d20ce41ecf5..b8961cb63414 100644 --- a/mm/truncate.c +++ b/mm/truncate.c @@ -391,6 +391,7 @@ int invalidate_inode_pages2_range(struct address_space *mapping, pgoff_t next; int i; int ret = 0; + int ret2 = 0; int did_range_unmap = 0; int wrapped = 0; @@ -438,9 +439,13 @@ int invalidate_inode_pages2_range(struct address_space *mapping, } } BUG_ON(page_mapped(page)); - ret = do_launder_page(mapping, page); - if (ret == 0 && !invalidate_complete_page2(mapping, page)) - ret = -EIO; + ret2 = do_launder_page(mapping, page); + if (ret2 == 0) { + if (!invalidate_complete_page2(mapping, page)) + ret2 = -EIO; + } + if (ret2 < 0) + ret = ret2; unlock_page(page); } pagevec_release(&pvec); -- cgit v1.2.3 From 4d3d5b41a72b52555d43efbfc4ccde6ba6e5444f Mon Sep 17 00:00:00 2001 From: Oleg Nesterov Date: Mon, 28 Apr 2008 02:12:10 -0700 Subject: mmap_region: cleanup the final vma_merge() related code The "if (!file || !vma_merge())" code is not easy to understand; turn it into "if (file && vma_merge())". This makes it immediately obvious that the subsequent "if (file)" is superfluous. As Hugh Dickins pointed out, we can also factor out the ->i_writecount corrections, and add a small comment about that.
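The claim that the restructured mmap_region() tail is equivalent to the old one can be checked by modeling each version as the set of side effects it performs. In this sketch, `merged` stands for the result of vma_merge(). Both versions only call vma_merge() when file is non-NULL, thanks to short-circuit evaluation, so treating the result as an input is safe. The ACT_* encoding and both functions are hypothetical. The model records which side effects run, not the order in which they run.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the side effects at the end of mmap_region(). */
enum {
	ACT_LINK     = 1 << 0,	/* vma_link() */
	ACT_FREE_VMA = 1 << 1,	/* mpol_free() + kmem_cache_free() */
	ACT_FPUT     = 1 << 2,	/* fput(file) */
	ACT_WCOUNT   = 1 << 3,	/* atomic_inc(&inode->i_writecount) */
};

/* Before the patch: if (!file || !vma_merge(...)) { ... } else { ... } */
static unsigned old_mmap_tail(bool file, bool merged, bool correct_wcount)
{
	unsigned a = 0;
	if (!file || !merged) {
		a |= ACT_LINK;
		if (correct_wcount)
			a |= ACT_WCOUNT;
	} else {
		if (file) {	/* always true here: !file took the first branch */
			if (correct_wcount)
				a |= ACT_WCOUNT;
			a |= ACT_FPUT;
		}
		a |= ACT_FREE_VMA;
	}
	return a;
}

/* After the patch: if (file && vma_merge(...)), with the i_writecount
 * correction hoisted out of both branches. */
static unsigned new_mmap_tail(bool file, bool merged, bool correct_wcount)
{
	unsigned a = 0;
	if (file && merged)
		a |= ACT_FREE_VMA | ACT_FPUT;
	else
		a |= ACT_LINK;
	if (correct_wcount)
		a |= ACT_WCOUNT;
	return a;
}
```

Enumerating all eight input combinations shows the two versions agree. This is the De Morgan rewrite plus the hoisting that the commit message describes.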
Signed-off-by: Oleg Nesterov Cc: Miklos Szeredi Cc: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mmap.c | 23 ++++++++++------------- 1 file changed, 10 insertions(+), 13 deletions(-) diff --git a/mm/mmap.c b/mm/mmap.c index a32d28ce31cd..6aaf657adb87 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1068,7 +1068,6 @@ int vma_wants_writenotify(struct vm_area_struct *vma) mapping_cap_account_dirty(vma->vm_file->f_mapping); } - unsigned long mmap_region(struct file *file, unsigned long addr, unsigned long len, unsigned long flags, unsigned int vm_flags, unsigned long pgoff, @@ -1181,22 +1180,20 @@ munmap_back: if (vma_wants_writenotify(vma)) vma->vm_page_prot = vm_get_page_prot(vm_flags & ~VM_SHARED); - if (!file || !vma_merge(mm, prev, addr, vma->vm_end, + if (file && vma_merge(mm, prev, addr, vma->vm_end, vma->vm_flags, NULL, file, pgoff, vma_policy(vma))) { - file = vma->vm_file; - vma_link(mm, vma, prev, rb_link, rb_parent); - if (correct_wcount) - atomic_inc(&inode->i_writecount); - } else { - if (file) { - if (correct_wcount) - atomic_inc(&inode->i_writecount); - fput(file); - } mpol_free(vma_policy(vma)); kmem_cache_free(vm_area_cachep, vma); + fput(file); + } else { + vma_link(mm, vma, prev, rb_link, rb_parent); + file = vma->vm_file; } -out: + + /* Once vma denies write, undo our temporary denial count */ + if (correct_wcount) + atomic_inc(&inode->i_writecount); +out: mm->total_vm += len >> PAGE_SHIFT; vm_stat_account(mm, vm_flags, file, len >> PAGE_SHIFT); if (vm_flags & VM_LOCKED) { -- cgit v1.2.3 From 3c18ddd160d1fcd46d1131d9ad6c594dd8e9af99 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Mon, 28 Apr 2008 02:12:10 -0700 Subject: mm: remove nopage Nothing in the tree uses nopage any more. Remove support for it in the core mm code and documentation (and a few stray references to it in comments). 
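With the legacy path gone, a driver that still has nopage-style logic must report errors the way ->fault() does: return a VM_FAULT_* code and place the page in vmf->page. The userspace sketch below reproduces the translation that the removed "Legacy ->nopage path" in __do_fault() performed. The DEMO_VM_FAULT_* values and the demo types are illustrative stand-ins, not copies of include/linux/mm.h. The NOPAGE_* sentinels mirror the definitions this patch deletes, and nopage's `int *type` out-parameter is not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins; the real definitions live in include/linux/mm.h. */
struct page { int id; };
#define DEMO_VM_FAULT_OOM    0x0001
#define DEMO_VM_FAULT_SIGBUS 0x0002

/* The legacy ->nopage() return sentinels removed by this patch. */
#define NOPAGE_SIGBUS (NULL)
#define NOPAGE_OOM    ((struct page *) (-1))

struct demo_vm_fault { struct page *page; };

/* Maps a nopage-style return value onto a fault-style return code,
 * setting vmf->page only when a real page was supplied. */
static int nopage_result_to_fault(struct page *p, struct demo_vm_fault *vmf)
{
	if (p == NOPAGE_SIGBUS)
		return DEMO_VM_FAULT_SIGBUS;
	if (p == NOPAGE_OOM)
		return DEMO_VM_FAULT_OOM;
	vmf->page = p;
	return 0;
}
```

A converted ->fault() handler would do this mapping inline instead of going through an adapter. That is why the core code can now call vma->vm_ops->fault unconditionally and check the result against VM_FAULT_ERROR | VM_FAULT_NOPAGE.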
Signed-off-by: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/feature-removal-schedule.txt | 9 --------- Documentation/filesystems/Locking | 3 --- drivers/media/video/vino.c | 2 +- drivers/video/vermilion/vermilion.c | 5 +++-- fs/gfs2/ops_address.c | 2 +- include/linux/mm.h | 8 -------- mm/memory.c | 22 +++++----------------- mm/mincore.c | 2 +- mm/rmap.c | 1 - 9 files changed, 11 insertions(+), 43 deletions(-) diff --git a/Documentation/feature-removal-schedule.txt b/Documentation/feature-removal-schedule.txt index 448729fcaeb1..599fe55bf297 100644 --- a/Documentation/feature-removal-schedule.txt +++ b/Documentation/feature-removal-schedule.txt @@ -128,15 +128,6 @@ Who: Arjan van de Ven --------------------------- -What: vm_ops.nopage -When: Soon, provided in-kernel callers have been converted -Why: This interface is replaced by vm_ops.fault, but it has been around - forever, is used by a lot of drivers, and doesn't cost much to - maintain. -Who: Nick Piggin - ---------------------------- - What: PHYSDEVPATH, PHYSDEVBUS, PHYSDEVDRIVER in the uevent environment When: October 2008 Why: The stacking of class devices makes these values misleading and diff --git a/Documentation/filesystems/Locking b/Documentation/filesystems/Locking index 42d4b30b1045..c2992bc54f2f 100644 --- a/Documentation/filesystems/Locking +++ b/Documentation/filesystems/Locking @@ -511,7 +511,6 @@ prototypes: void (*open)(struct vm_area_struct*); void (*close)(struct vm_area_struct*); int (*fault)(struct vm_area_struct*, struct vm_fault *); - struct page *(*nopage)(struct vm_area_struct*, unsigned long, int *); int (*page_mkwrite)(struct vm_area_struct *, struct page *); locking rules: @@ -519,7 +518,6 @@ locking rules: open: no yes close: no yes fault: no yes -nopage: no yes page_mkwrite: no yes no ->page_mkwrite() is called when a previously read-only page is @@ -537,4 +535,3 @@ NULL. ipc/shm.c::shm_delete() - may need BKL. 
->read() and ->write() in many drivers are (probably) missing BKL. -drivers/sgi/char/graphics.c::sgi_graphics_nopage() - may need BKL. diff --git a/drivers/media/video/vino.c b/drivers/media/video/vino.c index d545c98dd5e7..01ea99c9bc1a 100644 --- a/drivers/media/video/vino.c +++ b/drivers/media/video/vino.c @@ -13,7 +13,7 @@ /* * TODO: * - remove "mark pages reserved-hacks" from memory allocation code - * and implement nopage() + * and implement fault() * - check decimation, calculating and reporting image size when * using decimation * - implement read(), user mode buffers and overlay (?) diff --git a/drivers/video/vermilion/vermilion.c b/drivers/video/vermilion/vermilion.c index 2aa71eb67c2b..c18f1884b550 100644 --- a/drivers/video/vermilion/vermilion.c +++ b/drivers/video/vermilion/vermilion.c @@ -112,8 +112,9 @@ static int vmlfb_alloc_vram_area(struct vram_area *va, unsigned max_order, /* * It seems like __get_free_pages only ups the usage count - * of the first page. This doesn't work with nopage mapping, so - * up the usage count once more. + * of the first page. This doesn't work with fault mapping, so + * up the usage count once more (XXX: should use split_page or + * compound page). */ memset((void *)va->logical, 0x00, va->size); diff --git a/fs/gfs2/ops_address.c b/fs/gfs2/ops_address.c index 90a04a6e3789..f55394e57cb2 100644 --- a/fs/gfs2/ops_address.c +++ b/fs/gfs2/ops_address.c @@ -438,7 +438,7 @@ static int stuffed_readpage(struct gfs2_inode *ip, struct page *page) int error; /* - * Due to the order of unstuffing files and ->nopage(), we can be + * Due to the order of unstuffing files and ->fault(), we can be * asked for a zero page in the case of a stuffed file being extended, * so we need to supply one here. It doesn't happen often. 
*/ diff --git a/include/linux/mm.h b/include/linux/mm.h index 286d31521605..ca973359fe5f 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -164,8 +164,6 @@ struct vm_operations_struct { void (*open)(struct vm_area_struct * area); void (*close)(struct vm_area_struct * area); int (*fault)(struct vm_area_struct *vma, struct vm_fault *vmf); - struct page *(*nopage)(struct vm_area_struct *area, - unsigned long address, int *type); unsigned long (*nopfn)(struct vm_area_struct *area, unsigned long address); @@ -648,12 +646,6 @@ static inline int page_mapped(struct page *page) return atomic_read(&(page)->_mapcount) >= 0; } -/* - * Error return values for the *_nopage functions - */ -#define NOPAGE_SIGBUS (NULL) -#define NOPAGE_OOM ((struct page *) (-1)) - /* * Error return values for the *_nopfn functions */ diff --git a/mm/memory.c b/mm/memory.c index 0d14d1e58a5f..46958fb97c2d 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1057,8 +1057,7 @@ int get_user_pages(struct task_struct *tsk, struct mm_struct *mm, if (pages) foll_flags |= FOLL_GET; if (!write && !(vma->vm_flags & VM_LOCKED) && - (!vma->vm_ops || (!vma->vm_ops->nopage && - !vma->vm_ops->fault))) + (!vma->vm_ops || !vma->vm_ops->fault)) foll_flags |= FOLL_ANON; do { @@ -2199,20 +2198,9 @@ static int __do_fault(struct mm_struct *mm, struct vm_area_struct *vma, BUG_ON(vma->vm_flags & VM_PFNMAP); - if (likely(vma->vm_ops->fault)) { - ret = vma->vm_ops->fault(vma, &vmf); - if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) - return ret; - } else { - /* Legacy ->nopage path */ - ret = 0; - vmf.page = vma->vm_ops->nopage(vma, address & PAGE_MASK, &ret); - /* no page was available -- either SIGBUS or OOM */ - if (unlikely(vmf.page == NOPAGE_SIGBUS)) - return VM_FAULT_SIGBUS; - else if (unlikely(vmf.page == NOPAGE_OOM)) - return VM_FAULT_OOM; - } + ret = vma->vm_ops->fault(vma, &vmf); + if (unlikely(ret & (VM_FAULT_ERROR | VM_FAULT_NOPAGE))) + return ret; /* * For consistency in subsequent calls, make the 
faulted page always @@ -2458,7 +2446,7 @@ static inline int handle_pte_fault(struct mm_struct *mm, if (!pte_present(entry)) { if (pte_none(entry)) { if (vma->vm_ops) { - if (vma->vm_ops->fault || vma->vm_ops->nopage) + if (likely(vma->vm_ops->fault)) return do_linear_fault(mm, vma, address, pte, pmd, write_access, entry); if (unlikely(vma->vm_ops->nopfn)) diff --git a/mm/mincore.c b/mm/mincore.c index 5efe0ded69b1..5178800bc129 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -33,7 +33,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff) * When tmpfs swaps out a page from a file, any process mapping that * file will not get a swp_entry_t in its pte, but rather it is like * any other file mapping (ie. marked !present and faulted in with - * tmpfs's .nopage). So swapped out tmpfs mappings are tested here. + * tmpfs's .fault). So swapped out tmpfs mappings are tested here. * * However when tmpfs moves the page from pagecache and into swapcache, * it is still in core, but the find_get_page below won't find it. diff --git a/mm/rmap.c b/mm/rmap.c index e9bb6b1093f6..bf0a5b7cfb8e 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -662,7 +662,6 @@ void page_remove_rmap(struct page *page, struct vm_area_struct *vma) printk (KERN_EMERG " page->mapping = %p\n", page->mapping); print_symbol (KERN_EMERG " vma->vm_ops = %s\n", (unsigned long)vma->vm_ops); if (vma->vm_ops) { - print_symbol (KERN_EMERG " vma->vm_ops->nopage = %s\n", (unsigned long)vma->vm_ops->nopage); print_symbol (KERN_EMERG " vma->vm_ops->fault = %s\n", (unsigned long)vma->vm_ops->fault); } if (vma->vm_file && vma->vm_file->f_op) -- cgit v1.2.3 From 9d02dbc8137759e4c2f91db0b7f9c8a1ec2a9276 Mon Sep 17 00:00:00 2001 From: Adrian Bunk Date: Mon, 28 Apr 2008 02:12:11 -0700 Subject: make swap_pte_to_pagemap_entry() static Make the needlessly global swap_pte_to_pagemap_entry() static. 
Signed-off-by: Adrian Bunk Acked-by: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/task_mmu.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 9dfb5ff24209..f4ab76c7c662 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -579,7 +579,7 @@ static int pagemap_pte_hole(unsigned long start, unsigned long end, return err; } -u64 swap_pte_to_pagemap_entry(pte_t pte) +static u64 swap_pte_to_pagemap_entry(pte_t pte) { swp_entry_t e = pte_to_swp_entry(pte); return swp_type(e) | (swp_offset(e) << MAX_SWAPFILES_SHIFT); -- cgit v1.2.3 From dac1d27bc8d5ca636d3014ecfdf94407031d1970 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:12 -0700 Subject: mm: use zonelists instead of zones when direct reclaiming pages The following patches replace multiple zonelists per node with two zonelists that are filtered based on the GFP flags. The patches as a set fix a bug with regard to the use of MPOL_BIND and ZONE_MOVABLE. With this patchset, MPOL_BIND will apply to the two highest zones when the highest zone is ZONE_MOVABLE. This should be considered as an alternative fix for the MPOL_BIND+ZONE_MOVABLE problem in 2.6.23 to the previously discussed hack that filters only custom zonelists. The first patch cleans up an inconsistency where direct reclaim uses zonelist->zones where other places use zonelist. The second patch introduces a helper function node_zonelist() for looking up the appropriate zonelist for a GFP mask which simplifies patches later in the set. The third patch defines/remembers the "preferred zone" for NUMA statistics, as it is no longer always the first zone in a zonelist. The fourth patch replaces multiple zonelists with two zonelists that are filtered. Two zonelists are needed because the memoryless patchset introduces a second set of zonelists for __GFP_THISNODE.
The fifth patch introduces helper macros for retrieving the zone and node indices of entries in a zonelist. The final patch introduces filtering of the zonelists based on a nodemask. Two zonelists exist per node, one for normal allocations and one for __GFP_THISNODE. Performance results varied depending on the machine configuration. In real workloads the gain/loss will depend on how much the userspace portion of the benchmark benefits from having more cache available due to reduced referencing of zonelists. This is the range of performance losses/gains when running against 2.6.24-rc4-mm1. The machines tested were a mix of i386, x86_64 and ppc64, both NUMA and non-NUMA.

                              loss    to  gain
Total CPU time on Kernbench:  -0.86%  to  1.13%
Elapsed time on Kernbench:    -0.79%  to  0.76%
page_test from aim9:          -4.37%  to  0.79%
brk_test from aim9:           -0.71%  to  4.07%
fork_test from aim9:          -1.84%  to  4.60%
exec_test from aim9:          -0.71%  to  1.08%

This patch: The allocator deals with zonelists which indicate the order in which zones should be targeted for an allocation. Similarly, direct reclaim of pages iterates over an array of zones. For consistency, this patch converts direct reclaim to use a zonelist. No functionality is changed by this patch. This simplifies zonelist iterators in the next patch.
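A minimal userspace sketch of the interface change, assuming heavily simplified stand-ins for struct zone and struct zonelist (the reclaimable counter and all model_* names are hypothetical, not kernel API). Direct reclaim is handed a struct zonelist and walks its NULL-terminated zones[] array, which is the same walk shrink_zones() keeps doing through its local zones pointer after this patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the kernel's struct zone. */
struct zone {
	const char *name;
	unsigned long reclaimable;	/* pages that could be freed (model only) */
};

#define MODEL_MAX_ZONES 8

/* Like struct zonelist at this point: a NULL-terminated array of zones. */
struct zonelist {
	struct zone *zones[MODEL_MAX_ZONES + 1];
};

/* Model of shrink_zones(): walk the zonelist, reclaiming up to @batch pages per zone. */
static unsigned long model_shrink_zones(struct zonelist *zonelist,
					unsigned long batch)
{
	struct zone **zones;
	unsigned long nr_reclaimed = 0;
	int i;

	assert(zonelist != NULL);
	zones = zonelist->zones;	/* same local the patch introduces */

	for (i = 0; zones[i] != NULL; i++) {
		unsigned long n = zones[i]->reclaimable < batch ?
				  zones[i]->reclaimable : batch;

		zones[i]->reclaimable -= n;
		nr_reclaimed += n;
	}
	return nr_reclaimed;
}

/* Model of try_to_free_pages() taking a zonelist instead of struct zone **. */
static unsigned long model_try_to_free_pages(struct zonelist *zonelist,
					     unsigned long batch)
{
	/* Mirrors the free_more_memory() check: skip an empty zonelist. */
	if (zonelist->zones[0] == NULL)
		return 0;
	return model_shrink_zones(zonelist, batch);
}
```

Passing the zonelist rather than its zones[] array keeps the reclaim path's signature identical to the allocator's, which is what lets later patches replace the raw array walk with a filtering iterator in one place.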
Signed-off-by: Mel Gorman Acked-by: Christoph Lameter Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Christoph Lameter Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/buffer.c | 8 ++++---- include/linux/swap.h | 2 +- mm/page_alloc.c | 2 +- mm/vmscan.c | 21 ++++++++++++--------- 4 files changed, 18 insertions(+), 15 deletions(-) diff --git a/fs/buffer.c b/fs/buffer.c index 8b9807523efe..1dae94acb3fe 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -360,16 +360,16 @@ void invalidate_bdev(struct block_device *bdev) */ static void free_more_memory(void) { - struct zone **zones; + struct zonelist *zonelist; pg_data_t *pgdat; wakeup_pdflush(1024); yield(); for_each_online_pgdat(pgdat) { - zones = pgdat->node_zonelists[gfp_zone(GFP_NOFS)].zones; - if (*zones) - try_to_free_pages(zones, 0, GFP_NOFS); + zonelist = &pgdat->node_zonelists[gfp_zone(GFP_NOFS)]; + if (zonelist->zones[0]) + try_to_free_pages(zonelist, 0, GFP_NOFS); } } diff --git a/include/linux/swap.h b/include/linux/swap.h index 878459ae0454..4286e7ac2b00 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -181,7 +181,7 @@ extern int rotate_reclaimable_page(struct page *page); extern void swap_setup(void); /* linux/mm/vmscan.c */ -extern unsigned long try_to_free_pages(struct zone **zones, int order, +extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, gfp_t gfp_mask); extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem, gfp_t gfp_mask); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 32e796af12a1..1bda771a072a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1569,7 +1569,7 @@ nofail_alloc: reclaim_state.reclaimed_slab = 0; p->reclaim_state = &reclaim_state; - did_some_progress = try_to_free_pages(zonelist->zones, order, gfp_mask); + did_some_progress = try_to_free_pages(zonelist, order, gfp_mask); p->reclaim_state = NULL; p->flags &= ~PF_MEMALLOC; diff --git 
a/mm/vmscan.c b/mm/vmscan.c index f80a5b7c057f..ef8551e0d2d0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1246,10 +1246,11 @@ static unsigned long shrink_zone(int priority, struct zone *zone, * If a zone is deemed to be full of pinned pages then just give it a light * scan then give up on it. */ -static unsigned long shrink_zones(int priority, struct zone **zones, +static unsigned long shrink_zones(int priority, struct zonelist *zonelist, struct scan_control *sc) { unsigned long nr_reclaimed = 0; + struct zone **zones = zonelist->zones; int i; @@ -1301,8 +1302,8 @@ static unsigned long shrink_zones(int priority, struct zone **zones, * holds filesystem locks which prevent writeout this might not work, and the * allocation attempt will fail. */ -static unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask, - struct scan_control *sc) +static unsigned long do_try_to_free_pages(struct zonelist *zonelist, + gfp_t gfp_mask, struct scan_control *sc) { int priority; int ret = 0; @@ -1310,6 +1311,7 @@ static unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask, unsigned long nr_reclaimed = 0; struct reclaim_state *reclaim_state = current->reclaim_state; unsigned long lru_pages = 0; + struct zone **zones = zonelist->zones; int i; if (scan_global_lru(sc)) @@ -1333,7 +1335,7 @@ static unsigned long do_try_to_free_pages(struct zone **zones, gfp_t gfp_mask, sc->nr_scanned = 0; if (!priority) disable_swap_token(); - nr_reclaimed += shrink_zones(priority, zones, sc); + nr_reclaimed += shrink_zones(priority, zonelist, sc); /* * Don't shrink slabs when reclaiming memory from * over limit cgroups @@ -1397,7 +1399,8 @@ out: return ret; } -unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t gfp_mask) +unsigned long try_to_free_pages(struct zonelist *zonelist, int order, + gfp_t gfp_mask) { struct scan_control sc = { .gfp_mask = gfp_mask, @@ -1410,7 +1413,7 @@ unsigned long try_to_free_pages(struct zone **zones, int order, gfp_t 
gfp_mask) .isolate_pages = isolate_pages_global, }; - return do_try_to_free_pages(zones, gfp_mask, &sc); + return do_try_to_free_pages(zonelist, gfp_mask, &sc); } #ifdef CONFIG_CGROUP_MEM_RES_CTLR @@ -1428,11 +1431,11 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, .mem_cgroup = mem_cont, .isolate_pages = mem_cgroup_isolate_pages, }; - struct zone **zones; + struct zonelist *zonelist; int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE); - zones = NODE_DATA(numa_node_id())->node_zonelists[target_zone].zones; - if (do_try_to_free_pages(zones, sc.gfp_mask, &sc)) + zonelist = &NODE_DATA(numa_node_id())->node_zonelists[target_zone]; + if (do_try_to_free_pages(zonelist, sc.gfp_mask, &sc)) return 1; return 0; } -- cgit v1.2.3 From 0e88460da6ab7bb6a7ef83675412ed5b6315d741 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:14 -0700 Subject: mm: introduce node_zonelist() for accessing the zonelist for a GFP mask Introduce a node_zonelist() helper function. It is used to look up the appropriate zonelist given a node and a GFP mask. The patch on its own is a cleanup but it helps clarify parts of the two-zonelist-per-node patchset. If necessary, it can be merged with the next patch in this set without problems.
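To illustrate what node_zonelist() encapsulates at this point in the series, here is a userspace model assuming a three-zone system, hypothetical MODEL_GFP_* bit values and a simplified gfp_zone(); NODE_DATA() and model_nodes are stand-ins too. Only the one-line body of node_zonelist() mirrors the patch. The lookup is still indexed by zone type here, so callers that previously open-coded NODE_DATA(nid)->node_zonelists + gfp_zone(flags) now share one definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical simplified zone types; not the kernel's full set. */
enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

typedef unsigned int gfp_t;

/* Hypothetical GFP bit values for the model (NOT the kernel's values). */
#define MODEL_GFP_DMA     0x01u
#define MODEL_GFP_HIGHMEM 0x02u

struct zonelist {
	int id;		/* contents elided for the sketch */
};

struct pglist_data {
	/* One zonelist per zone type, as before the two-zonelist patch. */
	struct zonelist node_zonelists[MAX_NR_ZONES];
};

#define MODEL_MAX_NODES 4
static struct pglist_data model_nodes[MODEL_MAX_NODES];
#define NODE_DATA(nid) (&model_nodes[(nid)])

/* Simplified gfp_zone(): map GFP flags to the highest usable zone type. */
static enum zone_type gfp_zone(gfp_t flags)
{
	if (flags & MODEL_GFP_DMA)
		return ZONE_DMA;
	if (flags & MODEL_GFP_HIGHMEM)
		return ZONE_HIGHMEM;
	return ZONE_NORMAL;
}

/* Same body as the helper added to gfp.h by this patch. */
static struct zonelist *node_zonelist(int nid, gfp_t flags)
{
	assert(nid >= 0 && nid < MODEL_MAX_NODES);	/* model-only sanity check */
	return NODE_DATA(nid)->node_zonelists + gfp_zone(flags);
}
```

The design benefit shows up in the next patch: when the per-zone-type zonelists collapse into two, only this helper's body has to change, not every caller.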
Reviewed-by: Christoph Lameter Signed-off-by: Mel Gorman Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Christoph Lameter Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/sysrq.c | 3 +-- fs/buffer.c | 6 +++--- include/linux/gfp.h | 8 ++++++-- include/linux/mempolicy.h | 2 +- mm/mempolicy.c | 6 +++--- mm/page_alloc.c | 3 +-- mm/slab.c | 3 +-- mm/slub.c | 3 +-- 8 files changed, 17 insertions(+), 17 deletions(-) diff --git a/drivers/char/sysrq.c b/drivers/char/sysrq.c index de60e1ea4fb3..1ade193c9128 100644 --- a/drivers/char/sysrq.c +++ b/drivers/char/sysrq.c @@ -271,8 +271,7 @@ static struct sysrq_key_op sysrq_term_op = { static void moom_callback(struct work_struct *ignored) { - out_of_memory(&NODE_DATA(0)->node_zonelists[ZONE_NORMAL], - GFP_KERNEL, 0); + out_of_memory(node_zonelist(0, GFP_KERNEL), GFP_KERNEL, 0); } static DECLARE_WORK(moom_work, moom_callback); diff --git a/fs/buffer.c b/fs/buffer.c index 1dae94acb3fe..71358499bc57 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -361,13 +361,13 @@ void invalidate_bdev(struct block_device *bdev) static void free_more_memory(void) { struct zonelist *zonelist; - pg_data_t *pgdat; + int nid; wakeup_pdflush(1024); yield(); - for_each_online_pgdat(pgdat) { - zonelist = &pgdat->node_zonelists[gfp_zone(GFP_NOFS)]; + for_each_online_node(nid) { + zonelist = node_zonelist(nid, GFP_NOFS); if (zonelist->zones[0]) try_to_free_pages(zonelist, 0, GFP_NOFS); } diff --git a/include/linux/gfp.h b/include/linux/gfp.h index c17ba4945203..e865d51f1c74 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -154,10 +154,15 @@ static inline enum zone_type gfp_zone(gfp_t flags) /* * We get the zone list from the current node and the gfp_mask. * This zone list contains a maximum of MAXNODES*MAX_NR_ZONES zones. + * There are many zonelists per node, two for each active zone. 
* * For the normal case of non-DISCONTIGMEM systems the NODE_DATA() gets * optimized to &contig_page_data at compile-time. */ +static inline struct zonelist *node_zonelist(int nid, gfp_t flags) +{ + return NODE_DATA(nid)->node_zonelists + gfp_zone(flags); +} #ifndef HAVE_ARCH_FREE_PAGE static inline void arch_free_page(struct page *page, int order) { } @@ -178,8 +183,7 @@ static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, if (nid < 0) nid = numa_node_id(); - return __alloc_pages(gfp_mask, order, - NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_mask)); + return __alloc_pages(gfp_mask, order, node_zonelist(nid, gfp_mask)); } #ifdef CONFIG_NUMA diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 59c4865bc85f..69160dc32d48 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -241,7 +241,7 @@ static inline void mpol_fix_fork_child_flag(struct task_struct *p) static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol) { - return NODE_DATA(0)->node_zonelists + gfp_zone(gfp_flags); + return node_zonelist(0, gfp_flags); } static inline int do_migrate_pages(struct mm_struct *mm, diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 3c3601121509..5d20bf44062f 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1183,7 +1183,7 @@ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) nd = 0; BUG(); } - return NODE_DATA(nd)->node_zonelists + gfp_zone(gfp); + return node_zonelist(nd, gfp); } /* Do dynamic interleaving for a process */ @@ -1299,7 +1299,7 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, if (unlikely(pol != &default_policy && pol != current->mempolicy)) __mpol_free(pol); /* finished with pol */ - return NODE_DATA(nid)->node_zonelists + gfp_zone(gfp_flags); + return node_zonelist(nid, gfp_flags); } zl = zonelist_policy(GFP_HIGHUSER, pol); @@ -1321,7 +1321,7 @@ static struct page 
*alloc_page_interleave(gfp_t gfp, unsigned order, struct zonelist *zl; struct page *page; - zl = NODE_DATA(nid)->node_zonelists + gfp_zone(gfp); + zl = node_zonelist(nid, gfp); page = __alloc_pages(gfp, order, zl); if (page && page_zone(page) == zl->zones[0]) inc_zone_page_state(page, NUMA_INTERLEAVE_HIT); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 1bda771a072a..63ff71830ea4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1713,10 +1713,9 @@ EXPORT_SYMBOL(free_pages); static unsigned int nr_free_zone_pages(int offset) { /* Just pick one node, since fallback list is circular */ - pg_data_t *pgdat = NODE_DATA(numa_node_id()); unsigned int sum = 0; - struct zonelist *zonelist = pgdat->node_zonelists + offset; + struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL); struct zone **zonep = zonelist->zones; struct zone *zone; diff --git a/mm/slab.c b/mm/slab.c index 03927cb5ec9e..5488c54b1172 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3249,8 +3249,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) if (flags & __GFP_THISNODE) return NULL; - zonelist = &NODE_DATA(slab_node(current->mempolicy)) - ->node_zonelists[gfp_zone(flags)]; + zonelist = node_zonelist(slab_node(current->mempolicy), flags); local_flags = flags & (GFP_CONSTRAINT_MASK|GFP_RECLAIM_MASK); retry: diff --git a/mm/slub.c b/mm/slub.c index 39592b5ce68a..19ebbfb20689 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1309,8 +1309,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags) get_cycles() % 1024 > s->remote_node_defrag_ratio) return NULL; - zonelist = &NODE_DATA( - slab_node(current->mempolicy))->node_zonelists[gfp_zone(flags)]; + zonelist = node_zonelist(slab_node(current->mempolicy), flags); for (z = zonelist->zones; *z; z++) { struct kmem_cache_node *n; -- cgit v1.2.3 From 18ea7e710d2452fa726814a406779188028cf1bf Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:14 -0700 Subject: mm: remember what the preferred zone 
is for zone_statistics On NUMA, zone_statistics() is used to record events like numa hit, miss and foreign. It assumes that the first zone in a zonelist is the preferred zone. When multiple zonelists are replaced by one that is filtered, this is no longer the case. This patch records what the preferred zone is rather than assuming the first zone in the zonelist is it. This simplifies the reading of later patches in this set. Signed-off-by: Mel Gorman Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Reviewed-by: Christoph Lameter Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmstat.h | 2 +- mm/page_alloc.c | 9 +++++---- mm/vmstat.c | 6 +++--- 3 files changed, 9 insertions(+), 8 deletions(-) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 9f1b4b46151e..e726b6d46495 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -174,7 +174,7 @@ static inline unsigned long node_page_state(int node, zone_page_state(&zones[ZONE_MOVABLE], item); } -extern void zone_statistics(struct zonelist *, struct zone *); +extern void zone_statistics(struct zone *, struct zone *); #else diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 63ff71830ea4..187efd47a446 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1050,7 +1050,7 @@ void split_page(struct page *page, unsigned int order) * we cheat by calling it from here, in the order > 0 path. Saves a branch * or two. 
*/ -static struct page *buffered_rmqueue(struct zonelist *zonelist, +static struct page *buffered_rmqueue(struct zone *preferred_zone, struct zone *zone, int order, gfp_t gfp_flags) { unsigned long flags; @@ -1102,7 +1102,7 @@ again: } __count_zone_vm_events(PGALLOC, zone, 1 << order); - zone_statistics(zonelist, zone); + zone_statistics(preferred_zone, zone); local_irq_restore(flags); put_cpu(); @@ -1383,7 +1383,7 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, struct zone **z; struct page *page = NULL; int classzone_idx = zone_idx(zonelist->zones[0]); - struct zone *zone; + struct zone *zone, *preferred_zone; nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */ int zlc_active = 0; /* set if using zonelist_cache */ int did_zlc_setup = 0; /* just call zlc_setup() one time */ @@ -1395,6 +1395,7 @@ zonelist_scan: * See also cpuset_zone_allowed() comment in kernel/cpuset.c. */ z = zonelist->zones; + preferred_zone = *z; do { /* @@ -1433,7 +1434,7 @@ zonelist_scan: } } - page = buffered_rmqueue(zonelist, zone, order, gfp_mask); + page = buffered_rmqueue(preferred_zone, zone, order, gfp_mask); if (page) break; this_zone_full: diff --git a/mm/vmstat.c b/mm/vmstat.c index 7c7286e9506d..879bcc0a1d4c 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -364,13 +364,13 @@ void refresh_cpu_vm_stats(int cpu) * * Must be called with interrupts disabled. 
*/ -void zone_statistics(struct zonelist *zonelist, struct zone *z) +void zone_statistics(struct zone *preferred_zone, struct zone *z) { - if (z->zone_pgdat == zonelist->zones[0]->zone_pgdat) { + if (z->zone_pgdat == preferred_zone->zone_pgdat) { __inc_zone_state(z, NUMA_HIT); } else { __inc_zone_state(z, NUMA_MISS); - __inc_zone_state(zonelist->zones[0], NUMA_FOREIGN); + __inc_zone_state(preferred_zone, NUMA_FOREIGN); } if (z->node == numa_node_id()) __inc_zone_state(z, NUMA_LOCAL); -- cgit v1.2.3 From 54a6eb5c4765aa573a030ceeba2c14e3d2ea5706 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:16 -0700 Subject: mm: use two zonelist that are filtered by GFP mask Currently a node has two sets of zonelists: the first contains one zonelist for each zone type in the system, and the second is a matching set for GFP_THISNODE allocations. Based on the zones allowed by a gfp mask, one of these zonelists is selected. All of these zonelists consume memory and occupy cache lines. This patch replaces the multiple zonelists per node with two zonelists. The first contains all populated zones in the system, ordered by distance, for fallback allocations when the target/preferred node has no free pages. The second contains all populated zones in the node suitable for GFP_THISNODE allocations. An iterator macro called for_each_zone_zonelist() is introduced that iterates through each zone allowed by the GFP flags in the selected zonelist.
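A userspace sketch of the filtered walk, assuming simplified types: zone_idx() is modelled by a stored field, MODEL_GFP_THISNODE is a hypothetical bit value, and gfp_zonelist() drops the NUMA_BUILD check because a NUMA build is assumed. first_zones_zonelist(), next_zones_zonelist() and for_each_zone_zonelist() follow the bodies this patch adds to mmzone.h; count_usable_zones() is a hypothetical caller.

```c
#include <assert.h>
#include <stddef.h>

enum zone_type { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM, MAX_NR_ZONES };

struct zone {
	int nid;		/* node the zone belongs to */
	enum zone_type idx;	/* model stand-in for the kernel's zone_idx() */
};

#define zone_idx(zone) ((zone)->idx)

/* NULL-terminated array of zones, as in struct zonelist of this era. */
struct zonelist {
	struct zone *zones[16];
};

/* Hypothetical bit value; selects zonelist 1 (no fallback). */
#define MODEL_GFP_THISNODE 0x1u

static int gfp_zonelist(unsigned int flags)
{
	return (flags & MODEL_GFP_THISNODE) ? 1 : 0;
}

/* Returns the first zone at or below highest_zoneidx (or the terminator). */
static struct zone **first_zones_zonelist(struct zonelist *zonelist,
					  enum zone_type highest_zoneidx)
{
	struct zone **z = zonelist->zones;

	while (*z && zone_idx(*z) > highest_zoneidx)
		z++;
	return z;
}

/* Returns the next zone at or below highest_zoneidx, starting at z. */
static struct zone **next_zones_zonelist(struct zone **z,
					 enum zone_type highest_zoneidx)
{
	while (*z && zone_idx(*z) > highest_zoneidx)
		z++;
	return z;
}

/* The iterator as written in the patch: zone is the current entry, z the cursor. */
#define for_each_zone_zonelist(zone, z, zlist, highidx) \
	for (z = first_zones_zonelist(zlist, highidx), zone = *z++; \
	     zone; \
	     z = next_zones_zonelist(z, highidx), zone = *z++)

/* Hypothetical caller: count the zones a request capped at @highidx may use. */
static int count_usable_zones(struct zonelist *zl, enum zone_type highidx)
{
	struct zone **z;
	struct zone *zone;
	int n = 0;

	for_each_zone_zonelist(zone, z, zl, highidx)
		n++;
	return n;
}
```

In a node-ordered fallback list such as node 0 {HIGHMEM, NORMAL, DMA} followed by node 1 {NORMAL, DMA}, a request capped at ZONE_NORMAL skips only node 0's HIGHMEM zone and still reaches node 1 in the same pass. That is why one filtered list can replace the per-zone-type lists.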
Signed-off-by: Mel Gorman Acked-by: Christoph Lameter Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Christoph Lameter Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/parisc/mm/init.c | 11 ++-- fs/buffer.c | 10 +-- include/linux/gfp.h | 13 +++- include/linux/mmzone.h | 65 ++++++++++++------- mm/hugetlb.c | 8 +-- mm/oom_kill.c | 8 ++- mm/page_alloc.c | 170 +++++++++++++++++++++---------------------------- mm/slab.c | 8 ++- mm/slub.c | 8 ++- mm/vmscan.c | 21 +++--- 10 files changed, 168 insertions(+), 154 deletions(-) diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index eb80f5e33d7d..9bb6136d77c2 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -603,15 +603,18 @@ void show_mem(void) #ifdef CONFIG_DISCONTIGMEM { struct zonelist *zl; - int i, j, k; + int i, j; for (i = 0; i < npmem_ranges; i++) { + zl = node_zonelist(i); for (j = 0; j < MAX_NR_ZONES; j++) { - zl = NODE_DATA(i)->node_zonelists + j; + struct zone **z; + struct zone *zone; printk("Zone list for zone %d on node %d: ", j, i); - for (k = 0; zl->zones[k] != NULL; k++) - printk("[%d/%s] ", zone_to_nid(zl->zones[k]), zl->zones[k]->name); + for_each_zone_zonelist(zone, z, zl, j) + printk("[%d/%s] ", zone_to_nid(zone), + zone->name); printk("\n"); } } diff --git a/fs/buffer.c b/fs/buffer.c index 71358499bc57..9b5434a80479 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -360,16 +360,18 @@ void invalidate_bdev(struct block_device *bdev) */ static void free_more_memory(void) { - struct zonelist *zonelist; + struct zone **zones; int nid; wakeup_pdflush(1024); yield(); for_each_online_node(nid) { - zonelist = node_zonelist(nid, GFP_NOFS); - if (zonelist->zones[0]) - try_to_free_pages(zonelist, 0, GFP_NOFS); + zones = first_zones_zonelist(node_zonelist(nid, GFP_NOFS), + gfp_zone(GFP_NOFS)); + if (*zones) + try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0, + GFP_NOFS); } } diff --git a/include/linux/gfp.h 
b/include/linux/gfp.h index e865d51f1c74..e1c6064cb6c7 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -151,17 +151,26 @@ static inline enum zone_type gfp_zone(gfp_t flags) * virtual kernel addresses to the allocated page(s). */ +static inline int gfp_zonelist(gfp_t flags) +{ + if (NUMA_BUILD && unlikely(flags & __GFP_THISNODE)) + return 1; + + return 0; +} + /* * We get the zone list from the current node and the gfp_mask. * This zone list contains a maximum of MAXNODES*MAX_NR_ZONES zones. - * There are many zonelists per node, two for each active zone. + * There are two zonelists per node, one for all zones with memory and + * one containing just zones from the node the zonelist belongs to. * * For the normal case of non-DISCONTIGMEM systems the NODE_DATA() gets * optimized to &contig_page_data at compile-time. */ static inline struct zonelist *node_zonelist(int nid, gfp_t flags) { - return NODE_DATA(nid)->node_zonelists + gfp_zone(flags); + return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags); } #ifndef HAVE_ARCH_FREE_PAGE diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 451eaa13bc28..d5c33a0b89e9 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -393,10 +393,10 @@ static inline int zone_is_oom_locked(const struct zone *zone) * The NUMA zonelists are doubled becausse we need zonelists that restrict the * allocations to a single node for GFP_THISNODE. * - * [0 .. MAX_NR_ZONES -1] : Zonelists with fallback - * [MAZ_NR_ZONES ... 
MAZ_ZONELISTS -1] : No fallback (GFP_THISNODE) + * [0] : Zonelist with fallback + * [1] : No fallback (GFP_THISNODE) */ -#define MAX_ZONELISTS (2 * MAX_NR_ZONES) +#define MAX_ZONELISTS 2 /* @@ -464,7 +464,7 @@ struct zonelist_cache { unsigned long last_full_zap; /* when last zap'd (jiffies) */ }; #else -#define MAX_ZONELISTS MAX_NR_ZONES +#define MAX_ZONELISTS 1 struct zonelist_cache; #endif @@ -486,24 +486,6 @@ struct zonelist { #endif }; -#ifdef CONFIG_NUMA -/* - * Only custom zonelists like MPOL_BIND need to be filtered as part of - * policies. As described in the comment for struct zonelist_cache, these - * zonelists will not have a zlcache so zlcache_ptr will not be set. Use - * that to determine if the zonelists needs to be filtered or not. - */ -static inline int alloc_should_filter_zonelist(struct zonelist *zonelist) -{ - return !zonelist->zlcache_ptr; -} -#else -static inline int alloc_should_filter_zonelist(struct zonelist *zonelist) -{ - return 0; -} -#endif /* CONFIG_NUMA */ - #ifdef CONFIG_ARCH_POPULATES_NODE_MAP struct node_active_region { unsigned long start_pfn; @@ -731,6 +713,45 @@ extern struct zone *next_zone(struct zone *zone); zone; \ zone = next_zone(zone)) +/* Returns the first zone at or below highest_zoneidx in a zonelist */ +static inline struct zone **first_zones_zonelist(struct zonelist *zonelist, + enum zone_type highest_zoneidx) +{ + struct zone **z; + + /* Find the first suitable zone to use for the allocation */ + z = zonelist->zones; + while (*z && zone_idx(*z) > highest_zoneidx) + z++; + + return z; +} + +/* Returns the next zone at or below highest_zoneidx in a zonelist */ +static inline struct zone **next_zones_zonelist(struct zone **z, + enum zone_type highest_zoneidx) +{ + /* Find the next suitable zone to use for the allocation */ + while (*z && zone_idx(*z) > highest_zoneidx) + z++; + + return z; +} + +/** + * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index + * 
@zone - The current zone in the iterator + * @z - The current pointer within zonelist->zones being iterated + * @zlist - The zonelist being iterated + * @highidx - The zone index of the highest zone to return + * + * This iterator iterates though all zones at or below a given zone index. + */ +#define for_each_zone_zonelist(zone, z, zlist, highidx) \ + for (z = first_zones_zonelist(zlist, highidx), zone = *z++; \ + zone; \ + z = next_zones_zonelist(z, highidx), zone = *z++) + #ifdef CONFIG_SPARSEMEM #include #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 51c9e2c01640..ddd141cad77f 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -97,11 +97,11 @@ static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, struct mempolicy *mpol; struct zonelist *zonelist = huge_zonelist(vma, address, htlb_alloc_mask, &mpol); - struct zone **z; + struct zone *zone, **z; - for (z = zonelist->zones; *z; z++) { - nid = zone_to_nid(*z); - if (cpuset_zone_allowed_softwall(*z, htlb_alloc_mask) && + for_each_zone_zonelist(zone, z, zonelist, MAX_NR_ZONES - 1) { + nid = zone_to_nid(zone); + if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) && !list_empty(&hugepage_freelists[nid])) { page = list_entry(hugepage_freelists[nid].next, struct page, lru); diff --git a/mm/oom_kill.c b/mm/oom_kill.c index beb592fe9389..2c93502cfcb4 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -175,12 +175,14 @@ static inline enum oom_constraint constrained_alloc(struct zonelist *zonelist, gfp_t gfp_mask) { #ifdef CONFIG_NUMA + struct zone *zone; struct zone **z; + enum zone_type high_zoneidx = gfp_zone(gfp_mask); nodemask_t nodes = node_states[N_HIGH_MEMORY]; - for (z = zonelist->zones; *z; z++) - if (cpuset_zone_allowed_softwall(*z, gfp_mask)) - node_clear(zone_to_nid(*z), nodes); + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) + if (cpuset_zone_allowed_softwall(zone, gfp_mask)) + node_clear(zone_to_nid(zone), nodes); else return CONSTRAINT_CPUSET; diff --git a/mm/page_alloc.c 
b/mm/page_alloc.c index 187efd47a446..4ccb8651cf22 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1378,42 +1378,29 @@ static void zlc_mark_zone_full(struct zonelist *zonelist, struct zone **z) */ static struct page * get_page_from_freelist(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist, int alloc_flags) + struct zonelist *zonelist, int high_zoneidx, int alloc_flags) { struct zone **z; struct page *page = NULL; - int classzone_idx = zone_idx(zonelist->zones[0]); + int classzone_idx; struct zone *zone, *preferred_zone; nodemask_t *allowednodes = NULL;/* zonelist_cache approximation */ int zlc_active = 0; /* set if using zonelist_cache */ int did_zlc_setup = 0; /* just call zlc_setup() one time */ - enum zone_type highest_zoneidx = -1; /* Gets set for policy zonelists */ + + z = first_zones_zonelist(zonelist, high_zoneidx); + classzone_idx = zone_idx(*z); + preferred_zone = *z; zonelist_scan: /* * Scan zonelist, looking for a zone with enough free. * See also cpuset_zone_allowed() comment in kernel/cpuset.c. */ - z = zonelist->zones; - preferred_zone = *z; - - do { - /* - * In NUMA, this could be a policy zonelist which contains - * zones that may not be allowed by the current gfp_mask. 
- * Check the zone is allowed by the current flags - */ - if (unlikely(alloc_should_filter_zonelist(zonelist))) { - if (highest_zoneidx == -1) - highest_zoneidx = gfp_zone(gfp_mask); - if (zone_idx(*z) > highest_zoneidx) - continue; - } - + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { if (NUMA_BUILD && zlc_active && !zlc_zone_worth_trying(zonelist, z, allowednodes)) continue; - zone = *z; if ((alloc_flags & ALLOC_CPUSET) && !cpuset_zone_allowed_softwall(zone, gfp_mask)) goto try_next_zone; @@ -1447,7 +1434,7 @@ try_next_zone: zlc_active = 1; did_zlc_setup = 1; } - } while (*(++z) != NULL); + } if (unlikely(NUMA_BUILD && page == NULL && zlc_active)) { /* Disable zlc cache for second zonelist scan */ @@ -1465,6 +1452,7 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist) { const gfp_t wait = gfp_mask & __GFP_WAIT; + enum zone_type high_zoneidx = gfp_zone(gfp_mask); struct zone **z; struct page *page; struct reclaim_state reclaim_state; @@ -1490,7 +1478,7 @@ restart: } page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, - zonelist, ALLOC_WMARK_LOW|ALLOC_CPUSET); + zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET); if (page) goto got_pg; @@ -1534,7 +1522,8 @@ restart: * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc. * See also cpuset_zone_allowed() comment in kernel/cpuset.c. 
*/ - page = get_page_from_freelist(gfp_mask, order, zonelist, alloc_flags); + page = get_page_from_freelist(gfp_mask, order, zonelist, + high_zoneidx, alloc_flags); if (page) goto got_pg; @@ -1547,7 +1536,7 @@ rebalance: nofail_alloc: /* go through the zonelist yet again, ignoring mins */ page = get_page_from_freelist(gfp_mask, order, - zonelist, ALLOC_NO_WATERMARKS); + zonelist, high_zoneidx, ALLOC_NO_WATERMARKS); if (page) goto got_pg; if (gfp_mask & __GFP_NOFAIL) { @@ -1582,7 +1571,7 @@ nofail_alloc: if (likely(did_some_progress)) { page = get_page_from_freelist(gfp_mask, order, - zonelist, alloc_flags); + zonelist, high_zoneidx, alloc_flags); if (page) goto got_pg; } else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { @@ -1598,7 +1587,7 @@ nofail_alloc: * under heavy pressure. */ page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, - zonelist, ALLOC_WMARK_HIGH|ALLOC_CPUSET); + zonelist, high_zoneidx, ALLOC_WMARK_HIGH|ALLOC_CPUSET); if (page) { clear_zonelist_oom(zonelist); goto got_pg; @@ -1713,14 +1702,15 @@ EXPORT_SYMBOL(free_pages); static unsigned int nr_free_zone_pages(int offset) { + struct zone **z; + struct zone *zone; + /* Just pick one node, since fallback list is circular */ unsigned int sum = 0; struct zonelist *zonelist = node_zonelist(numa_node_id(), GFP_KERNEL); - struct zone **zonep = zonelist->zones; - struct zone *zone; - for (zone = *zonep++; zone; zone = *zonep++) { + for_each_zone_zonelist(zone, z, zonelist, offset) { unsigned long size = zone->present_pages; unsigned long high = zone->pages_high; if (size > high) @@ -2078,17 +2068,15 @@ static int find_next_best_node(int node, nodemask_t *used_node_mask) */ static void build_zonelists_in_node_order(pg_data_t *pgdat, int node) { - enum zone_type i; int j; struct zonelist *zonelist; - for (i = 0; i < MAX_NR_ZONES; i++) { - zonelist = pgdat->node_zonelists + i; - for (j = 0; zonelist->zones[j] != NULL; j++) - ; - j = build_zonelists_node(NODE_DATA(node), zonelist, j, i); 
- zonelist->zones[j] = NULL; - } + zonelist = &pgdat->node_zonelists[0]; + for (j = 0; zonelist->zones[j] != NULL; j++) + ; + j = build_zonelists_node(NODE_DATA(node), zonelist, j, + MAX_NR_ZONES - 1); + zonelist->zones[j] = NULL; } /* @@ -2096,15 +2084,12 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int node) */ static void build_thisnode_zonelists(pg_data_t *pgdat) { - enum zone_type i; int j; struct zonelist *zonelist; - for (i = 0; i < MAX_NR_ZONES; i++) { - zonelist = pgdat->node_zonelists + MAX_NR_ZONES + i; - j = build_zonelists_node(pgdat, zonelist, 0, i); - zonelist->zones[j] = NULL; - } + zonelist = &pgdat->node_zonelists[1]; + j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1); + zonelist->zones[j] = NULL; } /* @@ -2117,27 +2102,24 @@ static int node_order[MAX_NUMNODES]; static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes) { - enum zone_type i; int pos, j, node; int zone_type; /* needs to be signed */ struct zone *z; struct zonelist *zonelist; - for (i = 0; i < MAX_NR_ZONES; i++) { - zonelist = pgdat->node_zonelists + i; - pos = 0; - for (zone_type = i; zone_type >= 0; zone_type--) { - for (j = 0; j < nr_nodes; j++) { - node = node_order[j]; - z = &NODE_DATA(node)->node_zones[zone_type]; - if (populated_zone(z)) { - zonelist->zones[pos++] = z; - check_highest_zone(zone_type); - } + zonelist = &pgdat->node_zonelists[0]; + pos = 0; + for (zone_type = MAX_NR_ZONES - 1; zone_type >= 0; zone_type--) { + for (j = 0; j < nr_nodes; j++) { + node = node_order[j]; + z = &NODE_DATA(node)->node_zones[zone_type]; + if (populated_zone(z)) { + zonelist->zones[pos++] = z; + check_highest_zone(zone_type); } } - zonelist->zones[pos] = NULL; } + zonelist->zones[pos] = NULL; } static int default_zonelist_order(void) @@ -2264,19 +2246,15 @@ static void build_zonelists(pg_data_t *pgdat) /* Construct the zonelist performance cache - see further mmzone.h */ static void build_zonelist_cache(pg_data_t *pgdat) { - int i; - - for 
(i = 0; i < MAX_NR_ZONES; i++) { - struct zonelist *zonelist; - struct zonelist_cache *zlc; - struct zone **z; + struct zonelist *zonelist; + struct zonelist_cache *zlc; + struct zone **z; - zonelist = pgdat->node_zonelists + i; - zonelist->zlcache_ptr = zlc = &zonelist->zlcache; - bitmap_zero(zlc->fullzones, MAX_ZONES_PER_ZONELIST); - for (z = zonelist->zones; *z; z++) - zlc->z_to_n[z - zonelist->zones] = zone_to_nid(*z); - } + zonelist = &pgdat->node_zonelists[0]; + zonelist->zlcache_ptr = zlc = &zonelist->zlcache; + bitmap_zero(zlc->fullzones, MAX_ZONES_PER_ZONELIST); + for (z = zonelist->zones; *z; z++) + zlc->z_to_n[z - zonelist->zones] = zone_to_nid(*z); } @@ -2290,45 +2268,43 @@ static void set_zonelist_order(void) static void build_zonelists(pg_data_t *pgdat) { int node, local_node; - enum zone_type i,j; + enum zone_type j; + struct zonelist *zonelist; local_node = pgdat->node_id; - for (i = 0; i < MAX_NR_ZONES; i++) { - struct zonelist *zonelist; - zonelist = pgdat->node_zonelists + i; + zonelist = &pgdat->node_zonelists[0]; + j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1); - j = build_zonelists_node(pgdat, zonelist, 0, i); - /* - * Now we build the zonelist so that it contains the zones - * of all the other nodes. - * We don't want to pressure a particular node, so when - * building the zones for node N, we make sure that the - * zones coming right after the local ones are those from - * node N+1 (modulo N) - */ - for (node = local_node + 1; node < MAX_NUMNODES; node++) { - if (!node_online(node)) - continue; - j = build_zonelists_node(NODE_DATA(node), zonelist, j, i); - } - for (node = 0; node < local_node; node++) { - if (!node_online(node)) - continue; - j = build_zonelists_node(NODE_DATA(node), zonelist, j, i); - } - - zonelist->zones[j] = NULL; + /* + * Now we build the zonelist so that it contains the zones + * of all the other nodes. 
+ * We don't want to pressure a particular node, so when + * building the zones for node N, we make sure that the + * zones coming right after the local ones are those from + * node N+1 (modulo N) + */ + for (node = local_node + 1; node < MAX_NUMNODES; node++) { + if (!node_online(node)) + continue; + j = build_zonelists_node(NODE_DATA(node), zonelist, j, + MAX_NR_ZONES - 1); } + for (node = 0; node < local_node; node++) { + if (!node_online(node)) + continue; + j = build_zonelists_node(NODE_DATA(node), zonelist, j, + MAX_NR_ZONES - 1); + } + + zonelist->zones[j] = NULL; } /* non-NUMA variant of zonelist performance cache - just NULL zlcache_ptr */ static void build_zonelist_cache(pg_data_t *pgdat) { - int i; - - for (i = 0; i < MAX_NR_ZONES; i++) - pgdat->node_zonelists[i].zlcache_ptr = NULL; + pgdat->node_zonelists[0].zlcache_ptr = NULL; + pgdat->node_zonelists[1].zlcache_ptr = NULL; } #endif /* CONFIG_NUMA */ diff --git a/mm/slab.c b/mm/slab.c index 5488c54b1172..29851841da62 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3243,6 +3243,8 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) struct zonelist *zonelist; gfp_t local_flags; struct zone **z; + struct zone *zone; + enum zone_type high_zoneidx = gfp_zone(flags); void *obj = NULL; int nid; @@ -3257,10 +3259,10 @@ retry: * Look through allowed nodes for objects available * from existing per node queues. 
*/ - for (z = zonelist->zones; *z && !obj; z++) { - nid = zone_to_nid(*z); + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { + nid = zone_to_nid(zone); - if (cpuset_zone_allowed_hardwall(*z, flags) && + if (cpuset_zone_allowed_hardwall(zone, flags) && cache->nodelists[nid] && cache->nodelists[nid]->free_objects) obj = ____cache_alloc_node(cache, diff --git a/mm/slub.c b/mm/slub.c index 19ebbfb20689..80d20cc1c0f8 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1285,6 +1285,8 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags) #ifdef CONFIG_NUMA struct zonelist *zonelist; struct zone **z; + struct zone *zone; + enum zone_type high_zoneidx = gfp_zone(flags); struct page *page; /* @@ -1310,12 +1312,12 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags) return NULL; zonelist = node_zonelist(slab_node(current->mempolicy), flags); - for (z = zonelist->zones; *z; z++) { + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { struct kmem_cache_node *n; - n = get_node(s, zone_to_nid(*z)); + n = get_node(s, zone_to_nid(zone)); - if (n && cpuset_zone_allowed_hardwall(*z, flags) && + if (n && cpuset_zone_allowed_hardwall(zone, flags) && n->nr_partial > MIN_PARTIAL) { page = get_partial_node(n); if (page) diff --git a/mm/vmscan.c b/mm/vmscan.c index ef8551e0d2d0..0515b8f44894 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1249,15 +1249,13 @@ static unsigned long shrink_zone(int priority, struct zone *zone, static unsigned long shrink_zones(int priority, struct zonelist *zonelist, struct scan_control *sc) { + enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask); unsigned long nr_reclaimed = 0; - struct zone **zones = zonelist->zones; - int i; - + struct zone **z; + struct zone *zone; sc->all_unreclaimable = 1; - for (i = 0; zones[i] != NULL; i++) { - struct zone *zone = zones[i]; - + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { if (!populated_zone(zone)) continue; /* @@ -1311,8 +1309,9 @@ static unsigned long 
do_try_to_free_pages(struct zonelist *zonelist, unsigned long nr_reclaimed = 0; struct reclaim_state *reclaim_state = current->reclaim_state; unsigned long lru_pages = 0; - struct zone **zones = zonelist->zones; - int i; + struct zone **z; + struct zone *zone; + enum zone_type high_zoneidx = gfp_zone(gfp_mask); if (scan_global_lru(sc)) count_vm_event(ALLOCSTALL); @@ -1320,8 +1319,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, * mem_cgroup will not do shrink_slab. */ if (scan_global_lru(sc)) { - for (i = 0; zones[i] != NULL; i++) { - struct zone *zone = zones[i]; + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL)) continue; @@ -1385,8 +1383,7 @@ out: priority = 0; if (scan_global_lru(sc)) { - for (i = 0; zones[i] != NULL; i++) { - struct zone *zone = zones[i]; + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL)) continue; -- cgit v1.2.3 From dd1a239f6f2d4d3eedd318583ec319aa145b324c Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:17 -0700 Subject: mm: have zonelist contains structs with both a zone pointer and zone_idx Filtering zonelists requires very frequent use of zone_idx(). This is costly as it involves a lookup of another structure and a subtraction operation. As the zone_idx is often required, it should be quickly accessible. The node idx could also be stored here if it was found that accessing zone->node is significant, which may be the case on workloads where nodemasks are heavily used. This patch introduces a struct zoneref to store a zone pointer and a zone index. The zonelist then consists of an array of these struct zonerefs which are looked up as necessary. Helpers are given for accessing the zone index as well as the node index.
[kamezawa.hiroyu@jp.fujitsu.com: Suggested struct zoneref instead of embedding information in pointers] [hugh@veritas.com: mm-have-zonelist: fix memcg ooms] [hugh@veritas.com: just return do_try_to_free_pages] [hugh@veritas.com: do_try_to_free_pages gfp_mask redundant] Signed-off-by: Mel Gorman Acked-by: Christoph Lameter Acked-by: David Rientjes Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Christoph Lameter Cc: Nick Piggin Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/parisc/mm/init.c | 2 +- fs/buffer.c | 6 ++--- include/linux/mmzone.h | 64 +++++++++++++++++++++++++++++++++++++++-------- include/linux/oom.h | 4 +-- kernel/cpuset.c | 4 +-- mm/hugetlb.c | 3 ++- mm/mempolicy.c | 36 +++++++++++++++----------- mm/oom_kill.c | 45 ++++++++++++++++----------------- mm/page_alloc.c | 68 ++++++++++++++++++++++++++++---------------------- mm/slab.c | 2 +- mm/slub.c | 2 +- mm/vmscan.c | 22 ++++++++-------- 12 files changed, 158 insertions(+), 100 deletions(-) diff --git a/arch/parisc/mm/init.c b/arch/parisc/mm/init.c index 9bb6136d77c2..1f012843150f 100644 --- a/arch/parisc/mm/init.c +++ b/arch/parisc/mm/init.c @@ -608,7 +608,7 @@ void show_mem(void) for (i = 0; i < npmem_ranges; i++) { zl = node_zonelist(i); for (j = 0; j < MAX_NR_ZONES; j++) { - struct zone **z; + struct zoneref *z; struct zone *zone; printk("Zone list for zone %d on node %d: ", j, i); diff --git a/fs/buffer.c b/fs/buffer.c index 9b5434a80479..ac84cd13075d 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -360,16 +360,16 @@ void invalidate_bdev(struct block_device *bdev) */ static void free_more_memory(void) { - struct zone **zones; + struct zoneref *zrefs; int nid; wakeup_pdflush(1024); yield(); for_each_online_node(nid) { - zones = first_zones_zonelist(node_zonelist(nid, GFP_NOFS), + zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS), gfp_zone(GFP_NOFS)); - if (*zones) + if (zrefs->zone) try_to_free_pages(node_zonelist(nid, 
GFP_NOFS), 0, GFP_NOFS); } diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d5c33a0b89e9..d34b4c290017 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -468,6 +468,15 @@ struct zonelist_cache { struct zonelist_cache; #endif +/* + * This struct contains information about a zone in a zonelist. It is stored + * here to avoid dereferences into large structures and lookups of tables + */ +struct zoneref { + struct zone *zone; /* Pointer to actual zone */ + int zone_idx; /* zone_idx(zoneref->zone) */ +}; + /* * One allocation request operates on a zonelist. A zonelist * is a list of zones, the first one is the 'goal' of the @@ -476,11 +485,18 @@ struct zonelist_cache; * * If zlcache_ptr is not NULL, then it is just the address of zlcache, * as explained above. If zlcache_ptr is NULL, there is no zlcache. + * * + * To speed the reading of the zonelist, the zonerefs contain the zone index + * of the entry being read. Helper functions to access information given + * a struct zoneref are + * + * zonelist_zone() - Return the struct zone * for an entry in _zonerefs + * zonelist_zone_idx() - Return the index of the zone for an entry + * zonelist_node_idx() - Return the index of the node for an entry */ - struct zonelist { struct zonelist_cache *zlcache_ptr; // NULL or &zlcache - struct zone *zones[MAX_ZONES_PER_ZONELIST + 1]; // NULL delimited + struct zoneref _zonerefs[MAX_ZONES_PER_ZONELIST + 1]; #ifdef CONFIG_NUMA struct zonelist_cache zlcache; // optional ... 
#endif @@ -713,26 +729,52 @@ extern struct zone *next_zone(struct zone *zone); zone; \ zone = next_zone(zone)) +static inline struct zone *zonelist_zone(struct zoneref *zoneref) +{ + return zoneref->zone; +} + +static inline int zonelist_zone_idx(struct zoneref *zoneref) +{ + return zoneref->zone_idx; +} + +static inline int zonelist_node_idx(struct zoneref *zoneref) +{ +#ifdef CONFIG_NUMA + /* zone_to_nid not available in this context */ + return zoneref->zone->node; +#else + return 0; +#endif /* CONFIG_NUMA */ +} + +static inline void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref) +{ + zoneref->zone = zone; + zoneref->zone_idx = zone_idx(zone); +} + /* Returns the first zone at or below highest_zoneidx in a zonelist */ -static inline struct zone **first_zones_zonelist(struct zonelist *zonelist, +static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, enum zone_type highest_zoneidx) { - struct zone **z; + struct zoneref *z; /* Find the first suitable zone to use for the allocation */ - z = zonelist->zones; - while (*z && zone_idx(*z) > highest_zoneidx) + z = zonelist->_zonerefs; + while (zonelist_zone_idx(z) > highest_zoneidx) z++; return z; } /* Returns the next zone at or below highest_zoneidx in a zonelist */ -static inline struct zone **next_zones_zonelist(struct zone **z, +static inline struct zoneref *next_zones_zonelist(struct zoneref *z, enum zone_type highest_zoneidx) { /* Find the next suitable zone to use for the allocation */ - while (*z && zone_idx(*z) > highest_zoneidx) + while (zonelist_zone_idx(z) > highest_zoneidx) z++; return z; @@ -748,9 +790,11 @@ static inline struct zone **next_zones_zonelist(struct zone **z, * This iterator iterates though all zones at or below a given zone index. 
*/ #define for_each_zone_zonelist(zone, z, zlist, highidx) \ - for (z = first_zones_zonelist(zlist, highidx), zone = *z++; \ + for (z = first_zones_zonelist(zlist, highidx), \ + zone = zonelist_zone(z++); \ zone; \ - z = next_zones_zonelist(z, highidx), zone = *z++) + z = next_zones_zonelist(z, highidx), \ + zone = zonelist_zone(z++)) #ifdef CONFIG_SPARSEMEM #include diff --git a/include/linux/oom.h b/include/linux/oom.h index 3852436b652a..a7979baf1e39 100644 --- a/include/linux/oom.h +++ b/include/linux/oom.h @@ -23,8 +23,8 @@ enum oom_constraint { CONSTRAINT_MEMORY_POLICY, }; -extern int try_set_zone_oom(struct zonelist *zonelist); -extern void clear_zonelist_oom(struct zonelist *zonelist); +extern int try_set_zone_oom(struct zonelist *zonelist, gfp_t gfp_flags); +extern void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_flags); extern void out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask, int order); extern int register_oom_notifier(struct notifier_block *nb); diff --git a/kernel/cpuset.c b/kernel/cpuset.c index 8b35fbd8292f..a220b13cbfaf 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1967,8 +1967,8 @@ int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl) { int i; - for (i = 0; zl->zones[i]; i++) { - int nid = zone_to_nid(zl->zones[i]); + for (i = 0; zl->_zonerefs[i].zone; i++) { + int nid = zonelist_node_idx(&zl->_zonerefs[i]); if (node_isset(nid, current->mems_allowed)) return 1; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index ddd141cad77f..4bced0d705ca 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -97,7 +97,8 @@ static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, struct mempolicy *mpol; struct zonelist *zonelist = huge_zonelist(vma, address, htlb_alloc_mask, &mpol); - struct zone *zone, **z; + struct zone *zone; + struct zoneref *z; for_each_zone_zonelist(zone, z, zonelist, MAX_NR_ZONES - 1) { nid = zone_to_nid(zone); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 5d20bf44062f..90193a2a915b 100644 --- 
a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -186,7 +186,7 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes) for_each_node_mask(nd, *nodes) { struct zone *z = &NODE_DATA(nd)->node_zones[k]; if (z->present_pages > 0) - zl->zones[num++] = z; + zoneref_set_zone(z, &zl->_zonerefs[num++]); } if (k == 0) break; @@ -196,7 +196,8 @@ static struct zonelist *bind_zonelist(nodemask_t *nodes) kfree(zl); return ERR_PTR(-EINVAL); } - zl->zones[num] = NULL; + zl->_zonerefs[num].zone = NULL; + zl->_zonerefs[num].zone_idx = 0; return zl; } @@ -504,9 +505,11 @@ static void get_zonemask(struct mempolicy *p, nodemask_t *nodes) nodes_clear(*nodes); switch (p->policy) { case MPOL_BIND: - for (i = 0; p->v.zonelist->zones[i]; i++) - node_set(zone_to_nid(p->v.zonelist->zones[i]), - *nodes); + for (i = 0; p->v.zonelist->_zonerefs[i].zone; i++) { + struct zoneref *zref; + zref = &p->v.zonelist->_zonerefs[i]; + node_set(zonelist_node_idx(zref), *nodes); + } break; case MPOL_DEFAULT: break; @@ -1212,12 +1215,13 @@ unsigned slab_node(struct mempolicy *policy) case MPOL_INTERLEAVE: return interleave_nodes(policy); - case MPOL_BIND: + case MPOL_BIND: { /* * Follow bind policy behavior and start allocation at the * first node. 
*/ - return zone_to_nid(policy->v.zonelist->zones[0]); + return zonelist_node_idx(policy->v.zonelist->_zonerefs); + } case MPOL_PREFERRED: if (policy->v.preferred_node >= 0) @@ -1323,7 +1327,7 @@ static struct page *alloc_page_interleave(gfp_t gfp, unsigned order, zl = node_zonelist(nid, gfp); page = __alloc_pages(gfp, order, zl); - if (page && page_zone(page) == zl->zones[0]) + if (page && page_zone(page) == zonelist_zone(&zl->_zonerefs[0])) inc_zone_page_state(page, NUMA_INTERLEAVE_HIT); return page; } @@ -1463,10 +1467,14 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) return a->v.preferred_node == b->v.preferred_node; case MPOL_BIND: { int i; - for (i = 0; a->v.zonelist->zones[i]; i++) - if (a->v.zonelist->zones[i] != b->v.zonelist->zones[i]) + for (i = 0; a->v.zonelist->_zonerefs[i].zone; i++) { + struct zone *za, *zb; + za = zonelist_zone(&a->v.zonelist->_zonerefs[i]); + zb = zonelist_zone(&b->v.zonelist->_zonerefs[i]); + if (za != zb) return 0; - return b->v.zonelist->zones[i] == NULL; + } + return b->v.zonelist->_zonerefs[i].zone == NULL; } default: BUG(); @@ -1785,12 +1793,12 @@ static void mpol_rebind_policy(struct mempolicy *pol, break; case MPOL_BIND: { nodemask_t nodes; - struct zone **z; + struct zoneref *z; struct zonelist *zonelist; nodes_clear(nodes); - for (z = pol->v.zonelist->zones; *z; z++) - node_set(zone_to_nid(*z), nodes); + for (z = pol->v.zonelist->_zonerefs; z->zone; z++) + node_set(zonelist_node_idx(z), nodes); nodes_remap(tmp, nodes, *mpolmask, *newmask); nodes = tmp; diff --git a/mm/oom_kill.c b/mm/oom_kill.c index 2c93502cfcb4..e41504aa5da9 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -176,7 +176,7 @@ static inline enum oom_constraint constrained_alloc(struct zonelist *zonelist, { #ifdef CONFIG_NUMA struct zone *zone; - struct zone **z; + struct zoneref *z; enum zone_type high_zoneidx = gfp_zone(gfp_mask); nodemask_t nodes = node_states[N_HIGH_MEMORY]; @@ -462,29 +462,29 @@ 
EXPORT_SYMBOL_GPL(unregister_oom_notifier); * if a parallel OOM killing is already taking place that includes a zone in * the zonelist. Otherwise, locks all zones in the zonelist and returns 1. */ -int try_set_zone_oom(struct zonelist *zonelist) +int try_set_zone_oom(struct zonelist *zonelist, gfp_t gfp_mask) { - struct zone **z; + struct zoneref *z; + struct zone *zone; int ret = 1; - z = zonelist->zones; - spin_lock(&zone_scan_mutex); - do { - if (zone_is_oom_locked(*z)) { + for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) { + if (zone_is_oom_locked(zone)) { ret = 0; goto out; } - } while (*(++z) != NULL); + } + + for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) { + /* + * Lock each zone in the zonelist under zone_scan_mutex so a + * parallel invocation of try_set_zone_oom() doesn't succeed + * when it shouldn't. + */ + zone_set_flag(zone, ZONE_OOM_LOCKED); + } - /* - * Lock each zone in the zonelist under zone_scan_mutex so a parallel - * invocation of try_set_zone_oom() doesn't succeed when it shouldn't. - */ - z = zonelist->zones; - do { - zone_set_flag(*z, ZONE_OOM_LOCKED); - } while (*(++z) != NULL); out: spin_unlock(&zone_scan_mutex); return ret; @@ -495,16 +495,15 @@ out: * allocation attempts with zonelists containing them may now recall the OOM * killer, if necessary. 
*/ -void clear_zonelist_oom(struct zonelist *zonelist) +void clear_zonelist_oom(struct zonelist *zonelist, gfp_t gfp_mask) { - struct zone **z; - - z = zonelist->zones; + struct zoneref *z; + struct zone *zone; spin_lock(&zone_scan_mutex); - do { - zone_clear_flag(*z, ZONE_OOM_LOCKED); - } while (*(++z) != NULL); + for_each_zone_zonelist(zone, z, zonelist, gfp_zone(gfp_mask)) { + zone_clear_flag(zone, ZONE_OOM_LOCKED); + } spin_unlock(&zone_scan_mutex); } diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4ccb8651cf22..6d94d04ea784 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1317,7 +1317,7 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags) * We are low on memory in the second scan, and should leave no stone * unturned looking for a free page. */ -static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zone **z, +static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zoneref *z, nodemask_t *allowednodes) { struct zonelist_cache *zlc; /* cached zonelist speedup info */ @@ -1328,7 +1328,7 @@ static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zone **z, if (!zlc) return 1; - i = z - zonelist->zones; + i = z - zonelist->_zonerefs; n = zlc->z_to_n[i]; /* This zone is worth trying if it is allowed but not full */ @@ -1340,7 +1340,7 @@ static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zone **z, * zlc->fullzones, so that subsequent attempts to allocate a page * from that zone don't waste time re-examining it. 
*/ -static void zlc_mark_zone_full(struct zonelist *zonelist, struct zone **z) +static void zlc_mark_zone_full(struct zonelist *zonelist, struct zoneref *z) { struct zonelist_cache *zlc; /* cached zonelist speedup info */ int i; /* index of *z in zonelist zones */ @@ -1349,7 +1349,7 @@ static void zlc_mark_zone_full(struct zonelist *zonelist, struct zone **z) if (!zlc) return; - i = z - zonelist->zones; + i = z - zonelist->_zonerefs; set_bit(i, zlc->fullzones); } @@ -1361,13 +1361,13 @@ static nodemask_t *zlc_setup(struct zonelist *zonelist, int alloc_flags) return NULL; } -static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zone **z, +static int zlc_zone_worth_trying(struct zonelist *zonelist, struct zoneref *z, nodemask_t *allowednodes) { return 1; } -static void zlc_mark_zone_full(struct zonelist *zonelist, struct zone **z) +static void zlc_mark_zone_full(struct zonelist *zonelist, struct zoneref *z) { } #endif /* CONFIG_NUMA */ @@ -1380,7 +1380,7 @@ static struct page * get_page_from_freelist(gfp_t gfp_mask, unsigned int order, struct zonelist *zonelist, int high_zoneidx, int alloc_flags) { - struct zone **z; + struct zoneref *z; struct page *page = NULL; int classzone_idx; struct zone *zone, *preferred_zone; @@ -1389,8 +1389,8 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int did_zlc_setup = 0; /* just call zlc_setup() one time */ z = first_zones_zonelist(zonelist, high_zoneidx); - classzone_idx = zone_idx(*z); - preferred_zone = *z; + classzone_idx = zonelist_zone_idx(z); + preferred_zone = zonelist_zone(z); zonelist_scan: /* @@ -1453,7 +1453,8 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, { const gfp_t wait = gfp_mask & __GFP_WAIT; enum zone_type high_zoneidx = gfp_zone(gfp_mask); - struct zone **z; + struct zoneref *z; + struct zone *zone; struct page *page; struct reclaim_state reclaim_state; struct task_struct *p = current; @@ -1467,9 +1468,9 @@ __alloc_pages(gfp_t gfp_mask, unsigned int order, return NULL; restart: - 
z = zonelist->zones; /* the list of zones suitable for gfp_mask */ + z = zonelist->_zonerefs; /* the list of zones suitable for gfp_mask */ - if (unlikely(*z == NULL)) { + if (unlikely(!z->zone)) { /* * Happens if we have an empty zonelist as a result of * GFP_THISNODE being used on a memoryless node @@ -1493,8 +1494,8 @@ restart: if (NUMA_BUILD && (gfp_mask & GFP_THISNODE) == GFP_THISNODE) goto nopage; - for (z = zonelist->zones; *z; z++) - wakeup_kswapd(*z, order); + for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) + wakeup_kswapd(zone, order); /* * OK, we're below the kswapd watermark and have kicked background @@ -1575,7 +1576,7 @@ nofail_alloc: if (page) goto got_pg; } else if ((gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY)) { - if (!try_set_zone_oom(zonelist)) { + if (!try_set_zone_oom(zonelist, gfp_mask)) { schedule_timeout_uninterruptible(1); goto restart; } @@ -1589,18 +1590,18 @@ nofail_alloc: page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, zonelist, high_zoneidx, ALLOC_WMARK_HIGH|ALLOC_CPUSET); if (page) { - clear_zonelist_oom(zonelist); + clear_zonelist_oom(zonelist, gfp_mask); goto got_pg; } /* The OOM killer will not help higher order allocs so fail */ if (order > PAGE_ALLOC_COSTLY_ORDER) { - clear_zonelist_oom(zonelist); + clear_zonelist_oom(zonelist, gfp_mask); goto nopage; } out_of_memory(zonelist, gfp_mask, order); - clear_zonelist_oom(zonelist); + clear_zonelist_oom(zonelist, gfp_mask); goto restart; } @@ -1702,7 +1703,7 @@ EXPORT_SYMBOL(free_pages); static unsigned int nr_free_zone_pages(int offset) { - struct zone **z; + struct zoneref *z; struct zone *zone; /* Just pick one node, since fallback list is circular */ @@ -1896,7 +1897,8 @@ static int build_zonelists_node(pg_data_t *pgdat, struct zonelist *zonelist, zone_type--; zone = pgdat->node_zones + zone_type; if (populated_zone(zone)) { - zonelist->zones[nr_zones++] = zone; + zoneref_set_zone(zone, + &zonelist->_zonerefs[nr_zones++]); check_highest_zone(zone_type); 
} @@ -2072,11 +2074,12 @@ static void build_zonelists_in_node_order(pg_data_t *pgdat, int node) struct zonelist *zonelist; zonelist = &pgdat->node_zonelists[0]; - for (j = 0; zonelist->zones[j] != NULL; j++) + for (j = 0; zonelist->_zonerefs[j].zone != NULL; j++) ; j = build_zonelists_node(NODE_DATA(node), zonelist, j, MAX_NR_ZONES - 1); - zonelist->zones[j] = NULL; + zonelist->_zonerefs[j].zone = NULL; + zonelist->_zonerefs[j].zone_idx = 0; } /* @@ -2089,7 +2092,8 @@ static void build_thisnode_zonelists(pg_data_t *pgdat) zonelist = &pgdat->node_zonelists[1]; j = build_zonelists_node(pgdat, zonelist, 0, MAX_NR_ZONES - 1); - zonelist->zones[j] = NULL; + zonelist->_zonerefs[j].zone = NULL; + zonelist->_zonerefs[j].zone_idx = 0; } /* @@ -2114,12 +2118,14 @@ static void build_zonelists_in_zone_order(pg_data_t *pgdat, int nr_nodes) node = node_order[j]; z = &NODE_DATA(node)->node_zones[zone_type]; if (populated_zone(z)) { - zonelist->zones[pos++] = z; + zoneref_set_zone(z, + &zonelist->_zonerefs[pos++]); check_highest_zone(zone_type); } } } - zonelist->zones[pos] = NULL; + zonelist->_zonerefs[pos].zone = NULL; + zonelist->_zonerefs[pos].zone_idx = 0; } static int default_zonelist_order(void) @@ -2196,7 +2202,8 @@ static void build_zonelists(pg_data_t *pgdat) /* initialize zonelists */ for (i = 0; i < MAX_ZONELISTS; i++) { zonelist = pgdat->node_zonelists + i; - zonelist->zones[0] = NULL; + zonelist->_zonerefs[0].zone = NULL; + zonelist->_zonerefs[0].zone_idx = 0; } /* NUMA-aware ordering of nodes */ @@ -2248,13 +2255,13 @@ static void build_zonelist_cache(pg_data_t *pgdat) { struct zonelist *zonelist; struct zonelist_cache *zlc; - struct zone **z; + struct zoneref *z; zonelist = &pgdat->node_zonelists[0]; zonelist->zlcache_ptr = zlc = &zonelist->zlcache; bitmap_zero(zlc->fullzones, MAX_ZONES_PER_ZONELIST); - for (z = zonelist->zones; *z; z++) - zlc->z_to_n[z - zonelist->zones] = zone_to_nid(*z); + for (z = zonelist->_zonerefs; z->zone; z++) + zlc->z_to_n[z - 
zonelist->_zonerefs] = zonelist_node_idx(z); } @@ -2297,7 +2304,8 @@ static void build_zonelists(pg_data_t *pgdat) MAX_NR_ZONES - 1); } - zonelist->zones[j] = NULL; + zonelist->_zonerefs[j].zone = NULL; + zonelist->_zonerefs[j].zone_idx = 0; } /* non-NUMA variant of zonelist performance cache - just NULL zlcache_ptr */ diff --git a/mm/slab.c b/mm/slab.c index 29851841da62..7bc4a136846e 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -3242,7 +3242,7 @@ static void *fallback_alloc(struct kmem_cache *cache, gfp_t flags) { struct zonelist *zonelist; gfp_t local_flags; - struct zone **z; + struct zoneref *z; struct zone *zone; enum zone_type high_zoneidx = gfp_zone(flags); void *obj = NULL; diff --git a/mm/slub.c b/mm/slub.c index 80d20cc1c0f8..48fff83a1e9d 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1284,7 +1284,7 @@ static struct page *get_any_partial(struct kmem_cache *s, gfp_t flags) { #ifdef CONFIG_NUMA struct zonelist *zonelist; - struct zone **z; + struct zoneref *z; struct zone *zone; enum zone_type high_zoneidx = gfp_zone(flags); struct page *page; diff --git a/mm/vmscan.c b/mm/vmscan.c index 0515b8f44894..eceac9f9032f 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1251,7 +1251,7 @@ static unsigned long shrink_zones(int priority, struct zonelist *zonelist, { enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask); unsigned long nr_reclaimed = 0; - struct zone **z; + struct zoneref *z; struct zone *zone; sc->all_unreclaimable = 1; @@ -1301,7 +1301,7 @@ static unsigned long shrink_zones(int priority, struct zonelist *zonelist, * allocation attempt will fail. 
*/ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, - gfp_t gfp_mask, struct scan_control *sc) + struct scan_control *sc) { int priority; int ret = 0; @@ -1309,9 +1309,9 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, unsigned long nr_reclaimed = 0; struct reclaim_state *reclaim_state = current->reclaim_state; unsigned long lru_pages = 0; - struct zone **z; + struct zoneref *z; struct zone *zone; - enum zone_type high_zoneidx = gfp_zone(gfp_mask); + enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask); if (scan_global_lru(sc)) count_vm_event(ALLOCSTALL); @@ -1339,7 +1339,7 @@ static unsigned long do_try_to_free_pages(struct zonelist *zonelist, * over limit cgroups */ if (scan_global_lru(sc)) { - shrink_slab(sc->nr_scanned, gfp_mask, lru_pages); + shrink_slab(sc->nr_scanned, sc->gfp_mask, lru_pages); if (reclaim_state) { nr_reclaimed += reclaim_state->reclaimed_slab; reclaim_state->reclaimed_slab = 0; @@ -1410,7 +1410,7 @@ unsigned long try_to_free_pages(struct zonelist *zonelist, int order, .isolate_pages = isolate_pages_global, }; - return do_try_to_free_pages(zonelist, gfp_mask, &sc); + return do_try_to_free_pages(zonelist, &sc); } #ifdef CONFIG_CGROUP_MEM_RES_CTLR @@ -1419,7 +1419,6 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, gfp_t gfp_mask) { struct scan_control sc = { - .gfp_mask = gfp_mask, .may_writepage = !laptop_mode, .may_swap = 1, .swap_cluster_max = SWAP_CLUSTER_MAX, @@ -1429,12 +1428,11 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont, .isolate_pages = mem_cgroup_isolate_pages, }; struct zonelist *zonelist; - int target_zone = gfp_zone(GFP_HIGHUSER_MOVABLE); - zonelist = &NODE_DATA(numa_node_id())->node_zonelists[target_zone]; - if (do_try_to_free_pages(zonelist, sc.gfp_mask, &sc)) - return 1; - return 0; + sc.gfp_mask = (gfp_mask & GFP_RECLAIM_MASK) | + (GFP_HIGHUSER_MOVABLE & ~GFP_RECLAIM_MASK); + zonelist = NODE_DATA(numa_node_id())->node_zonelists; + 
return do_try_to_free_pages(zonelist, &sc); } #endif -- cgit v1.2.3 From 19770b32609b6bf97a3dece2529089494cbfc549 Mon Sep 17 00:00:00 2001 From: Mel Gorman Date: Mon, 28 Apr 2008 02:12:18 -0700 Subject: mm: filter based on a nodemask as well as a gfp_mask The MPOL_BIND policy creates a zonelist that is used for allocations controlled by that mempolicy. As the per-node zonelist is already being filtered based on a zone id, this patch adds a version of __alloc_pages() that takes a nodemask for further filtering. This eliminates the need for MPOL_BIND to create a custom zonelist. A positive benefit of this is that allocations using MPOL_BIND now use the local node's distance-ordered zonelist instead of a custom node-id-ordered zonelist. I.e., pages will be allocated from the closest allowed node with available memory. [Lee.Schermerhorn@hp.com: Mempolicy: update stale documentation and comments] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask] [Lee.Schermerhorn@hp.com: Mempolicy: make dequeue_huge_page_vma() obey MPOL_BIND nodemask rework] Signed-off-by: Mel Gorman Acked-by: Christoph Lameter Signed-off-by: Lee Schermerhorn Cc: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 11 +- fs/buffer.c | 9 +- include/linux/cpuset.h | 4 +- include/linux/gfp.h | 4 + include/linux/mempolicy.h | 19 ++-- include/linux/mmzone.h | 80 ++++++++------ kernel/cpuset.c | 18 +--- mm/hugetlb.c | 6 +- mm/mempolicy.c | 184 +++++++++++++------------------- mm/mmzone.c | 30 ++++++ mm/page_alloc.c | 50 ++++++--- 11 files changed, 224 insertions(+), 191 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index dd4986497996..1278e685d650 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -182,14 +182,9 @@ Components of Memory 
Policies The Default mode does not use the optional set of nodes. MPOL_BIND: This mode specifies that memory must come from the - set of nodes specified by the policy. - - The memory policy APIs do not specify an order in which the nodes - will be searched. However, unlike "local allocation", the Bind - policy does not consider the distance between the nodes. Rather, - allocations will fallback to the nodes specified by the policy in - order of numeric node id. Like everything in Linux, this is subject - to change. + set of nodes specified by the policy. Memory will be allocated from + the node in the set with sufficient free memory that is closest to + the node where the allocation takes place. MPOL_PREFERRED: This mode specifies that the allocation should be attempted from the single node specified in the policy. If that diff --git a/fs/buffer.c b/fs/buffer.c index ac84cd13075d..7d51e649b19a 100644 --- a/fs/buffer.c +++ b/fs/buffer.c @@ -360,16 +360,17 @@ void invalidate_bdev(struct block_device *bdev) */ static void free_more_memory(void) { - struct zoneref *zrefs; + struct zone *zone; int nid; wakeup_pdflush(1024); yield(); for_each_online_node(nid) { - zrefs = first_zones_zonelist(node_zonelist(nid, GFP_NOFS), - gfp_zone(GFP_NOFS)); - if (zrefs->zone) + (void)first_zones_zonelist(node_zonelist(nid, GFP_NOFS), + gfp_zone(GFP_NOFS), NULL, + &zone); + if (zone) try_to_free_pages(node_zonelist(nid, GFP_NOFS), 0, GFP_NOFS); } diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h index 726761e24003..038578362b47 100644 --- a/include/linux/cpuset.h +++ b/include/linux/cpuset.h @@ -26,7 +26,7 @@ extern nodemask_t cpuset_mems_allowed(struct task_struct *p); #define cpuset_current_mems_allowed (current->mems_allowed) void cpuset_init_current_mems_allowed(void); void cpuset_update_task_memory_state(void); -int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl); +int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask); extern int 
__cpuset_zone_allowed_softwall(struct zone *z, gfp_t gfp_mask); extern int __cpuset_zone_allowed_hardwall(struct zone *z, gfp_t gfp_mask); @@ -103,7 +103,7 @@ static inline nodemask_t cpuset_mems_allowed(struct task_struct *p) static inline void cpuset_init_current_mems_allowed(void) {} static inline void cpuset_update_task_memory_state(void) {} -static inline int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl) +static inline int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask) { return 1; } diff --git a/include/linux/gfp.h b/include/linux/gfp.h index e1c6064cb6c7..898aa9d5b6c2 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -182,6 +182,10 @@ static inline void arch_alloc_page(struct page *page, int order) { } extern struct page *__alloc_pages(gfp_t, unsigned int, struct zonelist *); +extern struct page * +__alloc_pages_nodemask(gfp_t, unsigned int, + struct zonelist *, nodemask_t *nodemask); + static inline struct page *alloc_pages_node(int nid, gfp_t gfp_mask, unsigned int order) { diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 69160dc32d48..b8b3da7a3315 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -54,19 +54,20 @@ struct mm_struct; * mmap_sem. * * Freeing policy: - * When policy is MPOL_BIND v.zonelist is kmalloc'ed and must be kfree'd. - * All other policies don't have any external state. mpol_free() handles this. + * Mempolicy objects are reference counted. A mempolicy will be freed when + * mpol_free() decrements the reference count to zero. * * Copying policy objects: - * For MPOL_BIND the zonelist must be always duplicated. mpol_clone() does this. + * mpol_copy() allocates a new mempolicy and copies the specified mempolicy + * to the new storage. The reference count of the new object is initialized + * to 1, representing the caller of mpol_copy(). 
*/ struct mempolicy { atomic_t refcnt; short policy; /* See MPOL_* above */ union { - struct zonelist *zonelist; /* bind */ short preferred_node; /* preferred */ - nodemask_t nodes; /* interleave */ + nodemask_t nodes; /* interleave/bind */ /* undefined for default */ } v; nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */ @@ -151,7 +152,8 @@ extern void mpol_fix_fork_child_flag(struct task_struct *p); extern struct mempolicy default_policy; extern struct zonelist *huge_zonelist(struct vm_area_struct *vma, - unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol); + unsigned long addr, gfp_t gfp_flags, + struct mempolicy **mpol, nodemask_t **nodemask); extern unsigned slab_node(struct mempolicy *policy); extern enum zone_type policy_zone; @@ -239,8 +241,11 @@ static inline void mpol_fix_fork_child_flag(struct task_struct *p) } static inline struct zonelist *huge_zonelist(struct vm_area_struct *vma, - unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol) + unsigned long addr, gfp_t gfp_flags, + struct mempolicy **mpol, nodemask_t **nodemask) { + *mpol = NULL; + *nodemask = NULL; return node_zonelist(0, gfp_flags); } diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d34b4c290017..498d6ceff2f4 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -749,36 +749,60 @@ static inline int zonelist_node_idx(struct zoneref *zoneref) #endif /* CONFIG_NUMA */ } -static inline void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref) -{ - zoneref->zone = zone; - zoneref->zone_idx = zone_idx(zone); -} +/** + * next_zones_zonelist - Returns the next zone at or below highest_zoneidx within the allowed nodemask using a cursor within a zonelist as a starting point + * @z - The cursor used as a starting point for the search + * @highest_zoneidx - The zone index of the highest zone to return + * @nodes - An optional nodemask to filter the zonelist with + * @zone - The first suitable zone found is returned via 
this parameter + * + * This function returns the next zone at or below a given zone index that is + * within the allowed nodemask using a cursor as the starting point for the + * search. The zoneref returned is a cursor that is used as the next starting + * point for future calls to next_zones_zonelist(). + */ +struct zoneref *next_zones_zonelist(struct zoneref *z, + enum zone_type highest_zoneidx, + nodemask_t *nodes, + struct zone **zone); -/* Returns the first zone at or below highest_zoneidx in a zonelist */ +/** + * first_zones_zonelist - Returns the first zone at or below highest_zoneidx within the allowed nodemask in a zonelist + * @zonelist - The zonelist to search for a suitable zone + * @highest_zoneidx - The zone index of the highest zone to return + * @nodes - An optional nodemask to filter the zonelist with + * @zone - The first suitable zone found is returned via this parameter + * + * This function returns the first zone at or below a given zone index that is + * within the allowed nodemask. The zoneref returned is a cursor that can be + * used to iterate the zonelist with next_zones_zonelist. The cursor should + * not be used by the caller as it does not match the value of the zone + * returned. 
+ */ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, - enum zone_type highest_zoneidx) + enum zone_type highest_zoneidx, + nodemask_t *nodes, + struct zone **zone) { - struct zoneref *z; - - /* Find the first suitable zone to use for the allocation */ - z = zonelist->_zonerefs; - while (zonelist_zone_idx(z) > highest_zoneidx) - z++; - - return z; + return next_zones_zonelist(zonelist->_zonerefs, highest_zoneidx, nodes, + zone); } -/* Returns the next zone at or below highest_zoneidx in a zonelist */ -static inline struct zoneref *next_zones_zonelist(struct zoneref *z, - enum zone_type highest_zoneidx) -{ - /* Find the next suitable zone to use for the allocation */ - while (zonelist_zone_idx(z) > highest_zoneidx) - z++; - - return z; -} +/** + * for_each_zone_zonelist_nodemask - helper macro to iterate over valid zones in a zonelist at or below a given zone index and within a nodemask + * @zone - The current zone in the iterator + * @z - The current pointer within zonelist->zones being iterated + * @zlist - The zonelist being iterated + * @highidx - The zone index of the highest zone to return + * @nodemask - Nodemask allowed by the allocator + * + * This iterator iterates though all zones at or below a given zone index and + * within a given nodemask + */ +#define for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, nodemask) \ + for (z = first_zones_zonelist(zlist, highidx, nodemask, &zone); \ + zone; \ + z = next_zones_zonelist(z, highidx, nodemask, &zone)) \ /** * for_each_zone_zonelist - helper macro to iterate over valid zones in a zonelist at or below a given zone index @@ -790,11 +814,7 @@ static inline struct zoneref *next_zones_zonelist(struct zoneref *z, * This iterator iterates though all zones at or below a given zone index. 
*/ #define for_each_zone_zonelist(zone, z, zlist, highidx) \ - for (z = first_zones_zonelist(zlist, highidx), \ - zone = zonelist_zone(z++); \ - zone; \ - z = next_zones_zonelist(z, highidx), \ - zone = zonelist_zone(z++)) + for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL) #ifdef CONFIG_SPARSEMEM #include diff --git a/kernel/cpuset.c b/kernel/cpuset.c index a220b13cbfaf..c9923e3c9a3b 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -1958,22 +1958,14 @@ nodemask_t cpuset_mems_allowed(struct task_struct *tsk) } /** - * cpuset_zonelist_valid_mems_allowed - check zonelist vs. curremt mems_allowed - * @zl: the zonelist to be checked + * cpuset_nodemask_valid_mems_allowed - check nodemask vs. curremt mems_allowed + * @nodemask: the nodemask to be checked * - * Are any of the nodes on zonelist zl allowed in current->mems_allowed? + * Are any of the nodes in the nodemask allowed in current->mems_allowed? */ -int cpuset_zonelist_valid_mems_allowed(struct zonelist *zl) +int cpuset_nodemask_valid_mems_allowed(nodemask_t *nodemask) { - int i; - - for (i = 0; zl->_zonerefs[i].zone; i++) { - int nid = zonelist_node_idx(&zl->_zonerefs[i]); - - if (node_isset(nid, current->mems_allowed)) - return 1; - } - return 0; + return nodes_intersects(*nodemask, current->mems_allowed); } /* diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 4bced0d705ca..3737d82f5225 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -95,12 +95,14 @@ static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, int nid; struct page *page = NULL; struct mempolicy *mpol; + nodemask_t *nodemask; struct zonelist *zonelist = huge_zonelist(vma, address, - htlb_alloc_mask, &mpol); + htlb_alloc_mask, &mpol, &nodemask); struct zone *zone; struct zoneref *z; - for_each_zone_zonelist(zone, z, zonelist, MAX_NR_ZONES - 1) { + for_each_zone_zonelist_nodemask(zone, z, zonelist, + MAX_NR_ZONES - 1, nodemask) { nid = zone_to_nid(zone); if (cpuset_zone_allowed_softwall(zone, htlb_alloc_mask) && 
!list_empty(&hugepage_freelists[nid])) { diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 90193a2a915b..acb5ee3587c3 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -163,42 +163,25 @@ static int mpol_check_policy(int mode, nodemask_t *nodes) return 0; } -/* Generate a custom zonelist for the BIND policy. */ -static struct zonelist *bind_zonelist(nodemask_t *nodes) +/* Check that the nodemask contains at least one populated zone */ +static int is_valid_nodemask(nodemask_t *nodemask) { - struct zonelist *zl; - int num, max, nd; - enum zone_type k; + int nd, k; - max = 1 + MAX_NR_ZONES * nodes_weight(*nodes); - max++; /* space for zlcache_ptr (see mmzone.h) */ - zl = kmalloc(sizeof(struct zone *) * max, GFP_KERNEL); - if (!zl) - return ERR_PTR(-ENOMEM); - zl->zlcache_ptr = NULL; - num = 0; - /* First put in the highest zones from all nodes, then all the next - lower zones etc. Avoid empty zones because the memory allocator - doesn't like them. If you implement node hot removal you - have to fix that. 
*/ - k = MAX_NR_ZONES - 1; - while (1) { - for_each_node_mask(nd, *nodes) { - struct zone *z = &NODE_DATA(nd)->node_zones[k]; - if (z->present_pages > 0) - zoneref_set_zone(z, &zl->_zonerefs[num++]); + /* Check that there is something useful in this mask */ + k = policy_zone; + + for_each_node_mask(nd, *nodemask) { + struct zone *z; + + for (k = 0; k <= policy_zone; k++) { + z = &NODE_DATA(nd)->node_zones[k]; + if (z->present_pages > 0) + return 1; } - if (k == 0) - break; - k--; - } - if (num == 0) { - kfree(zl); - return ERR_PTR(-EINVAL); } - zl->_zonerefs[num].zone = NULL; - zl->_zonerefs[num].zone_idx = 0; - return zl; + + return 0; } /* Create a new policy */ @@ -229,12 +212,11 @@ static struct mempolicy *mpol_new(int mode, nodemask_t *nodes) policy->v.preferred_node = -1; break; case MPOL_BIND: - policy->v.zonelist = bind_zonelist(nodes); - if (IS_ERR(policy->v.zonelist)) { - void *error_code = policy->v.zonelist; + if (!is_valid_nodemask(nodes)) { kmem_cache_free(policy_cache, policy); - return error_code; + return ERR_PTR(-EINVAL); } + policy->v.nodes = *nodes; break; } policy->policy = mode; @@ -500,19 +482,12 @@ static long do_set_mempolicy(int mode, nodemask_t *nodes) /* Fill a zone bitmap for a policy */ static void get_zonemask(struct mempolicy *p, nodemask_t *nodes) { - int i; - nodes_clear(*nodes); switch (p->policy) { - case MPOL_BIND: - for (i = 0; p->v.zonelist->_zonerefs[i].zone; i++) { - struct zoneref *zref; - zref = &p->v.zonelist->_zonerefs[i]; - node_set(zonelist_node_idx(zref), *nodes); - } - break; case MPOL_DEFAULT: break; + case MPOL_BIND: + /* Fall through */ case MPOL_INTERLEAVE: *nodes = p->v.nodes; break; @@ -1160,6 +1135,18 @@ static struct mempolicy * get_vma_policy(struct task_struct *task, return pol; } +/* Return a nodemask representing a mempolicy */ +static nodemask_t *nodemask_policy(gfp_t gfp, struct mempolicy *policy) +{ + /* Lower zones don't get a nodemask applied for MPOL_BIND */ + if (unlikely(policy->policy == 
MPOL_BIND) && + gfp_zone(gfp) >= policy_zone && + cpuset_nodemask_valid_mems_allowed(&policy->v.nodes)) + return &policy->v.nodes; + + return NULL; +} + /* Return a zonelist representing a mempolicy */ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) { @@ -1172,12 +1159,17 @@ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) nd = numa_node_id(); break; case MPOL_BIND: - /* Lower zones don't get a policy applied */ - /* Careful: current->mems_allowed might have moved */ - if (gfp_zone(gfp) >= policy_zone) - if (cpuset_zonelist_valid_mems_allowed(policy->v.zonelist)) - return policy->v.zonelist; - /*FALL THROUGH*/ + /* + * Normally, MPOL_BIND allocations node-local are node-local + * within the allowed nodemask. However, if __GFP_THISNODE is + * set and the current node is part of the mask, we use the + * the zonelist for the first node in the mask instead. + */ + nd = numa_node_id(); + if (unlikely(gfp & __GFP_THISNODE) && + unlikely(!node_isset(nd, policy->v.nodes))) + nd = first_node(policy->v.nodes); + break; case MPOL_INTERLEAVE: /* should not happen */ case MPOL_DEFAULT: nd = numa_node_id(); @@ -1220,7 +1212,14 @@ unsigned slab_node(struct mempolicy *policy) * Follow bind policy behavior and start allocation at the * first node. 
*/ - return zonelist_node_idx(policy->v.zonelist->_zonerefs); + struct zonelist *zonelist; + struct zone *zone; + enum zone_type highest_zoneidx = gfp_zone(GFP_KERNEL); + zonelist = &NODE_DATA(numa_node_id())->node_zonelists[0]; + (void)first_zones_zonelist(zonelist, highest_zoneidx, + &policy->v.nodes, + &zone); + return zone->node; } case MPOL_PREFERRED: @@ -1278,25 +1277,31 @@ static inline unsigned interleave_nid(struct mempolicy *pol, * @vma = virtual memory area whose policy is sought * @addr = address in @vma for shared policy lookup and interleave policy * @gfp_flags = for requested zone - * @mpol = pointer to mempolicy pointer for reference counted 'BIND policy + * @mpol = pointer to mempolicy pointer for reference counted mempolicy + * @nodemask = pointer to nodemask pointer for MPOL_BIND nodemask * * Returns a zonelist suitable for a huge page allocation. - * If the effective policy is 'BIND, returns pointer to policy's zonelist. + * If the effective policy is 'BIND, returns pointer to local node's zonelist, + * and a pointer to the mempolicy's @nodemask for filtering the zonelist. * If it is also a policy for which get_vma_policy() returns an extra - * reference, we must hold that reference until after allocation. + * reference, we must hold that reference until after the allocation. * In that case, return policy via @mpol so hugetlb allocation can drop - * the reference. For non-'BIND referenced policies, we can/do drop the + * the reference. For non-'BIND referenced policies, we can/do drop the * reference here, so the caller doesn't need to know about the special case * for default and current task policy. 
*/ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, - gfp_t gfp_flags, struct mempolicy **mpol) + gfp_t gfp_flags, struct mempolicy **mpol, + nodemask_t **nodemask) { struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; *mpol = NULL; /* probably no unref needed */ - if (pol->policy == MPOL_INTERLEAVE) { + *nodemask = NULL; /* assume !MPOL_BIND */ + if (pol->policy == MPOL_BIND) { + *nodemask = &pol->v.nodes; + } else if (pol->policy == MPOL_INTERLEAVE) { unsigned nid; nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); @@ -1376,14 +1381,15 @@ alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr) /* * slow path: ref counted policy -- shared or vma */ - struct page *page = __alloc_pages(gfp, 0, zl); + struct page *page = __alloc_pages_nodemask(gfp, 0, + zl, nodemask_policy(gfp, pol)); __mpol_free(pol); return page; } /* * fast path: default or task policy */ - return __alloc_pages(gfp, 0, zl); + return __alloc_pages_nodemask(gfp, 0, zl, nodemask_policy(gfp, pol)); } /** @@ -1415,7 +1421,8 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) pol = &default_policy; if (pol->policy == MPOL_INTERLEAVE) return alloc_page_interleave(gfp, order, interleave_nodes(pol)); - return __alloc_pages(gfp, order, zonelist_policy(gfp, pol)); + return __alloc_pages_nodemask(gfp, order, + zonelist_policy(gfp, pol), nodemask_policy(gfp, pol)); } EXPORT_SYMBOL(alloc_pages_current); @@ -1440,14 +1447,6 @@ struct mempolicy *__mpol_copy(struct mempolicy *old) } *new = *old; atomic_set(&new->refcnt, 1); - if (new->policy == MPOL_BIND) { - int sz = ksize(old->v.zonelist); - new->v.zonelist = kmemdup(old->v.zonelist, sz, GFP_KERNEL); - if (!new->v.zonelist) { - kmem_cache_free(policy_cache, new); - return ERR_PTR(-ENOMEM); - } - } return new; } @@ -1461,21 +1460,12 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) switch (a->policy) { case MPOL_DEFAULT: return 1; + case MPOL_BIND: + /* Fall 
through */ case MPOL_INTERLEAVE: return nodes_equal(a->v.nodes, b->v.nodes); case MPOL_PREFERRED: return a->v.preferred_node == b->v.preferred_node; - case MPOL_BIND: { - int i; - for (i = 0; a->v.zonelist->_zonerefs[i].zone; i++) { - struct zone *za, *zb; - za = zonelist_zone(&a->v.zonelist->_zonerefs[i]); - zb = zonelist_zone(&b->v.zonelist->_zonerefs[i]); - if (za != zb) - return 0; - } - return b->v.zonelist->_zonerefs[i].zone == NULL; - } default: BUG(); return 0; @@ -1487,8 +1477,6 @@ void __mpol_free(struct mempolicy *p) { if (!atomic_dec_and_test(&p->refcnt)) return; - if (p->policy == MPOL_BIND) - kfree(p->v.zonelist); p->policy = MPOL_DEFAULT; kmem_cache_free(policy_cache, p); } @@ -1779,6 +1767,8 @@ static void mpol_rebind_policy(struct mempolicy *pol, switch (pol->policy) { case MPOL_DEFAULT: break; + case MPOL_BIND: + /* Fall through */ case MPOL_INTERLEAVE: nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask); pol->v.nodes = tmp; @@ -1791,32 +1781,6 @@ static void mpol_rebind_policy(struct mempolicy *pol, *mpolmask, *newmask); *mpolmask = *newmask; break; - case MPOL_BIND: { - nodemask_t nodes; - struct zoneref *z; - struct zonelist *zonelist; - - nodes_clear(nodes); - for (z = pol->v.zonelist->_zonerefs; z->zone; z++) - node_set(zonelist_node_idx(z), nodes); - nodes_remap(tmp, nodes, *mpolmask, *newmask); - nodes = tmp; - - zonelist = bind_zonelist(&nodes); - - /* If no mem, then zonelist is NULL and we keep old zonelist. - * If that old zonelist has no remaining mems_allowed nodes, - * then zonelist_policy() will "FALL THROUGH" to MPOL_DEFAULT. 
- */ - - if (!IS_ERR(zonelist)) { - /* Good - got mem - substitute new zonelist */ - kfree(pol->v.zonelist); - pol->v.zonelist = zonelist; - } - *mpolmask = *newmask; - break; - } default: BUG(); break; @@ -1879,9 +1843,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) break; case MPOL_BIND: - get_zonemask(pol, &nodes); - break; - + /* Fall through */ case MPOL_INTERLEAVE: nodes = pol->v.nodes; break; diff --git a/mm/mmzone.c b/mm/mmzone.c index eb5838634f18..486ed595ee6f 100644 --- a/mm/mmzone.c +++ b/mm/mmzone.c @@ -42,3 +42,33 @@ struct zone *next_zone(struct zone *zone) return zone; } +static inline int zref_in_nodemask(struct zoneref *zref, nodemask_t *nodes) +{ +#ifdef CONFIG_NUMA + return node_isset(zonelist_node_idx(zref), *nodes); +#else + return 1; +#endif /* CONFIG_NUMA */ +} + +/* Returns the next zone at or below highest_zoneidx in a zonelist */ +struct zoneref *next_zones_zonelist(struct zoneref *z, + enum zone_type highest_zoneidx, + nodemask_t *nodes, + struct zone **zone) +{ + /* + * Find the next suitable zone to use for the allocation. + * Only filter based on nodemask if it's set + */ + if (likely(nodes == NULL)) + while (zonelist_zone_idx(z) > highest_zoneidx) + z++; + else + while (zonelist_zone_idx(z) > highest_zoneidx || + (z->zone && !zref_in_nodemask(z, nodes))) + z++; + + *zone = zonelist_zone(z++); + return z; +} diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 6d94d04ea784..b4beb3eea8b7 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1377,7 +1377,7 @@ static void zlc_mark_zone_full(struct zonelist *zonelist, struct zoneref *z) * a page. 
*/ static struct page * -get_page_from_freelist(gfp_t gfp_mask, unsigned int order, +get_page_from_freelist(gfp_t gfp_mask, nodemask_t *nodemask, unsigned int order, struct zonelist *zonelist, int high_zoneidx, int alloc_flags) { struct zoneref *z; @@ -1388,16 +1388,17 @@ get_page_from_freelist(gfp_t gfp_mask, unsigned int order, int zlc_active = 0; /* set if using zonelist_cache */ int did_zlc_setup = 0; /* just call zlc_setup() one time */ - z = first_zones_zonelist(zonelist, high_zoneidx); - classzone_idx = zonelist_zone_idx(z); - preferred_zone = zonelist_zone(z); + (void)first_zones_zonelist(zonelist, high_zoneidx, nodemask, + &preferred_zone); + classzone_idx = zone_idx(preferred_zone); zonelist_scan: /* * Scan zonelist, looking for a zone with enough free. * See also cpuset_zone_allowed() comment in kernel/cpuset.c. */ - for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) { + for_each_zone_zonelist_nodemask(zone, z, zonelist, + high_zoneidx, nodemask) { if (NUMA_BUILD && zlc_active && !zlc_zone_worth_trying(zonelist, z, allowednodes)) continue; @@ -1447,9 +1448,9 @@ try_next_zone: /* * This is the 'heart' of the zoned buddy allocator. */ -struct page * -__alloc_pages(gfp_t gfp_mask, unsigned int order, - struct zonelist *zonelist) +static struct page * +__alloc_pages_internal(gfp_t gfp_mask, unsigned int order, + struct zonelist *zonelist, nodemask_t *nodemask) { const gfp_t wait = gfp_mask & __GFP_WAIT; enum zone_type high_zoneidx = gfp_zone(gfp_mask); @@ -1478,7 +1479,7 @@ restart: return NULL; } - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, + page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order, zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET); if (page) goto got_pg; @@ -1523,7 +1524,7 @@ restart: * Ignore cpuset if GFP_ATOMIC (!wait) rather than fail alloc. * See also cpuset_zone_allowed() comment in kernel/cpuset.c. 
*/ - page = get_page_from_freelist(gfp_mask, order, zonelist, + page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, high_zoneidx, alloc_flags); if (page) goto got_pg; @@ -1536,7 +1537,7 @@ rebalance: if (!(gfp_mask & __GFP_NOMEMALLOC)) { nofail_alloc: /* go through the zonelist yet again, ignoring mins */ - page = get_page_from_freelist(gfp_mask, order, + page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, high_zoneidx, ALLOC_NO_WATERMARKS); if (page) goto got_pg; @@ -1571,7 +1572,7 @@ nofail_alloc: drain_all_pages(); if (likely(did_some_progress)) { - page = get_page_from_freelist(gfp_mask, order, + page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist, high_zoneidx, alloc_flags); if (page) goto got_pg; @@ -1587,8 +1588,9 @@ nofail_alloc: * a parallel oom killing, we must fail if we're still * under heavy pressure. */ - page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, order, - zonelist, high_zoneidx, ALLOC_WMARK_HIGH|ALLOC_CPUSET); + page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, + order, zonelist, high_zoneidx, + ALLOC_WMARK_HIGH|ALLOC_CPUSET); if (page) { clear_zonelist_oom(zonelist, gfp_mask); goto got_pg; @@ -1637,6 +1639,20 @@ got_pg: return page; } +struct page * +__alloc_pages(gfp_t gfp_mask, unsigned int order, + struct zonelist *zonelist) +{ + return __alloc_pages_internal(gfp_mask, order, zonelist, NULL); +} + +struct page * +__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, + struct zonelist *zonelist, nodemask_t *nodemask) +{ + return __alloc_pages_internal(gfp_mask, order, zonelist, nodemask); +} + EXPORT_SYMBOL(__alloc_pages); /* @@ -1880,6 +1896,12 @@ void show_free_areas(void) show_swap_cache_info(); } +static void zoneref_set_zone(struct zone *zone, struct zoneref *zoneref) +{ + zoneref->zone = zone; + zoneref->zone_idx = zone_idx(zone); +} + /* * Builds allocation fallback zone lists. 
* -- cgit v1.2.3 From 797df5749032c2286bc7ff3a52de41fde0cdf0a5 Mon Sep 17 00:00:00 2001 From: Chris Dearman Date: Mon, 28 Apr 2008 02:12:19 -0700 Subject: mm: try both endianess when checking for endianess When checking for the swap header try byteswapping the endianess dependent fields to allow the swap partition to be shared between big & little endian systems. Signed-off-by: Chris Dearman Signed-off-by: Ralf Baechle Acked-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/swapfile.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/mm/swapfile.c b/mm/swapfile.c index 2da149cfc9ac..67051be7083a 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1582,6 +1582,14 @@ asmlinkage long sys_swapon(const char __user * specialfile, int swap_flags) error = -EINVAL; goto bad_swap; case 2: + /* swap partition endianess hack... */ + if (swab32(swap_header->info.version) == 1) { + swab32s(&swap_header->info.version); + swab32s(&swap_header->info.last_page); + swab32s(&swap_header->info.nr_badpages); + for (i = 0; i < swap_header->info.nr_badpages; i++) + swab32s(&swap_header->info.badpages[i]); + } /* Check the swap header's sub-version and the size of the swap file and bad block lists */ if (swap_header->info.version != 1) { -- cgit v1.2.3 From 19fc3f0acde32636529969570055c7e2a744787c Mon Sep 17 00:00:00 2001 From: Adam Litke Date: Mon, 28 Apr 2008 02:12:20 -0700 Subject: hugetlb: decrease hugetlb_lock cycling in gather_surplus_huge_pages To reduce hugetlb_lock acquisitions and releases when freeing excess surplus pages, scan the page list in two parts. First, transfer the needed pages to the hugetlb pool. Then drop the lock and free the remaining pages back to the buddy allocator. In the common case there are zero excess pages and no lock operations are required. Thanks Mel Gorman for this improvement. 
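The two-phase drain described above can be sketched in user-space C. This is not the kernel code. `struct fake_page`, `struct fake_lock`, `sim_lock()`, `sim_unlock()` and `drain_surplus()` are hypothetical stand-ins for the page list, `hugetlb_lock`, and the `free:` path of gather_surplus_huge_pages(). The fake lock only counts relock round trips, so the "no lock operations in the common case" claim can be checked directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a struct page on surplus_list. */
struct fake_page {
	struct fake_page *next;
};

/* Hypothetical stand-in for hugetlb_lock; counts reacquisitions. */
struct fake_lock {
	int held;
	int cycles;	/* number of times the lock was retaken */
};

static void sim_unlock(struct fake_lock *l)
{
	assert(l->held);
	l->held = 0;
}

static void sim_lock(struct fake_lock *l)
{
	assert(!l->held);
	l->held = 1;
	l->cycles++;
}

/*
 * Called with @lock held. Phase 1 moves up to @needed pages from
 * @surplus to @pool without touching the lock. Phase 2 runs only if
 * excess pages remain: it drops the lock once, frees them all, and
 * retakes the lock. Returns the number of pages freed in phase 2.
 */
static int drain_surplus(struct fake_page **surplus, struct fake_page **pool,
			 int needed, struct fake_lock *lock)
{
	int freed = 0;

	assert(lock->held);

	/* Phase 1: enqueue the pages that are actually needed. */
	while (*surplus && needed-- > 0) {
		struct fake_page *p = *surplus;

		*surplus = p->next;
		p->next = *pool;
		*pool = p;
	}

	/* Phase 2: a single unlock/relock for all excess pages. */
	if (*surplus) {
		sim_unlock(lock);
		while (*surplus) {
			struct fake_page *p = *surplus;

			*surplus = p->next;
			p->next = NULL;	/* stand-in for free_huge_page() */
			freed++;
		}
		sim_lock(lock);
	}
	return freed;
}
```

When `needed` covers the whole list, the lock is never cycled. When excess pages exist, it is dropped and retaken exactly once. The old loop instead cycled the lock once per freed page.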
Signed-off-by: Adam Litke Cc: Mel Gorman Cc: Dave Hansen Cc: William Lee Irwin III Cc: Andy Whitcroft Cc: Mel Gorman Cc: David Gibson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/hugetlb.c | 17 ++++++++++++----- 1 file changed, 12 insertions(+), 5 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3737d82f5225..93ea46a0fba4 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -372,11 +372,19 @@ retry: resv_huge_pages += delta; ret = 0; free: + /* Free the needed pages to the hugetlb pool */ list_for_each_entry_safe(page, tmp, &surplus_list, lru) { + if ((--needed) < 0) + break; list_del(&page->lru); - if ((--needed) >= 0) - enqueue_huge_page(page); - else { + enqueue_huge_page(page); + } + + /* Free unnecessary surplus pages to the buddy allocator */ + if (!list_empty(&surplus_list)) { + spin_unlock(&hugetlb_lock); + list_for_each_entry_safe(page, tmp, &surplus_list, lru) { + list_del(&page->lru); /* * The page has a reference count of zero already, so * call free_huge_page directly instead of using @@ -384,10 +392,9 @@ free: * unlocked which is safe because free_huge_page takes * hugetlb_lock before deciding how to free the page. */ - spin_unlock(&hugetlb_lock); free_huge_page(page); - spin_lock(&hugetlb_lock); } + spin_lock(&hugetlb_lock); } return ret; -- cgit v1.2.3 From 1b27d05b6e21249d2338be26dfcbe8f8d8ff8a5b Mon Sep 17 00:00:00 2001 From: Pekka Enberg Date: Mon, 28 Apr 2008 02:12:22 -0700 Subject: mm: move cache_line_size() to Not all architectures define cache_line_size(), so, as suggested by Andrew, move the private implementations in mm/slab.c and mm/slub.c to . Cc: Thomas Gleixner Cc: Ingo Molnar Cc: H. 
Peter Anvin Reviewed-by: Christoph Lameter Signed-off-by: Pekka Enberg Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/Kconfig | 3 +++ include/linux/cache.h | 4 ++++ mm/slab.c | 4 ---- mm/slub.c | 5 ----- 4 files changed, 7 insertions(+), 9 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index e5790fe9e330..a8ce13a54764 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -114,6 +114,9 @@ config GENERIC_TIME_VSYSCALL config ARCH_HAS_CPU_RELAX def_bool y +config ARCH_HAS_CACHE_LINE_SIZE + def_bool y + config HAVE_SETUP_PER_CPU_AREA def_bool X86_64 || (X86_SMP && !X86_VOYAGER) diff --git a/include/linux/cache.h b/include/linux/cache.h index 4552504c0228..97e24881c4c6 100644 --- a/include/linux/cache.h +++ b/include/linux/cache.h @@ -60,4 +60,8 @@ #endif #endif +#ifndef CONFIG_ARCH_HAS_CACHE_LINE_SIZE +#define cache_line_size() L1_CACHE_BYTES +#endif + #endif /* __LINUX_CACHE_H */ diff --git a/mm/slab.c b/mm/slab.c index 7bc4a136846e..39d20f8a0791 100644 --- a/mm/slab.c +++ b/mm/slab.c @@ -139,10 +139,6 @@ #define BYTES_PER_WORD sizeof(void *) #define REDZONE_ALIGN max(BYTES_PER_WORD, __alignof__(unsigned long long)) -#ifndef cache_line_size -#define cache_line_size() L1_CACHE_BYTES -#endif - #ifndef ARCH_KMALLOC_MINALIGN /* * Enforce a minimum alignment for the kmalloc caches. 
diff --git a/mm/slub.c b/mm/slub.c index 48fff83a1e9d..38914bc64aca 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -207,11 +207,6 @@ static inline void ClearSlabDebug(struct page *page) #define __KMALLOC_CACHE 0x20000000 /* objects freed using kfree */ #define __PAGE_ALLOC_FALLBACK 0x10000000 /* Allow fallback to page alloc */ -/* Not all arches define cache_line_size */ -#ifndef cache_line_size -#define cache_line_size() L1_CACHE_BYTES -#endif - static int kmem_size = sizeof(struct kmem_cache); #ifdef CONFIG_SMP -- cgit v1.2.3 From a3b51e0142d1be156ac697eaadadd6cfbb7ba32b Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:23 -0700 Subject: mempolicy: convert MPOL constants to enum The mempolicy mode constants, MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, and MPOL_INTERLEAVE, are better declared as part of an enum since they are sequentially numbered and cannot be combined. The policy member of struct mempolicy is also converted from type short to type unsigned short. A negative policy does not have any legitimate meaning, so it is possible to change its type in preparation for adding optional mode flags later. The equivalent member of struct shmem_sb_info is also changed from int to unsigned short. For compatibility, the policy formal to get_mempolicy() remains as a pointer to an int: int get_mempolicy(int *policy, unsigned long *nmask, unsigned long maxnode, unsigned long addr, unsigned long flags); although the only possible values are in the range of type unsigned short.
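A minimal user-space sketch of why the trailing MPOL_MAX sentinel changes this patch's range checks. With the old #define, MPOL_MAX aliased MPOL_INTERLEAVE, so `mode > MPOL_MAX` was the correct bound; once MPOL_MAX becomes a sentinel equal to the number of modes (4), that test would accept the invalid value 4, hence the switch to `>=`. The helper check_mode() is hypothetical. In this patch sys_set_mempolicy() still spells the check as `mode < 0 || mode >= MPOL_MAX`; the unsigned-cast form shown here is the one the later optional-flags patch uses.

```c
#include <errno.h>

/* Mode constants as declared by this patch: sequential, ending in a sentinel. */
enum {
	MPOL_DEFAULT,
	MPOL_PREFERRED,
	MPOL_BIND,
	MPOL_INTERLEAVE,
	MPOL_MAX,	/* always last member of enum */
};

/*
 * Hypothetical validator. MPOL_MAX is one past the last valid mode, so the
 * bound is >=. Casting to unsigned int turns negative modes into huge values,
 * folding the "mode < 0" test into the same comparison.
 */
static int check_mode(int mode)
{
	if ((unsigned int)mode >= MPOL_MAX)
		return -EINVAL;
	return 0;
}
```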
Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 19 ++++++++++--------- include/linux/shmem_fs.h | 2 +- mm/mempolicy.c | 29 +++++++++++++++++------------ mm/shmem.c | 9 +++++---- 4 files changed, 33 insertions(+), 26 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index b8b3da7a3315..389a06e8ee21 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -9,12 +9,13 @@ */ /* Policies */ -#define MPOL_DEFAULT 0 -#define MPOL_PREFERRED 1 -#define MPOL_BIND 2 -#define MPOL_INTERLEAVE 3 - -#define MPOL_MAX MPOL_INTERLEAVE +enum { + MPOL_DEFAULT, + MPOL_PREFERRED, + MPOL_BIND, + MPOL_INTERLEAVE, + MPOL_MAX, /* always last member of enum */ +}; /* Flags for get_mem_policy */ #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */ @@ -64,7 +65,7 @@ struct mm_struct; */ struct mempolicy { atomic_t refcnt; - short policy; /* See MPOL_* above */ + unsigned short policy; /* See MPOL_* above */ union { short preferred_node; /* preferred */ nodemask_t nodes; /* interleave/bind */ @@ -134,7 +135,7 @@ struct shared_policy { spinlock_t lock; }; -void mpol_shared_policy_init(struct shared_policy *info, int policy, +void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, nodemask_t *nodes); int mpol_set_shared_policy(struct shared_policy *info, struct vm_area_struct *vma, @@ -202,7 +203,7 @@ static inline int mpol_set_shared_policy(struct shared_policy *info, } static inline void mpol_shared_policy_init(struct shared_policy *info, - int policy, nodemask_t *nodes) + unsigned short policy, nodemask_t *nodes) { } diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 8d5fb36ea047..639a4070708e 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -34,7 +34,7 @@ struct shmem_sb_info { uid_t uid; /* Mount uid for root 
directory */ gid_t gid; /* Mount gid for root directory */ mode_t mode; /* Mount mode for root directory */ - int policy; /* Default NUMA memory alloc policy */ + unsigned short policy; /* Default NUMA memory alloc policy */ nodemask_t policy_nodes; /* nodemask for preferred and bind */ }; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index acb5ee3587c3..1311dc4a3888 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -114,7 +114,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask); /* Do sanity checking on a policy */ -static int mpol_check_policy(int mode, nodemask_t *nodes) +static int mpol_check_policy(unsigned short mode, nodemask_t *nodes) { int was_empty, is_empty; @@ -159,6 +159,8 @@ static int mpol_check_policy(int mode, nodemask_t *nodes) if (!was_empty && is_empty) return -EINVAL; break; + default: + BUG(); } return 0; } @@ -185,7 +187,7 @@ static int is_valid_nodemask(nodemask_t *nodemask) } /* Create a new policy */ -static struct mempolicy *mpol_new(int mode, nodemask_t *nodes) +static struct mempolicy *mpol_new(unsigned short mode, nodemask_t *nodes) { struct mempolicy *policy; @@ -218,6 +220,8 @@ static struct mempolicy *mpol_new(int mode, nodemask_t *nodes) } policy->v.nodes = *nodes; break; + default: + BUG(); } policy->policy = mode; policy->cpuset_mems_allowed = cpuset_mems_allowed(current); @@ -462,7 +466,7 @@ static void mpol_set_task_struct_flag(void) } /* Set the process memory policy */ -static long do_set_mempolicy(int mode, nodemask_t *nodes) +static long do_set_mempolicy(unsigned short mode, nodemask_t *nodes) { struct mempolicy *new; @@ -759,7 +763,7 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * #endif static long do_mbind(unsigned long start, unsigned long len, - unsigned long mode, nodemask_t *nmask, + unsigned short mode, nodemask_t *nmask, unsigned long flags) { struct vm_area_struct *vma; @@ -769,9 +773,8 @@ static long do_mbind(unsigned long start, unsigned long 
len, int err; LIST_HEAD(pagelist); - if ((flags & ~(unsigned long)(MPOL_MF_STRICT | - MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) - || mode > MPOL_MAX) + if (flags & ~(unsigned long)(MPOL_MF_STRICT | + MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) return -EINVAL; if ((flags & MPOL_MF_MOVE_ALL) && !capable(CAP_SYS_NICE)) return -EPERM; @@ -804,7 +807,7 @@ static long do_mbind(unsigned long start, unsigned long len, if (!new) flags |= MPOL_MF_DISCONTIG_OK; - pr_debug("mbind %lx-%lx mode:%ld nodes:%lx\n",start,start+len, + pr_debug("mbind %lx-%lx mode:%d nodes:%lx\n", start, start + len, mode, nmask ? nodes_addr(*nmask)[0] : -1); down_write(&mm->mmap_sem); @@ -905,6 +908,8 @@ asmlinkage long sys_mbind(unsigned long start, unsigned long len, nodemask_t nodes; int err; + if (mode >= MPOL_MAX) + return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; @@ -918,7 +923,7 @@ asmlinkage long sys_set_mempolicy(int mode, unsigned long __user *nmask, int err; nodemask_t nodes; - if (mode < 0 || mode > MPOL_MAX) + if (mode < 0 || mode >= MPOL_MAX) return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) @@ -1201,7 +1206,7 @@ static unsigned interleave_nodes(struct mempolicy *policy) */ unsigned slab_node(struct mempolicy *policy) { - int pol = policy ? policy->policy : MPOL_DEFAULT; + unsigned short pol = policy ? policy->policy : MPOL_DEFAULT; switch (pol) { case MPOL_INTERLEAVE: @@ -1635,7 +1640,7 @@ restart: return 0; } -void mpol_shared_policy_init(struct shared_policy *info, int policy, +void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, nodemask_t *policy_nodes) { info->root = RB_ROOT; @@ -1830,7 +1835,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) char *p = buffer; int l; nodemask_t nodes; - int mode = pol ? pol->policy : MPOL_DEFAULT; + unsigned short mode = pol ? 
pol->policy : MPOL_DEFAULT; switch (mode) { case MPOL_DEFAULT: diff --git a/mm/shmem.c b/mm/shmem.c index f514dd392cd9..d8ef7ba831a5 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1079,7 +1079,8 @@ redirty: #ifdef CONFIG_NUMA #ifdef CONFIG_TMPFS -static int shmem_parse_mpol(char *value, int *policy, nodemask_t *policy_nodes) +static int shmem_parse_mpol(char *value, unsigned short *policy, + nodemask_t *policy_nodes) { char *nodelist = strchr(value, ':'); int err = 1; @@ -1128,7 +1129,7 @@ out: return err; } -static void shmem_show_mpol(struct seq_file *seq, int policy, +static void shmem_show_mpol(struct seq_file *seq, unsigned short policy, const nodemask_t policy_nodes) { char *policy_string; @@ -1197,13 +1198,13 @@ static struct page *shmem_alloc_page(gfp_t gfp, } #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS -static inline int shmem_parse_mpol(char *value, int *policy, +static inline int shmem_parse_mpol(char *value, unsigned short *policy, nodemask_t *policy_nodes) { return 1; } -static inline void shmem_show_mpol(struct seq_file *seq, int policy, +static inline void shmem_show_mpol(struct seq_file *seq, unsigned short policy, const nodemask_t policy_nodes) { } -- cgit v1.2.3 From 028fec414d803117eb4b2ed12acb4dd5da65b32d Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:25 -0700 Subject: mempolicy: support optional mode flags With the evolution of mempolicies, it is necessary to support mempolicy mode flags that specify how the policy shall behave in certain circumstances. The most immediate need for mode flag support is to suppress remapping the nodemask of a policy at the time of rebind. Both the mempolicy mode and flags are passed by the user in the 'int policy' formal of either the set_mempolicy() or mbind() syscall. A new constant, MPOL_MODE_FLAGS, represents the union of legal optional flags that may be passed as part of this int. Mempolicies that include illegal flags as part of their policy are rejected as invalid. 
An additional member is added to struct mempolicy to support the mode flags: struct mempolicy { ... unsigned short policy; unsigned short flags; } The splitting of the 'int' actual passed by the user is done in sys_set_mempolicy() and sys_mbind() for their respective syscalls. This is done by intersecting the actual with MPOL_MODE_FLAGS, rejecting the syscall if there are additional flags, and storing it in the new 'flags' member of struct mempolicy. The intersection of the actual with ~MPOL_MODE_FLAGS is stored in the 'policy' member of the struct and all current users of pol->policy remain unchanged. The union of the policy mode and optional mode flags is passed back to the user in get_mempolicy(). This combination of mode and flags within the same actual does not break userspace code that relies on get_mempolicy(&policy, ...) and either switch (policy) { case MPOL_BIND: ... case MPOL_INTERLEAVE: ... }; statements or if (policy == MPOL_INTERLEAVE) { ... } statements. Such applications would need to use optional mode flags when calling set_mempolicy() or mbind() for these previously implemented statements to stop working. If an application does start using optional mode flags, it will need to mask the optional flags off the policy in switch and conditional statements that only test mode. An additional member is also added to struct shmem_sb_info to store the optional mode flags.
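The split-and-recombine scheme can be sketched in user-space C. Assumptions: the flag value is borrowed from the follow-up MPOL_F_STATIC_NODES patch (in this patch MPOL_MODE_FLAGS is still (0)), and split_policy(), join_policy() and policy_mode() are hypothetical helpers, not kernel or libnuma functions. Any bit outside MPOL_MODE_FLAGS stays in the mode value and makes it fail the `>= MPOL_MAX` test, which is how unknown flags are rejected.

```c
#include <errno.h>

enum { MPOL_DEFAULT, MPOL_PREFERRED, MPOL_BIND, MPOL_INTERLEAVE, MPOL_MAX };

/* Assumed flag value, taken from the later MPOL_F_STATIC_NODES patch. */
#define MPOL_F_STATIC_NODES (1 << 15)
#define MPOL_MODE_FLAGS     (MPOL_F_STATIC_NODES)

/* Syscall side: split the user's int into a mode and optional flags. */
static int split_policy(int arg, unsigned short *mode, unsigned short *flags)
{
	*flags = arg & MPOL_MODE_FLAGS;
	arg &= ~MPOL_MODE_FLAGS;
	/* Unknown flag bits or a negative/out-of-range mode end up here. */
	if ((unsigned int)arg >= MPOL_MAX)
		return -EINVAL;
	*mode = arg;
	return 0;
}

/* get_mempolicy() side: hand back the union of mode and flags. */
static int join_policy(unsigned short mode, unsigned short flags)
{
	return mode | flags;
}

/* Application side: mask the optional flags off before testing the mode. */
static int policy_mode(int policy)
{
	return policy & ~MPOL_MODE_FLAGS;
}
```

An application that never passes optional flags always gets back a bare mode, so its existing `switch (policy)` keeps working; one that does pass flags compares policy_mode(policy) instead.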
[hugh@veritas.com: shmem mpol: fix build warning] Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Hugh Dickins Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hugetlbfs/inode.c | 2 +- include/linux/mempolicy.h | 20 ++++++++++++++++--- include/linux/shmem_fs.h | 1 + mm/mempolicy.c | 51 ++++++++++++++++++++++++++++------------------- mm/shmem.c | 24 ++++++++++++++-------- 5 files changed, 66 insertions(+), 32 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 6846785fe904..2e9e5bdd5629 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -504,7 +504,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb, uid_t uid, inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; INIT_LIST_HEAD(&inode->i_mapping->private_list); info = HUGETLBFS_I(inode); - mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, NULL); + mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, 0, NULL); switch (mode & S_IFMT) { default: init_special_inode(inode, mode, dev); diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 389a06e8ee21..f2bab4d2fc40 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -8,6 +8,12 @@ * Copyright 2003,2004 Andi Kleen SuSE Labs */ +/* + * Both the MPOL_* mempolicy mode and the MPOL_F_* optional mode flags are + * passed by the user to either set_mempolicy() or mbind() in an 'int' actual. + * The MPOL_MODE_FLAGS macro determines the legal set of optional mode flags. + */ + /* Policies */ enum { MPOL_DEFAULT, @@ -17,7 +23,14 @@ enum { MPOL_MAX, /* always last member of enum */ }; -/* Flags for get_mem_policy */ +/* Flags for set_mempolicy */ +/* + * MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to + * either set_mempolicy() or mbind(). 
+ */ +#define MPOL_MODE_FLAGS (0) + +/* Flags for get_mempolicy */ #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */ #define MPOL_F_ADDR (1<<1) /* look up vma using address */ #define MPOL_F_MEMS_ALLOWED (1<<2) /* return allowed memories */ @@ -66,6 +79,7 @@ struct mm_struct; struct mempolicy { atomic_t refcnt; unsigned short policy; /* See MPOL_* above */ + unsigned short flags; /* See set_mempolicy() MPOL_F_* above */ union { short preferred_node; /* preferred */ nodemask_t nodes; /* interleave/bind */ @@ -136,7 +150,7 @@ struct shared_policy { }; void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, - nodemask_t *nodes); + unsigned short flags, nodemask_t *nodes); int mpol_set_shared_policy(struct shared_policy *info, struct vm_area_struct *vma, struct mempolicy *new); @@ -203,7 +217,7 @@ static inline int mpol_set_shared_policy(struct shared_policy *info, } static inline void mpol_shared_policy_init(struct shared_policy *info, - unsigned short policy, nodemask_t *nodes) + unsigned short policy, unsigned short flags, nodemask_t *nodes) { } diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 639a4070708e..d7699a628d78 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -35,6 +35,7 @@ struct shmem_sb_info { gid_t gid; /* Mount gid for root directory */ mode_t mode; /* Mount mode for root directory */ unsigned short policy; /* Default NUMA memory alloc policy */ + unsigned short flags; /* Optional mempolicy flags */ nodemask_t policy_nodes; /* nodemask for preferred and bind */ }; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 1311dc4a3888..1f6ff9c1bbc3 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -187,12 +187,13 @@ static int is_valid_nodemask(nodemask_t *nodemask) } /* Create a new policy */ -static struct mempolicy *mpol_new(unsigned short mode, nodemask_t *nodes) +static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, + nodemask_t *nodes) { 
struct mempolicy *policy; - pr_debug("setting mode %d nodes[0] %lx\n", - mode, nodes ? nodes_addr(*nodes)[0] : -1); + pr_debug("setting mode %d flags %d nodes[0] %lx\n", + mode, flags, nodes ? nodes_addr(*nodes)[0] : -1); if (mode == MPOL_DEFAULT) return NULL; @@ -224,6 +225,7 @@ static struct mempolicy *mpol_new(unsigned short mode, nodemask_t *nodes) BUG(); } policy->policy = mode; + policy->flags = flags; policy->cpuset_mems_allowed = cpuset_mems_allowed(current); return policy; } @@ -466,13 +468,14 @@ static void mpol_set_task_struct_flag(void) } /* Set the process memory policy */ -static long do_set_mempolicy(unsigned short mode, nodemask_t *nodes) +static long do_set_mempolicy(unsigned short mode, unsigned short flags, + nodemask_t *nodes) { struct mempolicy *new; if (mpol_check_policy(mode, nodes)) return -EINVAL; - new = mpol_new(mode, nodes); + new = mpol_new(mode, flags, nodes); if (IS_ERR(new)) return PTR_ERR(new); mpol_free(current->mempolicy); @@ -573,7 +576,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, goto out; } } else - *policy = pol->policy; + *policy = pol->policy | pol->flags; if (vma) { up_read(¤t->mm->mmap_sem); @@ -763,8 +766,8 @@ static struct page *new_vma_page(struct page *page, unsigned long private, int * #endif static long do_mbind(unsigned long start, unsigned long len, - unsigned short mode, nodemask_t *nmask, - unsigned long flags) + unsigned short mode, unsigned short mode_flags, + nodemask_t *nmask, unsigned long flags) { struct vm_area_struct *vma; struct mm_struct *mm = current->mm; @@ -796,7 +799,7 @@ static long do_mbind(unsigned long start, unsigned long len, if (mpol_check_policy(mode, nmask)) return -EINVAL; - new = mpol_new(mode, nmask); + new = mpol_new(mode, mode_flags, nmask); if (IS_ERR(new)) return PTR_ERR(new); @@ -807,8 +810,9 @@ static long do_mbind(unsigned long start, unsigned long len, if (!new) flags |= MPOL_MF_DISCONTIG_OK; - pr_debug("mbind %lx-%lx mode:%d nodes:%lx\n", start, start + len, 
- mode, nmask ? nodes_addr(*nmask)[0] : -1); + pr_debug("mbind %lx-%lx mode:%d flags:%d nodes:%lx\n", + start, start + len, mode, mode_flags, + nmask ? nodes_addr(*nmask)[0] : -1); down_write(&mm->mmap_sem); vma = check_range(mm, start, end, nmask, @@ -907,13 +911,16 @@ asmlinkage long sys_mbind(unsigned long start, unsigned long len, { nodemask_t nodes; int err; + unsigned short mode_flags; + mode_flags = mode & MPOL_MODE_FLAGS; + mode &= ~MPOL_MODE_FLAGS; if (mode >= MPOL_MAX) return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; - return do_mbind(start, len, mode, &nodes, flags); + return do_mbind(start, len, mode, mode_flags, &nodes, flags); } /* Set the process memory policy */ @@ -922,13 +929,16 @@ asmlinkage long sys_set_mempolicy(int mode, unsigned long __user *nmask, { int err; nodemask_t nodes; + unsigned short flags; - if (mode < 0 || mode >= MPOL_MAX) + flags = mode & MPOL_MODE_FLAGS; + mode &= ~MPOL_MODE_FLAGS; + if ((unsigned int)mode >= MPOL_MAX) return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; - return do_set_mempolicy(mode, &nodes); + return do_set_mempolicy(mode, flags, &nodes); } asmlinkage long sys_migrate_pages(pid_t pid, unsigned long maxnode, @@ -1641,7 +1651,7 @@ restart: } void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, - nodemask_t *policy_nodes) + unsigned short flags, nodemask_t *policy_nodes) { info->root = RB_ROOT; spin_lock_init(&info->lock); @@ -1650,7 +1660,7 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, struct mempolicy *newpol; /* Falls back to MPOL_DEFAULT on any error */ - newpol = mpol_new(policy, policy_nodes); + newpol = mpol_new(policy, flags, policy_nodes); if (!IS_ERR(newpol)) { /* Create pseudo-vma that contains just the policy */ struct vm_area_struct pvma; @@ -1671,9 +1681,10 @@ int mpol_set_shared_policy(struct shared_policy *info, struct sp_node *new = NULL; unsigned long sz = vma_pages(vma); - 
pr_debug("set_shared_policy %lx sz %lu %d %lx\n", + pr_debug("set_shared_policy %lx sz %lu %d %d %lx\n", vma->vm_pgoff, - sz, npol? npol->policy : -1, + sz, npol ? npol->policy : -1, + npol ? npol->flags : -1, npol ? nodes_addr(npol->v.nodes)[0] : -1); if (npol) { @@ -1746,14 +1757,14 @@ void __init numa_policy_init(void) if (unlikely(nodes_empty(interleave_nodes))) node_set(prefer, interleave_nodes); - if (do_set_mempolicy(MPOL_INTERLEAVE, &interleave_nodes)) + if (do_set_mempolicy(MPOL_INTERLEAVE, 0, &interleave_nodes)) printk("numa_policy_init: interleaving failed\n"); } /* Reset policy of current process to default */ void numa_default_policy(void) { - do_set_mempolicy(MPOL_DEFAULT, NULL); + do_set_mempolicy(MPOL_DEFAULT, 0, NULL); } /* Migrate a policy to a different set of nodes */ diff --git a/mm/shmem.c b/mm/shmem.c index d8ef7ba831a5..1ccf794fbe61 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1080,9 +1080,10 @@ redirty: #ifdef CONFIG_NUMA #ifdef CONFIG_TMPFS static int shmem_parse_mpol(char *value, unsigned short *policy, - nodemask_t *policy_nodes) + unsigned short *mode_flags, nodemask_t *policy_nodes) { char *nodelist = strchr(value, ':'); + char *flags = strchr(value, '='); int err = 1; if (nodelist) { @@ -1093,6 +1094,8 @@ static int shmem_parse_mpol(char *value, unsigned short *policy, if (!nodes_subset(*policy_nodes, node_states[N_HIGH_MEMORY])) goto out; } + if (flags) + *flags++ = '\0'; if (!strcmp(value, "default")) { *policy = MPOL_DEFAULT; /* Don't allow a nodelist */ @@ -1122,6 +1125,8 @@ static int shmem_parse_mpol(char *value, unsigned short *policy, *policy_nodes = node_states[N_HIGH_MEMORY]; err = 0; } + if (flags) { + } out: /* Restore string for error message */ if (nodelist) @@ -1130,7 +1135,7 @@ out: } static void shmem_show_mpol(struct seq_file *seq, unsigned short policy, - const nodemask_t policy_nodes) + unsigned short flags, const nodemask_t policy_nodes) { char *policy_string; @@ -1199,13 +1204,13 @@ static struct page 
*shmem_alloc_page(gfp_t gfp, #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS static inline int shmem_parse_mpol(char *value, unsigned short *policy, - nodemask_t *policy_nodes) + unsigned short *mode_flags, nodemask_t *policy_nodes) { return 1; } static inline void shmem_show_mpol(struct seq_file *seq, unsigned short policy, - const nodemask_t policy_nodes) + unsigned short flags, const nodemask_t policy_nodes) { } #endif /* CONFIG_TMPFS */ @@ -1578,7 +1583,7 @@ shmem_get_inode(struct super_block *sb, int mode, dev_t dev) inode->i_op = &shmem_inode_operations; inode->i_fop = &shmem_file_operations; mpol_shared_policy_init(&info->policy, sbinfo->policy, - &sbinfo->policy_nodes); + sbinfo->flags, &sbinfo->policy_nodes); break; case S_IFDIR: inc_nlink(inode); @@ -1592,7 +1597,7 @@ shmem_get_inode(struct super_block *sb, int mode, dev_t dev) * Must not load anything in the rbtree, * mpol_free_shared_policy will not be called. */ - mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, + mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, 0, NULL); break; } @@ -2209,7 +2214,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo, goto bad_val; } else if (!strcmp(this_char,"mpol")) { if (shmem_parse_mpol(value, &sbinfo->policy, - &sbinfo->policy_nodes)) + &sbinfo->flags, &sbinfo->policy_nodes)) goto bad_val; } else { printk(KERN_ERR "tmpfs: Bad mount option %s\n", @@ -2261,6 +2266,7 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data) sbinfo->max_inodes = config.max_inodes; sbinfo->free_inodes = config.max_inodes - inodes; sbinfo->policy = config.policy; + sbinfo->flags = config.flags; sbinfo->policy_nodes = config.policy_nodes; out: spin_unlock(&sbinfo->stat_lock); @@ -2282,7 +2288,8 @@ static int shmem_show_options(struct seq_file *seq, struct vfsmount *vfs) seq_printf(seq, ",uid=%u", sbinfo->uid); if (sbinfo->gid != 0) seq_printf(seq, ",gid=%u", sbinfo->gid); - shmem_show_mpol(seq, sbinfo->policy, sbinfo->policy_nodes); + 
shmem_show_mpol(seq, sbinfo->policy, sbinfo->flags, + sbinfo->policy_nodes); return 0; } #endif /* CONFIG_TMPFS */ @@ -2313,6 +2320,7 @@ static int shmem_fill_super(struct super_block *sb, sbinfo->uid = current->fsuid; sbinfo->gid = current->fsgid; sbinfo->policy = MPOL_DEFAULT; + sbinfo->flags = 0; sbinfo->policy_nodes = node_states[N_HIGH_MEMORY]; sb->s_fs_info = sbinfo; -- cgit v1.2.3 From f5b087b52f1710eb0bf15a2d2b030c51a6a1ca9e Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:27 -0700 Subject: mempolicy: add MPOL_F_STATIC_NODES flag Add an optional mempolicy mode flag, MPOL_F_STATIC_NODES, that suppresses the node remap when the policy is rebound. Adds another member to struct mempolicy, nodemask_t user_nodemask, as part of a union with cpuset_mems_allowed: struct mempolicy { ... union { nodemask_t cpuset_mems_allowed; nodemask_t user_nodemask; } w; } that stores the nodemask that the user passed when he or she created the mempolicy via set_mempolicy() or mbind(). When using MPOL_F_STATIC_NODES, which is passed with any mempolicy mode, the user's passed nodemask intersected with the VMA or task's allowed nodes is always used when determining the preferred node, setting the MPOL_BIND zonelist, or creating the interleave nodemask. This happens whenever the policy is rebound, including when a task's cpuset assignment changes or the cpuset's mems are changed. This creates an interesting side-effect in that it allows the mempolicy "intent" to lie dormant, without taking effect, until it has access to the node(s) that it desires. For example, if you currently ask for an interleaved policy over a set of nodes that you do not have access to, the mempolicy is not created and the task continues to use the previous policy. With this change, however, it is possible to create the same mempolicy; it only takes effect when access to nodes in the nodemask is acquired.
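The difference between a static and a relative rebind can be modelled on a single 64-bit word. This sketch assumes at most 64 nodes and uses hypothetical helper names; remap_nodes() is a simplified model of the documented nodes_remap()/bitmap_remap() behavior (a node that is the n-th set node of the old allowed mask maps to the (n mod w)-th set node of the new mask, where w is the new mask's weight; nodes outside the old mask map to themselves), and rebind_static() mirrors the nodes_and() that mpol_rebind_policy() uses for MPOL_F_STATIC_NODES.

```c
#include <stdint.h>

typedef uint64_t nodemask64;	/* hypothetical model: bit n set means node n */

static int weight(nodemask64 m)
{
	int w = 0;
	for (; m; m &= m - 1)	/* clear the lowest set bit each time */
		w++;
	return w;
}

/* Ordinal of bit pos among the set bits of mask, or -1 if pos is clear. */
static int pos_to_ord(nodemask64 mask, int pos)
{
	if (!((mask >> pos) & 1))
		return -1;
	return weight(mask & (((nodemask64)1 << pos) - 1));
}

/* Position of the ord-th (0-based) set bit of mask, or -1 if none. */
static int ord_to_pos(nodemask64 mask, int ord)
{
	for (int pos = 0; pos < 64; pos++)
		if (((mask >> pos) & 1) && ord-- == 0)
			return pos;
	return -1;
}

/* Relative rebind: model of nodes_remap(dst, src, old, new). */
static nodemask64 remap_nodes(nodemask64 src, nodemask64 oldm, nodemask64 newm)
{
	nodemask64 dst = 0;
	int w = weight(newm);

	for (int bit = 0; bit < 64; bit++) {
		if (!((src >> bit) & 1))
			continue;
		int n = pos_to_ord(oldm, bit);
		if (n < 0 || w == 0)
			dst |= (nodemask64)1 << bit;	/* not in old: identity */
		else
			dst |= (nodemask64)1 << ord_to_pos(newm, n % w);
	}
	return dst;
}

/* Static rebind (MPOL_F_STATIC_NODES): user's mask intersected with allowed. */
static nodemask64 rebind_static(nodemask64 user, nodemask64 newm)
{
	return user & newm;
}
```

For a policy over nodes 1-3 when the allowed nodes move from 0-3 to 2-5, the relative rebind shifts it to nodes 3-5, while the static rebind keeps only nodes 2-3. A static mask naming node 6 yields an empty result under 0-3 and becomes active again once node 6 is allowed, which is the dormant "intent" described above.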
It is also possible to mount tmpfs with the static nodemask behavior when specifying a node or nodemask. To do this, simply add "=static" immediately following the mempolicy mode at mount time: mount -o remount mpol=interleave=static:1-3 Also removes mpol_check_policy() and folds its logic into mpol_new() since it is now obsoleted. The unused vma_mpol_equal() is also removed. Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 11 +-- mm/mempolicy.c | 172 ++++++++++++++++++++++++---------------------- mm/shmem.c | 2 + 3 files changed, 97 insertions(+), 88 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index f2bab4d2fc40..07350d7b8d96 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -24,11 +24,13 @@ enum { }; /* Flags for set_mempolicy */ +#define MPOL_F_STATIC_NODES (1 << 15) + /* * MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to * either set_mempolicy() or mbind(). */ -#define MPOL_MODE_FLAGS (0) +#define MPOL_MODE_FLAGS (MPOL_F_STATIC_NODES) /* Flags for get_mempolicy */ #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */ @@ -85,7 +87,10 @@ struct mempolicy { nodemask_t nodes; /* interleave/bind */ /* undefined for default */ } v; - nodemask_t cpuset_mems_allowed; /* mempolicy relative to these nodes */ + union { + nodemask_t cpuset_mems_allowed; /* relative to these nodes */ + nodemask_t user_nodemask; /* nodemask passed by user */ + } w; }; /* @@ -124,7 +129,6 @@ static inline int mpol_equal(struct mempolicy *a, struct mempolicy *b) return 1; return __mpol_equal(a, b); } -#define vma_mpol_equal(a,b) mpol_equal(vma_policy(a), vma_policy(b)) /* Could later add inheritance of the process policy here. 
*/ @@ -190,7 +194,6 @@ static inline int mpol_equal(struct mempolicy *a, struct mempolicy *b) { return 1; } -#define vma_mpol_equal(a,b) 1 #define mpol_set_vma_default(vma) do {} while(0) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 1f6ff9c1bbc3..d59b1e766aee 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -113,58 +113,6 @@ struct mempolicy default_policy = { static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask); -/* Do sanity checking on a policy */ -static int mpol_check_policy(unsigned short mode, nodemask_t *nodes) -{ - int was_empty, is_empty; - - if (!nodes) - return 0; - - /* - * "Contextualize" the in-coming nodemast for cpusets: - * Remember whether in-coming nodemask was empty, If not, - * restrict the nodes to the allowed nodes in the cpuset. - * This is guaranteed to be a subset of nodes with memory. - */ - cpuset_update_task_memory_state(); - is_empty = was_empty = nodes_empty(*nodes); - if (!was_empty) { - nodes_and(*nodes, *nodes, cpuset_current_mems_allowed); - is_empty = nodes_empty(*nodes); /* after "contextualization" */ - } - - switch (mode) { - case MPOL_DEFAULT: - /* - * require caller to specify an empty nodemask - * before "contextualization" - */ - if (!was_empty) - return -EINVAL; - break; - case MPOL_BIND: - case MPOL_INTERLEAVE: - /* - * require at least 1 valid node after "contextualization" - */ - if (is_empty) - return -EINVAL; - break; - case MPOL_PREFERRED: - /* - * Did caller specify invalid nodes? - * Don't silently accept this as "local allocation". 
- */ - if (!was_empty && is_empty) - return -EINVAL; - break; - default: - BUG(); - } - return 0; -} - /* Check that the nodemask contains at least one populated zone */ static int is_valid_nodemask(nodemask_t *nodemask) { @@ -186,48 +134,60 @@ static int is_valid_nodemask(nodemask_t *nodemask) return 0; } +static inline int mpol_store_user_nodemask(const struct mempolicy *pol) +{ + return pol->flags & MPOL_F_STATIC_NODES; +} + /* Create a new policy */ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, nodemask_t *nodes) { struct mempolicy *policy; + nodemask_t cpuset_context_nmask; pr_debug("setting mode %d flags %d nodes[0] %lx\n", mode, flags, nodes ? nodes_addr(*nodes)[0] : -1); if (mode == MPOL_DEFAULT) - return NULL; + return (nodes && nodes_weight(*nodes)) ? ERR_PTR(-EINVAL) : + NULL; policy = kmem_cache_alloc(policy_cache, GFP_KERNEL); if (!policy) return ERR_PTR(-ENOMEM); atomic_set(&policy->refcnt, 1); + cpuset_update_task_memory_state(); + nodes_and(cpuset_context_nmask, *nodes, cpuset_current_mems_allowed); switch (mode) { case MPOL_INTERLEAVE: - policy->v.nodes = *nodes; - if (nodes_weight(policy->v.nodes) == 0) { - kmem_cache_free(policy_cache, policy); - return ERR_PTR(-EINVAL); - } + if (nodes_empty(*nodes) || nodes_empty(cpuset_context_nmask)) + goto free; + policy->v.nodes = cpuset_context_nmask; break; case MPOL_PREFERRED: - policy->v.preferred_node = first_node(*nodes); + policy->v.preferred_node = first_node(cpuset_context_nmask); if (policy->v.preferred_node >= MAX_NUMNODES) - policy->v.preferred_node = -1; + goto free; break; case MPOL_BIND: - if (!is_valid_nodemask(nodes)) { - kmem_cache_free(policy_cache, policy); - return ERR_PTR(-EINVAL); - } - policy->v.nodes = *nodes; + if (!is_valid_nodemask(&cpuset_context_nmask)) + goto free; + policy->v.nodes = cpuset_context_nmask; break; default: BUG(); } policy->policy = mode; policy->flags = flags; - policy->cpuset_mems_allowed = cpuset_mems_allowed(current); + if 
(mpol_store_user_nodemask(policy)) + policy->w.user_nodemask = *nodes; + else + policy->w.cpuset_mems_allowed = cpuset_mems_allowed(current); return policy; + +free: + kmem_cache_free(policy_cache, policy); + return ERR_PTR(-EINVAL); } static void gather_stats(struct page *, void *, int pte_dirty); @@ -473,15 +433,14 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, { struct mempolicy *new; - if (mpol_check_policy(mode, nodes)) - return -EINVAL; new = mpol_new(mode, flags, nodes); if (IS_ERR(new)) return PTR_ERR(new); mpol_free(current->mempolicy); current->mempolicy = new; mpol_set_task_struct_flag(); - if (new && new->policy == MPOL_INTERLEAVE) + if (new && new->policy == MPOL_INTERLEAVE && + nodes_weight(new->v.nodes)) current->il_next = first_node(new->v.nodes); return 0; } @@ -796,9 +755,6 @@ static long do_mbind(unsigned long start, unsigned long len, if (end == start) return 0; - if (mpol_check_policy(mode, nmask)) - return -EINVAL; - new = mpol_new(mode, mode_flags, nmask); if (IS_ERR(new)) return PTR_ERR(new); @@ -1206,7 +1162,8 @@ static unsigned interleave_nodes(struct mempolicy *policy) next = next_node(nid, policy->v.nodes); if (next >= MAX_NUMNODES) next = first_node(policy->v.nodes); - me->il_next = next; + if (next < MAX_NUMNODES) + me->il_next = next; return nid; } @@ -1252,10 +1209,13 @@ static unsigned offset_il_node(struct mempolicy *pol, struct vm_area_struct *vma, unsigned long off) { unsigned nnodes = nodes_weight(pol->v.nodes); - unsigned target = (unsigned)off % nnodes; + unsigned target; int c; int nid = -1; + if (!nnodes) + return numa_node_id(); + target = (unsigned int)off % nnodes; c = 0; do { nid = next_node(nid, pol->v.nodes); @@ -1465,6 +1425,16 @@ struct mempolicy *__mpol_copy(struct mempolicy *old) return new; } +static int mpol_match_intent(const struct mempolicy *a, + const struct mempolicy *b) +{ + if (a->flags != b->flags) + return 0; + if (!mpol_store_user_nodemask(a)) + return 1; + return 
nodes_equal(a->w.user_nodemask, b->w.user_nodemask); +} + /* Slow path of a mempolicy comparison */ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) { @@ -1472,6 +1442,8 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) return 0; if (a->policy != b->policy) return 0; + if (a->policy != MPOL_DEFAULT && !mpol_match_intent(a, b)) + return 0; switch (a->policy) { case MPOL_DEFAULT: return 1; @@ -1771,13 +1743,14 @@ void numa_default_policy(void) static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask) { - nodemask_t *mpolmask; nodemask_t tmp; + int static_nodes; if (!pol) return; - mpolmask = &pol->cpuset_mems_allowed; - if (nodes_equal(*mpolmask, *newmask)) + static_nodes = pol->flags & MPOL_F_STATIC_NODES; + if (!mpol_store_user_nodemask(pol) && + nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) return; switch (pol->policy) { @@ -1786,16 +1759,35 @@ static void mpol_rebind_policy(struct mempolicy *pol, case MPOL_BIND: /* Fall through */ case MPOL_INTERLEAVE: - nodes_remap(tmp, pol->v.nodes, *mpolmask, *newmask); + if (static_nodes) + nodes_and(tmp, pol->w.user_nodemask, *newmask); + else { + nodes_remap(tmp, pol->v.nodes, + pol->w.cpuset_mems_allowed, *newmask); + pol->w.cpuset_mems_allowed = *newmask; + } pol->v.nodes = tmp; - *mpolmask = *newmask; - current->il_next = node_remap(current->il_next, - *mpolmask, *newmask); + if (!node_isset(current->il_next, tmp)) { + current->il_next = next_node(current->il_next, tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = first_node(tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = numa_node_id(); + } break; case MPOL_PREFERRED: - pol->v.preferred_node = node_remap(pol->v.preferred_node, - *mpolmask, *newmask); - *mpolmask = *newmask; + if (static_nodes) { + int node = first_node(pol->w.user_nodemask); + + if (node_isset(node, *newmask)) + pol->v.preferred_node = node; + else + pol->v.preferred_node = -1; + } else { + pol->v.preferred_node = 
node_remap(pol->v.preferred_node, + pol->w.cpuset_mems_allowed, *newmask); + pol->w.cpuset_mems_allowed = *newmask; + } break; default: BUG(); @@ -1847,6 +1839,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) int l; nodemask_t nodes; unsigned short mode = pol ? pol->policy : MPOL_DEFAULT; + unsigned short flags = pol ? pol->flags : 0; switch (mode) { case MPOL_DEFAULT: @@ -1876,6 +1869,17 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) strcpy(p, policy_types[mode]); p += l; + if (flags) { + int need_bar = 0; + + if (buffer + maxlen < p + 2) + return -ENOSPC; + *p++ = '='; + + if (flags & MPOL_F_STATIC_NODES) + p += sprintf(p, "%sstatic", need_bar++ ? "|" : ""); + } + if (!nodes_empty(nodes)) { if (buffer + maxlen < p + 2) return -ENOSPC; diff --git a/mm/shmem.c b/mm/shmem.c index 1ccf794fbe61..3e9fda0ca470 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1126,6 +1126,8 @@ static int shmem_parse_mpol(char *value, unsigned short *policy, err = 0; } if (flags) { + if (!strcmp(flags, "static")) + *mode_flags |= MPOL_F_STATIC_NODES; } out: /* Restore string for error message */ -- cgit v1.2.3 From 7ea931c9fc80c4d0a4306c30ec92eb0f1d922a0b Mon Sep 17 00:00:00 2001 From: Paul Jackson Date: Mon, 28 Apr 2008 02:12:29 -0700 Subject: mempolicy: add bitmap_onto() and bitmap_fold() operations The following adds two more bitmap operators, bitmap_onto() and bitmap_fold(), with the usual cpumask and nodemask wrappers. The bitmap_onto() operator computes one bitmap relative to another. If the n-th bit in the origin mask is set, then the m-th bit of the destination mask will be set, where m is the position of the n-th set bit in the relative mask. The bitmap_fold() operator folds a bitmap into a second that has bit m set iff the input bitmap has some bit n set, where m == n mod sz, for the specified sz value. 
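To make the two rules concrete, here is a minimal userspace sketch in C. It is not the lib/bitmap.c code: it assumes a single 64-bit word stands in for a bitmap, and the helper names (bit_range(), mask_weight(), mask_onto(), mask_fold()) are hypothetical. It follows the same onto and fold definitions given above.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: one 64-bit word stands in for a 64-bit kernel bitmap. */
typedef uint64_t mask_t;
#define MASK_BITS 64

/* Mask with bits lo..hi (inclusive) set; assumes 0 <= lo <= hi < 64. */
static mask_t bit_range(int lo, int hi)
{
	mask_t m = 0;

	for (int b = lo; b <= hi; b++)
		m |= (mask_t)1 << b;
	return m;
}

/* Weight of a mask: its number of set bits. */
static int mask_weight(mask_t m)
{
	int w = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each pass */
		w++;
	return w;
}

/*
 * onto rule: walk the set bits n of relmap in ascending order, counting
 * them with m; bit n of the result is set iff bit m of orig is set.
 * Bits of orig at or above weight(relmap) are never reached.
 */
static mask_t mask_onto(mask_t orig, mask_t relmap)
{
	mask_t dst = 0;
	int m = 0;

	for (int n = 0; n < MASK_BITS; n++) {
		if (!(relmap & ((mask_t)1 << n)))
			continue;
		if (orig & ((mask_t)1 << m))
			dst |= (mask_t)1 << n;
		m++;
	}
	return dst;
}

/* fold rule: each set bit b of orig sets bit (b % sz) of the result. */
static mask_t mask_fold(mask_t orig, int sz)
{
	mask_t dst = 0;

	assert(sz > 0);	/* the modulo below needs a non-zero size */
	for (int b = 0; b < MASK_BITS; b++)
		if (orig & ((mask_t)1 << b))
			dst |= (mask_t)1 << (b % sz);
	return dst;
}
```

For instance, mask_onto(bit_range(12, 15), bit_range(0, 7)) is empty, while folding first by the weight of the 0-7 mask (8) and then mapping onto it yields bits 4-7, which is the 16-node example that motivates bitmap_fold().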
There are two substantive changes between this patch and its predecessor bitmap_relative: 1) Renamed bitmap_relative() to be bitmap_onto(). 2) Added bitmap_fold(). The essential motivation for bitmap_onto() is to provide a mechanism for converting a cpuset-relative CPU or Node mask to an absolute mask. Cpuset relative masks are written as if the current task were in a cpuset whose CPUs or Nodes were just the consecutive ones numbered 0..N-1, for some N. The bitmap_onto() operator is provided in anticipation of adding support for the first such cpuset relative mask, by the mbind() and set_mempolicy() system calls, using a planned flag of MPOL_F_RELATIVE_NODES. These bitmap operators (and their nodemask wrappers, in particular) will be used in code that converts the user-specified cpuset relative memory policy to a specific system node numbered policy, given the current mems_allowed of the task's cpuset. Such cpuset relative mempolicies will address two deficiencies of the existing interface between cpusets and mempolicies: 1) A task cannot at present reliably establish a cpuset relative mempolicy because of an essential race condition: the task's cpuset may be changed between the time the task queries its cpuset placement and the time it issues the applicable mbind or set_mempolicy system call. 2) A task cannot at present establish what cpuset relative mempolicy it would like to have, if it is in a smaller cpuset than it might have mempolicy preferences for, because the existing interface only allows specifying mempolicies for nodes currently allowed by the cpuset. Cpuset relative mempolicies are useful for tasks that don't distinguish particularly between one CPU or Node and another, but only between how many of each are allowed, and the proper placement of threads and memory pages on the various CPUs and Nodes available. The motivation for the added bitmap_fold() can be seen in the following example.
Let's say an application has specified some mempolicies that presume 16 memory nodes, including say a mempolicy that specified MPOL_F_RELATIVE_NODES (cpuset relative) nodes 12-15. Then let's say that application is crammed into a cpuset that only has 8 memory nodes, 0-7. If one just uses bitmap_onto(), this mempolicy, mapped to that cpuset, would ignore the requested relative nodes above 7, leaving it empty of nodes. That's not good; better to fold the higher nodes down, so that some nodes are included in the resulting mapped mempolicy. In this case, the mempolicy nodes 12-15 are taken modulo 8 (the weight of the mems_allowed of the confining cpuset), resulting in a mempolicy specifying nodes 4-7. Signed-off-by: Paul Jackson Signed-off-by: David Rientjes Cc: Christoph Lameter Cc: Andi Kleen Cc: Mel Gorman Cc: Lee Schermerhorn Cc: Cc: Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bitmap.h | 6 ++ include/linux/cpumask.h | 22 ++++++- include/linux/nodemask.h | 22 ++++++- lib/bitmap.c | 158 +++++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 206 insertions(+), 2 deletions(-) diff --git a/include/linux/bitmap.h b/include/linux/bitmap.h index 1dbe074f1c64..43b406def35f 100644 --- a/include/linux/bitmap.h +++ b/include/linux/bitmap.h @@ -46,6 +46,8 @@ * bitmap_shift_left(dst, src, n, nbits) *dst = *src << n * bitmap_remap(dst, src, old, new, nbits) *dst = map(old, new)(src) * bitmap_bitremap(oldbit, old, new, nbits) newbit = map(old, new)(oldbit) + * bitmap_onto(dst, orig, relmap, nbits) *dst = orig relative to relmap + * bitmap_fold(dst, orig, sz, nbits) dst bits = orig bits mod sz * bitmap_scnprintf(buf, len, src, nbits) Print bitmap src to buf * bitmap_parse(buf, buflen, dst, nbits) Parse bitmap dst from kernel buf * bitmap_parse_user(ubuf, ulen, dst, nbits) Parse bitmap dst from user buf @@ -121,6 +123,10 @@ extern void bitmap_remap(unsigned long *dst, const unsigned long *src, const unsigned long *old, const unsigned long
*new, int bits); extern int bitmap_bitremap(int oldbit, const unsigned long *old, const unsigned long *new, int bits); +extern void bitmap_onto(unsigned long *dst, const unsigned long *orig, + const unsigned long *relmap, int bits); +extern void bitmap_fold(unsigned long *dst, const unsigned long *orig, + int sz, int bits); extern int bitmap_find_free_region(unsigned long *bitmap, int bits, int order); extern void bitmap_release_region(unsigned long *bitmap, int pos, int order); extern int bitmap_allocate_region(unsigned long *bitmap, int pos, int order); diff --git a/include/linux/cpumask.h b/include/linux/cpumask.h index 259c8051155d..9650806fe2ea 100644 --- a/include/linux/cpumask.h +++ b/include/linux/cpumask.h @@ -14,6 +14,8 @@ * bitmap_scnlistprintf() and bitmap_parselist(), also in bitmap.c. * For details of cpu_remap(), see bitmap_bitremap in lib/bitmap.c * For details of cpus_remap(), see bitmap_remap in lib/bitmap.c. + * For details of cpus_onto(), see bitmap_onto in lib/bitmap.c. + * For details of cpus_fold(), see bitmap_fold in lib/bitmap.c. 
* * The available cpumask operations are: * @@ -53,7 +55,9 @@ * int cpulist_scnprintf(buf, len, mask) Format cpumask as list for printing * int cpulist_parse(buf, map) Parse ascii string as cpulist * int cpu_remap(oldbit, old, new) newbit = map(old, new)(oldbit) - * int cpus_remap(dst, src, old, new) *dst = map(old, new)(src) + * void cpus_remap(dst, src, old, new) *dst = map(old, new)(src) + * void cpus_onto(dst, orig, relmap) *dst = orig relative to relmap + * void cpus_fold(dst, orig, sz) dst bits = orig bits mod sz * * for_each_cpu_mask(cpu, mask) for-loop cpu over mask * @@ -330,6 +334,22 @@ static inline void __cpus_remap(cpumask_t *dstp, const cpumask_t *srcp, bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits); } +#define cpus_onto(dst, orig, relmap) \ + __cpus_onto(&(dst), &(orig), &(relmap), NR_CPUS) +static inline void __cpus_onto(cpumask_t *dstp, const cpumask_t *origp, + const cpumask_t *relmapp, int nbits) +{ + bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits); +} + +#define cpus_fold(dst, orig, sz) \ + __cpus_fold(&(dst), &(orig), sz, NR_CPUS) +static inline void __cpus_fold(cpumask_t *dstp, const cpumask_t *origp, + int sz, int nbits) +{ + bitmap_fold(dstp->bits, origp->bits, sz, nbits); +} + #if NR_CPUS > 1 #define for_each_cpu_mask(cpu, mask) \ for ((cpu) = first_cpu(mask); \ diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h index 905e18f4b412..848025cd7087 100644 --- a/include/linux/nodemask.h +++ b/include/linux/nodemask.h @@ -14,6 +14,8 @@ * bitmap_scnlistprintf() and bitmap_parselist(), also in bitmap.c. * For details of node_remap(), see bitmap_bitremap in lib/bitmap.c. * For details of nodes_remap(), see bitmap_remap in lib/bitmap.c. + * For details of nodes_onto(), see bitmap_onto in lib/bitmap.c. + * For details of nodes_fold(), see bitmap_fold in lib/bitmap.c. 
* * The available nodemask operations are: * @@ -55,7 +57,9 @@ * int nodelist_scnprintf(buf, len, mask) Format nodemask as list for printing * int nodelist_parse(buf, map) Parse ascii string as nodelist * int node_remap(oldbit, old, new) newbit = map(old, new)(oldbit) - * int nodes_remap(dst, src, old, new) *dst = map(old, new)(dst) + * void nodes_remap(dst, src, old, new) *dst = map(old, new)(src) + * void nodes_onto(dst, orig, relmap) *dst = orig relative to relmap + * void nodes_fold(dst, orig, sz) dst bits = orig bits mod sz * * for_each_node_mask(node, mask) for-loop node over mask * @@ -326,6 +330,22 @@ static inline void __nodes_remap(nodemask_t *dstp, const nodemask_t *srcp, bitmap_remap(dstp->bits, srcp->bits, oldp->bits, newp->bits, nbits); } +#define nodes_onto(dst, orig, relmap) \ + __nodes_onto(&(dst), &(orig), &(relmap), MAX_NUMNODES) +static inline void __nodes_onto(nodemask_t *dstp, const nodemask_t *origp, + const nodemask_t *relmapp, int nbits) +{ + bitmap_onto(dstp->bits, origp->bits, relmapp->bits, nbits); +} + +#define nodes_fold(dst, orig, sz) \ + __nodes_fold(&(dst), &(orig), sz, MAX_NUMNODES) +static inline void __nodes_fold(nodemask_t *dstp, const nodemask_t *origp, + int sz, int nbits) +{ + bitmap_fold(dstp->bits, origp->bits, sz, nbits); +} + #if MAX_NUMNODES > 1 #define for_each_node_mask(node, mask) \ for ((node) = first_node(mask); \ diff --git a/lib/bitmap.c b/lib/bitmap.c index a6939e18d7bb..c4cb48f77f0c 100644 --- a/lib/bitmap.c +++ b/lib/bitmap.c @@ -714,6 +714,164 @@ int bitmap_bitremap(int oldbit, const unsigned long *old, } EXPORT_SYMBOL(bitmap_bitremap); +/** + * bitmap_onto - translate one bitmap relative to another + * @dst: resulting translated bitmap + * @orig: original untranslated bitmap + * @relmap: bitmap relative to which translated + * @bits: number of bits in each of these bitmaps + * + * Set the n-th bit of @dst iff there exists some m such that the + * n-th bit of @relmap is set, the m-th bit of @orig is set, and + 
* the n-th bit of @relmap is also the m-th _set_ bit of @relmap. + * (If you understood the previous sentence the first time you + * read it, you're overqualified for your current job.) + * + * In other words, @orig is mapped onto (surjectively) @dst, + * using the map { <n, m> | the n-th bit of @relmap is the + * m-th set bit of @relmap }. + * + * Any set bits in @orig at or above bit number W, where W is the + * weight of (number of set bits in) @relmap, are mapped nowhere. + * In particular, if for all bits m set in @orig, m >= W, then + * @dst will end up empty. In situations where the possibility + * of such an empty result is not desired, one way to avoid it is + * to use the bitmap_fold() operator, below, to first fold the + * @orig bitmap over itself so that all its set bits x are in the + * range 0 <= x < W. The bitmap_fold() operator does this by + * setting the bit (m % W) in @dst, for each bit (m) set in @orig. + * + * Example [1] for bitmap_onto(): + * Let's say @relmap has bits 30-39 set, and @orig has bits + * 1, 3, 5, 7, 9 and 11 set. Then on return from this routine, + * @dst will have bits 31, 33, 35, 37 and 39 set. + * + * When bit 0 is set in @orig, it means turn on the bit in + * @dst corresponding to whatever is the first bit (if any) + * that is turned on in @relmap. Since bit 0 was off in the + * above example, we leave off that bit (bit 30) in @dst. + * + * When bit 1 is set in @orig (as in the above example), it + * means turn on the bit in @dst corresponding to whatever + * is the second bit that is turned on in @relmap. The second + * bit in @relmap that was turned on in the above example was + * bit 31, so we turned on bit 31 in @dst. + * + * Similarly, we turned on bits 33, 35, 37 and 39 in @dst, + * because they were the 4th, 6th, 8th and 10th set bits + * in @relmap, and the 4th, 6th, 8th and 10th bits of + * @orig (i.e. bits 3, 5, 7 and 9) were also set.
+ * + * When bit 11 is set in @orig, it means turn on the bit in + * @dst corresponding to whatever is the twelfth bit that is + * turned on in @relmap. In the above example, there were + * only ten bits turned on in @relmap (30..39), so the fact that + * bit 11 was set in @orig had no effect on @dst. + * + * Example [2] for bitmap_fold() + bitmap_onto(): + * Let's say @relmap has these ten bits set: + * 40 41 42 43 45 48 53 61 74 95 + * (for the curious, that's 40 plus the first ten terms of the + * Fibonacci sequence.) + * + * Further, let's say we use the following code, invoking + * bitmap_fold() then bitmap_onto(), as suggested above to + * avoid the possibility of an empty @dst result: + * + * unsigned long *tmp; // a temporary bitmap's bits + * + * bitmap_fold(tmp, orig, bitmap_weight(relmap, bits), bits); + * bitmap_onto(dst, tmp, relmap, bits); + * + * Then this table shows what various values of @dst would be, for + * various @orig's. I list the zero-based positions of each set bit. + * The tmp column shows the intermediate result, as computed by + * using bitmap_fold() to fold the @orig bitmap modulo ten + * (the weight of @relmap). + * + * @orig tmp @dst + * 0 0 40 + * 1 1 41 + * 9 9 95 + * 10 0 40 (*) + * 1 3 5 7 1 3 5 7 41 43 48 61 + * 0 1 2 3 4 0 1 2 3 4 40 41 42 43 45 + * 0 9 18 27 0 9 8 7 40 61 74 95 + * 0 10 20 30 0 40 + * 0 11 22 33 0 1 2 3 40 41 42 43 + * 0 12 24 36 0 2 4 6 40 42 45 53 + * 78 102 211 1 2 8 41 42 74 (*) + * + * (*) For these marked lines, if we hadn't first done bitmap_fold() + * into tmp, then the @dst result would have been empty. + * + * If either of @orig or @relmap is empty (no set bits), then @dst + * will be returned empty. + * + * If (as explained above) the only set bits in @orig are in positions + * m where m >= W (where W is the weight of @relmap), then @dst will + * once again be returned empty. + * + * All bits in @dst not set by the above rule are cleared.
+ */ +void bitmap_onto(unsigned long *dst, const unsigned long *orig, + const unsigned long *relmap, int bits) +{ + int n, m; /* same meaning as in above comment */ + + if (dst == orig) /* following doesn't handle inplace mappings */ + return; + bitmap_zero(dst, bits); + + /* + * The following code is a more efficient, but less + * obvious, equivalent to the loop: + * for (m = 0; m < bitmap_weight(relmap, bits); m++) { + * n = bitmap_ord_to_pos(relmap, m, bits); + * if (test_bit(m, orig)) + * set_bit(n, dst); + * } + */ + + m = 0; + for (n = find_first_bit(relmap, bits); + n < bits; + n = find_next_bit(relmap, bits, n + 1)) { + /* m == bitmap_pos_to_ord(relmap, n, bits) */ + if (test_bit(m, orig)) + set_bit(n, dst); + m++; + } +} +EXPORT_SYMBOL(bitmap_onto); + +/** + * bitmap_fold - fold larger bitmap into smaller, modulo specified size + * @dst: resulting smaller bitmap + * @orig: original larger bitmap + * @sz: specified size + * @bits: number of bits in each of these bitmaps + * + * For each bit oldbit in @orig, set bit oldbit mod @sz in @dst. + * Clear all other bits in @dst. See further the comment and + * Example [2] for bitmap_onto() for why and how to use this. + */ +void bitmap_fold(unsigned long *dst, const unsigned long *orig, + int sz, int bits) +{ + int oldbit; + + if (dst == orig) /* following doesn't handle inplace mappings */ + return; + bitmap_zero(dst, bits); + + for (oldbit = find_first_bit(orig, bits); + oldbit < bits; + oldbit = find_next_bit(orig, bits, oldbit + 1)) + set_bit(oldbit % sz, dst); +} +EXPORT_SYMBOL(bitmap_fold); + +/* + * Common code for bitmap_*_region() routines.
+ * bitmap: array of unsigned longs corresponding to the bitmap -- cgit v1.2.3 From 4c50bc0116cf3cc35e7152d6a8424b4db65f52d6 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:30 -0700 Subject: mempolicy: add MPOL_F_RELATIVE_NODES flag Adds another optional mode flag, MPOL_F_RELATIVE_NODES, which specifies that nodemasks passed via set_mempolicy() or mbind() should be considered relative to the current task's mems_allowed. When the mempolicy is created, the passed nodemask is folded and mapped onto the current task's mems_allowed. For example, consider a task using set_mempolicy() to pass MPOL_INTERLEAVE | MPOL_F_RELATIVE_NODES with a nodemask of 1-3. If current's mems_allowed is 4-7, the effective nodemask is 5-7 (the second, third, and fourth nodes of mems_allowed). If the same task is attached to a cpuset, the mempolicy nodemask is rebound each time the mems are changed. Some possible rebinds and results are: mems 1-3 -> 1-3, mems 1-7 -> 2-4, mems 1,5-6 -> 1,5-6, mems 1,5-7 -> 5-7. Likewise, the zonelist built for MPOL_BIND acts on the set of zones assigned to the resultant nodemask from the relative remap. In the MPOL_PREFERRED case, the preferred node becomes the first node of the relatively remapped nodemask. This mempolicy mode flag was conceived of by Paul Jackson .
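The rebind results listed above can be reproduced with a small userspace sketch of the fold-then-onto step that mpol_relative_nodemask() performs. It assumes a 64-bit word in place of nodemask_t; relative_nodemask(), node_range(), nth_set_bit() and NODE() are hypothetical names, not kernel API. Because folding leaves every index below the weight w of the allowed set, the two steps collapse into one loop in which user bit b selects the (b % w)-th allowed node.

```c
#include <assert.h>
#include <stdint.h>

/* Sketch only: a 64-bit word stands in for nodemask_t (up to 64 nodes). */
typedef uint64_t mask_t;
#define MASK_BITS 64
#define NODE(n) ((mask_t)1 << (n))

/* Nodes lo..hi (inclusive); assumes 0 <= lo <= hi < 64. */
static mask_t node_range(int lo, int hi)
{
	mask_t m = 0;

	for (int b = lo; b <= hi; b++)
		m |= NODE(b);
	return m;
}

/* Number of set bits in m. */
static int mask_weight(mask_t m)
{
	int w = 0;

	for (; m; m &= m - 1)
		w++;
	return w;
}

/* Position of the k-th (0-based) set bit of m, or -1 if there is none. */
static int nth_set_bit(mask_t m, int k)
{
	for (int b = 0; b < MASK_BITS; b++)
		if ((m & NODE(b)) && k-- == 0)
			return b;
	return -1;
}

/*
 * Fold the user's nodemask modulo the number of allowed nodes, then map
 * it onto the allowed set: user bit b selects the (b % w)-th allowed node.
 */
static mask_t relative_nodemask(mask_t user, mask_t allowed)
{
	int w = mask_weight(allowed);
	mask_t dst = 0;

	if (w == 0)
		return 0;	/* sketch choice: nothing to map onto */
	for (int b = 0; b < MASK_BITS; b++)
		if (user & NODE(b))
			dst |= NODE(nth_set_bit(allowed, b % w));
	return dst;
}
```

With a user nodemask of 1-3, relative_nodemask() against allowed nodes 4-7 gives 5-7, and against 1,5-6 it gives 1,5-6, matching the commit message.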
Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 3 ++- mm/mempolicy.c | 33 +++++++++++++++++++++++++++++++-- mm/shmem.c | 6 ++++++ 3 files changed, 39 insertions(+), 3 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 07350d7b8d96..02b11efd7066 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -25,12 +25,13 @@ enum { /* Flags for set_mempolicy */ #define MPOL_F_STATIC_NODES (1 << 15) +#define MPOL_F_RELATIVE_NODES (1 << 14) /* * MPOL_MODE_FLAGS is the union of all possible optional mode flags passed to * either set_mempolicy() or mbind(). */ -#define MPOL_MODE_FLAGS (MPOL_F_STATIC_NODES) +#define MPOL_MODE_FLAGS (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES) /* Flags for get_mempolicy */ #define MPOL_F_NODE (1<<0) /* return next IL mode instead of node mask */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d59b1e766aee..ffd3be66b255 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -136,7 +136,15 @@ static int is_valid_nodemask(nodemask_t *nodemask) static inline int mpol_store_user_nodemask(const struct mempolicy *pol) { - return pol->flags & MPOL_F_STATIC_NODES; + return pol->flags & (MPOL_F_STATIC_NODES | MPOL_F_RELATIVE_NODES); +} + +static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig, + const nodemask_t *rel) +{ + nodemask_t tmp; + nodes_fold(tmp, *orig, nodes_weight(*rel)); + nodes_onto(*ret, tmp, *rel); } /* Create a new policy */ @@ -157,7 +165,12 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, return ERR_PTR(-ENOMEM); atomic_set(&policy->refcnt, 1); cpuset_update_task_memory_state(); - nodes_and(cpuset_context_nmask, *nodes, cpuset_current_mems_allowed); + if (flags & MPOL_F_RELATIVE_NODES) + mpol_relative_nodemask(&cpuset_context_nmask, nodes, + &cpuset_current_mems_allowed); + else + 
nodes_and(cpuset_context_nmask, *nodes, + cpuset_current_mems_allowed); switch (mode) { case MPOL_INTERLEAVE: if (nodes_empty(*nodes) || nodes_empty(cpuset_context_nmask)) @@ -873,6 +886,9 @@ asmlinkage long sys_mbind(unsigned long start, unsigned long len, mode &= ~MPOL_MODE_FLAGS; if (mode >= MPOL_MAX) return -EINVAL; + if ((mode_flags & MPOL_F_STATIC_NODES) && + (mode_flags & MPOL_F_RELATIVE_NODES)) + return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; @@ -891,6 +907,8 @@ asmlinkage long sys_set_mempolicy(int mode, unsigned long __user *nmask, mode &= ~MPOL_MODE_FLAGS; if ((unsigned int)mode >= MPOL_MAX) return -EINVAL; + if ((flags & MPOL_F_STATIC_NODES) && (flags & MPOL_F_RELATIVE_NODES)) + return -EINVAL; err = get_nodes(&nodes, nmask, maxnode); if (err) return err; @@ -1745,10 +1763,12 @@ static void mpol_rebind_policy(struct mempolicy *pol, { nodemask_t tmp; int static_nodes; + int relative_nodes; if (!pol) return; static_nodes = pol->flags & MPOL_F_STATIC_NODES; + relative_nodes = pol->flags & MPOL_F_RELATIVE_NODES; if (!mpol_store_user_nodemask(pol) && nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) return; @@ -1761,6 +1781,9 @@ static void mpol_rebind_policy(struct mempolicy *pol, case MPOL_INTERLEAVE: if (static_nodes) nodes_and(tmp, pol->w.user_nodemask, *newmask); + else if (relative_nodes) + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, + newmask); else { nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed, *newmask); @@ -1783,6 +1806,10 @@ static void mpol_rebind_policy(struct mempolicy *pol, pol->v.preferred_node = node; else pol->v.preferred_node = -1; + } else if (relative_nodes) { + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, + newmask); + pol->v.preferred_node = first_node(tmp); } else { pol->v.preferred_node = node_remap(pol->v.preferred_node, pol->w.cpuset_mems_allowed, *newmask); @@ -1878,6 +1905,8 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) if (flags & 
MPOL_F_STATIC_NODES) p += sprintf(p, "%sstatic", need_bar++ ? "|" : ""); + if (flags & MPOL_F_RELATIVE_NODES) + p += sprintf(p, "%srelative", need_bar++ ? "|" : ""); } if (!nodes_empty(nodes)) { diff --git a/mm/shmem.c b/mm/shmem.c index 3e9fda0ca470..9435f298dd75 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1128,6 +1128,12 @@ static int shmem_parse_mpol(char *value, unsigned short *policy, if (flags) { if (!strcmp(flags, "static")) *mode_flags |= MPOL_F_STATIC_NODES; + if (!strcmp(flags, "relative")) + *mode_flags |= MPOL_F_RELATIVE_NODES; + + if ((*mode_flags & MPOL_F_STATIC_NODES) && + (*mode_flags & MPOL_F_RELATIVE_NODES)) + err = 1; } out: /* Restore string for error message */ -- cgit v1.2.3 From 65d66fc02ed9433b957588071b60425b12628e25 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:31 -0700 Subject: mempolicy: update NUMA memory policy documentation Updates Documentation/vm/numa_memory_policy.txt and Documentation/filesystems/tmpfs.txt to describe optional mempolicy mode flags. Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Cc: Randy Dunlap Signed-off-by: David Rientjes Signed-off-by: Paul Jackson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/filesystems/tmpfs.txt | 12 +++ Documentation/vm/numa_memory_policy.txt | 131 ++++++++++++++++++++++++-------- 2 files changed, 112 insertions(+), 31 deletions(-) diff --git a/Documentation/filesystems/tmpfs.txt b/Documentation/filesystems/tmpfs.txt index 145e44086358..222437efd75a 100644 --- a/Documentation/filesystems/tmpfs.txt +++ b/Documentation/filesystems/tmpfs.txt @@ -92,6 +92,18 @@ NodeList format is a comma-separated list of decimal numbers and ranges, a range being two hyphen-separated decimal numbers, the smallest and largest node numbers in the range. For example, mpol=bind:0-3,5,7,9-15 +NUMA memory allocation policies have optional flags that can be used in +conjunction with their modes. 
These optional flags can be specified +when tmpfs is mounted by appending them to the mode before the NodeList. +See Documentation/vm/numa_memory_policy.txt for a list of all available +memory allocation policy mode flags. + + =static is equivalent to MPOL_F_STATIC_NODES + =relative is equivalent to MPOL_F_RELATIVE_NODES + +For example, mpol=bind=static:NodeList is the equivalent of +an allocation policy of MPOL_BIND | MPOL_F_STATIC_NODES. + Note that trying to mount a tmpfs with an mpol option will fail if the running kernel does not support NUMA; and will fail if its nodelist specifies a node which is not online. If your system relies on that diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 1278e685d650..706410dfb9e5 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -135,9 +135,11 @@ most general to most specific: Components of Memory Policies - A Linux memory policy is a tuple consisting of a "mode" and an optional set - of nodes. The mode determine the behavior of the policy, while the - optional set of nodes can be viewed as the arguments to the behavior. + A Linux memory policy consists of a "mode", optional mode flags, and an + optional set of nodes. The mode determines the behavior of the policy, + the optional mode flags determine the behavior of the mode, and the + optional set of nodes can be viewed as the arguments to the policy + behavior. Internally, memory policies are implemented by a reference counted structure, struct mempolicy. Details of this structure will be discussed @@ -179,7 +181,8 @@ Components of Memory Policies on a non-shared region of the address space. However, see MPOL_PREFERRED below. - The Default mode does not use the optional set of nodes. + It is an error for the set of nodes specified for this policy to + be non-empty. MPOL_BIND: This mode specifies that memory must come from the set of nodes specified by the policy. Memory will be allocated from
Memory will be allocated from @@ -226,6 +229,80 @@ Components of Memory Policies the temporary interleaved system default policy works in this mode. + Linux memory policy supports the following optional mode flags: + + MPOL_F_STATIC_NODES: This flag specifies that the nodemask passed by + the user should not be remapped if the task or VMA's set of allowed + nodes changes after the memory policy has been defined. + + Without this flag, anytime a mempolicy is rebound because of a + change in the set of allowed nodes, the node (Preferred) or + nodemask (Bind, Interleave) is remapped to the new set of + allowed nodes. This may result in nodes being used that were + previously undesired. + + With this flag, if the user-specified nodes overlap with the + nodes allowed by the task's cpuset, then the memory policy is + applied to their intersection. If the two sets of nodes do not + overlap, the Default policy is used. + + For example, consider a task that is attached to a cpuset with + mems 1-3 that sets an Interleave policy over the same set. If + the cpuset's mems change to 3-5, the Interleave will now occur + over nodes 3, 4, and 5. With this flag, however, since only node + 3 is allowed from the user's nodemask, the "interleave" only + occurs over that node. If no nodes from the user's nodemask are + now allowed, the Default behavior is used. + + MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES. + + MPOL_F_RELATIVE_NODES: This flag specifies that the nodemask passed + by the user will be mapped relative to the set of the task or VMA's + set of allowed nodes. The kernel stores the user-passed nodemask, + and if the allowed nodes changes, then that original nodemask will + be remapped relative to the new set of allowed nodes. 
+ + Without this flag (and without MPOL_F_STATIC_NODES), anytime a + mempolicy is rebound because of a change in the set of allowed + nodes, the node (Preferred) or nodemask (Bind, Interleave) is + remapped to the new set of allowed nodes. That remap may not + preserve the relative nature of the user's passed nodemask to its + set of allowed nodes upon successive rebinds: a nodemask of + 1,3,5 may be remapped to 7-9 and then to 1-3 if the set of + allowed nodes is restored to its original state. + + With this flag, the remap is done so that the node numbers from + the user's passed nodemask are relative to the set of allowed + nodes. In other words, if nodes 0, 2, and 4 are set in the user's + nodemask, the policy will be applied over the first (and in the + Bind or Interleave case, the third and fifth) nodes in the set of + allowed nodes. The nodemask passed by the user represents nodes + relative to the task's or VMA's set of allowed nodes. + + If the user's nodemask includes nodes that are outside the range + of the new set of allowed nodes (for example, node 5 is set in + the user's nodemask when the set of allowed nodes is only 0-3), + then the remap wraps around modulo the number of allowed nodes and, + if not already set, sets the resulting node in the mempolicy nodemask. + + For example, consider a task that is attached to a cpuset with + mems 2-5 that sets an Interleave policy over the same set with + MPOL_F_RELATIVE_NODES. If the cpuset's mems change to 3-7, the + interleave now occurs over nodes 3,5-7. If the cpuset's mems + then change to 0,2-3,5, then the interleave occurs over nodes + 0,2-3,5. + + Thanks to the consistent remapping, applications preparing + nodemasks to specify memory policies using this flag should + disregard their current, actual cpuset-imposed memory placement + and prepare the nodemask as if they were always located on + memory nodes 0 to N-1, where N is the number of memory nodes the + policy is intended to manage.
Let the kernel then remap to the + set of memory nodes allowed by the task's cpuset, as that may + change over time. + + MPOL_F_RELATIVE_NODES cannot be used with MPOL_F_STATIC_NODES. + MEMORY POLICY APIs Linux supports 3 system calls for controlling memory policy. These APIS @@ -246,7 +323,9 @@ Set [Task] Memory Policy: Set's the calling task's "task/process memory policy" to mode specified by the 'mode' argument and the set of nodes defined by 'nmask'. 'nmask' points to a bit mask of node ids containing - at least 'maxnode' ids. + at least 'maxnode' ids. Optional mode flags may be passed by + combining the 'mode' argument with the flag (for example: + MPOL_INTERLEAVE | MPOL_F_STATIC_NODES). See the set_mempolicy(2) man page for more details @@ -298,29 +377,19 @@ MEMORY POLICIES AND CPUSETS Memory policies work within cpusets as described above. For memory policies that require a node or set of nodes, the nodes are restricted to the set of nodes whose memories are allowed by the cpuset constraints. If the nodemask -specified for the policy contains nodes that are not allowed by the cpuset, or -the intersection of the set of nodes specified for the policy and the set of -nodes with memory is the empty set, the policy is considered invalid -and cannot be installed. - -The interaction of memory policies and cpusets can be problematic for a -couple of reasons: - -1) the memory policy APIs take physical node id's as arguments. As mentioned - above, it is illegal to specify nodes that are not allowed in the cpuset. - The application must query the allowed nodes using the get_mempolicy() - API with the MPOL_F_MEMS_ALLOWED flag to determine the allowed nodes and - restrict itself to those nodes. However, the resources available to a - cpuset can be changed by the system administrator, or a workload manager - application, at any time. So, a task may still get errors attempting to - specify policy nodes, and must query the allowed memories again. 
- -2) when tasks in two cpusets share access to a memory region, such as shared - memory segments created by shmget() of mmap() with the MAP_ANONYMOUS and - MAP_SHARED flags, and any of the tasks install shared policy on the region, - only nodes whose memories are allowed in both cpusets may be used in the - policies. Obtaining this information requires "stepping outside" the - memory policy APIs to use the cpuset information and requires that one - know in what cpusets other task might be attaching to the shared region. - Furthermore, if the cpusets' allowed memory sets are disjoint, "local" - allocation is the only valid policy. +specified for the policy contains nodes that are not allowed by the cpuset and +MPOL_F_RELATIVE_NODES is not used, the intersection of the set of nodes +specified for the policy and the set of nodes with memory is used. If the +result is the empty set, the policy is considered invalid and cannot be +installed. If MPOL_F_RELATIVE_NODES is used, the policy's nodes are mapped +onto and folded into the task's set of allowed nodes as previously described. + +The interaction of memory policies and cpusets can be problematic when tasks +in two cpusets share access to a memory region, such as shared memory segments +created by shmget() of mmap() with the MAP_ANONYMOUS and MAP_SHARED flags, and +any of the tasks install shared policy on the region, only nodes whose +memories are allowed in both cpusets may be used in the policies. Obtaining +this information requires "stepping outside" the memory policy APIs to use the +cpuset information and requires that one know in what cpusets other task might +be attaching to the shared region. Furthermore, if the cpusets' allowed +memory sets are disjoint, "local" allocation is the only valid policy. 
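The MPOL_F_RELATIVE_NODES remapping described in the documentation above can be sketched in user space. This is a minimal model, assuming a hypothetical 64-bit `toy_nodemask` in place of `nodemask_t`; the `toy_*` helper names are also hypothetical. It follows the two-step shape of `mpol_relative_nodemask()` later in this series: first fold the user's mask modulo the weight of the allowed set, then map it onto the allowed set, in the style of `nodes_fold()` followed by `nodes_onto()`.

```c
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: bit n set means node n is present. */
typedef uint64_t toy_nodemask;

/* Number of nodes in the mask (like nodes_weight()). */
static int toy_weight(toy_nodemask m)
{
	int w = 0;

	for (; m; m &= m - 1)	/* clear the lowest set bit each round */
		w++;
	return w;
}

/* Like nodes_fold(): bit n of orig becomes bit (n % sz) of the result. */
static toy_nodemask toy_fold(toy_nodemask orig, int sz)
{
	toy_nodemask out = 0;

	if (sz <= 0)		/* empty allowed set: nothing to fold onto */
		return 0;
	for (int n = 0; n < 64; n++)
		if (orig & ((toy_nodemask)1 << n))
			out |= (toy_nodemask)1 << (n % sz);
	return out;
}

/* Like nodes_onto(): bit m of orig selects the m-th (0-based) set bit of rel. */
static toy_nodemask toy_onto(toy_nodemask orig, toy_nodemask rel)
{
	toy_nodemask out = 0;
	int idx = 0;		/* index of the current set bit within rel */

	for (int n = 0; n < 64; n++) {
		if (!(rel & ((toy_nodemask)1 << n)))
			continue;
		if (orig & ((toy_nodemask)1 << idx))
			out |= (toy_nodemask)1 << n;
		idx++;
	}
	return out;
}

/* Fold, then map onto: the user's mask is read relative to 'allowed'. */
static toy_nodemask toy_relative(toy_nodemask user, toy_nodemask allowed)
{
	return toy_onto(toy_fold(user, toy_weight(allowed)), allowed);
}
```

Under these fold-then-onto semantics, a user mask of nodes 0,2,4 selects the first, third and fifth allowed nodes. A user mask of 2-5 evaluated against mems 3-7 gives nodes 3,5-7, and against mems 0,2-3,5 it gives nodes 0,2-3,5. These results differ from the node lists quoted in the cpuset example above, so that example is worth re-deriving before relying on it.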
-- cgit v1.2.3 From 1d0d2680a01c4f9e292ec6d4714884da939053a1 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:32 -0700 Subject: mempolicy: move rebind functions Move the mpol_rebind_{policy,task,mm}() functions after mpol_new() to avoid having to declare function prototypes. Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 185 ++++++++++++++++++++++++++++----------------------------- 1 file changed, 91 insertions(+), 94 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index ffd3be66b255..d44c524e5ae4 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -110,9 +110,6 @@ struct mempolicy default_policy = { .policy = MPOL_DEFAULT, }; -static void mpol_rebind_policy(struct mempolicy *pol, - const nodemask_t *newmask); - /* Check that the nodemask contains at least one populated zone */ static int is_valid_nodemask(nodemask_t *nodemask) { @@ -203,6 +200,97 @@ free: return ERR_PTR(-EINVAL); } +/* Migrate a policy to a different set of nodes */ +static void mpol_rebind_policy(struct mempolicy *pol, + const nodemask_t *newmask) +{ + nodemask_t tmp; + int static_nodes; + int relative_nodes; + + if (!pol) + return; + static_nodes = pol->flags & MPOL_F_STATIC_NODES; + relative_nodes = pol->flags & MPOL_F_RELATIVE_NODES; + if (!mpol_store_user_nodemask(pol) && + nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) + return; + + switch (pol->policy) { + case MPOL_DEFAULT: + break; + case MPOL_BIND: + /* Fall through */ + case MPOL_INTERLEAVE: + if (static_nodes) + nodes_and(tmp, pol->w.user_nodemask, *newmask); + else if (relative_nodes) + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, + newmask); + else { + nodes_remap(tmp, pol->v.nodes, + pol->w.cpuset_mems_allowed, *newmask); + pol->w.cpuset_mems_allowed = *newmask; + } + pol->v.nodes = tmp; + if (!node_isset(current->il_next, tmp)) { + current->il_next = 
next_node(current->il_next, tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = first_node(tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = numa_node_id(); + } + break; + case MPOL_PREFERRED: + if (static_nodes) { + int node = first_node(pol->w.user_nodemask); + + if (node_isset(node, *newmask)) + pol->v.preferred_node = node; + else + pol->v.preferred_node = -1; + } else if (relative_nodes) { + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, + newmask); + pol->v.preferred_node = first_node(tmp); + } else { + pol->v.preferred_node = node_remap(pol->v.preferred_node, + pol->w.cpuset_mems_allowed, *newmask); + pol->w.cpuset_mems_allowed = *newmask; + } + break; + default: + BUG(); + break; + } +} + +/* + * Wrapper for mpol_rebind_policy() that just requires task + * pointer, and updates task mempolicy. + */ + +void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new) +{ + mpol_rebind_policy(tsk->mempolicy, new); +} + +/* + * Rebind each vma in mm to new nodemask. + * + * Call holding a reference to mm. Takes mm->mmap_sem during call. 
+ */ + +void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) +{ + struct vm_area_struct *vma; + + down_write(&mm->mmap_sem); + for (vma = mm->mmap; vma; vma = vma->vm_next) + mpol_rebind_policy(vma->vm_policy, new); + up_write(&mm->mmap_sem); +} + static void gather_stats(struct page *, void *, int pte_dirty); static void migrate_page_add(struct page *page, struct list_head *pagelist, unsigned long flags); @@ -1757,97 +1845,6 @@ void numa_default_policy(void) do_set_mempolicy(MPOL_DEFAULT, 0, NULL); } -/* Migrate a policy to a different set of nodes */ -static void mpol_rebind_policy(struct mempolicy *pol, - const nodemask_t *newmask) -{ - nodemask_t tmp; - int static_nodes; - int relative_nodes; - - if (!pol) - return; - static_nodes = pol->flags & MPOL_F_STATIC_NODES; - relative_nodes = pol->flags & MPOL_F_RELATIVE_NODES; - if (!mpol_store_user_nodemask(pol) && - nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) - return; - - switch (pol->policy) { - case MPOL_DEFAULT: - break; - case MPOL_BIND: - /* Fall through */ - case MPOL_INTERLEAVE: - if (static_nodes) - nodes_and(tmp, pol->w.user_nodemask, *newmask); - else if (relative_nodes) - mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, - newmask); - else { - nodes_remap(tmp, pol->v.nodes, - pol->w.cpuset_mems_allowed, *newmask); - pol->w.cpuset_mems_allowed = *newmask; - } - pol->v.nodes = tmp; - if (!node_isset(current->il_next, tmp)) { - current->il_next = next_node(current->il_next, tmp); - if (current->il_next >= MAX_NUMNODES) - current->il_next = first_node(tmp); - if (current->il_next >= MAX_NUMNODES) - current->il_next = numa_node_id(); - } - break; - case MPOL_PREFERRED: - if (static_nodes) { - int node = first_node(pol->w.user_nodemask); - - if (node_isset(node, *newmask)) - pol->v.preferred_node = node; - else - pol->v.preferred_node = -1; - } else if (relative_nodes) { - mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, - newmask); - pol->v.preferred_node = first_node(tmp); - } else { - 
pol->v.preferred_node = node_remap(pol->v.preferred_node, - pol->w.cpuset_mems_allowed, *newmask); - pol->w.cpuset_mems_allowed = *newmask; - } - break; - default: - BUG(); - break; - } -} - -/* - * Wrapper for mpol_rebind_policy() that just requires task - * pointer, and updates task mempolicy. - */ - -void mpol_rebind_task(struct task_struct *tsk, const nodemask_t *new) -{ - mpol_rebind_policy(tsk->mempolicy, new); -} - -/* - * Rebind each vma in mm to new nodemask. - * - * Call holding a reference to mm. Takes mm->mmap_sem during call. - */ - -void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) -{ - struct vm_area_struct *vma; - - down_write(&mm->mmap_sem); - for (vma = mm->mmap; vma; vma = vma->vm_next) - mpol_rebind_policy(vma->vm_policy, new); - up_write(&mm->mmap_sem); -} - /* * Display pages allocated per node and memory policy via /proc. */ -- cgit v1.2.3 From 37012946da940521fb997a758a219d2f1ab56e51 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:33 -0700 Subject: mempolicy: create mempolicy_operations structure Create a mempolicy_operations structure that currently points to two functions[*] for the various modes: int (*create)(struct mempolicy *, const nodemask_t *); void (*rebind)(struct mempolicy *, const nodemask_t *); This splits the implementation for the various modes out of two large functions, mpol_new() and mpol_rebind_policy(). Eventually it may be beneficial to add additional functions to accommodate the existing switch() statements in mm/mempolicy.c. [*] The ->create() function for MPOL_DEFAULT is currently NULL since no struct mempolicy is dynamically allocated.
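The table-driven dispatch this commit introduces can be sketched in user space. Everything here is hypothetical and simplified. The `toy_*` names and modes stand in for the kernel's, the node mask is a single `unsigned long`, and the rebind handlers only intersect with the new mask rather than implementing the kernel's remap, static and relative logic. The sketch only illustrates two things: replacing a `switch (mode)` with designated-initializer table entries, and leaving `->create()` NULL for the default mode.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical modes for this sketch; not the kernel's MPOL_* values. */
enum toy_mode { TOY_DEFAULT, TOY_PREFERRED, TOY_BIND, TOY_INTERLEAVE, TOY_MAX };

struct toy_policy {
	enum toy_mode mode;
	unsigned long nodes;	/* bit n set: node n is in the policy */
	int preferred;		/* -1: local allocation */
};

struct toy_policy_ops {
	int (*create)(struct toy_policy *pol, const unsigned long *nodes);
	void (*rebind)(struct toy_policy *pol, unsigned long newmask);
};

static int toy_first_node(unsigned long mask)
{
	for (int n = 0; n < (int)(8 * sizeof(mask)); n++)
		if (mask & (1UL << n))
			return n;
	return -1;
}

static int toy_create_nodemask(struct toy_policy *pol, const unsigned long *nodes)
{
	if (!nodes || *nodes == 0)
		return -EINVAL;		/* bind/interleave need nodes */
	pol->nodes = *nodes;
	return 0;
}

static int toy_create_preferred(struct toy_policy *pol, const unsigned long *nodes)
{
	if (!nodes) {
		pol->preferred = -1;	/* NULL mask: local allocation */
		return 0;
	}
	if (*nodes == 0)
		return -EINVAL;
	pol->preferred = toy_first_node(*nodes);
	return 0;
}

static void toy_rebind_default(struct toy_policy *pol, unsigned long newmask)
{
	(void)pol;
	(void)newmask;			/* nothing to rebind */
}

/* Simplified: keep only the nodes that are still allowed. */
static void toy_rebind_nodemask(struct toy_policy *pol, unsigned long newmask)
{
	pol->nodes &= newmask;
}

/* Simplified: fall back to local allocation if the node went away. */
static void toy_rebind_preferred(struct toy_policy *pol, unsigned long newmask)
{
	if (pol->preferred >= 0 && !(newmask & (1UL << pol->preferred)))
		pol->preferred = -1;
}

static const struct toy_policy_ops toy_ops[TOY_MAX] = {
	[TOY_DEFAULT]    = { .rebind = toy_rebind_default },	/* no ->create() */
	[TOY_PREFERRED]  = { .create = toy_create_preferred,
			     .rebind = toy_rebind_preferred },
	[TOY_BIND]       = { .create = toy_create_nodemask,
			     .rebind = toy_rebind_nodemask },
	[TOY_INTERLEAVE] = { .create = toy_create_nodemask,
			     .rebind = toy_rebind_nodemask },
};

/* One table lookup replaces a switch (mode) in each caller. */
static int toy_policy_init(struct toy_policy *pol, enum toy_mode mode,
			   const unsigned long *nodes)
{
	if ((unsigned)mode >= TOY_MAX)
		return -EINVAL;
	pol->mode = mode;
	pol->nodes = 0;
	pol->preferred = -1;
	if (!toy_ops[mode].create)
		return 0;		/* default mode: nothing to set up */
	return toy_ops[mode].create(pol, nodes);
}

static void toy_policy_rebind(struct toy_policy *pol, unsigned long newmask)
{
	toy_ops[pol->mode].rebind(pol, newmask);
}
```

With this layout, supporting another mode means adding one table entry instead of editing every `switch()`, which is the direction the commit message describes.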
[Lee.Schermerhorn@hp.com: fix regression in the package mempolicy regression tests] Signed-off-by: David Rientjes Cc: Paul Jackson Cc: Christoph Lameter Cc: Andi Kleen Signed-off-by: Lee Schermerhorn Cc: Eric Whitney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 233 ++++++++++++++++++++++++++++++++++----------------------- 1 file changed, 140 insertions(+), 93 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index d44c524e5ae4..a94d994eaaa8 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -63,7 +63,6 @@ grows down? make bind policy root only? It can trigger oom much faster and the kernel is not always grateful with that. - could replace all the switch()es with a mempolicy_ops structure. */ #include @@ -110,8 +109,13 @@ struct mempolicy default_policy = { .policy = MPOL_DEFAULT, }; +static const struct mempolicy_operations { + int (*create)(struct mempolicy *pol, const nodemask_t *nodes); + void (*rebind)(struct mempolicy *pol, const nodemask_t *nodes); +} mpol_ops[MPOL_MAX]; + /* Check that the nodemask contains at least one populated zone */ -static int is_valid_nodemask(nodemask_t *nodemask) +static int is_valid_nodemask(const nodemask_t *nodemask) { int nd, k; @@ -144,125 +148,151 @@ static void mpol_relative_nodemask(nodemask_t *ret, const nodemask_t *orig, nodes_onto(*ret, tmp, *rel); } +static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes) +{ + if (nodes_empty(*nodes)) + return -EINVAL; + pol->v.nodes = *nodes; + return 0; +} + +static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) +{ + if (!nodes) + pol->v.preferred_node = -1; /* local allocation */ + else if (nodes_empty(*nodes)) + return -EINVAL; /* no allowed nodes */ + else + pol->v.preferred_node = first_node(*nodes); + return 0; +} + +static int mpol_new_bind(struct mempolicy *pol, const nodemask_t *nodes) +{ + if (!is_valid_nodemask(nodes)) + return -EINVAL; + pol->v.nodes = *nodes; + return 0; +} + /* 
Create a new policy */ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, nodemask_t *nodes) { struct mempolicy *policy; nodemask_t cpuset_context_nmask; + int localalloc = 0; + int ret; pr_debug("setting mode %d flags %d nodes[0] %lx\n", mode, flags, nodes ? nodes_addr(*nodes)[0] : -1); if (mode == MPOL_DEFAULT) - return (nodes && nodes_weight(*nodes)) ? ERR_PTR(-EINVAL) : - NULL; + return NULL; + if (!nodes || nodes_empty(*nodes)) { + if (mode != MPOL_PREFERRED) + return ERR_PTR(-EINVAL); + localalloc = 1; /* special case: no mode flags */ + } policy = kmem_cache_alloc(policy_cache, GFP_KERNEL); if (!policy) return ERR_PTR(-ENOMEM); atomic_set(&policy->refcnt, 1); - cpuset_update_task_memory_state(); - if (flags & MPOL_F_RELATIVE_NODES) - mpol_relative_nodemask(&cpuset_context_nmask, nodes, - &cpuset_current_mems_allowed); - else - nodes_and(cpuset_context_nmask, *nodes, - cpuset_current_mems_allowed); - switch (mode) { - case MPOL_INTERLEAVE: - if (nodes_empty(*nodes) || nodes_empty(cpuset_context_nmask)) - goto free; - policy->v.nodes = cpuset_context_nmask; - break; - case MPOL_PREFERRED: - policy->v.preferred_node = first_node(cpuset_context_nmask); - if (policy->v.preferred_node >= MAX_NUMNODES) - goto free; - break; - case MPOL_BIND: - if (!is_valid_nodemask(&cpuset_context_nmask)) - goto free; - policy->v.nodes = cpuset_context_nmask; - break; - default: - BUG(); - } policy->policy = mode; - policy->flags = flags; - if (mpol_store_user_nodemask(policy)) - policy->w.user_nodemask = *nodes; - else - policy->w.cpuset_mems_allowed = cpuset_mems_allowed(current); + + if (!localalloc) { + policy->flags = flags; + cpuset_update_task_memory_state(); + if (flags & MPOL_F_RELATIVE_NODES) + mpol_relative_nodemask(&cpuset_context_nmask, nodes, + &cpuset_current_mems_allowed); + else + nodes_and(cpuset_context_nmask, *nodes, + cpuset_current_mems_allowed); + if (mpol_store_user_nodemask(policy)) + policy->w.user_nodemask = *nodes; + else + 
policy->w.cpuset_mems_allowed = + cpuset_mems_allowed(current); + } + + ret = mpol_ops[mode].create(policy, + localalloc ? NULL : &cpuset_context_nmask); + if (ret < 0) { + kmem_cache_free(policy_cache, policy); + return ERR_PTR(ret); + } return policy; +} + +static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes) +{ +} + +static void mpol_rebind_nodemask(struct mempolicy *pol, + const nodemask_t *nodes) +{ + nodemask_t tmp; + + if (pol->flags & MPOL_F_STATIC_NODES) + nodes_and(tmp, pol->w.user_nodemask, *nodes); + else if (pol->flags & MPOL_F_RELATIVE_NODES) + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); + else { + nodes_remap(tmp, pol->v.nodes, pol->w.cpuset_mems_allowed, + *nodes); + pol->w.cpuset_mems_allowed = *nodes; + } -free: - kmem_cache_free(policy_cache, policy); - return ERR_PTR(-EINVAL); + pol->v.nodes = tmp; + if (!node_isset(current->il_next, tmp)) { + current->il_next = next_node(current->il_next, tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = first_node(tmp); + if (current->il_next >= MAX_NUMNODES) + current->il_next = numa_node_id(); + } +} + +static void mpol_rebind_preferred(struct mempolicy *pol, + const nodemask_t *nodes) +{ + nodemask_t tmp; + + /* + * check 'STATIC_NODES first, as preferred_node == -1 may be + * a temporary, "fallback" state for this policy. 
+ */ + if (pol->flags & MPOL_F_STATIC_NODES) { + int node = first_node(pol->w.user_nodemask); + + if (node_isset(node, *nodes)) + pol->v.preferred_node = node; + else + pol->v.preferred_node = -1; + } else if (pol->v.preferred_node == -1) { + return; /* no remap required for explicit local alloc */ + } else if (pol->flags & MPOL_F_RELATIVE_NODES) { + mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); + pol->v.preferred_node = first_node(tmp); + } else { + pol->v.preferred_node = node_remap(pol->v.preferred_node, + pol->w.cpuset_mems_allowed, + *nodes); + pol->w.cpuset_mems_allowed = *nodes; + } } /* Migrate a policy to a different set of nodes */ static void mpol_rebind_policy(struct mempolicy *pol, const nodemask_t *newmask) { - nodemask_t tmp; - int static_nodes; - int relative_nodes; - if (!pol) return; - static_nodes = pol->flags & MPOL_F_STATIC_NODES; - relative_nodes = pol->flags & MPOL_F_RELATIVE_NODES; if (!mpol_store_user_nodemask(pol) && nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) return; - - switch (pol->policy) { - case MPOL_DEFAULT: - break; - case MPOL_BIND: - /* Fall through */ - case MPOL_INTERLEAVE: - if (static_nodes) - nodes_and(tmp, pol->w.user_nodemask, *newmask); - else if (relative_nodes) - mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, - newmask); - else { - nodes_remap(tmp, pol->v.nodes, - pol->w.cpuset_mems_allowed, *newmask); - pol->w.cpuset_mems_allowed = *newmask; - } - pol->v.nodes = tmp; - if (!node_isset(current->il_next, tmp)) { - current->il_next = next_node(current->il_next, tmp); - if (current->il_next >= MAX_NUMNODES) - current->il_next = first_node(tmp); - if (current->il_next >= MAX_NUMNODES) - current->il_next = numa_node_id(); - } - break; - case MPOL_PREFERRED: - if (static_nodes) { - int node = first_node(pol->w.user_nodemask); - - if (node_isset(node, *newmask)) - pol->v.preferred_node = node; - else - pol->v.preferred_node = -1; - } else if (relative_nodes) { - mpol_relative_nodemask(&tmp, 
&pol->w.user_nodemask, - newmask); - pol->v.preferred_node = first_node(tmp); - } else { - pol->v.preferred_node = node_remap(pol->v.preferred_node, - pol->w.cpuset_mems_allowed, *newmask); - pol->w.cpuset_mems_allowed = *newmask; - } - break; - default: - BUG(); - break; - } + mpol_ops[pol->policy].rebind(pol, newmask); } /* @@ -291,6 +321,24 @@ void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new) up_write(&mm->mmap_sem); } +static const struct mempolicy_operations mpol_ops[MPOL_MAX] = { + [MPOL_DEFAULT] = { + .rebind = mpol_rebind_default, + }, + [MPOL_INTERLEAVE] = { + .create = mpol_new_interleave, + .rebind = mpol_rebind_nodemask, + }, + [MPOL_PREFERRED] = { + .create = mpol_new_preferred, + .rebind = mpol_rebind_preferred, + }, + [MPOL_BIND] = { + .create = mpol_new_bind, + .rebind = mpol_rebind_nodemask, + }, +}; + static void gather_stats(struct page *, void *, int pte_dirty); static void migrate_page_add(struct page *page, struct list_head *pagelist, unsigned long flags); @@ -1848,7 +1896,6 @@ void numa_default_policy(void) /* * Display pages allocated per node and memory policy via /proc. */ - static const char * const policy_types[] = { "default", "prefer", "bind", "interleave" }; -- cgit v1.2.3 From 3842b46de626d1a3c44ad280d67ab0a4dc047d13 Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:34 -0700 Subject: mempolicy: small header file cleanup Removes forward definition of vm_area_struct in linux/mempolicy.h. We already get it from the linux/slab.h -> linux/gfp.h include. Removes the unused mpol_set_vma_default() macro from linux/mempolicy.h. Removes the extern definition of default_policy since it is only referenced, as it should be, in mm/mempolicy.c. 
Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 8 -------- 1 file changed, 8 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 02b11efd7066..319fd342b1b7 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -52,7 +52,6 @@ enum { #include #include -struct vm_area_struct; struct mm_struct; #ifdef CONFIG_NUMA @@ -131,10 +130,6 @@ static inline int mpol_equal(struct mempolicy *a, struct mempolicy *b) return __mpol_equal(a, b); } -/* Could later add inheritance of the process policy here. */ - -#define mpol_set_vma_default(vma) ((vma)->vm_policy = NULL) - /* * Tree of shared policies for a shared memory region. * Maintain the policies in a pseudo mm that contains vmas. The vmas @@ -170,7 +165,6 @@ extern void mpol_rebind_task(struct task_struct *tsk, extern void mpol_rebind_mm(struct mm_struct *mm, nodemask_t *new); extern void mpol_fix_fork_child_flag(struct task_struct *p); -extern struct mempolicy default_policy; extern struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol, nodemask_t **nodemask); @@ -196,8 +190,6 @@ static inline int mpol_equal(struct mempolicy *a, struct mempolicy *b) return 1; } -#define mpol_set_vma_default(vma) do {} while(0) - static inline void mpol_free(struct mempolicy *p) { } -- cgit v1.2.3 From 3e1f064562fcff7bf3856bc1d00dfa84d4f121cc Mon Sep 17 00:00:00 2001 From: David Rientjes Date: Mon, 28 Apr 2008 02:12:34 -0700 Subject: mempolicy: disallow static or relative flags for local preferred mode MPOL_F_STATIC_NODES and MPOL_F_RELATIVE_NODES don't mean anything for MPOL_PREFERRED policies that were created with an empty nodemask (for purely local allocations). 
They'll never be invalidated because the allowed mems of a task changes or need to be rebound relative to a cpuset's placement. Also fixes a bug identified by Lee Schermerhorn that disallowed empty nodemasks to be passed to MPOL_PREFERRED to specify local allocations. [A different, somewhat incomplete, patch already existed in 25-rc5-mm1.] Cc: Paul Jackson Cc: Christoph Lameter Cc: Lee Schermerhorn Cc: Andi Kleen Cc: Randy Dunlap Signed-off-by: Lee Schermerhorn Signed-off-by: David Rientjes Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 16 +++++++++++-- mm/mempolicy.c | 42 ++++++++++++++++++++------------- 2 files changed, 40 insertions(+), 18 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 706410dfb9e5..1c7dd21623d2 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -205,6 +205,12 @@ Components of Memory Policies local allocation for a specific range of addresses--i.e. for VMA policies. + It is possible for the user to specify that local allocation is + always preferred by passing an empty nodemask with this mode. + If an empty nodemask is passed, the policy cannot use the + MPOL_F_STATIC_NODES or MPOL_F_RELATIVE_NODES flags described + below. + MPOL_INTERLEAVED: This mode specifies that page allocations be interleaved, on a page granularity, across the nodes specified in the policy. This mode also behaves slightly differently, based on @@ -254,7 +260,10 @@ Components of Memory Policies occurs over that node. If no nodes from the user's nodemask are now allowed, the Default behavior is used. - MPOL_F_STATIC_NODES cannot be used with MPOL_F_RELATIVE_NODES. + MPOL_F_STATIC_NODES cannot be combined with the + MPOL_F_RELATIVE_NODES flag. It also cannot be used for + MPOL_PREFERRED policies that were created with an empty nodemask + (local allocation). 
MPOL_F_RELATIVE_NODES: This flag specifies that the nodemask passed by the user will be mapped relative to the set of the task or VMA's @@ -301,7 +310,10 @@ Components of Memory Policies set of memory nodes allowed by the task's cpuset, as that may change over time. - MPOL_F_RELATIVE_NODES cannot be used with MPOL_F_STATIC_NODES. + MPOL_F_RELATIVE_NODES cannot be combined with the + MPOL_F_STATIC_NODES flag. It also cannot be used for + MPOL_PREFERRED policies that were created with an empty nodemask + (local allocation). MEMORY POLICY APIs diff --git a/mm/mempolicy.c b/mm/mempolicy.c index a94d994eaaa8..c1b907789d84 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -181,27 +181,43 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, { struct mempolicy *policy; nodemask_t cpuset_context_nmask; - int localalloc = 0; int ret; pr_debug("setting mode %d flags %d nodes[0] %lx\n", mode, flags, nodes ? nodes_addr(*nodes)[0] : -1); - if (mode == MPOL_DEFAULT) - return NULL; - if (!nodes || nodes_empty(*nodes)) { - if (mode != MPOL_PREFERRED) + if (mode == MPOL_DEFAULT) { + if (nodes && !nodes_empty(*nodes)) return ERR_PTR(-EINVAL); - localalloc = 1; /* special case: no mode flags */ + return NULL; } + VM_BUG_ON(!nodes); + + /* + * MPOL_PREFERRED cannot be used with MPOL_F_STATIC_NODES or + * MPOL_F_RELATIVE_NODES if the nodemask is empty (local allocation). + * All other modes require a valid pointer to a non-empty nodemask. 
+ */ + if (mode == MPOL_PREFERRED) { + if (nodes_empty(*nodes)) { + if (((flags & MPOL_F_STATIC_NODES) || + (flags & MPOL_F_RELATIVE_NODES))) + return ERR_PTR(-EINVAL); + nodes = NULL; /* flag local alloc */ + } + } else if (nodes_empty(*nodes)) + return ERR_PTR(-EINVAL); policy = kmem_cache_alloc(policy_cache, GFP_KERNEL); if (!policy) return ERR_PTR(-ENOMEM); atomic_set(&policy->refcnt, 1); policy->policy = mode; + policy->flags = flags; - if (!localalloc) { - policy->flags = flags; + if (nodes) { + /* + * cpuset related setup doesn't apply to local allocation + */ cpuset_update_task_memory_state(); if (flags & MPOL_F_RELATIVE_NODES) mpol_relative_nodemask(&cpuset_context_nmask, nodes, @@ -217,7 +233,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, } ret = mpol_ops[mode].create(policy, - localalloc ? NULL : &cpuset_context_nmask); + nodes ? &cpuset_context_nmask : NULL); if (ret < 0) { kmem_cache_free(policy_cache, policy); return ERR_PTR(ret); @@ -259,10 +275,6 @@ static void mpol_rebind_preferred(struct mempolicy *pol, { nodemask_t tmp; - /* - * check 'STATIC_NODES first, as preferred_node == -1 may be - * a temporary, "fallback" state for this policy. 
*/ if (pol->flags & MPOL_F_STATIC_NODES) { int node = first_node(pol->w.user_nodemask); @@ -270,12 +282,10 @@ static void mpol_rebind_preferred(struct mempolicy *pol, pol->v.preferred_node = node; else pol->v.preferred_node = -1; - } else if (pol->v.preferred_node == -1) { - return; /* no remap required for explicit local alloc */ } else if (pol->flags & MPOL_F_RELATIVE_NODES) { mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); pol->v.preferred_node = first_node(tmp); - } else { + } else if (pol->v.preferred_node != -1) { pol->v.preferred_node = node_remap(pol->v.preferred_node, pol->w.cpuset_mems_allowed, *nodes); -- cgit v1.2.3 From a43361cf3cb6fb6431fdbfb0f3ef26a334826160 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:12:36 -0700 Subject: mempolicy: fix parsing of tmpfs mpol mount option Parsing of new mode flags in the tmpfs mpol mount option is slightly broken: Setting a valid flag works OK: #mount -o remount,mpol=bind=static:1-2 /dev/shm #mount ... tmpfs on /dev/shm type tmpfs (rw,mpol=bind=static:1-2) ... However, we can't remove them or change them, once we've set a valid flag: #mount -o remount,mpol=bind:1-2 /dev/shm #mount ... tmpfs on /dev/shm type tmpfs (rw,mpol=bind:1-2) ... It SAYS it removed it, but that's just a copy of the input string. If we now try to set it to a different flag, we get: #mount -o remount,mpol=bind=relative:1-2 /dev/shm mount: /dev/shm not mounted already, or bad option And on the console, we see: tmpfs: Bad value 'bind' for mount option 'mpol' ^ lost remainder of string Furthermore, bogus flags are accepted without error. Granted, they are a no-op: #mount -o remount,mpol=interleave=foo:0-3 /dev/shm #mount ... tmpfs on /dev/shm type tmpfs (rw,mpol=interleave=foo:0-3) Again, that's just a copy of the input string shown by the mount command. This patch fixes the behavior by pre-zeroing the flags so that only one of the mutually exclusive flags can be set at one time.
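The flag-parsing rules described for the tmpfs `mpol=` option can be sketched in user space:

- Reset the flags first.
- Accept exactly one of "static" or "relative".
- Reject anything else.
- Put the split characters back so the original string can be echoed in an error message.

This is a hedged model shaped after the `shmem_parse_mpol()` hunk in this patch. It covers only the flag part of a `mode[=flags][:nodelist]` value, and `TOY_F_STATIC`/`TOY_F_RELATIVE` are hypothetical stand-ins for `MPOL_F_STATIC_NODES`/`MPOL_F_RELATIVE_NODES`.

```c
#include <string.h>

#define TOY_F_STATIC   0x1	/* hypothetical stand-in for MPOL_F_STATIC_NODES */
#define TOY_F_RELATIVE 0x2	/* hypothetical stand-in for MPOL_F_RELATIVE_NODES */

/*
 * Parse the flag part of "mode[=flags][:nodelist]" in place.
 * Returns 0 on success, 1 on error. *mode_flags is always reset first,
 * so a remount without flags clears a previously set flag.
 */
static int toy_parse_mode_flags(char *value, unsigned short *mode_flags)
{
	char *nodelist = strchr(value, ':');
	char *flags;
	int err = 0;

	if (nodelist)
		*nodelist++ = '\0';	/* temporarily split off the nodelist */
	flags = strchr(value, '=');
	if (flags)
		*flags++ = '\0';	/* temporarily split off the flags */

	*mode_flags = 0;		/* pre-zero: never inherit old flags */
	if (flags) {
		/* Only two mutually exclusive flags exist, so test with else-if. */
		if (!strcmp(flags, "static"))
			*mode_flags |= TOY_F_STATIC;
		else if (!strcmp(flags, "relative"))
			*mode_flags |= TOY_F_RELATIVE;
		else
			err = 1;	/* unrecognized flag */
	}

	/* Restore the string so the caller can print it unchanged. */
	if (nodelist)
		*--nodelist = ':';
	if (flags)
		*--flags = '=';
	return err;
}
```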
It also reports an error when an unrecognized flag is specified. The check for both flags being set is removed because it can't happen with this implementation. If we ever want to support multiple non-exclusive flags, this area will need rework and we will need to check that any mutually exclusive flags aren't specified. Signed-off-by: Lee Schermerhorn Cc: David Rientjes Cc: Paul Jackson Cc: Christoph Lameter Cc: Andi Kleen Cc: Eric Whitney Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/shmem.c | 16 +++++++++++----- 1 file changed, 11 insertions(+), 5 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 9435f298dd75..177c7a7d2bb3 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1125,20 +1125,26 @@ static int shmem_parse_mpol(char *value, unsigned short *policy, *policy_nodes = node_states[N_HIGH_MEMORY]; err = 0; } + + *mode_flags = 0; if (flags) { + /* + * Currently, we only support two mutually exclusive + * mode flags. + */ if (!strcmp(flags, "static")) *mode_flags |= MPOL_F_STATIC_NODES; - if (!strcmp(flags, "relative")) + else if (!strcmp(flags, "relative")) *mode_flags |= MPOL_F_RELATIVE_NODES; - - if ((*mode_flags & MPOL_F_STATIC_NODES) && - (*mode_flags & MPOL_F_RELATIVE_NODES)) - err = 1; + else + err = 1; /* unrecognized flag */ } out: /* Restore string for error message */ if (nodelist) *--nodelist = ':'; + if (flags) + *--flags = '='; return err; } -- cgit v1.2.3 From b5ee5befa75e33e55d34584ad10286c5005cb1de Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 28 Apr 2008 02:12:37 -0700 Subject: dmapool: enable debugging for CONFIG_SLUB_DEBUG_ON too Previously it was only enabled for CONFIG_DEBUG_SLAB. 
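The pattern in this patch is to derive one debug switch from either slab option and then guard the poisoning with that single macro. It can be sketched in user space as follows. This is illustrative only:

- `TOY_POOL_DEBUG`, the `toy_*` functions and the poison byte values are hypothetical.
- The sketch forces `CONFIG_SLUB_DEBUG_ON` on so the debug path is compiled in.
- The poison check on allocation is an extra step of this sketch that the hunks shown here do not perform; they only `memset()` the poison values.

```c
#include <stddef.h>
#include <string.h>

/* Pretend the SLUB option is enabled so the debug path is built. */
#ifndef CONFIG_SLUB_DEBUG_ON
#define CONFIG_SLUB_DEBUG_ON 1
#endif

/* One derived switch instead of repeating the config test everywhere. */
#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON)
#define TOY_POOL_DEBUG 1
#endif

#define TOY_POISON_FREED     0xa7	/* arbitrary byte values for the sketch */
#define TOY_POISON_ALLOCATED 0xa9

static void toy_block_free(unsigned char *blk, size_t size)
{
#ifdef TOY_POOL_DEBUG
	memset(blk, TOY_POISON_FREED, size);	/* mark the block as freed */
#else
	(void)blk;
	(void)size;
#endif
}

/* Returns 0 if the freed block was untouched, -1 if it was written after free. */
static int toy_block_alloc(unsigned char *blk, size_t size)
{
#ifdef TOY_POOL_DEBUG
	for (size_t i = 0; i < size; i++)
		if (blk[i] != TOY_POISON_FREED)
			return -1;			/* use-after-free write */
	memset(blk, TOY_POISON_ALLOCATED, size);	/* mark as not initialised */
#else
	(void)blk;
	(void)size;
#endif
	return 0;
}
```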
Not hooked into the slub runtime debug configuration, so you currently only get it with CONFIG_SLUB_DEBUG_ON, not plain CONFIG_SLUB_DEBUG. Acked-by: Matthew Wilcox Signed-off-by: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/dmapool.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/mm/dmapool.c b/mm/dmapool.c index 34aaac451a96..b1f0885dda22 100644 --- a/mm/dmapool.c +++ b/mm/dmapool.c @@ -37,6 +37,10 @@ #include #include +#if defined(CONFIG_DEBUG_SLAB) || defined(CONFIG_SLUB_DEBUG_ON) +#define DMAPOOL_DEBUG 1 +#endif + struct dma_pool { /* the pool */ struct list_head page_list; spinlock_t lock; @@ -216,7 +220,7 @@ static struct dma_page *pool_alloc_page(struct dma_pool *pool, gfp_t mem_flags) page->vaddr = dma_alloc_coherent(pool->dev, pool->allocation, &page->dma, mem_flags); if (page->vaddr) { -#ifdef CONFIG_DEBUG_SLAB +#ifdef DMAPOOL_DEBUG memset(page->vaddr, POOL_POISON_FREED, pool->allocation); #endif pool_initialise_page(pool, page); @@ -239,7 +243,7 @@ static void pool_free_page(struct dma_pool *pool, struct dma_page *page) { dma_addr_t dma = page->dma; -#ifdef CONFIG_DEBUG_SLAB +#ifdef DMAPOOL_DEBUG memset(page->vaddr, POOL_POISON_FREED, pool->allocation); #endif dma_free_coherent(pool->dev, pool->allocation, page->vaddr, dma); @@ -336,7 +340,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags, page->offset = *(int *)(page->vaddr + offset); retval = offset + page->vaddr; *handle = offset + page->dma; -#ifdef CONFIG_DEBUG_SLAB +#ifdef DMAPOOL_DEBUG memset(retval, POOL_POISON_ALLOCATED, pool->size); #endif done: @@ -391,7 +395,7 @@ void dma_pool_free(struct dma_pool *pool, void *vaddr, dma_addr_t dma) } offset = vaddr - page->vaddr; -#ifdef CONFIG_DEBUG_SLAB +#ifdef DMAPOOL_DEBUG if ((dma - page->dma) != offset) { if (pool->dev) dev_err(pool->dev, -- cgit v1.2.3 From 7edf85aa3c00df1e86e82f649c41efa0dd8a7218 Mon Sep 17 00:00:00 2001 From: Andi Kleen Date: Mon, 28 Apr 2008 02:12:37
-0700 Subject: mm: save some bytes in mm_struct by filling holes on 64bit Save some bytes in mm_struct by filling holes Putting int values together for better packing on 64bit shrinks sizeof(struct mm_struct) from 776 bytes to 764 bytes. Signed-off-by: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm_types.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index af190ceab971..29adaa781cb6 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -172,6 +172,7 @@ struct mm_struct { atomic_t mm_users; /* How many users with user space? */ atomic_t mm_count; /* How many references to "struct mm_struct" (users count as 1) */ int map_count; /* number of VMAs */ + int core_waiters; struct rw_semaphore mmap_sem; spinlock_t page_table_lock; /* Protects page tables and some counters */ @@ -216,11 +217,10 @@ struct mm_struct { unsigned long flags; /* Must use atomic bitops to access the bits */ /* coredumping support */ - int core_waiters; struct completion *core_startup_done, core_done; /* aio bits */ - rwlock_t ioctx_list_lock; + rwlock_t ioctx_list_lock; /* aio lock */ struct kioctx *ioctx_list; #ifdef CONFIG_CGROUP_MEM_RES_CTLR struct mem_cgroup *mem_cgroup; -- cgit v1.2.3 From f05111f50105ac479a008cf85749cf9c956453ea Mon Sep 17 00:00:00 2001 From: "S.Caglar Onur" Date: Mon, 28 Apr 2008 02:12:38 -0700 Subject: mm/page_alloc.c: fix indentation zlc_setup(): handle jiffies wraparound (10ed273f5016c582413dfbc468dd084957d847e1) changes tab with spaces Signed-off-by: S.Caglar Onur Cc: Lee Schermerhorn Cc: Paul Jackson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index b4beb3eea8b7..af28e2cec8b4 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1284,7 +1284,7 @@ static nodemask_t *zlc_setup(struct zonelist 
*zonelist, int alloc_flags) if (!zlc) return NULL; - if (time_after(jiffies, zlc->last_full_zap + HZ)) { + if (time_after(jiffies, zlc->last_full_zap + HZ)) { bitmap_zero(zlc->fullzones, MAX_ZONES_PER_ZONELIST); zlc->last_full_zap = jiffies; } -- cgit v1.2.3 From ac6aadb24b7d4f0e54246732e221c102073412bf Mon Sep 17 00:00:00 2001 From: Miklos Szeredi Date: Mon, 28 Apr 2008 02:12:38 -0700 Subject: mm: rotate_reclaimable_page() cleanup Clean up messy conditional calling of test_clear_page_writeback() from both rotate_reclaimable_page() and end_page_writeback(). The only user of rotate_reclaimable_page() is end_page_writeback() so this is OK. Signed-off-by: Miklos Szeredi Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/swap.h | 2 +- mm/filemap.c | 10 ++++++---- mm/swap.c | 37 ++++++++++++------------------------- 3 files changed, 19 insertions(+), 30 deletions(-) diff --git a/include/linux/swap.h b/include/linux/swap.h index 4286e7ac2b00..0b3377650c85 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -177,7 +177,7 @@ extern void activate_page(struct page *); extern void mark_page_accessed(struct page *); extern void lru_add_drain(void); extern int lru_add_drain_all(void); -extern int rotate_reclaimable_page(struct page *page); +extern void rotate_reclaimable_page(struct page *page); extern void swap_setup(void); /* linux/mm/vmscan.c */ diff --git a/mm/filemap.c b/mm/filemap.c index 07e9d9258b48..239d36163bbe 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -576,10 +576,12 @@ EXPORT_SYMBOL(unlock_page); */ void end_page_writeback(struct page *page) { - if (!TestClearPageReclaim(page) || rotate_reclaimable_page(page)) { - if (!test_clear_page_writeback(page)) - BUG(); - } + if (TestClearPageReclaim(page)) + rotate_reclaimable_page(page); + + if (!test_clear_page_writeback(page)) + BUG(); + smp_mb__after_clear_bit(); wake_up_page(page, PG_writeback); } diff --git a/mm/swap.c b/mm/swap.c index aa1139ccf3a7..91e194445a5e 100644 --- 
a/mm/swap.c +++ b/mm/swap.c @@ -132,34 +132,21 @@ static void pagevec_move_tail(struct pagevec *pvec) * Writeback is about to end against a page which has been marked for immediate * reclaim. If it still appears to be reclaimable, move it to the tail of the * inactive list. - * - * Returns zero if it cleared PG_writeback. */ -int rotate_reclaimable_page(struct page *page) +void rotate_reclaimable_page(struct page *page) { - struct pagevec *pvec; - unsigned long flags; - - if (PageLocked(page)) - return 1; - if (PageDirty(page)) - return 1; - if (PageActive(page)) - return 1; - if (!PageLRU(page)) - return 1; - - page_cache_get(page); - local_irq_save(flags); - pvec = &__get_cpu_var(lru_rotate_pvecs); - if (!pagevec_add(pvec, page)) - pagevec_move_tail(pvec); - local_irq_restore(flags); - - if (!test_clear_page_writeback(page)) - BUG(); + if (!PageLocked(page) && !PageDirty(page) && !PageActive(page) && + PageLRU(page)) { + struct pagevec *pvec; + unsigned long flags; - return 0; + page_cache_get(page); + local_irq_save(flags); + pvec = &__get_cpu_var(lru_rotate_pvecs); + if (!pagevec_add(pvec, page)) + pagevec_move_tail(pvec); + local_irq_restore(flags); + } } /* -- cgit v1.2.3 From b45445684198a946b587732265692e6495993abf Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Apr 2008 02:12:39 -0700 Subject: mm: make early_pfn_to_nid() a C function Fix this (sparc64) mm/sparse-vmemmap.c: In function `vmemmap_verify': mm/sparse-vmemmap.c:64: warning: unused variable `pfn' by switching to a C function which touches its arg. (reason 3,555 why macros are bad) Also, the `nid' arg was misnamed. 
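Why the macro form triggered the warning can be shown with a small userspace sketch (not kernel code). The macro discards its argument in the preprocessor, so a caller's variable that only feeds it is never referenced; the inline function receives and evaluates the argument. By default, gcc's -Wall does not warn about the function's own unused parameter. The helpers `next_pfn()`, `nid_via_macro()` and `nid_via_function()` are hypothetical, added only for this demonstration.

```c
#include <assert.h>

/* Old form: the argument never reaches the compiler. */
#define early_pfn_to_nid_macro(pfn) (0UL)

/* New form, as in the patch: a real function, so the argument is
 * type-checked and evaluated even though the !NUMA answer is node 0. */
static inline unsigned long early_pfn_to_nid(unsigned long pfn)
{
	return 0;
}

/* Hypothetical helper: counts how many times it is evaluated. */
static int evaluations;

static unsigned long next_pfn(void)
{
	evaluations++;
	return 42;
}

unsigned long nid_via_macro(void)
{
	/* Expands to (0UL): next_pfn() is never called. */
	return early_pfn_to_nid_macro(next_pfn());
}

unsigned long nid_via_function(void)
{
	/* next_pfn() is evaluated once, then its value is ignored. */
	return early_pfn_to_nid(next_pfn());
}
```

Both variants return node 0. Only the function version "touches its arg", which is what keeps a caller's `pfn` variable from looking unused.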
Reviewed-by: Christoph Lameter Acked-by: Andy Whitcroft Cc: Mel Gorman Cc: Andi Kleen Cc: KAMEZAWA Hiroyuki Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 498d6ceff2f4..0aece6d8937e 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -841,7 +841,10 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, #if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \ !defined(CONFIG_ARCH_POPULATES_NODE_MAP) -#define early_pfn_to_nid(nid) (0UL) +static inline unsigned long early_pfn_to_nid(unsigned long pfn) +{ + return 0; +} #endif #ifdef CONFIG_FLATMEM -- cgit v1.2.3 From a10aa579878fc6f9cd17455067380bbdf1d53c91 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:40 -0700 Subject: vmalloc: show vmalloced areas via /proc/vmallocinfo Implement a new proc file that displays the currently allocated vmalloc memory, so the users of vmalloc can be seen. That is important if vmalloc space is scarce (i386, for example), and it is going to be important for the compound page fallback to vmalloc. Many of the current users can be switched to compound pages with a vmalloc fallback. This reduces the number of vmalloc users, and page tables are no longer necessary to access the memory. /proc/vmallocinfo makes it possible to review how that reduction proceeds. If memory becomes fragmented and higher-order allocations are no longer possible, /proc/vmallocinfo shows which compound page allocations fell back to virtual compound pages. That is important for new users of virtual compound pages, such as order-1 stack allocations, that may fall back to virtual compound pages in the future. /proc/vmallocinfo is made readable only by root to avoid possible information leakage.
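The s_start()/s_next() pair in the diff is the usual seq_file iterator over a singly linked list. The following userspace sketch mirrors its position handling. `struct area` and `sum_from()` are hypothetical stand-ins. The `vmlist_lock` taken in s_start() and released in s_stop(), and all of seq_read()'s buffering, are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_struct: list link plus a size. */
struct area {
	struct area *next;
	unsigned long size;
};

/* Like s_start(): skip *pos entries from the list head. NULL means *pos
 * is past the end, which tells seq_file the listing is finished. */
static struct area *area_start(struct area *head, long long *pos)
{
	long long n = *pos;
	struct area *v = head;

	while (n > 0 && v) {
		n--;
		v = v->next;
	}
	return n == 0 ? v : NULL;
}

/* Like s_next(): advance the position and return the following entry. */
static struct area *area_next(struct area *v, long long *pos)
{
	++*pos;
	return v->next;
}

/* Drive start/next the way seq_read() would when it resumes at pos,
 * "showing" each entry by summing its size. */
unsigned long sum_from(struct area *head, long long pos)
{
	unsigned long total = 0;
	struct area *v;

	for (v = area_start(head, &pos); v; v = area_next(v, &pos))
		total += v->size;
	return total;
}
```

Because s_start() re-walks the list from vmlist on every call, a read that resumes at a later offset costs a walk proportional to that offset. For a short list like vmlist, that is acceptable.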
[akpm@linux-foundation.org: coding-style fixes] [akpm@linux-foundation.org: CONFIG_MMU=n build fix] Signed-off-by: Christoph Lameter Reviewed-by: KOSAKI Motohiro Cc: Hugh Dickins Cc: Nick Piggin Cc: Arjan van de Ven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/proc_misc.c | 17 +++++++++++ include/linux/vmalloc.h | 2 ++ mm/vmalloc.c | 76 ++++++++++++++++++++++++++++++++++++++++++++++++- 3 files changed, 94 insertions(+), 1 deletion(-) diff --git a/fs/proc/proc_misc.c b/fs/proc/proc_misc.c index 2d563979cb02..441a32f0e5f2 100644 --- a/fs/proc/proc_misc.c +++ b/fs/proc/proc_misc.c @@ -456,6 +456,20 @@ static const struct file_operations proc_slabstats_operations = { #endif #endif +#ifdef CONFIG_MMU +static int vmalloc_open(struct inode *inode, struct file *file) +{ + return seq_open(file, &vmalloc_op); +} + +static const struct file_operations proc_vmalloc_operations = { + .open = vmalloc_open, + .read = seq_read, + .llseek = seq_lseek, + .release = seq_release, +}; +#endif + static int show_stat(struct seq_file *p, void *v) { int i; @@ -868,6 +882,9 @@ void __init proc_misc_init(void) #ifdef CONFIG_DEBUG_SLAB_LEAK create_seq_entry("slab_allocators", 0 ,&proc_slabstats_operations); #endif +#endif +#ifdef CONFIG_MMU + proc_create("vmallocinfo", S_IRUSR, NULL, &proc_vmalloc_operations); #endif create_seq_entry("buddyinfo",S_IRUGO, &fragmentation_file_operations); create_seq_entry("pagetypeinfo", S_IRUGO, &pagetypeinfo_file_ops); diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index ce8e7da05807..7f3adfda337a 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -87,4 +87,6 @@ extern void free_vm_area(struct vm_struct *area); extern rwlock_t vmlist_lock; extern struct vm_struct *vmlist; +extern const struct seq_operations vmalloc_op; + #endif /* _LINUX_VMALLOC_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index ecf91f8034bf..afa550f66537 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -14,7 +14,7 @@ #include 
#include #include - +#include #include #include @@ -873,3 +873,77 @@ void free_vm_area(struct vm_struct *area) kfree(area); } EXPORT_SYMBOL_GPL(free_vm_area); + + +#ifdef CONFIG_PROC_FS +static void *s_start(struct seq_file *m, loff_t *pos) +{ + loff_t n = *pos; + struct vm_struct *v; + + read_lock(&vmlist_lock); + v = vmlist; + while (n > 0 && v) { + n--; + v = v->next; + } + if (!n) + return v; + + return NULL; + +} + +static void *s_next(struct seq_file *m, void *p, loff_t *pos) +{ + struct vm_struct *v = p; + + ++*pos; + return v->next; +} + +static void s_stop(struct seq_file *m, void *p) +{ + read_unlock(&vmlist_lock); +} + +static int s_show(struct seq_file *m, void *p) +{ + struct vm_struct *v = p; + + seq_printf(m, "0x%p-0x%p %7ld", + v->addr, v->addr + v->size, v->size); + + if (v->nr_pages) + seq_printf(m, " pages=%d", v->nr_pages); + + if (v->phys_addr) + seq_printf(m, " phys=%lx", v->phys_addr); + + if (v->flags & VM_IOREMAP) + seq_printf(m, " ioremap"); + + if (v->flags & VM_ALLOC) + seq_printf(m, " vmalloc"); + + if (v->flags & VM_MAP) + seq_printf(m, " vmap"); + + if (v->flags & VM_USERMAP) + seq_printf(m, " user"); + + if (v->flags & VM_VPAGES) + seq_printf(m, " vpages"); + + seq_putc(m, '\n'); + return 0; +} + +const struct seq_operations vmalloc_op = { + .start = s_start, + .next = s_next, + .stop = s_stop, + .show = s_show, +}; +#endif + -- cgit v1.2.3 From 2301696932b55e2ea2085cefc84f7b94fa2dd54b Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:42 -0700 Subject: vmallocinfo: add caller information Add caller information so that /proc/vmallocinfo shows where the allocation request for a slice of vmalloc memory originated. 
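The patch captures `__builtin_return_address(0)` in each public entry point (vmalloc(), vmap(), ioremap_nocache(), ...) and passes it down, instead of capturing it in the shared worker. If the worker captured it, every entry would name the wrapper rather than the code that asked for the memory. Wrappers that already know their caller pass it through, as __ioremap_caller() does with get_vm_area_caller(). This is a minimal userspace sketch of that pattern. `struct record`, `fill_record()`, `alloc_record()` and `alloc_record_caller()` are hypothetical names, and `__builtin_return_address` is a GCC/Clang builtin.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct vm_struct: remembers who asked. */
struct record {
	size_t size;
	void *caller;
};

/* Shared worker, like __get_vm_area_node(): it stores whatever caller
 * address the entry point hands it and never looks up its own. */
static void fill_record(struct record *r, size_t size, void *caller)
{
	r->size = size;
	r->caller = caller;
}

/* Public entry point, like vmalloc(): capture the address this function
 * returns to, i.e. a location inside the code that called it. noinline
 * keeps that address pointing into the real caller. */
__attribute__((noinline)) void alloc_record(struct record *r, size_t size)
{
	fill_record(r, size, __builtin_return_address(0));
}

/* Pass-through variant, like get_vm_area_caller(): a wrapper that has
 * already captured its own caller forwards that address unchanged. */
void alloc_record_caller(struct record *r, size_t size, void *caller)
{
	fill_record(r, size, caller);
}
```

In the kernel, s_show() then turns the stored address into a name such as `alloc_large_system_hash+0x127/0x246` with sprint_symbol().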
Results in output like this: 0xffffc20000000000-0xffffc20000801000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20000801000-0xffffc20000806000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000806000-0xffffc20000c07000 4198400 alloc_large_system_hash+0x127/0x246 pages=1024 vmalloc vpages 0xffffc20000c07000-0xffffc20000c0a000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c0a000-0xffffc20000c0c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c0c000-0xffffc20000c0f000 12288 acpi_os_map_memory+0x13/0x1c phys=cff64000 ioremap 0xffffc20000c10000-0xffffc20000c15000 20480 acpi_os_map_memory+0x13/0x1c phys=cff65000 ioremap 0xffffc20000c16000-0xffffc20000c18000 8192 acpi_os_map_memory+0x13/0x1c phys=cff69000 ioremap 0xffffc20000c18000-0xffffc20000c1a000 8192 acpi_os_map_memory+0x13/0x1c phys=fed1f000 ioremap 0xffffc20000c1a000-0xffffc20000c1c000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1c000-0xffffc20000c1e000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c1e000-0xffffc20000c20000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c20000-0xffffc20000c22000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c22000-0xffffc20000c24000 8192 acpi_os_map_memory+0x13/0x1c phys=cff68000 ioremap 0xffffc20000c24000-0xffffc20000c26000 8192 acpi_os_map_memory+0x13/0x1c phys=e0081000 ioremap 0xffffc20000c26000-0xffffc20000c28000 8192 acpi_os_map_memory+0x13/0x1c phys=e0080000 ioremap 0xffffc20000c28000-0xffffc20000c2d000 20480 alloc_large_system_hash+0x127/0x246 pages=4 vmalloc 0xffffc20000c2d000-0xffffc20000c31000 16384 tcp_init+0xd5/0x31c pages=3 vmalloc 0xffffc20000c31000-0xffffc20000c34000 12288 alloc_large_system_hash+0x127/0x246 pages=2 vmalloc 0xffffc20000c34000-0xffffc20000c36000 8192 init_vdso_vars+0xde/0x1f1 0xffffc20000c36000-0xffffc20000c38000 8192 pci_iomap+0x8a/0xb4 phys=d8e00000 ioremap 
0xffffc20000c38000-0xffffc20000c3a000 8192 usb_hcd_pci_probe+0x139/0x295 [usbcore] phys=d8e00000 ioremap 0xffffc20000c3a000-0xffffc20000c3e000 16384 sys_swapon+0x509/0xa15 pages=3 vmalloc 0xffffc20000c40000-0xffffc20000c61000 135168 e1000_probe+0x1c4/0xa32 phys=d8a20000 ioremap 0xffffc20000c61000-0xffffc20000c6a000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c6a000-0xffffc20000c73000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c73000-0xffffc20000c7c000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20000c7c000-0xffffc20000c7f000 12288 e1000e_setup_tx_resources+0x29/0xbe pages=2 vmalloc 0xffffc20000c80000-0xffffc20001481000 8392704 pci_mmcfg_arch_init+0x90/0x118 phys=e0000000 ioremap 0xffffc20001481000-0xffffc20001682000 2101248 alloc_large_system_hash+0x127/0x246 pages=512 vmalloc 0xffffc20001682000-0xffffc20001e83000 8392704 alloc_large_system_hash+0x127/0x246 pages=2048 vmalloc vpages 0xffffc20001e83000-0xffffc20002204000 3674112 alloc_large_system_hash+0x127/0x246 pages=896 vmalloc vpages 0xffffc20002204000-0xffffc2000220d000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000220d000-0xffffc20002216000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002216000-0xffffc2000221f000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc2000221f000-0xffffc20002228000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002228000-0xffffc20002231000 36864 _xfs_buf_map_pages+0x8e/0xc0 vmap 0xffffc20002231000-0xffffc20002234000 12288 e1000e_setup_rx_resources+0x35/0x122 pages=2 vmalloc 0xffffc20002240000-0xffffc20002261000 135168 e1000_probe+0x1c4/0xa32 phys=d8a60000 ioremap 0xffffc20002261000-0xffffc2000270c000 4894720 sys_swapon+0x509/0xa15 pages=1194 vmalloc vpages 0xffffffffa0000000-0xffffffffa0022000 139264 module_alloc+0x4f/0x55 pages=33 vmalloc 0xffffffffa0022000-0xffffffffa0029000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc 0xffffffffa002b000-0xffffffffa0034000 36864 module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa0034000-0xffffffffa003d000 36864 
module_alloc+0x4f/0x55 pages=8 vmalloc 0xffffffffa003d000-0xffffffffa0049000 49152 module_alloc+0x4f/0x55 pages=11 vmalloc 0xffffffffa0049000-0xffffffffa0050000 28672 module_alloc+0x4f/0x55 pages=6 vmalloc [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Christoph Lameter Reviewed-by: KOSAKI Motohiro Cc: Hugh Dickins Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/mm/ioremap.c | 15 +++++++----- include/linux/vmalloc.h | 3 +++ mm/vmalloc.c | 65 +++++++++++++++++++++++++++++++++++-------------- 3 files changed, 59 insertions(+), 24 deletions(-) diff --git a/arch/x86/mm/ioremap.c b/arch/x86/mm/ioremap.c index d176b23110cc..804de18abcc2 100644 --- a/arch/x86/mm/ioremap.c +++ b/arch/x86/mm/ioremap.c @@ -117,8 +117,8 @@ int ioremap_change_attr(unsigned long vaddr, unsigned long size, * have to convert them into an offset in a page-aligned mapping, but the * caller shouldn't need to know that small detail. */ -static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, - unsigned long prot_val) +static void __iomem *__ioremap_caller(resource_size_t phys_addr, + unsigned long size, unsigned long prot_val, void *caller) { unsigned long pfn, offset, vaddr; resource_size_t last_addr; @@ -212,7 +212,7 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, /* * Ok, go for it.. 
*/ - area = get_vm_area(size, VM_IOREMAP); + area = get_vm_area_caller(size, VM_IOREMAP, caller); if (!area) return NULL; area->phys_addr = phys_addr; @@ -255,7 +255,8 @@ static void __iomem *__ioremap(resource_size_t phys_addr, unsigned long size, */ void __iomem *ioremap_nocache(resource_size_t phys_addr, unsigned long size) { - return __ioremap(phys_addr, size, _PAGE_CACHE_UC); + return __ioremap_caller(phys_addr, size, _PAGE_CACHE_UC, + __builtin_return_address(0)); } EXPORT_SYMBOL(ioremap_nocache); @@ -272,7 +273,8 @@ EXPORT_SYMBOL(ioremap_nocache); void __iomem *ioremap_wc(unsigned long phys_addr, unsigned long size) { if (pat_wc_enabled) - return __ioremap(phys_addr, size, _PAGE_CACHE_WC); + return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WC, + __builtin_return_address(0)); else return ioremap_nocache(phys_addr, size); } @@ -280,7 +282,8 @@ EXPORT_SYMBOL(ioremap_wc); void __iomem *ioremap_cache(resource_size_t phys_addr, unsigned long size) { - return __ioremap(phys_addr, size, _PAGE_CACHE_WB); + return __ioremap_caller(phys_addr, size, _PAGE_CACHE_WB, + __builtin_return_address(0)); } EXPORT_SYMBOL(ioremap_cache); diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h index 7f3adfda337a..364789aae9f3 100644 --- a/include/linux/vmalloc.h +++ b/include/linux/vmalloc.h @@ -31,6 +31,7 @@ struct vm_struct { struct page **pages; unsigned int nr_pages; unsigned long phys_addr; + void *caller; }; /* @@ -66,6 +67,8 @@ static inline size_t get_vm_area_size(const struct vm_struct *area) } extern struct vm_struct *get_vm_area(unsigned long size, unsigned long flags); +extern struct vm_struct *get_vm_area_caller(unsigned long size, + unsigned long flags, void *caller); extern struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags, unsigned long start, unsigned long end); extern struct vm_struct *get_vm_area_node(unsigned long size, diff --git a/mm/vmalloc.c b/mm/vmalloc.c index afa550f66537..e33e0ae69ad1 100644 --- a/mm/vmalloc.c +++ 
b/mm/vmalloc.c @@ -16,6 +16,7 @@ #include #include #include +#include #include #include @@ -25,7 +26,7 @@ DEFINE_RWLOCK(vmlist_lock); struct vm_struct *vmlist; static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot, - int node); + int node, void *caller); static void vunmap_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end) { @@ -204,9 +205,9 @@ unsigned long vmalloc_to_pfn(const void *vmalloc_addr) } EXPORT_SYMBOL(vmalloc_to_pfn); -static struct vm_struct *__get_vm_area_node(unsigned long size, unsigned long flags, - unsigned long start, unsigned long end, - int node, gfp_t gfp_mask) +static struct vm_struct * +__get_vm_area_node(unsigned long size, unsigned long flags, unsigned long start, + unsigned long end, int node, gfp_t gfp_mask, void *caller) { struct vm_struct **p, *tmp, *area; unsigned long align = 1; @@ -269,6 +270,7 @@ found: area->pages = NULL; area->nr_pages = 0; area->phys_addr = 0; + area->caller = caller; write_unlock(&vmlist_lock); return area; @@ -284,7 +286,8 @@ out: struct vm_struct *__get_vm_area(unsigned long size, unsigned long flags, unsigned long start, unsigned long end) { - return __get_vm_area_node(size, flags, start, end, -1, GFP_KERNEL); + return __get_vm_area_node(size, flags, start, end, -1, GFP_KERNEL, + __builtin_return_address(0)); } EXPORT_SYMBOL_GPL(__get_vm_area); @@ -299,14 +302,22 @@ EXPORT_SYMBOL_GPL(__get_vm_area); */ struct vm_struct *get_vm_area(unsigned long size, unsigned long flags) { - return __get_vm_area(size, flags, VMALLOC_START, VMALLOC_END); + return __get_vm_area_node(size, flags, VMALLOC_START, VMALLOC_END, + -1, GFP_KERNEL, __builtin_return_address(0)); +} + +struct vm_struct *get_vm_area_caller(unsigned long size, unsigned long flags, + void *caller) +{ + return __get_vm_area_node(size, flags, VMALLOC_START, VMALLOC_END, + -1, GFP_KERNEL, caller); } struct vm_struct *get_vm_area_node(unsigned long size, unsigned long flags, int node, gfp_t gfp_mask) { return 
__get_vm_area_node(size, flags, VMALLOC_START, VMALLOC_END, node, - gfp_mask); + gfp_mask, __builtin_return_address(0)); } /* Caller must hold vmlist_lock */ @@ -455,9 +466,11 @@ void *vmap(struct page **pages, unsigned int count, if (count > num_physpages) return NULL; - area = get_vm_area((count << PAGE_SHIFT), flags); + area = get_vm_area_caller((count << PAGE_SHIFT), flags, + __builtin_return_address(0)); if (!area) return NULL; + if (map_vm_area(area, prot, &pages)) { vunmap(area->addr); return NULL; @@ -468,7 +481,7 @@ void *vmap(struct page **pages, unsigned int count, EXPORT_SYMBOL(vmap); static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, - pgprot_t prot, int node) + pgprot_t prot, int node, void *caller) { struct page **pages; unsigned int nr_pages, array_size, i; @@ -480,7 +493,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, /* Please note that the recursion is strictly bounded. */ if (array_size > PAGE_SIZE) { pages = __vmalloc_node(array_size, gfp_mask | __GFP_ZERO, - PAGE_KERNEL, node); + PAGE_KERNEL, node, caller); area->flags |= VM_VPAGES; } else { pages = kmalloc_node(array_size, @@ -488,6 +501,7 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, node); } area->pages = pages; + area->caller = caller; if (!area->pages) { remove_vm_area(area->addr); kfree(area); @@ -521,7 +535,8 @@ fail: void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot) { - return __vmalloc_area_node(area, gfp_mask, prot, -1); + return __vmalloc_area_node(area, gfp_mask, prot, -1, + __builtin_return_address(0)); } /** @@ -536,7 +551,7 @@ void *__vmalloc_area(struct vm_struct *area, gfp_t gfp_mask, pgprot_t prot) * kernel virtual space, using a pagetable protection of @prot. 
*/ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot, - int node) + int node, void *caller) { struct vm_struct *area; @@ -544,16 +559,19 @@ static void *__vmalloc_node(unsigned long size, gfp_t gfp_mask, pgprot_t prot, if (!size || (size >> PAGE_SHIFT) > num_physpages) return NULL; - area = get_vm_area_node(size, VM_ALLOC, node, gfp_mask); + area = __get_vm_area_node(size, VM_ALLOC, VMALLOC_START, VMALLOC_END, + node, gfp_mask, caller); + if (!area) return NULL; - return __vmalloc_area_node(area, gfp_mask, prot, node); + return __vmalloc_area_node(area, gfp_mask, prot, node, caller); } void *__vmalloc(unsigned long size, gfp_t gfp_mask, pgprot_t prot) { - return __vmalloc_node(size, gfp_mask, prot, -1); + return __vmalloc_node(size, gfp_mask, prot, -1, + __builtin_return_address(0)); } EXPORT_SYMBOL(__vmalloc); @@ -568,7 +586,8 @@ EXPORT_SYMBOL(__vmalloc); */ void *vmalloc(unsigned long size) { - return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL); + return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL, + -1, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc); @@ -608,7 +627,8 @@ EXPORT_SYMBOL(vmalloc_user); */ void *vmalloc_node(unsigned long size, int node) { - return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL, node); + return __vmalloc_node(size, GFP_KERNEL | __GFP_HIGHMEM, PAGE_KERNEL, + node, __builtin_return_address(0)); } EXPORT_SYMBOL(vmalloc_node); @@ -843,7 +863,8 @@ struct vm_struct *alloc_vm_area(size_t size) { struct vm_struct *area; - area = get_vm_area(size, VM_IOREMAP); + area = get_vm_area_caller(size, VM_IOREMAP, + __builtin_return_address(0)); if (area == NULL) return NULL; @@ -914,6 +935,14 @@ static int s_show(struct seq_file *m, void *p) seq_printf(m, "0x%p-0x%p %7ld", v->addr, v->addr + v->size, v->size); + if (v->caller) { + char buff[2 * KSYM_NAME_LEN]; + + seq_putc(m, ' '); + sprint_symbol(buff, (unsigned long)v->caller); + seq_puts(m, buff); + } + if 
(v->nr_pages) seq_printf(m, " pages=%d", v->nr_pages); -- cgit v1.2.3 From 308c05e35e3517d19bb67a7e97772235c9e15cd7 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:43 -0700 Subject: sparsemem: vmemmap does not need section bits A set of patches that attempts to improve page flag handling. First, a method is introduced to generate the page flag functions using macros. Then the number of page flags used by sparsemem is reduced. Page flag operations will no longer be macros; all flags will use inline functions. Then we add a way to export enum constants to the preprocessor, which lets us get rid of __ZONE_COUNT and use NR_PAGEFLAGS to calculate dynamically how many page flag bits are actually available for fields. This patch: Sparsemem vmemmap does not need any section bits. This patch has the effect of reducing the number of bits used in page->flags by at least 6. Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ca973359fe5f..24659ed06bae 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -395,11 +395,11 @@ static inline void set_compound_order(struct page *page, unsigned long order) * we have run out of space and have to fall back to an * alternate (slower) way of determining the node. * - * No sparsemem: | NODE | ZONE | ... | FLAGS | - * with space for node: | SECTION | NODE | ZONE | ... | FLAGS | - * no space for node: | SECTION | ZONE | ... | FLAGS | + * No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS | + * classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS | + * classic sparse no space for node: | SECTION | ZONE | ...
| FLAGS | */ -#ifdef CONFIG_SPARSEMEM +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTIONS_WIDTH SECTIONS_SHIFT #else #define SECTIONS_WIDTH 0 @@ -410,6 +410,9 @@ static inline void set_compound_order(struct page *page, unsigned long order) #if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= FLAGS_RESERVED #define NODES_WIDTH NODES_SHIFT #else +#ifdef CONFIG_SPARSEMEM_VMEMMAP +#error "Vmemmap: No space for nodes field in page flags" +#endif #define NODES_WIDTH 0 #endif @@ -502,10 +505,12 @@ static inline struct zone *page_zone(struct page *page) return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)]; } +#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) static inline unsigned long page_to_section(struct page *page) { return (page->flags >> SECTIONS_PGSHIFT) & SECTIONS_MASK; } +#endif static inline void set_page_zone(struct page *page, enum zone_type zone) { -- cgit v1.2.3 From 1cdf25d704f7951d02a04064c97db547d6021872 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:44 -0700 Subject: kbuild: create a way to create preprocessor constants from C expressions Enums create constants that are not visible to the preprocessor when building the kernel (e.g. MAX_NR_ZONES). Arch code already has a way to export computed constants to the preprocessor through the asm-offsets.c file. Generate something similar for the core kernel through kbuild.
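Enum constants are invisible to the preprocessor because, inside `#if`, any identifier that is not a macro evaluates to 0. The failure is silent. This small sketch shows it, using hypothetical names (`MAX_NR_ZONES_DEMO`, `MAX_NR_ZONES_BOUNDS`). The second macro plays the role of a generated include/linux/bounds.h entry. It is hard-coded here, whereas the kernel extracts the value from the compiler's assembler output of kernel/bounds.c.

```c
#include <assert.h>

/* An enum constant, like MAX_NR_ZONES: known to the compiler only. */
enum zone_type_demo { ZONE_A, ZONE_B, ZONE_C, MAX_NR_ZONES_DEMO };

/* Not a macro, so the preprocessor substitutes 0 here, not 3. */
#if MAX_NR_ZONES_DEMO == 0
#define PREPROCESSOR_SAW_ZERO 1
#else
#define PREPROCESSOR_SAW_ZERO 0
#endif

/* What a generated bounds.h-style header provides instead: the same
 * value as a plain macro (hard-coded in this sketch). */
#define MAX_NR_ZONES_BOUNDS 3

#if MAX_NR_ZONES_BOUNDS > 2
#define BOUNDS_SEES_VALUE 1
#else
#define BOUNDS_SEES_VALUE 0
#endif

int preprocessor_saw_zero(void) { return PREPROCESSOR_SAW_ZERO; }
int bounds_sees_value(void) { return BOUNDS_SEES_VALUE; }
int enum_value(void) { return MAX_NR_ZONES_DEMO; }
```

Building with gcc's -Wundef reports the first `#if`, which is one way to catch this class of mistake.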
Signed-off-by: Sam Ravnborg Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Kbuild | 56 ++++++++++++++++++++++++++++++++++++++++++++++++-------- kernel/bounds.c | 19 +++++++++++++++++++ 2 files changed, 67 insertions(+), 8 deletions(-) create mode 100644 kernel/bounds.c diff --git a/Kbuild b/Kbuild index 1570d248ad92..7136de7b6fcb 100644 --- a/Kbuild +++ b/Kbuild @@ -1,19 +1,54 @@ # # Kbuild for top-level directory of the kernel # This file takes care of the following: -# 1) Generate asm-offsets.h -# 2) Check for missing system calls +# 1) Generate bounds.h +# 2) Generate asm-offsets.h (may need bounds.h) +# 3) Check for missing system calls ##### -# 1) Generate asm-offsets.h +# 1) Generate bounds.h + +bounds-file := include/linux/bounds.h + +always := $(bounds-file) +targets := $(bounds-file) kernel/bounds.s + +quiet_cmd_bounds = GEN $@ +define cmd_bounds + (set -e; \ + echo "#ifndef __LINUX_BOUNDS_H__"; \ + echo "#define __LINUX_BOUNDS_H__"; \ + echo "/*"; \ + echo " * DO NOT MODIFY."; \ + echo " *"; \ + echo " * This file was generated by Kbuild"; \ + echo " *"; \ + echo " */"; \ + echo ""; \ + sed -ne $(sed-y) $<; \ + echo ""; \ + echo "#endif" ) > $@ +endef + +# We use internal kbuild rules to avoid the "is up to date" message from make +kernel/bounds.s: kernel/bounds.c FORCE + $(Q)mkdir -p $(dir $@) + $(call if_changed_dep,cc_s_c) + +$(obj)/$(bounds-file): kernel/bounds.s Kbuild + $(Q)mkdir -p $(dir $@) + $(call cmd,bounds) + +##### +# 2) Generate asm-offsets.h # offsets-file := include/asm-$(SRCARCH)/asm-offsets.h -always := $(offsets-file) -targets := $(offsets-file) +always += $(offsets-file) +targets += $(offsets-file) targets += arch/$(SRCARCH)/kernel/asm-offsets.s -clean-files := $(addprefix $(objtree)/,$(targets)) + # Default sed regexp - multiline due to syntax constraints define sed-y @@ 
-40,7 +75,8 @@ define cmd_offsets endef # We use internal kbuild rules to avoid the "is up to date" message from make -arch/$(SRCARCH)/kernel/asm-offsets.s: arch/$(SRCARCH)/kernel/asm-offsets.c FORCE +arch/$(SRCARCH)/kernel/asm-offsets.s: arch/$(SRCARCH)/kernel/asm-offsets.c \ + $(obj)/$(bounds-file) FORCE $(Q)mkdir -p $(dir $@) $(call if_changed_dep,cc_s_c) @@ -49,7 +85,7 @@ $(obj)/$(offsets-file): arch/$(SRCARCH)/kernel/asm-offsets.s Kbuild $(call cmd,offsets) ##### -# 2) Check for missing system calls +# 3) Check for missing system calls # quiet_cmd_syscalls = CALL $< @@ -58,3 +94,7 @@ quiet_cmd_syscalls = CALL $< PHONY += missing-syscalls missing-syscalls: scripts/checksyscalls.sh FORCE $(call cmd,syscalls) + +# Delete all targets during make clean +clean-files := $(addprefix $(objtree)/,$(targets)) + diff --git a/kernel/bounds.c b/kernel/bounds.c new file mode 100644 index 000000000000..85bb281858cb --- /dev/null +++ b/kernel/bounds.c @@ -0,0 +1,19 @@ +/* + * Generate definitions needed by the preprocessor. + * This code generates raw asm output which is post-processed + * to extract and format the required data. + */ + +#define __GENERATING_BOUNDS_H +/* Include headers that define the enum constants of interest */ + +#define DEFINE(sym, val) \ + asm volatile("\n->" #sym " %0 " #val : : "i" (val)) + +#define BLANK() asm volatile("\n->" : : ) + +void foo(void) +{ + /* The enum constants to put into include/linux/bounds.h */ + /* End of constants */ +} -- cgit v1.2.3 From 726b80127239aeea9c8d8aad5b4e2c80313e3ce8 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Apr 2008 02:12:44 -0700 Subject: page_mapping(): add ifdef around reference to swapper_space This fixes the superh build when the pageflags patches are applied. The change should not be necessary, so the build failure may be a gcc bug.
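The patch wraps the swap-cache test in `#ifdef CONFIG_SWAP` and leaves the `else` inside the conditional block. With swap configured, the code reads as the original if/else-if. Without it, only the anonymous-mapping test remains, and `swapper_space` is never referenced. This is a userspace sketch of that control flow, built without `-DCONFIG_SWAP`. `struct page`, its `swapcache` field, `page_mapping_demo()` and the local `PAGE_MAPPING_ANON` value are simplified, hypothetical stand-ins, not the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types and flag. */
#define PAGE_MAPPING_ANON 1UL
struct address_space { int id; };
struct page {
	int swapcache;                   /* plays the role of PageSwapCache() */
	struct address_space *mapping;
};

#ifdef CONFIG_SWAP
static struct address_space swapper_space = { 1 };
#endif

/* Same shape as page_mapping() after the patch: without CONFIG_SWAP the
 * swap branch and its "else" are compiled out, so the anon check stands
 * alone and swapper_space need not exist. */
struct address_space *page_mapping_demo(struct page *page)
{
	struct address_space *mapping = page->mapping;

#ifdef CONFIG_SWAP
	if (page->swapcache)
		mapping = &swapper_space;
	else
#endif
	if ((unsigned long)mapping & PAGE_MAPPING_ANON)
		mapping = NULL;
	return mapping;
}
```

Keeping the `else` inside the `#ifdef` avoids duplicating the anon test in two preprocessor branches. The cost is a slightly unusual layout.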
Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 24659ed06bae..4f3c1b2f44de 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -605,9 +605,12 @@ static inline struct address_space *page_mapping(struct page *page) struct address_space *mapping = page->mapping; VM_BUG_ON(PageSlab(page)); +#ifdef CONFIG_SWAP if (unlikely(PageSwapCache(page))) mapping = &swapper_space; - else if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) + else +#endif + if (unlikely((unsigned long)mapping & PAGE_MAPPING_ANON)) mapping = NULL; return mapping; } -- cgit v1.2.3 From bf2ae2b37c06cc9fb6fc03d99617f1161939980f Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:45 -0700 Subject: pageflags: standardize comment inclusion in asm-offsets.h and fix MIPS Add the ability to pass comments into asm-offsets.h by generating asm output lines of the form "-># comment line". MIPS needs this feature to preserve the comments that are currently in asm-mips/asm-offsets.h. Then remove the special handling for MIPS from Kbuild and convert MIPS to use the new marker to include the comments.
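The new `sed-y` expression rewrites three kinds of marker lines in the compiler's assembler output:

* `->#text` becomes a C comment.
* `->SYM VAL EXPR` becomes `#define SYM VAL` followed by a comment holding EXPR. Any leading `$` or `#` on VAL is dropped, such as the `$` that x86 AT&T syntax puts on immediates.
* A bare `->` (from BLANK()) becomes an empty line.

All other lines are not printed. This is a rough C model of those rules for the common cases. The kernel does this with sed, not C. `translate_line()` is a hypothetical name, and regex corner cases, such as a second `->` inside comment text, are not reproduced.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Returns 1 and writes the translated text to out for a marker line
 * (one starting with "->"); returns 0 for ordinary assembler lines,
 * which the sed script does not print. */
int translate_line(const char *in, char *out, size_t outlen)
{
	const char *val, *expr;
	size_t symlen, vallen;

	if (strncmp(in, "->", 2) != 0)
		return 0;
	in += 2;

	/* Comment marker, e.g. from the MIPS text() macro. */
	if (in[0] == '#') {
		snprintf(out, outlen, "/* %s */", in + 1);
		return 1;
	}

	/* "SYM VAL EXPR": SYM and VAL are space-delimited, EXPR is the rest. */
	symlen = strcspn(in, " ");
	if (in[symlen] == ' ') {
		val = in + symlen + 1;
		val += strspn(val, "$#");        /* drop the immediate prefix */
		vallen = strcspn(val, " ");
		if (val[vallen] == ' ') {
			expr = val + vallen + 1;
			snprintf(out, outlen, "#define %.*s %.*s /* %s */",
				 (int)symlen, in, (int)vallen, val, expr);
			return 1;
		}
	}

	/* Anything else, e.g. BLANK()'s bare marker: just drop the "->". */
	snprintf(out, outlen, "%s", in);
	return 1;
}
```

The MIPS conversion in this patch relies on the first rule. Its `text()` lines now emit `->#...` instead of the old `@@@` prefix.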
Cc: Ralf Baechle Signed-off-by: Christoph Lameter Cc: Sam Ravnborg Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Kbuild | 6 +- arch/mips/kernel/asm-offsets.c | 404 ++++++++++++++++++++--------------------- 2 files changed, 205 insertions(+), 205 deletions(-) diff --git a/Kbuild b/Kbuild index 7136de7b6fcb..32f19c5c9bb0 100644 --- a/Kbuild +++ b/Kbuild @@ -52,10 +52,10 @@ targets += arch/$(SRCARCH)/kernel/asm-offsets.s # Default sed regexp - multiline due to syntax constraints define sed-y - "/^->/{s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; s:->::; p;}" + "/^->/{s:->#\(.*\):/* \1 */:; \ + s:^->\([^ ]*\) [\$$#]*\([^ ]*\) \(.*\):#define \1 \2 /* \3 */:; \ + s:->::; p;}" endef -# Override default regexp for specific architectures -sed-$(CONFIG_MIPS) := "/^@@@/{s/^@@@//; s/ \#.*\$$//; p;}" quiet_cmd_offsets = GEN $@ define cmd_offsets diff --git a/arch/mips/kernel/asm-offsets.c b/arch/mips/kernel/asm-offsets.c index ca136298acdc..5bf03b3c4150 100644 --- a/arch/mips/kernel/asm-offsets.c +++ b/arch/mips/kernel/asm-offsets.c @@ -17,252 +17,252 @@ #include #include -#define text(t) __asm__("\n@@@" t) +#define text(t) __asm__("\n->#" t) #define _offset(type, member) (&(((type *)NULL)->member)) #define offset(string, ptr, member) \ - __asm__("\n@@@" string "%0" : : "i" (_offset(ptr, member))) + __asm__("\n->" string " %0" : : "i" (_offset(ptr, member))) #define constant(string, member) \ - __asm__("\n@@@" string "%X0" : : "ri" (member)) + __asm__("\n->" string " %0" : : "ri" (member)) #define size(string, size) \ - __asm__("\n@@@" string "%0" : : "i" (sizeof(size))) + __asm__("\n->" string " %0" : : "i" (sizeof(size))) #define linefeed text("") void output_ptreg_defines(void) { - text("/* MIPS pt_regs offsets. 
*/"); - offset("#define PT_R0 ", struct pt_regs, regs[0]); - offset("#define PT_R1 ", struct pt_regs, regs[1]); - offset("#define PT_R2 ", struct pt_regs, regs[2]); - offset("#define PT_R3 ", struct pt_regs, regs[3]); - offset("#define PT_R4 ", struct pt_regs, regs[4]); - offset("#define PT_R5 ", struct pt_regs, regs[5]); - offset("#define PT_R6 ", struct pt_regs, regs[6]); - offset("#define PT_R7 ", struct pt_regs, regs[7]); - offset("#define PT_R8 ", struct pt_regs, regs[8]); - offset("#define PT_R9 ", struct pt_regs, regs[9]); - offset("#define PT_R10 ", struct pt_regs, regs[10]); - offset("#define PT_R11 ", struct pt_regs, regs[11]); - offset("#define PT_R12 ", struct pt_regs, regs[12]); - offset("#define PT_R13 ", struct pt_regs, regs[13]); - offset("#define PT_R14 ", struct pt_regs, regs[14]); - offset("#define PT_R15 ", struct pt_regs, regs[15]); - offset("#define PT_R16 ", struct pt_regs, regs[16]); - offset("#define PT_R17 ", struct pt_regs, regs[17]); - offset("#define PT_R18 ", struct pt_regs, regs[18]); - offset("#define PT_R19 ", struct pt_regs, regs[19]); - offset("#define PT_R20 ", struct pt_regs, regs[20]); - offset("#define PT_R21 ", struct pt_regs, regs[21]); - offset("#define PT_R22 ", struct pt_regs, regs[22]); - offset("#define PT_R23 ", struct pt_regs, regs[23]); - offset("#define PT_R24 ", struct pt_regs, regs[24]); - offset("#define PT_R25 ", struct pt_regs, regs[25]); - offset("#define PT_R26 ", struct pt_regs, regs[26]); - offset("#define PT_R27 ", struct pt_regs, regs[27]); - offset("#define PT_R28 ", struct pt_regs, regs[28]); - offset("#define PT_R29 ", struct pt_regs, regs[29]); - offset("#define PT_R30 ", struct pt_regs, regs[30]); - offset("#define PT_R31 ", struct pt_regs, regs[31]); - offset("#define PT_LO ", struct pt_regs, lo); - offset("#define PT_HI ", struct pt_regs, hi); + text("MIPS pt_regs offsets."); + offset("PT_R0", struct pt_regs, regs[0]); + offset("PT_R1", struct pt_regs, regs[1]); + offset("PT_R2", struct pt_regs, 
regs[2]); + offset("PT_R3", struct pt_regs, regs[3]); + offset("PT_R4", struct pt_regs, regs[4]); + offset("PT_R5", struct pt_regs, regs[5]); + offset("PT_R6", struct pt_regs, regs[6]); + offset("PT_R7", struct pt_regs, regs[7]); + offset("PT_R8", struct pt_regs, regs[8]); + offset("PT_R9", struct pt_regs, regs[9]); + offset("PT_R10", struct pt_regs, regs[10]); + offset("PT_R11", struct pt_regs, regs[11]); + offset("PT_R12", struct pt_regs, regs[12]); + offset("PT_R13", struct pt_regs, regs[13]); + offset("PT_R14", struct pt_regs, regs[14]); + offset("PT_R15", struct pt_regs, regs[15]); + offset("PT_R16", struct pt_regs, regs[16]); + offset("PT_R17", struct pt_regs, regs[17]); + offset("PT_R18", struct pt_regs, regs[18]); + offset("PT_R19", struct pt_regs, regs[19]); + offset("PT_R20", struct pt_regs, regs[20]); + offset("PT_R21", struct pt_regs, regs[21]); + offset("PT_R22", struct pt_regs, regs[22]); + offset("PT_R23", struct pt_regs, regs[23]); + offset("PT_R24", struct pt_regs, regs[24]); + offset("PT_R25", struct pt_regs, regs[25]); + offset("PT_R26", struct pt_regs, regs[26]); + offset("PT_R27", struct pt_regs, regs[27]); + offset("PT_R28", struct pt_regs, regs[28]); + offset("PT_R29", struct pt_regs, regs[29]); + offset("PT_R30", struct pt_regs, regs[30]); + offset("PT_R31", struct pt_regs, regs[31]); + offset("PT_LO", struct pt_regs, lo); + offset("PT_HI", struct pt_regs, hi); #ifdef CONFIG_CPU_HAS_SMARTMIPS - offset("#define PT_ACX ", struct pt_regs, acx); + offset("PT_ACX", struct pt_regs, acx); #endif - offset("#define PT_EPC ", struct pt_regs, cp0_epc); - offset("#define PT_BVADDR ", struct pt_regs, cp0_badvaddr); - offset("#define PT_STATUS ", struct pt_regs, cp0_status); - offset("#define PT_CAUSE ", struct pt_regs, cp0_cause); + offset("PT_EPC", struct pt_regs, cp0_epc); + offset("PT_BVADDR", struct pt_regs, cp0_badvaddr); + offset("PT_STATUS", struct pt_regs, cp0_status); + offset("PT_CAUSE", struct pt_regs, cp0_cause); #ifdef CONFIG_MIPS_MT_SMTC - 
offset("#define PT_TCSTATUS ", struct pt_regs, cp0_tcstatus); + offset("PT_TCSTATUS", struct pt_regs, cp0_tcstatus); #endif /* CONFIG_MIPS_MT_SMTC */ - size("#define PT_SIZE ", struct pt_regs); + size("PT_SIZE", struct pt_regs); linefeed; } void output_task_defines(void) { - text("/* MIPS task_struct offsets. */"); - offset("#define TASK_STATE ", struct task_struct, state); - offset("#define TASK_THREAD_INFO ", struct task_struct, stack); - offset("#define TASK_FLAGS ", struct task_struct, flags); - offset("#define TASK_MM ", struct task_struct, mm); - offset("#define TASK_PID ", struct task_struct, pid); - size( "#define TASK_STRUCT_SIZE ", struct task_struct); + text("MIPS task_struct offsets."); + offset("TASK_STATE", struct task_struct, state); + offset("TASK_THREAD_INFO", struct task_struct, stack); + offset("TASK_FLAGS", struct task_struct, flags); + offset("TASK_MM", struct task_struct, mm); + offset("TASK_PID", struct task_struct, pid); + size( "TASK_STRUCT_SIZE", struct task_struct); linefeed; } void output_thread_info_defines(void) { - text("/* MIPS thread_info offsets. 
*/"); - offset("#define TI_TASK ", struct thread_info, task); - offset("#define TI_EXEC_DOMAIN ", struct thread_info, exec_domain); - offset("#define TI_FLAGS ", struct thread_info, flags); - offset("#define TI_TP_VALUE ", struct thread_info, tp_value); - offset("#define TI_CPU ", struct thread_info, cpu); - offset("#define TI_PRE_COUNT ", struct thread_info, preempt_count); - offset("#define TI_ADDR_LIMIT ", struct thread_info, addr_limit); - offset("#define TI_RESTART_BLOCK ", struct thread_info, restart_block); - offset("#define TI_REGS ", struct thread_info, regs); - constant("#define _THREAD_SIZE ", THREAD_SIZE); - constant("#define _THREAD_MASK ", THREAD_MASK); + text("MIPS thread_info offsets."); + offset("TI_TASK", struct thread_info, task); + offset("TI_EXEC_DOMAIN", struct thread_info, exec_domain); + offset("TI_FLAGS", struct thread_info, flags); + offset("TI_TP_VALUE", struct thread_info, tp_value); + offset("TI_CPU", struct thread_info, cpu); + offset("TI_PRE_COUNT", struct thread_info, preempt_count); + offset("TI_ADDR_LIMIT", struct thread_info, addr_limit); + offset("TI_RESTART_BLOCK", struct thread_info, restart_block); + offset("TI_REGS", struct thread_info, regs); + constant("_THREAD_SIZE", THREAD_SIZE); + constant("_THREAD_MASK", THREAD_MASK); linefeed; } void output_thread_defines(void) { - text("/* MIPS specific thread_struct offsets. 
*/"); - offset("#define THREAD_REG16 ", struct task_struct, thread.reg16); - offset("#define THREAD_REG17 ", struct task_struct, thread.reg17); - offset("#define THREAD_REG18 ", struct task_struct, thread.reg18); - offset("#define THREAD_REG19 ", struct task_struct, thread.reg19); - offset("#define THREAD_REG20 ", struct task_struct, thread.reg20); - offset("#define THREAD_REG21 ", struct task_struct, thread.reg21); - offset("#define THREAD_REG22 ", struct task_struct, thread.reg22); - offset("#define THREAD_REG23 ", struct task_struct, thread.reg23); - offset("#define THREAD_REG29 ", struct task_struct, thread.reg29); - offset("#define THREAD_REG30 ", struct task_struct, thread.reg30); - offset("#define THREAD_REG31 ", struct task_struct, thread.reg31); - offset("#define THREAD_STATUS ", struct task_struct, + text("MIPS specific thread_struct offsets."); + offset("THREAD_REG16", struct task_struct, thread.reg16); + offset("THREAD_REG17", struct task_struct, thread.reg17); + offset("THREAD_REG18", struct task_struct, thread.reg18); + offset("THREAD_REG19", struct task_struct, thread.reg19); + offset("THREAD_REG20", struct task_struct, thread.reg20); + offset("THREAD_REG21", struct task_struct, thread.reg21); + offset("THREAD_REG22", struct task_struct, thread.reg22); + offset("THREAD_REG23", struct task_struct, thread.reg23); + offset("THREAD_REG29", struct task_struct, thread.reg29); + offset("THREAD_REG30", struct task_struct, thread.reg30); + offset("THREAD_REG31", struct task_struct, thread.reg31); + offset("THREAD_STATUS", struct task_struct, thread.cp0_status); - offset("#define THREAD_FPU ", struct task_struct, thread.fpu); + offset("THREAD_FPU", struct task_struct, thread.fpu); - offset("#define THREAD_BVADDR ", struct task_struct, \ + offset("THREAD_BVADDR", struct task_struct, \ thread.cp0_badvaddr); - offset("#define THREAD_BUADDR ", struct task_struct, \ + offset("THREAD_BUADDR", struct task_struct, \ thread.cp0_baduaddr); - offset("#define THREAD_ECODE 
", struct task_struct, \ + offset("THREAD_ECODE", struct task_struct, \ thread.error_code); - offset("#define THREAD_TRAPNO ", struct task_struct, thread.trap_no); - offset("#define THREAD_TRAMP ", struct task_struct, \ + offset("THREAD_TRAPNO", struct task_struct, thread.trap_no); + offset("THREAD_TRAMP", struct task_struct, \ thread.irix_trampoline); - offset("#define THREAD_OLDCTX ", struct task_struct, \ + offset("THREAD_OLDCTX", struct task_struct, \ thread.irix_oldctx); linefeed; } void output_thread_fpu_defines(void) { - offset("#define THREAD_FPR0 ", + offset("THREAD_FPR0", struct task_struct, thread.fpu.fpr[0]); - offset("#define THREAD_FPR1 ", + offset("THREAD_FPR1", struct task_struct, thread.fpu.fpr[1]); - offset("#define THREAD_FPR2 ", + offset("THREAD_FPR2", struct task_struct, thread.fpu.fpr[2]); - offset("#define THREAD_FPR3 ", + offset("THREAD_FPR3", struct task_struct, thread.fpu.fpr[3]); - offset("#define THREAD_FPR4 ", + offset("THREAD_FPR4", struct task_struct, thread.fpu.fpr[4]); - offset("#define THREAD_FPR5 ", + offset("THREAD_FPR5", struct task_struct, thread.fpu.fpr[5]); - offset("#define THREAD_FPR6 ", + offset("THREAD_FPR6", struct task_struct, thread.fpu.fpr[6]); - offset("#define THREAD_FPR7 ", + offset("THREAD_FPR7", struct task_struct, thread.fpu.fpr[7]); - offset("#define THREAD_FPR8 ", + offset("THREAD_FPR8", struct task_struct, thread.fpu.fpr[8]); - offset("#define THREAD_FPR9 ", + offset("THREAD_FPR9", struct task_struct, thread.fpu.fpr[9]); - offset("#define THREAD_FPR10 ", + offset("THREAD_FPR10", struct task_struct, thread.fpu.fpr[10]); - offset("#define THREAD_FPR11 ", + offset("THREAD_FPR11", struct task_struct, thread.fpu.fpr[11]); - offset("#define THREAD_FPR12 ", + offset("THREAD_FPR12", struct task_struct, thread.fpu.fpr[12]); - offset("#define THREAD_FPR13 ", + offset("THREAD_FPR13", struct task_struct, thread.fpu.fpr[13]); - offset("#define THREAD_FPR14 ", + offset("THREAD_FPR14", struct task_struct, 
thread.fpu.fpr[14]); - offset("#define THREAD_FPR15 ", + offset("THREAD_FPR15", struct task_struct, thread.fpu.fpr[15]); - offset("#define THREAD_FPR16 ", + offset("THREAD_FPR16", struct task_struct, thread.fpu.fpr[16]); - offset("#define THREAD_FPR17 ", + offset("THREAD_FPR17", struct task_struct, thread.fpu.fpr[17]); - offset("#define THREAD_FPR18 ", + offset("THREAD_FPR18", struct task_struct, thread.fpu.fpr[18]); - offset("#define THREAD_FPR19 ", + offset("THREAD_FPR19", struct task_struct, thread.fpu.fpr[19]); - offset("#define THREAD_FPR20 ", + offset("THREAD_FPR20", struct task_struct, thread.fpu.fpr[20]); - offset("#define THREAD_FPR21 ", + offset("THREAD_FPR21", struct task_struct, thread.fpu.fpr[21]); - offset("#define THREAD_FPR22 ", + offset("THREAD_FPR22", struct task_struct, thread.fpu.fpr[22]); - offset("#define THREAD_FPR23 ", + offset("THREAD_FPR23", struct task_struct, thread.fpu.fpr[23]); - offset("#define THREAD_FPR24 ", + offset("THREAD_FPR24", struct task_struct, thread.fpu.fpr[24]); - offset("#define THREAD_FPR25 ", + offset("THREAD_FPR25", struct task_struct, thread.fpu.fpr[25]); - offset("#define THREAD_FPR26 ", + offset("THREAD_FPR26", struct task_struct, thread.fpu.fpr[26]); - offset("#define THREAD_FPR27 ", + offset("THREAD_FPR27", struct task_struct, thread.fpu.fpr[27]); - offset("#define THREAD_FPR28 ", + offset("THREAD_FPR28", struct task_struct, thread.fpu.fpr[28]); - offset("#define THREAD_FPR29 ", + offset("THREAD_FPR29", struct task_struct, thread.fpu.fpr[29]); - offset("#define THREAD_FPR30 ", + offset("THREAD_FPR30", struct task_struct, thread.fpu.fpr[30]); - offset("#define THREAD_FPR31 ", + offset("THREAD_FPR31", struct task_struct, thread.fpu.fpr[31]); - offset("#define THREAD_FCR31 ", + offset("THREAD_FCR31", struct task_struct, thread.fpu.fcr31); linefeed; } void output_mm_defines(void) { - text("/* Size of struct page */"); - size("#define STRUCT_PAGE_SIZE ", struct page); + text("Size of struct page"); + 
size("STRUCT_PAGE_SIZE", struct page); linefeed; - text("/* Linux mm_struct offsets. */"); - offset("#define MM_USERS ", struct mm_struct, mm_users); - offset("#define MM_PGD ", struct mm_struct, pgd); - offset("#define MM_CONTEXT ", struct mm_struct, context); + text("Linux mm_struct offsets."); + offset("MM_USERS", struct mm_struct, mm_users); + offset("MM_PGD", struct mm_struct, pgd); + offset("MM_CONTEXT", struct mm_struct, context); linefeed; - constant("#define _PAGE_SIZE ", PAGE_SIZE); - constant("#define _PAGE_SHIFT ", PAGE_SHIFT); + constant("_PAGE_SIZE", PAGE_SIZE); + constant("_PAGE_SHIFT", PAGE_SHIFT); linefeed; - constant("#define _PGD_T_SIZE ", sizeof(pgd_t)); - constant("#define _PMD_T_SIZE ", sizeof(pmd_t)); - constant("#define _PTE_T_SIZE ", sizeof(pte_t)); + constant("_PGD_T_SIZE", sizeof(pgd_t)); + constant("_PMD_T_SIZE", sizeof(pmd_t)); + constant("_PTE_T_SIZE", sizeof(pte_t)); linefeed; - constant("#define _PGD_T_LOG2 ", PGD_T_LOG2); - constant("#define _PMD_T_LOG2 ", PMD_T_LOG2); - constant("#define _PTE_T_LOG2 ", PTE_T_LOG2); + constant("_PGD_T_LOG2", PGD_T_LOG2); + constant("_PMD_T_LOG2", PMD_T_LOG2); + constant("_PTE_T_LOG2", PTE_T_LOG2); linefeed; - constant("#define _PGD_ORDER ", PGD_ORDER); - constant("#define _PMD_ORDER ", PMD_ORDER); - constant("#define _PTE_ORDER ", PTE_ORDER); + constant("_PGD_ORDER", PGD_ORDER); + constant("_PMD_ORDER", PMD_ORDER); + constant("_PTE_ORDER", PTE_ORDER); linefeed; - constant("#define _PMD_SHIFT ", PMD_SHIFT); - constant("#define _PGDIR_SHIFT ", PGDIR_SHIFT); + constant("_PMD_SHIFT", PMD_SHIFT); + constant("_PGDIR_SHIFT", PGDIR_SHIFT); linefeed; - constant("#define _PTRS_PER_PGD ", PTRS_PER_PGD); - constant("#define _PTRS_PER_PMD ", PTRS_PER_PMD); - constant("#define _PTRS_PER_PTE ", PTRS_PER_PTE); + constant("_PTRS_PER_PGD", PTRS_PER_PGD); + constant("_PTRS_PER_PMD", PTRS_PER_PMD); + constant("_PTRS_PER_PTE", PTRS_PER_PTE); linefeed; } #ifdef CONFIG_32BIT void output_sc_defines(void) { - text("/* Linux 
sigcontext offsets. */"); - offset("#define SC_REGS ", struct sigcontext, sc_regs); - offset("#define SC_FPREGS ", struct sigcontext, sc_fpregs); - offset("#define SC_ACX ", struct sigcontext, sc_acx); - offset("#define SC_MDHI ", struct sigcontext, sc_mdhi); - offset("#define SC_MDLO ", struct sigcontext, sc_mdlo); - offset("#define SC_PC ", struct sigcontext, sc_pc); - offset("#define SC_FPC_CSR ", struct sigcontext, sc_fpc_csr); - offset("#define SC_FPC_EIR ", struct sigcontext, sc_fpc_eir); - offset("#define SC_HI1 ", struct sigcontext, sc_hi1); - offset("#define SC_LO1 ", struct sigcontext, sc_lo1); - offset("#define SC_HI2 ", struct sigcontext, sc_hi2); - offset("#define SC_LO2 ", struct sigcontext, sc_lo2); - offset("#define SC_HI3 ", struct sigcontext, sc_hi3); - offset("#define SC_LO3 ", struct sigcontext, sc_lo3); + text("Linux sigcontext offsets."); + offset("SC_REGS", struct sigcontext, sc_regs); + offset("SC_FPREGS", struct sigcontext, sc_fpregs); + offset("SC_ACX", struct sigcontext, sc_acx); + offset("SC_MDHI", struct sigcontext, sc_mdhi); + offset("SC_MDLO", struct sigcontext, sc_mdlo); + offset("SC_PC", struct sigcontext, sc_pc); + offset("SC_FPC_CSR", struct sigcontext, sc_fpc_csr); + offset("SC_FPC_EIR", struct sigcontext, sc_fpc_eir); + offset("SC_HI1", struct sigcontext, sc_hi1); + offset("SC_LO1", struct sigcontext, sc_lo1); + offset("SC_HI2", struct sigcontext, sc_hi2); + offset("SC_LO2", struct sigcontext, sc_lo2); + offset("SC_HI3", struct sigcontext, sc_hi3); + offset("SC_LO3", struct sigcontext, sc_lo3); linefeed; } #endif @@ -270,13 +270,13 @@ void output_sc_defines(void) #ifdef CONFIG_64BIT void output_sc_defines(void) { - text("/* Linux sigcontext offsets. 
*/"); - offset("#define SC_REGS ", struct sigcontext, sc_regs); - offset("#define SC_FPREGS ", struct sigcontext, sc_fpregs); - offset("#define SC_MDHI ", struct sigcontext, sc_mdhi); - offset("#define SC_MDLO ", struct sigcontext, sc_mdlo); - offset("#define SC_PC ", struct sigcontext, sc_pc); - offset("#define SC_FPC_CSR ", struct sigcontext, sc_fpc_csr); + text("Linux sigcontext offsets."); + offset("SC_REGS", struct sigcontext, sc_regs); + offset("SC_FPREGS", struct sigcontext, sc_fpregs); + offset("SC_MDHI", struct sigcontext, sc_mdhi); + offset("SC_MDLO", struct sigcontext, sc_mdlo); + offset("SC_PC", struct sigcontext, sc_pc); + offset("SC_FPC_CSR", struct sigcontext, sc_fpc_csr); linefeed; } #endif @@ -284,56 +284,56 @@ void output_sc_defines(void) #ifdef CONFIG_MIPS32_COMPAT void output_sc32_defines(void) { - text("/* Linux 32-bit sigcontext offsets. */"); - offset("#define SC32_FPREGS ", struct sigcontext32, sc_fpregs); - offset("#define SC32_FPC_CSR ", struct sigcontext32, sc_fpc_csr); - offset("#define SC32_FPC_EIR ", struct sigcontext32, sc_fpc_eir); + text("Linux 32-bit sigcontext offsets."); + offset("SC32_FPREGS", struct sigcontext32, sc_fpregs); + offset("SC32_FPC_CSR", struct sigcontext32, sc_fpc_csr); + offset("SC32_FPC_EIR", struct sigcontext32, sc_fpc_eir); linefeed; } #endif void output_signal_defined(void) { - text("/* Linux signal numbers. 
*/"); - constant("#define _SIGHUP ", SIGHUP); - constant("#define _SIGINT ", SIGINT); - constant("#define _SIGQUIT ", SIGQUIT); - constant("#define _SIGILL ", SIGILL); - constant("#define _SIGTRAP ", SIGTRAP); - constant("#define _SIGIOT ", SIGIOT); - constant("#define _SIGABRT ", SIGABRT); - constant("#define _SIGEMT ", SIGEMT); - constant("#define _SIGFPE ", SIGFPE); - constant("#define _SIGKILL ", SIGKILL); - constant("#define _SIGBUS ", SIGBUS); - constant("#define _SIGSEGV ", SIGSEGV); - constant("#define _SIGSYS ", SIGSYS); - constant("#define _SIGPIPE ", SIGPIPE); - constant("#define _SIGALRM ", SIGALRM); - constant("#define _SIGTERM ", SIGTERM); - constant("#define _SIGUSR1 ", SIGUSR1); - constant("#define _SIGUSR2 ", SIGUSR2); - constant("#define _SIGCHLD ", SIGCHLD); - constant("#define _SIGPWR ", SIGPWR); - constant("#define _SIGWINCH ", SIGWINCH); - constant("#define _SIGURG ", SIGURG); - constant("#define _SIGIO ", SIGIO); - constant("#define _SIGSTOP ", SIGSTOP); - constant("#define _SIGTSTP ", SIGTSTP); - constant("#define _SIGCONT ", SIGCONT); - constant("#define _SIGTTIN ", SIGTTIN); - constant("#define _SIGTTOU ", SIGTTOU); - constant("#define _SIGVTALRM ", SIGVTALRM); - constant("#define _SIGPROF ", SIGPROF); - constant("#define _SIGXCPU ", SIGXCPU); - constant("#define _SIGXFSZ ", SIGXFSZ); + text("Linux signal numbers."); + constant("_SIGHUP", SIGHUP); + constant("_SIGINT", SIGINT); + constant("_SIGQUIT", SIGQUIT); + constant("_SIGILL", SIGILL); + constant("_SIGTRAP", SIGTRAP); + constant("_SIGIOT", SIGIOT); + constant("_SIGABRT", SIGABRT); + constant("_SIGEMT", SIGEMT); + constant("_SIGFPE", SIGFPE); + constant("_SIGKILL", SIGKILL); + constant("_SIGBUS", SIGBUS); + constant("_SIGSEGV", SIGSEGV); + constant("_SIGSYS", SIGSYS); + constant("_SIGPIPE", SIGPIPE); + constant("_SIGALRM", SIGALRM); + constant("_SIGTERM", SIGTERM); + constant("_SIGUSR1", SIGUSR1); + constant("_SIGUSR2", SIGUSR2); + constant("_SIGCHLD", SIGCHLD); + constant("_SIGPWR", 
SIGPWR); + constant("_SIGWINCH", SIGWINCH); + constant("_SIGURG", SIGURG); + constant("_SIGIO", SIGIO); + constant("_SIGSTOP", SIGSTOP); + constant("_SIGTSTP", SIGTSTP); + constant("_SIGCONT", SIGCONT); + constant("_SIGTTIN", SIGTTIN); + constant("_SIGTTOU", SIGTTOU); + constant("_SIGVTALRM", SIGVTALRM); + constant("_SIGPROF", SIGPROF); + constant("_SIGXCPU", SIGXCPU); + constant("_SIGXFSZ", SIGXFSZ); linefeed; } void output_irq_cpustat_t_defines(void) { - text("/* Linux irq_cpustat_t offsets. */"); - offset("#define IC_SOFTIRQ_PENDING ", irq_cpustat_t, __softirq_pending); - size("#define IC_IRQ_CPUSTAT_T ", irq_cpustat_t); + text("Linux irq_cpustat_t offsets."); + offset("IC_SOFTIRQ_PENDING", irq_cpustat_t, __softirq_pending); + size("IC_IRQ_CPUSTAT_T", irq_cpustat_t); linefeed; } -- cgit v1.2.3 From e26831814998cee8e6d9f0a9854cb46c516f5547 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:47 -0700 Subject: pageflags: use an enum for the flags Use an enum to ease the maintenance of page flags. This is going to change the numbering from 0 to 18. Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 56 +++++++++++++++++++++------------------------- 1 file changed, 26 insertions(+), 30 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index b5b30f1c1e59..d66971530caa 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -67,35 +67,29 @@ * FLAGS_RESERVED which defines the width of the fields section * (see linux/mmzone.h). New flags must _not_ overlap with this area. */ -#define PG_locked 0 /* Page is locked. Don't touch. 
*/ -#define PG_error 1 -#define PG_referenced 2 -#define PG_uptodate 3 - -#define PG_dirty 4 -#define PG_lru 5 -#define PG_active 6 -#define PG_slab 7 /* slab debug (Suparna wants this) */ - -#define PG_owner_priv_1 8 /* Owner use. If pagecache, fs may use*/ -#define PG_arch_1 9 -#define PG_reserved 10 -#define PG_private 11 /* If pagecache, has fs-private data */ - -#define PG_writeback 12 /* Page is under writeback */ -#define PG_compound 14 /* Part of a compound page */ -#define PG_swapcache 15 /* Swap page: swp_entry_t in private */ - -#define PG_mappedtodisk 16 /* Has blocks allocated on-disk */ -#define PG_reclaim 17 /* To be reclaimed asap */ -#define PG_buddy 19 /* Page is free, on buddy lists */ - -/* PG_readahead is only used for file reads; PG_reclaim is only for writes */ -#define PG_readahead PG_reclaim /* Reminder to do async read-ahead */ - -/* PG_owner_priv_1 users should have descriptive aliases */ -#define PG_checked PG_owner_priv_1 /* Used by some filesystems */ -#define PG_pinned PG_owner_priv_1 /* Xen pinned pagetable */ +enum pageflags { + PG_locked, /* Page is locked. Don't touch. */ + PG_error, + PG_referenced, + PG_uptodate, + PG_dirty, + PG_lru, + PG_active, + PG_slab, + PG_owner_priv_1, /* Owner use. If pagecache, fs may use*/ + PG_checked = PG_owner_priv_1, /* Used by some filesystems */ + PG_pinned = PG_owner_priv_1, /* Xen pinned pagetable */ + PG_arch_1, + PG_reserved, + PG_private, /* If pagecache, has fs-private data */ + PG_writeback, /* Page is under writeback */ + PG_compound, /* A compound page */ + PG_swapcache, /* Swap page: swp_entry_t in private */ + PG_mappedtodisk, /* Has blocks allocated on-disk */ + PG_reclaim, /* To be reclaimed asap */ + /* PG_readahead is only used for file reads; PG_reclaim is only for writes */ + PG_readahead = PG_reclaim, /* Reminder to do async read-ahead */ + PG_buddy, /* Page is free, on buddy lists */ #if (BITS_PER_LONG > 32) /* @@ -105,8 +99,10 @@ * 64 bit | FIELDS | ?????? 
FLAGS | * 63 32 0 */ -#define PG_uncached 31 /* Page has been mapped as uncached */ + PG_uncached = 31, /* Page has been mapped as uncached */ #endif + NR_PAGEFLAGS +}; /* * Manipulation of page state flags -- cgit v1.2.3 From 9223b4190fa1297a59f292f3419fc0285321d0ea Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:48 -0700 Subject: pageflags: get rid of FLAGS_RESERVED NR_PAGEFLAGS specifies the number of page flags we are using. From that we can calculate the number of bits left over that can be used for zone, node (and maybe the sections id). There is no need anymore for FLAGS_RESERVED if we use NR_PAGEFLAGS. Use the new methods to make NR_PAGEFLAGS available via the preprocessor. NR_PAGEFLAGS is used to calculate field boundaries in the page flags fields. These field widths have to be available to the preprocessor. Signed-off-by: Christoph Lameter Cc: David Miller Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/sparc64/mm/init.c | 16 ++++++++++++++-- include/linux/mm.h | 6 +++--- include/linux/mmzone.h | 19 ------------------- include/linux/page-flags.h | 19 ++++++++++++------- kernel/bounds.c | 2 ++ 5 files changed, 31 insertions(+), 31 deletions(-) diff --git a/arch/sparc64/mm/init.c b/arch/sparc64/mm/init.c index 177d8aaeec42..8c2b50e8abc6 100644 --- a/arch/sparc64/mm/init.c +++ b/arch/sparc64/mm/init.c @@ -1699,9 +1699,21 @@ void __init paging_init(void) * functions like clear_dcache_dirty_cpu use the cpu mask * in 13-bit signed-immediate instruction fields. */ - BUILD_BUG_ON(FLAGS_RESERVED != 32); + + /* + * Page flags must not reach into upper 32 bits that are used + * for the cpu number + */ + BUILD_BUG_ON(NR_PAGEFLAGS > 32); + + /* + * The bit fields placed in the high range must not reach below + * the 32 bit boundary. Otherwise we cannot place the cpu field + * at the 32 bit boundary.
+ */ BUILD_BUG_ON(SECTIONS_WIDTH + NODES_WIDTH + ZONES_WIDTH + - ilog2(roundup_pow_of_two(NR_CPUS)) > FLAGS_RESERVED); + ilog2(roundup_pow_of_two(NR_CPUS)) > 32); + BUILD_BUG_ON(NR_CPUS > 4096); kern_base = (prom_boot_mapping_phys_low >> 22UL) << 22UL; diff --git a/include/linux/mm.h b/include/linux/mm.h index 4f3c1b2f44de..526f810367d9 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -407,7 +407,7 @@ static inline void set_compound_order(struct page *page, unsigned long order) #define ZONES_WIDTH ZONES_SHIFT -#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= FLAGS_RESERVED +#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS #define NODES_WIDTH NODES_SHIFT #else #ifdef CONFIG_SPARSEMEM_VMEMMAP @@ -455,8 +455,8 @@ static inline void set_compound_order(struct page *page, unsigned long order) #define ZONEID_PGSHIFT (ZONEID_PGOFF * (ZONEID_SHIFT != 0)) -#if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED -#error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > FLAGS_RESERVED +#if SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS +#error SECTIONS_WIDTH+NODES_WIDTH+ZONES_WIDTH > BITS_PER_LONG - NR_PAGEFLAGS #endif #define ZONES_MASK ((1UL << ZONES_WIDTH) - 1) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 0aece6d8937e..c7a51dac441d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -820,25 +820,6 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist, #include #endif -#if BITS_PER_LONG == 32 -/* - * with 32 bit page->flags field, we reserve 9 bits for node/zone info. - * there are 4 zones (3 bits) and this leaves 9-3=6 bits for nodes. - */ -#define FLAGS_RESERVED 9 - -#elif BITS_PER_LONG == 64 -/* - * with 64 bit flags field, there's plenty of room. 
- */ -#define FLAGS_RESERVED 32 - -#else - -#error BITS_PER_LONG not defined - -#endif - #if !defined(CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID) && \ !defined(CONFIG_ARCH_POPULATES_NODE_MAP) static inline unsigned long early_pfn_to_nid(unsigned long pfn) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index d66971530caa..00e55e23b777 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -6,7 +6,10 @@ #define PAGE_FLAGS_H #include +#ifndef __GENERATING_BOUNDS_H #include +#include +#endif /* !__GENERATING_BOUNDS_H */ /* * Various page->flags bits: @@ -59,13 +62,12 @@ * extends from the high bits downwards. * * | FIELD | ... | FLAGS | - * N-1 ^ 0 - * (N-FLAGS_RESERVED) + * N-1 ^ 0 + * (NR_PAGEFLAGS) * - * The fields area is reserved for fields mapping zone, node and SPARSEMEM - * section. The boundry between these two areas is defined by - * FLAGS_RESERVED which defines the width of the fields section - * (see linux/mmzone.h). New flags must _not_ overlap with this area. + * The fields area is reserved for fields mapping zone, node (for NUMA) and + * SPARSEMEM section (for variants of SPARSEMEM that require section ids like + * SPARSEMEM_EXTREME with !SPARSEMEM_VMEMMAP). */ enum pageflags { PG_locked, /* Page is locked. Don't touch. 
*/ @@ -101,9 +103,11 @@ enum pageflags { */ PG_uncached = 31, /* Page has been mapped as uncached */ #endif - NR_PAGEFLAGS + __NR_PAGEFLAGS }; +#ifndef __GENERATING_BOUNDS_H + /* * Manipulation of page state flags */ @@ -304,4 +308,5 @@ static inline void set_page_writeback(struct page *page) test_set_page_writeback(page); } +#endif /* !__GENERATING_BOUNDS_H */ #endif /* PAGE_FLAGS_H */ diff --git a/kernel/bounds.c b/kernel/bounds.c index 85bb281858cb..9ca2bb30243c 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -6,6 +6,7 @@ #define __GENERATING_BOUNDS_H /* Include headers that define the enum constants of interest */ +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -15,5 +16,6 @@ void foo(void) { /* The enum constants to put into include/linux/bounds.h */ + DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS); /* End of constants */ } -- cgit v1.2.3 From f94a62e910840b3552c7adb7c57e0f8b3b345f6e Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:49 -0700 Subject: pageflags: introduce macros to generate page flag functions Introduce a set of macros that generate functions to handle page flags. A page flag function group typically starts with either SETPAGEFLAG(,) to create a set of page flag operations that are atomic. 
Or __SETPAGEFLAG(, Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 41 +++++++++++++++++++++++++++++++++++++++++ 1 file changed, 41 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 00e55e23b777..e5bddbfcf7ae 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -108,6 +108,47 @@ enum pageflags { #ifndef __GENERATING_BOUNDS_H +/* + * Macros to create function definitions for page flags + */ +#define TESTPAGEFLAG(uname, lname) \ +static inline int Page##uname(struct page *page) \ + { return test_bit(PG_##lname, &page->flags); } + +#define SETPAGEFLAG(uname, lname) \ +static inline void SetPage##uname(struct page *page) \ + { set_bit(PG_##lname, &page->flags); } + +#define CLEARPAGEFLAG(uname, lname) \ +static inline void ClearPage##uname(struct page *page) \ + { clear_bit(PG_##lname, &page->flags); } + +#define __SETPAGEFLAG(uname, lname) \ +static inline void __SetPage##uname(struct page *page) \ + { __set_bit(PG_##lname, &page->flags); } + +#define __CLEARPAGEFLAG(uname, lname) \ +static inline void __ClearPage##uname(struct page *page) \ + { __clear_bit(PG_##lname, &page->flags); } + +#define TESTSETFLAG(uname, lname) \ +static inline int TestSetPage##uname(struct page *page) \ + { return test_and_set_bit(PG_##lname, &page->flags); } + +#define TESTCLEARFLAG(uname, lname) \ +static inline int TestClearPage##uname(struct page *page) \ + { return test_and_clear_bit(PG_##lname, &page->flags); } + + +#define PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname) \ + SETPAGEFLAG(uname, lname) CLEARPAGEFLAG(uname, lname) + +#define __PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname) \ + __SETPAGEFLAG(uname, lname) __CLEARPAGEFLAG(uname, lname) + +#define TESTSCFLAG(uname, lname) \ + TESTSETFLAG(uname, lname) TESTCLEARFLAG(uname, lname) + /* * Manipulation 
of page state flags */ -- cgit v1.2.3 From 6a1e7f777f613bf0df99c7772fa2123d01ce2f7d Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:50 -0700 Subject: pageflags: convert to the use of new macros Replace explicit definitions of page flags through the use of macros. Significantly reduces the size of the definitions and removes a lot of opportunity for errors. Additional page flags can typically be generated with a single line. Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 195 ++++++++++++++++------------------------------ 1 file changed, 68 insertions(+), 127 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index e5bddbfcf7ae..ed7659adfaaf 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -149,28 +149,58 @@ static inline int TestClearPage##uname(struct page *page) \ #define TESTSCFLAG(uname, lname) \ TESTSETFLAG(uname, lname) TESTCLEARFLAG(uname, lname) +struct page; /* forward declaration */ + +PAGEFLAG(Locked, locked) TESTSCFLAG(Locked, locked) +PAGEFLAG(Error, error) +PAGEFLAG(Referenced, referenced) TESTCLEARFLAG(Referenced, referenced) +PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty) +PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) +PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active) +__PAGEFLAG(Slab, slab) +PAGEFLAG(Checked, checked) /* Used by some filesystems */ +PAGEFLAG(Pinned, pinned) /* Xen pinned pagetable */ +PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved) +PAGEFLAG(Private, private) __CLEARPAGEFLAG(Private, private) + __SETPAGEFLAG(Private, private) + +/* + * Only test-and-set exist for PG_writeback. The unconditional operators are + * risky: they bypass page accounting.
+ */ +TESTPAGEFLAG(Writeback, writeback) TESTSCFLAG(Writeback, writeback) +__PAGEFLAG(Buddy, buddy) +PAGEFLAG(MappedToDisk, mappedtodisk) + +/* PG_readahead is only used for file reads; PG_reclaim is only for writes */ +PAGEFLAG(Reclaim, reclaim) TESTCLEARFLAG(Reclaim, reclaim) +PAGEFLAG(Readahead, readahead) /* Reminder to do async read-ahead */ + +#ifdef CONFIG_HIGHMEM /* - * Manipulation of page state flags + * Must use a macro here due to header dependency issues. page_zone() is not + * available at this point. */ -#define PageLocked(page) \ - test_bit(PG_locked, &(page)->flags) -#define SetPageLocked(page) \ - set_bit(PG_locked, &(page)->flags) -#define TestSetPageLocked(page) \ - test_and_set_bit(PG_locked, &(page)->flags) -#define ClearPageLocked(page) \ - clear_bit(PG_locked, &(page)->flags) -#define TestClearPageLocked(page) \ - test_and_clear_bit(PG_locked, &(page)->flags) - -#define PageError(page) test_bit(PG_error, &(page)->flags) -#define SetPageError(page) set_bit(PG_error, &(page)->flags) -#define ClearPageError(page) clear_bit(PG_error, &(page)->flags) - -#define PageReferenced(page) test_bit(PG_referenced, &(page)->flags) -#define SetPageReferenced(page) set_bit(PG_referenced, &(page)->flags) -#define ClearPageReferenced(page) clear_bit(PG_referenced, &(page)->flags) -#define TestClearPageReferenced(page) test_and_clear_bit(PG_referenced, &(page)->flags) +#define PageHighMem(__p) is_highmem(page_zone(page)) +#else +static inline int PageHighMem(struct page *page) +{ + return 0; +} +#endif + +#ifdef CONFIG_SWAP +PAGEFLAG(SwapCache, swapcache) +#else +static inline int PageSwapCache(struct page *page) +{ + return 0; +} +#endif + +#if (BITS_PER_LONG > 32) +PAGEFLAG(Uncached, uncached) +#endif static inline int PageUptodate(struct page *page) { @@ -218,97 +248,37 @@ static inline void SetPageUptodate(struct page *page) #endif } -#define ClearPageUptodate(page) clear_bit(PG_uptodate, &(page)->flags) - -#define PageDirty(page) test_bit(PG_dirty, 
&(page)->flags) -#define SetPageDirty(page) set_bit(PG_dirty, &(page)->flags) -#define TestSetPageDirty(page) test_and_set_bit(PG_dirty, &(page)->flags) -#define ClearPageDirty(page) clear_bit(PG_dirty, &(page)->flags) -#define __ClearPageDirty(page) __clear_bit(PG_dirty, &(page)->flags) -#define TestClearPageDirty(page) test_and_clear_bit(PG_dirty, &(page)->flags) - -#define PageLRU(page) test_bit(PG_lru, &(page)->flags) -#define SetPageLRU(page) set_bit(PG_lru, &(page)->flags) -#define ClearPageLRU(page) clear_bit(PG_lru, &(page)->flags) -#define __ClearPageLRU(page) __clear_bit(PG_lru, &(page)->flags) - -#define PageActive(page) test_bit(PG_active, &(page)->flags) -#define SetPageActive(page) set_bit(PG_active, &(page)->flags) -#define ClearPageActive(page) clear_bit(PG_active, &(page)->flags) -#define __ClearPageActive(page) __clear_bit(PG_active, &(page)->flags) - -#define PageSlab(page) test_bit(PG_slab, &(page)->flags) -#define __SetPageSlab(page) __set_bit(PG_slab, &(page)->flags) -#define __ClearPageSlab(page) __clear_bit(PG_slab, &(page)->flags) - -#ifdef CONFIG_HIGHMEM -#define PageHighMem(page) is_highmem(page_zone(page)) -#else -#define PageHighMem(page) 0 /* needed to optimize away at compile time */ -#endif - -#define PageChecked(page) test_bit(PG_checked, &(page)->flags) -#define SetPageChecked(page) set_bit(PG_checked, &(page)->flags) -#define ClearPageChecked(page) clear_bit(PG_checked, &(page)->flags) - -#define PagePinned(page) test_bit(PG_pinned, &(page)->flags) -#define SetPagePinned(page) set_bit(PG_pinned, &(page)->flags) -#define ClearPagePinned(page) clear_bit(PG_pinned, &(page)->flags) - -#define PageReserved(page) test_bit(PG_reserved, &(page)->flags) -#define SetPageReserved(page) set_bit(PG_reserved, &(page)->flags) -#define ClearPageReserved(page) clear_bit(PG_reserved, &(page)->flags) -#define __ClearPageReserved(page) __clear_bit(PG_reserved, &(page)->flags) - -#define SetPagePrivate(page) set_bit(PG_private, &(page)->flags) 
-#define ClearPagePrivate(page) clear_bit(PG_private, &(page)->flags) -#define PagePrivate(page) test_bit(PG_private, &(page)->flags) -#define __SetPagePrivate(page) __set_bit(PG_private, &(page)->flags) -#define __ClearPagePrivate(page) __clear_bit(PG_private, &(page)->flags) +CLEARPAGEFLAG(Uptodate, uptodate) -/* - * Only test-and-set exist for PG_writeback. The unconditional operators are - * risky: they bypass page accounting. - */ -#define PageWriteback(page) test_bit(PG_writeback, &(page)->flags) -#define TestSetPageWriteback(page) test_and_set_bit(PG_writeback, \ - &(page)->flags) -#define TestClearPageWriteback(page) test_and_clear_bit(PG_writeback, \ - &(page)->flags) - -#define PageBuddy(page) test_bit(PG_buddy, &(page)->flags) -#define __SetPageBuddy(page) __set_bit(PG_buddy, &(page)->flags) -#define __ClearPageBuddy(page) __clear_bit(PG_buddy, &(page)->flags) - -#define PageMappedToDisk(page) test_bit(PG_mappedtodisk, &(page)->flags) -#define SetPageMappedToDisk(page) set_bit(PG_mappedtodisk, &(page)->flags) -#define ClearPageMappedToDisk(page) clear_bit(PG_mappedtodisk, &(page)->flags) +extern void cancel_dirty_page(struct page *page, unsigned int account_size); -#define PageReadahead(page) test_bit(PG_readahead, &(page)->flags) -#define SetPageReadahead(page) set_bit(PG_readahead, &(page)->flags) -#define ClearPageReadahead(page) clear_bit(PG_readahead, &(page)->flags) +int test_clear_page_writeback(struct page *page); +int test_set_page_writeback(struct page *page); -#define PageReclaim(page) test_bit(PG_reclaim, &(page)->flags) -#define SetPageReclaim(page) set_bit(PG_reclaim, &(page)->flags) -#define ClearPageReclaim(page) clear_bit(PG_reclaim, &(page)->flags) -#define TestClearPageReclaim(page) test_and_clear_bit(PG_reclaim, &(page)->flags) +static inline void set_page_writeback(struct page *page) +{ + test_set_page_writeback(page); +} -#define PageCompound(page) test_bit(PG_compound, &(page)->flags) -#define __SetPageCompound(page) 
__set_bit(PG_compound, &(page)->flags) -#define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags) +TESTPAGEFLAG(Compound, compound) +__PAGEFLAG(Head, compound) /* * PG_reclaim is used in combination with PG_compound to mark the - * head and tail of a compound page + * head and tail of a compound page. This saves one page flag + * but makes it impossible to use compound pages for the page cache. + * The PG_reclaim bit would have to be used for reclaim or readahead + * if compound pages enter the page cache. * * PG_compound & PG_reclaim => Tail page * PG_compound & ~PG_reclaim => Head page */ - #define PG_head_tail_mask ((1L << PG_compound) | (1L << PG_reclaim)) -#define PageTail(page) (((page)->flags & PG_head_tail_mask) \ - == PG_head_tail_mask) +static inline int PageTail(struct page *page) +{ + return ((page->flags & PG_head_tail_mask) == PG_head_tail_mask); +} static inline void __SetPageTail(struct page *page) { @@ -320,34 +290,5 @@ static inline void __ClearPageTail(struct page *page) page->flags &= ~PG_head_tail_mask; } -#define PageHead(page) (((page)->flags & PG_head_tail_mask) \ - == (1L << PG_compound)) -#define __SetPageHead(page) __SetPageCompound(page) -#define __ClearPageHead(page) __ClearPageCompound(page) - -#ifdef CONFIG_SWAP -#define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags) -#define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags) -#define ClearPageSwapCache(page) clear_bit(PG_swapcache, &(page)->flags) -#else -#define PageSwapCache(page) 0 -#endif - -#define PageUncached(page) test_bit(PG_uncached, &(page)->flags) -#define SetPageUncached(page) set_bit(PG_uncached, &(page)->flags) -#define ClearPageUncached(page) clear_bit(PG_uncached, &(page)->flags) - -struct page; /* forward declaration */ - -extern void cancel_dirty_page(struct page *page, unsigned int account_size); - -int test_clear_page_writeback(struct page *page); -int test_set_page_writeback(struct page *page); - -static inline void 
set_page_writeback(struct page *page) -{ - test_set_page_writeback(page); -} - #endif /* !__GENERATING_BOUNDS_H */ #endif /* PAGE_FLAGS_H */ -- cgit v1.2.3 From d60cd46bbdc5a79d9a177e40009f960e44f0e334 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:51 -0700 Subject: pageflags: use proper page flag functions in Xen Xen uses bitops to manipulate page flags. Make it use proper page flag functions. Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/xen/mmu.c | 4 ++-- include/linux/page-flags.h | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/x86/xen/mmu.c b/arch/x86/xen/mmu.c index 6cbcf65609ad..126766d43aea 100644 --- a/arch/x86/xen/mmu.c +++ b/arch/x86/xen/mmu.c @@ -387,7 +387,7 @@ static void xen_do_pin(unsigned level, unsigned long pfn) static int pin_page(struct page *page, enum pt_level level) { - unsigned pgfl = test_and_set_bit(PG_pinned, &page->flags); + unsigned pgfl = TestSetPagePinned(page); int flush; if (pgfl) @@ -468,7 +468,7 @@ void __init xen_mark_init_mm_pinned(void) static int unpin_page(struct page *page, enum pt_level level) { - unsigned pgfl = test_and_clear_bit(PG_pinned, &page->flags); + unsigned pgfl = TestClearPagePinned(page); if (pgfl && !PageHighMem(page)) { void *pt = lowmem_page_address(page); diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index ed7659adfaaf..3cafd878e4ca 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -159,7 +159,7 @@ PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active) __PAGEFLAG(Slab, slab) PAGEFLAG(Checked, checked) /* Used by some filesystems */ -PAGEFLAG(Pinned, pinned) /* Xen pinned pagetable */ +PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned) /* Xen pagetable */ PAGEFLAG(Reserved, reserved) 
__CLEARPAGEFLAG(Reserved, reserved) PAGEFLAG(Private, private) __CLEARPAGEFLAG(Private, private) __SETPAGEFLAG(Private, private) -- cgit v1.2.3 From 0a128b2b1a5e8ebce0260e3345812ee70daccc7f Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:52 -0700 Subject: pageflags: eliminate PG_xxx aliases Remove aliases of PG_xxx. We can easily drop those now and alias by specifying the PG_xxx flag in the macro that generates the functions. Signed-off-by: Christoph Lameter Cc: Andy Whitcroft Cc: KAMEZAWA Hiroyuki Cc: KOSAKI Motohiro Cc: Rik van Riel Cc: Mel Gorman Cc: Jeremy Fitzhardinge Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 12 ++++-------- mm/page_alloc.c | 2 +- 2 files changed, 5 insertions(+), 9 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 3cafd878e4ca..437778c703f5 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -79,8 +79,6 @@ enum pageflags { PG_active, PG_slab, PG_owner_priv_1, /* Owner use. 
If pagecache, fs may use*/ - PG_checked = PG_owner_priv_1, /* Used by some filesystems */ - PG_pinned = PG_owner_priv_1, /* Xen pinned pagetable */ PG_arch_1, PG_reserved, PG_private, /* If pagecache, has fs-private data */ @@ -89,8 +87,6 @@ enum pageflags { PG_swapcache, /* Swap page: swp_entry_t in private */ PG_mappedtodisk, /* Has blocks allocated on-disk */ PG_reclaim, /* To be reclaimed asap */ - /* PG_readahead is only used for file reads; PG_reclaim is only for writes */ - PG_readahead = PG_reclaim, /* Reminder to do async read-ahead */ PG_buddy, /* Page is free, on buddy lists */ #if (BITS_PER_LONG > 32) @@ -158,8 +154,8 @@ PAGEFLAG(Dirty, dirty) TESTSCFLAG(Dirty, dirty) __CLEARPAGEFLAG(Dirty, dirty) PAGEFLAG(LRU, lru) __CLEARPAGEFLAG(LRU, lru) PAGEFLAG(Active, active) __CLEARPAGEFLAG(Active, active) __PAGEFLAG(Slab, slab) -PAGEFLAG(Checked, checked) /* Used by some filesystems */ -PAGEFLAG(Pinned, pinned) TESTSCFLAG(Pinned, pinned) /* Xen pagetable */ +PAGEFLAG(Checked, owner_priv_1) /* Used by some filesystems */ +PAGEFLAG(Pinned, owner_priv_1) TESTSCFLAG(Pinned, owner_priv_1) /* Xen */ PAGEFLAG(Reserved, reserved) __CLEARPAGEFLAG(Reserved, reserved) PAGEFLAG(Private, private) __CLEARPAGEFLAG(Private, private) __SETPAGEFLAG(Private, private) @@ -174,14 +170,14 @@ PAGEFLAG(MappedToDisk, mappedtodisk) /* PG_readahead is only used for file reads; PG_reclaim is only for writes */ PAGEFLAG(Reclaim, reclaim) TESTCLEARFLAG(Reclaim, reclaim) -PAGEFLAG(Readahead, readahead) /* Reminder to do async read-ahead */ +PAGEFLAG(Readahead, reclaim) /* Reminder to do async read-ahead */ #ifdef CONFIG_HIGHMEM /* * Must use a macro here due to header dependency issues. page_zone() is not * available at this point. 
*/ -#define PageHighMem(__p) is_highmem(page_zone(page)) +#define PageHighMem(__p) is_highmem(page_zone(__p)) #else static inline int PageHighMem(struct page *page) { diff --git a/mm/page_alloc.c b/mm/page_alloc.c index af28e2cec8b4..e0fc3baba843 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -632,7 +632,7 @@ static int prep_new_page(struct page *page, int order, gfp_t gfp_flags) if (PageReserved(page)) return 1; - page->flags &= ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_readahead | + page->flags &= ~(1 << PG_uptodate | 1 << PG_error | 1 << PG_reclaim | 1 << PG_referenced | 1 << PG_arch_1 | 1 << PG_owner_priv_1 | 1 << PG_mappedtodisk); set_page_private(page, 0); -- cgit v1.2.3 From 602c4d112f9abf43af4b882b4a6f5505ed5c51b7 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:52 -0700 Subject: page flags: handle PG_uncached like all other flags Remove the special setup for PG_uncached and simply make it part of the enum. The page flag will only be allocated when the kernel build includes the uncached allocator. Acked-by: Dean Nelson Cc: Jes Sorensen Signed-off-by: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 19 ++++++++----------- 1 file changed, 8 insertions(+), 11 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 437778c703f5..17deafa9eb9b 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -88,16 +88,8 @@ enum pageflags { PG_mappedtodisk, /* Has blocks allocated on-disk */ PG_reclaim, /* To be reclaimed asap */ PG_buddy, /* Page is free, on buddy lists */ - -#if (BITS_PER_LONG > 32) -/* - * 64-bit-only flags build down from bit 31 - * - * 32 bit -------------------------------| FIELDS | FLAGS | - * 64 bit | FIELDS | ?????? 
FLAGS | - * 63 32 0 - */ - PG_uncached = 31, /* Page has been mapped as uncached */ +#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR + PG_uncached, /* Page has been mapped as uncached */ #endif __NR_PAGEFLAGS }; @@ -194,8 +186,13 @@ static inline int PageSwapCache(struct page *page) } #endif -#if (BITS_PER_LONG > 32) +#ifdef CONFIG_IA64_UNCACHED_ALLOCATOR PAGEFLAG(Uncached, uncached) +#else +static inline int PageUncached(struct page *) +{ + return 0; +} #endif static inline int PageUptodate(struct page *page) -- cgit v1.2.3 From ec7cade8c1a3d1ace69b35cc843b181818578dce Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:53 -0700 Subject: page flags: add PAGEFLAGS_FALSE for flags that are always false Turns out that there are a number of times that a flag is simply always returning 0. Define a macro for that. Signed-off-by: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 19 +++++++------------ 1 file changed, 7 insertions(+), 12 deletions(-) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 17deafa9eb9b..d16efa9066d9 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -134,6 +134,10 @@ static inline int TestClearPage##uname(struct page *page) \ #define __PAGEFLAG(uname, lname) TESTPAGEFLAG(uname, lname) \ __SETPAGEFLAG(uname, lname) __CLEARPAGEFLAG(uname, lname) +#define PAGEFLAG_FALSE(uname) \ +static inline int Page##uname(struct page *page) \ + { return 0; } + #define TESTSCFLAG(uname, lname) \ TESTSETFLAG(uname, lname) TESTCLEARFLAG(uname, lname) @@ -171,28 +175,19 @@ PAGEFLAG(Readahead, reclaim) /* Reminder to do async read-ahead */ */ #define PageHighMem(__p) is_highmem(page_zone(__p)) #else -static inline int PageHighMem(struct page *page) -{ - return 0; -} +PAGEFLAG_FALSE(HighMem) #endif #ifdef CONFIG_SWAP PAGEFLAG(SwapCache, swapcache) #else -static inline int PageSwapCache(struct page *page) -{ - return 0; -} 
+PAGEFLAG_FALSE(SwapCache) #endif #ifdef CONFIG_IA64_UNCACHED_ALLOCATOR PAGEFLAG(Uncached, uncached) #else -static inline int PageUncached(struct page *) -{ - return 0; -} +PAGEFLAG_FALSE(Uncached) #endif static inline int PageUptodate(struct page *page) -- cgit v1.2.3 From 97965478a66fbdf0f4ad5e4ecc4828f0cb548a45 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:54 -0700 Subject: mm: Get rid of __ZONE_COUNT It was used to compensate because MAX_NR_ZONES was not available to the #ifdefs. Export MAX_NR_ZONES via the new mechanism and get rid of __ZONE_COUNT. Signed-off-by: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mmzone.h | 28 +++++++++++----------------- kernel/bounds.c | 2 ++ 2 files changed, 13 insertions(+), 17 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c7a51dac441d..c3828497f41d 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -3,6 +3,7 @@ #ifdef __KERNEL__ #ifndef __ASSEMBLY__ +#ifndef __GENERATING_BOUNDS_H #include #include @@ -15,6 +16,7 @@ #include #include #include +#include #include #include @@ -129,6 +131,8 @@ struct per_cpu_pageset { #define zone_pcp(__z, __cpu) (&(__z)->pageset[(__cpu)]) #endif +#endif /* !__GENERATING_BOUNDS.H */ + enum zone_type { #ifdef CONFIG_ZONE_DMA /* @@ -177,9 +181,11 @@ enum zone_type { ZONE_HIGHMEM, #endif ZONE_MOVABLE, - MAX_NR_ZONES + __MAX_NR_ZONES }; +#ifndef __GENERATING_BOUNDS_H + /* * When a memory allocation must conform to specific limitations (such * as being suitable for DMA) the caller will pass in hints to the @@ -188,28 +194,15 @@ enum zone_type { * match the requested limits. See gfp_zone() in include/linux/gfp.h */ -/* - * Count the active zones. Note that the use of defined(X) outside - * #if and family is not necessarily defined so ensure we cannot use - * it later. Use __ZONE_COUNT to work out how many shift bits we need. 
- */ -#define __ZONE_COUNT ( \ - defined(CONFIG_ZONE_DMA) \ - + defined(CONFIG_ZONE_DMA32) \ - + 1 \ - + defined(CONFIG_HIGHMEM) \ - + 1 \ -) -#if __ZONE_COUNT < 2 +#if MAX_NR_ZONES < 2 #define ZONES_SHIFT 0 -#elif __ZONE_COUNT <= 2 +#elif MAX_NR_ZONES <= 2 #define ZONES_SHIFT 1 -#elif __ZONE_COUNT <= 4 +#elif MAX_NR_ZONES <= 4 #define ZONES_SHIFT 2 #else #error ZONES_SHIFT -- too many zones configured adjust calculation #endif -#undef __ZONE_COUNT struct zone { /* Fields commonly accessed by the page allocator */ @@ -1008,6 +1001,7 @@ unsigned long __init node_memmap_size_bytes(int, unsigned long, unsigned long); #define pfn_valid_within(pfn) (1) #endif +#endif /* !__GENERATING_BOUNDS.H */ #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _LINUX_MMZONE_H */ diff --git a/kernel/bounds.c b/kernel/bounds.c index 9ca2bb30243c..c3c55544db2f 100644 --- a/kernel/bounds.c +++ b/kernel/bounds.c @@ -7,6 +7,7 @@ #define __GENERATING_BOUNDS_H /* Include headers that define the enum constants of interest */ #include +#include #define DEFINE(sym, val) \ asm volatile("\n->" #sym " %0 " #val : : "i" (val)) @@ -17,5 +18,6 @@ void foo(void) { /* The enum constants to put into include/linux/bounds.h */ DEFINE(NR_PAGEFLAGS, __NR_PAGEFLAGS); + DEFINE(MAX_NR_ZONES, __MAX_NR_ZONES); /* End of constants */ } -- cgit v1.2.3 From e20b8cca760ed2a6abcfe37ef56f2306790db648 Mon Sep 17 00:00:00 2001 From: Christoph Lameter Date: Mon, 28 Apr 2008 02:12:55 -0700 Subject: PAGEFLAGS_EXTENDED and separate page flags for Head and Tail Having separate page flags for the head and the tail of a compound page allows the compiler to use bitops instead of operations on a word to check for a tail page. That is f.e. 
important for virt_to_head_page() which is used in various critical code paths (kfree for example):

Code for PageTail(page)

Before:

	mov    (%rdi),%rdx      page->flags
	mov    %rdx,%rax        3 bytes
	and    $0x12000,%eax    5 bytes
	cmp    $0x12000,%rax    6 bytes
	je     897

After:

	mov    (%rdi),%rax
	test   $0x40,%ah        (3 bytes)
	jne    887

So we go from 14 bytes to 3 bytes and from 3 instructions to one. From the use of 2 registers we go to none.

We can only use page flags for this if we have page flags available. This patch introduces CONFIG_PAGEFLAGS_EXTENDED, which is set if page flags are not scarce due to SPARSEMEM using page flags for its section id on 32-bit NUMA platforms.

Additional page flag definitions can be added to the CONFIG_PAGEFLAGS_EXTENDED section in page-flags.h if the functionality depends on PAGEFLAGS_EXTENDED or if more page flag overlapping tricks are used for the !PAGEFLAGS_EXTENDED fallback (the upcoming virtual compound patch may hook in here and Rik's/Lee's additional page flags to solve the reclaim issues could also be added there [hint... hint... where are these patchsets?]).

Avoiding the overlaying of PG_reclaim also clears the way for possible use of compound pages for the pagecache or on the LRU.
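To make the PageTail() comparison concrete, here is a toy user-space model, not kernel code. It contrasts the overlapped encoding (tail = compound bit plus reclaim bit, which needs a mask-and-compare) with dedicated head and tail bits (a single bit test each). It also uses a hypothetical `TOY_PAGEFLAG()` macro, loosely in the spirit of `__PAGEFLAG()`, to generate the accessors. `struct toy_page`, every `ov_*`/`ex_*` name and all bit numbers are assumptions made for illustration. The real `PG_*` values come from `enum pageflags`, and the kernel's non-underscore variants use atomic bitops, unlike the plain operations here.

```c
#include <stdbool.h>

/* Toy stand-in for struct page: only the flags word matters here. */
struct toy_page {
	unsigned long flags;
};

/* Overlapped scheme (!CONFIG_PAGEFLAGS_EXTENDED); bit numbers are hypothetical. */
enum { OV_COMPOUND = 13, OV_RECLAIM = 16 };
#define OV_HEAD_TAIL_MASK ((1UL << OV_COMPOUND) | (1UL << OV_RECLAIM))

/* Tail page: both bits set, so a mask-and-compare is required. */
static bool ov_page_tail(const struct toy_page *p)
{
	return (p->flags & OV_HEAD_TAIL_MASK) == OV_HEAD_TAIL_MASK;
}

/* Head page: compound bit set, reclaim bit clear. */
static bool ov_page_head(const struct toy_page *p)
{
	return (p->flags & OV_HEAD_TAIL_MASK) == (1UL << OV_COMPOUND);
}

/* Extended scheme: dedicated head and tail bits (also hypothetical). */
enum { EX_HEAD = 13, EX_TAIL = 14 };

/* Generate test/set/clear helpers for one flag, one line per flag. */
#define TOY_PAGEFLAG(uname, bit)					\
static bool ex_page_##uname(const struct toy_page *p)			\
	{ return (p->flags >> (bit)) & 1UL; }				\
static void ex_set_page_##uname(struct toy_page *p)			\
	{ p->flags |= 1UL << (bit); }					\
static void ex_clear_page_##uname(struct toy_page *p)			\
	{ p->flags &= ~(1UL << (bit)); }

TOY_PAGEFLAG(head, EX_HEAD)
TOY_PAGEFLAG(tail, EX_TAIL)

/* Compound means "either bit"; the kernel notes this is not a hot path. */
static bool ex_page_compound(const struct toy_page *p)
{
	return p->flags & ((1UL << EX_HEAD) | (1UL << EX_TAIL));
}
```

Each extended-scheme test reads a single bit, which is what lets the compiler emit one `test` instruction. The overlapped scheme has to load, mask and compare two bits.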
Signed-off-by: Christoph Lameter Cc: Nick Piggin Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/page-flags.h | 28 ++++++++++++++++++++++++++++ mm/Kconfig | 12 ++++++++++++ 2 files changed, 40 insertions(+) diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index d16efa9066d9..590cff32415d 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -83,7 +83,12 @@ enum pageflags { PG_reserved, PG_private, /* If pagecache, has fs-private data */ PG_writeback, /* Page is under writeback */ +#ifdef CONFIG_PAGEFLAGS_EXTENDED + PG_head, /* A head page */ + PG_tail, /* A tail page */ +#else PG_compound, /* A compound page */ +#endif PG_swapcache, /* Swap page: swp_entry_t in private */ PG_mappedtodisk, /* Has blocks allocated on-disk */ PG_reclaim, /* To be reclaimed asap */ @@ -248,6 +253,28 @@ static inline void set_page_writeback(struct page *page) test_set_page_writeback(page); } +#ifdef CONFIG_PAGEFLAGS_EXTENDED +/* + * System with lots of page flags available. This allows separate + * flags for PageHead() and PageTail() checks of compound pages so that bit + * tests can be used in performance sensitive paths. PageCompound is + * generally not used in hot code paths. + */ +__PAGEFLAG(Head, head) +__PAGEFLAG(Tail, tail) + +static inline int PageCompound(struct page *page) +{ + return page->flags & ((1L << PG_head) | (1L << PG_tail)); + +} +#else +/* + * Reduce page flag use as much as possible by overlapping + * compound page flags with the flags used for page cache pages. Possible + * because PageCompound is always set for compound pages and not for + * pages on the LRU and/or pagecache. 
+ */ TESTPAGEFLAG(Compound, compound) __PAGEFLAG(Head, compound) @@ -278,5 +305,6 @@ static inline void __ClearPageTail(struct page *page) page->flags &= ~PG_head_tail_mask; } +#endif /* !PAGEFLAGS_EXTENDED */ #endif /* !__GENERATING_BOUNDS_H */ #endif /* PAGE_FLAGS_H */ diff --git a/mm/Kconfig b/mm/Kconfig index 0016ebd4dcba..3aa819d628c1 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -143,6 +143,18 @@ config MEMORY_HOTREMOVE depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE depends on MIGRATION +# +# If we have space for more page flags then we can enable additional +# optimizations and functionality. +# +# Regular Sparsemem takes page flag bits for the sectionid if it does not +# use a virtual memmap. Disable extended page flags for 32 bit platforms +# that require the use of a sectionid in the page flags. +# +config PAGEFLAGS_EXTENDED + def_bool y + depends on 64BIT || SPARSEMEM_VMEMMAP || !NUMA || !SPARSEMEM + # Heavily threaded applications may benefit from splitting the mm-wide # page_table_lock, so that faults on different parts of the user address # space can be handled with less contention: split it at this NR_CPUS. -- cgit v1.2.3 From 214e471ff99064726b2d8af3aa0e24a73c775531 Mon Sep 17 00:00:00 2001 From: Peter Zijlstra Date: Mon, 28 Apr 2008 02:12:55 -0700 Subject: smaps: account swap entries Show the amount of swap for each vma. This can be used to see where all the swap goes. 
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Peter Zijlstra Acked-by: Matt Mackall Acked-by: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/proc/task_mmu.c | 16 ++++++++++++---- 1 file changed, 12 insertions(+), 4 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index f4ab76c7c662..7415eeb7cc3a 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -338,8 +338,7 @@ const struct file_operations proc_maps_operations = { #define PSS_SHIFT 12 #ifdef CONFIG_PROC_PAGE_MONITOR -struct mem_size_stats -{ +struct mem_size_stats { struct vm_area_struct *vma; unsigned long resident; unsigned long shared_clean; @@ -347,6 +346,7 @@ struct mem_size_stats unsigned long private_clean; unsigned long private_dirty; unsigned long referenced; + unsigned long swap; u64 pss; }; @@ -363,6 +363,12 @@ static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); for (; addr != end; pte++, addr += PAGE_SIZE) { ptent = *pte; + + if (is_swap_pte(ptent)) { + mss->swap += PAGE_SIZE; + continue; + } + if (!pte_present(ptent)) continue; @@ -421,7 +427,8 @@ static int show_smap(struct seq_file *m, void *v) "Shared_Dirty: %8lu kB\n" "Private_Clean: %8lu kB\n" "Private_Dirty: %8lu kB\n" - "Referenced: %8lu kB\n", + "Referenced: %8lu kB\n" + "Swap: %8lu kB\n", (vma->vm_end - vma->vm_start) >> 10, mss.resident >> 10, (unsigned long)(mss.pss >> (10 + PSS_SHIFT)), @@ -429,7 +436,8 @@ static int show_smap(struct seq_file *m, void *v) mss.shared_dirty >> 10, mss.private_clean >> 10, mss.private_dirty >> 10, - mss.referenced >> 10); + mss.referenced >> 10, + mss.swap >> 10); return ret; } -- cgit v1.2.3 From b379d790197cdf8a95fb67507d75a24ac0a1678d Mon Sep 17 00:00:00 2001 From: Jared Hulbert Date: Mon, 28 Apr 2008 02:12:58 -0700 Subject: mm: introduce VM_MIXEDMAP This series introduces some important infrastructure work. The overall result is that: 1. 
We now support XIP backed filesystems using memory that has no struct page allocated to it. Patches 6 and 7 actually implement this for s390.

This is pretty important in a number of cases. As far as I understand, in the case of virtualisation (e.g. s390), each guest may mount a read-only copy of the same filesystem (e.g. the distro). Currently, guests need to allocate struct pages for this image. So if you have 100 guests, you already need to allocate more memory for the struct pages than the size of the image. I think. (Carsten?)

For other (e.g. embedded) systems, you may have a very large non-volatile filesystem. If you have to have struct pages for this, then your RAM consumption will go up proportionally to fs size. Even though it is just a small proportion, the RAM can be much more costly, e.g. in terms of power, so every KB less that Linux uses makes it more attractive to a lot of these guys.

2. VM_MIXEDMAP allows us to support mappings where you actually do want to refcount _some_ pages in the mapping, but not others, and support COW on arbitrary (non-linear) mappings. Jared needs this for his NVRAM filesystem in progress. Future iterations of this filesystem will most likely want to migrate pages between pagecache and XIP backing, which is where the requirement for mixed (some refcounted, some not) comes from.

3. pte_special also has a peripheral usage that I need for my lockless get_user_pages patch. That was shown to speed up "oltp" on db2 by 10% on a 2-socket system, which is kind of significant because they scrounge for months to try to find 0.1% improvement on these workloads. I'm hoping we might finally be faster than AIX on pSeries with this :). My reference to lockless get_user_pages is not meant to justify this patchset (which doesn't include lockless gup), but just to show that pte_special is not some s390-specific thing that should be hidden in arch code or xip code: I definitely want to use it on at least x86 and powerpc as well.
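The refcounting rule that VM_MIXEDMAP introduces in `vm_normal_page()` can be modelled in user space as follows. This is a sketch under stated assumptions, not the kernel code. `struct toy_vma`, the `TOY_VM_*` flag values, `TOY_VM_COW` (standing in for `is_cow_mapping()`) and `toy_pfn_valid()` with its fixed bound are all hypothetical. The function reports whether a pte would be treated as a normal, refcounted page. In a VM_MIXEDMAP vma, that depends only on `pfn_valid()`. In a VM_PFNMAP vma, it relies on the `remap_pfn_range()` linearity rule and on the mapping being COW.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model; the kernel's VM_* values differ. */
#define TOY_VM_PFNMAP   0x1UL
#define TOY_VM_MIXEDMAP 0x2UL
#define TOY_VM_COW      0x4UL	/* stands in for is_cow_mapping() */
#define TOY_PAGE_SHIFT  12

struct toy_vma {
	unsigned long vm_flags;
	unsigned long vm_start;
	unsigned long vm_pgoff;
};

/* Hypothetical: pretend only PFNs below this bound have a struct page. */
#define TOY_MAX_VALID_PFN 1024UL

static bool toy_pfn_valid(unsigned long pfn)
{
	return pfn < TOY_MAX_VALID_PFN;
}

/*
 * true:  vm_normal_page() would return a refcounted struct page
 * false: vm_normal_page() would return NULL (raw PFN, not refcounted)
 */
static bool toy_is_normal_page(const struct toy_vma *vma,
			       unsigned long addr, unsigned long pfn)
{
	if (vma->vm_flags & TOY_VM_MIXEDMAP)
		return toy_pfn_valid(pfn);	/* refcount every pfn_valid page */

	if (vma->vm_flags & TOY_VM_PFNMAP) {
		unsigned long off = (addr - vma->vm_start) >> TOY_PAGE_SHIFT;

		if (pfn == vma->vm_pgoff + off)
			return false;		/* follows the linear remap rule */
		if (!(vma->vm_flags & TOY_VM_COW))
			return false;
		/* off-rule pte in a COW mapping: a COWed, normal page */
	}
	return true;				/* ordinary mappings */
}
```

The design difference is visible in the two branches. VM_MIXEDMAP needs no linearity rule, so COW works on arbitrary ranges, at the cost of refcounting every `pfn_valid()` page.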
This patch: Introduce a new type of mapping, VM_MIXEDMAP. This is unlike VM_PFNMAP in that it can support COW mappings of arbitrary ranges including ranges without struct page *and* ranges with a struct page that we actually want to refcount (PFNMAP can only support COW in those cases where the un-COW-ed translations are mapped linearly in the virtual address, and can only support non refcounted ranges). VM_MIXEDMAP achieves this by refcounting all pfn_valid pages, and not refcounting !pfn_valid pages (which is not an option for VM_PFNMAP, because it needs to avoid refcounting pfn_valid pages eg. for /dev/mem mappings). Signed-off-by: Jared Hulbert Signed-off-by: Nick Piggin Acked-by: Carsten Otte Cc: Jared Hulbert Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 1 + mm/memory.c | 79 +++++++++++++++++++++++++++++++++++++++--------------- 2 files changed, 59 insertions(+), 21 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 526f810367d9..c657ea0bd6aa 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -107,6 +107,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_ALWAYSDUMP 0x04000000 /* Always include in core dumps */ #define VM_CAN_NONLINEAR 0x08000000 /* Has ->fault & does nonlinear pages */ +#define VM_MIXEDMAP 0x10000000 /* Can contain "struct page" and pure PFN pages */ #ifndef VM_STACK_DEFAULT_FLAGS /* arch can override this */ #define VM_STACK_DEFAULT_FLAGS VM_DATA_DEFAULT_FLAGS diff --git a/mm/memory.c b/mm/memory.c index 46958fb97c2d..0da414c383e7 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -371,35 +371,65 @@ static inline int is_cow_mapping(unsigned int flags) } /* - * This function gets the "struct page" associated with a pte. + * This function gets the "struct page" associated with a pte or returns + * NULL if no "struct page" is associated with the pte. * - * NOTE! Some mappings do not have "struct pages". 
A raw PFN mapping - * will have each page table entry just pointing to a raw page frame - * number, and as far as the VM layer is concerned, those do not have - * pages associated with them - even if the PFN might point to memory + * A raw VM_PFNMAP mapping (ie. one that is not COWed) may not have any "struct + * page" backing, and even if they do, they are not refcounted. COWed pages of + * a VM_PFNMAP do always have a struct page, and they are normally refcounted + * (they are _normal_ pages). + * + * So a raw PFNMAP mapping will have each page table entry just pointing + * to a page frame number, and as far as the VM layer is concerned, those do + * not have pages associated with them - even if the PFN might point to memory * that otherwise is perfectly fine and has a "struct page". * - * The way we recognize those mappings is through the rules set up - * by "remap_pfn_range()": the vma will have the VM_PFNMAP bit set, - * and the vm_pgoff will point to the first PFN mapped: thus every + * The way we recognize COWed pages within VM_PFNMAP mappings is through the + * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit + * set, and the vm_pgoff will point to the first PFN mapped: thus every * page that is a raw mapping will always honor the rule * * pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT) * - * and if that isn't true, the page has been COW'ed (in which case it - * _does_ have a "struct page" associated with it even if it is in a - * VM_PFNMAP range). + * A call to vm_normal_page() will return NULL for such a page. + * + * If the page doesn't follow the "remap_pfn_range()" rule in a VM_PFNMAP + * then the page has been COW'ed. A COW'ed page _does_ have a "struct page" + * associated with it even if it is in a VM_PFNMAP range. Calling + * vm_normal_page() on such a page will therefore return the "struct page". 
+ * + * + * VM_MIXEDMAP mappings can likewise contain memory with or without "struct + * page" backing, however the difference is that _all_ pages with a struct + * page (that is, those where pfn_valid is true) are refcounted and considered + * normal pages by the VM. The disadvantage is that pages are refcounted + * (which can be slower and simply not an option for some PFNMAP users). The + * advantage is that we don't have to follow the strict linearity rule of + * PFNMAP mappings in order to support COWable mappings. + * + * A call to vm_normal_page() with a VM_MIXEDMAP mapping will return the + * associated "struct page" or NULL for memory not backed by a "struct page". + * + * + * All other mappings should have a valid struct page, which will be + * returned by a call to vm_normal_page(). */ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte) { unsigned long pfn = pte_pfn(pte); - if (unlikely(vma->vm_flags & VM_PFNMAP)) { - unsigned long off = (addr - vma->vm_start) >> PAGE_SHIFT; - if (pfn == vma->vm_pgoff + off) - return NULL; - if (!is_cow_mapping(vma->vm_flags)) - return NULL; + if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { + if (vma->vm_flags & VM_MIXEDMAP) { + if (!pfn_valid(pfn)) + return NULL; + goto out; + } else { + unsigned long off = (addr-vma->vm_start) >> PAGE_SHIFT; + if (pfn == vma->vm_pgoff + off) + return NULL; + if (!is_cow_mapping(vma->vm_flags)) + return NULL; + } } #ifdef CONFIG_DEBUG_VM @@ -422,6 +452,7 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_ * The PAGE_ZERO() pages and various VDSO mappings can * cause them to exist. 
*/ +out: return pfn_to_page(pfn); } @@ -1232,8 +1263,11 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, pte_t *pte, entry; spinlock_t *ptl; - BUG_ON(!(vma->vm_flags & VM_PFNMAP)); - BUG_ON(is_cow_mapping(vma->vm_flags)); + BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))); + BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) == + (VM_PFNMAP|VM_MIXEDMAP)); + BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); + BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn)); retval = -ENOMEM; pte = get_locked_pte(mm, addr, &ptl); @@ -2365,10 +2399,13 @@ static noinline int do_no_pfn(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long pfn; pte_unmap(page_table); - BUG_ON(!(vma->vm_flags & VM_PFNMAP)); - BUG_ON(is_cow_mapping(vma->vm_flags)); + BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))); + BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); pfn = vma->vm_ops->nopfn(vma, address & PAGE_MASK); + + BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn)); + if (unlikely(pfn == NOPFN_OOM)) return VM_FAULT_OOM; else if (unlikely(pfn == NOPFN_SIGBUS)) -- cgit v1.2.3 From 7e675137a8e1a4d45822746456dd389b65745bf6 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Mon, 28 Apr 2008 02:13:00 -0700 Subject: mm: introduce pte_special pte bit s390 for one, cannot implement VM_MIXEDMAP with pfn_valid, due to their memory model (which is more dynamic than most). Instead, they had proposed to implement it with an additional path through vm_normal_page(), using a bit in the pte to determine whether or not the page should be refcounted: vm_normal_page() { ... if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { #ifdef s390 if (!mixedmap_refcount_pte(pte)) return NULL; #else if (!pfn_valid(pfn)) return NULL; #endif goto out; } ... 
}

This is fine, however if we are allowed to use a bit in the pte to determine refcountedness, we can use that to _completely_ replace all the vma based schemes. So instead of adding more cases to the already complex vma-based scheme, we can have a clearly separate and simple pte-based scheme (and get slightly better code generation in the process):

vm_normal_page()
{
#ifdef s390
	if (!mixedmap_refcount_pte(pte))
		return NULL;
	return pte_page(pte);
#else
	...
#endif
}

And finally, we would rather make this concept usable by any architecture than make it s390-only, so implement a new type of pte state for this. Unfortunately the old vma-based code must stay, because some architectures may not be able to spare pte bits. This makes vm_normal_page a little uglier than we would like, but the 2 cases are clearly separate.

So introduce a pte_special pte state, and use it in mm/memory.c. It is currently a no-op for all architectures, so this doesn't actually result in any compiled code changes to mm/memory.o.

BTW: I haven't put vm_normal_page() into arch code as per an earlier suggestion. The reason is that, regardless of where vm_normal_page is actually implemented, the *abstraction* is still exactly the same. Also, while it depends on whether the architecture has pte_special or not, those are the only two possible cases, and it really isn't an arch-specific function: the role of the arch code should be to provide primitive functions and accessors with which to build the core code, and pte_special does that. We do not want architectures to know or care about vm_normal_page itself, and we definitely don't want them being able to invent something new there out of sight of mm/ code. If we made vm_normal_page an arch function, then we would have to make vm_insert_mixed (next patch) an arch function too.
So I don't think moving it to arch code fundamentally improves any abstractions, while it does practically make the code more difficult to follow, for both mm and arch developers, and easier to misuse. [akpm@linux-foundation.org: build fix] Signed-off-by: Nick Piggin Acked-by: Carsten Otte Cc: Jared Hulbert Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-alpha/pgtable.h | 2 + include/asm-arm/pgtable.h | 3 ++ include/asm-avr32/pgtable.h | 8 +++ include/asm-cris/pgtable.h | 2 + include/asm-frv/pgtable.h | 2 + include/asm-ia64/pgtable.h | 3 ++ include/asm-m32r/pgtable.h | 10 ++++ include/asm-m68k/motorola_pgtable.h | 2 + include/asm-m68k/sun3_pgtable.h | 2 + include/asm-mips/pgtable.h | 2 + include/asm-mn10300/pgtable.h | 3 ++ include/asm-parisc/pgtable.h | 2 + include/asm-powerpc/pgtable-ppc32.h | 3 ++ include/asm-powerpc/pgtable-ppc64.h | 3 ++ include/asm-ppc/pgtable.h | 3 ++ include/asm-s390/pgtable.h | 10 ++++ include/asm-sh/pgtable_32.h | 3 ++ include/asm-sh/pgtable_64.h | 10 ++-- include/asm-sparc/pgtable.h | 7 +++ include/asm-sparc64/pgtable.h | 10 ++++ include/asm-um/pgtable.h | 10 ++++ include/asm-x86/pgtable.h | 10 ++++ include/asm-xtensa/pgtable.h | 4 ++ include/linux/mm.h | 4 +- mm/memory.c | 99 ++++++++++++++++++++----------------- 25 files changed, 168 insertions(+), 49 deletions(-) diff --git a/include/asm-alpha/pgtable.h b/include/asm-alpha/pgtable.h index 99037b032357..05ce5fba43e3 100644 --- a/include/asm-alpha/pgtable.h +++ b/include/asm-alpha/pgtable.h @@ -268,6 +268,7 @@ extern inline int pte_write(pte_t pte) { return !(pte_val(pte) & _PAGE_FOW); } extern inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } extern inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } extern inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +extern inline int pte_special(pte_t pte) { return 0; } extern inline pte_t pte_wrprotect(pte_t pte) { 
pte_val(pte) |= _PAGE_FOW; return pte; } extern inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~(__DIRTY_BITS); return pte; } @@ -275,6 +276,7 @@ extern inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~(__ACCESS_BITS); ret extern inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) &= ~_PAGE_FOW; return pte; } extern inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= __DIRTY_BITS; return pte; } extern inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= __ACCESS_BITS; return pte; } +extern inline pte_t pte_mkspecial(pte_t pte) { return pte; } #define PAGE_DIR_OFFSET(tsk,address) pgd_offset((tsk),(address)) diff --git a/include/asm-arm/pgtable.h b/include/asm-arm/pgtable.h index 5e0182485d8c..5571c13c3f3b 100644 --- a/include/asm-arm/pgtable.h +++ b/include/asm-arm/pgtable.h @@ -260,6 +260,7 @@ extern struct page *empty_zero_page; #define pte_write(pte) (pte_val(pte) & L_PTE_WRITE) #define pte_dirty(pte) (pte_val(pte) & L_PTE_DIRTY) #define pte_young(pte) (pte_val(pte) & L_PTE_YOUNG) +#define pte_special(pte) (0) /* * The following only works if pte_present() is not true. @@ -280,6 +281,8 @@ PTE_BIT_FUNC(mkdirty, |= L_PTE_DIRTY); PTE_BIT_FUNC(mkold, &= ~L_PTE_YOUNG); PTE_BIT_FUNC(mkyoung, |= L_PTE_YOUNG); +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } + /* * Mark the prot value as uncacheable and unbufferable. */ diff --git a/include/asm-avr32/pgtable.h b/include/asm-avr32/pgtable.h index 3ae7b548fce7..c0e5e29417df 100644 --- a/include/asm-avr32/pgtable.h +++ b/include/asm-avr32/pgtable.h @@ -212,6 +212,10 @@ static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } +static inline int pte_special(pte_t pte) +{ + return 0; +} /* * The following only work if pte_present() is not true. 
@@ -252,6 +256,10 @@ static inline pte_t pte_mkyoung(pte_t pte) set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return pte; +} #define pmd_none(x) (!pmd_val(x)) #define pmd_present(x) (pmd_val(x) & _PAGE_PRESENT) diff --git a/include/asm-cris/pgtable.h b/include/asm-cris/pgtable.h index a2607575681b..4c373624ee97 100644 --- a/include/asm-cris/pgtable.h +++ b/include/asm-cris/pgtable.h @@ -115,6 +115,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_WR static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_MODIFIED; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } static inline pte_t pte_wrprotect(pte_t pte) { @@ -162,6 +163,7 @@ static inline pte_t pte_mkyoung(pte_t pte) } return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * Conversion functions: convert a page and protection to a page entry, diff --git a/include/asm-frv/pgtable.h b/include/asm-frv/pgtable.h index 4e219046fe42..83c51aba534b 100644 --- a/include/asm-frv/pgtable.h +++ b/include/asm-frv/pgtable.h @@ -380,6 +380,7 @@ static inline pmd_t *pmd_offset(pud_t *dir, unsigned long address) static inline int pte_dirty(pte_t pte) { return (pte).pte & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return (pte).pte & _PAGE_ACCESSED; } static inline int pte_write(pte_t pte) { return !((pte).pte & _PAGE_WP); } +static inline int pte_special(pte_t pte) { return 0; } static inline pte_t pte_mkclean(pte_t pte) { (pte).pte &= ~_PAGE_DIRTY; return pte; } static inline pte_t pte_mkold(pte_t pte) { (pte).pte &= ~_PAGE_ACCESSED; return pte; } @@ -387,6 +388,7 @@ static inline pte_t pte_wrprotect(pte_t pte) { (pte).pte |= _PAGE_WP; return pte static inline pte_t pte_mkdirty(pte_t pte) { (pte).pte |= _PAGE_DIRTY; return 
pte; } static inline pte_t pte_mkyoung(pte_t pte) { (pte).pte |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkwrite(pte_t pte) { (pte).pte &= ~_PAGE_WP; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { diff --git a/include/asm-ia64/pgtable.h b/include/asm-ia64/pgtable.h index ed70862ea247..7a9bff47564f 100644 --- a/include/asm-ia64/pgtable.h +++ b/include/asm-ia64/pgtable.h @@ -302,6 +302,8 @@ ia64_phys_addr_valid (unsigned long addr) #define pte_dirty(pte) ((pte_val(pte) & _PAGE_D) != 0) #define pte_young(pte) ((pte_val(pte) & _PAGE_A) != 0) #define pte_file(pte) ((pte_val(pte) & _PAGE_FILE) != 0) +#define pte_special(pte) 0 + /* * Note: we convert AR_RWX to AR_RX and AR_RW to AR_R by clearing the 2nd bit in the * access rights: @@ -313,6 +315,7 @@ ia64_phys_addr_valid (unsigned long addr) #define pte_mkclean(pte) (__pte(pte_val(pte) & ~_PAGE_D)) #define pte_mkdirty(pte) (__pte(pte_val(pte) | _PAGE_D)) #define pte_mkhuge(pte) (__pte(pte_val(pte))) +#define pte_mkspecial(pte) (pte) /* * Because ia64's Icache and Dcache is not coherent (on a cpu), we need to diff --git a/include/asm-m32r/pgtable.h b/include/asm-m32r/pgtable.h index 86505387be08..e6359c566b50 100644 --- a/include/asm-m32r/pgtable.h +++ b/include/asm-m32r/pgtable.h @@ -214,6 +214,11 @@ static inline int pte_file(pte_t pte) return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) +{ + return 0; +} + static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; @@ -250,6 +255,11 @@ static inline pte_t pte_mkwrite(pte_t pte) return pte; } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return pte; +} + static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { return test_and_clear_bit(_PAGE_BIT_ACCESSED, ptep); diff --git a/include/asm-m68k/motorola_pgtable.h 
b/include/asm-m68k/motorola_pgtable.h index 13135d4821d8..8e9a8a754dde 100644 --- a/include/asm-m68k/motorola_pgtable.h +++ b/include/asm-m68k/motorola_pgtable.h @@ -168,6 +168,7 @@ static inline int pte_write(pte_t pte) { return !(pte_val(pte) & _PAGE_RONLY); static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) |= _PAGE_RONLY; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; return pte; } @@ -185,6 +186,7 @@ static inline pte_t pte_mkcache(pte_t pte) pte_val(pte) = (pte_val(pte) & _CACHEMASK040) | m68k_supervisor_cachemode; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } #define PAGE_DIR_OFFSET(tsk,address) pgd_offset((tsk),(address)) diff --git a/include/asm-m68k/sun3_pgtable.h b/include/asm-m68k/sun3_pgtable.h index b766fc261bde..f847ec732d62 100644 --- a/include/asm-m68k/sun3_pgtable.h +++ b/include/asm-m68k/sun3_pgtable.h @@ -169,6 +169,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & SUN3_PAGE_WRITEA static inline int pte_dirty(pte_t pte) { return pte_val(pte) & SUN3_PAGE_MODIFIED; } static inline int pte_young(pte_t pte) { return pte_val(pte) & SUN3_PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) & SUN3_PAGE_ACCESSED; } +static inline int pte_special(pte_t pte) { return 0; } static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_WRITEABLE; return pte; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~SUN3_PAGE_MODIFIED; return pte; } @@ -181,6 +182,7 @@ static inline pte_t pte_mknocache(pte_t pte) { pte_val(pte) |= SUN3_PAGE_NOCACHE //static inline pte_t pte_mkcache(pte_t pte) { pte_val(pte) &= SUN3_PAGE_NOCACHE; return 
pte; } // until then, use: static inline pte_t pte_mkcache(pte_t pte) { return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } extern pgd_t swapper_pg_dir[PTRS_PER_PGD]; extern pgd_t kernel_pg_dir[PTRS_PER_PGD]; diff --git a/include/asm-mips/pgtable.h b/include/asm-mips/pgtable.h index 17a7703a2969..782221e57c0a 100644 --- a/include/asm-mips/pgtable.h +++ b/include/asm-mips/pgtable.h @@ -285,6 +285,8 @@ static inline pte_t pte_mkyoung(pte_t pte) return pte; } #endif +static inline int pte_special(pte_t pte) { return 0; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * Macro to make mark a page protection value as "uncacheable". Note diff --git a/include/asm-mn10300/pgtable.h b/include/asm-mn10300/pgtable.h index 375c4941deda..6dc30fc827c4 100644 --- a/include/asm-mn10300/pgtable.h +++ b/include/asm-mn10300/pgtable.h @@ -224,6 +224,7 @@ static inline int pte_read(pte_t pte) { return pte_val(pte) & __PAGE_PROT_USER; static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_write(pte_t pte) { return pte_val(pte) & __PAGE_PROT_WRITE; } +static inline int pte_special(pte_t pte){ return 0; } /* * The following only works if pte_present() is not true. 
@@ -265,6 +266,8 @@ static inline pte_t pte_mkwrite(pte_t pte) return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } + #define pte_ERROR(e) \ printk(KERN_ERR "%s:%d: bad pte %08lx.\n", \ __FILE__, __LINE__, pte_val(e)) diff --git a/include/asm-parisc/pgtable.h b/include/asm-parisc/pgtable.h index dc86adbec916..470a4b88124d 100644 --- a/include/asm-parisc/pgtable.h +++ b/include/asm-parisc/pgtable.h @@ -323,6 +323,7 @@ static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_WRITE; } static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } static inline pte_t pte_mkclean(pte_t pte) { pte_val(pte) &= ~_PAGE_DIRTY; return pte; } static inline pte_t pte_mkold(pte_t pte) { pte_val(pte) &= ~_PAGE_ACCESSED; return pte; } @@ -330,6 +331,7 @@ static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~_PAGE_WRITE; ret static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= _PAGE_WRITE; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* * Conversion functions: convert a page and protection to a page entry, diff --git a/include/asm-powerpc/pgtable-ppc32.h b/include/asm-powerpc/pgtable-ppc32.h index daea7692d070..7c97b5a08d08 100644 --- a/include/asm-powerpc/pgtable-ppc32.h +++ b/include/asm-powerpc/pgtable-ppc32.h @@ -504,6 +504,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; } static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) 
{ return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } @@ -521,6 +522,8 @@ static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { + return pte; } static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { diff --git a/include/asm-powerpc/pgtable-ppc64.h b/include/asm-powerpc/pgtable-ppc64.h index dd4c26dc57d2..27f18695f7d6 100644 --- a/include/asm-powerpc/pgtable-ppc64.h +++ b/include/asm-powerpc/pgtable-ppc64.h @@ -239,6 +239,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW;} static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY;} static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED;} static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE;} +static inline int pte_special(pte_t pte) { return 0; } static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } @@ -257,6 +258,8 @@ static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkhuge(pte_t pte) { return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { + return pte; } /* Atomic PTE updates */ static inline unsigned long pte_update(struct mm_struct *mm, diff --git a/include/asm-ppc/pgtable.h b/include/asm-ppc/pgtable.h index 70435d32129a..55f9d38e3bf8 100644 --- a/include/asm-ppc/pgtable.h +++ b/include/asm-ppc/pgtable.h @@ -483,6 +483,7 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_RW; } static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline 
int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } static inline void pte_uncache(pte_t pte) { pte_val(pte) |= _PAGE_NO_CACHE; } static inline void pte_cache(pte_t pte) { pte_val(pte) &= ~_PAGE_NO_CACHE; } @@ -500,6 +501,8 @@ static inline pte_t pte_mkdirty(pte_t pte) { pte_val(pte) |= _PAGE_DIRTY; return pte; } static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { + return pte; } static inline pte_t pte_modify(pte_t pte, pgprot_t newprot) { diff --git a/include/asm-s390/pgtable.h b/include/asm-s390/pgtable.h index 4c0698c0dda5..76e8a7904e8a 100644 --- a/include/asm-s390/pgtable.h +++ b/include/asm-s390/pgtable.h @@ -518,6 +518,11 @@ static inline int pte_file(pte_t pte) return (pte_val(pte) & mask) == _PAGE_TYPE_FILE; } +static inline int pte_special(pte_t pte) +{ + return 0; +} + #define __HAVE_ARCH_PTE_SAME #define pte_same(a,b) (pte_val(a) == pte_val(b)) @@ -715,6 +720,11 @@ static inline pte_t pte_mkyoung(pte_t pte) return pte; } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return pte; +} + #define __HAVE_ARCH_PTEP_TEST_AND_CLEAR_YOUNG static inline int ptep_test_and_clear_young(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) diff --git a/include/asm-sh/pgtable_32.h b/include/asm-sh/pgtable_32.h index 3e3557c53c55..cbc731d35c25 100644 --- a/include/asm-sh/pgtable_32.h +++ b/include/asm-sh/pgtable_32.h @@ -326,6 +326,7 @@ static inline void set_pte(pte_t *ptep, pte_t pte) #define pte_dirty(pte) ((pte).pte_low & _PAGE_DIRTY) #define pte_young(pte) ((pte).pte_low & _PAGE_ACCESSED) #define pte_file(pte) ((pte).pte_low & _PAGE_FILE) +#define pte_special(pte) (0) #ifdef CONFIG_X2TLB #define pte_write(pte) ((pte).pte_high & _PAGE_EXT_USER_WRITE) @@ -356,6 +357,8 @@ PTE_BIT_FUNC(low, mkdirty, |= _PAGE_DIRTY); 
PTE_BIT_FUNC(low, mkold, &= ~_PAGE_ACCESSED); PTE_BIT_FUNC(low, mkyoung, |= _PAGE_ACCESSED); +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } + /* * Macro and implementation to make a page protection as uncachable. */ diff --git a/include/asm-sh/pgtable_64.h b/include/asm-sh/pgtable_64.h index f9dd9d311441..c78990cda557 100644 --- a/include/asm-sh/pgtable_64.h +++ b/include/asm-sh/pgtable_64.h @@ -254,10 +254,11 @@ extern void __handle_bad_pmd_kernel(pmd_t * pmd); /* * The following have defined behavior only work if pte_present() is true. */ -static inline int pte_dirty(pte_t pte){ return pte_val(pte) & _PAGE_DIRTY; } -static inline int pte_young(pte_t pte){ return pte_val(pte) & _PAGE_ACCESSED; } -static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } -static inline int pte_write(pte_t pte){ return pte_val(pte) & _PAGE_WRITE; } +static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } +static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } +static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_WRITE; } +static inline int pte_special(pte_t pte){ return 0; } static inline pte_t pte_wrprotect(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_WRITE)); return pte; } static inline pte_t pte_mkclean(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) & ~_PAGE_DIRTY)); return pte; } @@ -266,6 +267,7 @@ static inline pte_t pte_mkwrite(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | static inline pte_t pte_mkdirty(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_DIRTY)); return pte; } static inline pte_t pte_mkyoung(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_ACCESSED)); return pte; } static inline pte_t pte_mkhuge(pte_t pte) { set_pte(&pte, __pte(pte_val(pte) | _PAGE_SZHUGE)); return pte; } +static inline pte_t pte_mkspecial(pte_t pte) { return pte; } /* diff --git 
a/include/asm-sparc/pgtable.h b/include/asm-sparc/pgtable.h index 2cc235b74d94..d84af6d95f5c 100644 --- a/include/asm-sparc/pgtable.h +++ b/include/asm-sparc/pgtable.h @@ -219,6 +219,11 @@ static inline int pte_file(pte_t pte) return pte_val(pte) & BTFIXUP_HALF(pte_filei); } +static inline int pte_special(pte_t pte) +{ + return 0; +} + /* */ BTFIXUPDEF_HALF(pte_wrprotecti) @@ -251,6 +256,8 @@ BTFIXUPDEF_CALL_CONST(pte_t, pte_mkyoung, pte_t) #define pte_mkdirty(pte) BTFIXUP_CALL(pte_mkdirty)(pte) #define pte_mkyoung(pte) BTFIXUP_CALL(pte_mkyoung)(pte) +#define pte_mkspecial(pte) (pte) + #define pfn_pte(pfn, prot) mk_pte(pfn_to_page(pfn), prot) BTFIXUPDEF_CALL(unsigned long, pte_pfn, pte_t) diff --git a/include/asm-sparc64/pgtable.h b/include/asm-sparc64/pgtable.h index 549e45266b68..0e200e7acec7 100644 --- a/include/asm-sparc64/pgtable.h +++ b/include/asm-sparc64/pgtable.h @@ -506,6 +506,11 @@ static inline pte_t pte_mkyoung(pte_t pte) return __pte(pte_val(pte) | mask); } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return pte; +} + static inline unsigned long pte_young(pte_t pte) { unsigned long mask; @@ -608,6 +613,11 @@ static inline unsigned long pte_present(pte_t pte) return val; } +static inline int pte_special(pte_t pte) +{ + return 0; +} + #define pmd_set(pmdp, ptep) \ (pmd_val(*(pmdp)) = (__pa((unsigned long) (ptep)) >> 11UL)) #define pud_set(pudp, pmdp) \ diff --git a/include/asm-um/pgtable.h b/include/asm-um/pgtable.h index 4102b443e925..02db81b7b86e 100644 --- a/include/asm-um/pgtable.h +++ b/include/asm-um/pgtable.h @@ -173,6 +173,11 @@ static inline int pte_newprot(pte_t pte) return(pte_present(pte) && (pte_get_bits(pte, _PAGE_NEWPROT))); } +static inline int pte_special(pte_t pte) +{ + return 0; +} + /* * ================================= * Flags setting section. 
@@ -241,6 +246,11 @@ static inline pte_t pte_mknewpage(pte_t pte) return(pte); } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return(pte); +} + static inline void set_pte(pte_t *pteptr, pte_t pteval) { pte_copy(*pteptr, pteval); diff --git a/include/asm-x86/pgtable.h b/include/asm-x86/pgtable.h index a496d6335d3b..801b31f71452 100644 --- a/include/asm-x86/pgtable.h +++ b/include/asm-x86/pgtable.h @@ -195,6 +195,11 @@ static inline int pte_exec(pte_t pte) return !(pte_val(pte) & _PAGE_NX); } +static inline int pte_special(pte_t pte) +{ + return 0; +} + static inline int pmd_large(pmd_t pte) { return (pmd_val(pte) & (_PAGE_PSE | _PAGE_PRESENT)) == @@ -256,6 +261,11 @@ static inline pte_t pte_clrglobal(pte_t pte) return __pte(pte_val(pte) & ~(pteval_t)_PAGE_GLOBAL); } +static inline pte_t pte_mkspecial(pte_t pte) +{ + return pte; +} + extern pteval_t __supported_pte_mask; static inline pte_t pfn_pte(unsigned long page_nr, pgprot_t pgprot) diff --git a/include/asm-xtensa/pgtable.h b/include/asm-xtensa/pgtable.h index c8b024a48b4d..8014d96b21f1 100644 --- a/include/asm-xtensa/pgtable.h +++ b/include/asm-xtensa/pgtable.h @@ -210,6 +210,8 @@ static inline int pte_write(pte_t pte) { return pte_val(pte) & _PAGE_WRITABLE; } static inline int pte_dirty(pte_t pte) { return pte_val(pte) & _PAGE_DIRTY; } static inline int pte_young(pte_t pte) { return pte_val(pte) & _PAGE_ACCESSED; } static inline int pte_file(pte_t pte) { return pte_val(pte) & _PAGE_FILE; } +static inline int pte_special(pte_t pte) { return 0; } + static inline pte_t pte_wrprotect(pte_t pte) { pte_val(pte) &= ~(_PAGE_WRITABLE | _PAGE_HW_WRITE); return pte; } static inline pte_t pte_mkclean(pte_t pte) @@ -222,6 +224,8 @@ static inline pte_t pte_mkyoung(pte_t pte) { pte_val(pte) |= _PAGE_ACCESSED; return pte; } static inline pte_t pte_mkwrite(pte_t pte) { pte_val(pte) |= _PAGE_WRITABLE; return pte; } +static inline pte_t pte_mkspecial(pte_t pte) + { return pte; } /* * Conversion functions: convert a page 
and protection to a page entry, diff --git a/include/linux/mm.h b/include/linux/mm.h index c657ea0bd6aa..ba86ddaa2bb8 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -721,7 +721,9 @@ struct zap_details { unsigned long truncate_count; /* Compare vm_truncate_count */ }; -struct page *vm_normal_page(struct vm_area_struct *, unsigned long, pte_t); +struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, + pte_t pte); + unsigned long zap_page_range(struct vm_area_struct *vma, unsigned long address, unsigned long size, struct zap_details *); unsigned long unmap_vmas(struct mmu_gather **tlb, diff --git a/mm/memory.c b/mm/memory.c index 0da414c383e7..c5e88bcd8ec3 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -371,33 +371,37 @@ static inline int is_cow_mapping(unsigned int flags) } /* - * This function gets the "struct page" associated with a pte or returns - * NULL if no "struct page" is associated with the pte. + * vm_normal_page -- This function gets the "struct page" associated with a pte. * - * A raw VM_PFNMAP mapping (ie. one that is not COWed) may not have any "struct - * page" backing, and even if they do, they are not refcounted. COWed pages of - * a VM_PFNMAP do always have a struct page, and they are normally refcounted - * (they are _normal_ pages). + * "Special" mappings do not wish to be associated with a "struct page" (either + * it doesn't exist, or it exists but they don't want to touch it). In this + * case, NULL is returned here. "Normal" mappings do have a struct page. * - * So a raw PFNMAP mapping will have each page table entry just pointing - * to a page frame number, and as far as the VM layer is concerned, those do - * not have pages associated with them - even if the PFN might point to memory - * that otherwise is perfectly fine and has a "struct page". + * There are 2 broad cases. Firstly, an architecture may define a pte_special() + * pte bit, in which case this function is trivial. 
Secondly, an architecture + * may not have a spare pte bit, which requires a more complicated scheme, + * described below. + * + * A raw VM_PFNMAP mapping (ie. one that is not COWed) is always considered a + * special mapping (even if there are underlying and valid "struct pages"). + * COWed pages of a VM_PFNMAP are always normal. * * The way we recognize COWed pages within VM_PFNMAP mappings is through the * rules set up by "remap_pfn_range()": the vma will have the VM_PFNMAP bit - * set, and the vm_pgoff will point to the first PFN mapped: thus every - * page that is a raw mapping will always honor the rule + * set, and the vm_pgoff will point to the first PFN mapped: thus every special + * mapping will always honor the rule * * pfn_of_page == vma->vm_pgoff + ((addr - vma->vm_start) >> PAGE_SHIFT) * - * A call to vm_normal_page() will return NULL for such a page. + * And for normal mappings this is false. + * + * This restricts such mappings to be a linear translation from virtual address + * to pfn. To get around this restriction, we allow arbitrary mappings so long + * as the vma is not a COW mapping; in that case, we know that all ptes are + * special (because none can have been COWed). * - * If the page doesn't follow the "remap_pfn_range()" rule in a VM_PFNMAP - * then the page has been COW'ed. A COW'ed page _does_ have a "struct page" - * associated with it even if it is in a VM_PFNMAP range. Calling - * vm_normal_page() on such a page will therefore return the "struct page". * + * In order to support COW of arbitrary special mappings, we have VM_MIXEDMAP. * * VM_MIXEDMAP mappings can likewise contain memory with or without "struct * page" backing, however the difference is that _all_ pages with a struct @@ -407,16 +411,29 @@ static inline int is_cow_mapping(unsigned int flags) * advantage is that we don't have to follow the strict linearity rule of * PFNMAP mappings in order to support COWable mappings. 
* - * A call to vm_normal_page() with a VM_MIXEDMAP mapping will return the - * associated "struct page" or NULL for memory not backed by a "struct page". - * - * - * All other mappings should have a valid struct page, which will be - * returned by a call to vm_normal_page(). */ -struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_t pte) +#ifdef __HAVE_ARCH_PTE_SPECIAL +# define HAVE_PTE_SPECIAL 1 +#else +# define HAVE_PTE_SPECIAL 0 +#endif +struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, + pte_t pte) { - unsigned long pfn = pte_pfn(pte); + unsigned long pfn; + + if (HAVE_PTE_SPECIAL) { + if (likely(!pte_special(pte))) { + VM_BUG_ON(!pfn_valid(pte_pfn(pte))); + return pte_page(pte); + } + VM_BUG_ON(!(vma->vm_flags & (VM_PFNMAP | VM_MIXEDMAP))); + return NULL; + } + + /* !HAVE_PTE_SPECIAL case follows: */ + + pfn = pte_pfn(pte); if (unlikely(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))) { if (vma->vm_flags & VM_MIXEDMAP) { @@ -424,7 +441,8 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_ return NULL; goto out; } else { - unsigned long off = (addr-vma->vm_start) >> PAGE_SHIFT; + unsigned long off; + off = (addr - vma->vm_start) >> PAGE_SHIFT; if (pfn == vma->vm_pgoff + off) return NULL; if (!is_cow_mapping(vma->vm_flags)) @@ -432,25 +450,12 @@ struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr, pte_ } } -#ifdef CONFIG_DEBUG_VM - /* - * Add some anal sanity checks for now. Eventually, - * we should just do "return pfn_to_page(pfn)", but - * in the meantime we check that we get a valid pfn, - * and that the resulting page looks ok. - */ - if (unlikely(!pfn_valid(pfn))) { - print_bad_pte(vma, pte, addr); - return NULL; - } -#endif + VM_BUG_ON(!pfn_valid(pfn)); /* - * NOTE! We still have PageReserved() pages in the page - * tables. + * NOTE! We still have PageReserved() pages in the page tables. 
* - * The PAGE_ZERO() pages and various VDSO mappings can - * cause them to exist. + * eg. VDSO mappings can cause them to exist. */ out: return pfn_to_page(pfn); @@ -1263,6 +1268,12 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, pte_t *pte, entry; spinlock_t *ptl; + /* + * Technically, architectures with pte_special can avoid all these + * restrictions (same for remap_pfn_range). However we would like + * consistency in testing and feature parity among all, so we should + * try to keep these invariants in place for everybody. + */ BUG_ON(!(vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP))); BUG_ON((vma->vm_flags & (VM_PFNMAP|VM_MIXEDMAP)) == (VM_PFNMAP|VM_MIXEDMAP)); @@ -1278,7 +1289,7 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, goto out_unlock; /* Ok, finally just insert the thing.. */ - entry = pfn_pte(pfn, vma->vm_page_prot); + entry = pte_mkspecial(pfn_pte(pfn, vma->vm_page_prot)); set_pte_at(mm, addr, pte, entry); update_mmu_cache(vma, addr, entry); @@ -1309,7 +1320,7 @@ static int remap_pte_range(struct mm_struct *mm, pmd_t *pmd, arch_enter_lazy_mmu_mode(); do { BUG_ON(!pte_none(*pte)); - set_pte_at(mm, addr, pte, pfn_pte(pfn, prot)); + set_pte_at(mm, addr, pte, pte_mkspecial(pfn_pte(pfn, prot))); pfn++; } while (pte++, addr += PAGE_SIZE, addr != end); arch_leave_lazy_mmu_mode(); -- cgit v1.2.3 From 423bad600443c590f34ed7ce357591f76f48f137 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Mon, 28 Apr 2008 02:13:01 -0700 Subject: mm: add vm_insert_mixed vm_insert_mixed will insert either a raw pfn or a refcounted struct page into the page tables, depending on whether vm_normal_page() will return the page or not. With the introduction of the new pte bit, this is now too tricky for drivers to be doing themselves. filemap_xip uses this in a subsequent patch.
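As a rough illustration of that rule, the following userspace sketch models when vm_insert_mixed() has to take a reference (insert_page) versus install a special pte (insert_pfn), and checks that vm_normal_page() later agrees. It is a hypothetical model, not kernel code: every model_* name is invented here, and a simple max_pfn bound stands in for pfn_valid().

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the vm_insert_mixed() decision; none of
 * these names exist in the kernel. "max_pfn" stands in for pfn_valid():
 * pfns below it are assumed to have a struct page.
 */
enum model_insert { MODEL_INSERT_PAGE, MODEL_INSERT_PFN };

static bool model_pfn_valid(unsigned long pfn, unsigned long max_pfn)
{
	return pfn < max_pfn;
}

/* Mirrors the branch in vm_insert_mixed(): refcount only when it must. */
static enum model_insert model_vm_insert_mixed(bool vma_is_mixedmap,
					       bool have_pte_special,
					       unsigned long pfn,
					       unsigned long max_pfn)
{
	assert(vma_is_mixedmap);	/* BUG_ON(!(vm_flags & VM_MIXEDMAP)) */
	if (!have_pte_special && model_pfn_valid(pfn, max_pfn))
		return MODEL_INSERT_PAGE;	/* insert_page(): refcounted */
	return MODEL_INSERT_PFN;		/* insert_pfn(): pte_mkspecial() */
}

/*
 * Mirrors vm_normal_page() on a VM_MIXEDMAP vma: true means a struct page
 * is returned (the pte is "normal"), false means NULL is returned.
 */
static bool model_mixedmap_normal(bool have_pte_special, bool pte_special,
				  unsigned long pfn, unsigned long max_pfn)
{
	if (have_pte_special)
		return !pte_special;
	return model_pfn_valid(pfn, max_pfn);
}
```

In both configurations the model refcounts exactly the pfns that vm_normal_page() will later report as normal pages, which is the invariant that becomes easy to get wrong when drivers do this by hand.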
Signed-off-by: Nick Piggin Cc: Jared Hulbert Cc: Carsten Otte Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 2 ++ mm/memory.c | 86 +++++++++++++++++++++++++++++++++++++----------------- 2 files changed, 62 insertions(+), 26 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index ba86ddaa2bb8..bc0ad24cf8c0 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1152,6 +1152,8 @@ int remap_pfn_range(struct vm_area_struct *, unsigned long addr, int vm_insert_page(struct vm_area_struct *, unsigned long addr, struct page *); int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, unsigned long pfn); +int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, + unsigned long pfn); struct page *follow_page(struct vm_area_struct *, unsigned long address, unsigned int foll_flags); diff --git a/mm/memory.c b/mm/memory.c index c5e88bcd8ec3..bbab1e37055e 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1176,8 +1176,10 @@ pte_t *get_locked_pte(struct mm_struct *mm, unsigned long addr, * old drivers should use this, and they needed to mark their * pages reserved for the old functions anyway. */ -static int insert_page(struct mm_struct *mm, unsigned long addr, struct page *page, pgprot_t prot) +static int insert_page(struct vm_area_struct *vma, unsigned long addr, + struct page *page, pgprot_t prot) { + struct mm_struct *mm = vma->vm_mm; int retval; pte_t *pte; spinlock_t *ptl; @@ -1237,17 +1239,46 @@ out: * * The page does not need to be reserved. 
*/ -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr, struct page *page) +int vm_insert_page(struct vm_area_struct *vma, unsigned long addr, + struct page *page) { if (addr < vma->vm_start || addr >= vma->vm_end) return -EFAULT; if (!page_count(page)) return -EINVAL; vma->vm_flags |= VM_INSERTPAGE; - return insert_page(vma->vm_mm, addr, page, vma->vm_page_prot); + return insert_page(vma, addr, page, vma->vm_page_prot); } EXPORT_SYMBOL(vm_insert_page); +static int insert_pfn(struct vm_area_struct *vma, unsigned long addr, + unsigned long pfn, pgprot_t prot) +{ + struct mm_struct *mm = vma->vm_mm; + int retval; + pte_t *pte, entry; + spinlock_t *ptl; + + retval = -ENOMEM; + pte = get_locked_pte(mm, addr, &ptl); + if (!pte) + goto out; + retval = -EBUSY; + if (!pte_none(*pte)) + goto out_unlock; + + /* Ok, finally just insert the thing.. */ + entry = pte_mkspecial(pfn_pte(pfn, prot)); + set_pte_at(mm, addr, pte, entry); + update_mmu_cache(vma, addr, entry); /* XXX: why not for insert_page? */ + + retval = 0; +out_unlock: + pte_unmap_unlock(pte, ptl); +out: + return retval; +} + /** * vm_insert_pfn - insert single pfn into user vma * @vma: user vma to map to @@ -1261,13 +1292,8 @@ EXPORT_SYMBOL(vm_insert_page); * in that case the handler should return NULL. */ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, - unsigned long pfn) + unsigned long pfn) { - struct mm_struct *mm = vma->vm_mm; - int retval; - pte_t *pte, entry; - spinlock_t *ptl; - /* * Technically, architectures with pte_special can avoid all these * restrictions (same for remap_pfn_range). 
However we would like @@ -1280,27 +1306,35 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr, BUG_ON((vma->vm_flags & VM_PFNMAP) && is_cow_mapping(vma->vm_flags)); BUG_ON((vma->vm_flags & VM_MIXEDMAP) && pfn_valid(pfn)); - retval = -ENOMEM; - pte = get_locked_pte(mm, addr, &ptl); - if (!pte) - goto out; - retval = -EBUSY; - if (!pte_none(*pte)) - goto out_unlock; + if (addr < vma->vm_start || addr >= vma->vm_end) + return -EFAULT; + return insert_pfn(vma, addr, pfn, vma->vm_page_prot); +} +EXPORT_SYMBOL(vm_insert_pfn); - /* Ok, finally just insert the thing.. */ - entry = pte_mkspecial(pfn_pte(pfn, vma->vm_page_prot)); - set_pte_at(mm, addr, pte, entry); - update_mmu_cache(vma, addr, entry); +int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr, + unsigned long pfn) +{ + BUG_ON(!(vma->vm_flags & VM_MIXEDMAP)); - retval = 0; -out_unlock: - pte_unmap_unlock(pte, ptl); + if (addr < vma->vm_start || addr >= vma->vm_end) + return -EFAULT; -out: - return retval; + /* + * If we don't have pte special, then we have to use the pfn_valid() + * based VM_MIXEDMAP scheme (see vm_normal_page), and thus we *must* + * refcount the page if pfn_valid is true (hence insert_page rather + * than insert_pfn). + */ + if (!HAVE_PTE_SPECIAL && pfn_valid(pfn)) { + struct page *page; + + page = pfn_to_page(pfn); + return insert_page(vma, addr, page, vma->vm_page_prot); + } + return insert_pfn(vma, addr, pfn, vma->vm_page_prot); } -EXPORT_SYMBOL(vm_insert_pfn); +EXPORT_SYMBOL(vm_insert_mixed); /* * maps a range of physical memory into the requested pages. the old -- cgit v1.2.3 From 30afcb4bd2762fa4b87b17ada9500aa46dc10b1b Mon Sep 17 00:00:00 2001 From: Jared Hulbert Date: Mon, 28 Apr 2008 02:13:02 -0700 Subject: return pfn from direct_access, for XIP Alter the block device ->direct_access() API to work with the new get_xip_mem() API (which requires that both kaddr and pfn be returned).
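A minimal userspace sketch of the new contract, assuming a physically contiguous region with 4 KiB pages and 512-byte sectors (both are assumptions of the model, as are all model_* names): the callback reports the kernel address and the pfn of the same page, so read(2)/write(2) can use the former and the fault handler the latter.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Assumed 4 KiB pages and 512-byte sectors for this userspace model. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_PAGE_SIZE   (1UL << MODEL_PAGE_SHIFT)
#define MODEL_SECTOR_SIZE 512UL

/* Hypothetical linear memory region, similar in spirit to brd or dcssblk. */
struct model_region {
	char *kbase;		/* kernel virtual address of the region start */
	uint64_t phys_base;	/* physical address of the region start */
	unsigned long size;	/* region size in bytes */
};

/*
 * Model of the reworked ->direct_access(): return both the kernel address
 * (for copying data) and the pfn (for mapping into userspace). The pfn is
 * derived from the physical address of the same page kaddr points to.
 */
static int model_direct_access(const struct model_region *r, uint64_t sector,
			       void **kaddr, unsigned long *pfn)
{
	uint64_t pgoff = sector / (MODEL_PAGE_SIZE / MODEL_SECTOR_SIZE);
	uint64_t off = pgoff * MODEL_PAGE_SIZE;

	if (off + MODEL_PAGE_SIZE > r->size)
		return -ERANGE;	/* page would run past the end of the region */
	*kaddr = r->kbase + off;
	*pfn = (unsigned long)((r->phys_base + off) >> MODEL_PAGE_SHIFT);
	return 0;
}
```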
Some architectures will not do the right thing in their virt_to_page() for use by XIP (to translate from the kernel virtual address returned by direct_access() to a user-mappable pfn in XIP's page fault handler). However, we can't switch it to just return the pfn and not the kaddr, because we have no good way to get a kva from a pfn, and XIP requires the kva for its read(2) and write(2) handlers. So we have to return both. Signed-off-by: Jared Hulbert Signed-off-by: Nick Piggin Cc: Carsten Otte Cc: Heiko Carstens Cc: linux-mm@kvack.org Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/powerpc/sysdev/axonram.c | 5 +++-- drivers/block/brd.c | 5 +++-- drivers/s390/block/dcssblk.c | 8 +++++--- fs/ext2/xip.c | 24 ++++++++++++++---------- include/linux/fs.h | 3 ++- 5 files changed, 27 insertions(+), 18 deletions(-) diff --git a/arch/powerpc/sysdev/axonram.c b/arch/powerpc/sysdev/axonram.c index d359d6e92975..7f59188cd9a1 100644 --- a/arch/powerpc/sysdev/axonram.c +++ b/arch/powerpc/sysdev/axonram.c @@ -143,7 +143,7 @@ axon_ram_make_request(struct request_queue *queue, struct bio *bio) */ static int axon_ram_direct_access(struct block_device *device, sector_t sector, - unsigned long *data) + void **kaddr, unsigned long *pfn) { struct axon_ram_bank *bank = device->bd_disk->private_data; loff_t offset; @@ -154,7 +154,8 @@ axon_ram_direct_access(struct block_device *device, sector_t sector, return -ERANGE; } - *data = bank->ph_addr + offset; + *kaddr = (void *)(bank->ph_addr + offset); + *pfn = virt_to_phys(kaddr) >> PAGE_SHIFT; return 0; } diff --git a/drivers/block/brd.c b/drivers/block/brd.c index 7bd76639544c..e8e38faeafd8 100644 --- a/drivers/block/brd.c +++ b/drivers/block/brd.c @@ -319,7 +319,7 @@ out: #ifdef CONFIG_BLK_DEV_XIP static int brd_direct_access (struct block_device *bdev, sector_t sector, - unsigned long *data) + void **kaddr, unsigned long *pfn) { struct brd_device *brd = bdev->bd_disk->private_data; struct page *page; @@ -333,7 +333,8
@@ static int brd_direct_access (struct block_device *bdev, sector_t sector, page = brd_insert_page(brd, sector); if (!page) return -ENOMEM; - *data = (unsigned long)page_address(page); + *kaddr = page_address(page); + *pfn = page_to_pfn(page); return 0; } diff --git a/drivers/s390/block/dcssblk.c b/drivers/s390/block/dcssblk.c index 04787eab1016..bb52d2fbac18 100644 --- a/drivers/s390/block/dcssblk.c +++ b/drivers/s390/block/dcssblk.c @@ -36,7 +36,7 @@ static int dcssblk_open(struct inode *inode, struct file *filp); static int dcssblk_release(struct inode *inode, struct file *filp); static int dcssblk_make_request(struct request_queue *q, struct bio *bio); static int dcssblk_direct_access(struct block_device *bdev, sector_t secnum, - unsigned long *data); + void **kaddr, unsigned long *pfn); static char dcssblk_segments[DCSSBLK_PARM_LEN] = "\0"; @@ -636,7 +636,7 @@ fail: static int dcssblk_direct_access (struct block_device *bdev, sector_t secnum, - unsigned long *data) + void **kaddr, unsigned long *pfn) { struct dcssblk_dev_info *dev_info; unsigned long pgoff; @@ -649,7 +649,9 @@ dcssblk_direct_access (struct block_device *bdev, sector_t secnum, pgoff = secnum / (PAGE_SIZE / 512); if ((pgoff+1)*PAGE_SIZE-1 > dev_info->end - dev_info->start) return -ERANGE; - *data = (unsigned long) (dev_info->start+pgoff*PAGE_SIZE); + *kaddr = (void *) (dev_info->start+pgoff*PAGE_SIZE); + *pfn = virt_to_phys(*kaddr) >> PAGE_SHIFT; + return 0; } diff --git a/fs/ext2/xip.c b/fs/ext2/xip.c index ca7f00312388..430b4c8ee971 100644 --- a/fs/ext2/xip.c +++ b/fs/ext2/xip.c @@ -16,11 +16,13 @@ static inline int __inode_direct_access(struct inode *inode, sector_t sector, - unsigned long *data) + void **kaddr, unsigned long *pfn) { - BUG_ON(!inode->i_sb->s_bdev->bd_disk->fops->direct_access); - return inode->i_sb->s_bdev->bd_disk->fops - ->direct_access(inode->i_sb->s_bdev,sector,data); + struct block_device *bdev = inode->i_sb->s_bdev; + struct block_device_operations *ops = 
bdev->bd_disk->fops; + + BUG_ON(!ops->direct_access); + return ops->direct_access(bdev, sector, kaddr, pfn); } static inline int @@ -48,12 +50,13 @@ int ext2_clear_xip_target(struct inode *inode, int block) { sector_t sector = block * (PAGE_SIZE/512); - unsigned long data; + void *kaddr; + unsigned long pfn; int rc; - rc = __inode_direct_access(inode, sector, &data); + rc = __inode_direct_access(inode, sector, &kaddr, &pfn); if (!rc) - clear_page((void*)data); + clear_page(kaddr); return rc; } @@ -74,7 +77,8 @@ ext2_get_xip_page(struct address_space *mapping, sector_t offset, int create) { int rc; - unsigned long data; + void *kaddr; + unsigned long pfn; sector_t sector; /* first, retrieve the sector number */ @@ -84,9 +88,9 @@ ext2_get_xip_page(struct address_space *mapping, sector_t offset, /* retrieve address of the target data */ rc = __inode_direct_access - (mapping->host, sector * (PAGE_SIZE/512), &data); + (mapping->host, sector * (PAGE_SIZE/512), &kaddr, &pfn); if (!rc) - return virt_to_page(data); + return pfn_to_page(pfn); error: return ERR_PTR(rc); diff --git a/include/linux/fs.h b/include/linux/fs.h index d6d7c52055c6..bd05f5678045 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -1178,7 +1178,8 @@ struct block_device_operations { int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); long (*unlocked_ioctl) (struct file *, unsigned, unsigned long); long (*compat_ioctl) (struct file *, unsigned, unsigned long); - int (*direct_access) (struct block_device *, sector_t, unsigned long *); + int (*direct_access) (struct block_device *, sector_t, + void **, unsigned long *); int (*media_changed) (struct gendisk *); int (*revalidate_disk) (struct gendisk *); int (*getgeo)(struct block_device *, struct hd_geometry *); -- cgit v1.2.3 From 70688e4dd1647f0ceb502bbd5964fa344c5eb411 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Mon, 28 Apr 2008 02:13:02 -0700 Subject: xip: support non-struct page backed memory Convert XIP to support 
non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). Signed-off-by: Nick Piggin Acked-by: Carsten Otte Cc: Jared Hulbert Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ext2/inode.c | 2 +- fs/ext2/xip.c | 37 ++++------ fs/ext2/xip.h | 9 +-- fs/open.c | 2 +- include/linux/fs.h | 4 +- mm/fadvise.c | 2 +- mm/filemap_xip.c | 200 ++++++++++++++++++++++++++--------------------------- mm/madvise.c | 2 +- 8 files changed, 126 insertions(+), 132 deletions(-) diff --git a/fs/ext2/inode.c b/fs/ext2/inode.c index b8a2990bab83..687023bdfd1e 100644 --- a/fs/ext2/inode.c +++ b/fs/ext2/inode.c @@ -796,7 +796,7 @@ const struct address_space_operations ext2_aops = { const struct address_space_operations ext2_aops_xip = { .bmap = ext2_bmap, - .get_xip_page = ext2_get_xip_page, + .get_xip_mem = ext2_get_xip_mem, }; const struct address_space_operations ext2_nobh_aops = { diff --git a/fs/ext2/xip.c b/fs/ext2/xip.c index 430b4c8ee971..233f7fdbe31d 100644 --- a/fs/ext2/xip.c +++ b/fs/ext2/xip.c @@ -15,26 +15,28 @@ #include "xip.h" static inline int -__inode_direct_access(struct inode *inode, sector_t sector, +__inode_direct_access(struct inode *inode, sector_t block, void **kaddr, unsigned long *pfn) { struct block_device *bdev = inode->i_sb->s_bdev; struct block_device_operations *ops = bdev->bd_disk->fops; + sector_t sector; + + sector = block * (PAGE_SIZE / 512); /* ext2 block to bdev sector */ BUG_ON(!ops->direct_access); return ops->direct_access(bdev, sector, kaddr, pfn); } static inline int -__ext2_get_sector(struct inode *inode, sector_t offset, int create, +__ext2_get_block(struct inode *inode, pgoff_t pgoff, int 
create, sector_t *result) { struct buffer_head tmp; int rc; memset(&tmp, 0, sizeof(struct buffer_head)); - rc = ext2_get_block(inode, offset/ (PAGE_SIZE/512), &tmp, - create); + rc = ext2_get_block(inode, pgoff, &tmp, create); *result = tmp.b_blocknr; /* did we get a sparse block (hole in the file)? */ @@ -47,14 +49,13 @@ __ext2_get_sector(struct inode *inode, sector_t offset, int create, } int -ext2_clear_xip_target(struct inode *inode, int block) +ext2_clear_xip_target(struct inode *inode, sector_t block) { - sector_t sector = block * (PAGE_SIZE/512); void *kaddr; unsigned long pfn; int rc; - rc = __inode_direct_access(inode, sector, &kaddr, &pfn); + rc = __inode_direct_access(inode, block, &kaddr, &pfn); if (!rc) clear_page(kaddr); return rc; @@ -72,26 +73,18 @@ void ext2_xip_verify_sb(struct super_block *sb) } } -struct page * -ext2_get_xip_page(struct address_space *mapping, sector_t offset, - int create) +int ext2_get_xip_mem(struct address_space *mapping, pgoff_t pgoff, int create, + void **kmem, unsigned long *pfn) { int rc; - void *kaddr; - unsigned long pfn; - sector_t sector; + sector_t block; /* first, retrieve the sector number */ - rc = __ext2_get_sector(mapping->host, offset, create, §or); + rc = __ext2_get_block(mapping->host, pgoff, create, &block); if (rc) - goto error; + return rc; /* retrieve address of the target data */ - rc = __inode_direct_access - (mapping->host, sector * (PAGE_SIZE/512), &kaddr, &pfn); - if (!rc) - return pfn_to_page(pfn); - - error: - return ERR_PTR(rc); + rc = __inode_direct_access(mapping->host, block, kmem, pfn); + return rc; } diff --git a/fs/ext2/xip.h b/fs/ext2/xip.h index aa85331d6c56..18b34d2f31b3 100644 --- a/fs/ext2/xip.h +++ b/fs/ext2/xip.h @@ -7,19 +7,20 @@ #ifdef CONFIG_EXT2_FS_XIP extern void ext2_xip_verify_sb (struct super_block *); -extern int ext2_clear_xip_target (struct inode *, int); +extern int ext2_clear_xip_target (struct inode *, sector_t); static inline int ext2_use_xip (struct super_block *sb) { 
struct ext2_sb_info *sbi = EXT2_SB(sb); return (sbi->s_mount_opt & EXT2_MOUNT_XIP); } -struct page* ext2_get_xip_page (struct address_space *, sector_t, int); -#define mapping_is_xip(map) unlikely(map->a_ops->get_xip_page) +int ext2_get_xip_mem(struct address_space *, pgoff_t, int, + void **, unsigned long *); +#define mapping_is_xip(map) unlikely(map->a_ops->get_xip_mem) #else #define mapping_is_xip(map) 0 #define ext2_xip_verify_sb(sb) do { } while (0) #define ext2_use_xip(sb) 0 #define ext2_clear_xip_target(inode, chain) 0 -#define ext2_get_xip_page NULL +#define ext2_get_xip_mem NULL #endif diff --git a/fs/open.c b/fs/open.c index b70e7666bb2c..7af1f05d5978 100644 --- a/fs/open.c +++ b/fs/open.c @@ -837,7 +837,7 @@ static struct file *__dentry_open(struct dentry *dentry, struct vfsmount *mnt, if (f->f_flags & O_DIRECT) { if (!f->f_mapping->a_ops || ((!f->f_mapping->a_ops->direct_IO) && - (!f->f_mapping->a_ops->get_xip_page))) { + (!f->f_mapping->a_ops->get_xip_mem))) { fput(f); f = ERR_PTR(-EINVAL); } diff --git a/include/linux/fs.h b/include/linux/fs.h index bd05f5678045..2c925747bc49 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -474,8 +474,8 @@ struct address_space_operations { int (*releasepage) (struct page *, gfp_t); ssize_t (*direct_IO)(int, struct kiocb *, const struct iovec *iov, loff_t offset, unsigned long nr_segs); - struct page* (*get_xip_page)(struct address_space *, sector_t, - int); + int (*get_xip_mem)(struct address_space *, pgoff_t, int, + void **, unsigned long *); /* migrate the contents of a page to the specified target */ int (*migratepage) (struct address_space *, struct page *, struct page *); diff --git a/mm/fadvise.c b/mm/fadvise.c index 3c0f1e99f5e4..343cfdfebd9e 100644 --- a/mm/fadvise.c +++ b/mm/fadvise.c @@ -49,7 +49,7 @@ asmlinkage long sys_fadvise64_64(int fd, loff_t offset, loff_t len, int advice) goto out; } - if (mapping->a_ops->get_xip_page) { + if (mapping->a_ops->get_xip_mem) { switch (advice) { case 
POSIX_FADV_NORMAL: case POSIX_FADV_RANDOM: diff --git a/mm/filemap_xip.c b/mm/filemap_xip.c index 5e598c42afd7..3e744abcce9d 100644 --- a/mm/filemap_xip.c +++ b/mm/filemap_xip.c @@ -15,6 +15,7 @@ #include #include #include +#include /* * We do use our own empty page to avoid interference with other users @@ -42,37 +43,41 @@ static struct page *xip_sparse_page(void) /* * This is a file read routine for execute in place files, and uses - * the mapping->a_ops->get_xip_page() function for the actual low-level + * the mapping->a_ops->get_xip_mem() function for the actual low-level * stuff. * * Note the struct file* is not used at all. It may be NULL. */ -static void +static ssize_t do_xip_mapping_read(struct address_space *mapping, struct file_ra_state *_ra, struct file *filp, - loff_t *ppos, - read_descriptor_t *desc, - read_actor_t actor) + char __user *buf, + size_t len, + loff_t *ppos) { struct inode *inode = mapping->host; pgoff_t index, end_index; unsigned long offset; - loff_t isize; + loff_t isize, pos; + size_t copied = 0, error = 0; - BUG_ON(!mapping->a_ops->get_xip_page); + BUG_ON(!mapping->a_ops->get_xip_mem); - index = *ppos >> PAGE_CACHE_SHIFT; - offset = *ppos & ~PAGE_CACHE_MASK; + pos = *ppos; + index = pos >> PAGE_CACHE_SHIFT; + offset = pos & ~PAGE_CACHE_MASK; isize = i_size_read(inode); if (!isize) goto out; end_index = (isize - 1) >> PAGE_CACHE_SHIFT; - for (;;) { - struct page *page; - unsigned long nr, ret; + do { + unsigned long nr, left; + void *xip_mem; + unsigned long xip_pfn; + int zero = 0; /* nr is the maximum number of bytes to copy from this page */ nr = PAGE_CACHE_SIZE; @@ -85,19 +90,17 @@ do_xip_mapping_read(struct address_space *mapping, } } nr = nr - offset; + if (nr > len) + nr = len; - page = mapping->a_ops->get_xip_page(mapping, - index*(PAGE_SIZE/512), 0); - if (!page) - goto no_xip_page; - if (unlikely(IS_ERR(page))) { - if (PTR_ERR(page) == -ENODATA) { + error = mapping->a_ops->get_xip_mem(mapping, index, 0, + &xip_mem, 
&xip_pfn); + if (unlikely(error)) { + if (error == -ENODATA) { /* sparse */ - page = ZERO_PAGE(0); - } else { - desc->error = PTR_ERR(page); + zero = 1; + } else goto out; - } } /* If users can be writing to this page using arbitrary @@ -105,10 +108,10 @@ do_xip_mapping_read(struct address_space *mapping, * before reading the page on the kernel side. */ if (mapping_writably_mapped(mapping)) - flush_dcache_page(page); + /* address based flush */ ; /* - * Ok, we have the page, so now we can copy it to user space... + * Ok, we have the mem, so now we can copy it to user space... * * The actor routine returns how many bytes were actually used.. * NOTE! This may not be the same as how much of a user buffer @@ -116,47 +119,38 @@ do_xip_mapping_read(struct address_space *mapping, * "pos" here (the actor routine has to update the user buffer * pointers and the remaining count). */ - ret = actor(desc, page, offset, nr); - offset += ret; - index += offset >> PAGE_CACHE_SHIFT; - offset &= ~PAGE_CACHE_MASK; + if (!zero) + left = __copy_to_user(buf+copied, xip_mem+offset, nr); + else + left = __clear_user(buf + copied, nr); - if (ret == nr && desc->count) - continue; - goto out; + if (left) { + error = -EFAULT; + goto out; + } -no_xip_page: - /* Did not get the page. Report it */ - desc->error = -EIO; - goto out; - } + copied += (nr - left); + offset += (nr - left); + index += offset >> PAGE_CACHE_SHIFT; + offset &= ~PAGE_CACHE_MASK; + } while (copied < len); out: - *ppos = ((loff_t) index << PAGE_CACHE_SHIFT) + offset; + *ppos = pos + copied; if (filp) file_accessed(filp); + + return (copied ? 
copied : error); } ssize_t xip_file_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos) { - read_descriptor_t desc; - if (!access_ok(VERIFY_WRITE, buf, len)) return -EFAULT; - desc.written = 0; - desc.arg.buf = buf; - desc.count = len; - desc.error = 0; - - do_xip_mapping_read(filp->f_mapping, &filp->f_ra, filp, - ppos, &desc, file_read_actor); - - if (desc.written) - return desc.written; - else - return desc.error; + return do_xip_mapping_read(filp->f_mapping, &filp->f_ra, filp, + buf, len, ppos); } EXPORT_SYMBOL_GPL(xip_file_read); @@ -211,13 +205,16 @@ __xip_unmap (struct address_space * mapping, * * This function is derived from filemap_fault, but used for execute in place */ -static int xip_file_fault(struct vm_area_struct *area, struct vm_fault *vmf) +static int xip_file_fault(struct vm_area_struct *vma, struct vm_fault *vmf) { - struct file *file = area->vm_file; + struct file *file = vma->vm_file; struct address_space *mapping = file->f_mapping; struct inode *inode = mapping->host; - struct page *page; pgoff_t size; + void *xip_mem; + unsigned long xip_pfn; + struct page *page; + int error; /* XXX: are VM_FAULT_ codes OK? 
*/ @@ -225,35 +222,44 @@ static int xip_file_fault(struct vm_area_struct *area, struct vm_fault *vmf) if (vmf->pgoff >= size) return VM_FAULT_SIGBUS; - page = mapping->a_ops->get_xip_page(mapping, - vmf->pgoff*(PAGE_SIZE/512), 0); - if (!IS_ERR(page)) - goto out; - if (PTR_ERR(page) != -ENODATA) + error = mapping->a_ops->get_xip_mem(mapping, vmf->pgoff, 0, + &xip_mem, &xip_pfn); + if (likely(!error)) + goto found; + if (error != -ENODATA) return VM_FAULT_OOM; /* sparse block */ - if ((area->vm_flags & (VM_WRITE | VM_MAYWRITE)) && - (area->vm_flags & (VM_SHARED| VM_MAYSHARE)) && + if ((vma->vm_flags & (VM_WRITE | VM_MAYWRITE)) && + (vma->vm_flags & (VM_SHARED | VM_MAYSHARE)) && (!(mapping->host->i_sb->s_flags & MS_RDONLY))) { + int err; + /* maybe shared writable, allocate new block */ - page = mapping->a_ops->get_xip_page(mapping, - vmf->pgoff*(PAGE_SIZE/512), 1); - if (IS_ERR(page)) + error = mapping->a_ops->get_xip_mem(mapping, vmf->pgoff, 1, + &xip_mem, &xip_pfn); + if (error) return VM_FAULT_SIGBUS; - /* unmap page at pgoff from all other vmas */ + /* unmap sparse mappings at pgoff from all other vmas */ __xip_unmap(mapping, vmf->pgoff); + +found: + err = vm_insert_mixed(vma, (unsigned long)vmf->virtual_address, + xip_pfn); + if (err == -ENOMEM) + return VM_FAULT_OOM; + BUG_ON(err); + return VM_FAULT_NOPAGE; } else { /* not shared and writable, use xip_sparse_page() */ page = xip_sparse_page(); if (!page) return VM_FAULT_OOM; - } -out: - page_cache_get(page); - vmf->page = page; - return 0; + page_cache_get(page); + vmf->page = page; + return 0; + } } static struct vm_operations_struct xip_file_vm_ops = { @@ -262,11 +268,11 @@ static struct vm_operations_struct xip_file_vm_ops = { int xip_file_mmap(struct file * file, struct vm_area_struct * vma) { - BUG_ON(!file->f_mapping->a_ops->get_xip_page); + BUG_ON(!file->f_mapping->a_ops->get_xip_mem); file_accessed(file); vma->vm_ops = &xip_file_vm_ops; - vma->vm_flags |= VM_CAN_NONLINEAR; + vma->vm_flags |= 
VM_CAN_NONLINEAR | VM_MIXEDMAP; return 0; } EXPORT_SYMBOL_GPL(xip_file_mmap); @@ -279,17 +285,17 @@ __xip_file_write(struct file *filp, const char __user *buf, const struct address_space_operations *a_ops = mapping->a_ops; struct inode *inode = mapping->host; long status = 0; - struct page *page; size_t bytes; ssize_t written = 0; - BUG_ON(!mapping->a_ops->get_xip_page); + BUG_ON(!mapping->a_ops->get_xip_mem); do { unsigned long index; unsigned long offset; size_t copied; - char *kaddr; + void *xip_mem; + unsigned long xip_pfn; offset = (pos & (PAGE_CACHE_SIZE -1)); /* Within page */ index = pos >> PAGE_CACHE_SHIFT; @@ -297,28 +303,22 @@ __xip_file_write(struct file *filp, const char __user *buf, if (bytes > count) bytes = count; - page = a_ops->get_xip_page(mapping, - index*(PAGE_SIZE/512), 0); - if (IS_ERR(page) && (PTR_ERR(page) == -ENODATA)) { + status = a_ops->get_xip_mem(mapping, index, 0, + &xip_mem, &xip_pfn); + if (status == -ENODATA) { /* we allocate a new page unmap it */ - page = a_ops->get_xip_page(mapping, - index*(PAGE_SIZE/512), 1); - if (!IS_ERR(page)) + status = a_ops->get_xip_mem(mapping, index, 1, + &xip_mem, &xip_pfn); + if (!status) /* unmap page at pgoff from all other vmas */ __xip_unmap(mapping, index); } - if (IS_ERR(page)) { - status = PTR_ERR(page); + if (status) break; - } - fault_in_pages_readable(buf, bytes); - kaddr = kmap_atomic(page, KM_USER0); copied = bytes - - __copy_from_user_inatomic_nocache(kaddr + offset, buf, bytes); - kunmap_atomic(kaddr, KM_USER0); - flush_dcache_page(page); + __copy_from_user_nocache(xip_mem + offset, buf, bytes); if (likely(copied > 0)) { status = copied; @@ -398,7 +398,7 @@ EXPORT_SYMBOL_GPL(xip_file_write); /* * truncate a page used for execute in place - * functionality is analog to block_truncate_page but does use get_xip_page + * functionality is analog to block_truncate_page but does use get_xip_mem * to get the page instead of page cache */ int @@ -408,9 +408,11 @@ xip_truncate_page(struct 
address_space *mapping, loff_t from) unsigned offset = from & (PAGE_CACHE_SIZE-1); unsigned blocksize; unsigned length; - struct page *page; + void *xip_mem; + unsigned long xip_pfn; + int err; - BUG_ON(!mapping->a_ops->get_xip_page); + BUG_ON(!mapping->a_ops->get_xip_mem); blocksize = 1 << mapping->host->i_blkbits; length = offset & (blocksize - 1); @@ -421,18 +423,16 @@ xip_truncate_page(struct address_space *mapping, loff_t from) length = blocksize - length; - page = mapping->a_ops->get_xip_page(mapping, - index*(PAGE_SIZE/512), 0); - if (!page) - return -ENOMEM; - if (unlikely(IS_ERR(page))) { - if (PTR_ERR(page) == -ENODATA) + err = mapping->a_ops->get_xip_mem(mapping, index, 0, + &xip_mem, &xip_pfn); + if (unlikely(err)) { + if (err == -ENODATA) /* Hole? No need to truncate */ return 0; else - return PTR_ERR(page); + return err; } - zero_user(page, offset, length); + memset(xip_mem + offset, 0, length); return 0; } EXPORT_SYMBOL_GPL(xip_truncate_page); diff --git a/mm/madvise.c b/mm/madvise.c index 93ee375b38e7..23a0ec3e0ea0 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -112,7 +112,7 @@ static long madvise_willneed(struct vm_area_struct * vma, if (!file) return -EBADF; - if (file->f_mapping->a_ops->get_xip_page) { + if (file->f_mapping->a_ops->get_xip_mem) { /* no bad return value, but ignore advice */ return 0; } -- cgit v1.2.3 From a08cb629f546d1cecebe955392197f226e58dbe1 Mon Sep 17 00:00:00 2001 From: Nick Piggin Date: Mon, 28 Apr 2008 02:13:03 -0700 Subject: s390: implement pte special bit Convert XIP to support non-struct page backed memory, using VM_MIXEDMAP for the user mappings. This requires the get_xip_page API to be changed to an address based one. Improve the API layering a little bit too, while we're here. This is required in order to support XIP filesystems on memory that isn't backed with struct page (but memory with struct page is still supported too). 
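To make the new s390 bit concrete, here is a userspace model of pte_special()/pte_mkspecial() using the bit values from the pgtable.h hunk. model_pte_t and the M_* macros are stand-ins invented for the example; the model's pte_special() normalizes its result to 0/1, whereas the kernel version returns the masked value, and the compile-time check only verifies that the new software bit does not collide with _PAGE_SWT/_PAGE_SWX.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace model of the s390 software pte bits added by the patch. The
 * bit values are copied from the diff; model_pte_t is a stand-in for pte_t.
 */
#define M_PAGE_SWT	0x001	/* SW pte type bit t */
#define M_PAGE_SWX	0x002	/* SW pte type bit x */
#define M_PAGE_SPECIAL	0x004	/* SW bit: pte maps a "special" pfn */

_Static_assert((M_PAGE_SPECIAL & (M_PAGE_SWT | M_PAGE_SWX)) == 0,
	       "special bit must not overlap the pte type bits");

typedef struct { uint64_t val; } model_pte_t;

/* Returns 1 if the special bit is set, 0 otherwise. */
static int model_pte_special(model_pte_t pte)
{
	return (pte.val & M_PAGE_SPECIAL) != 0;
}

/* Sets only the special bit; every other pte bit is left untouched. */
static model_pte_t model_pte_mkspecial(model_pte_t pte)
{
	pte.val |= M_PAGE_SPECIAL;
	return pte;
}
```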
Signed-off-by: Nick Piggin Acked-by: Carsten Otte Cc: Jared Hulbert Cc: Martin Schwidefsky Cc: Heiko Carstens Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-s390/pgtable.h | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/include/asm-s390/pgtable.h b/include/asm-s390/pgtable.h index 76e8a7904e8a..f8347ce9c5a1 100644 --- a/include/asm-s390/pgtable.h +++ b/include/asm-s390/pgtable.h @@ -220,6 +220,8 @@ extern char empty_zero_page[PAGE_SIZE]; /* Software bits in the page table entry */ #define _PAGE_SWT 0x001 /* SW pte type bit t */ #define _PAGE_SWX 0x002 /* SW pte type bit x */ +#define _PAGE_SPECIAL 0x004 /* SW associated with special page */ +#define __HAVE_ARCH_PTE_SPECIAL /* Six different types of pages. */ #define _PAGE_TYPE_EMPTY 0x400 @@ -520,7 +522,7 @@ static inline int pte_file(pte_t pte) static inline int pte_special(pte_t pte) { - return 0; + return (pte_val(pte) & _PAGE_SPECIAL); } #define __HAVE_ARCH_PTE_SAME @@ -722,6 +724,7 @@ static inline pte_t pte_mkyoung(pte_t pte) static inline pte_t pte_mkspecial(pte_t pte) { + pte_val(pte) |= _PAGE_SPECIAL; return pte; } -- cgit v1.2.3 From 122c7a59055c77434118d7dd4dff4b625d4a2c15 Mon Sep 17 00:00:00 2001 From: Ken'ichi Ohmichi Date: Mon, 28 Apr 2008 02:13:04 -0700 Subject: vmcoreinfo: add page flags values Add some values of page flags to the vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. An old makedumpfile (v1.2.4 or before) had assumed some values of page flags internally, and this implementation could not track changes to these values. For example, Christoph Lameter is changing these values in the following patch: http://lkml.org/lkml/2008/2/29/463 So a new makedumpfile (v1.2.5) needs these values, and I created this patch to let the kernel output them.
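A sketch of how a dump-filtering tool might consume these values, assuming the one-entry-per-line "NUMBER(name)=value" text that VMCOREINFO_NUMBER() appends: look the bit number up by name instead of hard-coding it, then test a page->flags word with it. The function names are hypothetical and this is not makedumpfile's actual code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "NUMBER(name)=value" in a vmcoreinfo text blob, assuming one
 * entry per line. Returns -1 if the entry is missing (so a genuine value
 * of -1 cannot be distinguished from "absent" in this sketch).
 */
static long model_vmcoreinfo_number(const char *info, const char *name)
{
	char key[128];
	const char *p;

	snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	for (p = info; (p = strstr(p, key)) != NULL; p++) {
		/* Only accept matches at the start of a line. */
		if (p == info || p[-1] == '\n')
			return strtol(p + strlen(key), NULL, 10);
	}
	return -1;
}

/* Test a page->flags word against a bit number read from vmcoreinfo. */
static int model_page_flag_set(unsigned long flags, long bit)
{
	if (bit < 0 || bit >= (long)(sizeof(flags) * 8))
		return 0;	/* unknown or out-of-range bit: treat as clear */
	return (int)((flags >> bit) & 1UL);
}
```

Any bit numbers used to exercise this sketch are illustrative; the point is that the tool reads them from the dump rather than assuming them.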
Signed-off-by: Ken'ichi Ohmichi Cc: Christoph Lameter Cc: "Eric W. Biederman" Acked-by: Vivek Goyal Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- kernel/kexec.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/kernel/kexec.c b/kernel/kexec.c index 6782dce93d01..cb85c79989b4 100644 --- a/kernel/kexec.c +++ b/kernel/kexec.c @@ -1405,6 +1405,9 @@ static int __init crash_save_vmcoreinfo_init(void) VMCOREINFO_LENGTH(zone.free_area, MAX_ORDER); VMCOREINFO_LENGTH(free_area.free_list, MIGRATE_TYPES); VMCOREINFO_NUMBER(NR_FREE_PAGES); + VMCOREINFO_NUMBER(PG_lru); + VMCOREINFO_NUMBER(PG_private); + VMCOREINFO_NUMBER(PG_swapcache); arch_crash_save_vmcoreinfo(); -- cgit v1.2.3 From 3b1163006332302117b1b2acf226d4014ff46525 Mon Sep 17 00:00:00 2001 From: Adam Litke Date: Mon, 28 Apr 2008 02:13:06 -0700 Subject: Subject: [PATCH] hugetlb: vmstat events for huge page allocations Allocating huge pages directly from the buddy allocator is not guaranteed to succeed. Success depends on several factors (such as the amount of physical memory available and the level of fragmentation). With the addition of dynamic hugetlb pool resizing, allocations can occur much more frequently. For these reasons it is desirable to keep track of huge page allocation successes and failures. Add two new vmstat entries to track huge page allocations that succeed and fail. The presence of the two entries is contingent upon CONFIG_HUGETLB_PAGE being enabled. 
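A reduced userspace model of the change, with invented model_* names: the two new event items and their vmstat names are added under the same conditional so the enum and the name table stay in step, and each buddy allocation attempt bumps exactly one of the two counters. The real vm_event_item enum and vmstat_text table contain many more entries, and the real table is not indexed exactly like this one.

```c
#include <assert.h>
#include <string.h>

/* Toggle to mimic building with or without CONFIG_HUGETLB_PAGE. */
#define MODEL_CONFIG_HUGETLB_PAGE 1

/* Trimmed-down stand-in for enum vm_event_item. */
enum model_vm_event_item {
	MODEL_PGROTATED,
#if MODEL_CONFIG_HUGETLB_PAGE
	MODEL_HTLB_BUDDY_PGALLOC,
	MODEL_HTLB_BUDDY_PGALLOC_FAIL,
#endif
	MODEL_NR_VM_EVENT_ITEMS
};

/* Names must stay in the same order, under the same #if, as the enum. */
static const char *const model_vmstat_text[] = {
	"pgrotated",
#if MODEL_CONFIG_HUGETLB_PAGE
	"htlb_buddy_alloc_success",
	"htlb_buddy_alloc_fail",
#endif
};

_Static_assert(sizeof(model_vmstat_text) / sizeof(model_vmstat_text[0]) ==
	       MODEL_NR_VM_EVENT_ITEMS, "enum and name table out of sync");

static unsigned long model_events[MODEL_NR_VM_EVENT_ITEMS];

static void model_count_vm_event(enum model_vm_event_item item)
{
	model_events[item]++;
}

#if MODEL_CONFIG_HUGETLB_PAGE
/* Record one buddy allocation attempt the way alloc_fresh_huge_page() does. */
static int model_record_huge_alloc(int ret)
{
	if (ret)
		model_count_vm_event(MODEL_HTLB_BUDDY_PGALLOC);
	else
		model_count_vm_event(MODEL_HTLB_BUDDY_PGALLOC_FAIL);
	return ret;
}
#endif
```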
[akpm@linux-foundation.org: reduced ifdeffery] Signed-off-by: Adam Litke Signed-off-by: Eric Munson Tested-by: Mel Gorman Reviewed-by: Andy Whitcroft Cc: KOSAKI Motohiro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/vmstat.h | 4 ++++ mm/hugetlb.c | 7 +++++++ mm/vmstat.c | 4 ++++ 3 files changed, 15 insertions(+) diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index e726b6d46495..e83b69346d23 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -25,6 +25,7 @@ #define HIGHMEM_ZONE(xx) #endif + #define FOR_ALL_ZONES(xx) DMA_ZONE(xx) DMA32_ZONE(xx) xx##_NORMAL HIGHMEM_ZONE(xx) , xx##_MOVABLE enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, @@ -37,6 +38,9 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, FOR_ALL_ZONES(PGSCAN_DIRECT), PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL, PAGEOUTRUN, ALLOCSTALL, PGROTATED, +#ifdef CONFIG_HUGETLB_PAGE + HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL, +#endif NR_VM_EVENT_ITEMS }; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 93ea46a0fba4..8deae4eb9696 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -242,6 +242,11 @@ static int alloc_fresh_huge_page(void) hugetlb_next_nid = next_nid; } while (!page && hugetlb_next_nid != start_nid); + if (ret) + count_vm_event(HTLB_BUDDY_PGALLOC); + else + count_vm_event(HTLB_BUDDY_PGALLOC_FAIL); + return ret; } @@ -302,9 +307,11 @@ static struct page *alloc_buddy_huge_page(struct vm_area_struct *vma, */ nr_huge_pages_node[nid]++; surplus_huge_pages_node[nid]++; + __count_vm_event(HTLB_BUDDY_PGALLOC); } else { nr_huge_pages--; surplus_huge_pages--; + __count_vm_event(HTLB_BUDDY_PGALLOC_FAIL); } spin_unlock(&hugetlb_lock); diff --git a/mm/vmstat.c b/mm/vmstat.c index 879bcc0a1d4c..4c21670f8d91 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -645,6 +645,10 @@ static const char * const vmstat_text[] = { "allocstall", "pgrotated", +#ifdef CONFIG_HUGETLB_PAGE + "htlb_buddy_alloc_success", + "htlb_buddy_alloc_fail", 
+#endif #endif }; -- cgit v1.2.3 From f0be3d32b05d3fea2fcdbbb81a39dac2a7163169 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:08 -0700 Subject: mempolicy: rename mpol_free to mpol_put This is a change that was requested some time ago by Mel Gorman. Makes sense to me, so here it is. Note: I retain the name "mpol_free_shared_policy()" because it actually does free the shared_policy, which is NOT a reference counted object. However, ... The mempolicy object[s] referenced by the shared_policy are reference counted, so mpol_put() is used to release the reference held by the shared_policy. The mempolicy might not be freed at this time, because some task attached to the shared object associated with the shared policy may be in the process of allocating a page based on the mempolicy. In that case, the task performing the allocation will hold a reference on the mempolicy, obtained via mpol_shared_policy_lookup(). The mempolicy will be freed when all tasks holding such a reference have called mpol_put() for the mempolicy. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 10 +++++----- kernel/exit.c | 2 +- kernel/fork.c | 2 +- mm/hugetlb.c | 2 +- mm/mempolicy.c | 26 +++++++++++++------------- mm/mmap.c | 6 +++--- mm/shmem.c | 4 ++-- 7 files changed, 26 insertions(+), 26 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 319fd342b1b7..507bf5e29f24 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -71,7 +71,7 @@ struct mm_struct; * * Freeing policy: * Mempolicy objects are reference counted. A mempolicy will be freed when - * mpol_free() decrements the reference count to zero. + * mpol_put() decrements the reference count to zero. 
* * Copying policy objects: * mpol_copy() allocates a new mempolicy and copies the specified mempolicy @@ -98,11 +98,11 @@ struct mempolicy { * The default fast path of a NULL MPOL_DEFAULT policy is always inlined. */ -extern void __mpol_free(struct mempolicy *pol); -static inline void mpol_free(struct mempolicy *pol) +extern void __mpol_put(struct mempolicy *pol); +static inline void mpol_put(struct mempolicy *pol) { if (pol) - __mpol_free(pol); + __mpol_put(pol); } extern struct mempolicy *__mpol_copy(struct mempolicy *pol); @@ -190,7 +190,7 @@ static inline int mpol_equal(struct mempolicy *a, struct mempolicy *b) return 1; } -static inline void mpol_free(struct mempolicy *p) +static inline void mpol_put(struct mempolicy *p) { } diff --git a/kernel/exit.c b/kernel/exit.c index 97f609f574b1..2a9d98c641ac 100644 --- a/kernel/exit.c +++ b/kernel/exit.c @@ -967,7 +967,7 @@ NORET_TYPE void do_exit(long code) proc_exit_connector(tsk); exit_notify(tsk, group_dead); #ifdef CONFIG_NUMA - mpol_free(tsk->mempolicy); + mpol_put(tsk->mempolicy); tsk->mempolicy = NULL; #endif #ifdef CONFIG_FUTEX diff --git a/kernel/fork.c b/kernel/fork.c index c674aa8d3c31..1a5ae2084574 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -1374,7 +1374,7 @@ bad_fork_cleanup_security: security_task_free(p); bad_fork_cleanup_policy: #ifdef CONFIG_NUMA - mpol_free(p->mempolicy); + mpol_put(p->mempolicy); bad_fork_cleanup_cgroup: #endif cgroup_exit(p, cgroup_callbacks_done); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8deae4eb9696..53afa8c76ada 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -116,7 +116,7 @@ static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, break; } } - mpol_free(mpol); /* unref if mpol !NULL */ + mpol_put(mpol); /* unref if mpol !NULL */ return page; } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index c1b907789d84..ce2c5b6bf9f8 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -529,7 +529,7 @@ static int policy_vma(struct vm_area_struct *vma, struct 
mempolicy *new) if (!err) { mpol_get(new); vma->vm_policy = new; - mpol_free(old); + mpol_put(old); } return err; } @@ -595,7 +595,7 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, new = mpol_new(mode, flags, nodes); if (IS_ERR(new)) return PTR_ERR(new); - mpol_free(current->mempolicy); + mpol_put(current->mempolicy); current->mempolicy = new; mpol_set_task_struct_flag(); if (new && new->policy == MPOL_INTERLEAVE && @@ -948,7 +948,7 @@ static long do_mbind(unsigned long start, unsigned long len, } up_write(&mm->mmap_sem); - mpol_free(new); + mpol_put(new); return err; } @@ -1446,14 +1446,14 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); if (unlikely(pol != &default_policy && pol != current->mempolicy)) - __mpol_free(pol); /* finished with pol */ + __mpol_put(pol); /* finished with pol */ return node_zonelist(nid, gfp_flags); } zl = zonelist_policy(GFP_HIGHUSER, pol); if (unlikely(pol != &default_policy && pol != current->mempolicy)) { if (pol->policy != MPOL_BIND) - __mpol_free(pol); /* finished with pol */ + __mpol_put(pol); /* finished with pol */ else *mpol = pol; /* unref needed after allocation */ } @@ -1512,7 +1512,7 @@ alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr) nid = interleave_nid(pol, vma, addr, PAGE_SHIFT); if (unlikely(pol != &default_policy && pol != current->mempolicy)) - __mpol_free(pol); /* finished with pol */ + __mpol_put(pol); /* finished with pol */ return alloc_page_interleave(gfp, 0, nid); } zl = zonelist_policy(gfp, pol); @@ -1522,7 +1522,7 @@ alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr) */ struct page *page = __alloc_pages_nodemask(gfp, 0, zl, nodemask_policy(gfp, pol)); - __mpol_free(pol); + __mpol_put(pol); return page; } /* @@ -1624,7 +1624,7 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) } /* Slow path of a mpol destructor. 
*/ -void __mpol_free(struct mempolicy *p) +void __mpol_put(struct mempolicy *p) { if (!atomic_dec_and_test(&p->refcnt)) return; @@ -1720,7 +1720,7 @@ static void sp_delete(struct shared_policy *sp, struct sp_node *n) { pr_debug("deleting %lx-l%lx\n", n->start, n->end); rb_erase(&n->nd, &sp->root); - mpol_free(n->policy); + mpol_put(n->policy); kmem_cache_free(sn_cache, n); } @@ -1780,7 +1780,7 @@ restart: sp_insert(sp, new); spin_unlock(&sp->lock); if (new2) { - mpol_free(new2->policy); + mpol_put(new2->policy); kmem_cache_free(sn_cache, new2); } return 0; @@ -1805,7 +1805,7 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, /* Policy covers entire file */ pvma.vm_end = TASK_SIZE; mpol_set_shared_policy(info, &pvma, newpol); - mpol_free(newpol); + mpol_put(newpol); } } } @@ -1848,7 +1848,7 @@ void mpol_free_shared_policy(struct shared_policy *p) n = rb_entry(next, struct sp_node, nd); next = rb_next(&n->nd); rb_erase(&n->nd, &p->root); - mpol_free(n->policy); + mpol_put(n->policy); kmem_cache_free(sn_cache, n); } spin_unlock(&p->lock); @@ -2068,7 +2068,7 @@ int show_numa_map(struct seq_file *m, void *v) * unref shared or other task's mempolicy */ if (pol != &default_policy && pol != current->mempolicy) - __mpol_free(pol); + __mpol_put(pol); seq_printf(m, "%08lx %s", vma->vm_start, buffer); diff --git a/mm/mmap.c b/mm/mmap.c index 6aaf657adb87..36c85e04fa93 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -232,7 +232,7 @@ static struct vm_area_struct *remove_vma(struct vm_area_struct *vma) vma->vm_ops->close(vma); if (vma->vm_file) fput(vma->vm_file); - mpol_free(vma_policy(vma)); + mpol_put(vma_policy(vma)); kmem_cache_free(vm_area_cachep, vma); return next; } @@ -626,7 +626,7 @@ again: remove_next = 1 + (end > next->vm_end); if (file) fput(file); mm->map_count--; - mpol_free(vma_policy(next)); + mpol_put(vma_policy(next)); kmem_cache_free(vm_area_cachep, next); /* * In mprotect's case 6 (see comments on vma_merge), @@ -1182,7 +1182,7 @@ 
munmap_back: if (file && vma_merge(mm, prev, addr, vma->vm_end, vma->vm_flags, NULL, file, pgoff, vma_policy(vma))) { - mpol_free(vma_policy(vma)); + mpol_put(vma_policy(vma)); kmem_cache_free(vm_area_cachep, vma); fput(file); } else { diff --git a/mm/shmem.c b/mm/shmem.c index 177c7a7d2bb3..5326876d814d 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1196,7 +1196,7 @@ static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp, pvma.vm_ops = NULL; pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx); page = swapin_readahead(entry, gfp, &pvma, 0); - mpol_free(pvma.vm_policy); + mpol_put(pvma.vm_policy); return page; } @@ -1212,7 +1212,7 @@ static struct page *shmem_alloc_page(gfp_t gfp, pvma.vm_ops = NULL; pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx); page = alloc_page_vma(gfp, &pvma, 0); - mpol_free(pvma.vm_policy); + mpol_put(pvma.vm_policy); return page; } #else /* !CONFIG_NUMA */ -- cgit v1.2.3 From 846a16bf0fc80dc95a414ffce465e3cbf9680247 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:09 -0700 Subject: mempolicy: rename mpol_copy to mpol_dup This patch renames mpol_copy() to mpol_dup() because, well, that's what it does. Like, e.g., strdup() for strings, mpol_dup() takes a pointer to an existing mempolicy, allocates a new one and copies the contents. In a later patch, I want to use the name mpol_copy() to copy the contents from one mempolicy to another like, e.g., strcpy() does for strings. 
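The difference between the two names, together with the reference-counting rule from the previous patch (`mpol_put()` frees on the last reference), can be sketched in user space. This is a simplified, single-threaded model, not kernel code: `struct demo_policy`, `demo_put()`, `demo_dup()` and `demo_copy()` are hypothetical, a plain `int` replaces `atomic_t`, `malloc()` replaces the slab cache, and `demo_dup()` returns NULL on allocation failure where the kernel's `mpol_dup()` returns an `ERR_PTR()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for struct mempolicy. */
struct demo_policy {
	int refcnt;
	unsigned short mode;
	unsigned long nodes;	/* toy nodemask: one bit per node */
};

/* Like mpol_put(): drop one reference and free on the last one.
 * A NULL pointer is accepted and ignored. */
static void demo_put(struct demo_policy *pol)
{
	if (!pol)
		return;
	assert(pol->refcnt > 0);
	if (--pol->refcnt == 0)
		free(pol);
}

/* Like mpol_dup() (cf. strdup()): allocate a new object, copy the
 * contents, and give the caller the only reference (refcnt == 1). */
static struct demo_policy *demo_dup(const struct demo_policy *old)
{
	struct demo_policy *new;

	if (!old)
		return NULL;
	new = malloc(sizeof(*new));
	if (!new)
		return NULL;
	*new = *old;
	new->refcnt = 1;
	return new;
}

/* strcpy()-style copy into an existing object, the meaning the commit
 * message reserves for a future mpol_copy(). The destination keeps its
 * own reference count, since its holders are unchanged. */
static void demo_copy(struct demo_policy *dst, const struct demo_policy *src)
{
	int refcnt = dst->refcnt;

	*dst = *src;
	dst->refcnt = refcnt;
}
```

Preserving `refcnt` in the copy is the key design point: overwriting it would corrupt the bookkeeping of every task already holding a reference to the destination.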
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 14 +++++++------- kernel/cpuset.c | 4 ++-- kernel/fork.c | 4 ++-- mm/mempolicy.c | 6 +++--- mm/mmap.c | 4 ++-- 5 files changed, 16 insertions(+), 16 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 507bf5e29f24..5e19c2275a6f 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -73,10 +73,10 @@ struct mm_struct; * Mempolicy objects are reference counted. A mempolicy will be freed when * mpol_put() decrements the reference count to zero. * - * Copying policy objects: - * mpol_copy() allocates a new mempolicy and copies the specified mempolicy + * Duplicating policy objects: + * mpol_dup() allocates a new mempolicy and copies the specified mempolicy * to the new storage. The reference count of the new object is initialized - * to 1, representing the caller of mpol_copy(). + * to 1, representing the caller of mpol_dup(). 
*/ struct mempolicy { atomic_t refcnt; @@ -105,11 +105,11 @@ static inline void mpol_put(struct mempolicy *pol) __mpol_put(pol); } -extern struct mempolicy *__mpol_copy(struct mempolicy *pol); -static inline struct mempolicy *mpol_copy(struct mempolicy *pol) +extern struct mempolicy *__mpol_dup(struct mempolicy *pol); +static inline struct mempolicy *mpol_dup(struct mempolicy *pol) { if (pol) - pol = __mpol_copy(pol); + pol = __mpol_dup(pol); return pol; } @@ -198,7 +198,7 @@ static inline void mpol_get(struct mempolicy *pol) { } -static inline struct mempolicy *mpol_copy(struct mempolicy *old) +static inline struct mempolicy *mpol_dup(struct mempolicy *old) { return NULL; } diff --git a/kernel/cpuset.c b/kernel/cpuset.c index c9923e3c9a3b..024888bb9814 100644 --- a/kernel/cpuset.c +++ b/kernel/cpuset.c @@ -941,7 +941,7 @@ static int update_nodemask(struct cpuset *cs, char *buf) cs->mems_generation = cpuset_mems_generation++; mutex_unlock(&callback_mutex); - cpuset_being_rebound = cs; /* causes mpol_copy() rebind */ + cpuset_being_rebound = cs; /* causes mpol_dup() rebind */ fudge = 10; /* spare mmarray[] slots */ fudge += cpus_weight(cs->cpus_allowed); /* imagine one fork-bomb/cpu */ @@ -992,7 +992,7 @@ static int update_nodemask(struct cpuset *cs, char *buf) * rebind the vma mempolicies of each mm in mmarray[] to their * new cpuset, and release that mm. The mpol_rebind_mm() * call takes mmap_sem, which we couldn't take while holding - * tasklist_lock. Forks can happen again now - the mpol_copy() + * tasklist_lock. Forks can happen again now - the mpol_dup() * cpuset_being_rebound check will catch such forks, and rebind * their vma mempolicies too. 
Because we still hold the global * cgroup_mutex, we know that no other rebind effort will diff --git a/kernel/fork.c b/kernel/fork.c index 1a5ae2084574..6067e429f281 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -279,7 +279,7 @@ static int dup_mmap(struct mm_struct *mm, struct mm_struct *oldmm) if (!tmp) goto fail_nomem; *tmp = *mpnt; - pol = mpol_copy(vma_policy(mpnt)); + pol = mpol_dup(vma_policy(mpnt)); retval = PTR_ERR(pol); if (IS_ERR(pol)) goto fail_nomem_policy; @@ -1116,7 +1116,7 @@ static struct task_struct *copy_process(unsigned long clone_flags, p->audit_context = NULL; cgroup_fork(p); #ifdef CONFIG_NUMA - p->mempolicy = mpol_copy(p->mempolicy); + p->mempolicy = mpol_dup(p->mempolicy); if (IS_ERR(p->mempolicy)) { retval = PTR_ERR(p->mempolicy); p->mempolicy = NULL; diff --git a/mm/mempolicy.c b/mm/mempolicy.c index ce2c5b6bf9f8..e9fc1c1ae66c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1566,15 +1566,15 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) EXPORT_SYMBOL(alloc_pages_current); /* - * If mpol_copy() sees current->cpuset == cpuset_being_rebound, then it + * If mpol_dup() sees current->cpuset == cpuset_being_rebound, then it * rebinds the mempolicy its copying by calling mpol_rebind_policy() * with the mems_allowed returned by cpuset_mems_allowed(). This * keeps mempolicies cpuset relative after its cpuset moves. See * further kernel/cpuset.c update_nodemask(). 
*/ -/* Slow path of a mempolicy copy */ -struct mempolicy *__mpol_copy(struct mempolicy *old) +/* Slow path of a mempolicy duplicate */ +struct mempolicy *__mpol_dup(struct mempolicy *old) { struct mempolicy *new = kmem_cache_alloc(policy_cache, GFP_KERNEL); diff --git a/mm/mmap.c b/mm/mmap.c index 36c85e04fa93..677d184b0d42 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1810,7 +1810,7 @@ int split_vma(struct mm_struct * mm, struct vm_area_struct * vma, new->vm_pgoff += ((addr - vma->vm_start) >> PAGE_SHIFT); } - pol = mpol_copy(vma_policy(vma)); + pol = mpol_dup(vma_policy(vma)); if (IS_ERR(pol)) { kmem_cache_free(vm_area_cachep, new); return PTR_ERR(pol); @@ -2126,7 +2126,7 @@ struct vm_area_struct *copy_vma(struct vm_area_struct **vmap, new_vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL); if (new_vma) { *new_vma = *vma; - pol = mpol_copy(vma_policy(vma)); + pol = mpol_dup(vma_policy(vma)); if (IS_ERR(pol)) { kmem_cache_free(vm_area_cachep, new_vma); return NULL; -- cgit v1.2.3 From f4e53d910b7dde2685b177f1e7c3e3e0b4a42f7b Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:10 -0700 Subject: mempolicy: write lock mmap_sem while changing task mempolicy A read of /proc//numa_maps holds the target task's mmap_sem for read while examining each vma's mempolicy. A vma's mempolicy can fall back to the task's policy. However, the task could be changing its task policy and freeing the one that show_numa_maps() is examining. To prevent this, grab the mmap_sem for write when updating the task mempolicy. Pointed out to me by Christoph Lameter and extracted and reworked from Christoph's alternative mempol reference counting patch. This is analogous to the way that do_mbind() and do_get_mempolicy() prevent races between tasks sharing an mm_struct [a.k.a. threads] setting and querying a mempolicy for a particular address.
Note: this is necessary, but not sufficient, to allow us to stop taking an extra reference on "other task's mempolicy" in get_vma_policy. Subsequent patches will complete this update, allowing us to simplify the tests for whether we need to unref a mempolicy at various points in the code. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index e9fc1c1ae66c..c6c61ea6bb8c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -591,16 +591,29 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, nodemask_t *nodes) { struct mempolicy *new; + struct mm_struct *mm = current->mm; new = mpol_new(mode, flags, nodes); if (IS_ERR(new)) return PTR_ERR(new); + + /* + * prevent changing our mempolicy while show_numa_maps() + * is using it. + * Note: do_set_mempolicy() can be called at init time + * with no 'mm'. + */ + if (mm) + down_write(&mm->mmap_sem); mpol_put(current->mempolicy); current->mempolicy = new; mpol_set_task_struct_flag(); if (new && new->policy == MPOL_INTERLEAVE && nodes_weight(new->v.nodes)) current->il_next = first_node(new->v.nodes); + if (mm) + up_write(&mm->mmap_sem); + return 0; } -- cgit v1.2.3 From ae4d8c16aa22775f5731677abb8a82f03cec877e Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:11 -0700 Subject: mempolicy: fixup Fallback for Default Shmem Policy get_vma_policy() is not handling fallback to task policy correctly when the get_policy() vm_op returns NULL. The NULL overwrites the 'pol' variable that was holding the fallback task mempolicy. So, it was falling back directly to system default policy. Fix get_vma_policy() to use only non-NULL policy returned from the vma get_policy op. 
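The intended fallback order (vma or shared policy, then task policy, then system default) can be sketched as follows. This is a simplified model with hypothetical names (`struct demo_vma`, `demo_get_vma_policy()`, `demo_shared_lookup()`); it omits reference counting and the `MPOL_DEFAULT` check on `vm_policy`, and keeps only the point of the fix: a NULL result from the `get_policy` hook must not overwrite the task policy already held in `pol`.

```c
#include <assert.h>
#include <stddef.h>

struct demo_policy { int mode; };

struct demo_vma {
	/* Optional hook, like vm_ops->get_policy(); may itself be NULL. */
	struct demo_policy *(*get_policy)(struct demo_vma *vma);
	struct demo_policy *vm_policy;	/* per-vma policy, may be NULL */
	struct demo_policy *shared;	/* what the hook reports, may be NULL */
};

static struct demo_policy demo_default_policy = { 0 };

/* Example hook: report the shared policy, or NULL if there is none. */
static struct demo_policy *demo_shared_lookup(struct demo_vma *vma)
{
	return vma->shared;
}

/* Fallback chain: vma/shared policy, then task policy, then default. */
static struct demo_policy *demo_get_vma_policy(struct demo_policy *task_pol,
					       struct demo_vma *vma)
{
	struct demo_policy *pol = task_pol;

	if (vma) {
		if (vma->get_policy) {
			struct demo_policy *vpol = vma->get_policy(vma);

			if (vpol)	/* the fix: override only when non-NULL */
				pol = vpol;
		} else if (vma->vm_policy) {
			pol = vma->vm_policy;
		}
	}
	return pol ? pol : &demo_default_policy;
}
```

Before the fix, the equivalent of `pol = vma->get_policy(vma)` ran unconditionally, so a NULL from the hook skipped the task level and landed directly on the system default.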
shm_get_policy() was falling back to current task's mempolicy if the "backing file system" [tmpfs vs hugetlbfs] does not support the get_policy vm_op and the vma policy is null. This is incorrect for show_numa_maps() which is likely querying the numa_maps of some task other than current. Remove this fallback. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- ipc/shm.c | 3 +-- mm/mempolicy.c | 7 +++++-- 2 files changed, 6 insertions(+), 4 deletions(-) diff --git a/ipc/shm.c b/ipc/shm.c index cc63fae02f06..8d1b2c468cc4 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -274,8 +274,7 @@ static struct mempolicy *shm_get_policy(struct vm_area_struct *vma, else if (vma->vm_policy) { pol = vma->vm_policy; mpol_get(pol); /* get_vma_policy() expects this */ - } else - pol = current->mempolicy; + } return pol; } #endif diff --git a/mm/mempolicy.c b/mm/mempolicy.c index c6c61ea6bb8c..8924aaf4665c 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1262,7 +1262,7 @@ asmlinkage long compat_sys_mbind(compat_ulong_t start, compat_ulong_t len, * @task != current]. It is the caller's responsibility to * free the reference in these cases. 
*/ -static struct mempolicy * get_vma_policy(struct task_struct *task, +static struct mempolicy *get_vma_policy(struct task_struct *task, struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol = task->mempolicy; @@ -1270,7 +1270,10 @@ static struct mempolicy * get_vma_policy(struct task_struct *task, if (vma) { if (vma->vm_ops && vma->vm_ops->get_policy) { - pol = vma->vm_ops->get_policy(vma, addr); + struct mempolicy *vpol = vma->vm_ops->get_policy(vma, + addr); + if (vpol) + pol = vpol; shared_pol = 1; /* if pol non-NULL, add ref below */ } else if (vma->vm_policy && vma->vm_policy->policy != MPOL_DEFAULT) -- cgit v1.2.3 From 45c4745af381851b0406d8e4db99e62e265691c2 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:12 -0700 Subject: mempolicy: rename struct mempolicy 'policy' member to 'mode' The terms 'policy' and 'mode' are both used in various places to describe the semantics of the value stored in the 'policy' member of struct mempolicy. Furthermore, the term 'policy' is used to refer to that member, to the entire struct mempolicy and to the more abstract concept of the tuple consisting of a "mode" and an optional node or set of nodes. Recently, we have added "mode flags" that are passed in the upper bits of the 'mode' [or sometimes, 'policy'] member of the numa APIs. I'd like to resolve this confusion, which perhaps only exists in my mind, by renaming the 'policy' member to 'mode' throughout, and fixing up the Documentation. Man pages will be updated separately. 
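After the rename, `struct mempolicy` holds the mode in `mode` and the mode flags in `flags`, and `do_get_mempolicy()` reports them to user space combined as `pol->mode | pol->flags`, with the flags in the upper bits. The C sketch below shows that composition and the reverse split. The `DEMO_` constants are illustrative assumptions chosen to mirror that layout, not copies of the kernel's definitions, and `demo_report()`/`demo_split()` are hypothetical helpers.

```c
#include <assert.h>

/* Illustrative values only: mode flags occupy the top bits of the
 * 16-bit word and the mode occupies the low bits. */
#define DEMO_MPOL_INTERLEAVE		3
#define DEMO_MPOL_F_STATIC_NODES	(1 << 15)
#define DEMO_MPOL_F_RELATIVE_NODES	(1 << 14)
#define DEMO_MPOL_MODE_FLAGS \
	(DEMO_MPOL_F_STATIC_NODES | DEMO_MPOL_F_RELATIVE_NODES)

struct demo_mempolicy {
	unsigned short mode;	/* formerly named 'policy' */
	unsigned short flags;	/* mode flags */
};

/* Compose the value returned to user space, mirroring
 * "*policy = pol->mode | pol->flags;" in do_get_mempolicy(). */
static int demo_report(const struct demo_mempolicy *pol)
{
	return pol->mode | pol->flags;
}

/* Split a user-supplied mode argument back into mode and mode flags. */
static void demo_split(int arg, unsigned short *mode, unsigned short *flags)
{
	*flags = arg & DEMO_MPOL_MODE_FLAGS;
	*mode = arg & ~DEMO_MPOL_MODE_FLAGS;
}
```

Keeping the two fields separate inside the kernel, and combining them only at the API boundary, is what makes a single name ("mode") for the low bits unambiguous.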
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 4 --- include/linux/mempolicy.h | 6 ++--- mm/mempolicy.c | 46 ++++++++++++++++----------------- 3 files changed, 26 insertions(+), 30 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 1c7dd21623d2..27b9507a3769 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -145,10 +145,6 @@ Components of Memory Policies structure, struct mempolicy. Details of this structure will be discussed in context, below, as required to explain the behavior. - Note: in some functions AND in the struct mempolicy itself, the mode - is called "policy". However, to avoid confusion with the policy tuple, - this document will continue to use the term "mode". - Linux memory policy supports the following 4 behavioral modes: Default Mode--MPOL_DEFAULT: The behavior specified by this mode is diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 5e19c2275a6f..9080fab1426d 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -80,7 +80,7 @@ struct mm_struct; */ struct mempolicy { atomic_t refcnt; - unsigned short policy; /* See MPOL_* above */ + unsigned short mode; /* See MPOL_* above */ unsigned short flags; /* See set_mempolicy() MPOL_F_* above */ union { short preferred_node; /* preferred */ @@ -149,7 +149,7 @@ struct shared_policy { spinlock_t lock; }; -void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, +void mpol_shared_policy_init(struct shared_policy *info, unsigned short mode, unsigned short flags, nodemask_t *nodes); int mpol_set_shared_policy(struct shared_policy *info, struct vm_area_struct *vma, @@ -213,7 +213,7 @@ static inline int mpol_set_shared_policy(struct shared_policy *info, } static inline void 
mpol_shared_policy_init(struct shared_policy *info, - unsigned short policy, unsigned short flags, nodemask_t *nodes) + unsigned short mode, unsigned short flags, nodemask_t *nodes) { } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 8924aaf4665c..5e7eea2dc8b4 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -106,7 +106,7 @@ enum zone_type policy_zone = 0; struct mempolicy default_policy = { .refcnt = ATOMIC_INIT(1), /* never free it */ - .policy = MPOL_DEFAULT, + .mode = MPOL_DEFAULT, }; static const struct mempolicy_operations { @@ -211,7 +211,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, if (!policy) return ERR_PTR(-ENOMEM); atomic_set(&policy->refcnt, 1); - policy->policy = mode; + policy->mode = mode; policy->flags = flags; if (nodes) { @@ -302,7 +302,7 @@ static void mpol_rebind_policy(struct mempolicy *pol, if (!mpol_store_user_nodemask(pol) && nodes_equal(pol->w.cpuset_mems_allowed, *newmask)) return; - mpol_ops[pol->policy].rebind(pol, newmask); + mpol_ops[pol->mode].rebind(pol, newmask); } /* @@ -608,7 +608,7 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, mpol_put(current->mempolicy); current->mempolicy = new; mpol_set_task_struct_flag(); - if (new && new->policy == MPOL_INTERLEAVE && + if (new && new->mode == MPOL_INTERLEAVE && nodes_weight(new->v.nodes)) current->il_next = first_node(new->v.nodes); if (mm) @@ -621,7 +621,7 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, static void get_zonemask(struct mempolicy *p, nodemask_t *nodes) { nodes_clear(*nodes); - switch (p->policy) { + switch (p->mode) { case MPOL_DEFAULT: break; case MPOL_BIND: @@ -700,14 +700,14 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, goto out; *policy = err; } else if (pol == current->mempolicy && - pol->policy == MPOL_INTERLEAVE) { + pol->mode == MPOL_INTERLEAVE) { *policy = current->il_next; } else { err = -EINVAL; goto out; } } else - *policy = pol->policy | 
pol->flags; + *policy = pol->mode | pol->flags; if (vma) { up_read(&current->mm->mmap_sem); @@ -1276,7 +1276,7 @@ static struct mempolicy *get_vma_policy(struct task_struct *task, pol = vpol; shared_pol = 1; /* if pol non-NULL, add ref below */ } else if (vma->vm_policy && - vma->vm_policy->policy != MPOL_DEFAULT) + vma->vm_policy->mode != MPOL_DEFAULT) pol = vma->vm_policy; } if (!pol) @@ -1290,7 +1290,7 @@ static nodemask_t *nodemask_policy(gfp_t gfp, struct mempolicy *policy) { /* Lower zones don't get a nodemask applied for MPOL_BIND */ - if (unlikely(policy->policy == MPOL_BIND) && + if (unlikely(policy->mode == MPOL_BIND) && gfp_zone(gfp) >= policy_zone && cpuset_nodemask_valid_mems_allowed(&policy->v.nodes)) return &policy->v.nodes; @@ -1303,7 +1303,7 @@ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) { int nd; - switch (policy->policy) { + switch (policy->mode) { case MPOL_PREFERRED: nd = policy->v.preferred_node; if (nd < 0) @@ -1353,7 +1353,7 @@ static unsigned interleave_nodes(struct mempolicy *policy) */ unsigned slab_node(struct mempolicy *policy) { - unsigned short pol = policy ? policy->policy : MPOL_DEFAULT; + unsigned short pol = policy ?
policy->mode : MPOL_DEFAULT; switch (pol) { case MPOL_INTERLEAVE: @@ -1454,9 +1454,9 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, *mpol = NULL; /* probably no unref needed */ *nodemask = NULL; /* assume !MPOL_BIND */ - if (pol->policy == MPOL_BIND) { + if (pol->mode == MPOL_BIND) { *nodemask = &pol->v.nodes; - } else if (pol->policy == MPOL_INTERLEAVE) { + } else if (pol->mode == MPOL_INTERLEAVE) { unsigned nid; nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); @@ -1468,7 +1468,7 @@ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, zl = zonelist_policy(GFP_HIGHUSER, pol); if (unlikely(pol != &default_policy && pol != current->mempolicy)) { - if (pol->policy != MPOL_BIND) + if (pol->mode != MPOL_BIND) __mpol_put(pol); /* finished with pol */ else *mpol = pol; /* unref needed after allocation */ @@ -1522,7 +1522,7 @@ alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr) cpuset_update_task_memory_state(); - if (unlikely(pol->policy == MPOL_INTERLEAVE)) { + if (unlikely(pol->mode == MPOL_INTERLEAVE)) { unsigned nid; nid = interleave_nid(pol, vma, addr, PAGE_SHIFT); @@ -1574,7 +1574,7 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) cpuset_update_task_memory_state(); if (!pol || in_interrupt() || (gfp & __GFP_THISNODE)) pol = &default_policy; - if (pol->policy == MPOL_INTERLEAVE) + if (pol->mode == MPOL_INTERLEAVE) return alloc_page_interleave(gfp, order, interleave_nodes(pol)); return __alloc_pages_nodemask(gfp, order, zonelist_policy(gfp, pol), nodemask_policy(gfp, pol)); @@ -1620,11 +1620,11 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) { if (!a || !b) return 0; - if (a->policy != b->policy) + if (a->mode != b->mode) return 0; - if (a->policy != MPOL_DEFAULT && !mpol_match_intent(a, b)) + if (a->mode != MPOL_DEFAULT && !mpol_match_intent(a, b)) return 0; - switch (a->policy) { + switch (a->mode) { case MPOL_DEFAULT: return 1; case MPOL_BIND: @@ -1644,7 
+1644,7 @@ void __mpol_put(struct mempolicy *p) { if (!atomic_dec_and_test(&p->refcnt)) return; - p->policy = MPOL_DEFAULT; + p->mode = MPOL_DEFAULT; kmem_cache_free(policy_cache, p); } @@ -1710,7 +1710,7 @@ static void sp_insert(struct shared_policy *sp, struct sp_node *new) rb_link_node(&new->nd, parent, p); rb_insert_color(&new->nd, &sp->root); pr_debug("inserting %lx-%lx: %d\n", new->start, new->end, - new->policy ? new->policy->policy : 0); + new->policy ? new->policy->mode : 0); } /* Find shared policy intersecting idx */ @@ -1835,7 +1835,7 @@ int mpol_set_shared_policy(struct shared_policy *info, pr_debug("set_shared_policy %lx sz %lu %d %d %lx\n", vma->vm_pgoff, - sz, npol ? npol->policy : -1, + sz, npol ? npol->mode : -1, npol ? npol->flags : -1, npol ? nodes_addr(npol->v.nodes)[0] : -1); @@ -1935,7 +1935,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) char *p = buffer; int l; nodemask_t nodes; - unsigned short mode = pol ? pol->policy : MPOL_DEFAULT; + unsigned short mode = pol ? pol->mode : MPOL_DEFAULT; unsigned short flags = pol ? pol->flags : 0; switch (mode) { -- cgit v1.2.3 From aab0b1029f0843756b68e0ed3ca983685bf43ed6 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:13 -0700 Subject: mempolicy: mark shared policies for unref As part of yet another rework of mempolicy reference counting, we want to be able to identify shared policies efficiently, because they have an extra ref taken on lookup that needs to be removed when we're finished using the policy. Note: the extra ref is required because the policies are shared between tasks/processes and can be changed/freed by one task while another task is using them--e.g., for page allocation. Building on David Rientjes' mempolicy "mode flags" enhancement, this patch indicates a "shared" policy by setting a new MPOL_F_SHARED flag in the flags member of the struct mempolicy added by David.
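The flag layout and the marking rule can be sketched in C. The `DEMO_` constants are assumptions for illustration: the internal flag takes bit 0, as the patch specifies for `MPOL_F_SHARED`, while the user-visible mode flags are placed in the top bits. `demo_install_shared()` and `demo_remove_shared()` are hypothetical analogues of what `sp_alloc()` and `sp_delete()` do to the policy, and `demo_user_flags()` is a hypothetical helper showing that internal flags need never leak into API values.

```c
#include <assert.h>

/* Illustrative layout: internal flags from bit 0 up, user-visible mode
 * flags from the top bit down. Values are assumptions for the sketch. */
#define DEMO_MPOL_F_SHARED		(1 << 0)	/* internal */
#define DEMO_MPOL_F_STATIC_NODES	(1 << 15)	/* user-visible */
#define DEMO_MPOL_F_RELATIVE_NODES	(1 << 14)	/* user-visible */
#define DEMO_MPOL_MODE_FLAGS \
	(DEMO_MPOL_F_STATIC_NODES | DEMO_MPOL_F_RELATIVE_NODES)

/* The two groups share one flags word, so they must never overlap. */
_Static_assert((DEMO_MPOL_F_SHARED & DEMO_MPOL_MODE_FLAGS) == 0,
	       "internal flags collide with mode flags");

struct demo_mempolicy {
	int refcnt;
	unsigned short flags;
};

/* Like sp_alloc(): the tree node takes a reference and marks the policy. */
static void demo_install_shared(struct demo_mempolicy *pol)
{
	pol->refcnt++;
	pol->flags |= DEMO_MPOL_F_SHARED;
}

/* Like sp_delete(): drop the tree's reference but leave the flag set, so
 * a task still holding a lookup reference knows it must drop it too. */
static void demo_remove_shared(struct demo_mempolicy *pol)
{
	assert(pol->refcnt > 0);
	pol->refcnt--;
}

/* Hypothetical helper: only mode flags are meaningful to user space. */
static unsigned short demo_user_flags(const struct demo_mempolicy *pol)
{
	return pol->flags & DEMO_MPOL_MODE_FLAGS;
}
```

Leaving the flag set after removal from the tree is deliberate, as the message explains: the last holder's conditional unref is what eventually frees the policy.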
MPOL_F_SHARED, and any future "internal mode flags" are reserved from bit zero up, as they will never be passed in the upper bits of the mode argument of a mempolicy API. I set the MPOL_F_SHARED flag when the policy is installed in the shared policy rb-tree. Don't need/want to clear the flag when removing from the tree as the mempolicy is freed [unref'd] internally to the sp_delete() function. However, a task could hold another reference on this mempolicy from a prior lookup. We need the MPOL_F_SHARED flag to stay put so that any tasks holding a ref will unref, eventually freeing, the mempolicy. A later patch in this series will introduce a function to conditionally unref [mpol_free] a policy. The MPOL_F_SHARED flag is one reason [currently the only reason] to unref/free a policy via the conditional free. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 7 +++++++ mm/mempolicy.c | 1 + 2 files changed, 8 insertions(+) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 9080fab1426d..017def89e568 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -44,6 +44,13 @@ enum { #define MPOL_MF_MOVE_ALL (1<<2) /* Move every page to conform to mapping */ #define MPOL_MF_INTERNAL (1<<3) /* Internal flags start here */ +/* + * Internal flags that share the struct mempolicy flags word with + * "mode flags". These flags are allocated from bit 0 up, as they + * are never OR'ed into the mode in mempolicy API arguments. 
+ */ +#define MPOL_F_SHARED (1 << 0) /* identify shared policies */ + #ifdef __KERNEL__ #include diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 5e7eea2dc8b4..78b18a60b9b2 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1750,6 +1750,7 @@ static struct sp_node *sp_alloc(unsigned long start, unsigned long end, n->start = start; n->end = end; mpol_get(pol); + pol->flags |= MPOL_F_SHARED; /* for unref */ n->policy = pol; return n; } -- cgit v1.2.3 From a6020ed759404372e8be2b276e85e51735472cc9 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:14 -0700 Subject: mempolicy: document {set|get}_policy() vm_ops APIs Document mempolicy return value reference semantics assumed by the rest of the mempolicy code for the set_ and get_policy vm_ops in --where the prototypes are defined--to inform any future mempolicy vm_op writers what the rest of the subsystem expects of them. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mm.h | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index bc0ad24cf8c0..8b7f4a5d4f6a 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -172,7 +172,25 @@ struct vm_operations_struct { * writable, if an error is returned it will cause a SIGBUS */ int (*page_mkwrite)(struct vm_area_struct *vma, struct page *page); #ifdef CONFIG_NUMA + /* + * set_policy() op must add a reference to any non-NULL @new mempolicy + * to hold the policy upon return. Caller should pass NULL @new to + * remove a policy and fall back to surrounding context--i.e. do not + * install a MPOL_DEFAULT policy, nor the task or system default + * mempolicy. + */ int (*set_policy)(struct vm_area_struct *vma, struct mempolicy *new); + + /* + * get_policy() op must add reference [mpol_get()] to any policy at + * (vma,addr) marked as MPOL_F_SHARED.
The shared policy infrastructure + * in mm/mempolicy.c will do this automatically. + * get_policy() must NOT add a ref if the policy at (vma,addr) is not + * marked as MPOL_F_SHARED. vma policies are protected by the mmap_sem. + * If no [shared/vma] mempolicy exists at the addr, get_policy() op + * must return NULL--i.e., do not "fall back" to task or system default + * policy. + */ struct mempolicy *(*get_policy)(struct vm_area_struct *vma, unsigned long addr); int (*migrate)(struct vm_area_struct *vma, const nodemask_t *from, -- cgit v1.2.3 From 52cd3b074050dd664380b5e8cfc85d4a6ed8ad48 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:16 -0700 Subject: mempolicy: rework mempolicy Reference Counting [yet again] After further discussion with Christoph Lameter, it has become clear that my earlier attempts to clean up the mempolicy reference counting were a bit of overkill in some areas, resulting in superfluous ref/unref in what are usually fast paths. In other areas, further inspection reveals that I botched the unref for interleave policies. A separate patch, suitable for upstream/stable trees, fixes up the known errors in the previous attempt to fix reference counting. This patch reworks the memory policy reference counting and, one hopes, simplifies the code. Maybe I'll get it right this time. See the update to the numa_memory_policy.txt document for a discussion of memory policy reference counting that motivates this patch. Summary: Lookup of mempolicy, based on (vma, address), need only add a reference for shared policies, and we need only unref the policy when finished for shared policies. So, this patch backs out all of the unneeded extra reference counting added by my previous attempt. It then unrefs only shared policies when we're finished with them, using the mpol_cond_put() [conditional put] helper function introduced by this patch. Note that shmem_swapin() calls read_swap_cache_async() with a dummy vma containing just the policy.
read_swap_cache_async() can call alloc_page_vma() multiple times, so we can't let alloc_page_vma() unref the shared policy in this case. To avoid this, we make a copy of any non-null shared policy and remove the MPOL_F_SHARED flag from the copy. This copy occurs before reading a page [or multiple pages] from swap, so the overhead should not be an issue here. I introduced a new static inline function "mpol_cond_copy()" to copy the shared policy to an on-stack policy and remove the flags that would require a conditional free. The current implementation of mpol_cond_copy() assumes that the struct mempolicy contains no pointers to dynamically allocated structures that must be duplicated or reference counted during copy. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 68 +++++++++++++++ include/linux/mempolicy.h | 35 ++++++++ ipc/shm.c | 5 +- mm/hugetlb.c | 2 +- mm/mempolicy.c | 146 +++++++++++++++++--------------- mm/shmem.c | 16 ++-- 6 files changed, 195 insertions(+), 77 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 27b9507a3769..6719d642653f 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -311,6 +311,74 @@ Components of Memory Policies MPOL_PREFERRED policies that were created with an empty nodemask (local allocation). +MEMORY POLICY REFERENCE COUNTING + +To resolve use/free races, struct mempolicy contains an atomic reference +count field. Internal interfaces, mpol_get()/mpol_put() increment and +decrement this reference count, respectively. mpol_put() will only free +the structure back to the mempolicy kmem cache when the reference count +goes to zero. 
+ +When a new memory policy is allocated, its reference count is initialized +to '1', representing the reference held by the task that is installing the +new policy. When a pointer to a memory policy structure is stored in another +structure, another reference is added, as the task's reference will be dropped +on completion of the policy installation. + +During run-time "usage" of the policy, we attempt to minimize atomic operations +on the reference count, as this can lead to cache lines bouncing between cpus +and NUMA nodes. "Usage" here means one of the following: + +1) querying of the policy, either by the task itself [using the get_mempolicy() + API discussed below] or by another task using the /proc//numa_maps + interface. + +2) examination of the policy to determine the policy mode and associated node + or node lists, if any, for page allocation. This is considered a "hot + path". Note that for MPOL_BIND, the "usage" extends across the entire + allocation process, which may sleep during page reclamation, because the + BIND policy nodemask is used, by reference, to filter ineligible nodes. + +We can avoid taking an extra reference during the usages listed above as +follows: + +1) we never need to get/free the system default policy as this is never + changed nor freed, once the system is up and running. + +2) for querying the policy, we do not need to take an extra reference on the + target task's task policy nor vma policies because we always acquire the + task's mm's mmap_sem for read during the query. The set_mempolicy() and + mbind() APIs [see below] always acquire the mmap_sem for write when + installing or replacing task or vma policies. Thus, there is no possibility + of a task or thread freeing a policy while another task or thread is + querying it. + +3) Page allocation usage of task or vma policy occurs in the fault path where + we hold the mmap_sem for read.
Again, because replacing the task or vma + policy requires that the mmap_sem be held for write, the policy can't be + freed out from under us while we're using it for page allocation. + +4) Shared policies require special consideration. One task can replace a + shared memory policy while another task, with a distinct mmap_sem, is + querying or allocating a page based on the policy. To resolve this + potential race, the shared policy infrastructure adds an extra reference + to the shared policy during lookup while holding a spin lock on the shared + policy management structure. This requires that we drop this extra + reference when we're finished "using" the policy. We must drop the + extra reference on shared policies in the same query/allocation paths + used for non-shared policies. For this reason, shared policies are marked + as such, and the extra reference is dropped "conditionally"--i.e., only + for shared policies. + + Because of this extra reference counting, and because we must look up + shared policies in a tree structure under spinlock, shared policies are + more expensive to use in the page allocation path. This is especially + true for shared policies on shared memory regions shared by tasks running + on different NUMA nodes. This extra overhead can be avoided by always + falling back to task or system default policy for shared memory regions, + or by prefaulting the entire shared memory region into memory and locking + it down. However, this might not be appropriate for all applications. + MEMORY POLICY APIs Linux supports 3 system calls for controlling memory policy. These APIS diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 017def89e568..172b9c6acb91 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -112,6 +112,31 @@ static inline void mpol_put(struct mempolicy *pol) __mpol_put(pol); } +/* + * Does mempolicy pol need explicit unref after use? + * Currently only needed for shared policies.
+ */ +static inline int mpol_needs_cond_ref(struct mempolicy *pol) +{ + return (pol && (pol->flags & MPOL_F_SHARED)); +} + +static inline void mpol_cond_put(struct mempolicy *pol) +{ + if (mpol_needs_cond_ref(pol)) + __mpol_put(pol); +} + +extern struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol, + struct mempolicy *frompol); +static inline struct mempolicy *mpol_cond_copy(struct mempolicy *tompol, + struct mempolicy *frompol) +{ + if (!frompol) + return frompol; + return __mpol_cond_copy(tompol, frompol); +} + extern struct mempolicy *__mpol_dup(struct mempolicy *pol); static inline struct mempolicy *mpol_dup(struct mempolicy *pol) { @@ -201,6 +226,16 @@ static inline void mpol_put(struct mempolicy *p) { } +static inline void mpol_cond_put(struct mempolicy *pol) +{ +} + +static inline struct mempolicy *mpol_cond_copy(struct mempolicy *to, + struct mempolicy *from) +{ + return from; +} + static inline void mpol_get(struct mempolicy *pol) { } diff --git a/ipc/shm.c b/ipc/shm.c index 8d1b2c468cc4..e636910454a9 100644 --- a/ipc/shm.c +++ b/ipc/shm.c @@ -271,10 +271,9 @@ static struct mempolicy *shm_get_policy(struct vm_area_struct *vma, if (sfd->vm_ops->get_policy) pol = sfd->vm_ops->get_policy(vma, addr); - else if (vma->vm_policy) { + else if (vma->vm_policy) pol = vma->vm_policy; - mpol_get(pol); /* get_vma_policy() expects this */ - } + return pol; } #endif diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 53afa8c76ada..d36e1f11a5f2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -116,7 +116,7 @@ static struct page *dequeue_huge_page_vma(struct vm_area_struct *vma, break; } } - mpol_put(mpol); /* unref if mpol !NULL */ + mpol_cond_put(mpol); return page; } diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 78b18a60b9b2..a237295f8190 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -241,6 +241,15 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, return policy; } +/* Slow path of a mpol destructor. 
*/ +void __mpol_put(struct mempolicy *p) +{ + if (!atomic_dec_and_test(&p->refcnt)) + return; + p->mode = MPOL_DEFAULT; + kmem_cache_free(policy_cache, p); +} + static void mpol_rebind_default(struct mempolicy *pol, const nodemask_t *nodes) { } @@ -719,6 +728,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, get_zonemask(pol, nmask); out: + mpol_cond_put(pol); if (vma) up_read(¤t->mm->mmap_sem); return err; @@ -1257,16 +1267,18 @@ asmlinkage long compat_sys_mbind(compat_ulong_t start, compat_ulong_t len, * * Returns effective policy for a VMA at specified address. * Falls back to @task or system default policy, as necessary. - * Returned policy has extra reference count if shared, vma, - * or some other task's policy [show_numa_maps() can pass - * @task != current]. It is the caller's responsibility to - * free the reference in these cases. + * Current or other task's task mempolicy and non-shared vma policies + * are protected by the task's mmap_sem, which must be held for read by + * the caller. + * Shared policies [those marked as MPOL_F_SHARED] require an extra reference + * count--added by the get_policy() vm_op, as appropriate--to protect against + * freeing by another task. It is the caller's responsibility to free the + * extra reference for shared policies. 
*/ static struct mempolicy *get_vma_policy(struct task_struct *task, struct vm_area_struct *vma, unsigned long addr) { struct mempolicy *pol = task->mempolicy; - int shared_pol = 0; if (vma) { if (vma->vm_ops && vma->vm_ops->get_policy) { @@ -1274,20 +1286,20 @@ static struct mempolicy *get_vma_policy(struct task_struct *task, addr); if (vpol) pol = vpol; - shared_pol = 1; /* if pol non-NULL, add ref below */ } else if (vma->vm_policy && vma->vm_policy->mode != MPOL_DEFAULT) pol = vma->vm_policy; } if (!pol) pol = &default_policy; - else if (!shared_pol && pol != current->mempolicy) - mpol_get(pol); /* vma or other task's policy */ return pol; } -/* Return a nodemask representing a mempolicy */ -static nodemask_t *nodemask_policy(gfp_t gfp, struct mempolicy *policy) +/* + * Return a nodemask representing a mempolicy for filtering nodes for + * page allocation + */ +static nodemask_t *policy_nodemask(gfp_t gfp, struct mempolicy *policy) { /* Lower zones don't get a nodemask applied for MPOL_BIND */ if (unlikely(policy->mode == MPOL_BIND) && @@ -1298,8 +1310,8 @@ static nodemask_t *nodemask_policy(gfp_t gfp, struct mempolicy *policy) return NULL; } -/* Return a zonelist representing a mempolicy */ -static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) +/* Return a zonelist indicated by gfp for node representing a mempolicy */ +static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy) { int nd; @@ -1311,10 +1323,10 @@ static struct zonelist *zonelist_policy(gfp_t gfp, struct mempolicy *policy) break; case MPOL_BIND: /* - * Normally, MPOL_BIND allocations node-local are node-local - * within the allowed nodemask. However, if __GFP_THISNODE is - * set and the current node is part of the mask, we use the - * the zonelist for the first node in the mask instead. + * Normally, MPOL_BIND allocations are node-local within the + * allowed nodemask. 
However, if __GFP_THISNODE is set and the + * current node is part of the mask, we use the zonelist for + * the first node in the mask instead. */ nd = numa_node_id(); if (unlikely(gfp & __GFP_THISNODE) && @@ -1350,6 +1362,10 @@ static unsigned interleave_nodes(struct mempolicy *policy) /* * Depending on the memory policy provide a node from which to allocate the * next slab entry. + * @policy must be protected by freeing by the caller. If @policy is + * the current task's mempolicy, this protection is implicit, as only the + * task can change it's policy. The system default policy requires no + * such protection. */ unsigned slab_node(struct mempolicy *policy) { @@ -1435,43 +1451,27 @@ static inline unsigned interleave_nid(struct mempolicy *pol, * @mpol = pointer to mempolicy pointer for reference counted mempolicy * @nodemask = pointer to nodemask pointer for MPOL_BIND nodemask * - * Returns a zonelist suitable for a huge page allocation. - * If the effective policy is 'BIND, returns pointer to local node's zonelist, - * and a pointer to the mempolicy's @nodemask for filtering the zonelist. - * If it is also a policy for which get_vma_policy() returns an extra - * reference, we must hold that reference until after the allocation. - * In that case, return policy via @mpol so hugetlb allocation can drop - * the reference. For non-'BIND referenced policies, we can/do drop the - * reference here, so the caller doesn't need to know about the special case - * for default and current task policy. + * Returns a zonelist suitable for a huge page allocation and a pointer + * to the struct mempolicy for conditional unref after allocation. + * If the effective policy is 'BIND, returns a pointer to the mempolicy's + * @nodemask for filtering the zonelist. 
*/ struct zonelist *huge_zonelist(struct vm_area_struct *vma, unsigned long addr, gfp_t gfp_flags, struct mempolicy **mpol, nodemask_t **nodemask) { - struct mempolicy *pol = get_vma_policy(current, vma, addr); struct zonelist *zl; - *mpol = NULL; /* probably no unref needed */ + *mpol = get_vma_policy(current, vma, addr); *nodemask = NULL; /* assume !MPOL_BIND */ - if (pol->mode == MPOL_BIND) { - *nodemask = &pol->v.nodes; - } else if (pol->mode == MPOL_INTERLEAVE) { - unsigned nid; - - nid = interleave_nid(pol, vma, addr, HPAGE_SHIFT); - if (unlikely(pol != &default_policy && - pol != current->mempolicy)) - __mpol_put(pol); /* finished with pol */ - return node_zonelist(nid, gfp_flags); - } - zl = zonelist_policy(GFP_HIGHUSER, pol); - if (unlikely(pol != &default_policy && pol != current->mempolicy)) { - if (pol->mode != MPOL_BIND) - __mpol_put(pol); /* finished with pol */ - else - *mpol = pol; /* unref needed after allocation */ + if (unlikely((*mpol)->mode == MPOL_INTERLEAVE)) { + zl = node_zonelist(interleave_nid(*mpol, vma, addr, + HPAGE_SHIFT), gfp_flags); + } else { + zl = policy_zonelist(gfp_flags, *mpol); + if ((*mpol)->mode == MPOL_BIND) + *nodemask = &(*mpol)->v.nodes; } return zl; } @@ -1526,25 +1526,23 @@ alloc_page_vma(gfp_t gfp, struct vm_area_struct *vma, unsigned long addr) unsigned nid; nid = interleave_nid(pol, vma, addr, PAGE_SHIFT); - if (unlikely(pol != &default_policy && - pol != current->mempolicy)) - __mpol_put(pol); /* finished with pol */ + mpol_cond_put(pol); return alloc_page_interleave(gfp, 0, nid); } - zl = zonelist_policy(gfp, pol); - if (pol != &default_policy && pol != current->mempolicy) { + zl = policy_zonelist(gfp, pol); + if (unlikely(mpol_needs_cond_ref(pol))) { /* - * slow path: ref counted policy -- shared or vma + * slow path: ref counted shared policy */ struct page *page = __alloc_pages_nodemask(gfp, 0, - zl, nodemask_policy(gfp, pol)); + zl, policy_nodemask(gfp, pol)); __mpol_put(pol); return page; } /* * fast path: 
default or task policy */ - return __alloc_pages_nodemask(gfp, 0, zl, nodemask_policy(gfp, pol)); + return __alloc_pages_nodemask(gfp, 0, zl, policy_nodemask(gfp, pol)); } /** @@ -1574,10 +1572,15 @@ struct page *alloc_pages_current(gfp_t gfp, unsigned order) cpuset_update_task_memory_state(); if (!pol || in_interrupt() || (gfp & __GFP_THISNODE)) pol = &default_policy; + + /* + * No reference counting needed for current->mempolicy + * nor system default_policy + */ if (pol->mode == MPOL_INTERLEAVE) return alloc_page_interleave(gfp, order, interleave_nodes(pol)); return __alloc_pages_nodemask(gfp, order, - zonelist_policy(gfp, pol), nodemask_policy(gfp, pol)); + policy_zonelist(gfp, pol), policy_nodemask(gfp, pol)); } EXPORT_SYMBOL(alloc_pages_current); @@ -1605,6 +1608,28 @@ struct mempolicy *__mpol_dup(struct mempolicy *old) return new; } +/* + * If *frompol needs [has] an extra ref, copy *frompol to *tompol , + * eliminate the * MPOL_F_* flags that require conditional ref and + * [NOTE!!!] drop the extra ref. Not safe to reference *frompol directly + * after return. Use the returned value. + * + * Allows use of a mempolicy for, e.g., multiple allocations with a single + * policy lookup, even if the policy needs/has extra ref on lookup. + * shmem_readahead needs this. + */ +struct mempolicy *__mpol_cond_copy(struct mempolicy *tompol, + struct mempolicy *frompol) +{ + if (!mpol_needs_cond_ref(frompol)) + return frompol; + + *tompol = *frompol; + tompol->flags &= ~MPOL_F_SHARED; /* copy doesn't need unref */ + __mpol_put(frompol); + return tompol; +} + static int mpol_match_intent(const struct mempolicy *a, const struct mempolicy *b) { @@ -1639,15 +1664,6 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) } } -/* Slow path of a mpol destructor. */ -void __mpol_put(struct mempolicy *p) -{ - if (!atomic_dec_and_test(&p->refcnt)) - return; - p->mode = MPOL_DEFAULT; - kmem_cache_free(policy_cache, p); -} - /* * Shared memory backing store policy support. 
* @@ -2081,11 +2097,7 @@ int show_numa_map(struct seq_file *m, void *v) pol = get_vma_policy(priv->task, vma, vma->vm_start); mpol_to_str(buffer, sizeof(buffer), pol); - /* - * unref shared or other task's mempolicy - */ - if (pol != &default_policy && pol != current->mempolicy) - __mpol_put(pol); + mpol_cond_put(pol); seq_printf(m, "%08lx %s", vma->vm_start, buffer); diff --git a/mm/shmem.c b/mm/shmem.c index 5326876d814d..0b591c669b2d 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1187,16 +1187,19 @@ static void shmem_show_mpol(struct seq_file *seq, unsigned short policy, static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp, struct shmem_inode_info *info, unsigned long idx) { + struct mempolicy mpol, *spol; struct vm_area_struct pvma; struct page *page; + spol = mpol_cond_copy(&mpol, + mpol_shared_policy_lookup(&info->policy, idx)); + /* Create a pseudo vma that just contains the policy */ pvma.vm_start = 0; pvma.vm_pgoff = idx; pvma.vm_ops = NULL; - pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx); + pvma.vm_policy = spol; page = swapin_readahead(entry, gfp, &pvma, 0); - mpol_put(pvma.vm_policy); return page; } @@ -1204,16 +1207,17 @@ static struct page *shmem_alloc_page(gfp_t gfp, struct shmem_inode_info *info, unsigned long idx) { struct vm_area_struct pvma; - struct page *page; /* Create a pseudo vma that just contains the policy */ pvma.vm_start = 0; pvma.vm_pgoff = idx; pvma.vm_ops = NULL; pvma.vm_policy = mpol_shared_policy_lookup(&info->policy, idx); - page = alloc_page_vma(gfp, &pvma, 0); - mpol_put(pvma.vm_policy); - return page; + + /* + * alloc_page_vma() will drop the shared policy reference + */ + return alloc_page_vma(gfp, &pvma, 0); } #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS -- cgit v1.2.3 From bea904d54d6faa92400f10c8ea3d3828b8e1eb93 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:18 -0700 Subject: mempolicy: use MPOL_PREFERRED for system-wide default policy Currently, when one specifies 
MPOL_DEFAULT via a NUMA memory policy API [set_mempolicy(), mbind() and internal versions], the kernel simply installs a NULL struct mempolicy pointer in the appropriate context: task policy, vma policy, or shared policy. This causes any use of that policy to "fall back" to the next most specific policy scope. The only use of MPOL_DEFAULT to mean "local allocation" is in the system default policy. This requires extra checks/cases for MPOL_DEFAULT in many mempolicy.c functions. There is another, "preferred" way to specify local allocation via the APIs. That is using the MPOL_PREFERRED policy mode with an empty nodemask. Internally, the empty nodemask gets converted to a preferred_node id of '-1'. All internal usage of MPOL_PREFERRED will convert the '-1' to the id of the node local to the cpu where the allocation occurs. System default policy, except during boot, is hard-coded to "local allocation". By using the MPOL_PREFERRED mode with a negative value of preferred node for system default policy, MPOL_DEFAULT will never occur in the 'mode' member of a struct mempolicy. Thus, we can remove all checks for MPOL_DEFAULT when converting policy to a node id/zonelist in the allocation paths. In slab_node(), return the local node id when the policy pointer is NULL. No need to set a pol value to take the switch default. Replace switch default with BUG()--i.e., shouldn't happen. With this patch MPOL_DEFAULT is only used in the APIs, including internal calls to do_set_mempolicy() and in the display of policy in /proc//numa_maps. It always means "fall back" to the next most specific policy scope. This simplifies the description of memory policies quite a bit, with no visible change in behavior. get_mempolicy() continues to return MPOL_DEFAULT and an empty nodemask when the requested policy [task or vma/shared] is NULL. These are the values one would supply via set_mempolicy() or mbind() to achieve that condition--default behavior.
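The resulting split between "fall back" (a NULL policy) and "local allocation" (MPOL_PREFERRED with a negative preferred node) can be sketched in user-space C. This is a hedged model, not kernel code: the enum values, struct pref_model, default_model and the model_*() functions are hypothetical. model_node() mirrors only the NULL/PREFERRED part of slab_node() described above, and model_query_mode() mirrors how get_mempolicy() reports a NULL or system default policy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode values for this model only; not the kernel's ABI. */
enum model_mode { M_DEFAULT, M_PREFERRED, M_BIND, M_INTERLEAVE };

struct pref_model {
	enum model_mode mode; /* never M_DEFAULT once stored in a policy */
	int preferred_node;   /* -1: node of the allocating cpu */
};

/* Run-time system default: PREFERRED with no node, i.e. local allocation. */
static const struct pref_model default_model = { M_PREFERRED, -1 };

/* Node for a NULL or PREFERRED policy; other modes are not modeled. */
static int model_node(const struct pref_model *pol, int local_node)
{
	if (!pol)
		return local_node;
	assert(pol->mode == M_PREFERRED);
	if (pol->preferred_node >= 0)
		return pol->preferred_node;
	return local_node;
}

/* A NULL scope and the system default both read back as "default". */
static enum model_mode model_query_mode(const struct pref_model *pol)
{
	if (!pol || pol == &default_model)
		return M_DEFAULT;
	return pol->mode;
}
```

Because the stored mode is never M_DEFAULT, the allocation-side helper needs no M_DEFAULT case; the "default" value appears only at the query boundary.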
This patch updates Documentation to reflect this change. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 54 +++++++++----------------- mm/mempolicy.c | 68 ++++++++++++++++++++------------- 2 files changed, 60 insertions(+), 62 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 6719d642653f..13cca5a3cf17 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -147,35 +147,18 @@ Components of Memory Policies Linux memory policy supports the following 4 behavioral modes: - Default Mode--MPOL_DEFAULT: The behavior specified by this mode is - context or scope dependent. - - As mentioned in the Policy Scope section above, during normal - system operation, the System Default Policy is hard coded to - contain the Default mode. - - In this context, default mode means "local" allocation--that is - attempt to allocate the page from the node associated with the cpu - where the fault occurs. If the "local" node has no memory, or the - node's memory can be exhausted [no free pages available], local - allocation will "fallback to"--attempt to allocate pages from-- - "nearby" nodes, in order of increasing "distance". - - Implementation detail -- subject to change: "Fallback" uses - a per node list of sibling nodes--called zonelists--built at - boot time, or when nodes or memory are added or removed from - the system [memory hotplug]. These per node zonelist are - constructed with nodes in order of increasing distance based - on information provided by the platform firmware. - - When a task/process policy or a shared policy contains the Default - mode, this also means "local allocation", as described above. 
- - In the context of a VMA, Default mode means "fall back to task - policy"--which may or may not specify Default mode. Thus, Default - mode can not be counted on to mean local allocation when used - on a non-shared region of the address space. However, see - MPOL_PREFERRED below. + Default Mode--MPOL_DEFAULT: This mode is only used in the memory + policy APIs. Internally, MPOL_DEFAULT is converted to the NULL + memory policy in all policy scopes. Any existing non-default policy + will simply be removed when MPOL_DEFAULT is specified. As a result, + MPOL_DEFAULT means "fall back to the next most specific policy scope." + + For example, a NULL or default task policy will fall back to the + system default policy. A NULL or default vma policy will fall + back to the task policy. + + When specified in one of the memory policy APIs, the Default mode + does not use the optional set of nodes. It is an error for the set of nodes specified for this policy to be non-empty. @@ -187,19 +170,18 @@ Components of Memory Policies MPOL_PREFERRED: This mode specifies that the allocation should be attempted from the single node specified in the policy. If that - allocation fails, the kernel will search other nodes, exactly as - it would for a local allocation that started at the preferred node - in increasing distance from the preferred node. "Local" allocation - policy can be viewed as a Preferred policy that starts at the node + allocation fails, the kernel will search other nodes, in order of + increasing distance from the preferred node based on information + provided by the platform firmware. containing the cpu where the allocation takes place. Internally, the Preferred policy uses a single node--the preferred_node member of struct mempolicy. A "distinguished value of this preferred_node, currently '-1', is interpreted as "the node containing the cpu where the allocation takes - place"--local allocation. 
This is the way to specify - local allocation for a specific range of addresses--i.e. for - VMA policies. + place"--local allocation. "Local" allocation policy can be + viewed as a Preferred policy that starts at the node containing + the cpu where the allocation takes place. It is possible for the user to specify that local allocation is always preferred by passing an empty nodemask with this mode. diff --git a/mm/mempolicy.c b/mm/mempolicy.c index a237295f8190..fea4a5da6e44 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -104,9 +104,13 @@ static struct kmem_cache *sn_cache; policied. */ enum zone_type policy_zone = 0; +/* + * run-time system-wide default policy => local allocation + */ struct mempolicy default_policy = { .refcnt = ATOMIC_INIT(1), /* never free it */ - .mode = MPOL_DEFAULT, + .mode = MPOL_PREFERRED, + .v = { .preferred_node = -1 }, }; static const struct mempolicy_operations { @@ -189,7 +193,7 @@ static struct mempolicy *mpol_new(unsigned short mode, unsigned short flags, if (mode == MPOL_DEFAULT) { if (nodes && !nodes_empty(*nodes)) return ERR_PTR(-EINVAL); - return NULL; + return NULL; /* simply delete any existing policy */ } VM_BUG_ON(!nodes); @@ -246,7 +250,6 @@ void __mpol_put(struct mempolicy *p) { if (!atomic_dec_and_test(&p->refcnt)) return; - p->mode = MPOL_DEFAULT; kmem_cache_free(policy_cache, p); } @@ -626,13 +629,16 @@ static long do_set_mempolicy(unsigned short mode, unsigned short flags, return 0; } -/* Fill a zone bitmap for a policy */ -static void get_zonemask(struct mempolicy *p, nodemask_t *nodes) +/* + * Return nodemask for policy for get_mempolicy() query + */ +static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) { nodes_clear(*nodes); + if (p == &default_policy) + return; + switch (p->mode) { - case MPOL_DEFAULT: - break; case MPOL_BIND: /* Fall through */ case MPOL_INTERLEAVE: @@ -686,6 +692,11 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, } if (flags & MPOL_F_ADDR) { + /* + * Do 
NOT fall back to task policy if the + * vma/shared policy at addr is NULL. We + * want to return MPOL_DEFAULT in this case. + */ down_read(&mm->mmap_sem); vma = find_vma_intersection(mm, addr, addr+1); if (!vma) { @@ -700,7 +711,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, return -EINVAL; if (!pol) - pol = &default_policy; + pol = &default_policy; /* indicates default behavior */ if (flags & MPOL_F_NODE) { if (flags & MPOL_F_ADDR) { @@ -715,8 +726,11 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, err = -EINVAL; goto out; } - } else - *policy = pol->mode | pol->flags; + } else { + *policy = pol == &default_policy ? MPOL_DEFAULT : + pol->mode; + *policy |= pol->flags; + } if (vma) { up_read(¤t->mm->mmap_sem); @@ -725,7 +739,7 @@ static long do_get_mempolicy(int *policy, nodemask_t *nmask, err = 0; if (nmask) - get_zonemask(pol, nmask); + get_policy_nodemask(pol, nmask); out: mpol_cond_put(pol); @@ -1286,8 +1300,7 @@ static struct mempolicy *get_vma_policy(struct task_struct *task, addr); if (vpol) pol = vpol; - } else if (vma->vm_policy && - vma->vm_policy->mode != MPOL_DEFAULT) + } else if (vma->vm_policy) pol = vma->vm_policy; } if (!pol) @@ -1334,7 +1347,6 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy) nd = first_node(policy->v.nodes); break; case MPOL_INTERLEAVE: /* should not happen */ - case MPOL_DEFAULT: nd = numa_node_id(); break; default: @@ -1369,9 +1381,15 @@ static unsigned interleave_nodes(struct mempolicy *policy) */ unsigned slab_node(struct mempolicy *policy) { - unsigned short pol = policy ? 
policy->mode : MPOL_DEFAULT; + if (!policy) + return numa_node_id(); + + switch (policy->mode) { + case MPOL_PREFERRED: + if (unlikely(policy->v.preferred_node >= 0)) + return policy->v.preferred_node; + return numa_node_id(); - switch (pol) { case MPOL_INTERLEAVE: return interleave_nodes(policy); @@ -1390,13 +1408,8 @@ unsigned slab_node(struct mempolicy *policy) return zone->node; } - case MPOL_PREFERRED: - if (policy->v.preferred_node >= 0) - return policy->v.preferred_node; - /* Fall through */ - default: - return numa_node_id(); + BUG(); } } @@ -1650,8 +1663,6 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) if (a->mode != MPOL_DEFAULT && !mpol_match_intent(a, b)) return 0; switch (a->mode) { - case MPOL_DEFAULT: - return 1; case MPOL_BIND: /* Fall through */ case MPOL_INTERLEAVE: @@ -1828,7 +1839,7 @@ void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, if (policy != MPOL_DEFAULT) { struct mempolicy *newpol; - /* Falls back to MPOL_DEFAULT on any error */ + /* Falls back to NULL policy [MPOL_DEFAULT] on any error */ newpol = mpol_new(policy, flags, policy_nodes); if (!IS_ERR(newpol)) { /* Create pseudo-vma that contains just the policy */ @@ -1952,9 +1963,14 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) char *p = buffer; int l; nodemask_t nodes; - unsigned short mode = pol ? pol->mode : MPOL_DEFAULT; + unsigned short mode; unsigned short flags = pol ? 
pol->flags : 0; + if (!pol || pol == &default_policy) + mode = MPOL_DEFAULT; + else + mode = pol->mode; + switch (mode) { case MPOL_DEFAULT: nodes_clear(nodes); -- cgit v1.2.3 From 53f2556b6792ed99fde965f5e061749edd455623 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:20 -0700 Subject: mempolicy: mPOL_PREFERRED cleanups for "local allocation" Here are a couple of "cleanups" for MPOL_PREFERRED behavior when v.preferred_node < 0 -- i.e., "local allocation": 1) [do_]get_mempolicy() calls the now renamed get_policy_nodemask() to fetch the nodemask associated with a policy. Currently, get_policy_nodemask() returns the set of nodes with memory, when the policy 'mode' is 'PREFERRED, and the preferred_node is < 0. Change to return an empty nodemask, as this is what was specified to achieve "local allocation". 2) When a task is moved into a [new] cpuset, mpol_rebind_policy() is called to adjust any task and vma policy nodes to be valid in the new cpuset. However, when the policy is MPOL_PREFERRED, and the preferred_node is <0, no rebind is necessary. The "local allocation" indication is valid in any cpuset. Existing code will "do the right thing" because node_remap() will just return the argument node when it is outside of the valid range of node ids. However, I think it is clearer and cleaner to skip the remap explicitly in this case. 3) mpol_to_str() produces a printable, "human readable" string from a struct mempolicy. For MPOL_PREFERRED with preferred_node <0, show "local", as this indicates local allocation, as the task migrates among nodes. Note that this matches the usage of "local allocation" in libnuma() and numactl. Without this change, I believe that node_set() [via set_bit()] will set bit 31, resulting in a misleading display. 
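For illustration only, here is a minimal user-space sketch of items 1) and 3), assuming a 64-node bitmask in place of nodemask_t. struct sketch_policy and the sketch_* helpers are hypothetical and are not kernel API. A negative preferred_node yields an empty mask and the name "local", and the bit shift is skipped entirely in that case (a negative shift count is undefined in C).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for nodemask_t: one bit per node, 64 nodes max. */
typedef uint64_t sketch_nodemask_t;

/* Only the MPOL_PREFERRED case is modeled here. */
struct sketch_policy {
	int preferred_node;	/* < 0 means "local allocation" */
};

/*
 * Item 1): like the patched get_policy_nodemask() MPOL_PREFERRED case,
 * report the preferred node, or an empty mask for local allocation.
 */
static sketch_nodemask_t sketch_preferred_nodemask(const struct sketch_policy *p)
{
	sketch_nodemask_t nodes = 0;			/* nodes_clear() */

	if (p->preferred_node >= 0) {
		assert(p->preferred_node < 64);		/* fits this sketch's mask */
		nodes |= (sketch_nodemask_t)1 << p->preferred_node; /* node_set() */
	}
	/* else: empty node mask for local allocation */
	return nodes;
}

/* Item 3): display "local" rather than setting a bogus bit. */
static const char *sketch_preferred_name(const struct sketch_policy *p)
{
	return p->preferred_node < 0 ? "local" : "prefer";
}
```

An empty mask is also what a caller passes to request local allocation in the first place, so the query result matches the original request.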
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 28 ++++++++++++++++++---------- 1 file changed, 18 insertions(+), 10 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index fea4a5da6e44..7b3ae977b158 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -645,11 +645,9 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) *nodes = p->v.nodes; break; case MPOL_PREFERRED: - /* or use current node instead of memory_map? */ - if (p->v.preferred_node < 0) - *nodes = node_states[N_HIGH_MEMORY]; - else + if (p->v.preferred_node >= 0) node_set(p->v.preferred_node, *nodes); + /* else return empty node mask for local allocation */ break; default: BUG(); @@ -804,7 +802,7 @@ int do_migrate_pages(struct mm_struct *mm, int err = 0; nodemask_t tmp; - down_read(&mm->mmap_sem); + down_read(&mm->mmap_sem); err = migrate_vmas(mm, from_nodes, to_nodes, flags); if (err) @@ -1948,10 +1946,12 @@ void numa_default_policy(void) } /* - * Display pages allocated per node and memory policy via /proc. + * "local" is pseudo-policy: MPOL_PREFERRED with preferred_node == -1 + * Used only for mpol_to_str() */ +#define MPOL_LOCAL (MPOL_INTERLEAVE + 1) static const char * const policy_types[] = - { "default", "prefer", "bind", "interleave" }; + { "default", "prefer", "bind", "interleave", "local" }; /* * Convert a mempolicy into a string. @@ -1962,6 +1962,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) { char *p = buffer; int l; + int nid; nodemask_t nodes; unsigned short mode; unsigned short flags = pol ? 
pol->flags : 0; @@ -1978,7 +1979,11 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) case MPOL_PREFERRED: nodes_clear(nodes); - node_set(pol->v.preferred_node, nodes); + nid = pol->v.preferred_node; + if (nid < 0) + mode = MPOL_LOCAL; /* pseudo-policy */ + else + node_set(nid, nodes); break; case MPOL_BIND: @@ -1993,8 +1998,8 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) } l = strlen(policy_types[mode]); - if (buffer + maxlen < p + l + 1) - return -ENOSPC; + if (buffer + maxlen < p + l + 1) + return -ENOSPC; strcpy(p, policy_types[mode]); p += l; @@ -2093,6 +2098,9 @@ static inline void check_huge_range(struct vm_area_struct *vma, } #endif +/* + * Display pages allocated per node and memory policy via /proc. + */ int show_numa_map(struct seq_file *m, void *v) { struct proc_maps_private *priv = m->private; -- cgit v1.2.3 From fc36b8d3d819047eb4d23ca079fb4d3af20ff076 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:21 -0700 Subject: mempolicy: use MPOL_F_LOCAL to Indicate Preferred Local Policy Now that we're using "preferred local" policy for system default, we need to make this as fast as possible. Because of the variable size of the mempolicy structure [based on size of nodemasks], the preferred_node may be in a different cacheline from the mode. This can result in accessing an extra cacheline in the normal case of system default policy. Suspect this is the cause of an observed 2-3% slowdown in page fault testing relative to kernel without this patch series. To alleviate this, use an internal mode flag, MPOL_F_LOCAL in the mempolicy flags member which is guaranteed [?] to be in the same cacheline as the mode itself. Verified that reworked mempolicy now performs slightly better on 25-rc8-mm1 for both anon and shmem segments with system default and vma [preferred local] policy. 
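A minimal sketch of the resulting fast path, with hypothetical names and a simplified structure that is not the real struct mempolicy layout. SKETCH_MPOL_F_LOCAL uses the patch's value (1 << 1); SKETCH_MPOL_PREFERRED assumes MPOL_PREFERRED's position (1) in the kernel's mode enum. The system default policy becomes { mode = PREFERRED, flags = F_LOCAL }, and choosing a node needs only mode and flags unless a specific node was requested.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names. F_LOCAL has the patch's value (1 << 1); PREFERRED
 * assumes MPOL_PREFERRED's position (1) in the kernel's mode enum. */
#define SKETCH_MPOL_PREFERRED	1
#define SKETCH_MPOL_F_LOCAL	(1 << 1)

/* Simplified structure, not the real struct mempolicy layout. */
struct sketch_mempolicy {
	unsigned short mode;
	unsigned short flags;		/* in this sketch, adjacent to mode */
	short preferred_node;		/* meaningful only without F_LOCAL */
};

/* The system default policy after this patch, in sketch form. */
static const struct sketch_mempolicy sketch_default_policy = {
	.mode = SKETCH_MPOL_PREFERRED,
	.flags = SKETCH_MPOL_F_LOCAL,
};

/* Modeled on the patched slab_node(): a NULL policy or the local flag
 * returns the current node without reading preferred_node. */
static int sketch_slab_node(const struct sketch_mempolicy *pol, int this_node)
{
	if (!pol || (pol->flags & SKETCH_MPOL_F_LOCAL))
		return this_node;
	if (pol->mode == SKETCH_MPOL_PREFERRED)
		return pol->preferred_node;
	return this_node;	/* other modes are not modeled here */
}
```

preferred_node is read only when the local flag is clear, so the common default case never touches it.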
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/vm/numa_memory_policy.txt | 11 ++++---- include/linux/mempolicy.h | 1 + mm/mempolicy.c | 47 +++++++++++++++------------------ 3 files changed, 28 insertions(+), 31 deletions(-) diff --git a/Documentation/vm/numa_memory_policy.txt b/Documentation/vm/numa_memory_policy.txt index 13cca5a3cf17..bad16d3f6a47 100644 --- a/Documentation/vm/numa_memory_policy.txt +++ b/Documentation/vm/numa_memory_policy.txt @@ -176,12 +176,11 @@ Components of Memory Policies containing the cpu where the allocation takes place. Internally, the Preferred policy uses a single node--the - preferred_node member of struct mempolicy. A "distinguished - value of this preferred_node, currently '-1', is interpreted - as "the node containing the cpu where the allocation takes - place"--local allocation. "Local" allocation policy can be - viewed as a Preferred policy that starts at the node containing - the cpu where the allocation takes place. + preferred_node member of struct mempolicy. When the internal + mode flag MPOL_F_LOCAL is set, the preferred_node is ignored and + the policy is interpreted as local allocation. "Local" allocation + policy can be viewed as a Preferred policy that starts at the node + containing the cpu where the allocation takes place. It is possible for the user to specify that local allocation is always preferred by passing an empty nodemask with this mode. diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index 172b9c6acb91..b0fab9e80655 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -50,6 +50,7 @@ enum { * are never OR'ed into the mode in mempolicy API arguments. 
*/ #define MPOL_F_SHARED (1 << 0) /* identify shared policies */ +#define MPOL_F_LOCAL (1 << 1) /* preferred local allocation */ #ifdef __KERNEL__ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 7b3ae977b158..143b019e9834 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -110,7 +110,7 @@ enum zone_type policy_zone = 0; struct mempolicy default_policy = { .refcnt = ATOMIC_INIT(1), /* never free it */ .mode = MPOL_PREFERRED, - .v = { .preferred_node = -1 }, + .flags = MPOL_F_LOCAL, }; static const struct mempolicy_operations { @@ -163,7 +163,7 @@ static int mpol_new_interleave(struct mempolicy *pol, const nodemask_t *nodes) static int mpol_new_preferred(struct mempolicy *pol, const nodemask_t *nodes) { if (!nodes) - pol->v.preferred_node = -1; /* local allocation */ + pol->flags |= MPOL_F_LOCAL; /* local allocation */ else if (nodes_empty(*nodes)) return -EINVAL; /* no allowed nodes */ else @@ -290,14 +290,15 @@ static void mpol_rebind_preferred(struct mempolicy *pol, if (pol->flags & MPOL_F_STATIC_NODES) { int node = first_node(pol->w.user_nodemask); - if (node_isset(node, *nodes)) + if (node_isset(node, *nodes)) { pol->v.preferred_node = node; - else - pol->v.preferred_node = -1; + pol->flags &= ~MPOL_F_LOCAL; + } else + pol->flags |= MPOL_F_LOCAL; } else if (pol->flags & MPOL_F_RELATIVE_NODES) { mpol_relative_nodemask(&tmp, &pol->w.user_nodemask, nodes); pol->v.preferred_node = first_node(tmp); - } else if (pol->v.preferred_node != -1) { + } else if (!(pol->flags & MPOL_F_LOCAL)) { pol->v.preferred_node = node_remap(pol->v.preferred_node, pol->w.cpuset_mems_allowed, *nodes); @@ -645,7 +646,7 @@ static void get_policy_nodemask(struct mempolicy *p, nodemask_t *nodes) *nodes = p->v.nodes; break; case MPOL_PREFERRED: - if (p->v.preferred_node >= 0) + if (!(p->flags & MPOL_F_LOCAL)) node_set(p->v.preferred_node, *nodes); /* else return empty node mask for local allocation */ break; @@ -1324,13 +1325,12 @@ static nodemask_t *policy_nodemask(gfp_t gfp, struct 
mempolicy *policy) /* Return a zonelist indicated by gfp for node representing a mempolicy */ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy) { - int nd; + int nd = numa_node_id(); switch (policy->mode) { case MPOL_PREFERRED: - nd = policy->v.preferred_node; - if (nd < 0) - nd = numa_node_id(); + if (!(policy->flags & MPOL_F_LOCAL)) + nd = policy->v.preferred_node; break; case MPOL_BIND: /* @@ -1339,16 +1339,13 @@ static struct zonelist *policy_zonelist(gfp_t gfp, struct mempolicy *policy) * current node is part of the mask, we use the zonelist for * the first node in the mask instead. */ - nd = numa_node_id(); if (unlikely(gfp & __GFP_THISNODE) && unlikely(!node_isset(nd, policy->v.nodes))) nd = first_node(policy->v.nodes); break; case MPOL_INTERLEAVE: /* should not happen */ - nd = numa_node_id(); break; default: - nd = 0; BUG(); } return node_zonelist(nd, gfp); @@ -1379,14 +1376,15 @@ static unsigned interleave_nodes(struct mempolicy *policy) */ unsigned slab_node(struct mempolicy *policy) { - if (!policy) + if (!policy || policy->flags & MPOL_F_LOCAL) return numa_node_id(); switch (policy->mode) { case MPOL_PREFERRED: - if (unlikely(policy->v.preferred_node >= 0)) - return policy->v.preferred_node; - return numa_node_id(); + /* + * handled MPOL_F_LOCAL above + */ + return policy->v.preferred_node; case MPOL_INTERLEAVE: return interleave_nodes(policy); @@ -1666,7 +1664,8 @@ int __mpol_equal(struct mempolicy *a, struct mempolicy *b) case MPOL_INTERLEAVE: return nodes_equal(a->v.nodes, b->v.nodes); case MPOL_PREFERRED: - return a->v.preferred_node == b->v.preferred_node; + return a->v.preferred_node == b->v.preferred_node && + a->flags == b->flags; default: BUG(); return 0; @@ -1946,7 +1945,7 @@ void numa_default_policy(void) } /* - * "local" is pseudo-policy: MPOL_PREFERRED with preferred_node == -1 + * "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag * Used only for mpol_to_str() */ #define MPOL_LOCAL (MPOL_INTERLEAVE 
+ 1) @@ -1962,7 +1961,6 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) { char *p = buffer; int l; - int nid; nodemask_t nodes; unsigned short mode; unsigned short flags = pol ? pol->flags : 0; @@ -1979,11 +1977,10 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) case MPOL_PREFERRED: nodes_clear(nodes); - nid = pol->v.preferred_node; - if (nid < 0) + if (flags & MPOL_F_LOCAL) mode = MPOL_LOCAL; /* pseudo-policy */ else - node_set(nid, nodes); + node_set(pol->v.preferred_node, nodes); break; case MPOL_BIND: @@ -2004,7 +2001,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) strcpy(p, policy_types[mode]); p += l; - if (flags) { + if (flags & MPOL_MODE_FLAGS) { int need_bar = 0; if (buffer + maxlen < p + 2) -- cgit v1.2.3 From 2291990ab36b4b2d8a81b1f92e7a046e51632a60 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:22 -0700 Subject: mempolicy: clean-up mpol-to-str() mempolicy formatting mpol-to-str() formats memory policies into printable strings. Currently this is only used to display "numa_maps". A subsequent patch will use mpol_to_str() for formatting tmpfs [shmem] mpol mount options, allowing us to remove essentially duplicate code in mm/shmem.c. This patch cleans up mpol_to_str() generally and in preparation for that patch. 1) show_numa_maps() is not checking the return code from mpol_to_str(). There's not a lot we can do in this context if mpol_to_str() did return the error [insufficient space in buffer]. Proposed "solution": just check, under DEBUG_VM, that callers are providing sufficient buffer space for the policy, flags, and a few nodes. This way, we'll get some display. show_numa_maps() is providing a 50-byte buffer, so it won't trip this check. 50-bytes should be sufficient unless one has a large number of nodes in a very sparse nodemask. 
2) The display of the new mode flags ["static" & "relative"] was set up to display multiple flags, separated by a "bar" '|'. However, this support is incomplete--e.g., need_bar was never incremented; and currently, these two flags are mutually exclusive. So remove the "bar" support, for now, and only display one flag. 3) Use snprintf() to format flags, so as not to overflow the buffer. Not that it's ever happened, AFAIK. Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 17 +++++++++++------ 1 file changed, 11 insertions(+), 6 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 143b019e9834..3c8ee31572ec 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1965,6 +1965,11 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) unsigned short mode; unsigned short flags = pol ? pol->flags : 0; + /* + * Sanity check: room for longest mode, flag and some nodes + */ + VM_BUG_ON(maxlen < strlen("interleave") + strlen("relative") + 16); + if (!pol || pol == &default_policy) mode = MPOL_DEFAULT; else @@ -1991,7 +1996,6 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) default: BUG(); - return -EFAULT; } l = strlen(policy_types[mode]); @@ -2002,16 +2006,17 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) p += l; if (flags & MPOL_MODE_FLAGS) { - int need_bar = 0; - if (buffer + maxlen < p + 2) return -ENOSPC; *p++ = '='; + /* + * Currently, the only defined flags are mutually exclusive + */ if (flags & MPOL_F_STATIC_NODES) - p += sprintf(p, "%sstatic", need_bar++ ? "|" : ""); - if (flags & MPOL_F_RELATIVE_NODES) - p += sprintf(p, "%srelative", need_bar++ ?
"|" : ""); + p += snprintf(p, buffer + maxlen - p, "static"); + else if (flags & MPOL_F_RELATIVE_NODES) + p += snprintf(p, buffer + maxlen - p, "relative"); } if (!nodes_empty(nodes)) { -- cgit v1.2.3 From 095f1fc4ebf36c64fddf9b6db29b1ab5517378e6 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:23 -0700 Subject: mempolicy: rework shmem mpol parsing and display mm/shmem.c currently contains functions to parse and display memory policy strings for the tmpfs 'mpol' mount option. Move this to mm/mempolicy.c with the rest of the mempolicy support. With subsequent patches, we'll be able to remove knowledge of the details [mode, flags, policy, ...] completely from shmem.c 1) replace shmem_parse_mpol() in mm/shmem.c with mpol_parse_str() in mm/mempolicy.c. Rework to use the policy_types[] array [used by mpol_to_str()] to look up mode by name. 2) use mpol_to_str() to format policy for shmem_show_mpol(). mpol_to_str() expects a pointer to a struct mempolicy, so temporarily construct one. This will be replaced with a reference to a struct mempolicy in the tmpfs superblock in a subsequent patch. NOTE 1: I changed mpol_to_str() to use a colon ':' rather than an equal sign '=' as the nodemask delimiter to match mpol_parse_str() and the tmpfs/shmem mpol mount option formatting that now uses mpol_to_str(). This is a user visible change to numa_maps, but then the addition of the mode flags already changed the display. It makes sense to me to have the mounts and numa_maps display the policy in the same format. However, if anyone objects strongly, I can pass the desired nodemask delimeter as an arg to mpol_to_str(). Note 2: Like show_numa_map(), I don't check the return code from mpol_to_str(). I do use a longer buffer than the one provided by show_numa_map(), which seems to have sufficed so far. 
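As a rough cross-check of the buffer sizes involved: the sanity check added to mpol_to_str() earlier in this series requires strlen("interleave") + strlen("relative") + 16 = 10 + 8 + 16 = 34 bytes, so both the 64-byte buffer in shmem_show_mpol() and the 50-byte one in show_numa_map() clear it. The sketch below is a hypothetical user-space formatter for the new "<mode>[=<flag>][:<nodes>]" layout. It reports any truncation as -ENOSPC by checking snprintf()'s return value, a design choice of the sketch rather than a description of the kernel function.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Minimum accepted by the series' VM_BUG_ON():
 * strlen("interleave") + strlen("relative") + 16 = 10 + 8 + 16 = 34. */
#define SKETCH_MIN_MPOL_BUF \
	((sizeof("interleave") - 1) + (sizeof("relative") - 1) + 16)

/*
 * Hypothetical formatter for "<mode>[=<flag>][:<nodes>]". flag and nodes
 * may be NULL; nodes is an already formatted node list such as "0-3".
 * Returns the length written, or -ENOSPC if the result did not fit.
 */
static int sketch_mpol_to_str(char *buf, size_t maxlen, const char *mode,
			      const char *flag, const char *nodes)
{
	int n;

	assert(maxlen >= SKETCH_MIN_MPOL_BUF);	/* cf. the VM_BUG_ON() */
	n = snprintf(buf, maxlen, "%s%s%s%s%s", mode,
		     flag ? "=" : "", flag ? flag : "",
		     nodes ? ":" : "", nodes ? nodes : "");
	/* snprintf() returns the length it wanted to write, so n >= maxlen
	 * means the output was truncated. */
	if (n < 0 || (size_t)n >= maxlen)
		return -ENOSPC;
	return n;
}
```

With a 34-byte buffer, "interleave=relative:0-3" (23 characters) fits, while a long, sparse node list can still overflow it, which is the case the commit message for the sanity check anticipates.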
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/mempolicy.h | 21 +++++++++ mm/mempolicy.c | 104 +++++++++++++++++++++++++++++++++++++++- mm/shmem.c | 118 +++++----------------------------------------- 3 files changed, 136 insertions(+), 107 deletions(-) diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index b0fab9e80655..dcc17378c952 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -214,6 +214,13 @@ static inline void check_highest_zone(enum zone_type k) int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from_nodes, const nodemask_t *to_nodes, int flags); + +#ifdef CONFIG_TMPFS +extern int mpol_parse_str(char *str, unsigned short *mode, + unsigned short *mode_flags, nodemask_t *policy_nodes); + +extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); +#endif #else struct mempolicy {}; @@ -313,6 +320,20 @@ static inline int do_migrate_pages(struct mm_struct *mm, static inline void check_highest_zone(int k) { } + +#ifdef CONFIG_TMPFS +static inline int mpol_parse_str(char *value, unsigned short *policy, + unsigned short flags, nodemask_t *policy_nodes) +{ + return 1; +} + +static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) +{ + return 0; +} +#endif + #endif /* CONFIG_NUMA */ #endif /* __KERNEL__ */ diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 3c8ee31572ec..155bb284dbf1 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -88,6 +88,7 @@ #include #include #include +#include #include #include @@ -1944,6 +1945,10 @@ void numa_default_policy(void) do_set_mempolicy(MPOL_DEFAULT, 0, NULL); } +/* + * Parse and format mempolicy from/to strings + */ + /* * "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag * Used only for mpol_to_str() @@ -1952,12 +1957,107 @@ void numa_default_policy(void) static const char * const policy_types[] = 
{ "default", "prefer", "bind", "interleave", "local" }; + +#ifdef CONFIG_TMPFS +/** + * mpol_parse_str - parse string to mempolicy + * @str: string containing mempolicy to parse + * @mode: pointer to returned policy mode + * @mode_flags: pointer to returned flags + * @policy_nodes: pointer to returned nodemask + * + * Format of input: + * <mode>[=<flags>][:<nodelist>] + * + * Currently only used for tmpfs/shmem mount options + */ +int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, + nodemask_t *policy_nodes) +{ + char *nodelist = strchr(str, ':'); + char *flags = strchr(str, '='); + int i; + int err = 1; + + if (nodelist) { + /* NUL-terminate mode or flags string */ + *nodelist++ = '\0'; + if (nodelist_parse(nodelist, *policy_nodes)) + goto out; + if (!nodes_subset(*policy_nodes, node_states[N_HIGH_MEMORY])) + goto out; + } + if (flags) + *flags++ = '\0'; /* terminate mode string */ + + for (i = 0; i < MPOL_MAX; i++) { + if (!strcmp(str, policy_types[i])) { + *mode = i; + break; + } + } + if (i == MPOL_MAX) + goto out; + + switch (*mode) { + case MPOL_DEFAULT: + /* Don't allow a nodelist nor flags */ + if (!nodelist && !flags) + err = 0; + break; + case MPOL_PREFERRED: + /* Insist on a nodelist of one node only */ + if (nodelist) { + char *rest = nodelist; + while (isdigit(*rest)) + rest++; + if (!*rest) + err = 0; + } + break; + case MPOL_BIND: + /* Insist on a nodelist */ + if (nodelist) + err = 0; + break; + case MPOL_INTERLEAVE: + /* + * Default to online nodes with memory if no nodelist + */ + if (!nodelist) + *policy_nodes = node_states[N_HIGH_MEMORY]; + err = 0; + } + + *mode_flags = 0; + if (flags) { + /* + * Currently, we only support two mutually exclusive + * mode flags.
+ */ + if (!strcmp(flags, "static")) + *mode_flags |= MPOL_F_STATIC_NODES; + else if (!strcmp(flags, "relative")) + *mode_flags |= MPOL_F_RELATIVE_NODES; + else + err = 1; + } +out: + /* Restore string for error message */ + if (nodelist) + *--nodelist = ':'; + if (flags) + *--flags = '='; + return err; +} +#endif /* CONFIG_TMPFS */ + /* * Convert a mempolicy into a string. * Returns the number of characters in buffer (if positive) * or an error (negative) */ -static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) { char *p = buffer; int l; @@ -2022,7 +2122,7 @@ static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) if (!nodes_empty(nodes)) { if (buffer + maxlen < p + 2) return -ENOSPC; - *p++ = '='; + *p++ = ':'; p += nodelist_scnprintf(p, buffer + maxlen - p, nodes); } return p - buffer; diff --git a/mm/shmem.c b/mm/shmem.c index 0b591c669b2d..3c620dc10135 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1079,108 +1079,22 @@ redirty: #ifdef CONFIG_NUMA #ifdef CONFIG_TMPFS -static int shmem_parse_mpol(char *value, unsigned short *policy, - unsigned short *mode_flags, nodemask_t *policy_nodes) -{ - char *nodelist = strchr(value, ':'); - char *flags = strchr(value, '='); - int err = 1; - - if (nodelist) { - /* NUL-terminate policy string */ - *nodelist++ = '\0'; - if (nodelist_parse(nodelist, *policy_nodes)) - goto out; - if (!nodes_subset(*policy_nodes, node_states[N_HIGH_MEMORY])) - goto out; - } - if (flags) - *flags++ = '\0'; - if (!strcmp(value, "default")) { - *policy = MPOL_DEFAULT; - /* Don't allow a nodelist */ - if (!nodelist) - err = 0; - } else if (!strcmp(value, "prefer")) { - *policy = MPOL_PREFERRED; - /* Insist on a nodelist of one node only */ - if (nodelist) { - char *rest = nodelist; - while (isdigit(*rest)) - rest++; - if (!*rest) - err = 0; - } - } else if (!strcmp(value, "bind")) { - *policy = MPOL_BIND; - /* Insist on a nodelist */ - 
if (nodelist) - err = 0; - } else if (!strcmp(value, "interleave")) { - *policy = MPOL_INTERLEAVE; - /* - * Default to online nodes with memory if no nodelist - */ - if (!nodelist) - *policy_nodes = node_states[N_HIGH_MEMORY]; - err = 0; - } - - *mode_flags = 0; - if (flags) { - /* - * Currently, we only support two mutually exclusive - * mode flags. - */ - if (!strcmp(flags, "static")) - *mode_flags |= MPOL_F_STATIC_NODES; - else if (!strcmp(flags, "relative")) - *mode_flags |= MPOL_F_RELATIVE_NODES; - else - err = 1; /* unrecognized flag */ - } -out: - /* Restore string for error message */ - if (nodelist) - *--nodelist = ':'; - if (flags) - *--flags = '='; - return err; -} - -static void shmem_show_mpol(struct seq_file *seq, unsigned short policy, +static void shmem_show_mpol(struct seq_file *seq, unsigned short mode, unsigned short flags, const nodemask_t policy_nodes) { - char *policy_string; - - switch (policy) { - case MPOL_PREFERRED: - policy_string = "prefer"; - break; - case MPOL_BIND: - policy_string = "bind"; - break; - case MPOL_INTERLEAVE: - policy_string = "interleave"; - break; - default: - /* MPOL_DEFAULT */ - return; - } + struct mempolicy temp; + char buffer[64]; - seq_printf(seq, ",mpol=%s", policy_string); + if (mode == MPOL_DEFAULT) + return; /* show nothing */ - if (policy != MPOL_INTERLEAVE || - !nodes_equal(policy_nodes, node_states[N_HIGH_MEMORY])) { - char buffer[64]; - int len; + temp.mode = mode; + temp.flags = flags; + temp.v.nodes = policy_nodes; - len = nodelist_scnprintf(buffer, sizeof(buffer), policy_nodes); - if (len < sizeof(buffer)) - seq_printf(seq, ":%s", buffer); - else - seq_printf(seq, ":?"); - } + mpol_to_str(buffer, sizeof(buffer), &temp); + + seq_printf(seq, ",mpol=%s", buffer); } #endif /* CONFIG_TMPFS */ @@ -1221,12 +1135,6 @@ static struct page *shmem_alloc_page(gfp_t gfp, } #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS -static inline int shmem_parse_mpol(char *value, unsigned short *policy, - unsigned short 
*mode_flags, nodemask_t *policy_nodes) -{ - return 1; -} - static inline void shmem_show_mpol(struct seq_file *seq, unsigned short policy, unsigned short flags, const nodemask_t policy_nodes) { @@ -2231,8 +2139,8 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo, if (*rest) goto bad_val; } else if (!strcmp(this_char,"mpol")) { - if (shmem_parse_mpol(value, &sbinfo->policy, - &sbinfo->flags, &sbinfo->policy_nodes)) + if (mpol_parse_str(value, &sbinfo->policy, + &sbinfo->flags, &sbinfo->policy_nodes)) goto bad_val; } else { printk(KERN_ERR "tmpfs: Bad mount option %s\n", -- cgit v1.2.3 From 3f226aa1cbc006f9d90f22084f519ad2a1286cd8 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:24 -0700 Subject: mempolicy: support mpol=local tmpfs mount option For tmpfs/shmem shared policies, MPOL_DEFAULT is not necessarily equivalent to "local allocation". Because shared policies are at the same "scope" level [see Documentation/vm/numa_memory_policy.txt], as vma policies MPOL_DEFAULT means "fall back to current task policy". This patch extends the memory policy string parsing function to display "local" for MPOL_PREFERRED + MPOL_F_LOCAL. This allows one to specify local allocation as the default policy for shared memory areas via the tmpfs mpol mount option, regardless of the current task's policy. Also, "local" is now displayed for this policy. This patch allows us to accept the same input format as the display. 
Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/mempolicy.c | 25 +++++++++++++++++-------- 1 file changed, 17 insertions(+), 8 deletions(-) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 155bb284dbf1..6b751565eed1 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1951,7 +1951,7 @@ void numa_default_policy(void) /* * "local" is pseudo-policy: MPOL_PREFERRED with MPOL_F_LOCAL flag - * Used only for mpol_to_str() + * Used only for mpol_parse_str() and mpol_to_str() */ #define MPOL_LOCAL (MPOL_INTERLEAVE + 1) static const char * const policy_types[] = @@ -1990,21 +1990,16 @@ int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, if (flags) *flags++ = '\0'; /* terminate mode string */ - for (i = 0; i < MPOL_MAX; i++) { + for (i = 0; i <= MPOL_LOCAL; i++) { if (!strcmp(str, policy_types[i])) { *mode = i; break; } } - if (i == MPOL_MAX) + if (i > MPOL_LOCAL) goto out; switch (*mode) { - case MPOL_DEFAULT: - /* Don't allow a nodelist nor flags */ - if (!nodelist && !flags) - err = 0; - break; case MPOL_PREFERRED: /* Insist on a nodelist of one node only */ if (nodelist) { @@ -2027,6 +2022,20 @@ int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, if (!nodelist) *policy_nodes = node_states[N_HIGH_MEMORY]; err = 0; + break; + default: + /* + * MPOL_DEFAULT or MPOL_LOCAL + * Don't allow a nodelist nor flags + */ + if (!nodelist && !flags) + err = 0; + if (*mode == MPOL_DEFAULT) + goto out; + /* else MPOL_LOCAL */ + *mode = MPOL_PREFERRED; + nodes_clear(*policy_nodes); + break; } *mode_flags = 0; -- cgit v1.2.3 From 71fe804b6d56d6a7aed680e096901434cef6a2c3 Mon Sep 17 00:00:00 2001 From: Lee Schermerhorn Date: Mon, 28 Apr 2008 02:13:26 -0700 Subject: mempolicy: use struct mempolicy pointer in shmem_sb_info This patch replaces the mempolicy mode, mode_flags, and nodemask in the shmem_sb_info struct 
with a struct mempolicy pointer, initialized to NULL. This removes dependency on the details of mempolicy from shmem.c and hugetlbfs inode.c and simplifies the interfaces. mpol_parse_str() in mempolicy.c is changed to return, via a pointer to a pointer arg, a struct mempolicy pointer on success. For MPOL_DEFAULT, the returned pointer is NULL. Further, mpol_parse_str() now takes a 'no_context' argument that causes the input nodemask to be stored in the w.user_nodemask of the created mempolicy for use when the mempolicy is installed in a tmpfs inode shared policy tree. At that time, any cpuset contextualization is applied to the original input nodemask. This preserves the previous behavior where the input nodemask was stored in the superblock. We can think of the returned mempolicy as "context free". Because mpol_parse_str() is now calling mpol_new(), we can remove from mpol_to_str() the semantic checks that mpol_new() already performs. Add 'no_context' parameter to mpol_to_str() to specify that it should format the nodemask in w.user_nodemask for 'bind' and 'interleave' policies. Change mpol_shared_policy_init() to take a pointer to a "context free" struct mempolicy and to create a new, "contextualized" mempolicy using the mode, mode_flags and user_nodemask from the input mempolicy. Note: we know that the mempolicy passed to mpol_to_str() or mpol_shared_policy_init() from a tmpfs superblock is "context free". This is currently the only instance thereof. However, if we found more uses for this concept, and introduced any ambiguity as to whether a mempolicy was context free or not, we could add another internal mode flag to identify context free mempolicies. Then, we could remove the 'no_context' argument from mpol_to_str(). Added shmem_get_sbmpol() to return a reference counted superblock mempolicy, if one exists, to pass to mpol_shared_policy_init(). We must add the reference under the sb stat_lock to prevent races with replacement of the mpol by remount. 
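The locking rule above can be sketched in user space as follows. Everything here is hypothetical: a C11 atomic_flag spinlock stands in for stat_lock, an atomic_int for the mempolicy reference count, and free() for releasing the last reference. The point is that loading sb->mpol and taking the reference happen under the same lock that remount holds while swapping the pointer, so the policy cannot be freed between the two steps.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct mempolicy and struct shmem_sb_info. */
struct sketch_mpol {
	atomic_int refcnt;
};

struct sketch_sb {
	atomic_flag stat_lock;		/* plays the role of stat_lock */
	struct sketch_mpol *mpol;	/* NULL: no mount-time policy */
};

static void sketch_lock(atomic_flag *lock)
{
	while (atomic_flag_test_and_set_explicit(lock, memory_order_acquire))
		;	/* spin until the holder releases it */
}

static void sketch_unlock(atomic_flag *lock)
{
	atomic_flag_clear_explicit(lock, memory_order_release);
}

static void sketch_mpol_put(struct sketch_mpol *p)
{
	int old;

	if (!p)
		return;
	old = atomic_fetch_sub(&p->refcnt, 1);
	assert(old > 0);
	if (old == 1)
		free(p);	/* last reference dropped */
}

/* In the spirit of shmem_get_sbmpol(): load and reference under the lock. */
static struct sketch_mpol *sketch_get_sbmpol(struct sketch_sb *sb)
{
	struct sketch_mpol *mpol;

	sketch_lock(&sb->stat_lock);
	mpol = sb->mpol;
	if (mpol)
		atomic_fetch_add(&mpol->refcnt, 1);
	sketch_unlock(&sb->stat_lock);
	return mpol;
}

/* Remount side: swap the pointer under the lock, drop the old ref after. */
static void sketch_replace_sbmpol(struct sketch_sb *sb, struct sketch_mpol *newp)
{
	struct sketch_mpol *old;

	sketch_lock(&sb->stat_lock);
	old = sb->mpol;
	sb->mpol = newp;
	sketch_unlock(&sb->stat_lock);
	sketch_mpol_put(old);
}
```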
This reference is removed in mpol_shared_policy_init(). [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: another build fix] [akpm@linux-foundation.org: yet another build fix] Signed-off-by: Lee Schermerhorn Cc: Christoph Lameter Cc: David Rientjes Cc: Mel Gorman Cc: Andi Kleen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/hugetlbfs/inode.c | 2 +- include/linux/mempolicy.h | 22 +++---- include/linux/shmem_fs.h | 4 +- mm/mempolicy.c | 144 ++++++++++++++++++++++++++++------------------ mm/shmem.c | 57 ++++++++++-------- 5 files changed, 134 insertions(+), 95 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 2e9e5bdd5629..9783723e8ffe 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -504,7 +504,7 @@ static struct inode *hugetlbfs_get_inode(struct super_block *sb, uid_t uid, inode->i_atime = inode->i_mtime = inode->i_ctime = CURRENT_TIME; INIT_LIST_HEAD(&inode->i_mapping->private_list); info = HUGETLBFS_I(inode); - mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, 0, NULL); + mpol_shared_policy_init(&info->policy, NULL); switch (mode & S_IFMT) { default: init_special_inode(inode, mode, dev); diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h index dcc17378c952..3a39570b81b8 100644 --- a/include/linux/mempolicy.h +++ b/include/linux/mempolicy.h @@ -182,8 +182,7 @@ struct shared_policy { spinlock_t lock; }; -void mpol_shared_policy_init(struct shared_policy *info, unsigned short mode, - unsigned short flags, nodemask_t *nodes); +void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol); int mpol_set_shared_policy(struct shared_policy *info, struct vm_area_struct *vma, struct mempolicy *new); @@ -216,10 +215,10 @@ int do_migrate_pages(struct mm_struct *mm, #ifdef CONFIG_TMPFS -extern int mpol_parse_str(char *str, unsigned short *mode, - unsigned short *mode_flags, nodemask_t *policy_nodes); +extern int mpol_parse_str(char *str, struct mempolicy **mpol, int 
no_context); -extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol); +extern int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, + int no_context); #endif #else @@ -262,8 +261,8 @@ static inline int mpol_set_shared_policy(struct shared_policy *info, return -EINVAL; } -static inline void mpol_shared_policy_init(struct shared_policy *info, - unsigned short mode, unsigned short flags, nodemask_t *nodes) +static inline void mpol_shared_policy_init(struct shared_policy *sp, + struct mempolicy *mpol) { } @@ -322,13 +321,14 @@ static inline void check_highest_zone(int k) } #ifdef CONFIG_TMPFS -static inline int mpol_parse_str(char *value, unsigned short *policy, - unsigned short flags, nodemask_t *policy_nodes) +static inline int mpol_parse_str(char *str, struct mempolicy **mpol, + int no_context) { - return 1; + return 1; /* error */ } -static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) +static inline int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, + int no_context) { return 0; } diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index d7699a628d78..f2d12d5a21b8 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -34,9 +34,7 @@ struct shmem_sb_info { uid_t uid; /* Mount uid for root directory */ gid_t gid; /* Mount gid for root directory */ mode_t mode; /* Mount mode for root directory */ - unsigned short policy; /* Default NUMA memory alloc policy */ - unsigned short flags; /* Optional mempolicy flags */ - nodemask_t policy_nodes; /* nodemask for preferred and bind */ + struct mempolicy *mpol; /* default memory policy for mappings */ }; static inline struct shmem_inode_info *SHMEM_I(struct inode *inode) diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 6b751565eed1..a37a5034f63d 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -1828,27 +1828,35 @@ restart: return 0; } -void mpol_shared_policy_init(struct shared_policy *info, unsigned short policy, - unsigned 
short flags, nodemask_t *policy_nodes) -{ - info->root = RB_ROOT; - spin_lock_init(&info->lock); - - if (policy != MPOL_DEFAULT) { - struct mempolicy *newpol; - - /* Falls back to NULL policy [MPOL_DEFAULT] on any error */ - newpol = mpol_new(policy, flags, policy_nodes); - if (!IS_ERR(newpol)) { - /* Create pseudo-vma that contains just the policy */ - struct vm_area_struct pvma; - - memset(&pvma, 0, sizeof(struct vm_area_struct)); - /* Policy covers entire file */ - pvma.vm_end = TASK_SIZE; - mpol_set_shared_policy(info, &pvma, newpol); - mpol_put(newpol); - } +/** + * mpol_shared_policy_init - initialize shared policy for inode + * @sp: pointer to inode shared policy + * @mpol: struct mempolicy to install + * + * Install non-NULL @mpol in inode's shared policy rb-tree. + * On entry, the current task has a reference on a non-NULL @mpol. + * This must be released on exit. + */ +void mpol_shared_policy_init(struct shared_policy *sp, struct mempolicy *mpol) +{ + sp->root = RB_ROOT; /* empty tree == default mempolicy */ + spin_lock_init(&sp->lock); + + if (mpol) { + struct vm_area_struct pvma; + struct mempolicy *new; + + /* contextualize the tmpfs mount point mempolicy */ + new = mpol_new(mpol->mode, mpol->flags, &mpol->w.user_nodemask); + mpol_put(mpol); /* drop our ref on sb mpol */ + if (IS_ERR(new)) + return; /* no valid nodemask intersection */ + + /* Create pseudo-vma that contains just the policy */ + memset(&pvma, 0, sizeof(struct vm_area_struct)); + pvma.vm_end = TASK_SIZE; /* policy covers entire file */ + mpol_set_shared_policy(sp, &pvma, new); /* adds ref */ + mpol_put(new); /* drop initial ref */ } } @@ -1962,18 +1970,27 @@ static const char * const policy_types[] = /** * mpol_parse_str - parse string to mempolicy * @str: string containing mempolicy to parse - * @mode: pointer to returned policy mode - * @mode_flags: pointer to returned flags - * @policy_nodes: pointer to returned nodemask + * @mpol: pointer to struct mempolicy pointer, returned on 
success. + * @no_context: flag whether to "contextualize" the mempolicy * * Format of input: * [=][:] * - * Currently only used for tmpfs/shmem mount options + * if @no_context is true, save the input nodemask in w.user_nodemask in + * the returned mempolicy. This will be used to "clone" the mempolicy in + * a specific context [cpuset] at a later time. Used to parse tmpfs mpol + * mount option. Note that if 'static' or 'relative' mode flags were + * specified, the input nodemask will already have been saved. Saving + * it again is redundant, but safe. + * + * On success, returns 0, else 1 */ -int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, - nodemask_t *policy_nodes) +int mpol_parse_str(char *str, struct mempolicy **mpol, int no_context) { + struct mempolicy *new = NULL; + unsigned short uninitialized_var(mode); + unsigned short uninitialized_var(mode_flags); + nodemask_t nodes; char *nodelist = strchr(str, ':'); char *flags = strchr(str, '='); int i; @@ -1982,26 +1999,30 @@ int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, if (nodelist) { /* NUL-terminate mode or flags string */ *nodelist++ = '\0'; - if (nodelist_parse(nodelist, *policy_nodes)) + if (nodelist_parse(nodelist, nodes)) goto out; - if (!nodes_subset(*policy_nodes, node_states[N_HIGH_MEMORY])) + if (!nodes_subset(nodes, node_states[N_HIGH_MEMORY])) goto out; - } + } else + nodes_clear(nodes); + if (flags) *flags++ = '\0'; /* terminate mode string */ for (i = 0; i <= MPOL_LOCAL; i++) { if (!strcmp(str, policy_types[i])) { - *mode = i; + mode = i; break; } } if (i > MPOL_LOCAL) goto out; - switch (*mode) { + switch (mode) { case MPOL_PREFERRED: - /* Insist on a nodelist of one node only */ + /* + * Insist on a nodelist of one node only + */ if (nodelist) { char *rest = nodelist; while (isdigit(*rest)) @@ -2010,63 +2031,73 @@ int mpol_parse_str(char *str, unsigned short *mode, unsigned short *mode_flags, err = 0; } break; - case MPOL_BIND: - /* 
Insist on a nodelist */ - if (nodelist) - err = 0; - break; case MPOL_INTERLEAVE: /* * Default to online nodes with memory if no nodelist */ if (!nodelist) - *policy_nodes = node_states[N_HIGH_MEMORY]; + nodes = node_states[N_HIGH_MEMORY]; err = 0; break; - default: + case MPOL_LOCAL: /* - * MPOL_DEFAULT or MPOL_LOCAL - * Don't allow a nodelist nor flags + * Don't allow a nodelist; mpol_new() checks flags */ - if (!nodelist && !flags) - err = 0; - if (*mode == MPOL_DEFAULT) + if (nodelist) goto out; - /* else MPOL_LOCAL */ - *mode = MPOL_PREFERRED; - nodes_clear(*policy_nodes); + mode = MPOL_PREFERRED; break; + + /* + * case MPOL_BIND: mpol_new() enforces non-empty nodemask. + * case MPOL_DEFAULT: mpol_new() enforces empty nodemask, ignores flags. + */ } - *mode_flags = 0; + mode_flags = 0; if (flags) { /* * Currently, we only support two mutually exclusive * mode flags. */ if (!strcmp(flags, "static")) - *mode_flags |= MPOL_F_STATIC_NODES; + mode_flags |= MPOL_F_STATIC_NODES; else if (!strcmp(flags, "relative")) - *mode_flags |= MPOL_F_RELATIVE_NODES; + mode_flags |= MPOL_F_RELATIVE_NODES; else err = 1; } + + new = mpol_new(mode, mode_flags, &nodes); + if (IS_ERR(new)) + err = 1; + else if (no_context) + new->w.user_nodemask = nodes; /* save for contextualization */ + out: /* Restore string for error message */ if (nodelist) *--nodelist = ':'; if (flags) *--flags = '='; + if (!err) + *mpol = new; return err; } #endif /* CONFIG_TMPFS */ -/* +/** + * mpol_to_str - format a mempolicy structure for printing + * @buffer: to contain formatted mempolicy string + * @maxlen: length of @buffer + * @pol: pointer to mempolicy to be formatted + * @no_context: "context free" mempolicy - use nodemask in w.user_nodemask + * * Convert a mempolicy into a string. 
* Returns the number of characters in buffer (if positive) * or an error (negative) */ -int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) +int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol, int no_context) { char *p = buffer; int l; @@ -2100,7 +2131,10 @@ int mpol_to_str(char *buffer, int maxlen, struct mempolicy *pol) case MPOL_BIND: /* Fall through */ case MPOL_INTERLEAVE: - nodes = pol->v.nodes; + if (no_context) + nodes = pol->w.user_nodemask; + else + nodes = pol->v.nodes; break; default: @@ -2231,7 +2265,7 @@ int show_numa_map(struct seq_file *m, void *v) return 0; pol = get_vma_policy(priv->task, vma, vma->vm_start); - mpol_to_str(buffer, sizeof(buffer), pol); + mpol_to_str(buffer, sizeof(buffer), pol, 0); mpol_cond_put(pol); seq_printf(m, "%08lx %s", vma->vm_start, buffer); diff --git a/mm/shmem.c b/mm/shmem.c index 3c620dc10135..e6d9298aa22a 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1079,23 +1079,29 @@ redirty: #ifdef CONFIG_NUMA #ifdef CONFIG_TMPFS -static void shmem_show_mpol(struct seq_file *seq, unsigned short mode, - unsigned short flags, const nodemask_t policy_nodes) +static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol) { - struct mempolicy temp; char buffer[64]; - if (mode == MPOL_DEFAULT) + if (!mpol || mpol->mode == MPOL_DEFAULT) return; /* show nothing */ - temp.mode = mode; - temp.flags = flags; - temp.v.nodes = policy_nodes; - - mpol_to_str(buffer, sizeof(buffer), &temp); + mpol_to_str(buffer, sizeof(buffer), mpol, 1); seq_printf(seq, ",mpol=%s", buffer); } + +static struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo) +{ + struct mempolicy *mpol = NULL; + if (sbinfo->mpol) { + spin_lock(&sbinfo->stat_lock); /* prevent replace/use races */ + mpol = sbinfo->mpol; + mpol_get(mpol); + spin_unlock(&sbinfo->stat_lock); + } + return mpol; +} #endif /* CONFIG_TMPFS */ static struct page *shmem_swapin(swp_entry_t entry, gfp_t gfp, @@ -1135,8 +1141,7 @@ static struct page 
*shmem_alloc_page(gfp_t gfp, } #else /* !CONFIG_NUMA */ #ifdef CONFIG_TMPFS -static inline void shmem_show_mpol(struct seq_file *seq, unsigned short policy, - unsigned short flags, const nodemask_t policy_nodes) +static inline void shmem_show_mpol(struct seq_file *seq, struct mempolicy *p) { } #endif /* CONFIG_TMPFS */ @@ -1154,6 +1159,13 @@ static inline struct page *shmem_alloc_page(gfp_t gfp, } #endif /* CONFIG_NUMA */ +#if !defined(CONFIG_NUMA) || !defined(CONFIG_TMPFS) +static inline struct mempolicy *shmem_get_sbmpol(struct shmem_sb_info *sbinfo) +{ + return NULL; +} +#endif + /* * shmem_getpage - either get the page from swap or allocate a new one * @@ -1508,8 +1520,8 @@ shmem_get_inode(struct super_block *sb, int mode, dev_t dev) case S_IFREG: inode->i_op = &shmem_inode_operations; inode->i_fop = &shmem_file_operations; - mpol_shared_policy_init(&info->policy, sbinfo->policy, - sbinfo->flags, &sbinfo->policy_nodes); + mpol_shared_policy_init(&info->policy, + shmem_get_sbmpol(sbinfo)); break; case S_IFDIR: inc_nlink(inode); @@ -1523,8 +1535,7 @@ shmem_get_inode(struct super_block *sb, int mode, dev_t dev) * Must not load anything in the rbtree, * mpol_free_shared_policy will not be called. 
*/ - mpol_shared_policy_init(&info->policy, MPOL_DEFAULT, 0, - NULL); + mpol_shared_policy_init(&info->policy, NULL); break; } } else @@ -2139,8 +2150,7 @@ static int shmem_parse_options(char *options, struct shmem_sb_info *sbinfo, if (*rest) goto bad_val; } else if (!strcmp(this_char,"mpol")) { - if (mpol_parse_str(value, &sbinfo->policy, - &sbinfo->flags, &sbinfo->policy_nodes)) + if (mpol_parse_str(value, &sbinfo->mpol, 1)) goto bad_val; } else { printk(KERN_ERR "tmpfs: Bad mount option %s\n", @@ -2191,9 +2201,9 @@ static int shmem_remount_fs(struct super_block *sb, int *flags, char *data) sbinfo->free_blocks = config.max_blocks - blocks; sbinfo->max_inodes = config.max_inodes; sbinfo->free_inodes = config.max_inodes - inodes; - sbinfo->policy = config.policy; - sbinfo->flags = config.flags; - sbinfo->policy_nodes = config.policy_nodes; + + mpol_put(sbinfo->mpol); + sbinfo->mpol = config.mpol; /* transfers initial ref */ out: spin_unlock(&sbinfo->stat_lock); return error; @@ -2214,8 +2224,7 @@ static int shmem_show_options(struct seq_file *seq, struct vfsmount *vfs) seq_printf(seq, ",uid=%u", sbinfo->uid); if (sbinfo->gid != 0) seq_printf(seq, ",gid=%u", sbinfo->gid); - shmem_show_mpol(seq, sbinfo->policy, sbinfo->flags, - sbinfo->policy_nodes); + shmem_show_mpol(seq, sbinfo->mpol); return 0; } #endif /* CONFIG_TMPFS */ @@ -2245,9 +2254,7 @@ static int shmem_fill_super(struct super_block *sb, sbinfo->mode = S_IRWXUGO | S_ISVTX; sbinfo->uid = current->fsuid; sbinfo->gid = current->fsgid; - sbinfo->policy = MPOL_DEFAULT; - sbinfo->flags = 0; - sbinfo->policy_nodes = node_states[N_HIGH_MEMORY]; + sbinfo->mpol = NULL; sb->s_fs_info = sbinfo; #ifdef CONFIG_TMPFS -- cgit v1.2.3 From 6d779079bfd1196e077bb1d0a906c37ae770b102 Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Mon, 28 Apr 2008 02:13:27 -0700 Subject: hugetlbfs: architecture header cleanup This patch moves all architecture functions for hugetlb to architecture header files (include/asm-foo/hugetlb.h) and 
converts all macros to inline functions. It also removes (!) ARCH_HAS_HUGEPAGE_ONLY_RANGE, ARCH_HAS_HUGETLB_FREE_PGD_RANGE, ARCH_HAS_PREPARE_HUGEPAGE_RANGE, ARCH_HAS_SETCLEAR_HUGE_PTE and ARCH_HAS_HUGETLB_PREFAULT_HOOK. Getting rid of the ARCH_HAS_xxx #ifdef and macro fugliness should increase readability and maintainability, at the price of some code duplication. An asm-generic common part would have reduced the loc, but we would end up with new ARCH_HAS_xxx defines eventually. Acked-by: Martin Schwidefsky Signed-off-by: Gerald Schaefer Cc: Paul Mundt Cc: "Luck, Tony" Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "David S. Miller" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-ia64/hugetlb.h | 37 ++++++++++++++++++++++++++++++++ include/asm-ia64/page.h | 6 ------ include/asm-powerpc/hugetlb.h | 37 ++++++++++++++++++++++++++++++++ include/asm-powerpc/page_64.h | 7 ------- include/asm-sh/hugetlb.h | 49 +++++++++++++++++++++++++++++++++++++++++++ include/asm-sparc64/hugetlb.h | 42 +++++++++++++++++++++++++++++++++++++ include/asm-sparc64/page.h | 2 -- include/asm-x86/hugetlb.h | 49 +++++++++++++++++++++++++++++++++++++++++++ include/linux/hugetlb.h | 46 +--------------------------------------- 9 files changed, 215 insertions(+), 60 deletions(-) create mode 100644 include/asm-ia64/hugetlb.h create mode 100644 include/asm-powerpc/hugetlb.h create mode 100644 include/asm-sh/hugetlb.h create mode 100644 include/asm-sparc64/hugetlb.h create mode 100644 include/asm-x86/hugetlb.h diff --git a/include/asm-ia64/hugetlb.h b/include/asm-ia64/hugetlb.h new file mode 100644 index 000000000000..f0ee14c6e172 --- /dev/null +++ b/include/asm-ia64/hugetlb.h @@ -0,0 +1,37 @@ +#ifndef _ASM_IA64_HUGETLB_H +#define _ASM_IA64_HUGETLB_H + +#include + + +void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr, + unsigned long end, unsigned long floor, + unsigned long ceiling); + +int prepare_hugepage_range(unsigned long addr, unsigned long len); + 
+static inline int is_hugepage_only_range(struct mm_struct *mm, + unsigned long addr, + unsigned long len) +{ + return (REGION_NUMBER(addr) == RGN_HPAGE || + REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE); +} + +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm) +{ +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + set_pte_at(mm, addr, ptep, pte); +} + +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + return ptep_get_and_clear(mm, addr, ptep); +} + +#endif /* _ASM_IA64_HUGETLB_H */ diff --git a/include/asm-ia64/page.h b/include/asm-ia64/page.h index 4999a6c63775..36f39321b768 100644 --- a/include/asm-ia64/page.h +++ b/include/asm-ia64/page.h @@ -54,9 +54,6 @@ # define HPAGE_MASK (~(HPAGE_SIZE - 1)) # define HAVE_ARCH_HUGETLB_UNMAPPED_AREA -# define ARCH_HAS_HUGEPAGE_ONLY_RANGE -# define ARCH_HAS_PREPARE_HUGEPAGE_RANGE -# define ARCH_HAS_HUGETLB_FREE_PGD_RANGE #endif /* CONFIG_HUGETLB_PAGE */ #ifdef __ASSEMBLY__ @@ -153,9 +150,6 @@ typedef union ia64_va { # define htlbpage_to_page(x) (((unsigned long) REGION_NUMBER(x) << 61) \ | (REGION_OFFSET(x) >> (HPAGE_SHIFT-PAGE_SHIFT))) # define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -# define is_hugepage_only_range(mm, addr, len) \ - (REGION_NUMBER(addr) == RGN_HPAGE || \ - REGION_NUMBER((addr)+(len)-1) == RGN_HPAGE) extern unsigned int hpage_shift; #endif diff --git a/include/asm-powerpc/hugetlb.h b/include/asm-powerpc/hugetlb.h new file mode 100644 index 000000000000..f537993c5c87 --- /dev/null +++ b/include/asm-powerpc/hugetlb.h @@ -0,0 +1,37 @@ +#ifndef _ASM_POWERPC_HUGETLB_H +#define _ASM_POWERPC_HUGETLB_H + +#include + + +int is_hugepage_only_range(struct mm_struct *mm, unsigned long addr, + unsigned long len); + +void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr, + unsigned long end, unsigned long floor, + unsigned long ceiling); + +void 
set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); + +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep); + +/* + * If the arch doesn't supply something else, assume that hugepage + * size aligned regions are ok without further preparation. + */ +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + return 0; +} + +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm) +{ +} + +#endif /* _ASM_POWERPC_HUGETLB_H */ diff --git a/include/asm-powerpc/page_64.h b/include/asm-powerpc/page_64.h index 67834eae5702..25af4fc8daf4 100644 --- a/include/asm-powerpc/page_64.h +++ b/include/asm-powerpc/page_64.h @@ -128,11 +128,6 @@ extern void slice_init_context(struct mm_struct *mm, unsigned int psize); extern void slice_set_user_psize(struct mm_struct *mm, unsigned int psize); #define slice_mm_new_context(mm) ((mm)->context.id == 0) -#define ARCH_HAS_HUGEPAGE_ONLY_RANGE -extern int is_hugepage_only_range(struct mm_struct *m, - unsigned long addr, - unsigned long len); - #endif /* __ASSEMBLY__ */ #else #define slice_init() @@ -146,8 +141,6 @@ do { \ #ifdef CONFIG_HUGETLB_PAGE -#define ARCH_HAS_HUGETLB_FREE_PGD_RANGE -#define ARCH_HAS_SETCLEAR_HUGE_PTE #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #endif /* !CONFIG_HUGETLB_PAGE */ diff --git a/include/asm-sh/hugetlb.h b/include/asm-sh/hugetlb.h new file mode 100644 index 000000000000..885218d2c844 --- /dev/null +++ b/include/asm-sh/hugetlb.h @@ -0,0 +1,49 @@ +#ifndef _ASM_SH_HUGETLB_H +#define _ASM_SH_HUGETLB_H + +#include + + +static inline int is_hugepage_only_range(struct mm_struct *mm, + unsigned long addr, + unsigned long len) { + return 0; +} + +/* + * If the arch doesn't supply something else, assume that hugepage + * size aligned regions are ok without further preparation. 
+ */ +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + return 0; +} + +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm) { +} + +static inline void hugetlb_free_pgd_range(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, + unsigned long ceiling) +{ + free_pgd_range(tlb, addr, end, floor, ceiling); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + set_pte_at(mm, addr, ptep, pte); +} + +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + return ptep_get_and_clear(mm, addr, ptep); +} + +#endif /* _ASM_SH_HUGETLB_H */ diff --git a/include/asm-sparc64/hugetlb.h b/include/asm-sparc64/hugetlb.h new file mode 100644 index 000000000000..7e111cfd31ea --- /dev/null +++ b/include/asm-sparc64/hugetlb.h @@ -0,0 +1,42 @@ +#ifndef _ASM_SPARC64_HUGETLB_H +#define _ASM_SPARC64_HUGETLB_H + +#include + + +void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte); + +pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, + pte_t *ptep); + +void hugetlb_prefault_arch_hook(struct mm_struct *mm); + +static inline int is_hugepage_only_range(struct mm_struct *mm, + unsigned long addr, + unsigned long len) { + return 0; +} + +/* + * If the arch doesn't supply something else, assume that hugepage + * size aligned regions are ok without further preparation. 
+ */ +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + return 0; +} + +static inline void hugetlb_free_pgd_range(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, + unsigned long ceiling) +{ + free_pgd_range(tlb, addr, end, floor, ceiling); +} + +#endif /* _ASM_SPARC64_HUGETLB_H */ diff --git a/include/asm-sparc64/page.h b/include/asm-sparc64/page.h index e93a482aa24a..618117def0dc 100644 --- a/include/asm-sparc64/page.h +++ b/include/asm-sparc64/page.h @@ -39,8 +39,6 @@ #define HPAGE_SIZE (_AC(1,UL) << HPAGE_SHIFT) #define HPAGE_MASK (~(HPAGE_SIZE - 1UL)) #define HUGETLB_PAGE_ORDER (HPAGE_SHIFT - PAGE_SHIFT) -#define ARCH_HAS_SETCLEAR_HUGE_PTE -#define ARCH_HAS_HUGETLB_PREFAULT_HOOK #define HAVE_ARCH_HUGETLB_UNMAPPED_AREA #endif diff --git a/include/asm-x86/hugetlb.h b/include/asm-x86/hugetlb.h new file mode 100644 index 000000000000..ec21cedd7149 --- /dev/null +++ b/include/asm-x86/hugetlb.h @@ -0,0 +1,49 @@ +#ifndef _ASM_X86_HUGETLB_H +#define _ASM_X86_HUGETLB_H + +#include + + +static inline int is_hugepage_only_range(struct mm_struct *mm, + unsigned long addr, + unsigned long len) { + return 0; +} + +/* + * If the arch doesn't supply something else, assume that hugepage + * size aligned regions are ok without further preparation. 
+ */ +static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) +{ + if (len & ~HPAGE_MASK) + return -EINVAL; + if (addr & ~HPAGE_MASK) + return -EINVAL; + return 0; +} + +static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm) { +} + +static inline void hugetlb_free_pgd_range(struct mmu_gather **tlb, + unsigned long addr, unsigned long end, + unsigned long floor, + unsigned long ceiling) +{ + free_pgd_range(tlb, addr, end, floor, ceiling); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ + set_pte_at(mm, addr, ptep, pte); +} + +static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + return ptep_get_and_clear(mm, addr, ptep); +} + +#endif /* _ASM_X86_HUGETLB_H */ diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index addca4cd4f11..a79e80b689d8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -8,6 +8,7 @@ #include #include #include +#include struct ctl_table; @@ -51,51 +52,6 @@ int pmd_huge(pmd_t pmd); void hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); -#ifndef ARCH_HAS_HUGEPAGE_ONLY_RANGE -#define is_hugepage_only_range(mm, addr, len) 0 -#endif - -#ifndef ARCH_HAS_HUGETLB_FREE_PGD_RANGE -#define hugetlb_free_pgd_range free_pgd_range -#else -void hugetlb_free_pgd_range(struct mmu_gather **tlb, unsigned long addr, - unsigned long end, unsigned long floor, - unsigned long ceiling); -#endif - -#ifndef ARCH_HAS_PREPARE_HUGEPAGE_RANGE -/* - * If the arch doesn't supply something else, assume that hugepage - * size aligned regions are ok without further preparation. 
*/ -static inline int prepare_hugepage_range(unsigned long addr, unsigned long len) -{ - if (len & ~HPAGE_MASK) - return -EINVAL; - if (addr & ~HPAGE_MASK) - return -EINVAL; - return 0; -} -#else -int prepare_hugepage_range(unsigned long addr, unsigned long len); -#endif - -#ifndef ARCH_HAS_SETCLEAR_HUGE_PTE -#define set_huge_pte_at(mm, addr, ptep, pte) set_pte_at(mm, addr, ptep, pte) -#define huge_ptep_get_and_clear(mm, addr, ptep) ptep_get_and_clear(mm, addr, ptep) -#else -void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte); -pte_t huge_ptep_get_and_clear(struct mm_struct *mm, unsigned long addr, - pte_t *ptep); -#endif - -#ifndef ARCH_HAS_HUGETLB_PREFAULT_HOOK -#define hugetlb_prefault_arch_hook(mm) do { } while (0) -#else -void hugetlb_prefault_arch_hook(struct mm_struct *mm); -#endif - #else /* !CONFIG_HUGETLB_PAGE */ static inline int is_vm_hugetlb_page(struct vm_area_struct *vma) -- cgit v1.2.3 From 8fe627ec5b7c47b1654dff50536d9709863295a3 Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Mon, 28 Apr 2008 02:13:28 -0700 Subject: hugetlbfs: add missing TLB flush to hugetlb_cow() A cow break on a hugetlbfs page with page_count > 1 will set a new pte with set_huge_pte_at(), w/o any tlb flush operation. The old pte will remain in the tlb and subsequent write access to the page will result in a page fault loop, for as long as it may take until the tlb is flushed from somewhere else. This patch introduces an architecture-specific huge_ptep_clear_flush() function, which is called before the set_huge_pte_at() in hugetlb_cow(). ATTENTION: This is just a nop on all architectures for now, the s390 implementation will come with our large page patch later. Other architectures should define their own huge_ptep_clear_flush() if needed. Acked-by: Martin Schwidefsky Signed-off-by: Gerald Schaefer Cc: Paul Mundt Cc: "Luck, Tony" Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "David S.
Miller" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-ia64/hugetlb.h | 5 +++++ include/asm-powerpc/hugetlb.h | 5 +++++ include/asm-sh/hugetlb.h | 5 +++++ include/asm-sparc64/hugetlb.h | 5 +++++ include/asm-x86/hugetlb.h | 5 +++++ mm/hugetlb.c | 1 + 6 files changed, 26 insertions(+) diff --git a/include/asm-ia64/hugetlb.h b/include/asm-ia64/hugetlb.h index f0ee14c6e172..5f5434374972 100644 --- a/include/asm-ia64/hugetlb.h +++ b/include/asm-ia64/hugetlb.h @@ -34,4 +34,9 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return ptep_get_and_clear(mm, addr, ptep); } +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ +} + #endif /* _ASM_IA64_HUGETLB_H */ diff --git a/include/asm-powerpc/hugetlb.h b/include/asm-powerpc/hugetlb.h index f537993c5c87..bead2ff78493 100644 --- a/include/asm-powerpc/hugetlb.h +++ b/include/asm-powerpc/hugetlb.h @@ -34,4 +34,9 @@ static inline void hugetlb_prefault_arch_hook(struct mm_struct *mm) { } +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ +} + #endif /* _ASM_POWERPC_HUGETLB_H */ diff --git a/include/asm-sh/hugetlb.h b/include/asm-sh/hugetlb.h index 885218d2c844..d1ed476467a1 100644 --- a/include/asm-sh/hugetlb.h +++ b/include/asm-sh/hugetlb.h @@ -46,4 +46,9 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return ptep_get_and_clear(mm, addr, ptep); } +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ +} + #endif /* _ASM_SH_HUGETLB_H */ diff --git a/include/asm-sparc64/hugetlb.h b/include/asm-sparc64/hugetlb.h index 7e111cfd31ea..0b9e44c85c5d 100644 --- a/include/asm-sparc64/hugetlb.h +++ b/include/asm-sparc64/hugetlb.h @@ -39,4 +39,9 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather **tlb, free_pgd_range(tlb, addr, end, floor, ceiling); } +static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ +} + #endif /* _ASM_SPARC64_HUGETLB_H */ diff --git a/include/asm-x86/hugetlb.h b/include/asm-x86/hugetlb.h index ec21cedd7149..f57236dfc8f4 100644 --- a/include/asm-x86/hugetlb.h +++ b/include/asm-x86/hugetlb.h @@ -46,4 +46,9 @@ static inline pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return ptep_get_and_clear(mm, addr, ptep); } +static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ +} + #endif /* _ASM_X86_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d36e1f11a5f2..262d0a93d2b6 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -892,6 +892,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, ptep = huge_pte_offset(mm, address & HPAGE_MASK); if (likely(pte_same(*ptep, pte))) { /* Break COW */ + huge_ptep_clear_flush(vma, address, ptep); set_huge_pte_at(mm, address, ptep, make_huge_pte(vma, new_page, 1)); /* Make the old page be freed below */ -- cgit v1.2.3 From 7f2e9525ba55b1c42ad6c4a5a59d7eb7bdd9be72 Mon Sep 17 00:00:00 2001 From: Gerald Schaefer Date: Mon, 28 Apr 2008 02:13:29 -0700 Subject: hugetlbfs: common code update for s390 Huge ptes have a special type on s390 and cannot be handled with the standard pte functions in certain cases, e.g. because of a different location of the invalid bit. This patch adds some new architecture-specific functions to hugetlb common code, as a prerequisite for the s390 large page support. This won't affect other architectures in functionality, but I need to add some new dummy inline functions to the headers. Acked-by: Martin Schwidefsky Signed-off-by: Gerald Schaefer Cc: Paul Mundt Cc: "Luck, Tony" Cc: Ingo Molnar Cc: Thomas Gleixner Cc: "David S.
Miller" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-ia64/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++ include/asm-powerpc/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++ include/asm-sh/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++ include/asm-sparc64/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++ include/asm-x86/hugetlb.h | 37 +++++++++++++++++++++++++++++++++++++ mm/hugetlb.c | 36 +++++++++++++++++++++--------------- 6 files changed, 206 insertions(+), 15 deletions(-) diff --git a/include/asm-ia64/hugetlb.h b/include/asm-ia64/hugetlb.h index 5f5434374972..f28a9701f1cf 100644 --- a/include/asm-ia64/hugetlb.h +++ b/include/asm-ia64/hugetlb.h @@ -39,4 +39,41 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, { } +static inline int huge_pte_none(pte_t pte) +{ + return pte_none(pte); +} + +static inline pte_t huge_pte_wrprotect(pte_t pte) +{ + return pte_wrprotect(pte); +} + +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + ptep_set_wrprotect(mm, addr, ptep); +} + +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return ptep_set_access_flags(vma, addr, ptep, pte, dirty); +} + +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + return *ptep; +} + +static inline int arch_prepare_hugepage(struct page *page) +{ + return 0; +} + +static inline void arch_release_hugepage(struct page *page) +{ +} + #endif /* _ASM_IA64_HUGETLB_H */ diff --git a/include/asm-powerpc/hugetlb.h b/include/asm-powerpc/hugetlb.h index bead2ff78493..649c6c3b87b3 100644 --- a/include/asm-powerpc/hugetlb.h +++ b/include/asm-powerpc/hugetlb.h @@ -39,4 +39,41 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, { } +static inline int huge_pte_none(pte_t pte) +{ + return pte_none(pte); +} + +static inline pte_t huge_pte_wrprotect(pte_t pte) +{ + return 
pte_wrprotect(pte); +} + +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + ptep_set_wrprotect(mm, addr, ptep); +} + +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return ptep_set_access_flags(vma, addr, ptep, pte, dirty); +} + +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + return *ptep; +} + +static inline int arch_prepare_hugepage(struct page *page) +{ + return 0; +} + +static inline void arch_release_hugepage(struct page *page) +{ +} + #endif /* _ASM_POWERPC_HUGETLB_H */ diff --git a/include/asm-sh/hugetlb.h b/include/asm-sh/hugetlb.h index d1ed476467a1..02402303d89b 100644 --- a/include/asm-sh/hugetlb.h +++ b/include/asm-sh/hugetlb.h @@ -51,4 +51,41 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, { } +static inline int huge_pte_none(pte_t pte) +{ + return pte_none(pte); +} + +static inline pte_t huge_pte_wrprotect(pte_t pte) +{ + return pte_wrprotect(pte); +} + +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + ptep_set_wrprotect(mm, addr, ptep); +} + +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return ptep_set_access_flags(vma, addr, ptep, pte, dirty); +} + +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + return *ptep; +} + +static inline int arch_prepare_hugepage(struct page *page) +{ + return 0; +} + +static inline void arch_release_hugepage(struct page *page) +{ +} + #endif /* _ASM_SH_HUGETLB_H */ diff --git a/include/asm-sparc64/hugetlb.h b/include/asm-sparc64/hugetlb.h index 0b9e44c85c5d..412af58926a0 100644 --- a/include/asm-sparc64/hugetlb.h +++ b/include/asm-sparc64/hugetlb.h @@ -44,4 +44,41 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, { } +static inline int huge_pte_none(pte_t pte) +{ + return 
pte_none(pte); +} + +static inline pte_t huge_pte_wrprotect(pte_t pte) +{ + return pte_wrprotect(pte); +} + +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + ptep_set_wrprotect(mm, addr, ptep); +} + +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return ptep_set_access_flags(vma, addr, ptep, pte, dirty); +} + +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + return *ptep; +} + +static inline int arch_prepare_hugepage(struct page *page) +{ + return 0; +} + +static inline void arch_release_hugepage(struct page *page) +{ +} + #endif /* _ASM_SPARC64_HUGETLB_H */ diff --git a/include/asm-x86/hugetlb.h b/include/asm-x86/hugetlb.h index f57236dfc8f4..14171a4924f6 100644 --- a/include/asm-x86/hugetlb.h +++ b/include/asm-x86/hugetlb.h @@ -51,4 +51,41 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma, { } +static inline int huge_pte_none(pte_t pte) +{ + return pte_none(pte); +} + +static inline pte_t huge_pte_wrprotect(pte_t pte) +{ + return pte_wrprotect(pte); +} + +static inline void huge_ptep_set_wrprotect(struct mm_struct *mm, + unsigned long addr, pte_t *ptep) +{ + ptep_set_wrprotect(mm, addr, ptep); +} + +static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep, + pte_t pte, int dirty) +{ + return ptep_set_access_flags(vma, addr, ptep, pte, dirty); +} + +static inline pte_t huge_ptep_get(pte_t *ptep) +{ + return *ptep; +} + +static inline int arch_prepare_hugepage(struct page *page) +{ + return 0; +} + +static inline void arch_release_hugepage(struct page *page) +{ +} + #endif /* _ASM_X86_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 262d0a93d2b6..df28c1773fb2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -132,6 +132,7 @@ static void update_and_free_page(struct page *page) } set_compound_page_dtor(page, NULL); set_page_refcounted(page); 
+ arch_release_hugepage(page); __free_pages(page, HUGETLB_PAGE_ORDER); } @@ -201,6 +202,10 @@ static struct page *alloc_fresh_huge_page_node(int nid) htlb_alloc_mask|__GFP_COMP|__GFP_THISNODE|__GFP_NOWARN, HUGETLB_PAGE_ORDER); if (page) { + if (arch_prepare_hugepage(page)) { + __free_pages(page, HUGETLB_PAGE_ORDER); + return 0; + } set_compound_page_dtor(page, free_huge_page); spin_lock(&hugetlb_lock); nr_huge_pages++; @@ -735,7 +740,7 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, entry = pte_mkwrite(pte_mkdirty(mk_pte(page, vma->vm_page_prot))); } else { - entry = pte_wrprotect(mk_pte(page, vma->vm_page_prot)); + entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot)); } entry = pte_mkyoung(entry); entry = pte_mkhuge(entry); @@ -748,8 +753,8 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma, { pte_t entry; - entry = pte_mkwrite(pte_mkdirty(*ptep)); - if (ptep_set_access_flags(vma, address, ptep, entry, 1)) { + entry = pte_mkwrite(pte_mkdirty(huge_ptep_get(ptep))); + if (huge_ptep_set_access_flags(vma, address, ptep, entry, 1)) { update_mmu_cache(vma, address, entry); } } @@ -779,10 +784,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, spin_lock(&dst->page_table_lock); spin_lock(&src->page_table_lock); - if (!pte_none(*src_pte)) { + if (!huge_pte_none(huge_ptep_get(src_pte))) { if (cow) - ptep_set_wrprotect(src, addr, src_pte); - entry = *src_pte; + huge_ptep_set_wrprotect(src, addr, src_pte); + entry = huge_ptep_get(src_pte); ptepage = pte_page(entry); get_page(ptepage); set_huge_pte_at(dst, addr, dst_pte, entry); @@ -826,7 +831,7 @@ void __unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start, continue; pte = huge_ptep_get_and_clear(mm, address, ptep); - if (pte_none(pte)) + if (huge_pte_none(pte)) continue; page = pte_page(pte); @@ -890,7 +895,7 @@ static int hugetlb_cow(struct mm_struct *mm, struct vm_area_struct *vma, spin_lock(&mm->page_table_lock); ptep = 
huge_pte_offset(mm, address & HPAGE_MASK); - if (likely(pte_same(*ptep, pte))) { + if (likely(pte_same(huge_ptep_get(ptep), pte))) { /* Break COW */ huge_ptep_clear_flush(vma, address, ptep); set_huge_pte_at(mm, address, ptep, @@ -960,7 +965,7 @@ retry: goto backout; ret = 0; - if (!pte_none(*ptep)) + if (!huge_pte_none(huge_ptep_get(ptep))) goto backout; new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE) @@ -1002,8 +1007,8 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * the same page in the page cache. */ mutex_lock(&hugetlb_instantiation_mutex); - entry = *ptep; - if (pte_none(entry)) { + entry = huge_ptep_get(ptep); + if (huge_pte_none(entry)) { ret = hugetlb_no_page(mm, vma, address, ptep, write_access); mutex_unlock(&hugetlb_instantiation_mutex); return ret; @@ -1013,7 +1018,7 @@ int hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, spin_lock(&mm->page_table_lock); /* Check for a racing update before calling hugetlb_cow */ - if (likely(pte_same(entry, *ptep))) + if (likely(pte_same(entry, huge_ptep_get(ptep)))) if (write_access && !pte_write(entry)) ret = hugetlb_cow(mm, vma, address, ptep, entry); spin_unlock(&mm->page_table_lock); @@ -1043,7 +1048,8 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, */ pte = huge_pte_offset(mm, vaddr & HPAGE_MASK); - if (!pte || pte_none(*pte) || (write && !pte_write(*pte))) { + if (!pte || huge_pte_none(huge_ptep_get(pte)) || + (write && !pte_write(huge_ptep_get(pte)))) { int ret; spin_unlock(&mm->page_table_lock); @@ -1059,7 +1065,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, } pfn_offset = (vaddr & ~HPAGE_MASK) >> PAGE_SHIFT; - page = pte_page(*pte); + page = pte_page(huge_ptep_get(pte)); same_page: if (pages) { get_page(page); @@ -1108,7 +1114,7 @@ void hugetlb_change_protection(struct vm_area_struct *vma, continue; if (huge_pmd_unshare(mm, &address, ptep)) continue; - if (!pte_none(*ptep)) { + if 
(!huge_pte_none(huge_ptep_get(ptep))) { pte = huge_ptep_get_and_clear(mm, address, ptep); pte = pte_mkhuge(pte_modify(pte, newprot)); set_huge_pte_at(mm, address, ptep, pte); -- cgit v1.2.3 From 04753278769f3b6c3b79a080edb52f21d83bf6e2 Mon Sep 17 00:00:00 2001 From: Yasunori Goto Date: Mon, 28 Apr 2008 02:13:31 -0700 Subject: memory hotplug: register section/node id to free This patch set frees pages that were allocated by bootmem, for memory hot-remove. Some memory-management structures, e.g. memmap, are allocated by bootmem. To remove memory physically, some of them must be freed, depending on the circumstances. This patch set lays the groundwork for freeing those pages, and frees memmaps. The basic idea is to use the otherwise unused members of struct page to record who uses each bootmem page (a section number or a node id). When a section is being removed, the kernel can check this information, which solves several issues. 1) When the memmap of the section being removed was allocated by bootmem on another section, it should (and can) be freed. 2) When the memmap of the section being removed was allocated on the same section, it must not be freed: the section must already be logically offlined, with all its pages isolated from the page allocator. If the memmap were freed, the page allocator could hand out memory that is about to be physically removed. 3) When the section being removed holds another section's memmap, the kernel can easily tell the user which section must be removed first. (Not implemented yet.) 4) In case 2) above, page isolation can check for and skip the memmap's pages during logical memory offline (offline_pages()). The current page isolation code fails in this case because such a page is just a reserved page, and it cannot tell whether the page can be removed. This patch makes that distinction possible. (Not implemented yet.) 5) Node information such as the pgdat has similar issues, which this approach can also solve. (Not implemented yet, beyond recording the node id in the pages.)
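The bookkeeping described above (a magic value in _mapcount, the owner's id in page->private, and one extra reference per user) can be sketched in userspace. This is a minimal mock, not kernel code: struct mock_page and the mock_* helpers are hypothetical stand-ins for struct page, get_page_bootmem() and put_page_bootmem(), and the "free" step only sets a flag instead of calling __free_pages_bootmem().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the patch's magic values: 0xfffffffe,
 * 0xfffffffd and 0xfffffffc stored in a 32-bit signed _mapcount read
 * back as -2, -3 and -4, all below the normal minimum of -1. */
enum { SECTION_INFO = -2, MIX_INFO = -3, NODE_INFO = -4 };

/* Mock of the struct page fields the patch reuses. */
struct mock_page {
	int mapcount;          /* -1 means "not mapped" */
	int count;             /* reference count; bootmem pages start at 1 */
	unsigned long private; /* section number or node id of the user */
	bool freed;            /* set when handed back to the allocator */
};

/* Record who uses this bootmem page and take a reference for that user. */
static void mock_get_page_bootmem(unsigned long info, struct mock_page *page,
				  int magic)
{
	page->mapcount = magic;
	page->private = info;
	page->count++;
}

/* Drop one user's reference; free once only the initial one remains. */
static void mock_put_page_bootmem(struct mock_page *page)
{
	assert(page->mapcount < -1); /* mirrors BUG_ON(magic >= -1) */
	if (--page->count == 1) {
		page->private = 0;
		page->mapcount = -1; /* like reset_page_mapcount() */
		page->freed = true;
	}
}
```

A page shared by two users (for example, a usemap page tagged MIX_INFO) is freed only after both of them have dropped their references.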
Fortunately, the current bootmem allocator only keeps the PageReserved flag and doesn't use any other members of struct page, and the users of bootmem don't use them either. This patch: register the node or section id of the pages allocated by bootmem, so that the kernel can tell which node/section uses them. This is the basis for hot-removing sections or nodes. Signed-off-by: Yasunori Goto Cc: Badari Pulavarty Cc: Yinghai Lu Cc: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/memory_hotplug.h | 27 ++++++++++++ include/linux/mmzone.h | 1 + mm/bootmem.c | 1 + mm/memory_hotplug.c | 99 +++++++++++++++++++++++++++++++++++++++++- mm/sparse.c | 3 +- 5 files changed, 128 insertions(+), 3 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index aca9c65f8d08..73e358612eaf 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -11,6 +11,15 @@ struct pglist_data; struct mem_section; #ifdef CONFIG_MEMORY_HOTPLUG + +/* + * Magic number for free bootmem. + * The normal smallest mapcount is -1. Here is smaller value than it. + */ +#define SECTION_INFO 0xfffffffe +#define MIX_INFO 0xfffffffd +#define NODE_INFO 0xfffffffc + /* * pgdat resizing functions */ @@ -145,6 +154,18 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat) #endif /* CONFIG_NUMA */ #endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */ +#ifdef CONFIG_SPARSEMEM_VMEMMAP +static inline void register_page_bootmem_info_node(struct pglist_data *pgdat) +{ +} +static inline void put_page_bootmem(struct page *page) +{ +} +#else +extern void register_page_bootmem_info_node(struct pglist_data *pgdat); +extern void put_page_bootmem(struct page *page); +#endif + #else /* ! 
CONFIG_MEMORY_HOTPLUG */ /* * Stub functions for when hotplug is off @@ -172,6 +193,10 @@ static inline int mhp_notimplemented(const char *func) return -ENOSYS; } +static inline void register_page_bootmem_info_node(struct pglist_data *pgdat) +{ +} + #endif /* ! CONFIG_MEMORY_HOTPLUG */ extern int add_memory(int nid, u64 start, u64 size); @@ -180,5 +205,7 @@ extern int remove_memory(u64 start, u64 size); extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn, int nr_pages); extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms); +extern struct page *sparse_decode_mem_map(unsigned long coded_mem_map, + unsigned long pnum); #endif /* __LINUX_MEMORY_HOTPLUG_H */ diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index c3828497f41d..aad98003176f 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -896,6 +896,7 @@ static inline struct mem_section *__nr_to_section(unsigned long nr) return &mem_section[SECTION_NR_TO_ROOT(nr)][nr & SECTION_ROOT_MASK]; } extern int __section_nr(struct mem_section* ms); +extern unsigned long usemap_size(void); /* * We use the lower bits of the mem_map pointer to store diff --git a/mm/bootmem.c b/mm/bootmem.c index b6791646143e..369624d2789c 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -461,6 +461,7 @@ void __init free_bootmem_node(pg_data_t *pgdat, unsigned long physaddr, unsigned long __init free_all_bootmem_node(pg_data_t *pgdat) { + register_page_bootmem_info_node(pgdat); return free_all_bootmem_core(pgdat); } diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index c8b3ca79de2d..cba36ef0d506 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -58,8 +58,105 @@ static void release_memory_resource(struct resource *res) return; } - #ifdef CONFIG_MEMORY_HOTPLUG_SPARSE +#ifndef CONFIG_SPARSEMEM_VMEMMAP +static void get_page_bootmem(unsigned long info, struct page *page, int magic) +{ + atomic_set(&page->_mapcount, magic); + SetPagePrivate(page); + 
set_page_private(page, info); + atomic_inc(&page->_count); +} + +void put_page_bootmem(struct page *page) +{ + int magic; + + magic = atomic_read(&page->_mapcount); + BUG_ON(magic >= -1); + + if (atomic_dec_return(&page->_count) == 1) { + ClearPagePrivate(page); + set_page_private(page, 0); + reset_page_mapcount(page); + __free_pages_bootmem(page, 0); + } + +} + +void register_page_bootmem_info_section(unsigned long start_pfn) +{ + unsigned long *usemap, mapsize, section_nr, i; + struct mem_section *ms; + struct page *page, *memmap; + + if (!pfn_valid(start_pfn)) + return; + + section_nr = pfn_to_section_nr(start_pfn); + ms = __nr_to_section(section_nr); + + /* Get section's memmap address */ + memmap = sparse_decode_mem_map(ms->section_mem_map, section_nr); + + /* + * Get page for the memmap's phys address + * XXX: need more consideration for sparse_vmemmap... + */ + page = virt_to_page(memmap); + mapsize = sizeof(struct page) * PAGES_PER_SECTION; + mapsize = PAGE_ALIGN(mapsize) >> PAGE_SHIFT; + + /* remember memmap's page */ + for (i = 0; i < mapsize; i++, page++) + get_page_bootmem(section_nr, page, SECTION_INFO); + + usemap = __nr_to_section(section_nr)->pageblock_flags; + page = virt_to_page(usemap); + + mapsize = PAGE_ALIGN(usemap_size()) >> PAGE_SHIFT; + + for (i = 0; i < mapsize; i++, page++) + get_page_bootmem(section_nr, page, MIX_INFO); + +} + +void register_page_bootmem_info_node(struct pglist_data *pgdat) +{ + unsigned long i, pfn, end_pfn, nr_pages; + int node = pgdat->node_id; + struct page *page; + struct zone *zone; + + nr_pages = PAGE_ALIGN(sizeof(struct pglist_data)) >> PAGE_SHIFT; + page = virt_to_page(pgdat); + + for (i = 0; i < nr_pages; i++, page++) + get_page_bootmem(node, page, NODE_INFO); + + zone = &pgdat->node_zones[0]; + for (; zone < pgdat->node_zones + MAX_NR_ZONES - 1; zone++) { + if (zone->wait_table) { + nr_pages = zone->wait_table_hash_nr_entries + * sizeof(wait_queue_head_t); + nr_pages = PAGE_ALIGN(nr_pages) >> PAGE_SHIFT; + 
page = virt_to_page(zone->wait_table); + + for (i = 0; i < nr_pages; i++, page++) + get_page_bootmem(node, page, NODE_INFO); + } + } + + pfn = pgdat->node_start_pfn; + end_pfn = pfn + pgdat->node_spanned_pages; + + /* register_section info */ + for (; pfn < end_pfn; pfn += PAGES_PER_SECTION) + register_page_bootmem_info_section(pfn); + +} +#endif /* !CONFIG_SPARSEMEM_VMEMMAP */ + static int __add_zone(struct zone *zone, unsigned long phys_start_pfn) { struct pglist_data *pgdat = zone->zone_pgdat; diff --git a/mm/sparse.c b/mm/sparse.c index 186a85bf7912..8903c484389a 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -210,7 +210,6 @@ static unsigned long sparse_encode_mem_map(struct page *mem_map, unsigned long p /* * Decode mem_map from the coded memmap */ -static struct page *sparse_decode_mem_map(unsigned long coded_mem_map, unsigned long pnum) { /* mask off the extra low bits of information */ @@ -233,7 +232,7 @@ static int __meminit sparse_init_one_section(struct mem_section *ms, return 1; } -static unsigned long usemap_size(void) +unsigned long usemap_size(void) { unsigned long size_bytes; size_bytes = roundup(SECTION_BLOCKFLAGS_BITS, 8) / 8; -- cgit v1.2.3 From 9d99217a02a06a7cc83f065b73e976970970c58c Mon Sep 17 00:00:00 2001 From: Yasunori Goto Date: Mon, 28 Apr 2008 02:13:32 -0700 Subject: memory hotplug: align memmap to page size To make the memmap easier to free, this patch aligns it to the page size. The bootmem allocator may pack several objects into one page, which is bad for freeing the memmap on memory hot-remove.
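The effect of wrapping the memmap size in PAGE_ALIGN() can be checked with plain arithmetic. This sketch assumes a 4 KiB page; SKETCH_PAGE_ALIGN and memmap_bytes() are hypothetical helpers, and the struct page sizes used with them are illustrative values, not the size on any particular architecture.

```c
#include <assert.h>

/* Hypothetical 4 KiB page size for the sketch. */
#define SKETCH_PAGE_SIZE 4096UL
/* Round x up to the next page boundary, in the style of PAGE_ALIGN(). */
#define SKETCH_PAGE_ALIGN(x) \
	(((x) + SKETCH_PAGE_SIZE - 1) & ~(SKETCH_PAGE_SIZE - 1))

/* Bytes reserved for one section's memmap once it is page-aligned, so
 * no other bootmem object can share its last page. */
static unsigned long memmap_bytes(unsigned long page_struct_size,
				  unsigned long pages_per_section)
{
	return SKETCH_PAGE_ALIGN(page_struct_size * pages_per_section);
}
```

For example, 1000 entries of 60 bytes (60000 bytes) round up to 15 pages (61440 bytes). The 1440-byte tail stays unused instead of being handed to another bootmem user.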
Signed-off-by: Yasunori Goto Cc: Badari Pulavarty Cc: Yinghai Lu Cc: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/sparse.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index 8903c484389a..5398d48c360a 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -273,8 +273,8 @@ struct page __init *sparse_mem_map_populate(unsigned long pnum, int nid) if (map) return map; - map = alloc_bootmem_node(NODE_DATA(nid), - sizeof(struct page) * PAGES_PER_SECTION); + map = alloc_bootmem_pages_node(NODE_DATA(nid), + PAGE_ALIGN(sizeof(struct page) * PAGES_PER_SECTION)); return map; } #endif /* !CONFIG_SPARSEMEM_VMEMMAP */ -- cgit v1.2.3 From e70260aabea3af2a84b951e75166dcebe689b88e Mon Sep 17 00:00:00 2001 From: Yasunori Goto Date: Mon, 28 Apr 2008 02:13:32 -0700 Subject: memory hotplug: make alloc_bootmem_section() alloc_bootmem_section() allocates memory inside a specified section. A later patch uses it to place the usemaps in the same section as the pgdat.
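After allocating, alloc_bootmem_section() checks that the returned block really lies inside the requested section and frees it otherwise. That containment test can be sketched on its own. The geometry is hypothetical (4 KiB pages, 2^15 pages or 128 MiB per section), and the helper names are made up for illustration. This version tests the range's last byte (phys + size - 1).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical geometry: 4 KiB pages, 2^15 pages (128 MiB) per section. */
#define SKETCH_PAGE_SHIFT        12
#define SKETCH_PFN_SECTION_SHIFT 15

/* Physical address -> pfn -> section number. */
static unsigned long phys_to_section_nr(unsigned long phys)
{
	return (phys >> SKETCH_PAGE_SHIFT) >> SKETCH_PFN_SECTION_SHIFT;
}

/* True if [phys, phys + size) lies entirely inside section_nr.
 * Assumes size > 0 and that the range does not wrap around. */
static bool range_in_section(unsigned long phys, unsigned long size,
			     unsigned long section_nr)
{
	return phys_to_section_nr(phys) == section_nr &&
	       phys_to_section_nr(phys + size - 1) == section_nr;
}
```

With this geometry, section 1 covers 0x8000000 to 0xfffffff. A block that starts 64 bytes before 0x8000000 starts in section 0, so it is rejected.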
Signed-off-by: Yasunori Goto Cc: Badari Pulavarty Cc: Yinghai Lu Cc: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/bootmem.h | 2 ++ mm/bootmem.c | 31 +++++++++++++++++++++++++++++++ 2 files changed, 33 insertions(+) diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h index 4e4e340592fb..6a5dbdc8a7dc 100644 --- a/include/linux/bootmem.h +++ b/include/linux/bootmem.h @@ -101,6 +101,8 @@ extern void reserve_bootmem_node(pg_data_t *pgdat, extern void free_bootmem_node(pg_data_t *pgdat, unsigned long addr, unsigned long size); +extern void *alloc_bootmem_section(unsigned long size, + unsigned long section_nr); #ifndef CONFIG_HAVE_ARCH_BOOTMEM_NODE #define alloc_bootmem_node(pgdat, x) \ diff --git a/mm/bootmem.c b/mm/bootmem.c index 369624d2789c..e8fb927392b9 100644 --- a/mm/bootmem.c +++ b/mm/bootmem.c @@ -545,6 +545,37 @@ void * __init __alloc_bootmem_node(pg_data_t *pgdat, unsigned long size, return __alloc_bootmem(size, align, goal); } +#ifdef CONFIG_SPARSEMEM +void * __init alloc_bootmem_section(unsigned long size, + unsigned long section_nr) +{ + void *ptr; + unsigned long limit, goal, start_nr, end_nr, pfn; + struct pglist_data *pgdat; + + pfn = section_nr_to_pfn(section_nr); + goal = PFN_PHYS(pfn); + limit = PFN_PHYS(section_nr_to_pfn(section_nr + 1)) - 1; + pgdat = NODE_DATA(early_pfn_to_nid(pfn)); + ptr = __alloc_bootmem_core(pgdat->bdata, size, SMP_CACHE_BYTES, goal, + limit); + + if (!ptr) + return NULL; + + start_nr = pfn_to_section_nr(PFN_DOWN(__pa(ptr))); + end_nr = pfn_to_section_nr(PFN_DOWN(__pa(ptr) + size)); + if (start_nr != section_nr || end_nr != section_nr) { + printk(KERN_WARNING "alloc_bootmem failed on section %ld.\n", + section_nr); + free_bootmem_core(pgdat->bdata, __pa(ptr), size); + ptr = NULL; + } + + return ptr; +} +#endif + #ifndef ARCH_LOW_ADDRESS_LIMIT #define ARCH_LOW_ADDRESS_LIMIT 0xffffffffUL #endif -- cgit v1.2.3 From 86f6dae1377523689bd8468fed2f2dd180fc0560 Mon Sep 17 00:00:00 
2001 From: Yasunori Goto Date: Mon, 28 Apr 2008 02:13:33 -0700 Subject: memory hotplug: allocate usemap on the section with pgdat With this patch, usemaps are allocated on the section that holds the pgdat. Because a usemap is very small, the usemaps of many sections end up packed into a single page. A section holding such a page can't be removed until the other sections that use it have been removed, and this dependency is undesirable for memory removal. The pgdat has a similar property: a section holding the pgdat area must be the last section removed on its node. So if section A has the pgdat and section B has the usemap for section A, neither section can be removed, because they depend on each other. To solve this, this patch collects the usemaps on the same section as the pgdat. If the other sections have no dependencies, that section can then be removed last. Signed-off-by: Yasunori Goto Cc: Badari Pulavarty Cc: Yinghai Lu Cc: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/sparse.c | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/mm/sparse.c b/mm/sparse.c index 5398d48c360a..08f053218ee8 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -249,11 +249,22 @@ static unsigned long *__kmalloc_section_usemap(void) static unsigned long *__init sparse_early_usemap_alloc(unsigned long pnum) { - unsigned long *usemap; + unsigned long *usemap, section_nr; struct mem_section *ms = __nr_to_section(pnum); int nid = sparse_early_nid(ms); + struct pglist_data *pgdat = NODE_DATA(nid); - usemap = alloc_bootmem_node(NODE_DATA(nid), usemap_size()); + /* + * Usemap's page can't be freed until freeing other sections + * which use it. And, Pgdat has same feature. + * If section A has pgdat and section B has usemap for other + * sections (includes section A), both sections can't be removed, + * because there is the dependency each other. + * To solve above issue, this collects all usemap on the same section + * which has pgdat.
+ */ + section_nr = pfn_to_section_nr(__pa(pgdat) >> PAGE_SHIFT); + usemap = alloc_bootmem_section(usemap_size(), section_nr); if (usemap) return usemap; -- cgit v1.2.3 From 0c0a4a517a31e05efb38304668198a873bfec6ca Mon Sep 17 00:00:00 2001 From: Yasunori Goto Date: Mon, 28 Apr 2008 02:13:34 -0700 Subject: memory hotplug: free memmaps allocated by bootmem This patch frees memmaps that were allocated by bootmem. Freeing the usemap is not necessary, because its page may still be needed by other sections. If the section being removed is the last section on the node, it is the final user of the usemap page (the previous patch allocates the usemaps on that section). Even then the page must not be freed, because the section must be in the logically offline state, with all its pages isolated from the page allocator. If the page were freed, the page allocator could hand out memory that is about to be physically removed, which would be a disaster. So this patch leaves the usemap as it is. Signed-off-by: Yasunori Goto Cc: Badari Pulavarty Cc: Yinghai Lu Cc: Yasunori Goto Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/internal.h | 3 +-- mm/memory_hotplug.c | 11 +++++++++++ mm/page_alloc.c | 2 +- mm/sparse.c | 51 +++++++++++++++++++++++++++++++++++++++++++++++---- 4 files changed, 60 insertions(+), 7 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index 789727309f4d..0034e947e4bc 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -34,8 +34,7 @@ static inline void __put_page(struct page *page) atomic_dec(&page->_count); } -extern void __init __free_pages_bootmem(struct page *page, - unsigned int order); +extern void __free_pages_bootmem(struct page *page, unsigned int order); /* * function for dealing with page's order in buddy system.
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index cba36ef0d506..c4ba85c8cb00 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -198,6 +198,16 @@ static int __add_section(struct zone *zone, unsigned long phys_start_pfn) return register_new_memory(__pfn_to_section(phys_start_pfn)); } +#ifdef CONFIG_SPARSEMEM_VMEMMAP +static int __remove_section(struct zone *zone, struct mem_section *ms) +{ + /* + * XXX: Freeing memmap with vmemmap is not implement yet. + * This should be removed later. + */ + return -EBUSY; +} +#else static int __remove_section(struct zone *zone, struct mem_section *ms) { unsigned long flags; @@ -216,6 +226,7 @@ static int __remove_section(struct zone *zone, struct mem_section *ms) pgdat_resize_unlock(pgdat, &flags); return 0; } +#endif /* * Reasonably generic function for adding memory. It is diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e0fc3baba843..d3358efdf4e6 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -546,7 +546,7 @@ static void __free_pages_ok(struct page *page, unsigned int order) /* * permit the bootmem allocator to evade page validation on high-order frees */ -void __init __free_pages_bootmem(struct page *page, unsigned int order) +void __free_pages_bootmem(struct page *page, unsigned int order) { if (order == 0) { __ClearPageReserved(page); diff --git a/mm/sparse.c b/mm/sparse.c index 08f053218ee8..dff71f173ae9 100644 --- a/mm/sparse.c +++ b/mm/sparse.c @@ -8,6 +8,7 @@ #include #include #include +#include "internal.h" #include #include #include @@ -376,6 +377,9 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) { return; /* XXX: Not implemented yet */ } +static void free_map_bootmem(struct page *page, unsigned long nr_pages) +{ +} #else static struct page *__kmalloc_section_memmap(unsigned long nr_pages) { @@ -413,17 +417,47 @@ static void __kfree_section_memmap(struct page *memmap, unsigned long nr_pages) free_pages((unsigned long)memmap, get_order(sizeof(struct 
page) * nr_pages)); } + +static void free_map_bootmem(struct page *page, unsigned long nr_pages) +{ + unsigned long maps_section_nr, removing_section_nr, i; + int magic; + + for (i = 0; i < nr_pages; i++, page++) { + magic = atomic_read(&page->_mapcount); + + BUG_ON(magic == NODE_INFO); + + maps_section_nr = pfn_to_section_nr(page_to_pfn(page)); + removing_section_nr = page->private; + + /* + * When this function is called, the removing section is + * logical offlined state. This means all pages are isolated + * from page allocator. If removing section's memmap is placed + * on the same section, it must not be freed. + * If it is freed, page allocator may allocate it which will + * be removed physically soon. + */ + if (maps_section_nr != removing_section_nr) + put_page_bootmem(page); + } +} #endif /* CONFIG_SPARSEMEM_VMEMMAP */ static void free_section_usemap(struct page *memmap, unsigned long *usemap) { + struct page *usemap_page; + unsigned long nr_pages; + if (!usemap) return; + usemap_page = virt_to_page(usemap); /* * Check to see if allocation came from hot-plug-add */ - if (PageSlab(virt_to_page(usemap))) { + if (PageSlab(usemap_page)) { kfree(usemap); if (memmap) __kfree_section_memmap(memmap, PAGES_PER_SECTION); @@ -431,10 +465,19 @@ static void free_section_usemap(struct page *memmap, unsigned long *usemap) } /* - * TODO: Allocations came from bootmem - how do I free up ? + * The usemap came from bootmem. This is packed with other usemaps + * on the section which has pgdat at boot time. Just keep it as is now. 
*/ - printk(KERN_WARNING "Not freeing up allocations from bootmem " - "- leaking memory\n"); + + if (memmap) { + struct page *memmap_page; + memmap_page = virt_to_page(memmap); + + nr_pages = PAGE_ALIGN(PAGES_PER_SECTION * sizeof(struct page)) + >> PAGE_SHIFT; + + free_map_bootmem(memmap_page, nr_pages); + } } /* -- cgit v1.2.3 From 97d87c9710bc6c5f2585fb9dc58f5bedbe996f10 Mon Sep 17 00:00:00 2001 From: Li Zefan Date: Mon, 28 Apr 2008 02:13:35 -0700 Subject: oom_kill: remove unused parameter in badness() In commit 4c4a22148909e4c003562ea7ffe0a06e26919e3c, we moved the memcontroller-related code from badness() to select_bad_process(), so the parameter 'mem' in badness() is unused now. Signed-off-by: Li Zefan Acked-by: Balbir Singh Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/oom_kill.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/mm/oom_kill.c b/mm/oom_kill.c index e41504aa5da9..8a5467ee6265 100644 --- a/mm/oom_kill.c +++ b/mm/oom_kill.c @@ -53,8 +53,7 @@ static DEFINE_SPINLOCK(zone_scan_mutex); * of least surprise ... (be careful when you change it) */ -unsigned long badness(struct task_struct *p, unsigned long uptime, - struct mem_cgroup *mem) +unsigned long badness(struct task_struct *p, unsigned long uptime) { unsigned long points, cpu_time, run_time, s; struct mm_struct *mm; @@ -256,7 +255,7 @@ static struct task_struct *select_bad_process(unsigned long *ppoints, if (p->oomkilladj == OOM_DISABLE) continue; - points = badness(p, uptime.tv_sec, mem); + points = badness(p, uptime.tv_sec); if (points > *ppoints || !chosen) { chosen = p; *ppoints = points; -- cgit v1.2.3 From 2309f9e6fe3f1de661eab9613f7903ab4420c753 Mon Sep 17 00:00:00 2001 From: Pavel Machek Date: Mon, 28 Apr 2008 02:13:35 -0700 Subject: mm/page_alloc.c: remove hand-coded get_order() Remove hand-coded get_order() from page_alloc.c. 
Signed-off-by: Pavel Machek Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/page_alloc.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/mm/page_alloc.c b/mm/page_alloc.c index d3358efdf4e6..d1cf4f05dcda 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -4345,9 +4345,7 @@ void *__init alloc_large_system_hash(const char *tablename, else if (hashdist) table = __vmalloc(size, GFP_ATOMIC, PAGE_KERNEL); else { - unsigned long order; - for (order = 0; ((1UL << order) << PAGE_SHIFT) < size; order++) - ; + unsigned long order = get_order(size); table = (void*) __get_free_pages(GFP_ATOMIC, order); /* * If bucketsize is not a power-of-two, we may free -- cgit v1.2.3 From 8cece85ec744bdc7ea0fc2d33f65b3f031c28468 Mon Sep 17 00:00:00 2001 From: KAMEZAWA Hiroyuki Date: Mon, 28 Apr 2008 02:13:36 -0700 Subject: mm: fix broken gfp_zone with __GFP_THISNODE The hack "base = MAX_NR_ZONES" for __GFP_THISNODE was needed by the old zonelists. The new zonelist[] has a dedicated list for __GFP_THISNODE, so this hack is now incorrect and should be removed.
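After the fix, __GFP_THISNODE no longer changes the value gfp_zone() returns, so the result is always a plain zone index. The mock below uses hypothetical flag bits and zone numbers (the real values live in <linux/gfp.h> and <linux/mmzone.h>). It also assumes CONFIG_ZONE_DMA, CONFIG_ZONE_DMA32 and CONFIG_HIGHMEM are all enabled, so the #ifdefs are dropped.

```c
#include <assert.h>

/* Hypothetical flag bits and zone numbers, for illustration only. */
enum {
	F_DMA = 0x1, F_HIGHMEM = 0x2, F_DMA32 = 0x4, F_MOVABLE = 0x8,
	F_THISNODE = 0x40000
};
enum zone_sketch { Z_DMA, Z_DMA32, Z_NORMAL, Z_HIGHMEM, Z_MOVABLE, Z_NR };

/* Same decision order as the patched gfp_zone(); F_THISNODE is ignored,
 * so the result always stays below Z_NR. */
static enum zone_sketch gfp_zone_sketch(unsigned int flags)
{
	if (flags & F_DMA)
		return Z_DMA;
	if (flags & F_DMA32)
		return Z_DMA32;
	if ((flags & (F_HIGHMEM | F_MOVABLE)) == (F_HIGHMEM | F_MOVABLE))
		return Z_MOVABLE;
	if (flags & F_HIGHMEM)
		return Z_HIGHMEM;
	return Z_NORMAL;
}
```

With the old code, a NUMA build would have returned MAX_NR_ZONES + ZONE_NORMAL for a plain __GFP_THISNODE request. That index belonged to the old zonelist layout.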
Signed-off-by: KAMEZAWA Hiroyuki Cc: Mel Gorman Cc: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/gfp.h | 17 +++++------------ 1 file changed, 5 insertions(+), 12 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index 898aa9d5b6c2..c37653b6843f 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -119,29 +119,22 @@ static inline int allocflags_to_migratetype(gfp_t gfp_flags) static inline enum zone_type gfp_zone(gfp_t flags) { - int base = 0; - -#ifdef CONFIG_NUMA - if (flags & __GFP_THISNODE) - base = MAX_NR_ZONES; -#endif - #ifdef CONFIG_ZONE_DMA if (flags & __GFP_DMA) - return base + ZONE_DMA; + return ZONE_DMA; #endif #ifdef CONFIG_ZONE_DMA32 if (flags & __GFP_DMA32) - return base + ZONE_DMA32; + return ZONE_DMA32; #endif if ((flags & (__GFP_HIGHMEM | __GFP_MOVABLE)) == (__GFP_HIGHMEM | __GFP_MOVABLE)) - return base + ZONE_MOVABLE; + return ZONE_MOVABLE; #ifdef CONFIG_HIGHMEM if (flags & __GFP_HIGHMEM) - return base + ZONE_HIGHMEM; + return ZONE_HIGHMEM; #endif - return base + ZONE_NORMAL; + return ZONE_NORMAL; } /* -- cgit v1.2.3 From 468fd62ed9090ccbe872489df5d0d099510df4b5 Mon Sep 17 00:00:00 2001 From: Dimitri Sivanich Date: Mon, 28 Apr 2008 02:13:37 -0700 Subject: vmstats: add cond_resched() to refresh_cpu_vm_stats() We've found that it can take quite a bit of time (hundreds of microseconds) to get through the zone loop in refresh_cpu_vm_stats(). Add a cond_resched() to allow other threads to run in the non-preemptive case.
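As a rough userspace analogy only, the loop below folds per-zone counters into global totals and offers the CPU back after each zone, mirroring where the patch places cond_resched(). struct zone_counters, NR_ITEMS and fold_all() are hypothetical. POSIX sched_yield() differs from cond_resched(): it always yields, whereas cond_resched() reschedules only when a reschedule is pending.

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

#define NR_ITEMS 4 /* hypothetical number of counters per zone */

struct zone_counters {
	long diff[NR_ITEMS]; /* pending per-zone deltas */
};

/* Fold every zone's deltas into the global totals, yielding once per
 * zone. Returns the number of yield points taken. */
static long fold_all(struct zone_counters *zones, size_t nr_zones,
		     long *global)
{
	size_t z, i;
	long yields = 0;

	for (z = 0; z < nr_zones; z++) {
		for (i = 0; i < NR_ITEMS; i++) {
			global[i] += zones[z].diff[i];
			zones[z].diff[i] = 0;
		}
		sched_yield(); /* voluntary scheduling point per zone */
		yields++;
	}
	return yields;
}
```

Placing the yield between zones rather than inside the inner loop keeps the per-zone work atomic from the caller's point of view, while still bounding how long the loop runs without a scheduling point.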
Signed-off-by: Dimitri Sivanich Acked-by: Christoph Lameter Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/vmstat.c | 1 + 1 file changed, 1 insertion(+) diff --git a/mm/vmstat.c b/mm/vmstat.c index 4c21670f8d91..ec6035eda933 100644 --- a/mm/vmstat.c +++ b/mm/vmstat.c @@ -322,6 +322,7 @@ void refresh_cpu_vm_stats(int cpu) p->expire = 3; #endif } + cond_resched(); #ifdef CONFIG_NUMA /* * Deal with draining the remote pageset of this -- cgit v1.2.3 From 4016a1390d07f15b267eecb20e76a48fd5c524ef Mon Sep 17 00:00:00 2001 From: Michael Hennerich Date: Mon, 28 Apr 2008 02:13:38 -0700 Subject: mm/nommu.c: return 0 from kobjsize with invalid objects Don't perform kobjsize operations on objects the kernel doesn't manage. On Blackfin, drivers can get DMA-coherent memory by calling dma_alloc_coherent(). On nommu we do this by configuring a chunk of uncached memory at the top of memory. Since we don't want the kernel to use the uncached memory, we lie to the kernel and tell it that its memory runs from 0 up to the start of the uncached DMA-coherent section. This all works well until this memory gets exposed to userspace (with a frame buffer). The process's maps then show the framebuffer: root:/proc> cat maps [snip] 03f0ef00-03f34700 rw-p 00000000 1f:00 192 /dev/fb0 root:/proc> This is outside the "normal" range for the kernel. When the kernel tries to find the size of this object (when you run ps), it dies in kobjsize() in nommu.c at BUG_ON(page->index >= MAX_ORDER); because the page we are referring to lies outside what the kernel thinks is its maximum valid memory: root:~> while [ 1 ]; do ps > /dev/null; done kernel BUG at mm/nommu.c:119! Kernel panic - not syncing: BUG! We fixed this by adding a check that rejects out-of-range object pointers, just as kobjsize() already does for NULL pointers.
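The fix amounts to an early range check before any page lookup. This userspace mock shows the idea. sketch_memory_end stands in for the kernel's memory_end, and its value 0x03f00000 is hypothetical, chosen to sit just below the /dev/fb0 mapping in the log above. lookup_size() is a placeholder for the real slab/page-order size lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical end of kernel-managed memory; the uncached DMA region
 * (and the framebuffer mapped from it) starts at or above this. */
static const uintptr_t sketch_memory_end = 0x03f00000;

/* Placeholder for the real size lookup on managed objects. */
static unsigned int lookup_size(const void *objp)
{
	(void)objp;
	return 64;
}

/* Return 0 for NULL or out-of-range pointers instead of looking them
 * up, so unmanaged memory never reaches the page-based code path. */
static unsigned int kobjsize_sketch(const void *objp)
{
	if (!objp || (uintptr_t)objp >= sketch_memory_end)
		return 0;
	return lookup_size(objp);
}
```

Checking the address before calling virt_to_page() matters because the page lookup itself is what misbehaves for memory the kernel was told not to manage.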
Signed-off-by: Michael Hennerich Signed-off-by: Robin Getz Acked-by: David Howells Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- mm/nommu.c | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/mm/nommu.c b/mm/nommu.c index 5d8ae086f74e..1d32fe89d57b 100644 --- a/mm/nommu.c +++ b/mm/nommu.c @@ -105,7 +105,11 @@ unsigned int kobjsize(const void *objp) { struct page *page; - if (!objp || !((page = virt_to_page(objp)))) + /* + * If the object we have should not have ksize performed on it, + * return size of 0 + */ + if (!objp || (unsigned long)objp >= memory_end || !((page = virt_to_page(objp)))) return 0; if (PageSlab(page)) -- cgit v1.2.3 From 3898b1b4ebff8dcfbcf1807e0661585e06c9a91c Mon Sep 17 00:00:00 2001 From: "Andrew G. Morgan" Date: Mon, 28 Apr 2008 02:13:40 -0700 Subject: capabilities: implement per-process securebits Filesystem capability support makes it possible to do away with (set)uid-0 based privilege and use capabilities instead. That is, with filesystem support for capabilities but without this present patch, it is (conceptually) possible to manage a system with capabilities alone and never need to obtain privilege via (set)uid-0. Of course, conceptually isn't quite the same as currently possible, since few user applications, certainly not enough to run a viable system, are currently prepared to leverage capabilities to exercise privilege. Further, many applications exist that may never get upgraded in this way, and the kernel will continue to want to support their setuid-0-based privilege needs. Where pure-capability applications evolve and replace setuid-0 binaries, it is desirable that there be a mechanism by which they can contain their privilege. In addition to leveraging the per-process bounding and inheritable sets, this should include suppressing the privilege of the uid-0 superuser from the process' tree of children.
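The value 0x2f that this commit message passes to prctl(PR_SET_SECUREBITS, ...) can be decoded bit by bit. This sketch assumes the bit layout of <linux/securebits.h> in this series (SECURE_NOROOT = 0 up to SECURE_KEEP_CAPS_LOCKED = 5, each lock bit sitting directly above the bit it locks). The SB_* names are local to the sketch. It only does the bit arithmetic and does not call prctl(), which requires CAP_SETPCAP.

```c
#include <assert.h>

/* Masks built from the assumed <linux/securebits.h> bit positions. */
#define SB_NOROOT                 (1u << 0)
#define SB_NOROOT_LOCKED          (1u << 1)
#define SB_NO_SETUID_FIXUP        (1u << 2)
#define SB_NO_SETUID_FIXUP_LOCKED (1u << 3)
#define SB_KEEP_CAPS              (1u << 4)
#define SB_KEEP_CAPS_LOCKED       (1u << 5)

/* "Drop legacy uid-0 privilege": no root privilege, no setuid capability
 * fixups, keep-caps off, and every one of those choices locked. */
static unsigned int legacy_lockdown_bits(void)
{
	return SB_NOROOT | SB_NOROOT_LOCKED |
	       SB_NO_SETUID_FIXUP | SB_NO_SETUID_FIXUP_LOCKED |
	       SB_KEEP_CAPS_LOCKED;
}

/* The lock bit for a flag is the next bit up. */
static int bit_is_locked(unsigned int bits, unsigned int flag)
{
	return (bits & (flag << 1)) != 0;
}
```

Note that SB_KEEP_CAPS itself is clear in 0x2f while its lock bit is set. The process therefore cannot later re-enable keep-caps.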
The feature added by this patch can be leveraged to suppress the privilege associated with (set)uid-0. This suppression requires CAP_SETPCAP to initiate, and only immediately affects the 'current' process (it is inherited through fork()/exec()). This reimplementation differs significantly from the historical support for securebits, which was system-wide and unwieldy and has ultimately withered into a dead relic in the source of the modern kernel. With this patch applied, a process that is capable(CAP_SETPCAP) can now drop all legacy privilege (through uid=0) for itself and all subsequently fork()'d/exec()'d children with: prctl(PR_SET_SECUREBITS, 0x2f); This patch is a no-op unless CONFIG_SECURITY_FILE_CAPABILITIES is enabled at configure time. [akpm@linux-foundation.org: fix uninitialised var warning] [serue@us.ibm.com: capabilities: use cap_task_prctl when !CONFIG_SECURITY] Signed-off-by: Andrew G. Morgan Acked-by: Serge Hallyn Reviewed-by: James Morris Cc: Stephen Smalley Cc: Paul Moore Signed-off-by: Serge E.
Hallyn Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/capability.h | 3 +- include/linux/init_task.h | 3 +- include/linux/prctl.h | 9 +++- include/linux/sched.h | 3 +- include/linux/securebits.h | 25 ++++++++--- include/linux/security.h | 16 ++++--- kernel/sys.c | 27 +----------- security/capability.c | 1 + security/commoncap.c | 103 +++++++++++++++++++++++++++++++++++++++++---- security/dummy.c | 2 +- security/security.c | 4 +- security/selinux/hooks.c | 5 ++- 12 files changed, 141 insertions(+), 60 deletions(-) diff --git a/include/linux/capability.h b/include/linux/capability.h index 7d50ff6d269f..eaab759b1460 100644 --- a/include/linux/capability.h +++ b/include/linux/capability.h @@ -155,6 +155,7 @@ typedef struct kernel_cap_struct { * Add any capability from current's capability bounding set * to the current process' inheritable set * Allow taking bits out of capability bounding set + * Allow modification of the securebits for a process */ #define CAP_SETPCAP 8 @@ -490,8 +491,6 @@ extern const kernel_cap_t __cap_init_eff_set; int capable(int cap); int __capable(struct task_struct *t, int cap); -extern long cap_prctl_drop(unsigned long cap); - #endif /* __KERNEL__ */ #endif /* !_LINUX_CAPABILITY_H */ diff --git a/include/linux/init_task.h b/include/linux/init_task.h index 37a6f5bc4a92..bf6b8a61f8db 100644 --- a/include/linux/init_task.h +++ b/include/linux/init_task.h @@ -9,6 +9,7 @@ #include #include #include +#include #include #define INIT_FDTABLE \ @@ -172,7 +173,7 @@ extern struct group_info init_groups; .cap_inheritable = CAP_INIT_INH_SET, \ .cap_permitted = CAP_FULL_SET, \ .cap_bset = CAP_INIT_BSET, \ - .keep_capabilities = 0, \ + .securebits = SECUREBITS_DEFAULT, \ .user = INIT_USER, \ .comm = "swapper", \ .thread = INIT_THREAD, \ diff --git a/include/linux/prctl.h b/include/linux/prctl.h index 5c80b1939636..5ad79198d6f9 100644 --- a/include/linux/prctl.h +++ b/include/linux/prctl.h @@ -16,7 +16,8 @@ # define 
PR_UNALIGN_NOPRINT 1 /* silently fix up unaligned user accesses */ # define PR_UNALIGN_SIGBUS 2 /* generate SIGBUS on unaligned user access */ -/* Get/set whether or not to drop capabilities on setuid() away from uid 0 */ +/* Get/set whether or not to drop capabilities on setuid() away from + * uid 0 (as per security/commoncap.c) */ #define PR_GET_KEEPCAPS 7 #define PR_SET_KEEPCAPS 8 @@ -63,7 +64,7 @@ #define PR_GET_SECCOMP 21 #define PR_SET_SECCOMP 22 -/* Get/set the capability bounding set */ +/* Get/set the capability bounding set (as per security/commoncap.c) */ #define PR_CAPBSET_READ 23 #define PR_CAPBSET_DROP 24 @@ -73,4 +74,8 @@ # define PR_TSC_ENABLE 1 /* allow the use of the timestamp counter */ # define PR_TSC_SIGSEGV 2 /* throw a SIGSEGV instead of reading the TSC */ +/* Get/set securebits (as per security/commoncap.c) */ +#define PR_GET_SECUREBITS 27 +#define PR_SET_SECUREBITS 28 + #endif /* _LINUX_PRCTL_H */ diff --git a/include/linux/sched.h b/include/linux/sched.h index 9a4f3e63e3bf..024d72b47a0c 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -68,7 +68,6 @@ struct sched_param { #include #include #include -#include #include #include #include @@ -1133,7 +1132,7 @@ struct task_struct { gid_t gid,egid,sgid,fsgid; struct group_info *group_info; kernel_cap_t cap_effective, cap_inheritable, cap_permitted, cap_bset; - unsigned keep_capabilities:1; + unsigned securebits; struct user_struct *user; #ifdef CONFIG_KEYS struct key *request_key_auth; /* assumed request_key authority */ diff --git a/include/linux/securebits.h b/include/linux/securebits.h index 5b0617840fa4..c1f19dbceb05 100644 --- a/include/linux/securebits.h +++ b/include/linux/securebits.h @@ -3,28 +3,39 @@ #define SECUREBITS_DEFAULT 0x00000000 -extern unsigned securebits; - /* When set UID 0 has no special privileges. When unset, we support inheritance of root-permissions and suid-root executable under compatibility mode. 
We raise the effective and inheritable bitmasks *of the executable file* if the effective uid of the new process is 0. If the real uid is 0, we raise the inheritable bitmask of the executable file. */ -#define SECURE_NOROOT 0 +#define SECURE_NOROOT 0 +#define SECURE_NOROOT_LOCKED 1 /* make bit-0 immutable */ /* When set, setuid to/from uid 0 does not trigger capability-"fixes" to be compatible with old programs relying on set*uid to loose privileges. When unset, setuid doesn't change privileges. */ -#define SECURE_NO_SETUID_FIXUP 2 +#define SECURE_NO_SETUID_FIXUP 2 +#define SECURE_NO_SETUID_FIXUP_LOCKED 3 /* make bit-2 immutable */ + +/* When set, a process can retain its capabilities even after + transitioning to a non-root user (the set-uid fixup suppressed by + bit 2). Bit-4 is cleared when a process calls exec(); setting both + bit 4 and 5 will create a barrier through exec that no exec()'d + child can use this feature again. */ +#define SECURE_KEEP_CAPS 4 +#define SECURE_KEEP_CAPS_LOCKED 5 /* make bit-4 immutable */ /* Each securesetting is implemented using two bits. One bit specify whether the setting is on or off. The other bit specify whether the setting is fixed or not. A setting which is fixed cannot be changed from user-level. */ +#define issecure_mask(X) (1 << (X)) +#define issecure(X) (issecure_mask(X) & current->securebits) -#define issecure(X) ( (1 << (X+1)) & SECUREBITS_DEFAULT ? 
\ - (1 << (X)) & SECUREBITS_DEFAULT : \ - (1 << (X)) & securebits ) +#define SECURE_ALL_BITS (issecure_mask(SECURE_NOROOT) | \ + issecure_mask(SECURE_NO_SETUID_FIXUP) | \ + issecure_mask(SECURE_KEEP_CAPS)) +#define SECURE_ALL_LOCKS (SECURE_ALL_BITS << 1) #endif /* !_LINUX_SECUREBITS_H */ diff --git a/include/linux/security.h b/include/linux/security.h index 53a34539382a..e6299e50e210 100644 --- a/include/linux/security.h +++ b/include/linux/security.h @@ -34,8 +34,6 @@ #include #include -extern unsigned securebits; - /* Maximum number of letters for an LSM name string */ #define SECURITY_NAME_MAX 10 @@ -61,6 +59,8 @@ extern int cap_inode_need_killpriv(struct dentry *dentry); extern int cap_inode_killpriv(struct dentry *dentry); extern int cap_task_post_setuid (uid_t old_ruid, uid_t old_euid, uid_t old_suid, int flags); extern void cap_task_reparent_to_init (struct task_struct *p); +extern int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5, long *rc_p); extern int cap_task_setscheduler (struct task_struct *p, int policy, struct sched_param *lp); extern int cap_task_setioprio (struct task_struct *p, int ioprio); extern int cap_task_setnice (struct task_struct *p, int nice); @@ -720,7 +720,9 @@ static inline void security_free_mnt_opts(struct security_mnt_opts *opts) * @arg3 contains a argument. * @arg4 contains a argument. * @arg5 contains a argument. - * Return 0 if permission is granted. + * @rc_p contains a pointer to communicate back the forced return code + * Return 0 if permission is granted, and non-zero if the security module + * has taken responsibility (setting *rc_p) for the prctl call. * @task_reparent_to_init: * Set the security attributes in @p->security for a kernel thread that * is being reparented to the init task. 
@@ -1420,7 +1422,7 @@ struct security_operations { int (*task_wait) (struct task_struct * p); int (*task_prctl) (int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, - unsigned long arg5); + unsigned long arg5, long *rc_p); void (*task_reparent_to_init) (struct task_struct * p); void (*task_to_inode)(struct task_struct *p, struct inode *inode); @@ -1684,7 +1686,7 @@ int security_task_kill(struct task_struct *p, struct siginfo *info, int sig, u32 secid); int security_task_wait(struct task_struct *p); int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, - unsigned long arg4, unsigned long arg5); + unsigned long arg4, unsigned long arg5, long *rc_p); void security_task_reparent_to_init(struct task_struct *p); void security_task_to_inode(struct task_struct *p, struct inode *inode); int security_ipc_permission(struct kern_ipc_perm *ipcp, short flag); @@ -2271,9 +2273,9 @@ static inline int security_task_wait (struct task_struct *p) static inline int security_task_prctl (int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, - unsigned long arg5) + unsigned long arg5, long *rc_p) { - return 0; + return cap_task_prctl(option, arg2, arg3, arg3, arg5, rc_p); } static inline void security_task_reparent_to_init (struct task_struct *p) diff --git a/kernel/sys.c b/kernel/sys.c index 6a0cc71ee88d..f2a451366953 100644 --- a/kernel/sys.c +++ b/kernel/sys.c @@ -1632,10 +1632,9 @@ asmlinkage long sys_umask(int mask) asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, unsigned long arg5) { - long error; + long uninitialized_var(error); - error = security_task_prctl(option, arg2, arg3, arg4, arg5); - if (error) + if (security_task_prctl(option, arg2, arg3, arg4, arg5, &error)) return error; switch (option) { @@ -1688,17 +1687,6 @@ asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3, error = -EINVAL; break; - case PR_GET_KEEPCAPS: - if 
(current->keep_capabilities) - error = 1; - break; - case PR_SET_KEEPCAPS: - if (arg2 != 0 && arg2 != 1) { - error = -EINVAL; - break; - } - current->keep_capabilities = arg2; - break; case PR_SET_NAME: { struct task_struct *me = current; unsigned char ncomm[sizeof(me->comm)]; @@ -1732,17 +1720,6 @@ asmlinkage long sys_prctl(int option, unsigned long arg2, unsigned long arg3, case PR_SET_SECCOMP: error = prctl_set_seccomp(arg2); break; - - case PR_CAPBSET_READ: - if (!cap_valid(arg2)) - return -EINVAL; - return !!cap_raised(current->cap_bset, arg2); - case PR_CAPBSET_DROP: -#ifdef CONFIG_SECURITY_FILE_CAPABILITIES - return cap_prctl_drop(arg2); -#else - return -EINVAL; -#endif case PR_GET_TSC: error = GET_TSC_CTL(arg2); break; diff --git a/security/capability.c b/security/capability.c index 2c6e06d18fab..38ac54e3aed1 100644 --- a/security/capability.c +++ b/security/capability.c @@ -44,6 +44,7 @@ static struct security_operations capability_ops = { .task_setioprio = cap_task_setioprio, .task_setnice = cap_task_setnice, .task_post_setuid = cap_task_post_setuid, + .task_prctl = cap_task_prctl, .task_reparent_to_init = cap_task_reparent_to_init, .syslog = cap_syslog, diff --git a/security/commoncap.c b/security/commoncap.c index 852905789caf..e8c3f5e46705 100644 --- a/security/commoncap.c +++ b/security/commoncap.c @@ -24,11 +24,8 @@ #include #include #include - -/* Global security state */ - -unsigned securebits = SECUREBITS_DEFAULT; /* systemwide security settings */ -EXPORT_SYMBOL(securebits); +#include +#include int cap_netlink_send(struct sock *sk, struct sk_buff *skb) { @@ -368,7 +365,7 @@ void cap_bprm_apply_creds (struct linux_binprm *bprm, int unsafe) /* AUD: Audit candidate if current->cap_effective is set */ - current->keep_capabilities = 0; + current->securebits &= ~issecure_mask(SECURE_KEEP_CAPS); } int cap_bprm_secureexec (struct linux_binprm *bprm) @@ -448,7 +445,7 @@ static inline void cap_emulate_setxuid (int old_ruid, int old_euid, { if ((old_ruid == 
0 || old_euid == 0 || old_suid == 0) && (current->uid != 0 && current->euid != 0 && current->suid != 0) && - !current->keep_capabilities) { + !issecure(SECURE_KEEP_CAPS)) { cap_clear (current->cap_permitted); cap_clear (current->cap_effective); } @@ -547,7 +544,7 @@ int cap_task_setnice (struct task_struct *p, int nice) * this task could get inconsistent info. There can be no * racing writer bc a task can only change its own caps. */ -long cap_prctl_drop(unsigned long cap) +static long cap_prctl_drop(unsigned long cap) { if (!capable(CAP_SETPCAP)) return -EPERM; @@ -556,6 +553,7 @@ long cap_prctl_drop(unsigned long cap) cap_lower(current->cap_bset, cap); return 0; } + #else int cap_task_setscheduler (struct task_struct *p, int policy, struct sched_param *lp) @@ -572,12 +570,99 @@ int cap_task_setnice (struct task_struct *p, int nice) } #endif +int cap_task_prctl(int option, unsigned long arg2, unsigned long arg3, + unsigned long arg4, unsigned long arg5, long *rc_p) +{ + long error = 0; + + switch (option) { + case PR_CAPBSET_READ: + if (!cap_valid(arg2)) + error = -EINVAL; + else + error = !!cap_raised(current->cap_bset, arg2); + break; +#ifdef CONFIG_SECURITY_FILE_CAPABILITIES + case PR_CAPBSET_DROP: + error = cap_prctl_drop(arg2); + break; + + /* + * The next four prctl's remain to assist with transitioning a + * system from legacy UID=0 based privilege (when filesystem + * capabilities are not in use) to a system using filesystem + * capabilities only - as the POSIX.1e draft intended. + * + * Note: + * + * PR_SET_SECUREBITS = + * issecure_mask(SECURE_KEEP_CAPS_LOCKED) + * | issecure_mask(SECURE_NOROOT) + * | issecure_mask(SECURE_NOROOT_LOCKED) + * | issecure_mask(SECURE_NO_SETUID_FIXUP) + * | issecure_mask(SECURE_NO_SETUID_FIXUP_LOCKED) + * + * will ensure that the current process and all of its + * children will be locked into a pure + * capability-based-privilege environment. 
+ */ + case PR_SET_SECUREBITS: + if ((((current->securebits & SECURE_ALL_LOCKS) >> 1) + & (current->securebits ^ arg2)) /*[1]*/ + || ((current->securebits & SECURE_ALL_LOCKS + & ~arg2)) /*[2]*/ + || (arg2 & ~(SECURE_ALL_LOCKS | SECURE_ALL_BITS)) /*[3]*/ + || (cap_capable(current, CAP_SETPCAP) != 0)) { /*[4]*/ + /* + * [1] no changing of bits that are locked + * [2] no unlocking of locks + * [3] no setting of unsupported bits + * [4] doing anything requires privilege (go read about + * the "sendmail capabilities bug") + */ + error = -EPERM; /* cannot change a locked bit */ + } else { + current->securebits = arg2; + } + break; + case PR_GET_SECUREBITS: + error = current->securebits; + break; + +#endif /* def CONFIG_SECURITY_FILE_CAPABILITIES */ + + case PR_GET_KEEPCAPS: + if (issecure(SECURE_KEEP_CAPS)) + error = 1; + break; + case PR_SET_KEEPCAPS: + if (arg2 > 1) /* Note, we rely on arg2 being unsigned here */ + error = -EINVAL; + else if (issecure(SECURE_KEEP_CAPS_LOCKED)) + error = -EPERM; + else if (arg2) + current->securebits |= issecure_mask(SECURE_KEEP_CAPS); + else + current->securebits &= + ~issecure_mask(SECURE_KEEP_CAPS); + break; + + default: + /* No functionality available - continue with default */ + return 0; + } + + /* Functionality provided */ + *rc_p = error; + return 1; +} + void cap_task_reparent_to_init (struct task_struct *p) { cap_set_init_eff(p->cap_effective); cap_clear(p->cap_inheritable); cap_set_full(p->cap_permitted); - p->keep_capabilities = 0; + p->securebits = SECUREBITS_DEFAULT; return; } diff --git a/security/dummy.c b/security/dummy.c index b0232bbf427b..58d4dd1af5c7 100644 --- a/security/dummy.c +++ b/security/dummy.c @@ -604,7 +604,7 @@ static int dummy_task_kill (struct task_struct *p, struct siginfo *info, } static int dummy_task_prctl (int option, unsigned long arg2, unsigned long arg3, - unsigned long arg4, unsigned long arg5) + unsigned long arg4, unsigned long arg5, long *rc_p) { return 0; } diff --git a/security/security.c 
b/security/security.c index 8a285c7b9962..d5cb5898d967 100644 --- a/security/security.c +++ b/security/security.c @@ -733,9 +733,9 @@ int security_task_wait(struct task_struct *p) } int security_task_prctl(int option, unsigned long arg2, unsigned long arg3, - unsigned long arg4, unsigned long arg5) + unsigned long arg4, unsigned long arg5, long *rc_p) { - return security_ops->task_prctl(option, arg2, arg3, arg4, arg5); + return security_ops->task_prctl(option, arg2, arg3, arg4, arg5, rc_p); } void security_task_reparent_to_init(struct task_struct *p) diff --git a/security/selinux/hooks.c b/security/selinux/hooks.c index 308e2cf17d75..04acb5af8317 100644 --- a/security/selinux/hooks.c +++ b/security/selinux/hooks.c @@ -3303,12 +3303,13 @@ static int selinux_task_prctl(int option, unsigned long arg2, unsigned long arg3, unsigned long arg4, - unsigned long arg5) + unsigned long arg5, + long *rc_p) { /* The current prctl operations do not appear to require any SELinux controls since they merely observe or modify the state of the current process. 
*/ - return 0; + return secondary_ops->task_prctl(option, arg2, arg3, arg4, arg5, rc_p); } static int selinux_task_wait(struct task_struct *p) -- cgit v1.2.3 From c60264c494a119cd3a716a22edc0137b11de6d1e Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:41 -0700 Subject: smack: fix integer as NULL pointer warning in smack_lsm.c security/smack/smack_lsm.c:1257:16: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Acked-by: Casey Schaufler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- security/smack/smack_lsm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 4215971434e6..430f97b9d04a 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -1242,7 +1242,7 @@ static void smack_set_catset(char *catset, struct netlbl_lsm_secattr *sap) int rc; int byte; - if (catset == 0) + if (!catset) return; sap->flags |= NETLBL_SECATTR_MLS_CAT; -- cgit v1.2.3 From 55d00ccfb336b4f85a476a24e18c17b2eaff919e Mon Sep 17 00:00:00 2001 From: "Serge E. Hallyn" Date: Mon, 28 Apr 2008 02:13:42 -0700 Subject: root_plug: use cap_task_prctl With the introduction of per-process securebits, the capabilities-related prctl callbacks were moved into cap_task_prctl(). Have root_plug use cap_task_prctl() so that PR_SET_KEEPCAPS is defined. Signed-off-by: Serge E. 
Hallyn Acked-by: Greg Kroah-Hartman Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- security/root_plug.c | 1 + 1 file changed, 1 insertion(+) diff --git a/security/root_plug.c b/security/root_plug.c index 6112d1404c81..a41cf42a4fa0 100644 --- a/security/root_plug.c +++ b/security/root_plug.c @@ -86,6 +86,7 @@ static struct security_operations rootplug_security_ops = { .task_post_setuid = cap_task_post_setuid, .task_reparent_to_init = cap_task_reparent_to_init, + .task_prctl = cap_task_prctl, .bprm_check_security = rootplug_bprm_check_security, }; -- cgit v1.2.3 From 30aa4faf62b2dd9b239ae06ca7a85f1d36d7ef25 Mon Sep 17 00:00:00 2001 From: Casey Schaufler Date: Mon, 28 Apr 2008 02:13:43 -0700 Subject: smack: make smk_cipso_doi() and smk_unlbl_ambient() The functions smk_cipso_doi and smk_unlbl_ambient are not used outside smackfs.c and should hence be static. Signed-off-by: Casey Schaufler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- security/smack/smackfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/security/smack/smackfs.c b/security/smack/smackfs.c index 6ba283783b70..a5da5a8cfe9b 100644 --- a/security/smack/smackfs.c +++ b/security/smack/smackfs.c @@ -317,7 +317,7 @@ static const struct file_operations smk_load_ops = { /** * smk_cipso_doi - initialize the CIPSO domain */ -void smk_cipso_doi(void) +static void smk_cipso_doi(void) { int rc; struct cipso_v4_doi *doip; @@ -350,7 +350,7 @@ void smk_cipso_doi(void) /** * smk_unlbl_ambient - initialize the unlabeled domain */ -void smk_unlbl_ambient(char *oldambient) +static void smk_unlbl_ambient(char *oldambient) { int rc; struct netlbl_audit audit_info; -- cgit v1.2.3 From 1236cc3cf8c69bd316c940b2e94f91b3795f97fe Mon Sep 17 00:00:00 2001 From: "Serge E. Hallyn" Date: Mon, 28 Apr 2008 02:13:43 -0700 Subject: smack: use cap_task_prctl With the introduction of per-process securebits, the capabilities-related prctl callbacks were moved into cap_task_prctl(). 
Have smack use cap_task_prctl() so that PR_SET_KEEPCAPS is defined. Signed-off-by: Serge E. Hallyn Acked-by: Casey Schaufler Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- security/smack/smack_lsm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/security/smack/smack_lsm.c b/security/smack/smack_lsm.c index 430f97b9d04a..77ec16a3b68b 100644 --- a/security/smack/smack_lsm.c +++ b/security/smack/smack_lsm.c @@ -2495,6 +2495,7 @@ struct security_operations smack_ops = { .task_wait = smack_task_wait, .task_reparent_to_init = cap_task_reparent_to_init, .task_to_inode = smack_task_to_inode, + .task_prctl = cap_task_prctl, .ipc_permission = smack_ipc_permission, -- cgit v1.2.3 From b901d40c970e6db319fe1f8d84db2b9684b6c9bf Mon Sep 17 00:00:00 2001 From: Jim Meyering Date: Mon, 28 Apr 2008 02:13:44 -0700 Subject: alpha: handle kcalloc failure arch/alpha/kernel/module.c (module_frob_arch_sections): Handle kcalloc failure. Signed-off-by: Jim Meyering Cc: Richard Henderson Cc: Ivan Kokshaysky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/alpha/kernel/module.c | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/arch/alpha/kernel/module.c b/arch/alpha/kernel/module.c index 026ba9af6d6a..ebc3c894b5a2 100644 --- a/arch/alpha/kernel/module.c +++ b/arch/alpha/kernel/module.c @@ -120,6 +120,12 @@ module_frob_arch_sections(Elf64_Ehdr *hdr, Elf64_Shdr *sechdrs, nsyms = symtab->sh_size / sizeof(Elf64_Sym); chains = kcalloc(nsyms, sizeof(struct got_entry), GFP_KERNEL); + if (!chains) { + printk(KERN_ERR + "module %s: no memory for symbol chain buffer\n", + me->name); + return -ENOMEM; + } got->sh_size = 0; got->sh_addralign = 8; -- cgit v1.2.3 From bbb8d343affd21850849fa4d41bf91c7527a3d04 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:46 -0700 Subject: alpha: remove remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ The change in pci_iommu.c should be safe as arena has not been assigned
when we get to this point. Some were within #if 0 blocks; these were changed as well, and the blocks were left in place, as they appear to be debugging infrastructure. A #define FN __FUNCTION__ was removed and occurrences of FN were replaced with __func__ as well. Signed-off-by: Harvey Harrison Cc: Ivan Kokshaysky Cc: Richard Henderson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/alpha/kernel/core_marvel.c | 6 +++--- arch/alpha/kernel/core_t2.c | 24 +++++++++--------------- arch/alpha/kernel/core_titan.c | 34 +++++++++++++++++----------------- arch/alpha/kernel/core_tsunami.c | 28 +++++++++++++--------------- arch/alpha/kernel/pci.c | 2 +- arch/alpha/kernel/pci_iommu.c | 34 +++++++++++++++------------------- arch/alpha/kernel/smp.c | 4 ++-- arch/alpha/kernel/srm_env.c | 2 +- arch/alpha/kernel/sys_alcor.c | 2 +- arch/alpha/kernel/sys_marvel.c | 12 ++++++------ arch/alpha/kernel/sys_sable.c | 6 +++--- arch/alpha/kernel/sys_sio.c | 2 +- 12 files changed, 72 insertions(+), 84 deletions(-) diff --git a/arch/alpha/kernel/core_marvel.c b/arch/alpha/kernel/core_marvel.c index f10d2eddd2c3..b04f1feb1dda 100644 --- a/arch/alpha/kernel/core_marvel.c +++ b/arch/alpha/kernel/core_marvel.c @@ -994,7 +994,7 @@ marvel_agp_configure(alpha_agp_info *agp) * rate, but warn the user.
*/ printk("%s: unknown PLL setting RNGB=%lx (PLL6_CTL=%016lx)\n", - __FUNCTION__, IO7_PLL_RNGB(agp_pll), agp_pll); + __func__, IO7_PLL_RNGB(agp_pll), agp_pll); break; } @@ -1044,13 +1044,13 @@ marvel_agp_translate(alpha_agp_info *agp, dma_addr_t addr) if (addr < agp->aperture.bus_base || addr >= agp->aperture.bus_base + agp->aperture.size) { - printk("%s: addr out of range\n", __FUNCTION__); + printk("%s: addr out of range\n", __func__); return -EINVAL; } pte = aper->arena->ptes[baddr >> PAGE_SHIFT]; if (!(pte & 1)) { - printk("%s: pte not valid\n", __FUNCTION__); + printk("%s: pte not valid\n", __func__); return -EINVAL; } return (pte >> 1) << PAGE_SHIFT; diff --git a/arch/alpha/kernel/core_t2.c b/arch/alpha/kernel/core_t2.c index f5ca5255eb06..c0750291b44a 100644 --- a/arch/alpha/kernel/core_t2.c +++ b/arch/alpha/kernel/core_t2.c @@ -336,10 +336,7 @@ t2_direct_map_window1(unsigned long base, unsigned long length) #if DEBUG_PRINT_FINAL_SETTINGS printk("%s: setting WBASE1=0x%lx WMASK1=0x%lx TBASE1=0x%lx\n", - __FUNCTION__, - *(vulp)T2_WBASE1, - *(vulp)T2_WMASK1, - *(vulp)T2_TBASE1); + __func__, *(vulp)T2_WBASE1, *(vulp)T2_WMASK1, *(vulp)T2_TBASE1); #endif } @@ -366,10 +363,7 @@ t2_sg_map_window2(struct pci_controller *hose, #if DEBUG_PRINT_FINAL_SETTINGS printk("%s: setting WBASE2=0x%lx WMASK2=0x%lx TBASE2=0x%lx\n", - __FUNCTION__, - *(vulp)T2_WBASE2, - *(vulp)T2_WMASK2, - *(vulp)T2_TBASE2); + __func__, *(vulp)T2_WBASE2, *(vulp)T2_WMASK2, *(vulp)T2_TBASE2); #endif } @@ -377,15 +371,15 @@ static void __init t2_save_configuration(void) { #if DEBUG_PRINT_INITIAL_SETTINGS - printk("%s: HAE_1 was 0x%lx\n", __FUNCTION__, srm_hae); /* HW is 0 */ - printk("%s: HAE_2 was 0x%lx\n", __FUNCTION__, *(vulp)T2_HAE_2); - printk("%s: HAE_3 was 0x%lx\n", __FUNCTION__, *(vulp)T2_HAE_3); - printk("%s: HAE_4 was 0x%lx\n", __FUNCTION__, *(vulp)T2_HAE_4); - printk("%s: HBASE was 0x%lx\n", __FUNCTION__, *(vulp)T2_HBASE); + printk("%s: HAE_1 was 0x%lx\n", __func__, srm_hae); /* HW is 0 */ 
+ printk("%s: HAE_2 was 0x%lx\n", __func__, *(vulp)T2_HAE_2); + printk("%s: HAE_3 was 0x%lx\n", __func__, *(vulp)T2_HAE_3); + printk("%s: HAE_4 was 0x%lx\n", __func__, *(vulp)T2_HAE_4); + printk("%s: HBASE was 0x%lx\n", __func__, *(vulp)T2_HBASE); - printk("%s: WBASE1=0x%lx WMASK1=0x%lx TBASE1=0x%lx\n", __FUNCTION__, + printk("%s: WBASE1=0x%lx WMASK1=0x%lx TBASE1=0x%lx\n", __func__, *(vulp)T2_WBASE1, *(vulp)T2_WMASK1, *(vulp)T2_TBASE1); - printk("%s: WBASE2=0x%lx WMASK2=0x%lx TBASE2=0x%lx\n", __FUNCTION__, + printk("%s: WBASE2=0x%lx WMASK2=0x%lx TBASE2=0x%lx\n", __func__, *(vulp)T2_WBASE2, *(vulp)T2_WMASK2, *(vulp)T2_TBASE2); #endif diff --git a/arch/alpha/kernel/core_titan.c b/arch/alpha/kernel/core_titan.c index 819326627b96..319fcb74611e 100644 --- a/arch/alpha/kernel/core_titan.c +++ b/arch/alpha/kernel/core_titan.c @@ -365,21 +365,21 @@ void __init titan_init_arch(void) { #if 0 - printk("%s: titan_init_arch()\n", __FUNCTION__); - printk("%s: CChip registers:\n", __FUNCTION__); - printk("%s: CSR_CSC 0x%lx\n", __FUNCTION__, TITAN_cchip->csc.csr); - printk("%s: CSR_MTR 0x%lx\n", __FUNCTION__, TITAN_cchip->mtr.csr); - printk("%s: CSR_MISC 0x%lx\n", __FUNCTION__, TITAN_cchip->misc.csr); - printk("%s: CSR_DIM0 0x%lx\n", __FUNCTION__, TITAN_cchip->dim0.csr); - printk("%s: CSR_DIM1 0x%lx\n", __FUNCTION__, TITAN_cchip->dim1.csr); - printk("%s: CSR_DIR0 0x%lx\n", __FUNCTION__, TITAN_cchip->dir0.csr); - printk("%s: CSR_DIR1 0x%lx\n", __FUNCTION__, TITAN_cchip->dir1.csr); - printk("%s: CSR_DRIR 0x%lx\n", __FUNCTION__, TITAN_cchip->drir.csr); - - printk("%s: DChip registers:\n", __FUNCTION__); - printk("%s: CSR_DSC 0x%lx\n", __FUNCTION__, TITAN_dchip->dsc.csr); - printk("%s: CSR_STR 0x%lx\n", __FUNCTION__, TITAN_dchip->str.csr); - printk("%s: CSR_DREV 0x%lx\n", __FUNCTION__, TITAN_dchip->drev.csr); + printk("%s: titan_init_arch()\n", __func__); + printk("%s: CChip registers:\n", __func__); + printk("%s: CSR_CSC 0x%lx\n", __func__, TITAN_cchip->csc.csr); + printk("%s: 
CSR_MTR 0x%lx\n", __func__, TITAN_cchip->mtr.csr); + printk("%s: CSR_MISC 0x%lx\n", __func__, TITAN_cchip->misc.csr); + printk("%s: CSR_DIM0 0x%lx\n", __func__, TITAN_cchip->dim0.csr); + printk("%s: CSR_DIM1 0x%lx\n", __func__, TITAN_cchip->dim1.csr); + printk("%s: CSR_DIR0 0x%lx\n", __func__, TITAN_cchip->dir0.csr); + printk("%s: CSR_DIR1 0x%lx\n", __func__, TITAN_cchip->dir1.csr); + printk("%s: CSR_DRIR 0x%lx\n", __func__, TITAN_cchip->drir.csr); + + printk("%s: DChip registers:\n", __func__); + printk("%s: CSR_DSC 0x%lx\n", __func__, TITAN_dchip->dsc.csr); + printk("%s: CSR_STR 0x%lx\n", __func__, TITAN_dchip->str.csr); + printk("%s: CSR_DREV 0x%lx\n", __func__, TITAN_dchip->drev.csr); #endif boot_cpuid = __hard_smp_processor_id(); @@ -700,13 +700,13 @@ titan_agp_translate(alpha_agp_info *agp, dma_addr_t addr) if (addr < agp->aperture.bus_base || addr >= agp->aperture.bus_base + agp->aperture.size) { - printk("%s: addr out of range\n", __FUNCTION__); + printk("%s: addr out of range\n", __func__); return -EINVAL; } pte = aper->arena->ptes[baddr >> PAGE_SHIFT]; if (!(pte & 1)) { - printk("%s: pte not valid\n", __FUNCTION__); + printk("%s: pte not valid\n", __func__); return -EINVAL; } diff --git a/arch/alpha/kernel/core_tsunami.c b/arch/alpha/kernel/core_tsunami.c index ef91e09590d4..5e7c28f92f19 100644 --- a/arch/alpha/kernel/core_tsunami.c +++ b/arch/alpha/kernel/core_tsunami.c @@ -241,8 +241,6 @@ tsunami_probe_write(volatile unsigned long *vaddr) #define tsunami_probe_read(ADDR) 1 #endif /* NXM_MACHINE_CHECKS_ON_TSUNAMI */ -#define FN __FUNCTION__ - static void __init tsunami_init_one_pchip(tsunami_pchip *pchip, int index) { @@ -383,27 +381,27 @@ tsunami_init_arch(void) /* NXMs just don't matter to Tsunami--unless they make it choke completely. 
*/ tmp = (unsigned long)(TSUNAMI_cchip - 1); - printk("%s: probing bogus address: 0x%016lx\n", FN, bogus_addr); + printk("%s: probing bogus address: 0x%016lx\n", __func__, bogus_addr); printk("\tprobe %s\n", tsunami_probe_write((unsigned long *)bogus_addr) ? "succeeded" : "failed"); #endif /* NXM_MACHINE_CHECKS_ON_TSUNAMI */ #if 0 - printk("%s: CChip registers:\n", FN); - printk("%s: CSR_CSC 0x%lx\n", FN, TSUNAMI_cchip->csc.csr); - printk("%s: CSR_MTR 0x%lx\n", FN, TSUNAMI_cchip.mtr.csr); - printk("%s: CSR_MISC 0x%lx\n", FN, TSUNAMI_cchip->misc.csr); - printk("%s: CSR_DIM0 0x%lx\n", FN, TSUNAMI_cchip->dim0.csr); - printk("%s: CSR_DIM1 0x%lx\n", FN, TSUNAMI_cchip->dim1.csr); - printk("%s: CSR_DIR0 0x%lx\n", FN, TSUNAMI_cchip->dir0.csr); - printk("%s: CSR_DIR1 0x%lx\n", FN, TSUNAMI_cchip->dir1.csr); - printk("%s: CSR_DRIR 0x%lx\n", FN, TSUNAMI_cchip->drir.csr); + printk("%s: CChip registers:\n", __func__); + printk("%s: CSR_CSC 0x%lx\n", __func__, TSUNAMI_cchip->csc.csr); + printk("%s: CSR_MTR 0x%lx\n", __func__, TSUNAMI_cchip.mtr.csr); + printk("%s: CSR_MISC 0x%lx\n", __func__, TSUNAMI_cchip->misc.csr); + printk("%s: CSR_DIM0 0x%lx\n", __func__, TSUNAMI_cchip->dim0.csr); + printk("%s: CSR_DIM1 0x%lx\n", __func__, TSUNAMI_cchip->dim1.csr); + printk("%s: CSR_DIR0 0x%lx\n", __func__, TSUNAMI_cchip->dir0.csr); + printk("%s: CSR_DIR1 0x%lx\n", __func__, TSUNAMI_cchip->dir1.csr); + printk("%s: CSR_DRIR 0x%lx\n", __func__, TSUNAMI_cchip->drir.csr); printk("%s: DChip registers:\n"); - printk("%s: CSR_DSC 0x%lx\n", FN, TSUNAMI_dchip->dsc.csr); - printk("%s: CSR_STR 0x%lx\n", FN, TSUNAMI_dchip->str.csr); - printk("%s: CSR_DREV 0x%lx\n", FN, TSUNAMI_dchip->drev.csr); + printk("%s: CSR_DSC 0x%lx\n", __func__, TSUNAMI_dchip->dsc.csr); + printk("%s: CSR_STR 0x%lx\n", __func__, TSUNAMI_dchip->str.csr); + printk("%s: CSR_DREV 0x%lx\n", __func__, TSUNAMI_dchip->drev.csr); #endif /* With multiple PCI busses, we play with I/O as physical addrs. 
*/ ioport_resource.end = ~0UL; diff --git a/arch/alpha/kernel/pci.c b/arch/alpha/kernel/pci.c index 78357798b6fd..baf57563b14c 100644 --- a/arch/alpha/kernel/pci.c +++ b/arch/alpha/kernel/pci.c @@ -208,7 +208,7 @@ pdev_save_srm_config(struct pci_dev *dev) tmp = kmalloc(sizeof(*tmp), GFP_KERNEL); if (!tmp) { - printk(KERN_ERR "%s: kmalloc() failed!\n", __FUNCTION__); + printk(KERN_ERR "%s: kmalloc() failed!\n", __func__); return; } tmp->next = srm_saved_configs; diff --git a/arch/alpha/kernel/pci_iommu.c b/arch/alpha/kernel/pci_iommu.c index dd6e334ab9e1..2179c602032a 100644 --- a/arch/alpha/kernel/pci_iommu.c +++ b/arch/alpha/kernel/pci_iommu.c @@ -79,25 +79,21 @@ iommu_arena_new_node(int nid, struct pci_controller *hose, dma_addr_t base, #ifdef CONFIG_DISCONTIGMEM - if (!NODE_DATA(nid) || - (NULL == (arena = alloc_bootmem_node(NODE_DATA(nid), - sizeof(*arena))))) { - printk("%s: couldn't allocate arena from node %d\n" - " falling back to system-wide allocation\n", - __FUNCTION__, nid); - arena = alloc_bootmem(sizeof(*arena)); - } - - if (!NODE_DATA(nid) || - (NULL == (arena->ptes = __alloc_bootmem_node(NODE_DATA(nid), - mem_size, - align, - 0)))) { - printk("%s: couldn't allocate arena ptes from node %d\n" - " falling back to system-wide allocation\n", - __FUNCTION__, nid); - arena->ptes = __alloc_bootmem(mem_size, align, 0); - } + arena = alloc_bootmem_node(NODE_DATA(nid), sizeof(*arena)); + if (!NODE_DATA(nid) || !arena) { + printk("%s: couldn't allocate arena from node %d\n" + " falling back to system-wide allocation\n", + __func__, nid); + arena = alloc_bootmem(sizeof(*arena)); + } + + arena->ptes = __alloc_bootmem_node(NODE_DATA(nid), mem_size, align, 0); + if (!NODE_DATA(nid) || !arena->ptes) { + printk("%s: couldn't allocate arena ptes from node %d\n" + " falling back to system-wide allocation\n", + __func__, nid); + arena->ptes = __alloc_bootmem(mem_size, align, 0); + } #else /* CONFIG_DISCONTIGMEM */ diff --git a/arch/alpha/kernel/smp.c 
b/arch/alpha/kernel/smp.c index 63c2073401ee..2525692db0ab 100644 --- a/arch/alpha/kernel/smp.c +++ b/arch/alpha/kernel/smp.c @@ -755,7 +755,7 @@ smp_call_function_on_cpu (void (*func) (void *info), void *info, int retry, if (atomic_read(&data.unstarted_count) > 0) { long start_time = jiffies; printk(KERN_ERR "%s: initial timeout -- trying long wait\n", - __FUNCTION__); + __func__); timeout = jiffies + 30 * HZ; while (atomic_read(&data.unstarted_count) > 0 && time_before(jiffies, timeout)) @@ -764,7 +764,7 @@ smp_call_function_on_cpu (void (*func) (void *info), void *info, int retry, long delta = jiffies - start_time; printk(KERN_ERR "%s: response %ld.%ld seconds into long wait\n", - __FUNCTION__, delta / HZ, + __func__, delta / HZ, (100 * (delta - ((delta / HZ) * HZ))) / HZ); } } diff --git a/arch/alpha/kernel/srm_env.c b/arch/alpha/kernel/srm_env.c index f7dd081d57ff..78ad7cd1bbd6 100644 --- a/arch/alpha/kernel/srm_env.c +++ b/arch/alpha/kernel/srm_env.c @@ -199,7 +199,7 @@ srm_env_init(void) printk(KERN_INFO "%s: This Alpha system doesn't " "know about SRM (or you've booted " "SRM->MILO->Linux, which gets " - "misdetected)...\n", __FUNCTION__); + "misdetected)...\n", __func__); return -ENODEV; } diff --git a/arch/alpha/kernel/sys_alcor.c b/arch/alpha/kernel/sys_alcor.c index d187d01d2a17..e53a1e1c2f21 100644 --- a/arch/alpha/kernel/sys_alcor.c +++ b/arch/alpha/kernel/sys_alcor.c @@ -259,7 +259,7 @@ alcor_init_pci(void) if (dev && dev->devfn == PCI_DEVFN(6,0)) { alpha_mv.sys.cia.gru_int_req_bits = XLT_GRU_INT_REQ_BITS; printk(KERN_INFO "%s: Detected AS500 or XLT motherboard.\n", - __FUNCTION__); + __func__); } pci_dev_put(dev); } diff --git a/arch/alpha/kernel/sys_marvel.c b/arch/alpha/kernel/sys_marvel.c index 922143ea1cdb..828449cd2636 100644 --- a/arch/alpha/kernel/sys_marvel.c +++ b/arch/alpha/kernel/sys_marvel.c @@ -80,7 +80,7 @@ io7_get_irq_ctl(unsigned int irq, struct io7 **pio7) if (!(io7 = marvel_find_io7(pid))) { printk(KERN_ERR "%s for nonexistent io7 
-- vec %x, pid %d\n", - __FUNCTION__, irq, pid); + __func__, irq, pid); return NULL; } @@ -90,7 +90,7 @@ io7_get_irq_ctl(unsigned int irq, struct io7 **pio7) if (irq >= 0x180) { printk(KERN_ERR "%s for invalid irq -- pid %d adjusted irq %x\n", - __FUNCTION__, pid, irq); + __func__, pid, irq); return NULL; } @@ -110,8 +110,8 @@ io7_enable_irq(unsigned int irq) ctl = io7_get_irq_ctl(irq, &io7); if (!ctl || !io7) { - printk(KERN_ERR "%s: get_ctl failed for irq %x\n", - __FUNCTION__, irq); + printk(KERN_ERR "%s: get_ctl failed for irq %x\n", + __func__, irq); return; } @@ -130,8 +130,8 @@ io7_disable_irq(unsigned int irq) ctl = io7_get_irq_ctl(irq, &io7); if (!ctl || !io7) { - printk(KERN_ERR "%s: get_ctl failed for irq %x\n", - __FUNCTION__, irq); + printk(KERN_ERR "%s: get_ctl failed for irq %x\n", + __func__, irq); return; } diff --git a/arch/alpha/kernel/sys_sable.c b/arch/alpha/kernel/sys_sable.c index 906019cfa681..99a7f19da13a 100644 --- a/arch/alpha/kernel/sys_sable.c +++ b/arch/alpha/kernel/sys_sable.c @@ -454,7 +454,7 @@ sable_lynx_enable_irq(unsigned int irq) spin_unlock(&sable_lynx_irq_lock); #if 0 printk("%s: mask 0x%lx bit 0x%x irq 0x%x\n", - __FUNCTION__, mask, bit, irq); + __func__, mask, bit, irq); #endif } @@ -470,7 +470,7 @@ sable_lynx_disable_irq(unsigned int irq) spin_unlock(&sable_lynx_irq_lock); #if 0 printk("%s: mask 0x%lx bit 0x%x irq 0x%x\n", - __FUNCTION__, mask, bit, irq); + __func__, mask, bit, irq); #endif } @@ -524,7 +524,7 @@ sable_lynx_srm_device_interrupt(unsigned long vector) irq = sable_lynx_irq_swizzle->mask_to_irq[bit]; #if 0 printk("%s: vector 0x%lx bit 0x%x irq 0x%x\n", - __FUNCTION__, vector, bit, irq); + __func__, vector, bit, irq); #endif handle_irq(irq); } diff --git a/arch/alpha/kernel/sys_sio.c b/arch/alpha/kernel/sys_sio.c index ee7b9009ebb4..d4327e461c22 100644 --- a/arch/alpha/kernel/sys_sio.c +++ b/arch/alpha/kernel/sys_sio.c @@ -89,7 +89,7 @@ sio_pci_route(void) /* First, ALWAYS read and print the original setting. 
*/ pci_bus_read_config_dword(pci_isa_hose->bus, PCI_DEVFN(7, 0), 0x60, &orig_route_tab); - printk("%s: PIRQ original 0x%x new 0x%x\n", __FUNCTION__, + printk("%s: PIRQ original 0x%x new 0x%x\n", __func__, orig_route_tab, alpha_mv.sys.sio.route_tab); #if defined(ALPHA_RESTORE_SRM_SETUP) -- cgit v1.2.3 From 95d193a90335b4e39dd1f750f1fc1672339ff487 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:46 -0700 Subject: alpha: replace __inline with inline Signed-off-by: Harvey Harrison Cc: Richard Henderson Cc: Ivan Kokshaysky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-alpha/byteorder.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-alpha/byteorder.h b/include/asm-alpha/byteorder.h index 7af2b8d25486..58e958fc7f1b 100644 --- a/include/asm-alpha/byteorder.h +++ b/include/asm-alpha/byteorder.h @@ -7,7 +7,7 @@ #ifdef __GNUC__ -static __inline __attribute_const__ __u32 __arch__swab32(__u32 x) +static inline __attribute_const__ __u32 __arch__swab32(__u32 x) { /* * Unfortunately, we can't use the 6 instruction sequence -- cgit v1.2.3 From 037f436f525dac36c9f5fd5c5054518a63debb3e Mon Sep 17 00:00:00 2001 From: "S.Caglar Onur" Date: Mon, 28 Apr 2008 02:13:47 -0700 Subject: arch/alpha/kernel/traps.c: use time_* macros The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. 
So implement usage of the time_after() macro, defined in linux/jiffies.h, which deals with wrapping correctly [akpm@linux-foundation.org: fix warning] Signed-off-by: S.Caglar Onur Cc: Richard Henderson Cc: Ivan Kokshaysky Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/alpha/kernel/traps.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/arch/alpha/kernel/traps.c b/arch/alpha/kernel/traps.c index 2dc7f9fed213..dc57790250d2 100644 --- a/arch/alpha/kernel/traps.c +++ b/arch/alpha/kernel/traps.c @@ -8,6 +8,7 @@ * This file initializes the trap entry points */ +#include #include #include #include @@ -770,7 +771,7 @@ do_entUnaUser(void __user * va, unsigned long opcode, unsigned long reg, struct pt_regs *regs) { static int cnt = 0; - static long last_time = 0; + static unsigned long last_time; unsigned long tmp1, tmp2, tmp3, tmp4; unsigned long fake_reg, *reg_addr = &fake_reg; @@ -781,7 +782,7 @@ do_entUnaUser(void __user * va, unsigned long opcode, with the unaliged access. */ if (!test_thread_flag (TIF_UAC_NOPRINT)) { - if (cnt >= 5 && jiffies - last_time > 5*HZ) { + if (cnt >= 5 && time_after(jiffies, last_time + 5 * HZ)) { cnt = 0; } if (++cnt < 5) { -- cgit v1.2.3 From ed6b9b97f42c091630335bfb71a2931e6f86388b Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Apr 2008 02:13:48 -0700 Subject: alpha: teach the compiler that BUG doesn't return Fix things like this: security/selinux/netnode.c: In function 'sel_netnode_find': security/selinux/netnode.c:126: warning: 'idx' may be used uninitialized in this function security/selinux/netnode.c: In function 'sel_netnode_sid': security/selinux/netnode.c:225: warning: 'ret' may be used uninitialized in this function security/selinux/netnode.c:168: warning: 'idx' may be used uninitialized in this function due to code correctly not expecting BUG() to return. For some reason this reduces the object code size for that particular file. 
Cc: Ivan Kokshaysky Cc: Richard Henderson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-alpha/bug.h | 16 +++++++++++++--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/include/asm-alpha/bug.h b/include/asm-alpha/bug.h index 39a3e2a5017d..695a5ee4b5d3 100644 --- a/include/asm-alpha/bug.h +++ b/include/asm-alpha/bug.h @@ -1,14 +1,24 @@ #ifndef _ALPHA_BUG_H #define _ALPHA_BUG_H +#include + #ifdef CONFIG_BUG #include /* ??? Would be nice to use .gprel32 here, but we can't be sure that the function loaded the GP, so this could fail in modules. */ -#define BUG() \ - __asm__ __volatile__("call_pal %0 # bugchk\n\t"".long %1\n\t.8byte %2" \ - : : "i" (PAL_bugchk), "i"(__LINE__), "i"(__FILE__)) +static inline void ATTRIB_NORET __BUG(const char *file, int line) +{ + __asm__ __volatile__( + "call_pal %0 # bugchk\n\t" + ".long %1\n\t.8byte %2" + : : "i" (PAL_bugchk), "i"(line), "i"(file)); + for ( ; ; ) + ; +} + +#define BUG() __BUG(__FILE__, __LINE__) #define HAVE_ARCH_BUG #endif -- cgit v1.2.3 From 6feef6e5f23d5a3d8a614ab8ea392dfa54c7365c Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 28 Apr 2008 02:13:48 -0700 Subject: m68k: remove redundant display of free swap space in show_mem() show_mem() has no need to print the amount of free swap space manually because show_free_areas() does this already and is called by the former. 
The two outputs only differ in text formatting: printk("Free swap = %lukB\n", ...); printk("Free swap: %6ldkB\n", ...); Signed-off-by: Johannes Weiner Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/m68k/mm/init.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/m68k/mm/init.c b/arch/m68k/mm/init.c index f42caa79e4e8..a2bb01f59642 100644 --- a/arch/m68k/mm/init.c +++ b/arch/m68k/mm/init.c @@ -79,7 +79,6 @@ void show_mem(void) printk("\nMem-info:\n"); show_free_areas(); - printk("Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10)); for_each_online_pgdat(pgdat) { for (i = 0; i < pgdat->node_spanned_pages; i++) { struct page *page = pgdat->node_mem_map + i; -- cgit v1.2.3 From f85e7cdc3fd0db65ef1442476b82ced0f01c5c19 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:49 -0700 Subject: m68k: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison Cc: Geert Uytterhoeven Cc: Roman Zippel Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/m68k/kernel/ints.c | 10 +++++----- arch/m68k/mac/oss.c | 4 ++-- arch/m68k/q40/q40ints.c | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/arch/m68k/kernel/ints.c b/arch/m68k/kernel/ints.c index 2b412454cb41..ded7dd2f67b2 100644 --- a/arch/m68k/kernel/ints.c +++ b/arch/m68k/kernel/ints.c @@ -186,7 +186,7 @@ int setup_irq(unsigned int irq, struct irq_node *node) if (irq >= NR_IRQS || !(contr = irq_controller[irq])) { printk("%s: Incorrect IRQ %d from %s\n", - __FUNCTION__, irq, node->devname); + __func__, irq, node->devname); return -ENXIO; } @@ -249,7 +249,7 @@ void free_irq(unsigned int irq, void *dev_id) unsigned long flags; if (irq >= NR_IRQS || !(contr = irq_controller[irq])) { - printk("%s: Incorrect IRQ %d\n", __FUNCTION__, irq); + printk("%s: Incorrect IRQ %d\n", __func__, irq); return; } @@ -267,7 +267,7 @@ void free_irq(unsigned int irq, void *dev_id) 
node->handler = NULL; } else printk("%s: Removing probably wrong IRQ %d\n", - __FUNCTION__, irq); + __func__, irq); if (!irq_list[irq]) { if (contr->shutdown) @@ -288,7 +288,7 @@ void enable_irq(unsigned int irq) if (irq >= NR_IRQS || !(contr = irq_controller[irq])) { printk("%s: Incorrect IRQ %d\n", - __FUNCTION__, irq); + __func__, irq); return; } @@ -312,7 +312,7 @@ void disable_irq(unsigned int irq) if (irq >= NR_IRQS || !(contr = irq_controller[irq])) { printk("%s: Incorrect IRQ %d\n", - __FUNCTION__, irq); + __func__, irq); return; } diff --git a/arch/m68k/mac/oss.c b/arch/m68k/mac/oss.c index 50603d3dce84..3c943d2ec570 100644 --- a/arch/m68k/mac/oss.c +++ b/arch/m68k/mac/oss.c @@ -190,7 +190,7 @@ void oss_irq_enable(int irq) { break; #ifdef DEBUG_IRQUSE default: - printk("%s unknown irq %d\n",__FUNCTION__, irq); + printk("%s unknown irq %d\n", __func__, irq); break; #endif } @@ -230,7 +230,7 @@ void oss_irq_disable(int irq) { break; #ifdef DEBUG_IRQUSE default: - printk("%s unknown irq %d\n", __FUNCTION__, irq); + printk("%s unknown irq %d\n", __func__, irq); break; #endif } diff --git a/arch/m68k/q40/q40ints.c b/arch/m68k/q40/q40ints.c index 46161cef08b9..9f0e3d59bf92 100644 --- a/arch/m68k/q40/q40ints.c +++ b/arch/m68k/q40/q40ints.c @@ -47,7 +47,7 @@ static int q40_irq_startup(unsigned int irq) switch (irq) { case 1: case 2: case 8: case 9: case 11: case 12: case 13: - printk("%s: ISA IRQ %d not implemented by HW\n", __FUNCTION__, irq); + printk("%s: ISA IRQ %d not implemented by HW\n", __func__, irq); return -ENXIO; } return 0; -- cgit v1.2.3 From 032c17e8afa150412810ffc19913ecd5eb531d57 Mon Sep 17 00:00:00 2001 From: Alan Cox Date: Mon, 28 Apr 2008 02:13:50 -0700 Subject: crisv10: prepare for BKL push down Just the modem bits this time Signed-off-by: Alan Cox Cc: Mikael Starvik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/crisv10.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/serial/crisv10.c 
b/drivers/serial/crisv10.c index 383c4e660cd5..88e7c1d5b919 100644 --- a/drivers/serial/crisv10.c +++ b/drivers/serial/crisv10.c @@ -3582,6 +3582,8 @@ rs_tiocmset(struct tty_struct *tty, struct file *file, { struct e100_serial *info = (struct e100_serial *)tty->driver_data; + lock_kernel(); + if (clear & TIOCM_RTS) e100_rts(info, 0); if (clear & TIOCM_DTR) @@ -3601,6 +3603,8 @@ rs_tiocmset(struct tty_struct *tty, struct file *file, e100_ri_out(info, 1); if (set & TIOCM_CD) e100_cd_out(info, 1); + + unlock_kernel(); return 0; } @@ -3610,6 +3614,7 @@ rs_tiocmget(struct tty_struct *tty, struct file *file) struct e100_serial *info = (struct e100_serial *)tty->driver_data; unsigned int result; + lock_kernel(); result = (!E100_RTS_GET(info) ? TIOCM_RTS : 0) | (!E100_DTR_GET(info) ? TIOCM_DTR : 0) @@ -3618,6 +3623,8 @@ rs_tiocmget(struct tty_struct *tty, struct file *file) | (!E100_CD_GET(info) ? TIOCM_CAR : 0) | (!E100_CTS_GET(info) ? TIOCM_CTS : 0); + unlock_kernel(); + #ifdef SERIAL_DEBUG_IO printk(KERN_DEBUG "ser%i: modem state: %i 0x%08X\n", info->line, result, result); -- cgit v1.2.3 From 5fd284fd976232dbd0d0dc94e07c91e50e2898b2 Mon Sep 17 00:00:00 2001 From: Johannes Weiner Date: Mon, 28 Apr 2008 02:13:51 -0700 Subject: cris: remove redundant display of free swap space in show_mem() show_mem() has no need to print the amount of free swap space manually because show_free_areas() does this already and is called by the former. 
The two outputs only differ in text formatting: printk("Free swap = %lukB\n", ...); printk("Free swap: %6ldkB\n", ...); Signed-off-by: Johannes Weiner Cc: Mikael Starvik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/cris/mm/init.c | 1 - 1 file changed, 1 deletion(-) diff --git a/arch/cris/mm/init.c b/arch/cris/mm/init.c index 4207a2b52750..5b06ffa15e34 100644 --- a/arch/cris/mm/init.c +++ b/arch/cris/mm/init.c @@ -27,7 +27,6 @@ show_mem(void) printk("\nMem-info:\n"); show_free_areas(); - printk("Free swap: %6ldkB\n", nr_swap_pages<<(PAGE_SHIFT-10)); i = max_mapnr; while (i-- > 0) { total++; -- cgit v1.2.3 From 16a26ef5ad31b59c521bd9becccaee84c0157326 Mon Sep 17 00:00:00 2001 From: KOSAKI Motohiro Date: Mon, 28 Apr 2008 02:13:51 -0700 Subject: cris: add constfy to pgd_offset() Constify the mm argument of pgd_offset() to avoid the following warning: CC mm/pagewalk.o mm/pagewalk.c: In function 'walk_page_range': mm/pagewalk.c:111: warning: passing argument 1 of 'pgd_offset' discards qualifiers from pointer target type Signed-off-by: KOSAKI Motohiro Cc: Matt Mackall Cc: "Vegard Nossum" Cc: Mikael Starvik Cc: Jesper Nilsson Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-cris/pgtable.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/asm-cris/pgtable.h b/include/asm-cris/pgtable.h index 4c373624ee97..829e7a7d9fb9 100644 --- a/include/asm-cris/pgtable.h +++ b/include/asm-cris/pgtable.h @@ -231,7 +231,7 @@ static inline void pmd_set(pmd_t * pmdp, pte_t * ptep) #define pgd_index(address) (((address) >> PGDIR_SHIFT) & (PTRS_PER_PGD-1)) /* to find an entry in a page-table-directory */ -static inline pgd_t * pgd_offset(struct mm_struct * mm, unsigned long address) +static inline pgd_t * pgd_offset(const struct mm_struct *mm, unsigned long address) { return mm->pgd + pgd_index(address); } -- cgit v1.2.3 From 3af9c5bed1b8f284f3d7d479c77adf60ad059e91 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008
02:13:52 -0700 Subject: arch/um/kernel/um_arch.c: some small improvements Make some small improvements for arch/um/kernel/um_arch.c. Signed-off-by: WANG Cong Acked-by: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/kernel/um_arch.c | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/arch/um/kernel/um_arch.c b/arch/um/kernel/um_arch.c index a6c1dd1cf5a1..56deed623446 100644 --- a/arch/um/kernel/um_arch.c +++ b/arch/um/kernel/um_arch.c @@ -115,7 +115,7 @@ static int have_root __initdata = 0; /* Set in uml_mem_setup and modified in linux_main */ long long physmem_size = 32 * 1024 * 1024; -static char *usage_string = +static const char *usage_string = "User Mode Linux v%s\n" " available at http://user-mode-linux.sourceforge.net/\n\n"; @@ -202,7 +202,7 @@ static void __init uml_checksetup(char *line, int *add) p = &__uml_setup_start; while (p < &__uml_setup_end) { - int n; + size_t n; n = strlen(p->str); if (!strncmp(line, p->str, n) && p->setup_func(line + n, add)) @@ -258,7 +258,8 @@ int __init linux_main(int argc, char **argv) { unsigned long avail, diff; unsigned long virtmem_size, max_physmem; - unsigned int i, add; + unsigned int i; + int add; char * mode; for (i = 1; i < argc; i++) { -- cgit v1.2.3 From 3595726ac349ca9682703535e9a999c4f08c2d80 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:53 -0700 Subject: uml: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison Cc: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/line.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/arch/um/drivers/line.c b/arch/um/drivers/line.c index 2c898c4d6b6a..10b86e1cc659 100644 --- a/arch/um/drivers/line.c +++ b/arch/um/drivers/line.c @@ -304,7 +304,7 @@ int line_ioctl(struct tty_struct *tty, struct file * file, break; if (i == ARRAY_SIZE(tty_ioctls)) { printk(KERN_ERR "%s: %s: 
unknown ioctl: 0x%x\n", - __FUNCTION__, tty->name, cmd); + __func__, tty->name, cmd); } ret = -ENOIOCTLCMD; break; -- cgit v1.2.3 From 626c59f5edb284027bfe25cc15e7de2f532090b5 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:53 -0700 Subject: arch/um/os-Linux/start_up.c: various improvements. - lets ptrace_child become void - adds checking for the return value of change_sig - moves errors info into stderr instead of stdout. Cc: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/os-Linux/start_up.c | 14 ++++++-------- 1 file changed, 6 insertions(+), 8 deletions(-) diff --git a/arch/um/os-Linux/start_up.c b/arch/um/os-Linux/start_up.c index b616e15638fb..997d01944f91 100644 --- a/arch/um/os-Linux/start_up.c +++ b/arch/um/os-Linux/start_up.c @@ -25,15 +25,15 @@ #include "registers.h" #include "skas_ptrace.h" -static int ptrace_child(void) +static void ptrace_child(void) { int ret; /* Calling os_getpid because some libcs cached getpid incorrectly */ int pid = os_getpid(), ppid = getppid(); int sc_result; - change_sig(SIGWINCH, 0); - if (ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { + if (change_sig(SIGWINCH, 0) < 0 || + ptrace(PTRACE_TRACEME, 0, 0, 0) < 0) { perror("ptrace"); kill(pid, SIGKILL); } @@ -75,9 +75,8 @@ static void fatal(char *fmt, ...) va_list list; va_start(list, fmt); - vprintf(fmt, list); + vfprintf(stderr, fmt, list); va_end(list); - fflush(stdout); exit(1); } @@ -87,9 +86,8 @@ static void non_fatal(char *fmt, ...) 
va_list list; va_start(list, fmt); - vprintf(fmt, list); + vfprintf(stderr, fmt, list); va_end(list); - fflush(stdout); } static int start_ptraced_child(void) @@ -495,7 +493,7 @@ int __init parse_iomem(char *str, int *add) driver = str; file = strchr(str,','); if (file == NULL) { - printf("parse_iomem : failed to parse iomem\n"); + fprintf(stderr, "parse_iomem : failed to parse iomem\n"); goto out; } *file = '\0'; -- cgit v1.2.3 From 3af7cb7bbcf0872b749a32bb48a7bc11f33bcd8c Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:55 -0700 Subject: uml: make a function static arch/um/drivers/chan_kern.c::open_chan() can become static. Acked-by: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/chan_kern.c | 2 +- arch/um/include/chan_kern.h | 1 - 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c index db3082b4da46..de22a101ef72 100644 --- a/arch/um/drivers/chan_kern.c +++ b/arch/um/drivers/chan_kern.c @@ -125,7 +125,7 @@ static int open_one_chan(struct chan *chan) return 0; } -int open_chan(struct list_head *chans) +static int open_chan(struct list_head *chans) { struct list_head *ele; struct chan *chan; diff --git a/arch/um/include/chan_kern.h b/arch/um/include/chan_kern.h index 624b5100a3cd..de5f6970cbdd 100644 --- a/arch/um/include/chan_kern.h +++ b/arch/um/include/chan_kern.h @@ -31,7 +31,6 @@ extern void chan_interrupt(struct list_head *chans, struct delayed_work *task, struct tty_struct *tty, int irq); extern int parse_chan_pair(char *str, struct line *line, int device, const struct chan_opts *opts, char **error_out); -extern int open_chan(struct list_head *chans); extern int write_chan(struct list_head *chans, const char *buf, int len, int write_irq); extern int console_write_chan(struct list_head *chans, const char *buf, -- cgit v1.2.3 From 02d324b15dfa31b3b1025fb5abda08a8ee23ce84 Mon Sep 17 00:00:00 2001 From: WANG 
Cong Date: Mon, 28 Apr 2008 02:13:56 -0700 Subject: uml: remove a useless function arch/um/drivers/chan_kern.c::chan_out_fd() is not used by anyone. Remove it. Acked-by: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/chan_kern.c | 13 ------------- arch/um/include/chan_kern.h | 1 - 2 files changed, 14 deletions(-) diff --git a/arch/um/drivers/chan_kern.c b/arch/um/drivers/chan_kern.c index de22a101ef72..6e51424745ab 100644 --- a/arch/um/drivers/chan_kern.c +++ b/arch/um/drivers/chan_kern.c @@ -583,19 +583,6 @@ int parse_chan_pair(char *str, struct line *line, int device, return 0; } -int chan_out_fd(struct list_head *chans) -{ - struct list_head *ele; - struct chan *chan; - - list_for_each(ele, chans) { - chan = list_entry(ele, struct chan, list); - if (chan->primary && chan->output) - return chan->fd; - } - return -1; -} - void chan_interrupt(struct list_head *chans, struct delayed_work *task, struct tty_struct *tty, int irq) { diff --git a/arch/um/include/chan_kern.h b/arch/um/include/chan_kern.h index de5f6970cbdd..1e651457e049 100644 --- a/arch/um/include/chan_kern.h +++ b/arch/um/include/chan_kern.h @@ -44,7 +44,6 @@ extern void close_chan(struct list_head *chans, int delay_free_irq); extern int chan_window_size(struct list_head *chans, unsigned short *rows_out, unsigned short *cols_out); -extern int chan_out_fd(struct list_head *chans); extern int chan_config_string(struct list_head *chans, char *str, int size, char **error_out); -- cgit v1.2.3 From 1605ec044300d0fd5d27fd0b6879ee14b104aebd Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:56 -0700 Subject: uml: make three functions static Make the following three functions static, since they don't need to be global. 
arch/um/drivers/mcast_kern.c::mcast_setup() arch/um/drivers/mconsole_user.c::mconsole_reply_v0() arch/um/drivers/port_user.c::port_pre_exec() Acked-by: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/mcast_kern.c | 2 +- arch/um/drivers/mconsole_user.c | 2 +- arch/um/drivers/port_user.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/arch/um/drivers/mcast_kern.c b/arch/um/drivers/mcast_kern.c index 822092f149be..8c4378a76d63 100644 --- a/arch/um/drivers/mcast_kern.c +++ b/arch/um/drivers/mcast_kern.c @@ -58,7 +58,7 @@ static const struct net_kern_info mcast_kern_info = { .write = mcast_write, }; -int mcast_setup(char *str, char **mac_out, void *data) +static int mcast_setup(char *str, char **mac_out, void *data) { struct mcast_init *init = data; char *port_str = NULL, *ttl_str = NULL, *remain; diff --git a/arch/um/drivers/mconsole_user.c b/arch/um/drivers/mconsole_user.c index 13af2f03ed84..f8cf4c8bedef 100644 --- a/arch/um/drivers/mconsole_user.c +++ b/arch/um/drivers/mconsole_user.c @@ -39,7 +39,7 @@ static struct mconsole_command commands[] = { /* Initialized in mconsole_init, which is an initcall */ char mconsole_socket_name[256]; -int mconsole_reply_v0(struct mc_request *req, char *reply) +static int mconsole_reply_v0(struct mc_request *req, char *reply) { struct iovec iov; struct msghdr msg; diff --git a/arch/um/drivers/port_user.c b/arch/um/drivers/port_user.c index addd75902656..d269ca387f10 100644 --- a/arch/um/drivers/port_user.c +++ b/arch/um/drivers/port_user.c @@ -153,7 +153,7 @@ struct port_pre_exec_data { int pipe_fd; }; -void port_pre_exec(void *arg) +static void port_pre_exec(void *arg) { struct port_pre_exec_data *data = arg; -- cgit v1.2.3 From 074a0db8e17ae271736148809c5f9d47dec2d993 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:57 -0700 Subject: uml: make several things static Make several things static, because they no longer need to 
be global. Acked-by: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/net_kern.c | 6 +++--- arch/um/drivers/slip_kern.c | 4 ++-- arch/um/drivers/stdio_console.c | 4 ++-- 3 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/um/drivers/net_kern.c b/arch/um/drivers/net_kern.c index 1d43bdfc20c4..5b4ca8d93682 100644 --- a/arch/um/drivers/net_kern.c +++ b/arch/um/drivers/net_kern.c @@ -116,7 +116,7 @@ static void uml_dev_close(struct work_struct *work) dev_close(lp->dev); } -irqreturn_t uml_net_interrupt(int irq, void *dev_id) +static irqreturn_t uml_net_interrupt(int irq, void *dev_id) { struct net_device *dev = dev_id; struct uml_net_private *lp = dev->priv; @@ -296,7 +296,7 @@ static struct ethtool_ops uml_net_ethtool_ops = { .get_link = ethtool_op_get_link, }; -void uml_net_user_timer_expire(unsigned long _conn) +static void uml_net_user_timer_expire(unsigned long _conn) { #ifdef undef struct connection *conn = (struct connection *)_conn; @@ -786,7 +786,7 @@ static int uml_inetaddr_event(struct notifier_block *this, unsigned long event, } /* uml_net_init shouldn't be called twice on two CPUs at the same time */ -struct notifier_block uml_inetaddr_notifier = { +static struct notifier_block uml_inetaddr_notifier = { .notifier_call = uml_inetaddr_event, }; diff --git a/arch/um/drivers/slip_kern.c b/arch/um/drivers/slip_kern.c index 6b4a0f9e38de..d19faec7046e 100644 --- a/arch/um/drivers/slip_kern.c +++ b/arch/um/drivers/slip_kern.c @@ -13,7 +13,7 @@ struct slip_init { char *gate_addr; }; -void slip_init(struct net_device *dev, void *data) +static void slip_init(struct net_device *dev, void *data) { struct uml_net_private *private; struct slip_data *spri; @@ -57,7 +57,7 @@ static int slip_write(int fd, struct sk_buff *skb, struct uml_net_private *lp) (struct slip_data *) &lp->user); } -const struct net_kern_info slip_kern_info = { +static const struct net_kern_info slip_kern_info = { .init = 
slip_init, .protocol = slip_protocol, .read = slip_read, diff --git a/arch/um/drivers/stdio_console.c b/arch/um/drivers/stdio_console.c index cec0c33cdd39..49266f6108c4 100644 --- a/arch/um/drivers/stdio_console.c +++ b/arch/um/drivers/stdio_console.c @@ -34,7 +34,7 @@ static struct tty_driver *console_driver; -void stdio_announce(char *dev_name, int dev) +static void stdio_announce(char *dev_name, int dev) { printk(KERN_INFO "Virtual console %d assigned device '%s'\n", dev, dev_name); @@ -158,7 +158,7 @@ static struct console stdiocons = { .index = -1, }; -int stdio_init(void) +static int stdio_init(void) { char *new_title; -- cgit v1.2.3 From 4415d8a5aaec2008833e1c474b38627c0bc738ca Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:57 -0700 Subject: arch/um/os-Linux/sys-i386/task_size.c: improve a bit Improve this code a bit: check sigaction's return value and remove a useless fflush(). Acked-by: Jeff Dike Signed-off-by: WANG Cong Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/os-Linux/sys-i386/task_size.c | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/arch/um/os-Linux/sys-i386/task_size.c b/arch/um/os-Linux/sys-i386/task_size.c index 48d211b3d9a1..ccb49b0aff59 100644 --- a/arch/um/os-Linux/sys-i386/task_size.c +++ b/arch/um/os-Linux/sys-i386/task_size.c @@ -88,7 +88,10 @@ unsigned long os_get_task_size(void) sa.sa_handler = segfault; sigemptyset(&sa.sa_mask); sa.sa_flags = SA_NODEFER; - sigaction(SIGSEGV, &sa, &old); + if (sigaction(SIGSEGV, &sa, &old)) { + perror("os_get_task_size"); + exit(1); + } if (!page_ok(bottom)) { fprintf(stderr, "Address 0x%x no good?\n", @@ -110,11 +113,12 @@ unsigned long os_get_task_size(void) out: /* Restore the old SIGSEGV handling */ - sigaction(SIGSEGV, &old, NULL); - + if (sigaction(SIGSEGV, &old, NULL)) { + perror("os_get_task_size"); + exit(1); + } top <<= UM_KERN_PAGE_SHIFT; printf("0x%x\n", top); - fflush(stdout); return top; } -- cgit v1.2.3 From 
5dc62b1b6408396d5f6c13ed585adc87b2e296f9 Mon Sep 17 00:00:00 2001 From: WANG Cong Date: Mon, 28 Apr 2008 02:13:58 -0700 Subject: uml: clean up arch/um/drivers/ubd_kern.c Make some global functions and variables static. And remove some useless declarations for local functions, since we just need to move their definitions ahead. [jdike@addtoit.com: checkpatch cleanups] Signed-off-by: WANG Cong Signed-off-by: Jeff Dike Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/um/drivers/ubd_kern.c | 385 ++++++++++++++++++++++----------------------- 1 file changed, 190 insertions(+), 195 deletions(-) diff --git a/arch/um/drivers/ubd_kern.c b/arch/um/drivers/ubd_kern.c index be3a2797dac4..5e45e39a8a8d 100644 --- a/arch/um/drivers/ubd_kern.c +++ b/arch/um/drivers/ubd_kern.c @@ -72,18 +72,6 @@ struct io_thread_req { int error; }; -extern int open_ubd_file(char *file, struct openflags *openflags, int shared, - char **backing_file_out, int *bitmap_offset_out, - unsigned long *bitmap_len_out, int *data_offset_out, - int *create_cow_out); -extern int create_cow_file(char *cow_file, char *backing_file, - struct openflags flags, int sectorsize, - int alignment, int *bitmap_offset_out, - unsigned long *bitmap_len_out, - int *data_offset_out); -extern int read_cow_bitmap(int fd, void *buf, int offset, int len); -extern void do_io(struct io_thread_req *req); - static inline int ubd_test_bit(__u64 bit, unsigned char *data) { __u64 n; @@ -200,7 +188,7 @@ struct ubd { } /* Protected by ubd_lock */ -struct ubd ubd_devs[MAX_DEV] = { [ 0 ... MAX_DEV - 1 ] = DEFAULT_UBD }; +static struct ubd ubd_devs[MAX_DEV] = { [0 ... MAX_DEV - 1] = DEFAULT_UBD }; /* Only changed by fake_ide_setup which is a setup */ static int fake_ide = 0; @@ -463,7 +451,7 @@ __uml_help(udb_setup, static void do_ubd_request(struct request_queue * q); /* Only changed by ubd_init, which is an initcall. 
*/ -int thread_fd = -1; +static int thread_fd = -1; static void ubd_end_request(struct request *req, int bytes, int error) { @@ -531,7 +519,7 @@ static irqreturn_t ubd_intr(int irq, void *dev) /* Only changed by ubd_init, which is an initcall. */ static int io_pid = -1; -void kill_io_thread(void) +static void kill_io_thread(void) { if(io_pid != -1) os_kill_process(io_pid, 1); @@ -547,6 +535,192 @@ static inline int ubd_file_size(struct ubd *ubd_dev, __u64 *size_out) return os_file_size(file, size_out); } +static int read_cow_bitmap(int fd, void *buf, int offset, int len) +{ + int err; + + err = os_seek_file(fd, offset); + if (err < 0) + return err; + + err = os_read_file(fd, buf, len); + if (err < 0) + return err; + + return 0; +} + +static int backing_file_mismatch(char *file, __u64 size, time_t mtime) +{ + unsigned long modtime; + unsigned long long actual; + int err; + + err = os_file_modtime(file, &modtime); + if (err < 0) { + printk(KERN_ERR "Failed to get modification time of backing " + "file \"%s\", err = %d\n", file, -err); + return err; + } + + err = os_file_size(file, &actual); + if (err < 0) { + printk(KERN_ERR "Failed to get size of backing file \"%s\", " + "err = %d\n", file, -err); + return err; + } + + if (actual != size) { + /*__u64 can be a long on AMD64 and with %lu GCC complains; so + * the typecast.*/ + printk(KERN_ERR "Size mismatch (%llu vs %llu) of COW header " + "vs backing file\n", (unsigned long long) size, actual); + return -EINVAL; + } + if (modtime != mtime) { + printk(KERN_ERR "mtime mismatch (%ld vs %ld) of COW header vs " + "backing file\n", mtime, modtime); + return -EINVAL; + } + return 0; +} + +static int path_requires_switch(char *from_cmdline, char *from_cow, char *cow) +{ + struct uml_stat buf1, buf2; + int err; + + if (from_cmdline == NULL) + return 0; + if (!strcmp(from_cmdline, from_cow)) + return 0; + + err = os_stat_file(from_cmdline, &buf1); + if (err < 0) { + printk(KERN_ERR "Couldn't stat '%s', err = %d\n", 
from_cmdline, + -err); + return 0; + } + err = os_stat_file(from_cow, &buf2); + if (err < 0) { + printk(KERN_ERR "Couldn't stat '%s', err = %d\n", from_cow, + -err); + return 1; + } + if ((buf1.ust_dev == buf2.ust_dev) && (buf1.ust_ino == buf2.ust_ino)) + return 0; + + printk(KERN_ERR "Backing file mismatch - \"%s\" requested, " + "\"%s\" specified in COW header of \"%s\"\n", + from_cmdline, from_cow, cow); + return 1; +} + +static int open_ubd_file(char *file, struct openflags *openflags, int shared, + char **backing_file_out, int *bitmap_offset_out, + unsigned long *bitmap_len_out, int *data_offset_out, + int *create_cow_out) +{ + time_t mtime; + unsigned long long size; + __u32 version, align; + char *backing_file; + int fd, err, sectorsize, asked_switch, mode = 0644; + + fd = os_open_file(file, *openflags, mode); + if (fd < 0) { + if ((fd == -ENOENT) && (create_cow_out != NULL)) + *create_cow_out = 1; + if (!openflags->w || + ((fd != -EROFS) && (fd != -EACCES))) + return fd; + openflags->w = 0; + fd = os_open_file(file, *openflags, mode); + if (fd < 0) + return fd; + } + + if (shared) + printk(KERN_INFO "Not locking \"%s\" on the host\n", file); + else { + err = os_lock_file(fd, openflags->w); + if (err < 0) { + printk(KERN_ERR "Failed to lock '%s', err = %d\n", + file, -err); + goto out_close; + } + } + + /* Successful return case! */ + if (backing_file_out == NULL) + return fd; + + err = read_cow_header(file_reader, &fd, &version, &backing_file, &mtime, + &size, &sectorsize, &align, bitmap_offset_out); + if (err && (*backing_file_out != NULL)) { + printk(KERN_ERR "Failed to read COW header from COW file " + "\"%s\", errno = %d\n", file, -err); + goto out_close; + } + if (err) + return fd; + + asked_switch = path_requires_switch(*backing_file_out, backing_file, + file); + + /* Allow switching only if no mismatch.
*/ + if (asked_switch && !backing_file_mismatch(*backing_file_out, size, + mtime)) { + printk(KERN_ERR "Switching backing file to '%s'\n", + *backing_file_out); + err = write_cow_header(file, fd, *backing_file_out, + sectorsize, align, &size); + if (err) { + printk(KERN_ERR "Switch failed, errno = %d\n", -err); + goto out_close; + } + } else { + *backing_file_out = backing_file; + err = backing_file_mismatch(*backing_file_out, size, mtime); + if (err) + goto out_close; + } + + cow_sizes(version, size, sectorsize, align, *bitmap_offset_out, + bitmap_len_out, data_offset_out); + + return fd; + out_close: + os_close_file(fd); + return err; +} + +static int create_cow_file(char *cow_file, char *backing_file, + struct openflags flags, + int sectorsize, int alignment, int *bitmap_offset_out, + unsigned long *bitmap_len_out, int *data_offset_out) +{ + int err, fd; + + flags.c = 1; + fd = open_ubd_file(cow_file, &flags, 0, NULL, NULL, NULL, NULL, NULL); + if (fd < 0) { + err = fd; + printk(KERN_ERR "Open of COW file '%s' failed, errno = %d\n", + cow_file, -err); + goto out; + } + + err = init_cow_file(fd, cow_file, backing_file, sectorsize, alignment, + bitmap_offset_out, bitmap_len_out, + data_offset_out); + if (!err) + return fd; + os_close_file(fd); + out: + return err; +} + static void ubd_close_dev(struct ubd *ubd_dev) { os_close_file(ubd_dev->fd); @@ -1166,185 +1340,6 @@ static int ubd_ioctl(struct inode * inode, struct file * file, return -EINVAL; } -static int path_requires_switch(char *from_cmdline, char *from_cow, char *cow) -{ - struct uml_stat buf1, buf2; - int err; - - if(from_cmdline == NULL) - return 0; - if(!strcmp(from_cmdline, from_cow)) - return 0; - - err = os_stat_file(from_cmdline, &buf1); - if(err < 0){ - printk("Couldn't stat '%s', err = %d\n", from_cmdline, -err); - return 0; - } - err = os_stat_file(from_cow, &buf2); - if(err < 0){ - printk("Couldn't stat '%s', err = %d\n", from_cow, -err); - return 1; - } - if((buf1.ust_dev == buf2.ust_dev) && 
(buf1.ust_ino == buf2.ust_ino)) - return 0; - - printk("Backing file mismatch - \"%s\" requested,\n" - "\"%s\" specified in COW header of \"%s\"\n", - from_cmdline, from_cow, cow); - return 1; -} - -static int backing_file_mismatch(char *file, __u64 size, time_t mtime) -{ - unsigned long modtime; - unsigned long long actual; - int err; - - err = os_file_modtime(file, &modtime); - if(err < 0){ - printk("Failed to get modification time of backing file " - "\"%s\", err = %d\n", file, -err); - return err; - } - - err = os_file_size(file, &actual); - if(err < 0){ - printk("Failed to get size of backing file \"%s\", " - "err = %d\n", file, -err); - return err; - } - - if(actual != size){ - /*__u64 can be a long on AMD64 and with %lu GCC complains; so - * the typecast.*/ - printk("Size mismatch (%llu vs %llu) of COW header vs backing " - "file\n", (unsigned long long) size, actual); - return -EINVAL; - } - if(modtime != mtime){ - printk("mtime mismatch (%ld vs %ld) of COW header vs backing " - "file\n", mtime, modtime); - return -EINVAL; - } - return 0; -} - -int read_cow_bitmap(int fd, void *buf, int offset, int len) -{ - int err; - - err = os_seek_file(fd, offset); - if(err < 0) - return err; - - err = os_read_file(fd, buf, len); - if(err < 0) - return err; - - return 0; -} - -int open_ubd_file(char *file, struct openflags *openflags, int shared, - char **backing_file_out, int *bitmap_offset_out, - unsigned long *bitmap_len_out, int *data_offset_out, - int *create_cow_out) -{ - time_t mtime; - unsigned long long size; - __u32 version, align; - char *backing_file; - int fd, err, sectorsize, asked_switch, mode = 0644; - - fd = os_open_file(file, *openflags, mode); - if (fd < 0) { - if ((fd == -ENOENT) && (create_cow_out != NULL)) - *create_cow_out = 1; - if (!openflags->w || - ((fd != -EROFS) && (fd != -EACCES))) - return fd; - openflags->w = 0; - fd = os_open_file(file, *openflags, mode); - if (fd < 0) - return fd; - } - - if(shared) - printk("Not locking \"%s\" on the 
host\n", file); - else { - err = os_lock_file(fd, openflags->w); - if(err < 0){ - printk("Failed to lock '%s', err = %d\n", file, -err); - goto out_close; - } - } - - /* Successful return case! */ - if(backing_file_out == NULL) - return fd; - - err = read_cow_header(file_reader, &fd, &version, &backing_file, &mtime, - &size, &sectorsize, &align, bitmap_offset_out); - if(err && (*backing_file_out != NULL)){ - printk("Failed to read COW header from COW file \"%s\", " - "errno = %d\n", file, -err); - goto out_close; - } - if(err) - return fd; - - asked_switch = path_requires_switch(*backing_file_out, backing_file, file); - - /* Allow switching only if no mismatch. */ - if (asked_switch && !backing_file_mismatch(*backing_file_out, size, mtime)) { - printk("Switching backing file to '%s'\n", *backing_file_out); - err = write_cow_header(file, fd, *backing_file_out, - sectorsize, align, &size); - if (err) { - printk("Switch failed, errno = %d\n", -err); - goto out_close; - } - } else { - *backing_file_out = backing_file; - err = backing_file_mismatch(*backing_file_out, size, mtime); - if (err) - goto out_close; - } - - cow_sizes(version, size, sectorsize, align, *bitmap_offset_out, - bitmap_len_out, data_offset_out); - - return fd; - out_close: - os_close_file(fd); - return err; -} - -int create_cow_file(char *cow_file, char *backing_file, struct openflags flags, - int sectorsize, int alignment, int *bitmap_offset_out, - unsigned long *bitmap_len_out, int *data_offset_out) -{ - int err, fd; - - flags.c = 1; - fd = open_ubd_file(cow_file, &flags, 0, NULL, NULL, NULL, NULL, NULL); - if(fd < 0){ - err = fd; - printk("Open of COW file '%s' failed, errno = %d\n", cow_file, - -err); - goto out; - } - - err = init_cow_file(fd, cow_file, backing_file, sectorsize, alignment, - bitmap_offset_out, bitmap_len_out, - data_offset_out); - if(!err) - return fd; - os_close_file(fd); - out: - return err; -} - static int update_bitmap(struct io_thread_req *req) { int n; @@ -1369,7 +1364,7 @@
static int update_bitmap(struct io_thread_req *req) return 0; } -void do_io(struct io_thread_req *req) +static void do_io(struct io_thread_req *req) { char *buf; unsigned long len; -- cgit v1.2.3 From cdf8803768db6f652d498628fe1421a23c025253 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:13:59 -0700 Subject: ncpfs: add prototypes to ncp_fs.h Removes some externs from C files, noticed from the sparse warnings: fs/ncpfs/dir.c:90:26: warning: symbol 'ncp_root_dentry_operations' was not declared. Should it be static? fs/ncpfs/symlink.c:107:5: warning: symbol 'ncp_symlink' was not declared. Should it be static? fs/ncpfs/symlink.c:101:39: warning: symbol 'ncp_symlink_aops' was not declared. Should it be static? Signed-off-by: Harvey Harrison Acked-by: Petr Vandrovec Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ncpfs/inode.c | 6 ------ include/linux/ncp_fs.h | 7 +++++++ 2 files changed, 7 insertions(+), 6 deletions(-) diff --git a/fs/ncpfs/inode.c b/fs/ncpfs/inode.c index fbbb9f7afa1a..2e5ab1204dec 100644 --- a/fs/ncpfs/inode.c +++ b/fs/ncpfs/inode.c @@ -107,12 +107,6 @@ static const struct super_operations ncp_sops = .show_options = ncp_show_options, }; -extern struct dentry_operations ncp_root_dentry_operations; -#if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS) -extern const struct address_space_operations ncp_symlink_aops; -extern int ncp_symlink(struct inode*, struct dentry*, const char*); -#endif - /* * Fill in the ncpfs-specific information in the inode. 
*/ diff --git a/include/linux/ncp_fs.h b/include/linux/ncp_fs.h index 88766e43e121..9f2d76347f19 100644 --- a/include/linux/ncp_fs.h +++ b/include/linux/ncp_fs.h @@ -204,6 +204,7 @@ void ncp_update_inode2(struct inode *, struct ncp_entry_info *); /* linux/fs/ncpfs/dir.c */ extern const struct inode_operations ncp_dir_inode_operations; extern const struct file_operations ncp_dir_operations; +extern struct dentry_operations ncp_root_dentry_operations; int ncp_conn_logged_in(struct super_block *); int ncp_date_dos2unix(__le16 time, __le16 date); void ncp_date_unix2dos(int unix_date, __le16 * time, __le16 * date); @@ -223,6 +224,12 @@ int ncp_disconnect(struct ncp_server *server); void ncp_lock_server(struct ncp_server *server); void ncp_unlock_server(struct ncp_server *server); +/* linux/fs/ncpfs/symlink.c */ +#if defined(CONFIG_NCPFS_EXTRAS) || defined(CONFIG_NCPFS_NFS_NS) +extern const struct address_space_operations ncp_symlink_aops; +int ncp_symlink(struct inode*, struct dentry*, const char*); +#endif + /* linux/fs/ncpfs/file.c */ extern const struct inode_operations ncp_file_inode_operations; extern const struct file_operations ncp_file_operations; -- cgit v1.2.3 From 305787e44ebc21d87ab4d4949da5b97d4252aa9b Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:00 -0700 Subject: ncpfs: fix sparse warnings in ioctl.c In both cases, these inode variables are being used to test the server's root inode against NULL. Change them to s_inode. fs/ncpfs/ioctl.c:391:18: warning: symbol 'inode' shadows an earlier one fs/ncpfs/ioctl.c:264:28: originally declared here fs/ncpfs/ioctl.c:441:17: warning: symbol 'inode' shadows an earlier one fs/ncpfs/ioctl.c:264:28: originally declared here In this case, we are about to return anyway, just reuse result.
fs/ncpfs/ioctl.c:521:8: warning: symbol 'result' shadows an earlier one fs/ncpfs/ioctl.c:268:6: originally declared here Signed-off-by: Harvey Harrison Acked-by: Petr Vandrovec Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ncpfs/ioctl.c | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/fs/ncpfs/ioctl.c b/fs/ncpfs/ioctl.c index ad8f167e54bc..3a97c95e1ca2 100644 --- a/fs/ncpfs/ioctl.c +++ b/fs/ncpfs/ioctl.c @@ -389,11 +389,11 @@ static int __ncp_ioctl(struct inode *inode, struct file *filp, struct dentry* dentry = inode->i_sb->s_root; if (dentry) { - struct inode* inode = dentry->d_inode; + struct inode* s_inode = dentry->d_inode; - if (inode) { - sr.volNumber = NCP_FINFO(inode)->volNumber; - sr.dirEntNum = NCP_FINFO(inode)->dirEntNum; + if (s_inode) { + sr.volNumber = NCP_FINFO(s_inode)->volNumber; + sr.dirEntNum = NCP_FINFO(s_inode)->dirEntNum; sr.namespace = server->name_space[sr.volNumber]; } else DPRINTK("ncpfs: s_root->d_inode==NULL\n"); @@ -439,12 +439,12 @@ static int __ncp_ioctl(struct inode *inode, struct file *filp, dentry = inode->i_sb->s_root; server->root_setuped = 1; if (dentry) { - struct inode* inode = dentry->d_inode; + struct inode* s_inode = dentry->d_inode; if (inode) { - NCP_FINFO(inode)->volNumber = vnum; - NCP_FINFO(inode)->dirEntNum = de; - NCP_FINFO(inode)->DosDirNum = dosde; + NCP_FINFO(s_inode)->volNumber = vnum; + NCP_FINFO(s_inode)->dirEntNum = de; + NCP_FINFO(s_inode)->DosDirNum = dosde; } else DPRINTK("ncpfs: s_root->d_inode==NULL\n"); } else @@ -519,7 +519,6 @@ static int __ncp_ioctl(struct inode *inode, struct file *filp, } { struct ncp_lock_ioctl rqdata; - int result; if (copy_from_user(&rqdata, argp, sizeof(rqdata))) return -EFAULT; -- cgit v1.2.3 From eee3754f5e45bd27e001ea41823bdbcdd0d192d4 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:01 -0700 Subject: ncpfs: fix sparse warning in ncpsign_kernel.c We're casting anyway, might as well cast to the 
correct sign. Specific to i386 (ifdef __i386__) fs/ncpfs/ncpsign_kernel.c:58:23: warning: incorrect type in initializer (different signedness) fs/ncpfs/ncpsign_kernel.c:58:23: expected unsigned int *data2 fs/ncpfs/ncpsign_kernel.c:58:23: got int * Signed-off-by: Harvey Harrison Acked-by: Petr Vandrovec Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ncpfs/ncpsign_kernel.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/ncpfs/ncpsign_kernel.c b/fs/ncpfs/ncpsign_kernel.c index 749a18d33599..7c0b5c21e6cf 100644 --- a/fs/ncpfs/ncpsign_kernel.c +++ b/fs/ncpfs/ncpsign_kernel.c @@ -55,7 +55,7 @@ static void nwsign(char *r_data1, char *r_data2, char *outdata) { unsigned int w0,w1,w2,w3; static int rbit[4]={0, 2, 1, 3}; #ifdef __i386__ - unsigned int *data2=(int *)r_data2; + unsigned int *data2=(unsigned int *)r_data2; #else unsigned int data2[16]; for (i=0;i<16;i++) -- cgit v1.2.3 From 7a63ce5a1f2fde5ae737f059e2714e441447120c Mon Sep 17 00:00:00 2001 From: Sam Ravnborg Date: Mon, 28 Apr 2008 02:14:02 -0700 Subject: serial: silence section mismatch warnings in 8250_pci Fix following warnings: WARNING: drivers/serial/built-in.o(.data+0x5b8): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_ite887x_exit() WARNING: drivers/serial/built-in.o(.data+0x5e0): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit() WARNING: drivers/serial/built-in.o(.data+0x608): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit() WARNING: drivers/serial/built-in.o(.data+0x658): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit() WARNING: drivers/serial/built-in.o(.data+0x680): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit() WARNING: 
drivers/serial/built-in.o(.data+0x6a8): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:pci_plx9050_exit() WARNING: drivers/serial/built-in.o(.data+0x6d0): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:sbs_exit() WARNING: drivers/serial/built-in.o(.data+0x6f8): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:sbs_exit() WARNING: drivers/serial/built-in.o(.data+0x720): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:sbs_exit() WARNING: drivers/serial/built-in.o(.data+0x748): Section mismatch in reference from the variable pci_serial_quirks to the function .devexit.text:sbs_exit() pci_serial_quirks contains a number of function pointers where the referenced function is annotated __devexit. This is OK so we annotate pci_serial_quirks with __refdata to ignore the __devexit references Signed-off-by: Sam Ravnborg Cc: Russell King Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/8250_pci.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/serial/8250_pci.c b/drivers/serial/8250_pci.c index f97224ce59da..6e57382b9137 100644 --- a/drivers/serial/8250_pci.c +++ b/drivers/serial/8250_pci.c @@ -775,7 +775,7 @@ pci_default_setup(struct serial_private *priv, struct pciserial_board *board, * This list is ordered alphabetically by vendor then device. * Specific entries must come before more generic entries. 
*/ -static struct pci_serial_quirk pci_serial_quirks[] = { +static struct pci_serial_quirk pci_serial_quirks[] __refdata = { /* * ADDI-DATA GmbH communication cards */ -- cgit v1.2.3 From 0fab6de09c71a976e5d765e1ff548b14be385153 Mon Sep 17 00:00:00 2001 From: Joe Perches Date: Mon, 28 Apr 2008 02:14:02 -0700 Subject: synclink drivers bool conversion Remove more TRUE/FALSE defines and uses Remove == TRUE tests Convert BOOLEAN to bool Convert int to bool where appropriate Signed-off-by: Joe Perches Acked-by: Paul Fulghum Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/pcmcia/synclink_cs.c | 125 +++++++++--------- drivers/char/synclink.c | 258 +++++++++++++++++++------------------- drivers/char/synclink_gt.c | 102 +++++++-------- drivers/char/synclinkmp.c | 157 +++++++++++------------ include/linux/synclink.h | 4 - 5 files changed, 319 insertions(+), 327 deletions(-) diff --git a/drivers/char/pcmcia/synclink_cs.c b/drivers/char/pcmcia/synclink_cs.c index 4e84d233e5a2..583356426dfb 100644 --- a/drivers/char/pcmcia/synclink_cs.c +++ b/drivers/char/pcmcia/synclink_cs.c @@ -189,20 +189,20 @@ typedef struct _mgslpc_info { u32 pending_bh; - int bh_running; - int bh_requested; + bool bh_running; + bool bh_requested; int dcd_chkcount; /* check counts to prevent */ int cts_chkcount; /* too many IRQs if a signal */ int dsr_chkcount; /* is floating */ int ri_chkcount; - int rx_enabled; - int rx_overflow; + bool rx_enabled; + bool rx_overflow; - int tx_enabled; - int tx_active; - int tx_aborting; + bool tx_enabled; + bool tx_active; + bool tx_aborting; u32 idle_mode; int if_mode; /* serial interface selection (RS-232, v.35 etc) */ @@ -216,12 +216,12 @@ typedef struct _mgslpc_info { unsigned char serial_signals; /* current serial signal states */ - char irq_occurred; /* for diagnostics use */ + bool irq_occurred; /* for diagnostics use */ char testing_irq; unsigned int init_error; /* startup error (DIAGS) */ char flag_buf[MAX_ASYNC_BUFFER_SIZE]; - 
BOOLEAN drop_rts_on_tx_done; + bool drop_rts_on_tx_done; struct _input_signal_events input_signal_events; @@ -402,8 +402,8 @@ static void hdlcdev_exit(MGSLPC_INFO *info); static void trace_block(MGSLPC_INFO *info,const char* data, int count, int xmit); -static BOOLEAN register_test(MGSLPC_INFO *info); -static BOOLEAN irq_test(MGSLPC_INFO *info); +static bool register_test(MGSLPC_INFO *info); +static bool irq_test(MGSLPC_INFO *info); static int adapter_test(MGSLPC_INFO *info); static int claim_resources(MGSLPC_INFO *info); @@ -411,7 +411,7 @@ static void release_resources(MGSLPC_INFO *info); static void mgslpc_add_device(MGSLPC_INFO *info); static void mgslpc_remove_device(MGSLPC_INFO *info); -static int rx_get_frame(MGSLPC_INFO *info); +static bool rx_get_frame(MGSLPC_INFO *info); static void rx_reset_buffers(MGSLPC_INFO *info); static int rx_alloc_buffers(MGSLPC_INFO *info); static void rx_free_buffers(MGSLPC_INFO *info); @@ -719,7 +719,7 @@ static int mgslpc_resume(struct pcmcia_device *link) } -static inline int mgslpc_paranoia_check(MGSLPC_INFO *info, +static inline bool mgslpc_paranoia_check(MGSLPC_INFO *info, char *name, const char *routine) { #ifdef MGSLPC_PARANOIA_CHECK @@ -730,17 +730,17 @@ static inline int mgslpc_paranoia_check(MGSLPC_INFO *info, if (!info) { printk(badinfo, name, routine); - return 1; + return true; } if (info->magic != MGSLPC_MAGIC) { printk(badmagic, name, routine); - return 1; + return true; } #else if (!info) - return 1; + return true; #endif - return 0; + return false; } @@ -752,16 +752,16 @@ static inline int mgslpc_paranoia_check(MGSLPC_INFO *info, #define CMD_TXEOM BIT1 // transmit end message #define CMD_TXRESET BIT0 // transmit reset -static BOOLEAN wait_command_complete(MGSLPC_INFO *info, unsigned char channel) +static bool wait_command_complete(MGSLPC_INFO *info, unsigned char channel) { int i = 0; /* wait for command completion */ while (read_reg(info, (unsigned char)(channel+STAR)) & BIT2) { udelay(1); if (i++ == 1000) - 
return FALSE; + return false; } - return TRUE; + return true; } static void issue_command(MGSLPC_INFO *info, unsigned char channel, unsigned char cmd) @@ -825,8 +825,8 @@ static int bh_action(MGSLPC_INFO *info) if (!rc) { /* Mark BH routine as complete */ - info->bh_running = 0; - info->bh_requested = 0; + info->bh_running = false; + info->bh_requested = false; } spin_unlock_irqrestore(&info->lock,flags); @@ -846,7 +846,7 @@ static void bh_handler(struct work_struct *work) printk( "%s(%d):bh_handler(%s) entry\n", __FILE__,__LINE__,info->device_name); - info->bh_running = 1; + info->bh_running = true; while((action = bh_action(info)) != 0) { @@ -913,7 +913,7 @@ static void rx_ready_hdlc(MGSLPC_INFO *info, int eom) /* no more free buffers */ issue_command(info, CHA, CMD_RXRESET); info->pending_bh |= BH_RECEIVE; - info->rx_overflow = 1; + info->rx_overflow = true; info->icount.buf_overrun++; return; } @@ -1032,8 +1032,8 @@ static void tx_done(MGSLPC_INFO *info) if (!info->tx_active) return; - info->tx_active = 0; - info->tx_aborting = 0; + info->tx_active = false; + info->tx_aborting = false; if (info->params.mode == MGSL_MODE_ASYNC) return; @@ -1047,7 +1047,7 @@ static void tx_done(MGSLPC_INFO *info) info->serial_signals &= ~SerialSignal_RTS; set_signals(info); } - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; } #if SYNCLINK_GENERIC_HDLC @@ -1081,7 +1081,7 @@ static void tx_ready(MGSLPC_INFO *info) return; } if (!info->tx_count) - info->tx_active = 0; + info->tx_active = false; } if (!info->tx_count) @@ -1261,7 +1261,7 @@ static irqreturn_t mgslpc_isr(int dummy, void *dev_id) { isr = read_reg16(info, CHA + ISR); if (isr & IRQ_TIMER) { - info->irq_occurred = 1; + info->irq_occurred = true; irq_disable(info, CHA, IRQ_TIMER); } @@ -1318,7 +1318,7 @@ static irqreturn_t mgslpc_isr(int dummy, void *dev_id) printk("%s(%d):%s queueing bh task.\n", __FILE__,__LINE__,info->device_name); schedule_work(&info->task); - info->bh_requested = 1; + 
info->bh_requested = true; } spin_unlock(&info->lock); @@ -1990,7 +1990,7 @@ static int tx_abort(MGSLPC_INFO * info) * This results in underrun and abort transmission. */ info->tx_count = info->tx_put = info->tx_get = 0; - info->tx_aborting = TRUE; + info->tx_aborting = true; } spin_unlock_irqrestore(&info->lock,flags); return 0; @@ -2589,7 +2589,8 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, { DECLARE_WAITQUEUE(wait, current); int retval; - int do_clocal = 0, extra_count = 0; + bool do_clocal = false; + bool extra_count = false; unsigned long flags; if (debug_level >= DEBUG_LEVEL_INFO) @@ -2604,7 +2605,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, } if (tty->termios->c_cflag & CLOCAL) - do_clocal = 1; + do_clocal = true; /* Wait for carrier detect and the line to become * free (i.e., not in use by the callout). While we are in @@ -2622,7 +2623,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, spin_lock_irqsave(&info->lock, flags); if (!tty_hung_up_p(filp)) { - extra_count = 1; + extra_count = true; info->count--; } spin_unlock_irqrestore(&info->lock, flags); @@ -3493,8 +3494,8 @@ static void rx_stop(MGSLPC_INFO *info) /* MODE:03 RAC Receiver Active, 0=inactive */ clear_reg_bits(info, CHA + MODE, BIT3); - info->rx_enabled = 0; - info->rx_overflow = 0; + info->rx_enabled = false; + info->rx_overflow = false; } static void rx_start(MGSLPC_INFO *info) @@ -3504,13 +3505,13 @@ static void rx_start(MGSLPC_INFO *info) __FILE__,__LINE__, info->device_name ); rx_reset_buffers(info); - info->rx_enabled = 0; - info->rx_overflow = 0; + info->rx_enabled = false; + info->rx_overflow = false; /* MODE:03 RAC Receiver Active, 1=active */ set_reg_bits(info, CHA + MODE, BIT3); - info->rx_enabled = 1; + info->rx_enabled = true; } static void tx_start(MGSLPC_INFO *info) @@ -3523,24 +3524,24 @@ static void tx_start(MGSLPC_INFO *info) /* If auto RTS enabled and RTS is inactive, then assert */ /* RTS and set 
a flag indicating that the driver should */ /* negate RTS when the transmission completes. */ - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; if (info->params.flags & HDLC_FLAG_AUTO_RTS) { get_signals(info); if (!(info->serial_signals & SerialSignal_RTS)) { info->serial_signals |= SerialSignal_RTS; set_signals(info); - info->drop_rts_on_tx_done = 1; + info->drop_rts_on_tx_done = true; } } if (info->params.mode == MGSL_MODE_ASYNC) { if (!info->tx_active) { - info->tx_active = 1; + info->tx_active = true; tx_ready(info); } } else { - info->tx_active = 1; + info->tx_active = true; tx_ready(info); mod_timer(&info->tx_timer, jiffies + msecs_to_jiffies(5000)); @@ -3548,7 +3549,7 @@ static void tx_start(MGSLPC_INFO *info) } if (!info->tx_enabled) - info->tx_enabled = 1; + info->tx_enabled = true; } static void tx_stop(MGSLPC_INFO *info) @@ -3559,8 +3560,8 @@ static void tx_stop(MGSLPC_INFO *info) del_timer(&info->tx_timer); - info->tx_enabled = 0; - info->tx_active = 0; + info->tx_enabled = false; + info->tx_active = false; } /* Reset the adapter to a known state and prepare it for further use. @@ -3860,19 +3861,19 @@ static void rx_reset_buffers(MGSLPC_INFO *info) /* Attempt to return a received HDLC frame * Only frames received without errors are returned. 
* - * Returns 1 if frame returned, otherwise 0 + * Returns true if frame returned, otherwise false */ -static int rx_get_frame(MGSLPC_INFO *info) +static bool rx_get_frame(MGSLPC_INFO *info) { unsigned short status; RXBUF *buf; unsigned int framesize = 0; unsigned long flags; struct tty_struct *tty = info->tty; - int return_frame = 0; + bool return_frame = false; if (info->rx_frame_count == 0) - return 0; + return false; buf = (RXBUF*)(info->rx_buf + (info->rx_get * info->rx_buf_size)); @@ -3891,7 +3892,7 @@ static int rx_get_frame(MGSLPC_INFO *info) else if (!(status & BIT5)) { info->icount.rxcrc++; if (info->params.crc_type & HDLC_CRC_RETURN_EX) - return_frame = 1; + return_frame = true; } framesize = 0; #if SYNCLINK_GENERIC_HDLC @@ -3902,7 +3903,7 @@ static int rx_get_frame(MGSLPC_INFO *info) } #endif } else - return_frame = 1; + return_frame = true; if (return_frame) framesize = buf->count; @@ -3945,16 +3946,16 @@ static int rx_get_frame(MGSLPC_INFO *info) info->rx_get = 0; spin_unlock_irqrestore(&info->lock,flags); - return 1; + return true; } -static BOOLEAN register_test(MGSLPC_INFO *info) +static bool register_test(MGSLPC_INFO *info) { static unsigned char patterns[] = { 0x00, 0xff, 0xaa, 0x55, 0x69, 0x96, 0x0f }; static unsigned int count = ARRAY_SIZE(patterns); unsigned int i; - BOOLEAN rc = TRUE; + bool rc = true; unsigned long flags; spin_lock_irqsave(&info->lock,flags); @@ -3965,7 +3966,7 @@ static BOOLEAN register_test(MGSLPC_INFO *info) write_reg(info, XAD2, patterns[(i + 1) % count]); if ((read_reg(info, XAD1) != patterns[i]) || (read_reg(info, XAD2) != patterns[(i + 1) % count])) { - rc = FALSE; + rc = false; break; } } @@ -3974,7 +3975,7 @@ static BOOLEAN register_test(MGSLPC_INFO *info) return rc; } -static BOOLEAN irq_test(MGSLPC_INFO *info) +static bool irq_test(MGSLPC_INFO *info) { unsigned long end_time; unsigned long flags; @@ -3982,10 +3983,10 @@ static BOOLEAN irq_test(MGSLPC_INFO *info) spin_lock_irqsave(&info->lock,flags); 
reset_device(info); - info->testing_irq = TRUE; + info->testing_irq = true; hdlc_mode(info); - info->irq_occurred = FALSE; + info->irq_occurred = false; /* init hdlc mode */ @@ -4000,13 +4001,13 @@ static BOOLEAN irq_test(MGSLPC_INFO *info) msleep_interruptible(10); } - info->testing_irq = FALSE; + info->testing_irq = false; spin_lock_irqsave(&info->lock,flags); reset_device(info); spin_unlock_irqrestore(&info->lock,flags); - return info->irq_occurred ? TRUE : FALSE; + return info->irq_occurred; } static int adapter_test(MGSLPC_INFO *info) @@ -4079,7 +4080,7 @@ static void tx_timeout(unsigned long context) info->icount.txtimeout++; } spin_lock_irqsave(&info->lock,flags); - info->tx_active = 0; + info->tx_active = false; info->tx_count = info->tx_put = info->tx_get = 0; spin_unlock_irqrestore(&info->lock,flags); diff --git a/drivers/char/synclink.c b/drivers/char/synclink.c index a3237d48a584..fadab1d9510f 100644 --- a/drivers/char/synclink.c +++ b/drivers/char/synclink.c @@ -218,9 +218,9 @@ struct mgsl_struct { u32 pending_bh; - int bh_running; /* Protection from multiple */ + bool bh_running; /* Protection from multiple */ int isr_overflow; - int bh_requested; + bool bh_requested; int dcd_chkcount; /* check counts to prevent */ int cts_chkcount; /* too many IRQs if a signal */ @@ -250,12 +250,12 @@ struct mgsl_struct { int tx_holding_count; /* number of tx holding buffers waiting */ struct tx_holding_buffer tx_holding_buffers[MAX_TX_HOLDING_BUFFERS]; - int rx_enabled; - int rx_overflow; - int rx_rcc_underrun; + bool rx_enabled; + bool rx_overflow; + bool rx_rcc_underrun; - int tx_enabled; - int tx_active; + bool tx_enabled; + bool tx_active; u32 idle_mode; u16 cmr_value; @@ -269,14 +269,14 @@ struct mgsl_struct { unsigned int io_base; /* base I/O address of adapter */ unsigned int io_addr_size; /* size of the I/O address range */ - int io_addr_requested; /* nonzero if I/O address requested */ + bool io_addr_requested; /* true if I/O address requested */ unsigned 
int irq_level; /* interrupt level */ unsigned long irq_flags; - int irq_requested; /* nonzero if IRQ requested */ + bool irq_requested; /* true if IRQ requested */ unsigned int dma_level; /* DMA channel */ - int dma_requested; /* nonzero if dma channel requested */ + bool dma_requested; /* true if dma channel requested */ u16 mbre_bit; u16 loopback_bits; @@ -286,27 +286,27 @@ struct mgsl_struct { unsigned char serial_signals; /* current serial signal states */ - int irq_occurred; /* for diagnostics use */ + bool irq_occurred; /* for diagnostics use */ unsigned int init_error; /* Initialization startup error (DIAGS) */ int fDiagnosticsmode; /* Driver in Diagnostic mode? (DIAGS) */ u32 last_mem_alloc; unsigned char* memory_base; /* shared memory address (PCI only) */ u32 phys_memory_base; - int shared_mem_requested; + bool shared_mem_requested; unsigned char* lcr_base; /* local config registers (PCI only) */ u32 phys_lcr_base; u32 lcr_offset; - int lcr_mem_requested; + bool lcr_mem_requested; u32 misc_ctrl_value; char flag_buf[MAX_ASYNC_BUFFER_SIZE]; char char_buf[MAX_ASYNC_BUFFER_SIZE]; - BOOLEAN drop_rts_on_tx_done; + bool drop_rts_on_tx_done; - BOOLEAN loopmode_insert_requested; - BOOLEAN loopmode_send_done_requested; + bool loopmode_insert_requested; + bool loopmode_send_done_requested; struct _input_signal_events input_signal_events; @@ -752,10 +752,10 @@ static void mgsl_trace_block(struct mgsl_struct *info,const char* data, int coun /* * Adapter diagnostic routines */ -static BOOLEAN mgsl_register_test( struct mgsl_struct *info ); -static BOOLEAN mgsl_irq_test( struct mgsl_struct *info ); -static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ); -static BOOLEAN mgsl_memory_test( struct mgsl_struct *info ); +static bool mgsl_register_test( struct mgsl_struct *info ); +static bool mgsl_irq_test( struct mgsl_struct *info ); +static bool mgsl_dma_test( struct mgsl_struct *info ); +static bool mgsl_memory_test( struct mgsl_struct *info ); static int 
mgsl_adapter_test( struct mgsl_struct *info ); /* @@ -770,8 +770,8 @@ static struct mgsl_struct* mgsl_allocate_device(void); * DMA buffer manupulation functions. */ static void mgsl_free_rx_frame_buffers( struct mgsl_struct *info, unsigned int StartIndex, unsigned int EndIndex ); -static int mgsl_get_rx_frame( struct mgsl_struct *info ); -static int mgsl_get_raw_rx_frame( struct mgsl_struct *info ); +static bool mgsl_get_rx_frame( struct mgsl_struct *info ); +static bool mgsl_get_raw_rx_frame( struct mgsl_struct *info ); static void mgsl_reset_rx_dma_buffers( struct mgsl_struct *info ); static void mgsl_reset_tx_dma_buffers( struct mgsl_struct *info ); static int num_free_tx_dma_buffers(struct mgsl_struct *info); @@ -791,7 +791,7 @@ static int mgsl_alloc_intermediate_rxbuffer_memory(struct mgsl_struct *info); static void mgsl_free_intermediate_rxbuffer_memory(struct mgsl_struct *info); static int mgsl_alloc_intermediate_txbuffer_memory(struct mgsl_struct *info); static void mgsl_free_intermediate_txbuffer_memory(struct mgsl_struct *info); -static int load_next_tx_holding_buffer(struct mgsl_struct *info); +static bool load_next_tx_holding_buffer(struct mgsl_struct *info); static int save_tx_buffer_request(struct mgsl_struct *info,const char *Buffer, unsigned int BufferSize); /* @@ -847,7 +847,7 @@ static int mgsl_wait_event(struct mgsl_struct * info, int __user *mask); static int mgsl_loopmode_send_done( struct mgsl_struct * info ); /* set non-zero on successful registration with PCI subsystem */ -static int pci_registered; +static bool pci_registered; /* * Global linked list of SyncLink devices @@ -1054,8 +1054,8 @@ static int mgsl_bh_action(struct mgsl_struct *info) if (!rc) { /* Mark BH routine as complete */ - info->bh_running = 0; - info->bh_requested = 0; + info->bh_running = false; + info->bh_requested = false; } spin_unlock_irqrestore(&info->irq_spinlock,flags); @@ -1079,7 +1079,7 @@ static void mgsl_bh_handler(struct work_struct *work) printk( 
"%s(%d):mgsl_bh_handler(%s) entry\n", __FILE__,__LINE__,info->device_name); - info->bh_running = 1; + info->bh_running = true; while((action = mgsl_bh_action(info)) != 0) { @@ -1113,7 +1113,7 @@ static void mgsl_bh_handler(struct work_struct *work) static void mgsl_bh_receive(struct mgsl_struct *info) { - int (*get_rx_frame)(struct mgsl_struct *info) = + bool (*get_rx_frame)(struct mgsl_struct *info) = (info->params.mode == MGSL_MODE_HDLC ? mgsl_get_rx_frame : mgsl_get_raw_rx_frame); if ( debug_level >= DEBUG_LEVEL_BH ) @@ -1187,7 +1187,7 @@ static void mgsl_isr_receive_status( struct mgsl_struct *info ) usc_loopmode_active(info) ) { ++info->icount.rxabort; - info->loopmode_insert_requested = FALSE; + info->loopmode_insert_requested = false; /* clear CMR:13 to start echoing RxD to TxD */ info->cmr_value &= ~BIT13; @@ -1257,7 +1257,7 @@ static void mgsl_isr_transmit_status( struct mgsl_struct *info ) else info->icount.txunder++; - info->tx_active = 0; + info->tx_active = false; info->xmit_cnt = info->xmit_head = info->xmit_tail = 0; del_timer(&info->tx_timer); @@ -1267,7 +1267,7 @@ static void mgsl_isr_transmit_status( struct mgsl_struct *info ) info->serial_signals &= ~SerialSignal_RTS; usc_set_serial_signals( info ); } - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; } #if SYNCLINK_GENERIC_HDLC @@ -1403,7 +1403,7 @@ static void mgsl_isr_io_pin( struct mgsl_struct *info ) usc_OutReg( info, SICR, (unsigned short)(usc_InReg(info,SICR) & ~(SICR_TXC_ACTIVE+SICR_TXC_INACTIVE)) ); usc_UnlatchIostatusBits( info, MISCSTATUS_TXC_LATCHED ); - info->irq_occurred = 1; + info->irq_occurred = true; } } /* end of mgsl_isr_io_pin() */ @@ -1431,7 +1431,7 @@ static void mgsl_isr_transmit_data( struct mgsl_struct *info ) if ( info->xmit_cnt ) usc_load_txfifo( info ); else - info->tx_active = 0; + info->tx_active = false; if (info->xmit_cnt < WAKEUP_CHARS) info->pending_bh |= BH_TRANSMIT; @@ -1568,7 +1568,7 @@ static void mgsl_isr_misc( struct mgsl_struct *info ) 
/* schedule BH handler to restart receiver */ info->pending_bh |= BH_RECEIVE; - info->rx_rcc_underrun = 1; + info->rx_rcc_underrun = true; } usc_ClearIrqPendingBits( info, MISC ); @@ -1626,7 +1626,7 @@ static void mgsl_isr_receive_dma( struct mgsl_struct *info ) info->pending_bh |= BH_RECEIVE; if ( status & BIT3 ) { - info->rx_overflow = 1; + info->rx_overflow = true; info->icount.buf_overrun++; } @@ -1745,7 +1745,7 @@ static irqreturn_t mgsl_interrupt(int dummy, void *dev_id) printk("%s(%d):%s queueing bh task.\n", __FILE__,__LINE__,info->device_name); schedule_work(&info->task); - info->bh_requested = 1; + info->bh_requested = true; } spin_unlock(&info->irq_spinlock); @@ -3303,7 +3303,8 @@ static int block_til_ready(struct tty_struct *tty, struct file * filp, { DECLARE_WAITQUEUE(wait, current); int retval; - int do_clocal = 0, extra_count = 0; + bool do_clocal = false; + bool extra_count = false; unsigned long flags; if (debug_level >= DEBUG_LEVEL_INFO) @@ -3317,7 +3318,7 @@ static int block_til_ready(struct tty_struct *tty, struct file * filp, } if (tty->termios->c_cflag & CLOCAL) - do_clocal = 1; + do_clocal = true; /* Wait for carrier detect and the line to become * free (i.e., not in use by the callout). 
While we are in @@ -3335,7 +3336,7 @@ static int block_til_ready(struct tty_struct *tty, struct file * filp, spin_lock_irqsave(&info->irq_spinlock, flags); if (!tty_hung_up_p(filp)) { - extra_count = 1; + extra_count = true; info->count--; } spin_unlock_irqrestore(&info->irq_spinlock, flags); @@ -4043,13 +4044,13 @@ static void mgsl_free_intermediate_txbuffer_memory(struct mgsl_struct *info) * * info pointer to device instance data * - * Return Value: 1 if next buffered tx request loaded + * Return Value: true if next buffered tx request loaded * into adapter's tx dma buffer, - * 0 otherwise + * false otherwise */ -static int load_next_tx_holding_buffer(struct mgsl_struct *info) +static bool load_next_tx_holding_buffer(struct mgsl_struct *info) { - int ret = 0; + bool ret = false; if ( info->tx_holding_count ) { /* determine if we have enough tx dma buffers @@ -4073,7 +4074,7 @@ static int load_next_tx_holding_buffer(struct mgsl_struct *info) /* restart transmit timer */ mod_timer(&info->tx_timer, jiffies + msecs_to_jiffies(5000)); - ret = 1; + ret = true; } } @@ -4119,7 +4120,7 @@ static int mgsl_claim_resources(struct mgsl_struct *info) __FILE__,__LINE__,info->device_name, info->io_base); return -ENODEV; } - info->io_addr_requested = 1; + info->io_addr_requested = true; if ( request_irq(info->irq_level,mgsl_interrupt,info->irq_flags, info->device_name, info ) < 0 ) { @@ -4127,7 +4128,7 @@ static int mgsl_claim_resources(struct mgsl_struct *info) __FILE__,__LINE__,info->device_name, info->irq_level ); goto errout; } - info->irq_requested = 1; + info->irq_requested = true; if ( info->bus_type == MGSL_BUS_TYPE_PCI ) { if (request_mem_region(info->phys_memory_base,0x40000,"synclink") == NULL) { @@ -4135,13 +4136,13 @@ static int mgsl_claim_resources(struct mgsl_struct *info) __FILE__,__LINE__,info->device_name, info->phys_memory_base); goto errout; } - info->shared_mem_requested = 1; + info->shared_mem_requested = true; if (request_mem_region(info->phys_lcr_base + 
info->lcr_offset,128,"synclink") == NULL) { printk( "%s(%d):lcr mem addr conflict device %s Addr=%08X\n", __FILE__,__LINE__,info->device_name, info->phys_lcr_base + info->lcr_offset); goto errout; } - info->lcr_mem_requested = 1; + info->lcr_mem_requested = true; info->memory_base = ioremap(info->phys_memory_base,0x40000); if (!info->memory_base) { @@ -4172,7 +4173,7 @@ static int mgsl_claim_resources(struct mgsl_struct *info) mgsl_release_resources( info ); return -ENODEV; } - info->dma_requested = 1; + info->dma_requested = true; /* ISA adapter uses bus master DMA */ set_dma_mode(info->dma_level,DMA_MODE_CASCADE); @@ -4200,12 +4201,12 @@ static void mgsl_release_resources(struct mgsl_struct *info) if ( info->irq_requested ) { free_irq(info->irq_level, info); - info->irq_requested = 0; + info->irq_requested = false; } if ( info->dma_requested ) { disable_dma(info->dma_level); free_dma(info->dma_level); - info->dma_requested = 0; + info->dma_requested = false; } mgsl_free_dma_buffers(info); mgsl_free_intermediate_rxbuffer_memory(info); @@ -4213,15 +4214,15 @@ static void mgsl_release_resources(struct mgsl_struct *info) if ( info->io_addr_requested ) { release_region(info->io_base,info->io_addr_size); - info->io_addr_requested = 0; + info->io_addr_requested = false; } if ( info->shared_mem_requested ) { release_mem_region(info->phys_memory_base,0x40000); - info->shared_mem_requested = 0; + info->shared_mem_requested = false; } if ( info->lcr_mem_requested ) { release_mem_region(info->phys_lcr_base + info->lcr_offset,128); - info->lcr_mem_requested = 0; + info->lcr_mem_requested = false; } if (info->memory_base){ iounmap(info->memory_base); @@ -4486,7 +4487,7 @@ static int __init synclink_init(void) if ((rc = pci_register_driver(&synclink_pci_driver)) < 0) printk("%s:failed to register PCI driver, error=%d\n",__FILE__,rc); else - pci_registered = 1; + pci_registered = true; if ((rc = mgsl_init_tty()) < 0) goto error; @@ -4679,7 +4680,7 @@ static u16 usc_InReg( struct 
mgsl_struct *info, u16 RegAddr ) static void usc_set_sdlc_mode( struct mgsl_struct *info ) { u16 RegValue; - int PreSL1660; + bool PreSL1660; /* * determine if the IUSC on the adapter is pre-SL1660. If @@ -4692,11 +4693,7 @@ static void usc_set_sdlc_mode( struct mgsl_struct *info ) */ usc_OutReg(info,TMCR,0x1f); RegValue=usc_InReg(info,TMDR); - if ( RegValue == IUSC_PRE_SL1660 ) - PreSL1660 = 1; - else - PreSL1660 = 0; - + PreSL1660 = (RegValue == IUSC_PRE_SL1660); if ( info->params.flags & HDLC_FLAG_HDLC_LOOPMODE ) { @@ -5382,9 +5379,9 @@ static void usc_process_rxoverrun_sync( struct mgsl_struct *info ) int start_index; int end_index; int frame_start_index; - int start_of_frame_found = FALSE; - int end_of_frame_found = FALSE; - int reprogram_dma = FALSE; + bool start_of_frame_found = false; + bool end_of_frame_found = false; + bool reprogram_dma = false; DMABUFFERENTRY *buffer_list = info->rx_buffer_list; u32 phys_addr; @@ -5410,9 +5407,9 @@ static void usc_process_rxoverrun_sync( struct mgsl_struct *info ) if ( !start_of_frame_found ) { - start_of_frame_found = TRUE; + start_of_frame_found = true; frame_start_index = end_index; - end_of_frame_found = FALSE; + end_of_frame_found = false; } if ( buffer_list[end_index].status ) @@ -5423,8 +5420,8 @@ static void usc_process_rxoverrun_sync( struct mgsl_struct *info ) /* We want to leave the buffers for this frame intact. */ /* Move on to next possible frame. */ - start_of_frame_found = FALSE; - end_of_frame_found = TRUE; + start_of_frame_found = false; + end_of_frame_found = true; } /* advance to next buffer entry in linked list */ @@ -5439,8 +5436,8 @@ static void usc_process_rxoverrun_sync( struct mgsl_struct *info ) /* completely screwed, reset all receive buffers! 
*/ mgsl_reset_rx_dma_buffers( info ); frame_start_index = 0; - start_of_frame_found = FALSE; - reprogram_dma = TRUE; + start_of_frame_found = false; + reprogram_dma = true; break; } } @@ -5466,7 +5463,7 @@ static void usc_process_rxoverrun_sync( struct mgsl_struct *info ) } while( start_index != end_index ); - reprogram_dma = TRUE; + reprogram_dma = true; } if ( reprogram_dma ) @@ -5536,9 +5533,9 @@ static void usc_stop_receiver( struct mgsl_struct *info ) usc_OutReg( info, CCSR, (u16)(usc_InReg(info,CCSR) | BIT13) ); usc_RTCmd( info, RTCmd_PurgeRxFifo ); - info->rx_enabled = 0; - info->rx_overflow = 0; - info->rx_rcc_underrun = 0; + info->rx_enabled = false; + info->rx_overflow = false; + info->rx_rcc_underrun = false; } /* end of stop_receiver() */ @@ -5601,7 +5598,7 @@ static void usc_start_receiver( struct mgsl_struct *info ) usc_OutReg( info, CCSR, 0x1020 ); - info->rx_enabled = 1; + info->rx_enabled = true; } /* end of usc_start_receiver() */ @@ -5628,14 +5625,14 @@ static void usc_start_transmitter( struct mgsl_struct *info ) /* RTS and set a flag indicating that the driver should */ /* negate RTS when the transmission completes. 
*/ - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; if ( info->params.flags & HDLC_FLAG_AUTO_RTS ) { usc_get_serial_signals( info ); if ( !(info->serial_signals & SerialSignal_RTS) ) { info->serial_signals |= SerialSignal_RTS; usc_set_serial_signals( info ); - info->drop_rts_on_tx_done = 1; + info->drop_rts_on_tx_done = true; } } @@ -5699,11 +5696,11 @@ static void usc_start_transmitter( struct mgsl_struct *info ) mod_timer(&info->tx_timer, jiffies + msecs_to_jiffies(5000)); } - info->tx_active = 1; + info->tx_active = true; } if ( !info->tx_enabled ) { - info->tx_enabled = 1; + info->tx_enabled = true; if ( info->params.flags & HDLC_FLAG_AUTO_CTS ) usc_EnableTransmitter(info,ENABLE_AUTO_CTS); else @@ -5735,8 +5732,8 @@ static void usc_stop_transmitter( struct mgsl_struct *info ) usc_DmaCmd( info, DmaCmd_ResetTxChannel ); usc_RTCmd( info, RTCmd_PurgeTxFifo ); - info->tx_enabled = 0; - info->tx_active = 0; + info->tx_enabled = false; + info->tx_active = false; } /* end of usc_stop_transmitter() */ @@ -6520,7 +6517,7 @@ static void mgsl_reset_rx_dma_buffers( struct mgsl_struct *info ) */ static void mgsl_free_rx_frame_buffers( struct mgsl_struct *info, unsigned int StartIndex, unsigned int EndIndex ) { - int Done = 0; + bool Done = false; DMABUFFERENTRY *pBufEntry; unsigned int Index; @@ -6534,7 +6531,7 @@ static void mgsl_free_rx_frame_buffers( struct mgsl_struct *info, unsigned int S if ( Index == EndIndex ) { /* This is the last buffer of the frame! */ - Done = 1; + Done = true; } /* reset current buffer for reuse */ @@ -6559,18 +6556,18 @@ static void mgsl_free_rx_frame_buffers( struct mgsl_struct *info, unsigned int S * receive DMA buffers. Only frames received without errors are returned. 
* * Arguments: info pointer to device extension - * Return Value: 1 if frame returned, otherwise 0 + * Return Value: true if frame returned, otherwise false */ -static int mgsl_get_rx_frame(struct mgsl_struct *info) +static bool mgsl_get_rx_frame(struct mgsl_struct *info) { unsigned int StartIndex, EndIndex; /* index of 1st and last buffers of Rx frame */ unsigned short status; DMABUFFERENTRY *pBufEntry; unsigned int framesize = 0; - int ReturnCode = 0; + bool ReturnCode = false; unsigned long flags; struct tty_struct *tty = info->tty; - int return_frame = 0; + bool return_frame = false; /* * current_rx_buffer points to the 1st buffer of the next available @@ -6629,7 +6626,7 @@ static int mgsl_get_rx_frame(struct mgsl_struct *info) else { info->icount.rxcrc++; if ( info->params.crc_type & HDLC_CRC_RETURN_EX ) - return_frame = 1; + return_frame = true; } framesize = 0; #if SYNCLINK_GENERIC_HDLC @@ -6640,7 +6637,7 @@ static int mgsl_get_rx_frame(struct mgsl_struct *info) } #endif } else - return_frame = 1; + return_frame = true; if ( return_frame ) { /* receive frame has no errors, get frame size. @@ -6719,7 +6716,7 @@ static int mgsl_get_rx_frame(struct mgsl_struct *info) /* Free the buffers used by this frame. */ mgsl_free_rx_frame_buffers( info, StartIndex, EndIndex ); - ReturnCode = 1; + ReturnCode = true; Cleanup: @@ -6758,15 +6755,15 @@ Cleanup: * last Rx DMA buffer and return that last portion of the frame. 
* * Arguments: info pointer to device extension - * Return Value: 1 if frame returned, otherwise 0 + * Return Value: true if frame returned, otherwise false */ -static int mgsl_get_raw_rx_frame(struct mgsl_struct *info) +static bool mgsl_get_raw_rx_frame(struct mgsl_struct *info) { unsigned int CurrentIndex, NextIndex; unsigned short status; DMABUFFERENTRY *pBufEntry; unsigned int framesize = 0; - int ReturnCode = 0; + bool ReturnCode = false; unsigned long flags; struct tty_struct *tty = info->tty; @@ -6891,7 +6888,7 @@ static int mgsl_get_raw_rx_frame(struct mgsl_struct *info) /* Free the buffers used by this frame. */ mgsl_free_rx_frame_buffers( info, CurrentIndex, CurrentIndex ); - ReturnCode = 1; + ReturnCode = true; } @@ -7000,15 +6997,15 @@ static void mgsl_load_tx_dma_buffer(struct mgsl_struct *info, * Performs a register test of the 16C32. * * Arguments: info pointer to device instance data - * Return Value: TRUE if test passed, otherwise FALSE + * Return Value: true if test passed, otherwise false */ -static BOOLEAN mgsl_register_test( struct mgsl_struct *info ) +static bool mgsl_register_test( struct mgsl_struct *info ) { static unsigned short BitPatterns[] = { 0x0000, 0xffff, 0xaaaa, 0x5555, 0x1234, 0x6969, 0x9696, 0x0f0f }; static unsigned int Patterncount = ARRAY_SIZE(BitPatterns); unsigned int i; - BOOLEAN rc = TRUE; + bool rc = true; unsigned long flags; spin_lock_irqsave(&info->irq_spinlock,flags); @@ -7019,10 +7016,10 @@ static BOOLEAN mgsl_register_test( struct mgsl_struct *info ) if ( (usc_InReg( info, SICR ) != 0) || (usc_InReg( info, IVR ) != 0) || (usc_InDmaReg( info, DIVR ) != 0) ){ - rc = FALSE; + rc = false; } - if ( rc == TRUE ){ + if ( rc ){ /* Write bit patterns to various registers but do it out of */ /* sync, then read back and verify values. 
*/ @@ -7040,7 +7037,7 @@ static BOOLEAN mgsl_register_test( struct mgsl_struct *info ) (usc_InReg( info, RCLR ) != BitPatterns[(i+3)%Patterncount]) || (usc_InReg( info, RSR ) != BitPatterns[(i+4)%Patterncount]) || (usc_InDmaReg( info, TBCR ) != BitPatterns[(i+5)%Patterncount]) ){ - rc = FALSE; + rc = false; break; } } @@ -7056,9 +7053,9 @@ static BOOLEAN mgsl_register_test( struct mgsl_struct *info ) /* mgsl_irq_test() Perform interrupt test of the 16C32. * * Arguments: info pointer to device instance data - * Return Value: TRUE if test passed, otherwise FALSE + * Return Value: true if test passed, otherwise false */ -static BOOLEAN mgsl_irq_test( struct mgsl_struct *info ) +static bool mgsl_irq_test( struct mgsl_struct *info ) { unsigned long EndTime; unsigned long flags; @@ -7068,10 +7065,10 @@ static BOOLEAN mgsl_irq_test( struct mgsl_struct *info ) /* * Setup 16C32 to interrupt on TxC pin (14MHz clock) transition. - * The ISR sets irq_occurred to 1. + * The ISR sets irq_occurred to true. */ - info->irq_occurred = FALSE; + info->irq_occurred = false; /* Enable INTEN gate for ISA adapter (Port 6, Bit12) */ /* Enable INTEN (Port 6, Bit12) */ @@ -7097,10 +7094,7 @@ static BOOLEAN mgsl_irq_test( struct mgsl_struct *info ) usc_reset(info); spin_unlock_irqrestore(&info->irq_spinlock,flags); - if ( !info->irq_occurred ) - return FALSE; - else - return TRUE; + return info->irq_occurred; } /* end of mgsl_irq_test() */ @@ -7111,16 +7105,16 @@ static BOOLEAN mgsl_irq_test( struct mgsl_struct *info ) * using single buffer DMA mode. 
* * Arguments: info pointer to device instance data - * Return Value: TRUE if test passed, otherwise FALSE + * Return Value: true if test passed, otherwise false */ -static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) +static bool mgsl_dma_test( struct mgsl_struct *info ) { unsigned short FifoLevel; unsigned long phys_addr; unsigned int FrameSize; unsigned int i; char *TmpPtr; - BOOLEAN rc = TRUE; + bool rc = true; unsigned short status=0; unsigned long EndTime; unsigned long flags; @@ -7233,7 +7227,7 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) for(;;) { if (time_after(jiffies, EndTime)) { - rc = FALSE; + rc = false; break; } @@ -7289,7 +7283,7 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) for(;;) { if (time_after(jiffies, EndTime)) { - rc = FALSE; + rc = false; break; } @@ -7309,7 +7303,7 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) } - if ( rc == TRUE ) + if ( rc ) { /* Enable 16C32 transmitter. */ @@ -7337,7 +7331,7 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) while ( !(status & (BIT6+BIT5+BIT4+BIT2+BIT1)) ) { if (time_after(jiffies, EndTime)) { - rc = FALSE; + rc = false; break; } @@ -7348,13 +7342,13 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) } - if ( rc == TRUE ){ + if ( rc ){ /* CHECK FOR TRANSMIT ERRORS */ if ( status & (BIT5 + BIT1) ) - rc = FALSE; + rc = false; } - if ( rc == TRUE ) { + if ( rc ) { /* WAIT FOR RECEIVE COMPLETE */ /* Wait 100ms */ @@ -7364,7 +7358,7 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) status=info->rx_buffer_list[0].status; while ( status == 0 ) { if (time_after(jiffies, EndTime)) { - rc = FALSE; + rc = false; break; } status=info->rx_buffer_list[0].status; @@ -7372,17 +7366,17 @@ static BOOLEAN mgsl_dma_test( struct mgsl_struct *info ) } - if ( rc == TRUE ) { + if ( rc ) { /* CHECK FOR RECEIVE ERRORS */ status = info->rx_buffer_list[0].status; if ( status & (BIT8 + BIT3 + BIT1) ) { /* receive error has occurred */ - rc = FALSE; + rc = 
false; } else { if ( memcmp( info->tx_buffer_list[0].virt_addr , info->rx_buffer_list[0].virt_addr, FrameSize ) ){ - rc = FALSE; + rc = false; } } } @@ -7445,9 +7439,9 @@ static int mgsl_adapter_test( struct mgsl_struct *info ) * Test the shared memory on a PCI adapter. * * Arguments: info pointer to device instance data - * Return Value: TRUE if test passed, otherwise FALSE + * Return Value: true if test passed, otherwise false */ -static BOOLEAN mgsl_memory_test( struct mgsl_struct *info ) +static bool mgsl_memory_test( struct mgsl_struct *info ) { static unsigned long BitPatterns[] = { 0x0, 0x55555555, 0xaaaaaaaa, 0x66666666, 0x99999999, 0xffffffff, 0x12345678 }; @@ -7457,7 +7451,7 @@ static BOOLEAN mgsl_memory_test( struct mgsl_struct *info ) unsigned long * TestAddr; if ( info->bus_type != MGSL_BUS_TYPE_PCI ) - return TRUE; + return true; TestAddr = (unsigned long *)info->memory_base; @@ -7466,7 +7460,7 @@ static BOOLEAN mgsl_memory_test( struct mgsl_struct *info ) for ( i = 0 ; i < Patterncount ; i++ ) { *TestAddr = BitPatterns[i]; if ( *TestAddr != BitPatterns[i] ) - return FALSE; + return false; } /* Test address lines with incrementing pattern over */ @@ -7481,13 +7475,13 @@ static BOOLEAN mgsl_memory_test( struct mgsl_struct *info ) for ( i = 0 ; i < TestLimit ; i++ ) { if ( *TestAddr != i * 4 ) - return FALSE; + return false; TestAddr++; } memset( info->memory_base, 0, SHARED_MEM_ADDRESS_SIZE ); - return TRUE; + return true; } /* End Of mgsl_memory_test() */ @@ -7604,7 +7598,7 @@ static void mgsl_tx_timeout(unsigned long context) info->icount.txtimeout++; } spin_lock_irqsave(&info->irq_spinlock,flags); - info->tx_active = 0; + info->tx_active = false; info->xmit_cnt = info->xmit_head = info->xmit_tail = 0; if ( info->params.flags & HDLC_FLAG_HDLC_LOOPMODE ) @@ -7632,7 +7626,7 @@ static int mgsl_loopmode_send_done( struct mgsl_struct * info ) spin_lock_irqsave(&info->irq_spinlock,flags); if (info->params.flags & HDLC_FLAG_HDLC_LOOPMODE) { if 
(info->tx_active) - info->loopmode_send_done_requested = TRUE; + info->loopmode_send_done_requested = true; else usc_loopmode_send_done(info); } @@ -7646,7 +7640,7 @@ static int mgsl_loopmode_send_done( struct mgsl_struct * info ) */ static void usc_loopmode_send_done( struct mgsl_struct * info ) { - info->loopmode_send_done_requested = FALSE; + info->loopmode_send_done_requested = false; /* clear CMR:13 to 0 to start echoing RxData to TxData */ info->cmr_value &= ~BIT13; usc_OutReg(info, CMR, info->cmr_value); @@ -7668,7 +7662,7 @@ static void usc_loopmode_cancel_transmit( struct mgsl_struct * info ) */ static void usc_loopmode_insert_request( struct mgsl_struct * info ) { - info->loopmode_insert_requested = TRUE; + info->loopmode_insert_requested = true; /* enable RxAbort irq. On next RxAbort, clear CMR:13 to * begin repeating TxData on RxData (complete insertion) diff --git a/drivers/char/synclink_gt.c b/drivers/char/synclink_gt.c index 3c89266c8255..f3d8d72e5ea4 100644 --- a/drivers/char/synclink_gt.c +++ b/drivers/char/synclink_gt.c @@ -117,7 +117,7 @@ static struct pci_driver pci_driver = { .remove = __devexit_p(remove_one), }; -static int pci_registered; +static bool pci_registered; /* * module configuration and status @@ -289,12 +289,12 @@ struct slgt_info { struct work_struct task; u32 pending_bh; - int bh_requested; - int bh_running; + bool bh_requested; + bool bh_running; int isr_overflow; - int irq_requested; /* nonzero if IRQ requested */ - int irq_occurred; /* for diagnostics use */ + bool irq_requested; /* true if IRQ requested */ + bool irq_occurred; /* for diagnostics use */ /* device configuration */ @@ -304,7 +304,7 @@ struct slgt_info { unsigned char __iomem * reg_addr; /* memory mapped registers address */ u32 phys_reg_addr; - int reg_addr_requested; + bool reg_addr_requested; MGSL_PARAMS params; /* communications parameters */ u32 idle_mode; @@ -315,11 +315,11 @@ struct slgt_info { /* device status */ - int rx_enabled; - int rx_restart; + bool 
rx_enabled; + bool rx_restart; - int tx_enabled; - int tx_active; + bool tx_enabled; + bool tx_active; unsigned char signals; /* serial signal states */ int init_error; /* initialization error */ @@ -329,7 +329,7 @@ struct slgt_info { char flag_buf[MAX_ASYNC_BUFFER_SIZE]; char char_buf[MAX_ASYNC_BUFFER_SIZE]; - BOOLEAN drop_rts_on_tx_done; + bool drop_rts_on_tx_done; struct _input_signal_events input_signal_events; int dcd_chkcount; /* check counts to prevent */ @@ -467,8 +467,8 @@ static void rx_start(struct slgt_info *info); static void reset_rbufs(struct slgt_info *info); static void free_rbufs(struct slgt_info *info, unsigned int first, unsigned int last); static void rdma_reset(struct slgt_info *info); -static int rx_get_frame(struct slgt_info *info); -static int rx_get_buf(struct slgt_info *info); +static bool rx_get_frame(struct slgt_info *info); +static bool rx_get_buf(struct slgt_info *info); static void tx_start(struct slgt_info *info); static void tx_stop(struct slgt_info *info); @@ -1968,8 +1968,8 @@ static int bh_action(struct slgt_info *info) rc = BH_STATUS; } else { /* Mark BH routine as complete */ - info->bh_running = 0; - info->bh_requested = 0; + info->bh_running = false; + info->bh_requested = false; rc = 0; } @@ -1988,7 +1988,7 @@ static void bh_handler(struct work_struct *work) if (!info) return; - info->bh_running = 1; + info->bh_running = true; while((action = bh_action(info))) { switch (action) { @@ -2158,7 +2158,7 @@ static void isr_serial(struct slgt_info *info) wr_reg16(info, SSR, status); /* clear pending */ - info->irq_occurred = 1; + info->irq_occurred = true; if (info->params.mode == MGSL_MODE_ASYNC) { if (status & IRQ_TXIDLE) { @@ -2225,7 +2225,7 @@ static void isr_rdma(struct slgt_info *info) if (status & (BIT5 + BIT4)) { DBGISR(("%s isr_rdma rx_restart=1\n", info->device_name)); - info->rx_restart = 1; + info->rx_restart = true; } info->pending_bh |= BH_RECEIVE; } @@ -2276,14 +2276,14 @@ static void isr_txeom(struct slgt_info 
*info, unsigned short status) info->icount.txok++; } - info->tx_active = 0; + info->tx_active = false; info->tx_count = 0; del_timer(&info->tx_timer); if (info->params.mode != MGSL_MODE_ASYNC && info->drop_rts_on_tx_done) { info->signals &= ~SerialSignal_RTS; - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; set_signals(info); } @@ -2337,7 +2337,7 @@ static irqreturn_t slgt_interrupt(int dummy, void *dev_id) while((gsr = rd_reg32(info, GSR) & 0xffffff00)) { DBGISR(("%s gsr=%08x\n", info->device_name, gsr)); - info->irq_occurred = 1; + info->irq_occurred = true; for(i=0; i < info->port_count ; i++) { if (info->port_array[i] == NULL) continue; @@ -2374,7 +2374,7 @@ static irqreturn_t slgt_interrupt(int dummy, void *dev_id) !port->bh_requested) { DBGISR(("%s bh queued\n", port->device_name)); schedule_work(&port->task); - port->bh_requested = 1; + port->bh_requested = true; } } @@ -3110,7 +3110,8 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, { DECLARE_WAITQUEUE(wait, current); int retval; - int do_clocal = 0, extra_count = 0; + bool do_clocal = false; + bool extra_count = false; unsigned long flags; DBGINFO(("%s block_til_ready\n", tty->driver->name)); @@ -3122,7 +3123,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, } if (tty->termios->c_cflag & CLOCAL) - do_clocal = 1; + do_clocal = true; /* Wait for carrier detect and the line to become * free (i.e., not in use by the callout). 
While we are in @@ -3136,7 +3137,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, spin_lock_irqsave(&info->lock, flags); if (!tty_hung_up_p(filp)) { - extra_count = 1; + extra_count = true; info->count--; } spin_unlock_irqrestore(&info->lock, flags); @@ -3321,7 +3322,7 @@ static int claim_resources(struct slgt_info *info) goto errout; } else - info->reg_addr_requested = 1; + info->reg_addr_requested = true; info->reg_addr = ioremap(info->phys_reg_addr, SLGT_REG_SIZE); if (!info->reg_addr) { @@ -3341,12 +3342,12 @@ static void release_resources(struct slgt_info *info) { if (info->irq_requested) { free_irq(info->irq_level, info); - info->irq_requested = 0; + info->irq_requested = false; } if (info->reg_addr_requested) { release_mem_region(info->phys_reg_addr, SLGT_REG_SIZE); - info->reg_addr_requested = 0; + info->reg_addr_requested = false; } if (info->reg_addr) { @@ -3511,7 +3512,7 @@ static void device_init(int adapter_num, struct pci_dev *pdev) port_array[0]->device_name, port_array[0]->irq_level)); } else { - port_array[0]->irq_requested = 1; + port_array[0]->irq_requested = true; adapter_test(port_array[0]); for (i=1 ; i < port_count ; i++) { port_array[i]->init_error = port_array[0]->init_error; @@ -3654,7 +3655,7 @@ static int __init slgt_init(void) printk("%s pci_register_driver error=%d\n", driver_name, rc); goto error; } - pci_registered = 1; + pci_registered = true; if (!slgt_device_list) printk("%s no devices found\n",driver_name); @@ -3812,8 +3813,8 @@ static void rx_stop(struct slgt_info *info) rdma_reset(info); - info->rx_enabled = 0; - info->rx_restart = 0; + info->rx_enabled = false; + info->rx_restart = false; } static void rx_start(struct slgt_info *info) @@ -3849,8 +3850,8 @@ static void rx_start(struct slgt_info *info) /* enable receiver */ wr_reg16(info, RCR, (unsigned short)(rd_reg16(info, RCR) | BIT1)); - info->rx_restart = 0; - info->rx_enabled = 1; + info->rx_restart = false; + info->rx_enabled = true; } static 
void tx_start(struct slgt_info *info) @@ -3858,11 +3859,11 @@ static void tx_start(struct slgt_info *info) if (!info->tx_enabled) { wr_reg16(info, TCR, (unsigned short)((rd_reg16(info, TCR) | BIT1) & ~BIT2)); - info->tx_enabled = TRUE; + info->tx_enabled = true; } if (info->tx_count) { - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; if (info->params.mode != MGSL_MODE_ASYNC) { if (info->params.flags & HDLC_FLAG_AUTO_RTS) { @@ -3870,7 +3871,7 @@ static void tx_start(struct slgt_info *info) if (!(info->signals & SerialSignal_RTS)) { info->signals |= SerialSignal_RTS; set_signals(info); - info->drop_rts_on_tx_done = 1; + info->drop_rts_on_tx_done = true; } } @@ -3888,7 +3889,7 @@ static void tx_start(struct slgt_info *info) wr_reg16(info, SSR, IRQ_TXIDLE); } tdma_start(info); - info->tx_active = 1; + info->tx_active = true; } } @@ -3949,8 +3950,8 @@ static void tx_stop(struct slgt_info *info) reset_tbufs(info); - info->tx_enabled = 0; - info->tx_active = 0; + info->tx_enabled = false; + info->tx_active = false; } static void reset_port(struct slgt_info *info) @@ -4470,14 +4471,13 @@ static void reset_rbufs(struct slgt_info *info) /* * pass receive HDLC frame to upper layer * - * return 1 if frame available, otherwise 0 + * return true if frame available, otherwise false */ -static int rx_get_frame(struct slgt_info *info) +static bool rx_get_frame(struct slgt_info *info) { unsigned int start, end; unsigned short status; unsigned int framesize = 0; - int rc = 0; unsigned long flags; struct tty_struct *tty = info->tty; unsigned char addr_field = 0xff; @@ -4601,23 +4601,23 @@ check_again: } } free_rbufs(info, start, end); - rc = 1; + return true; cleanup: - return rc; + return false; } /* * pass receive buffer (RAW synchronous mode) to tty layer - * return 1 if buffer available, otherwise 0 + * return true if buffer available, otherwise false */ -static int rx_get_buf(struct slgt_info *info) +static bool rx_get_buf(struct slgt_info *info) { unsigned 
int i = info->rbuf_current; unsigned int count; if (!desc_complete(info->rbufs[i])) - return 0; + return false; count = desc_count(info->rbufs[i]); switch(info->params.mode) { case MGSL_MODE_MONOSYNC: @@ -4633,7 +4633,7 @@ static int rx_get_buf(struct slgt_info *info) ldisc_receive_buf(info->tty, info->rbufs[i].buf, info->flag_buf, count); free_rbufs(info, i, i); - return 1; + return true; } static void reset_tbufs(struct slgt_info *info) @@ -4758,7 +4758,7 @@ static int irq_test(struct slgt_info *info) /* assume failure */ info->init_error = DiagStatus_IrqFailure; - info->irq_occurred = FALSE; + info->irq_occurred = false; spin_unlock_irqrestore(&info->lock, flags); @@ -4891,7 +4891,7 @@ static void tx_timeout(unsigned long context) info->icount.txtimeout++; } spin_lock_irqsave(&info->lock,flags); - info->tx_active = 0; + info->tx_active = false; info->tx_count = 0; spin_unlock_irqrestore(&info->lock,flags); diff --git a/drivers/char/synclinkmp.c b/drivers/char/synclinkmp.c index c96062ea72b4..66e8082aff7b 100644 --- a/drivers/char/synclinkmp.c +++ b/drivers/char/synclinkmp.c @@ -188,9 +188,9 @@ typedef struct _synclinkmp_info { u32 pending_bh; - int bh_running; /* Protection from multiple */ + bool bh_running; /* Protection from multiple */ int isr_overflow; - int bh_requested; + bool bh_requested; int dcd_chkcount; /* check counts to prevent */ int cts_chkcount; /* too many IRQs if a signal */ @@ -213,11 +213,11 @@ typedef struct _synclinkmp_info { unsigned char *tmp_rx_buf; unsigned int tmp_rx_buf_count; - int rx_enabled; - int rx_overflow; + bool rx_enabled; + bool rx_overflow; - int tx_enabled; - int tx_active; + bool tx_enabled; + bool tx_active; u32 idle_mode; unsigned char ie0_value; @@ -238,13 +238,13 @@ typedef struct _synclinkmp_info { unsigned int irq_level; /* interrupt level */ unsigned long irq_flags; - int irq_requested; /* nonzero if IRQ requested */ + bool irq_requested; /* true if IRQ requested */ MGSL_PARAMS params; /* communications parameters 
*/ unsigned char serial_signals; /* current serial signal states */ - int irq_occurred; /* for diagnostics use */ + bool irq_occurred; /* for diagnostics use */ unsigned int init_error; /* Initialization startup error */ u32 last_mem_alloc; @@ -255,7 +255,7 @@ typedef struct _synclinkmp_info { unsigned char* sca_base; /* HD64570 SCA Memory address */ u32 phys_sca_base; u32 sca_offset; - int sca_base_requested; + bool sca_base_requested; unsigned char* lcr_base; /* local config registers (PCI only) */ u32 phys_lcr_base; @@ -265,12 +265,12 @@ typedef struct _synclinkmp_info { unsigned char* statctrl_base; /* status/control register memory */ u32 phys_statctrl_base; u32 statctrl_offset; - int sca_statctrl_requested; + bool sca_statctrl_requested; u32 misc_ctrl_value; char flag_buf[MAX_ASYNC_BUFFER_SIZE]; char char_buf[MAX_ASYNC_BUFFER_SIZE]; - BOOLEAN drop_rts_on_tx_done; + bool drop_rts_on_tx_done; struct _input_signal_events input_signal_events; @@ -571,12 +571,12 @@ static void shutdown(SLMP_INFO *info); static void program_hw(SLMP_INFO *info); static void change_params(SLMP_INFO *info); -static int init_adapter(SLMP_INFO *info); -static int register_test(SLMP_INFO *info); -static int irq_test(SLMP_INFO *info); -static int loopback_test(SLMP_INFO *info); +static bool init_adapter(SLMP_INFO *info); +static bool register_test(SLMP_INFO *info); +static bool irq_test(SLMP_INFO *info); +static bool loopback_test(SLMP_INFO *info); static int adapter_test(SLMP_INFO *info); -static int memory_test(SLMP_INFO *info); +static bool memory_test(SLMP_INFO *info); static void reset_adapter(SLMP_INFO *info); static void reset_port(SLMP_INFO *info); @@ -587,7 +587,7 @@ static void rx_stop(SLMP_INFO *info); static void rx_start(SLMP_INFO *info); static void rx_reset_buffers(SLMP_INFO *info); static void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int last); -static int rx_get_frame(SLMP_INFO *info); +static bool rx_get_frame(SLMP_INFO *info); static void 
tx_start(SLMP_INFO *info); static void tx_stop(SLMP_INFO *info); @@ -2044,8 +2044,8 @@ int bh_action(SLMP_INFO *info) if (!rc) { /* Mark BH routine as complete */ - info->bh_running = 0; - info->bh_requested = 0; + info->bh_running = false; + info->bh_requested = false; } spin_unlock_irqrestore(&info->lock,flags); @@ -2067,7 +2067,7 @@ void bh_handler(struct work_struct *work) printk( "%s(%d):%s bh_handler() entry\n", __FILE__,__LINE__,info->device_name); - info->bh_running = 1; + info->bh_running = true; while((action = bh_action(info)) != 0) { @@ -2152,7 +2152,7 @@ void isr_timer(SLMP_INFO * info) */ write_reg(info, (unsigned char)(timer + TMCS), 0); - info->irq_occurred = TRUE; + info->irq_occurred = true; if ( debug_level >= DEBUG_LEVEL_ISR ) printk("%s(%d):%s isr_timer()\n", @@ -2232,7 +2232,7 @@ void isr_rxrdy(SLMP_INFO * info) while((status = read_reg(info,CST0)) & BIT0) { int flag = 0; - int over = 0; + bool over = false; DataByte = read_reg(info,TRB); icount->rx++; @@ -2265,7 +2265,7 @@ void isr_rxrdy(SLMP_INFO * info) * reported immediately, and doesn't * affect the current character */ - over = 1; + over = true; } } } /* end of if (error) */ @@ -2318,14 +2318,14 @@ static void isr_txeom(SLMP_INFO * info, unsigned char status) info->icount.txok++; } - info->tx_active = 0; + info->tx_active = false; info->tx_count = info->tx_put = info->tx_get = 0; del_timer(&info->tx_timer); if (info->params.mode != MGSL_MODE_ASYNC && info->drop_rts_on_tx_done ) { info->serial_signals &= ~SerialSignal_RTS; - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; set_signals(info); } @@ -2398,7 +2398,7 @@ void isr_txrdy(SLMP_INFO * info) if ( info->tx_count ) tx_load_fifo( info ); else { - info->tx_active = 0; + info->tx_active = false; info->ie0_value &= ~TXRDYE; write_reg(info, IE0, info->ie0_value); } @@ -2438,7 +2438,7 @@ void isr_rxdmaerror(SLMP_INFO * info) printk("%s(%d):%s isr_rxdmaerror(), status=%02x\n", __FILE__,__LINE__,info->device_name,status); - 
info->rx_overflow = TRUE; + info->rx_overflow = true; info->pending_bh |= BH_RECEIVE; } @@ -2691,7 +2691,7 @@ static irqreturn_t synclinkmp_interrupt(int dummy, void *dev_id) printk("%s(%d):%s queueing bh task.\n", __FILE__,__LINE__,port->device_name); schedule_work(&port->task); - port->bh_requested = 1; + port->bh_requested = true; } } @@ -3320,7 +3320,8 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, { DECLARE_WAITQUEUE(wait, current); int retval; - int do_clocal = 0, extra_count = 0; + bool do_clocal = false; + bool extra_count = false; unsigned long flags; if (debug_level >= DEBUG_LEVEL_INFO) @@ -3335,7 +3336,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, } if (tty->termios->c_cflag & CLOCAL) - do_clocal = 1; + do_clocal = true; /* Wait for carrier detect and the line to become * free (i.e., not in use by the callout). While we are in @@ -3353,7 +3354,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, spin_lock_irqsave(&info->lock, flags); if (!tty_hung_up_p(filp)) { - extra_count = 1; + extra_count = true; info->count--; } spin_unlock_irqrestore(&info->lock, flags); @@ -3596,7 +3597,7 @@ int claim_resources(SLMP_INFO *info) goto errout; } else - info->shared_mem_requested = 1; + info->shared_mem_requested = true; if (request_mem_region(info->phys_lcr_base + info->lcr_offset,128,"synclinkmp") == NULL) { printk( "%s(%d):%s lcr mem addr conflict, Addr=%08X\n", @@ -3605,7 +3606,7 @@ int claim_resources(SLMP_INFO *info) goto errout; } else - info->lcr_mem_requested = 1; + info->lcr_mem_requested = true; if (request_mem_region(info->phys_sca_base + info->sca_offset,SCA_BASE_SIZE,"synclinkmp") == NULL) { printk( "%s(%d):%s sca mem addr conflict, Addr=%08X\n", @@ -3614,7 +3615,7 @@ int claim_resources(SLMP_INFO *info) goto errout; } else - info->sca_base_requested = 1; + info->sca_base_requested = true; if (request_mem_region(info->phys_statctrl_base + 
info->statctrl_offset,SCA_REG_SIZE,"synclinkmp") == NULL) { printk( "%s(%d):%s stat/ctrl mem addr conflict, Addr=%08X\n", @@ -3623,7 +3624,7 @@ int claim_resources(SLMP_INFO *info) goto errout; } else - info->sca_statctrl_requested = 1; + info->sca_statctrl_requested = true; info->memory_base = ioremap(info->phys_memory_base,SCA_MEM_SIZE); if (!info->memory_base) { @@ -3682,24 +3683,24 @@ void release_resources(SLMP_INFO *info) if ( info->irq_requested ) { free_irq(info->irq_level, info); - info->irq_requested = 0; + info->irq_requested = false; } if ( info->shared_mem_requested ) { release_mem_region(info->phys_memory_base,SCA_MEM_SIZE); - info->shared_mem_requested = 0; + info->shared_mem_requested = false; } if ( info->lcr_mem_requested ) { release_mem_region(info->phys_lcr_base + info->lcr_offset,128); - info->lcr_mem_requested = 0; + info->lcr_mem_requested = false; } if ( info->sca_base_requested ) { release_mem_region(info->phys_sca_base + info->sca_offset,SCA_BASE_SIZE); - info->sca_base_requested = 0; + info->sca_base_requested = false; } if ( info->sca_statctrl_requested ) { release_mem_region(info->phys_statctrl_base + info->statctrl_offset,SCA_REG_SIZE); - info->sca_statctrl_requested = 0; + info->sca_statctrl_requested = false; } if (info->memory_base){ @@ -3902,7 +3903,7 @@ void device_init(int adapter_num, struct pci_dev *pdev) port_array[0]->irq_level ); } else { - port_array[0]->irq_requested = 1; + port_array[0]->irq_requested = true; adapter_test(port_array[0]); } } @@ -4155,8 +4156,8 @@ void rx_stop(SLMP_INFO *info) write_reg(info, RXDMA + DCMD, SWABORT); /* reset/init Rx DMA */ write_reg(info, RXDMA + DIR, 0); /* disable Rx DMA interrupts */ - info->rx_enabled = 0; - info->rx_overflow = 0; + info->rx_enabled = false; + info->rx_overflow = false; } /* enable the receiver @@ -4211,8 +4212,8 @@ void rx_start(SLMP_INFO *info) write_reg(info, CMD, RXENABLE); - info->rx_overflow = FALSE; - info->rx_enabled = 1; + info->rx_overflow = false; + 
info->rx_enabled = true; } /* Enable the transmitter and send a transmit frame if @@ -4227,7 +4228,7 @@ void tx_start(SLMP_INFO *info) if (!info->tx_enabled ) { write_reg(info, CMD, TXRESET); write_reg(info, CMD, TXENABLE); - info->tx_enabled = TRUE; + info->tx_enabled = true; } if ( info->tx_count ) { @@ -4236,7 +4237,7 @@ void tx_start(SLMP_INFO *info) /* RTS and set a flag indicating that the driver should */ /* negate RTS when the transmission completes. */ - info->drop_rts_on_tx_done = 0; + info->drop_rts_on_tx_done = false; if (info->params.mode != MGSL_MODE_ASYNC) { @@ -4245,7 +4246,7 @@ void tx_start(SLMP_INFO *info) if ( !(info->serial_signals & SerialSignal_RTS) ) { info->serial_signals |= SerialSignal_RTS; set_signals( info ); - info->drop_rts_on_tx_done = 1; + info->drop_rts_on_tx_done = true; } } @@ -4282,7 +4283,7 @@ void tx_start(SLMP_INFO *info) write_reg(info, IE0, info->ie0_value); } - info->tx_active = 1; + info->tx_active = true; } } @@ -4308,8 +4309,8 @@ void tx_stop( SLMP_INFO *info ) info->ie0_value &= ~TXRDYE; write_reg(info, IE0, info->ie0_value); /* disable tx data interrupts */ - info->tx_enabled = 0; - info->tx_active = 0; + info->tx_enabled = false; + info->tx_active = false; } /* Fill the transmit FIFO until the FIFO is full or @@ -4832,14 +4833,14 @@ void rx_reset_buffers(SLMP_INFO *info) */ void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int last) { - int done = 0; + bool done = false; while(!done) { /* reset current buffer for reuse */ info->rx_buf_list[first].status = 0xff; if (first == last) { - done = 1; + done = true; /* set new last rx descriptor address */ write_reg16(info, RXDMA + EDA, info->rx_buf_list_ex[first].phys_entry); } @@ -4856,14 +4857,14 @@ void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int las /* Return a received frame from the receive DMA buffers. * Only frames received without errors are returned. 
* - * Return Value: 1 if frame returned, otherwise 0 + * Return Value: true if frame returned, otherwise false */ -int rx_get_frame(SLMP_INFO *info) +bool rx_get_frame(SLMP_INFO *info) { unsigned int StartIndex, EndIndex; /* index of 1st and last buffers of Rx frame */ unsigned short status; unsigned int framesize = 0; - int ReturnCode = 0; + bool ReturnCode = false; unsigned long flags; struct tty_struct *tty = info->tty; unsigned char addr_field = 0xff; @@ -5014,7 +5015,7 @@ CheckAgain: /* Free the buffers used by this frame. */ rx_free_frame_buffers( info, StartIndex, EndIndex ); - ReturnCode = 1; + ReturnCode = true; Cleanup: if ( info->rx_enabled && info->rx_overflow ) { @@ -5073,12 +5074,12 @@ void tx_load_dma_buffer(SLMP_INFO *info, const char *buf, unsigned int count) info->last_tx_buf = ++i; } -int register_test(SLMP_INFO *info) +bool register_test(SLMP_INFO *info) { static unsigned char testval[] = {0x00, 0xff, 0xaa, 0x55, 0x69, 0x96}; static unsigned int count = ARRAY_SIZE(testval); unsigned int i; - int rc = TRUE; + bool rc = true; unsigned long flags; spin_lock_irqsave(&info->lock,flags); @@ -5101,7 +5102,7 @@ int register_test(SLMP_INFO *info) (read_reg(info, SA0) != testval[(i+2)%count]) || (read_reg(info, SA1) != testval[(i+3)%count]) ) { - rc = FALSE; + rc = false; break; } } @@ -5112,7 +5113,7 @@ int register_test(SLMP_INFO *info) return rc; } -int irq_test(SLMP_INFO *info) +bool irq_test(SLMP_INFO *info) { unsigned long timeout; unsigned long flags; @@ -5124,7 +5125,7 @@ int irq_test(SLMP_INFO *info) /* assume failure */ info->init_error = DiagStatus_IrqFailure; - info->irq_occurred = FALSE; + info->irq_occurred = false; /* setup timer0 on SCA0 to interrupt */ @@ -5163,7 +5164,7 @@ int irq_test(SLMP_INFO *info) /* initialize individual SCA device (2 ports) */ -static int sca_init(SLMP_INFO *info) +static bool sca_init(SLMP_INFO *info) { /* set wait controller to single mem partition (low), no wait states */ write_reg(info, PABR0, 0); /* wait 
controller addr boundary 0 */ @@ -5199,12 +5200,12 @@ static int sca_init(SLMP_INFO *info) */ write_reg(info, ITCR, 0); - return TRUE; + return true; } /* initialize adapter hardware */ -int init_adapter(SLMP_INFO *info) +bool init_adapter(SLMP_INFO *info) { int i; @@ -5257,20 +5258,20 @@ int init_adapter(SLMP_INFO *info) sca_init(info->port_array[0]); sca_init(info->port_array[2]); - return TRUE; + return true; } /* Loopback an HDLC frame to test the hardware * interrupt and DMA functions. */ -int loopback_test(SLMP_INFO *info) +bool loopback_test(SLMP_INFO *info) { #define TESTFRAMESIZE 20 unsigned long timeout; u16 count = TESTFRAMESIZE; unsigned char buf[TESTFRAMESIZE]; - int rc = FALSE; + bool rc = false; unsigned long flags; struct tty_struct *oldtty = info->tty; @@ -5304,16 +5305,16 @@ int loopback_test(SLMP_INFO *info) msleep_interruptible(10); if (rx_get_frame(info)) { - rc = TRUE; + rc = true; break; } } /* verify received frame length and contents */ - if (rc == TRUE && - ( info->tmp_rx_buf_count != count || - memcmp(buf, info->tmp_rx_buf,count))) { - rc = FALSE; + if (rc && + ( info->tmp_rx_buf_count != count || + memcmp(buf, info->tmp_rx_buf,count))) { + rc = false; } spin_lock_irqsave(&info->lock,flags); @@ -5390,7 +5391,7 @@ int adapter_test( SLMP_INFO *info ) /* Test the shared memory on a PCI adapter. 
*/ -int memory_test(SLMP_INFO *info) +bool memory_test(SLMP_INFO *info) { static unsigned long testval[] = { 0x0, 0x55555555, 0xaaaaaaaa, 0x66666666, 0x99999999, 0xffffffff, 0x12345678 }; @@ -5404,7 +5405,7 @@ int memory_test(SLMP_INFO *info) for ( i = 0 ; i < count ; i++ ) { *addr = testval[i]; if ( *addr != testval[i] ) - return FALSE; + return false; } /* Test address lines with incrementing pattern over */ @@ -5419,12 +5420,12 @@ int memory_test(SLMP_INFO *info) for ( i = 0 ; i < limit ; i++ ) { if ( *addr != i * 4 ) - return FALSE; + return false; addr++; } memset( info->memory_base, 0, SCA_MEM_SIZE ); - return TRUE; + return true; } /* Load data into PCI adapter shared memory. @@ -5508,7 +5509,7 @@ void tx_timeout(unsigned long context) info->icount.txtimeout++; } spin_lock_irqsave(&info->lock,flags); - info->tx_active = 0; + info->tx_active = false; info->tx_count = info->tx_put = info->tx_get = 0; spin_unlock_irqrestore(&info->lock,flags); diff --git a/include/linux/synclink.h b/include/linux/synclink.h index 5562fbf72095..45f6bc82d317 100644 --- a/include/linux/synclink.h +++ b/include/linux/synclink.h @@ -13,10 +13,6 @@ #define _SYNCLINK_H_ #define SYNCLINK_H_VERSION 3.6 -#define BOOLEAN int -#define TRUE 1 -#define FALSE 0 - #define BIT0 0x0001 #define BIT1 0x0002 #define BIT2 0x0004 -- cgit v1.2.3 From ce9f9f73af0338a680d66288cbf0efe4b900e78b Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:05 -0700 Subject: char: make functions static in synclinkmp.c All were forward declared with static. Fixes sparse warnings: drivers/char/synclinkmp.c:1476:5: warning: symbol 'read_proc' was not declared. Should it be static? drivers/char/synclinkmp.c:2027:5: warning: symbol 'bh_action' was not declared. Should it be static? drivers/char/synclinkmp.c:2058:6: warning: symbol 'bh_handler' was not declared. Should it be static? drivers/char/synclinkmp.c:2103:6: warning: symbol 'bh_receive' was not declared. Should it be static? 
drivers/char/synclinkmp.c:2112:6: warning: symbol 'bh_transmit' was not declared. Should it be static? drivers/char/synclinkmp.c:2124:6: warning: symbol 'bh_status' was not declared. Should it be static? drivers/char/synclinkmp.c:2136:6: warning: symbol 'isr_timer' was not declared. Should it be static? drivers/char/synclinkmp.c:2162:6: warning: symbol 'isr_rxint' was not declared. Should it be static? drivers/char/synclinkmp.c:2221:6: warning: symbol 'isr_rxrdy' was not declared. Should it be static? drivers/char/synclinkmp.c:2351:6: warning: symbol 'isr_txint' was not declared. Should it be static? drivers/char/synclinkmp.c:2379:6: warning: symbol 'isr_txrdy' was not declared. Should it be static? drivers/char/synclinkmp.c:2410:6: warning: symbol 'isr_rxdmaok' was not declared. Should it be static? drivers/char/synclinkmp.c:2427:6: warning: symbol 'isr_rxdmaerror' was not declared. Should it be static? drivers/char/synclinkmp.c:2445:6: warning: symbol 'isr_txdmaok' was not declared. Should it be static? drivers/char/synclinkmp.c:2463:6: warning: symbol 'isr_txdmaerror' was not declared. Should it be static? drivers/char/synclinkmp.c:2480:6: warning: symbol 'isr_io_pin' was not declared. Should it be static? drivers/char/synclinkmp.c:3420:5: warning: symbol 'alloc_dma_bufs' was not declared. Should it be static? drivers/char/synclinkmp.c:3494:5: warning: symbol 'alloc_buf_list' was not declared. Should it be static? drivers/char/synclinkmp.c:3553:5: warning: symbol 'alloc_frame_bufs' was not declared. Should it be static? drivers/char/synclinkmp.c:3570:6: warning: symbol 'free_dma_bufs' was not declared. Should it be static? drivers/char/synclinkmp.c:3580:5: warning: symbol 'alloc_tmp_rx_buf' was not declared. Should it be static? drivers/char/synclinkmp.c:3588:6: warning: symbol 'free_tmp_rx_buf' was not declared. Should it be static? drivers/char/synclinkmp.c:3594:5: warning: symbol 'claim_resources' was not declared. Should it be static? 
drivers/char/synclinkmp.c:3681:6: warning: symbol 'release_resources' was not declared. Should it be static? drivers/char/synclinkmp.c:3737:6: warning: symbol 'add_device' was not declared. Should it be static? drivers/char/synclinkmp.c:3860:6: warning: symbol 'device_init' was not declared. Should it be static? drivers/char/synclinkmp.c:4054:6: warning: symbol 'enable_loopback' was not declared. Should it be static? drivers/char/synclinkmp.c:4101:6: warning: symbol 'set_rate' was not declared. Should it be static? drivers/char/synclinkmp.c:4147:6: warning: symbol 'rx_stop' was not declared. Should it be static? drivers/char/synclinkmp.c:4168:6: warning: symbol 'rx_start' was not declared. Should it be static? drivers/char/synclinkmp.c:4225:6: warning: symbol 'tx_start' was not declared. Should it be static? drivers/char/synclinkmp.c:4295:6: warning: symbol 'tx_stop' was not declared. Should it be static? drivers/char/synclinkmp.c:4322:6: warning: symbol 'tx_load_fifo' was not declared. Should it be static? drivers/char/synclinkmp.c:4371:6: warning: symbol 'reset_port' was not declared. Should it be static? drivers/char/synclinkmp.c:4395:6: warning: symbol 'reset_adapter' was not declared. Should it be static? drivers/char/synclinkmp.c:4407:6: warning: symbol 'async_mode' was not declared. Should it be static? drivers/char/synclinkmp.c:4546:6: warning: symbol 'hdlc_mode' was not declared. Should it be static? drivers/char/synclinkmp.c:4748:6: warning: symbol 'tx_set_idle' was not declared. Should it be static? drivers/char/synclinkmp.c:4768:6: warning: symbol 'get_signals' was not declared. Should it be static? drivers/char/synclinkmp.c:4797:6: warning: symbol 'set_signals' was not declared. Should it be static? drivers/char/synclinkmp.c:4826:6: warning: symbol 'rx_reset_buffers' was not declared. Should it be static? drivers/char/synclinkmp.c:4837:6: warning: symbol 'rx_free_frame_buffers' was not declared. Should it be static? 
drivers/char/synclinkmp.c:4865:5: warning: symbol 'rx_get_frame' was not declared. Should it be static? drivers/char/synclinkmp.c:5040:6: warning: symbol 'tx_load_dma_buffer' was not declared. Should it be static? drivers/char/synclinkmp.c:5080:5: warning: symbol 'register_test' was not declared. Should it be static? drivers/char/synclinkmp.c:5119:5: warning: symbol 'irq_test' was not declared. Should it be static? drivers/char/synclinkmp.c:5211:5: warning: symbol 'init_adapter' was not declared. Should it be static? drivers/char/synclinkmp.c:5270:5: warning: symbol 'loopback_test' was not declared. Should it be static? drivers/char/synclinkmp.c:5335:5: warning: symbol 'adapter_test' was not declared. Should it be static? drivers/char/synclinkmp.c:5397:5: warning: symbol 'memory_test' was not declared. Should it be static? drivers/char/synclinkmp.c:5449:6: warning: symbol 'load_pci_memory' was not declared. Should it be static? drivers/char/synclinkmp.c:5468:6: warning: symbol 'trace_block' was not declared. Should it be static? drivers/char/synclinkmp.c:5503:6: warning: symbol 'tx_timeout' was not declared. Should it be static? drivers/char/synclinkmp.c:5530:6: warning: symbol 'status_timeout' was not declared. Should it be static? drivers/char/synclinkmp.c:5581:15: warning: symbol 'read_reg' was not declared. Should it be static? drivers/char/synclinkmp.c:5586:6: warning: symbol 'write_reg' was not declared. Should it be static? drivers/char/synclinkmp.c:5592:5: warning: symbol 'read_reg16' was not declared. Should it be static? drivers/char/synclinkmp.c:5598:6: warning: symbol 'write_reg16' was not declared. Should it be static? drivers/char/synclinkmp.c:5604:15: warning: symbol 'read_status_reg' was not declared. Should it be static? drivers/char/synclinkmp.c:5610:6: warning: symbol 'write_control_reg' was not declared. Should it be static? 
Signed-off-by: Harvey Harrison Cc: Paul Fulghum Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/synclinkmp.c | 120 +++++++++++++++++++++++----------------------- 1 file changed, 60 insertions(+), 60 deletions(-) diff --git a/drivers/char/synclinkmp.c b/drivers/char/synclinkmp.c index 66e8082aff7b..e98c3e6f8216 100644 --- a/drivers/char/synclinkmp.c +++ b/drivers/char/synclinkmp.c @@ -1473,7 +1473,7 @@ static inline int line_info(char *buf, SLMP_INFO *info) /* Called to print information about devices */ -int read_proc(char *page, char **start, off_t off, int count, +static int read_proc(char *page, char **start, off_t off, int count, int *eof, void *data) { int len = 0, l; @@ -2024,7 +2024,7 @@ static void hdlcdev_exit(SLMP_INFO *info) /* Return next bottom half action to perform. * Return Value: BH action code or 0 if nothing to do. */ -int bh_action(SLMP_INFO *info) +static int bh_action(SLMP_INFO *info) { unsigned long flags; int rc = 0; @@ -2055,7 +2055,7 @@ int bh_action(SLMP_INFO *info) /* Perform bottom half processing of work items queued by ISR. 
*/ -void bh_handler(struct work_struct *work) +static void bh_handler(struct work_struct *work) { SLMP_INFO *info = container_of(work, SLMP_INFO, task); int action; @@ -2100,7 +2100,7 @@ void bh_handler(struct work_struct *work) __FILE__,__LINE__,info->device_name); } -void bh_receive(SLMP_INFO *info) +static void bh_receive(SLMP_INFO *info) { if ( debug_level >= DEBUG_LEVEL_BH ) printk( "%s(%d):%s bh_receive()\n", @@ -2109,7 +2109,7 @@ void bh_receive(SLMP_INFO *info) while( rx_get_frame(info) ); } -void bh_transmit(SLMP_INFO *info) +static void bh_transmit(SLMP_INFO *info) { struct tty_struct *tty = info->tty; @@ -2121,7 +2121,7 @@ void bh_transmit(SLMP_INFO *info) tty_wakeup(tty); } -void bh_status(SLMP_INFO *info) +static void bh_status(SLMP_INFO *info) { if ( debug_level >= DEBUG_LEVEL_BH ) printk( "%s(%d):%s bh_status() entry\n", @@ -2133,7 +2133,7 @@ void bh_status(SLMP_INFO *info) info->cts_chkcount = 0; } -void isr_timer(SLMP_INFO * info) +static void isr_timer(SLMP_INFO * info) { unsigned char timer = (info->port_num & 1) ? 
TIMER2 : TIMER0; @@ -2159,7 +2159,7 @@ void isr_timer(SLMP_INFO * info) __FILE__,__LINE__,info->device_name); } -void isr_rxint(SLMP_INFO * info) +static void isr_rxint(SLMP_INFO * info) { struct tty_struct *tty = info->tty; struct mgsl_icount *icount = &info->icount; @@ -2218,7 +2218,7 @@ void isr_rxint(SLMP_INFO * info) /* * handle async rx data interrupts */ -void isr_rxrdy(SLMP_INFO * info) +static void isr_rxrdy(SLMP_INFO * info) { u16 status; unsigned char DataByte; @@ -2348,7 +2348,7 @@ static void isr_txeom(SLMP_INFO * info, unsigned char status) /* * handle tx status interrupts */ -void isr_txint(SLMP_INFO * info) +static void isr_txint(SLMP_INFO * info) { unsigned char status = read_reg(info, SR1) & info->ie1_value & (UDRN + IDLE + CCTS); @@ -2376,7 +2376,7 @@ void isr_txint(SLMP_INFO * info) /* * handle async tx data interrupts */ -void isr_txrdy(SLMP_INFO * info) +static void isr_txrdy(SLMP_INFO * info) { if ( debug_level >= DEBUG_LEVEL_ISR ) printk("%s(%d):%s isr_txrdy() tx_count=%d\n", @@ -2407,7 +2407,7 @@ void isr_txrdy(SLMP_INFO * info) info->pending_bh |= BH_TRANSMIT; } -void isr_rxdmaok(SLMP_INFO * info) +static void isr_rxdmaok(SLMP_INFO * info) { /* BIT7 = EOT (end of transfer) * BIT6 = EOM (end of message/frame) @@ -2424,7 +2424,7 @@ void isr_rxdmaok(SLMP_INFO * info) info->pending_bh |= BH_RECEIVE; } -void isr_rxdmaerror(SLMP_INFO * info) +static void isr_rxdmaerror(SLMP_INFO * info) { /* BIT5 = BOF (buffer overflow) * BIT4 = COF (counter overflow) @@ -2442,7 +2442,7 @@ void isr_rxdmaerror(SLMP_INFO * info) info->pending_bh |= BH_RECEIVE; } -void isr_txdmaok(SLMP_INFO * info) +static void isr_txdmaok(SLMP_INFO * info) { unsigned char status_reg1 = read_reg(info, SR1); @@ -2460,7 +2460,7 @@ void isr_txdmaok(SLMP_INFO * info) write_reg(info, IE0, info->ie0_value); } -void isr_txdmaerror(SLMP_INFO * info) +static void isr_txdmaerror(SLMP_INFO * info) { /* BIT5 = BOF (buffer overflow) * BIT4 = COF (counter overflow) @@ -2477,7 +2477,7 @@ void 
isr_txdmaerror(SLMP_INFO * info) /* handle input serial signal changes */ -void isr_io_pin( SLMP_INFO *info, u16 status ) +static void isr_io_pin( SLMP_INFO *info, u16 status ) { struct mgsl_icount *icount; @@ -3414,7 +3414,7 @@ static int block_til_ready(struct tty_struct *tty, struct file *filp, return retval; } -int alloc_dma_bufs(SLMP_INFO *info) +static int alloc_dma_bufs(SLMP_INFO *info) { unsigned short BuffersPerFrame; unsigned short BufferCount; @@ -3488,7 +3488,7 @@ int alloc_dma_bufs(SLMP_INFO *info) /* Allocate DMA buffers for the transmit and receive descriptor lists. */ -int alloc_buf_list(SLMP_INFO *info) +static int alloc_buf_list(SLMP_INFO *info) { unsigned int i; @@ -3547,7 +3547,7 @@ int alloc_buf_list(SLMP_INFO *info) /* Allocate the frame DMA buffers used by the specified buffer list. */ -int alloc_frame_bufs(SLMP_INFO *info, SCADESC *buf_list,SCADESC_EX *buf_list_ex,int count) +static int alloc_frame_bufs(SLMP_INFO *info, SCADESC *buf_list,SCADESC_EX *buf_list_ex,int count) { int i; unsigned long phys_addr; @@ -3564,7 +3564,7 @@ int alloc_frame_bufs(SLMP_INFO *info, SCADESC *buf_list,SCADESC_EX *buf_list_ex, return 0; } -void free_dma_bufs(SLMP_INFO *info) +static void free_dma_bufs(SLMP_INFO *info) { info->buffer_list = NULL; info->rx_buf_list = NULL; @@ -3574,7 +3574,7 @@ void free_dma_bufs(SLMP_INFO *info) /* allocate buffer large enough to hold max_frame_size. * This buffer is used to pass an assembled frame to the line discipline. 
*/ -int alloc_tmp_rx_buf(SLMP_INFO *info) +static int alloc_tmp_rx_buf(SLMP_INFO *info) { info->tmp_rx_buf = kmalloc(info->max_frame_size, GFP_KERNEL); if (info->tmp_rx_buf == NULL) @@ -3582,13 +3582,13 @@ int alloc_tmp_rx_buf(SLMP_INFO *info) return 0; } -void free_tmp_rx_buf(SLMP_INFO *info) +static void free_tmp_rx_buf(SLMP_INFO *info) { kfree(info->tmp_rx_buf); info->tmp_rx_buf = NULL; } -int claim_resources(SLMP_INFO *info) +static int claim_resources(SLMP_INFO *info) { if (request_mem_region(info->phys_memory_base,SCA_MEM_SIZE,"synclinkmp") == NULL) { printk( "%s(%d):%s mem addr conflict, Addr=%08X\n", @@ -3675,7 +3675,7 @@ errout: return -ENODEV; } -void release_resources(SLMP_INFO *info) +static void release_resources(SLMP_INFO *info) { if ( debug_level >= DEBUG_LEVEL_INFO ) printk( "%s(%d):%s release_resources() entry\n", @@ -3731,7 +3731,7 @@ void release_resources(SLMP_INFO *info) /* Add the specified device instance data structure to the * global linked list of devices and increment the device count. */ -void add_device(SLMP_INFO *info) +static void add_device(SLMP_INFO *info) { info->next_device = NULL; info->line = synclinkmp_device_count; @@ -3854,7 +3854,7 @@ static SLMP_INFO *alloc_dev(int adapter_num, int port_num, struct pci_dev *pdev) return info; } -void device_init(int adapter_num, struct pci_dev *pdev) +static void device_init(int adapter_num, struct pci_dev *pdev) { SLMP_INFO *port_array[SCA_MAX_PORTS]; int port; @@ -4048,7 +4048,7 @@ module_exit(synclinkmp_exit); * The TxCLK and RxCLK signals are generated from the BRG and * the TxD is looped back to the RxD internally. */ -void enable_loopback(SLMP_INFO *info, int enable) +static void enable_loopback(SLMP_INFO *info, int enable) { if (enable) { /* MD2 (Mode Register 2) @@ -4095,7 +4095,7 @@ void enable_loopback(SLMP_INFO *info, int enable) * data_rate data rate of clock in bits per second * A data rate of 0 disables the AUX clock. 
*/ -void set_rate( SLMP_INFO *info, u32 data_rate ) +static void set_rate( SLMP_INFO *info, u32 data_rate ) { u32 TMCValue; unsigned char BRValue; @@ -4141,7 +4141,7 @@ void set_rate( SLMP_INFO *info, u32 data_rate ) /* Disable receiver */ -void rx_stop(SLMP_INFO *info) +static void rx_stop(SLMP_INFO *info) { if (debug_level >= DEBUG_LEVEL_ISR) printk("%s(%d):%s rx_stop()\n", @@ -4162,7 +4162,7 @@ void rx_stop(SLMP_INFO *info) /* enable the receiver */ -void rx_start(SLMP_INFO *info) +static void rx_start(SLMP_INFO *info) { int i; @@ -4219,7 +4219,7 @@ void rx_start(SLMP_INFO *info) /* Enable the transmitter and send a transmit frame if * one is loaded in the DMA buffers. */ -void tx_start(SLMP_INFO *info) +static void tx_start(SLMP_INFO *info) { if (debug_level >= DEBUG_LEVEL_ISR) printk("%s(%d):%s tx_start() tx_count=%d\n", @@ -4289,7 +4289,7 @@ void tx_start(SLMP_INFO *info) /* stop the transmitter and DMA */ -void tx_stop( SLMP_INFO *info ) +static void tx_stop( SLMP_INFO *info ) { if (debug_level >= DEBUG_LEVEL_ISR) printk("%s(%d):%s tx_stop()\n", @@ -4316,7 +4316,7 @@ void tx_stop( SLMP_INFO *info ) /* Fill the transmit FIFO until the FIFO is full or * there is no more data to load. */ -void tx_load_fifo(SLMP_INFO *info) +static void tx_load_fifo(SLMP_INFO *info) { u8 TwoBytes[2]; @@ -4365,7 +4365,7 @@ void tx_load_fifo(SLMP_INFO *info) /* Reset a port to a known state */ -void reset_port(SLMP_INFO *info) +static void reset_port(SLMP_INFO *info) { if (info->sca_base) { @@ -4389,7 +4389,7 @@ void reset_port(SLMP_INFO *info) /* Reset all the ports to a known state. */ -void reset_adapter(SLMP_INFO *info) +static void reset_adapter(SLMP_INFO *info) { int i; @@ -4401,7 +4401,7 @@ void reset_adapter(SLMP_INFO *info) /* Program port for asynchronous communications. */ -void async_mode(SLMP_INFO *info) +static void async_mode(SLMP_INFO *info) { unsigned char RegValue; @@ -4540,7 +4540,7 @@ void async_mode(SLMP_INFO *info) /* Program the SCA for HDLC communications. 
*/ -void hdlc_mode(SLMP_INFO *info) +static void hdlc_mode(SLMP_INFO *info) { unsigned char RegValue; u32 DpllDivisor; @@ -4742,7 +4742,7 @@ void hdlc_mode(SLMP_INFO *info) /* Set the transmit HDLC idle mode */ -void tx_set_idle(SLMP_INFO *info) +static void tx_set_idle(SLMP_INFO *info) { unsigned char RegValue = 0xff; @@ -4762,7 +4762,7 @@ void tx_set_idle(SLMP_INFO *info) /* Query the adapter for the state of the V24 status (input) signals. */ -void get_signals(SLMP_INFO *info) +static void get_signals(SLMP_INFO *info) { u16 status = read_reg(info, SR3); u16 gpstatus = read_status_reg(info); @@ -4791,7 +4791,7 @@ void get_signals(SLMP_INFO *info) /* Set the state of DTR and RTS based on contents of * serial_signals member of device context. */ -void set_signals(SLMP_INFO *info) +static void set_signals(SLMP_INFO *info) { unsigned char RegValue; u16 EnableBit; @@ -4820,7 +4820,7 @@ void set_signals(SLMP_INFO *info) * and set the current buffer to the first buffer. This effectively * makes all buffers free and discards any data in buffers. 
*/ -void rx_reset_buffers(SLMP_INFO *info) +static void rx_reset_buffers(SLMP_INFO *info) { rx_free_frame_buffers(info, 0, info->rx_buf_count - 1); } @@ -4831,7 +4831,7 @@ void rx_reset_buffers(SLMP_INFO *info) * first index of 1st receive buffer of frame * last index of last receive buffer of frame */ -void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int last) +static void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int last) { bool done = false; @@ -4859,7 +4859,7 @@ void rx_free_frame_buffers(SLMP_INFO *info, unsigned int first, unsigned int las * * Return Value: true if frame returned, otherwise false */ -bool rx_get_frame(SLMP_INFO *info) +static bool rx_get_frame(SLMP_INFO *info) { unsigned int StartIndex, EndIndex; /* index of 1st and last buffers of Rx frame */ unsigned short status; @@ -5034,7 +5034,7 @@ Cleanup: /* load the transmit DMA buffer with data */ -void tx_load_dma_buffer(SLMP_INFO *info, const char *buf, unsigned int count) +static void tx_load_dma_buffer(SLMP_INFO *info, const char *buf, unsigned int count) { unsigned short copy_count; unsigned int i = 0; @@ -5074,7 +5074,7 @@ void tx_load_dma_buffer(SLMP_INFO *info, const char *buf, unsigned int count) info->last_tx_buf = ++i; } -bool register_test(SLMP_INFO *info) +static bool register_test(SLMP_INFO *info) { static unsigned char testval[] = {0x00, 0xff, 0xaa, 0x55, 0x69, 0x96}; static unsigned int count = ARRAY_SIZE(testval); @@ -5113,7 +5113,7 @@ bool register_test(SLMP_INFO *info) return rc; } -bool irq_test(SLMP_INFO *info) +static bool irq_test(SLMP_INFO *info) { unsigned long timeout; unsigned long flags; @@ -5205,7 +5205,7 @@ static bool sca_init(SLMP_INFO *info) /* initialize adapter hardware */ -bool init_adapter(SLMP_INFO *info) +static bool init_adapter(SLMP_INFO *info) { int i; @@ -5264,7 +5264,7 @@ bool init_adapter(SLMP_INFO *info) /* Loopback an HDLC frame to test the hardware * interrupt and DMA functions. 
*/ -bool loopback_test(SLMP_INFO *info) +static bool loopback_test(SLMP_INFO *info) { #define TESTFRAMESIZE 20 @@ -5329,7 +5329,7 @@ bool loopback_test(SLMP_INFO *info) /* Perform diagnostics on hardware */ -int adapter_test( SLMP_INFO *info ) +static int adapter_test( SLMP_INFO *info ) { unsigned long flags; if ( debug_level >= DEBUG_LEVEL_INFO ) @@ -5391,7 +5391,7 @@ int adapter_test( SLMP_INFO *info ) /* Test the shared memory on a PCI adapter. */ -bool memory_test(SLMP_INFO *info) +static bool memory_test(SLMP_INFO *info) { static unsigned long testval[] = { 0x0, 0x55555555, 0xaaaaaaaa, 0x66666666, 0x99999999, 0xffffffff, 0x12345678 }; @@ -5443,7 +5443,7 @@ bool memory_test(SLMP_INFO *info) * the write transation. This allows any pending DMA request to gain control * of the local bus in a timely fasion. */ -void load_pci_memory(SLMP_INFO *info, char* dest, const char* src, unsigned short count) +static void load_pci_memory(SLMP_INFO *info, char* dest, const char* src, unsigned short count) { /* A load interval of 16 allows for 4 32-bit writes at */ /* 136ns each for a maximum latency of 542ns on the local bus.*/ @@ -5462,7 +5462,7 @@ void load_pci_memory(SLMP_INFO *info, char* dest, const char* src, unsigned shor memcpy(dest, src, count % sca_pci_load_interval); } -void trace_block(SLMP_INFO *info,const char* data, int count, int xmit) +static void trace_block(SLMP_INFO *info,const char* data, int count, int xmit) { int i; int linecount; @@ -5497,7 +5497,7 @@ void trace_block(SLMP_INFO *info,const char* data, int count, int xmit) /* called when HDLC frame times out * update stats and do tx completion processing */ -void tx_timeout(unsigned long context) +static void tx_timeout(unsigned long context) { SLMP_INFO *info = (SLMP_INFO*)context; unsigned long flags; @@ -5524,7 +5524,7 @@ void tx_timeout(unsigned long context) /* called to periodically check the DSR/RI modem signal input status */ -void status_timeout(unsigned long context) +static void 
status_timeout(unsigned long context) { u16 status = 0; SLMP_INFO *info = (SLMP_INFO*)context; @@ -5575,36 +5575,36 @@ void status_timeout(unsigned long context) } -unsigned char read_reg(SLMP_INFO * info, unsigned char Addr) +static unsigned char read_reg(SLMP_INFO * info, unsigned char Addr) { CALC_REGADDR(); return *RegAddr; } -void write_reg(SLMP_INFO * info, unsigned char Addr, unsigned char Value) +static void write_reg(SLMP_INFO * info, unsigned char Addr, unsigned char Value) { CALC_REGADDR(); *RegAddr = Value; } -u16 read_reg16(SLMP_INFO * info, unsigned char Addr) +static u16 read_reg16(SLMP_INFO * info, unsigned char Addr) { CALC_REGADDR(); return *((u16 *)RegAddr); } -void write_reg16(SLMP_INFO * info, unsigned char Addr, u16 Value) +static void write_reg16(SLMP_INFO * info, unsigned char Addr, u16 Value) { CALC_REGADDR(); *((u16 *)RegAddr) = Value; } -unsigned char read_status_reg(SLMP_INFO * info) +static unsigned char read_status_reg(SLMP_INFO * info) { unsigned char *RegAddr = (unsigned char *)info->statctrl_base; return *RegAddr; } -void write_control_reg(SLMP_INFO * info) +static void write_control_reg(SLMP_INFO * info) { unsigned char *RegAddr = (unsigned char *)info->statctrl_base; *RegAddr = info->port_array[0]->ctrlreg_value; -- cgit v1.2.3 From e991a2bd4fa0b2f475b67dfe8f33e8ecbdcbb40b Mon Sep 17 00:00:00 2001 From: Alan Cox Date: Mon, 28 Apr 2008 02:14:06 -0700 Subject: Fix tty speed handling on 8250 We try and write the correct speed back but the serial midlayer already mangles the speed on us and that means if we request B0 we report back B9600 when we should not. For now we'll hack around this in the drivers and serial code, pending a better long term solution. 
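The B0 rule described above can be sketched in plain C. This is a minimal model, not the driver code. `struct demo_termios` and `demo_report_baud()` are hypothetical names standing in for `struct ktermios`, `tty_termios_baud_rate()` and `tty_termios_encode_baud_rate()`. Only the guard that the 8250 fix adds is modelled: a requested rate of 0 (B0, "hang up") is left untouched instead of being overwritten with the rate the driver actually programmed.

```c
/* Hypothetical, trimmed-down stand-in for struct ktermios: only the
 * requested speed is modelled, and 0 plays the role of B0 ("hang up"). */
struct demo_termios {
    unsigned int speed;
};

/* Report the rate the driver actually programmed back to the caller,
 * except when B0 was requested: rewriting 0 as 9600 would hide the
 * hangup request, which is the behaviour the 8250 fix works around. */
static void demo_report_baud(struct demo_termios *t, unsigned int programmed)
{
    if (t->speed != 0)          /* like: if (tty_termios_baud_rate(termios)) */
        t->speed = programmed;  /* like: tty_termios_encode_baud_rate(...) */
}
```

A B0 request stays at 0 even though the hardware falls back to 9600. Any other request is updated to the rate really in use.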
Signed-off-by: Alan Cox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/8250.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c index 96a585e1cee8..6d0ce64163e5 100644 --- a/drivers/serial/8250.c +++ b/drivers/serial/8250.c @@ -2228,7 +2228,9 @@ serial8250_set_termios(struct uart_port *port, struct ktermios *termios, } serial8250_set_mctrl(&up->port, up->port.mctrl); spin_unlock_irqrestore(&up->port.lock, flags); - tty_termios_encode_baud_rate(termios, baud, baud); + /* Don't rewrite B0 */ + if (tty_termios_baud_rate(termios)) + tty_termios_encode_baud_rate(termios, baud, baud); } static void -- cgit v1.2.3 From eb424fd21c0931e998156225f2a0910167c3e16c Mon Sep 17 00:00:00 2001 From: Alan Cox Date: Mon, 28 Apr 2008 02:14:07 -0700 Subject: uart_get_baud_rate: stop mangling termios Russell King noticed this one: We have to avoid replacing B0 when we pick a baud rate for a "hung up" port. Ugly but the proper fix is in the tty layer and means changing the tty<->serial interfaces so we will defer that for now. [akpm@linux-foundation.org: fix uninitialised var] Signed-off-by: Alan Cox Cc: Russell King Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/serial_core.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/serial/serial_core.c b/drivers/serial/serial_core.c index a9ca03ead3e5..977ce820ce30 100644 --- a/drivers/serial/serial_core.c +++ b/drivers/serial/serial_core.c @@ -329,13 +329,15 @@ EXPORT_SYMBOL(uart_update_timeout); * If it's still invalid, we try 9600 baud. * * Update the @termios structure to reflect the baud rate - * we're actually going to be using. + * we're actually going to be using. Don't do this for the case + * where B0 is requested ("hang up"). 
*/ unsigned int uart_get_baud_rate(struct uart_port *port, struct ktermios *termios, struct ktermios *old, unsigned int min, unsigned int max) { unsigned int try, baud, altbaud = 38400; + int hung_up = 0; upf_t flags = port->flags & UPF_SPD_MASK; if (flags == UPF_SPD_HI) @@ -360,8 +362,10 @@ uart_get_baud_rate(struct uart_port *port, struct ktermios *termios, /* * Special case: B0 rate. */ - if (baud == 0) + if (baud == 0) { + hung_up = 1; baud = 9600; + } if (baud >= min && baud <= max) return baud; @@ -373,7 +377,9 @@ uart_get_baud_rate(struct uart_port *port, struct ktermios *termios, termios->c_cflag &= ~CBAUD; if (old) { baud = tty_termios_baud_rate(old); - tty_termios_encode_baud_rate(termios, baud, baud); + if (!hung_up) + tty_termios_encode_baud_rate(termios, + baud, baud); old = NULL; continue; } @@ -382,7 +388,8 @@ uart_get_baud_rate(struct uart_port *port, struct ktermios *termios, * As a last resort, if the quotient is zero, * default to 9600 bps */ - tty_termios_encode_baud_rate(termios, 9600, 9600); + if (!hung_up) + tty_termios_encode_baud_rate(termios, 9600, 9600); } return 0; -- cgit v1.2.3 From baac58955d6933571f29126a1a95299b421faef7 Mon Sep 17 00:00:00 2001 From: Yoichi Yuasa Date: Mon, 28 Apr 2008 02:14:08 -0700 Subject: serial: add vr41xx_siu_early_setup() for serial console Add vr41xx_siu_early_setup() for serial console. Signed-off-by: Yoichi Yuasa Cc: Ralf Baechle Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/vr41xx_siu.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/drivers/serial/vr41xx_siu.c b/drivers/serial/vr41xx_siu.c index 98ab649c1ff9..bb6ce6bba32f 100644 --- a/drivers/serial/vr41xx_siu.c +++ b/drivers/serial/vr41xx_siu.c @@ -1,7 +1,7 @@ /* * Driver for NEC VR4100 series Serial Interface Unit. * - * Copyright (C) 2004-2007 Yoichi Yuasa + * Copyright (C) 2004-2008 Yoichi Yuasa * * Based on drivers/serial/8250.c, by Russell King. 
* @@ -840,6 +840,19 @@ static int __devinit siu_console_init(void) console_initcall(siu_console_init); +void __init vr41xx_siu_early_setup(struct uart_port *port) +{ + if (port->type == PORT_UNKNOWN) + return; + + siu_uart_ports[port->line].line = port->line; + siu_uart_ports[port->line].type = port->type; + siu_uart_ports[port->line].uartclk = SIU_BAUD_BASE * 16; + siu_uart_ports[port->line].mapbase = port->mapbase; + siu_uart_ports[port->line].mapbase = port->mapbase; + siu_uart_ports[port->line].ops = &siu_uart_ops; +} + #define SERIAL_VR41XX_CONSOLE &siu_console #else #define SERIAL_VR41XX_CONSOLE NULL -- cgit v1.2.3 From fc3f341b5a1a3f26ec8ed74a38234db7d0d1bae1 Mon Sep 17 00:00:00 2001 From: Yoichi Yuasa Date: Mon, 28 Apr 2008 02:14:08 -0700 Subject: serial: add VR41xx SIU setup for serial console Add VR41xx SIU setup for serial console. Signed-off-by: Yoichi Yuasa Cc: Ralf Baechle Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/mips/vr41xx/common/init.c | 4 +++- arch/mips/vr41xx/common/siu.c | 36 +++++++++++++++++++++++++++++++++++- include/asm-mips/vr41xx/siu.h | 8 +++++++- include/asm-mips/vr41xx/vr41xx.h | 8 +++++++- 4 files changed, 52 insertions(+), 4 deletions(-) diff --git a/arch/mips/vr41xx/common/init.c b/arch/mips/vr41xx/common/init.c index 76d4b5ed3fc0..c64995342ba8 100644 --- a/arch/mips/vr41xx/common/init.c +++ b/arch/mips/vr41xx/common/init.c @@ -1,7 +1,7 @@ /* * init.c, Common initialization routines for NEC VR4100 series. 
* - * Copyright (C) 2003-2005 Yoichi Yuasa + * Copyright (C) 2003-2008 Yoichi Yuasa * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -53,6 +53,8 @@ void __init plat_time_init(void) void __init plat_mem_setup(void) { iomem_resource_init(); + + vr41xx_siu_setup(); } void __init prom_init(void) diff --git a/arch/mips/vr41xx/common/siu.c b/arch/mips/vr41xx/common/siu.c index b735f45b25f0..654dee6208be 100644 --- a/arch/mips/vr41xx/common/siu.c +++ b/arch/mips/vr41xx/common/siu.c @@ -1,7 +1,7 @@ /* * NEC VR4100 series SIU platform device. * - * Copyright (C) 2007 Yoichi Yuasa + * Copyright (C) 2007-2008 Yoichi Yuasa * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -118,3 +118,37 @@ err_free_device: return retval; } device_initcall(vr41xx_siu_add); + +void __init vr41xx_siu_setup(void) +{ + struct uart_port port; + struct resource *res; + unsigned int *type; + int i; + + switch (current_cpu_type()) { + case CPU_VR4111: + case CPU_VR4121: + type = siu_type1_ports; + res = siu_type1_resource; + break; + case CPU_VR4122: + case CPU_VR4131: + case CPU_VR4133: + type = siu_type2_ports; + res = siu_type2_resource; + break; + default: + return; + } + + for (i = 0; i < SIU_PORTS_MAX; i++) { + port.line = i; + port.type = type[i]; + if (port.type == PORT_UNKNOWN) + break; + port.mapbase = res[i].start; + port.membase = (unsigned char __iomem *)KSEG1ADDR(res[i].start); + vr41xx_siu_early_setup(&port); + } +} diff --git a/include/asm-mips/vr41xx/siu.h b/include/asm-mips/vr41xx/siu.h index 98cdb4096485..da9f6e373409 100644 --- a/include/asm-mips/vr41xx/siu.h +++ b/include/asm-mips/vr41xx/siu.h @@ -1,7 +1,7 @@ /* * Include file for NEC VR4100 series Serial Interface Unit. 
* - * Copyright (C) 2005 Yoichi Yuasa + * Copyright (C) 2005-2008 Yoichi Yuasa * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU General Public License as published by @@ -49,4 +49,10 @@ typedef enum { extern void vr41xx_select_irda_module(irda_module_t module, irda_speed_t speed); +#ifdef CONFIG_SERIAL_VR41XX_CONSOLE +extern void vr41xx_siu_early_setup(struct uart_port *port); +#else +static inline void vr41xx_siu_early_setup(struct uart_port *port) {} +#endif + #endif /* __NEC_VR41XX_SIU_H */ diff --git a/include/asm-mips/vr41xx/vr41xx.h b/include/asm-mips/vr41xx/vr41xx.h index 88b492f6ea9c..22be64971cc6 100644 --- a/include/asm-mips/vr41xx/vr41xx.h +++ b/include/asm-mips/vr41xx/vr41xx.h @@ -7,7 +7,7 @@ * Copyright (C) 2001, 2002 Paul Mundt * Copyright (C) 2002 MontaVista Software, Inc. * Copyright (C) 2002 TimeSys Corp. - * Copyright (C) 2003-2005 Yoichi Yuasa + * Copyright (C) 2003-2008 Yoichi Yuasa * * This program is free software; you can redistribute it and/or modify it * under the terms of the GNU General Public License as published by the @@ -143,4 +143,10 @@ extern void vr41xx_disable_csiint(uint16_t mask); extern void vr41xx_enable_bcuint(void); extern void vr41xx_disable_bcuint(void); +#ifdef CONFIG_SERIAL_VR41XX_CONSOLE +extern void vr41xx_siu_setup(void); +#else +static inline void vr41xx_siu_setup(void) {} +#endif + #endif /* __NEC_VR41XX_H */ -- cgit v1.2.3 From 01c194d9278efc15d4785ff205643e9c0bdcef53 Mon Sep 17 00:00:00 2001 From: Alex Williamson Date: Mon, 28 Apr 2008 02:14:09 -0700 Subject: serial 8250: tighten test for using backup timer Thomas Koeller had reported an issue where a device that had been making use of the UART_BUG_TXEN code in the 8250 driver was mistakenly being caught by the backup timer test, causing the device to work improperly. To fix this, tighten the test requirements to enable the backup timer workaround. 
The backup timer is really meant to catch UARTs that don't re-assert the THRE interrupt. The expectation is that they do initially assert THRE. This patch clarifies the test. Signed-off-by: Alex Williamson Cc: Thomas Koeller Cc: Russell King Cc: Alan Cox Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/8250.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/serial/8250.c b/drivers/serial/8250.c index 6d0ce64163e5..ea41f2626458 100644 --- a/drivers/serial/8250.c +++ b/drivers/serial/8250.c @@ -1868,6 +1868,7 @@ static int serial8250_startup(struct uart_port *port) } if (is_real_interrupt(up->port.irq)) { + unsigned char iir1; /* * Test for UARTs that do not reassert THRE when the * transmitter is idle and the interrupt has already @@ -1881,7 +1882,7 @@ static int serial8250_startup(struct uart_port *port) wait_for_xmitr(up, UART_LSR_THRE); serial_out_sync(up, UART_IER, UART_IER_THRI); udelay(1); /* allow THRE to set */ - serial_in(up, UART_IIR); + iir1 = serial_in(up, UART_IIR); serial_out(up, UART_IER, 0); serial_out_sync(up, UART_IER, UART_IER_THRI); udelay(1); /* allow a working UART time to re-assert THRE */ @@ -1894,7 +1895,7 @@ static int serial8250_startup(struct uart_port *port) * If the interrupt is not reasserted, setup a timer to * kick the UART on a regular basis. */ - if (iir & UART_IIR_NO_INT) { + if (!(iir1 & UART_IIR_NO_INT) && (iir & UART_IIR_NO_INT)) { pr_debug("ttyS%d - using backup timer\n", port->line); up->timer.function = serial8250_backup_timeout; up->timer.data = (unsigned long)up; -- cgit v1.2.3 From d1ec61e6686c3c137aae33a11518b8e629e9c179 Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Mon, 28 Apr 2008 02:14:10 -0700 Subject: serial: use time_before, time_before_eq, etc The functions time_before, time_before_eq, time_after, and time_after_eq are more robust for comparing jiffies against other values. 
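The robustness of these helpers can be demonstrated in user space. The macros below are a simplified copy of the idea in `<linux/jiffies.h>`; the kernel versions also typecheck their arguments as `unsigned long`. `naive_expired()` and `safe_expired()` are hypothetical helpers. The first reproduces the old `(orig_jiffies + timeout) < jiffies` test from 68360serial.c, and the second uses the `time_after()` form. Reading the wrapped difference as a signed `long` assumes the two's-complement conversion that GCC and Clang provide.

```c
#include <stdbool.h>

/* Simplified wraparound-safe comparisons: the unsigned difference read
 * as signed orders two counter values correctly as long as they are
 * less than half the counter range apart, even across a wrap to 0. */
#define time_after(a, b)      ((long)((b) - (a)) < 0)
#define time_before(a, b)     time_after(b, a)
#define time_after_eq(a, b)   ((long)((a) - (b)) >= 0)
#define time_before_eq(a, b)  time_after_eq(b, a)

/* Old style check: wrong once start + timeout wraps past zero. */
static bool naive_expired(unsigned long now, unsigned long start,
                          unsigned long timeout)
{
    return (start + timeout) < now;
}

/* Wraparound-safe version of the same check. */
static bool safe_expired(unsigned long now, unsigned long start,
                         unsigned long timeout)
{
    return time_after(now, start + timeout);
}
```

Suppose `start` is 5 ticks below the counter's maximum value and `timeout` is 10. The deadline then wraps to 4, so `naive_expired()` reports a timeout immediately, because 4 is numerically smaller than the current count. `safe_expired()` reports a timeout only after more than 10 ticks have actually elapsed.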
A simplified version of the semantic patch making this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @ change_compare_np @ expression E; @@ ( - jiffies <= E + time_before_eq(jiffies,E) | - jiffies >= E + time_after_eq(jiffies,E) | - jiffies < E + time_before(jiffies,E) | - jiffies > E + time_after(jiffies,E) ) @ include depends on change_compare_np @ @@ #include @ no_include depends on !include && change_compare_np @ @@ #include + #include // Signed-off-by: Julia Lawall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/68360serial.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/serial/68360serial.c b/drivers/serial/68360serial.c index 2aa6bfe8fdb3..f59463601874 100644 --- a/drivers/serial/68360serial.c +++ b/drivers/serial/68360serial.c @@ -51,6 +51,7 @@ extern int kgdb_output_string (const char* s, unsigned int count); /* #ifdef CONFIG_SERIAL_CONSOLE */ /* This seems to be a post 2.0 thing - mles */ #include +#include /* this defines the index into rs_table for the port to use */ @@ -1729,7 +1730,7 @@ static void rs_360_wait_until_sent(struct tty_struct *tty, int timeout) msleep_interruptible(jiffies_to_msecs(char_time)); if (signal_pending(current)) break; - if (timeout && ((orig_jiffies + timeout) < jiffies)) + if (timeout && (time_after(jiffies, orig_jiffies + timeout))) break; /* The 'tx_cur' is really the next buffer to send. We * have to back up to the previous BD and wait for it -- cgit v1.2.3 From 6e10efefaae45989f2f143bacfef75af55068378 Mon Sep 17 00:00:00 2001 From: Michael Trimarchi Date: Mon, 28 Apr 2008 02:14:11 -0700 Subject: atmel_serial: remove duplicated macro definition After commit 39d4c922b596633da86878b1a5cc881785b8e5fa (atmel_serial: fix uart/console concurrent access) the UART_GET_TCR macro got redefined. This patch removes the duplicated definition. 
Signed-off-by: michael trimarchi Acked-by: Haavard Skinnemoen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/serial/atmel_serial.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/serial/atmel_serial.c b/drivers/serial/atmel_serial.c index 55492fa095a2..c065a704a93a 100644 --- a/drivers/serial/atmel_serial.c +++ b/drivers/serial/atmel_serial.c @@ -96,7 +96,6 @@ /* PDC registers */ #define UART_PUT_PTCR(port,v) __raw_writel(v, (port)->membase + ATMEL_PDC_PTCR) -#define UART_GET_TCR(port) __raw_readl((port)->membase + ATMEL_PDC_TCR) #define UART_GET_PTSR(port) __raw_readl((port)->membase + ATMEL_PDC_PTSR) #define UART_PUT_RPR(port,v) __raw_writel(v, (port)->membase + ATMEL_PDC_RPR) -- cgit v1.2.3 From d83fd8a26769c75d51a6b05d8dcb3e36302dd8ba Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Apr 2008 02:14:13 -0700 Subject: drivers/acpi/thermal.c: fix build with CONFIG_DMI=n drivers/acpi/thermal.c: In function 'acpi_thermal_init': drivers/acpi/thermal.c:1794: error: 'thermal_dmi_table' undeclared (first use in this function) drivers/acpi/thermal.c:1794: error: (Each undeclared identifier is reported only once drivers/acpi/thermal.c:1794: error: for each function it appears in.) 
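The error arises because `acpi_thermal_init()` references `thermal_dmi_table` unconditionally (it hands the table to `dmi_check_system()`), while the table itself was compiled only under `#ifdef CONFIG_DMI`. The fix keeps the table in every build. It relies on the `!CONFIG_DMI` inline stubs in `<linux/dmi.h>`, such as `dmi_check_system()` returning 0, to make the call harmless. What follows is a minimal user-space sketch of that pattern. `DEMO_CONFIG_DMI`, `struct demo_dmi_id` and the other `demo_` names are hypothetical.

```c
#include <stddef.h>

/* Hypothetical stand-in for CONFIG_DMI; left undefined here to model
 * a CONFIG_DMI=n build. */
/* #define DEMO_CONFIG_DMI 1 */

struct demo_dmi_id {
    const char *ident;
    int (*callback)(const struct demo_dmi_id *d);
};

#ifdef DEMO_CONFIG_DMI
int demo_dmi_check_system(const struct demo_dmi_id *list); /* real scanner */
#else
/* Stub in the style of <linux/dmi.h>: accepts the table, matches nothing. */
static inline int demo_dmi_check_system(const struct demo_dmi_id *list)
{
    (void)list;
    return 0;
}
#endif

static int demo_quirk(const struct demo_dmi_id *d)
{
    (void)d;
    return 1;
}

/* Defined unconditionally, like thermal_dmi_table after the fix, so the
 * reference in demo_init() compiles with or without DMI support. */
static const struct demo_dmi_id demo_table[] = {
    { "Example Board", demo_quirk },
    { NULL, NULL }
};

static int demo_init(void)
{
    /* With the stub this always reports 0 matches. */
    return demo_dmi_check_system(demo_table);
}
```

The design trade-off is that the quirk table is always compiled in. In exchange, callers need no `#ifdef` of their own.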
Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/acpi/thermal.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/acpi/thermal.c b/drivers/acpi/thermal.c index 1bcecc7dd2ca..766bd25d3376 100644 --- a/drivers/acpi/thermal.c +++ b/drivers/acpi/thermal.c @@ -1710,7 +1710,6 @@ static int acpi_thermal_resume(struct acpi_device *device) return AE_OK; } -#ifdef CONFIG_DMI static int thermal_act(const struct dmi_system_id *d) { if (act == 0) { @@ -1785,7 +1784,6 @@ static struct dmi_system_id thermal_dmi_table[] __initdata = { }, {} }; -#endif /* CONFIG_DMI */ static int __init acpi_thermal_init(void) { -- cgit v1.2.3 From 7ae9392c0a3bc01562361bb21e23dfb2e5c81c5a Mon Sep 17 00:00:00 2001 From: Thomas Petazzoni Date: Mon, 28 Apr 2008 02:14:14 -0700 Subject: x86: configurable DMI scanning code Turn CONFIG_DMI into a selectable option if EMBEDDED is defined, in order to be able to remove the DMI table scanning code if it's not needed, and then reduce the kernel code size. With CONFIG_DMI (i.e before) : text data bss dec hex filename 1076076 128656 98304 1303036 13e1fc vmlinux Without CONFIG_DMI (i.e after) : text data bss dec hex filename 1068092 126308 98304 1292704 13b9a0 vmlinux Result: text data bss dec hex filename -7984 -2348 0 -10332 -285c vmlinux The new option appears in "Processor type and features", only when CONFIG_EMBEDDED is defined. This patch is part of the Linux Tiny project, and is based on previous work done by Matt Mackall . Signed-off-by: Thomas Petazzoni Acked-by: Ingo Molnar Cc: Thomas Gleixner Cc: "H. 
Anvin" Signed-off-by: Matt Mackall Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/Kconfig | 12 +++++++++--- include/linux/dmi.h | 1 + 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index a8ce13a54764..a12dbb2b93f3 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -86,9 +86,6 @@ config GENERIC_GPIO config ARCH_MAY_HAVE_PC_FDC def_bool y -config DMI - def_bool y - config RWSEM_GENERIC_SPINLOCK def_bool !X86_XADD @@ -485,6 +482,15 @@ config HPET_EMULATE_RTC # Mark as embedded because too many people got it wrong. # The code disables itself when not needed. +config DMI + default y + bool "Enable DMI scanning" if EMBEDDED + help + Enabled scanning of DMI to identify machine quirks. Say Y + here unless you have verified that your setup is not + affected by entries in the DMI blacklist. Required by PNP + BIOS code. + config GART_IOMMU bool "GART IOMMU support" if EMBEDDED default y diff --git a/include/linux/dmi.h b/include/linux/dmi.h index 325acdf5c462..2a063b64133f 100644 --- a/include/linux/dmi.h +++ b/include/linux/dmi.h @@ -90,6 +90,7 @@ static inline int dmi_check_system(const struct dmi_system_id *list) { return 0; static inline const char * dmi_get_system_info(int field) { return NULL; } static inline const struct dmi_device * dmi_find_device(int type, const char *name, const struct dmi_device *from) { return NULL; } +static inline void dmi_scan_machine(void) { return; } static inline int dmi_get_year(int year) { return 0; } static inline int dmi_name_in_vendors(const char *s) { return 0; } #define dmi_available 0 -- cgit v1.2.3 From 608dfddd845da5ab6accef70154c8910529699f7 Mon Sep 17 00:00:00 2001 From: Mike Travis Date: Mon, 28 Apr 2008 02:14:15 -0700 Subject: oprofile: change cpu_buffer from array to per_cpu variable Change cpu_buffer from array to per_cpu variable in oprofile functions. 
[akpm@linux-foundation.org: coding-style fixes] Cc: Philippe Elie Signed-off-by: Mike Travis Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/oprofile/buffer_sync.c | 2 +- drivers/oprofile/cpu_buffer.c | 16 ++++++++-------- drivers/oprofile/cpu_buffer.h | 3 ++- drivers/oprofile/oprofile_stats.c | 4 ++-- 4 files changed, 13 insertions(+), 12 deletions(-) diff --git a/drivers/oprofile/buffer_sync.c b/drivers/oprofile/buffer_sync.c index b07ba2a14119..9304c4555079 100644 --- a/drivers/oprofile/buffer_sync.c +++ b/drivers/oprofile/buffer_sync.c @@ -491,7 +491,7 @@ typedef enum { */ void sync_buffer(int cpu) { - struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[cpu]; + struct oprofile_cpu_buffer *cpu_buf = &per_cpu(cpu_buffer, cpu); struct mm_struct *mm = NULL; struct task_struct * new; unsigned long cookie = 0; diff --git a/drivers/oprofile/cpu_buffer.c b/drivers/oprofile/cpu_buffer.c index c93d3d2640ab..efcbf4b4579f 100644 --- a/drivers/oprofile/cpu_buffer.c +++ b/drivers/oprofile/cpu_buffer.c @@ -27,7 +27,7 @@ #include "buffer_sync.h" #include "oprof.h" -struct oprofile_cpu_buffer cpu_buffer[NR_CPUS] __cacheline_aligned; +DEFINE_PER_CPU_SHARED_ALIGNED(struct oprofile_cpu_buffer, cpu_buffer); static void wq_sync_buffer(struct work_struct *work); @@ -39,7 +39,7 @@ void free_cpu_buffers(void) int i; for_each_online_cpu(i) - vfree(cpu_buffer[i].buffer); + vfree(per_cpu(cpu_buffer, i).buffer); } int alloc_cpu_buffers(void) @@ -49,7 +49,7 @@ int alloc_cpu_buffers(void) unsigned long buffer_size = fs_cpu_buffer_size; for_each_online_cpu(i) { - struct oprofile_cpu_buffer * b = &cpu_buffer[i]; + struct oprofile_cpu_buffer *b = &per_cpu(cpu_buffer, i); b->buffer = vmalloc_node(sizeof(struct op_sample) * buffer_size, cpu_to_node(i)); @@ -83,7 +83,7 @@ void start_cpu_work(void) work_enabled = 1; for_each_online_cpu(i) { - struct oprofile_cpu_buffer * b = &cpu_buffer[i]; + struct oprofile_cpu_buffer *b = &per_cpu(cpu_buffer, i); /* * Spread the work by 1 
jiffy per cpu so they dont all @@ -100,7 +100,7 @@ void end_cpu_work(void) work_enabled = 0; for_each_online_cpu(i) { - struct oprofile_cpu_buffer * b = &cpu_buffer[i]; + struct oprofile_cpu_buffer *b = &per_cpu(cpu_buffer, i); cancel_delayed_work(&b->work); } @@ -227,7 +227,7 @@ static void oprofile_end_trace(struct oprofile_cpu_buffer * cpu_buf) void oprofile_add_ext_sample(unsigned long pc, struct pt_regs * const regs, unsigned long event, int is_kernel) { - struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[smp_processor_id()]; + struct oprofile_cpu_buffer *cpu_buf = &__get_cpu_var(cpu_buffer); if (!backtrace_depth) { log_sample(cpu_buf, pc, is_kernel, event); @@ -254,13 +254,13 @@ void oprofile_add_sample(struct pt_regs * const regs, unsigned long event) void oprofile_add_pc(unsigned long pc, int is_kernel, unsigned long event) { - struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[smp_processor_id()]; + struct oprofile_cpu_buffer *cpu_buf = &__get_cpu_var(cpu_buffer); log_sample(cpu_buf, pc, is_kernel, event); } void oprofile_add_trace(unsigned long pc) { - struct oprofile_cpu_buffer * cpu_buf = &cpu_buffer[smp_processor_id()]; + struct oprofile_cpu_buffer *cpu_buf = &__get_cpu_var(cpu_buffer); if (!cpu_buf->tracing) return; diff --git a/drivers/oprofile/cpu_buffer.h b/drivers/oprofile/cpu_buffer.h index c66c025abe75..13588174311d 100644 --- a/drivers/oprofile/cpu_buffer.h +++ b/drivers/oprofile/cpu_buffer.h @@ -14,6 +14,7 @@ #include #include #include +#include struct task_struct; @@ -47,7 +48,7 @@ struct oprofile_cpu_buffer { struct delayed_work work; } ____cacheline_aligned; -extern struct oprofile_cpu_buffer cpu_buffer[]; +DECLARE_PER_CPU(struct oprofile_cpu_buffer, cpu_buffer); void cpu_buffer_reset(struct oprofile_cpu_buffer * cpu_buf); diff --git a/drivers/oprofile/oprofile_stats.c b/drivers/oprofile/oprofile_stats.c index d1f6d776e9e4..f99b28e7b79a 100644 --- a/drivers/oprofile/oprofile_stats.c +++ b/drivers/oprofile/oprofile_stats.c @@ -23,7 +23,7 @@ 
void oprofile_reset_stats(void) int i; for_each_possible_cpu(i) { - cpu_buf = &cpu_buffer[i]; + cpu_buf = &per_cpu(cpu_buffer, i); cpu_buf->sample_received = 0; cpu_buf->sample_lost_overflow = 0; cpu_buf->backtrace_aborted = 0; @@ -49,7 +49,7 @@ void oprofile_create_stats_files(struct super_block * sb, struct dentry * root) return; for_each_possible_cpu(i) { - cpu_buf = &cpu_buffer[i]; + cpu_buf = &per_cpu(cpu_buffer, i); snprintf(buf, 10, "cpu%d", i); cpudir = oprofilefs_mkdir(sb, dir, buf); -- cgit v1.2.3 From 79d8c7a8c888a7c2ab9dd4249495b24575b3f9a6 Mon Sep 17 00:00:00 2001 From: Alessandro Guido Date: Mon, 28 Apr 2008 02:14:16 -0700 Subject: spi: use menuconfig for CONFIG_SPI Signed-off-by: Alessandro Guido Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/spi/Kconfig | 13 ++++++------- 1 file changed, 6 insertions(+), 7 deletions(-) diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig index d8107890db15..fae9e8f3d092 100644 --- a/drivers/spi/Kconfig +++ b/drivers/spi/Kconfig @@ -5,11 +5,9 @@ # nobody's needed a slave side API yet. The master-role API is not # fully appropriate there, so it'd need some thought to do well. # -menu "SPI support" - depends on HAS_IOMEM - -config SPI +menuconfig SPI bool "SPI support" + depends on HAS_IOMEM help The "Serial Peripheral Interface" is a low level synchronous protocol. Chips that support SPI can have data transfer rates @@ -28,9 +26,11 @@ config SPI (half duplex), SSP, SSI, and PSP. This driver framework should work with most such devices and controllers. +if SPI + config SPI_DEBUG boolean "Debug support for SPI drivers" - depends on SPI && DEBUG_KERNEL + depends on DEBUG_KERNEL help Say "yes" to enable debug messaging (like dev_dbg and pr_debug), sysfs, and debugfs support in SPI controller and protocol drivers. 
@@ -245,5 +245,4 @@ config SPI_TLE62X0 # (slave support would go here) -endmenu # "SPI support" - +endif # SPI -- cgit v1.2.3 From cf43369d55a30a0d8f9ef4700c798c72dbd3afb7 Mon Sep 17 00:00:00 2001 From: David Brownell Date: Mon, 28 Apr 2008 02:14:17 -0700 Subject: spi: pxa2xx_spi "sparse" fixes Various cleanups to pxa2xx_spi suggested by "sparse": make sure that register addresess are "void __iomem *", and make a few functions properly static. Signed-off-by: David Brownell Cc: Ned Forrester Cc: Stephen Street Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/spi/pxa2xx_spi.c | 52 ++++++++++++++++++++++++++---------------------- 1 file changed, 28 insertions(+), 24 deletions(-) diff --git a/drivers/spi/pxa2xx_spi.c b/drivers/spi/pxa2xx_spi.c index 147e26a78d64..654bb58be630 100644 --- a/drivers/spi/pxa2xx_spi.c +++ b/drivers/spi/pxa2xx_spi.c @@ -67,8 +67,11 @@ MODULE_ALIAS("platform:pxa2xx-spi"); | SSCR1_SPH | SSCR1_SPO | SSCR1_LBM) #define DEFINE_SSP_REG(reg, off) \ -static inline u32 read_##reg(void *p) { return __raw_readl(p + (off)); } \ -static inline void write_##reg(u32 v, void *p) { __raw_writel(v, p + (off)); } +static inline u32 read_##reg(void const __iomem *p) \ +{ return __raw_readl(p + (off)); } \ +\ +static inline void write_##reg(u32 v, void __iomem *p) \ +{ __raw_writel(v, p + (off)); } DEFINE_SSP_REG(SSCR0, 0x00) DEFINE_SSP_REG(SSCR1, 0x04) @@ -106,7 +109,7 @@ struct driver_data { u32 *null_dma_buf; /* SSP register addresses */ - void *ioaddr; + void __iomem *ioaddr; u32 ssdr_physical; /* SSP masks*/ @@ -173,7 +176,7 @@ static int flush(struct driver_data *drv_data) { unsigned long limit = loops_per_jiffy << 1; - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; do { while (read_SSSR(reg) & SSSR_RNE) { @@ -191,7 +194,7 @@ static void null_cs_control(u32 command) static int null_writer(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; u8 n_bytes = 
drv_data->n_bytes; if (((read_SSSR(reg) & 0x00000f00) == 0x00000f00) @@ -206,7 +209,7 @@ static int null_writer(struct driver_data *drv_data) static int null_reader(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; u8 n_bytes = drv_data->n_bytes; while ((read_SSSR(reg) & SSSR_RNE) @@ -220,7 +223,7 @@ static int null_reader(struct driver_data *drv_data) static int u8_writer(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; if (((read_SSSR(reg) & 0x00000f00) == 0x00000f00) || (drv_data->tx == drv_data->tx_end)) @@ -234,7 +237,7 @@ static int u8_writer(struct driver_data *drv_data) static int u8_reader(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; while ((read_SSSR(reg) & SSSR_RNE) && (drv_data->rx < drv_data->rx_end)) { @@ -247,7 +250,7 @@ static int u8_reader(struct driver_data *drv_data) static int u16_writer(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; if (((read_SSSR(reg) & 0x00000f00) == 0x00000f00) || (drv_data->tx == drv_data->tx_end)) @@ -261,7 +264,7 @@ static int u16_writer(struct driver_data *drv_data) static int u16_reader(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; while ((read_SSSR(reg) & SSSR_RNE) && (drv_data->rx < drv_data->rx_end)) { @@ -274,7 +277,7 @@ static int u16_reader(struct driver_data *drv_data) static int u32_writer(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; if (((read_SSSR(reg) & 0x00000f00) == 0x00000f00) || (drv_data->tx == drv_data->tx_end)) @@ -288,7 +291,7 @@ static int u32_writer(struct driver_data *drv_data) static int u32_reader(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; while ((read_SSSR(reg) & SSSR_RNE) && (drv_data->rx < 
drv_data->rx_end)) { @@ -412,7 +415,7 @@ static void giveback(struct driver_data *drv_data) msg->complete(msg->context); } -static int wait_ssp_rx_stall(void *ioaddr) +static int wait_ssp_rx_stall(void const __iomem *ioaddr) { unsigned long limit = loops_per_jiffy << 1; @@ -432,9 +435,9 @@ static int wait_dma_channel_stop(int channel) return limit; } -void dma_error_stop(struct driver_data *drv_data, const char *msg) +static void dma_error_stop(struct driver_data *drv_data, const char *msg) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; /* Stop and reset */ DCSR(drv_data->rx_channel) = RESET_DMA_CHANNEL; @@ -456,7 +459,7 @@ void dma_error_stop(struct driver_data *drv_data, const char *msg) static void dma_transfer_complete(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; struct spi_message *msg = drv_data->cur_msg; /* Clear and disable interrupts on SSP and DMA channels*/ @@ -536,7 +539,7 @@ static void dma_handler(int channel, void *data) static irqreturn_t dma_transfer(struct driver_data *drv_data) { u32 irq_status; - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; irq_status = read_SSSR(reg) & drv_data->mask_sr; if (irq_status & SSSR_ROR) { @@ -570,7 +573,7 @@ static irqreturn_t dma_transfer(struct driver_data *drv_data) static void int_error_stop(struct driver_data *drv_data, const char* msg) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; /* Stop and reset SSP */ write_SSSR(drv_data->clear_sr, reg); @@ -588,7 +591,7 @@ static void int_error_stop(struct driver_data *drv_data, const char* msg) static void int_transfer_complete(struct driver_data *drv_data) { - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; /* Stop SSP */ write_SSSR(drv_data->clear_sr, reg); @@ -614,7 +617,7 @@ static void int_transfer_complete(struct driver_data *drv_data) static irqreturn_t interrupt_transfer(struct driver_data *drv_data) { - 
void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; u32 irq_mask = (read_SSCR1(reg) & SSCR1_TIE) ? drv_data->mask_sr : drv_data->mask_sr & ~SSSR_TFS; @@ -675,7 +678,7 @@ static irqreturn_t interrupt_transfer(struct driver_data *drv_data) static irqreturn_t ssp_int(int irq, void *dev_id) { struct driver_data *drv_data = dev_id; - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; if (!drv_data->cur_msg) { @@ -695,7 +698,8 @@ static irqreturn_t ssp_int(int irq, void *dev_id) return drv_data->transfer_handler(drv_data); } -int set_dma_burst_and_threshold(struct chip_data *chip, struct spi_device *spi, +static int set_dma_burst_and_threshold(struct chip_data *chip, + struct spi_device *spi, u8 bits_per_word, u32 *burst_code, u32 *threshold) { @@ -809,7 +813,7 @@ static void pump_transfers(unsigned long data) struct spi_transfer *previous = NULL; struct chip_data *chip = NULL; struct ssp_device *ssp = drv_data->ssp; - void *reg = drv_data->ioaddr; + void __iomem *reg = drv_data->ioaddr; u32 clk_div = 0; u8 bits = 0; u32 speed = 0; @@ -1338,7 +1342,7 @@ static int __init pxa2xx_spi_probe(struct platform_device *pdev) struct device *dev = &pdev->dev; struct pxa2xx_spi_master *platform_info; struct spi_master *master; - struct driver_data *drv_data = 0; + struct driver_data *drv_data = NULL; struct ssp_device *ssp; int status = 0; -- cgit v1.2.3 From 31a16294261a897ab7f59a5c26e4935a851fd410 Mon Sep 17 00:00:00 2001 From: Randy Dunlap Date: Mon, 28 Apr 2008 02:14:18 -0700 Subject: documentation: move spidev_fdx example to its own source file Move the sample source code to its own source file so that it is easier to use and can be build-tested, checked, and maintained by anyone. (Makefile changes are in a separate patch for all of Documentation/.)
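The spidev_fdx.c sample never performs a true full duplex transfer: do_msg() sends one byte in a tx-only segment and then reads in a separate rx-only segment. As a hedged sketch of the full duplex case that Documentation/spi/spidev describes, the helpers below fill a single struct spi_ioc_transfer with both tx_buf and rx_buf pointing at the same buffer and submit it with SPI_IOC_MESSAGE(1). The names fill_fdx_transfer() and fdx_exchange() are hypothetical (they are not part of the sample), and the sketch assumes the userspace <linux/spi/spidev.h> header is installed.

```c
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/types.h>
#include <linux/spi/spidev.h>

/* Hypothetical helper: describe one full duplex segment. The same buffer
 * is used in both directions, which the spidev documentation allows;
 * received bytes overwrite the bytes that were sent. */
static void fill_fdx_transfer(struct spi_ioc_transfer *xfer,
			      unsigned char *buf, __u32 len)
{
	/* Zeroed speed_hz, bits_per_word, etc. mean "use the device defaults" */
	memset(xfer, 0, sizeof(*xfer));
	xfer->tx_buf = (__u64)(uintptr_t)buf;
	xfer->rx_buf = (__u64)(uintptr_t)buf;
	xfer->len = len;
}

/* Hypothetical wrapper: submit that single segment on an open spidev fd.
 * Returns the ioctl() result: bytes transferred, or -1 with errno set. */
static int fdx_exchange(int fd, unsigned char *buf, __u32 len)
{
	struct spi_ioc_transfer xfer;

	fill_fdx_transfer(&xfer, buf, len);
	return ioctl(fd, SPI_IOC_MESSAGE(1), &xfer);
}
```

Casting through uintptr_t avoids the pointer-to-integer size warning that a direct (__u64) buf cast, as used in the sample, can trigger on 32-bit builds.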
Signed-off-by: Randy Dunlap Acked-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/spi/spidev | 168 +---------------------------------------- Documentation/spi/spidev_fdx.c | 158 ++++++++++++++++++++++++++++++++++++++ 2 files changed, 160 insertions(+), 166 deletions(-) create mode 100644 Documentation/spi/spidev_fdx.c diff --git a/Documentation/spi/spidev b/Documentation/spi/spidev index 5c8e1b988a08..ed2da5e5b28a 100644 --- a/Documentation/spi/spidev +++ b/Documentation/spi/spidev @@ -126,8 +126,8 @@ NOTES: FULL DUPLEX CHARACTER DEVICE API ================================ -See the sample program below for one example showing the use of the full -duplex programming interface. (Although it doesn't perform a full duplex +See the spidev_fdx.c sample program for one example showing the use of the +full duplex programming interface. (Although it doesn't perform a full duplex transfer.) The model is the same as that used in the kernel spi_sync() request; the individual transfers offer the same capabilities as are available to kernel drivers (except that it's not asynchronous). @@ -141,167 +141,3 @@ and bitrate for each transfer segment.) To make a full duplex request, provide both rx_buf and tx_buf for the same transfer. It's even OK if those are the same buffer. 
- - -SAMPLE PROGRAM -============== - --------------------------------- CUT HERE -#include -#include -#include -#include -#include - -#include -#include -#include - -#include -#include - - -static int verbose; - -static void do_read(int fd, int len) -{ - unsigned char buf[32], *bp; - int status; - - /* read at least 2 bytes, no more than 32 */ - if (len < 2) - len = 2; - else if (len > sizeof(buf)) - len = sizeof(buf); - memset(buf, 0, sizeof buf); - - status = read(fd, buf, len); - if (status < 0) { - perror("read"); - return; - } - if (status != len) { - fprintf(stderr, "short read\n"); - return; - } - - printf("read(%2d, %2d): %02x %02x,", len, status, - buf[0], buf[1]); - status -= 2; - bp = buf + 2; - while (status-- > 0) - printf(" %02x", *bp++); - printf("\n"); -} - -static void do_msg(int fd, int len) -{ - struct spi_ioc_transfer xfer[2]; - unsigned char buf[32], *bp; - int status; - - memset(xfer, 0, sizeof xfer); - memset(buf, 0, sizeof buf); - - if (len > sizeof buf) - len = sizeof buf; - - buf[0] = 0xaa; - xfer[0].tx_buf = (__u64) buf; - xfer[0].len = 1; - - xfer[1].rx_buf = (__u64) buf; - xfer[1].len = len; - - status = ioctl(fd, SPI_IOC_MESSAGE(2), xfer); - if (status < 0) { - perror("SPI_IOC_MESSAGE"); - return; - } - - printf("response(%2d, %2d): ", len, status); - for (bp = buf; len; len--) - printf(" %02x", *bp++); - printf("\n"); -} - -static void dumpstat(const char *name, int fd) -{ - __u8 mode, lsb, bits; - __u32 speed; - - if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0) { - perror("SPI rd_mode"); - return; - } - if (ioctl(fd, SPI_IOC_RD_LSB_FIRST, &lsb) < 0) { - perror("SPI rd_lsb_fist"); - return; - } - if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0) { - perror("SPI bits_per_word"); - return; - } - if (ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0) { - perror("SPI max_speed_hz"); - return; - } - - printf("%s: spi mode %d, %d bits %sper word, %d Hz max\n", - name, mode, bits, lsb ? 
"(lsb first) " : "", speed); -} - -int main(int argc, char **argv) -{ - int c; - int readcount = 0; - int msglen = 0; - int fd; - const char *name; - - while ((c = getopt(argc, argv, "hm:r:v")) != EOF) { - switch (c) { - case 'm': - msglen = atoi(optarg); - if (msglen < 0) - goto usage; - continue; - case 'r': - readcount = atoi(optarg); - if (readcount < 0) - goto usage; - continue; - case 'v': - verbose++; - continue; - case 'h': - case '?': -usage: - fprintf(stderr, - "usage: %s [-h] [-m N] [-r N] /dev/spidevB.D\n", - argv[0]); - return 1; - } - } - - if ((optind + 1) != argc) - goto usage; - name = argv[optind]; - - fd = open(name, O_RDWR); - if (fd < 0) { - perror("open"); - return 1; - } - - dumpstat(name, fd); - - if (msglen) - do_msg(fd, msglen); - - if (readcount) - do_read(fd, readcount); - - close(fd); - return 0; -} diff --git a/Documentation/spi/spidev_fdx.c b/Documentation/spi/spidev_fdx.c new file mode 100644 index 000000000000..fc354f760384 --- /dev/null +++ b/Documentation/spi/spidev_fdx.c @@ -0,0 +1,158 @@ +#include +#include +#include +#include +#include + +#include +#include +#include + +#include +#include + + +static int verbose; + +static void do_read(int fd, int len) +{ + unsigned char buf[32], *bp; + int status; + + /* read at least 2 bytes, no more than 32 */ + if (len < 2) + len = 2; + else if (len > sizeof(buf)) + len = sizeof(buf); + memset(buf, 0, sizeof buf); + + status = read(fd, buf, len); + if (status < 0) { + perror("read"); + return; + } + if (status != len) { + fprintf(stderr, "short read\n"); + return; + } + + printf("read(%2d, %2d): %02x %02x,", len, status, + buf[0], buf[1]); + status -= 2; + bp = buf + 2; + while (status-- > 0) + printf(" %02x", *bp++); + printf("\n"); +} + +static void do_msg(int fd, int len) +{ + struct spi_ioc_transfer xfer[2]; + unsigned char buf[32], *bp; + int status; + + memset(xfer, 0, sizeof xfer); + memset(buf, 0, sizeof buf); + + if (len > sizeof buf) + len = sizeof buf; + + buf[0] = 0xaa; + 
xfer[0].tx_buf = (__u64) buf; + xfer[0].len = 1; + + xfer[1].rx_buf = (__u64) buf; + xfer[1].len = len; + + status = ioctl(fd, SPI_IOC_MESSAGE(2), xfer); + if (status < 0) { + perror("SPI_IOC_MESSAGE"); + return; + } + + printf("response(%2d, %2d): ", len, status); + for (bp = buf; len; len--) + printf(" %02x", *bp++); + printf("\n"); +} + +static void dumpstat(const char *name, int fd) +{ + __u8 mode, lsb, bits; + __u32 speed; + + if (ioctl(fd, SPI_IOC_RD_MODE, &mode) < 0) { + perror("SPI rd_mode"); + return; + } + if (ioctl(fd, SPI_IOC_RD_LSB_FIRST, &lsb) < 0) { + perror("SPI rd_lsb_fist"); + return; + } + if (ioctl(fd, SPI_IOC_RD_BITS_PER_WORD, &bits) < 0) { + perror("SPI bits_per_word"); + return; + } + if (ioctl(fd, SPI_IOC_RD_MAX_SPEED_HZ, &speed) < 0) { + perror("SPI max_speed_hz"); + return; + } + + printf("%s: spi mode %d, %d bits %sper word, %d Hz max\n", + name, mode, bits, lsb ? "(lsb first) " : "", speed); +} + +int main(int argc, char **argv) +{ + int c; + int readcount = 0; + int msglen = 0; + int fd; + const char *name; + + while ((c = getopt(argc, argv, "hm:r:v")) != EOF) { + switch (c) { + case 'm': + msglen = atoi(optarg); + if (msglen < 0) + goto usage; + continue; + case 'r': + readcount = atoi(optarg); + if (readcount < 0) + goto usage; + continue; + case 'v': + verbose++; + continue; + case 'h': + case '?': +usage: + fprintf(stderr, + "usage: %s [-h] [-m N] [-r N] /dev/spidevB.D\n", + argv[0]); + return 1; + } + } + + if ((optind + 1) != argc) + goto usage; + name = argv[optind]; + + fd = open(name, O_RDWR); + if (fd < 0) { + perror("open"); + return 1; + } + + dumpstat(name, fd); + + if (msglen) + do_msg(fd, msglen); + + if (readcount) + do_read(fd, readcount); + + close(fd); + return 0; +} -- cgit v1.2.3 From b687d2a8f8d46921ac5e80bf77967688afce68e2 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:19 -0700 Subject: spi: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ 
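C99 defines __func__ as though each function body began with `static const char __func__[] = "function-name";`, so any conforming compiler accepts it. Because it is an array rather than a string literal, it cannot be pasted into an adjacent literal and has to go through a "%s" conversion, which is how the converted call sites above already use it. A minimal sketch, where report_bad_width() is a hypothetical function modeled on the xilinx_spi error message that formats into a buffer instead of calling dev_err():

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical example: __func__ expands to the enclosing function's
 * name ("report_bad_width") and is passed as an ordinary string. */
static int report_bad_width(char *out, size_t n, unsigned int bits_per_word)
{
	return snprintf(out, n, "%s, unsupported bits_per_word=%u",
			__func__, bits_per_word);
}
```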
Signed-off-by: Harvey Harrison Acked-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/spi/omap_uwire.c | 4 ++-- drivers/spi/spi_bitbang.c | 2 +- drivers/spi/spi_mpc83xx.c | 2 +- drivers/spi/spi_s3c24xx.c | 2 +- drivers/spi/xilinx_spi.c | 8 ++++---- 5 files changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/spi/omap_uwire.c b/drivers/spi/omap_uwire.c index 5f00bd6500ef..d9ae111c27ae 100644 --- a/drivers/spi/omap_uwire.c +++ b/drivers/spi/omap_uwire.c @@ -151,7 +151,7 @@ static int wait_uwire_csr_flag(u16 mask, u16 val, int might_not_catch) if (time_after(jiffies, max_jiffies)) { printk(KERN_ERR "%s: timeout. reg=%#06x " "mask=%#06x val=%#06x\n", - __FUNCTION__, w, mask, val); + __func__, w, mask, val); return -1; } c++; @@ -437,7 +437,7 @@ static int uwire_setup_transfer(struct spi_device *spi, struct spi_transfer *t) } omap_uwire_configure_mode(spi->chip_select, flags); pr_debug("%s: uwire flags %02x, armxor %lu KHz, SCK %lu KHz\n", - __FUNCTION__, flags, + __func__, flags, clk_get_rate(uwire->ck) / 1000, rate / 1000); status = 0; diff --git a/drivers/spi/spi_bitbang.c b/drivers/spi/spi_bitbang.c index 71e881419cdd..96cc39ecb6e2 100644 --- a/drivers/spi/spi_bitbang.c +++ b/drivers/spi/spi_bitbang.c @@ -214,7 +214,7 @@ int spi_bitbang_setup(struct spi_device *spi) return retval; dev_dbg(&spi->dev, "%s, mode %d, %u bits/w, %u nsec/bit\n", - __FUNCTION__, spi->mode & (SPI_CPOL | SPI_CPHA), + __func__, spi->mode & (SPI_CPOL | SPI_CPHA), spi->bits_per_word, 2 * cs->nsecs); /* NOTE we _need_ to call chipselect() early, ideally with adapter diff --git a/drivers/spi/spi_mpc83xx.c b/drivers/spi/spi_mpc83xx.c index be15a6213205..189f706b9e4b 100644 --- a/drivers/spi/spi_mpc83xx.c +++ b/drivers/spi/spi_mpc83xx.c @@ -310,7 +310,7 @@ static int mpc83xx_spi_setup(struct spi_device *spi) return retval; dev_dbg(&spi->dev, "%s, mode %d, %u bits/w, %u nsec\n", - __FUNCTION__, spi->mode & (SPI_CPOL | SPI_CPHA), + __func__, spi->mode 
& (SPI_CPOL | SPI_CPHA), spi->bits_per_word, 2 * mpc83xx_spi->nsecs); /* NOTE we _need_ to call chipselect() early, ideally with adapter diff --git a/drivers/spi/spi_s3c24xx.c b/drivers/spi/spi_s3c24xx.c index b7476b888197..34bfb7dd7764 100644 --- a/drivers/spi/spi_s3c24xx.c +++ b/drivers/spi/spi_s3c24xx.c @@ -169,7 +169,7 @@ static int s3c24xx_spi_setup(struct spi_device *spi) } dev_dbg(&spi->dev, "%s: mode %d, %u bpw, %d hz\n", - __FUNCTION__, spi->mode, spi->bits_per_word, + __func__, spi->mode, spi->bits_per_word, spi->max_speed_hz); return 0; diff --git a/drivers/spi/xilinx_spi.c b/drivers/spi/xilinx_spi.c index cf6aef34fe25..113a0468ffcb 100644 --- a/drivers/spi/xilinx_spi.c +++ b/drivers/spi/xilinx_spi.c @@ -151,13 +151,13 @@ static int xilinx_spi_setup_transfer(struct spi_device *spi, hz = (t) ? t->speed_hz : spi->max_speed_hz; if (bits_per_word != 8) { dev_err(&spi->dev, "%s, unsupported bits_per_word=%d\n", - __FUNCTION__, bits_per_word); + __func__, bits_per_word); return -EINVAL; } if (hz && xspi->speed_hz > hz) { dev_err(&spi->dev, "%s, unsupported clock rate %uHz\n", - __FUNCTION__, hz); + __func__, hz); return -EINVAL; } @@ -181,7 +181,7 @@ static int xilinx_spi_setup(struct spi_device *spi) if (spi->mode & ~MODEBITS) { dev_err(&spi->dev, "%s, unsupported mode bits %x\n", - __FUNCTION__, spi->mode & ~MODEBITS); + __func__, spi->mode & ~MODEBITS); return -EINVAL; } @@ -190,7 +190,7 @@ static int xilinx_spi_setup(struct spi_device *spi) return retval; dev_dbg(&spi->dev, "%s, mode %d, %u bits/w, %u nsec/bit\n", - __FUNCTION__, spi->mode & MODEBITS, spi->bits_per_word, 0); + __func__, spi->mode & MODEBITS, spi->bits_per_word, 0); return 0; } -- cgit v1.2.3 From 06719814780da741e7acf587367a86c3965c03a2 Mon Sep 17 00:00:00 2001 From: Atsushi Nemoto Date: Mon, 28 Apr 2008 02:14:19 -0700 Subject: atmel_spi: support zero length transfer A spi transfer with zero length is not invalid. 
For example, such a transfer (len == 0 && delay_usecs != 0) can be used to insert a delay before the first CLK edge after chipselect assertion. Signed-off-by: Atsushi Nemoto Cc: Haavard Skinnemoen Cc: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/spi/atmel_spi.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/spi/atmel_spi.c b/drivers/spi/atmel_spi.c index 1749a27be066..02c8e305b14f 100644 --- a/drivers/spi/atmel_spi.c +++ b/drivers/spi/atmel_spi.c @@ -616,7 +616,7 @@ static int atmel_spi_transfer(struct spi_device *spi, struct spi_message *msg) return -ESHUTDOWN; list_for_each_entry(xfer, &msg->transfers, transfer_list) { - if (!(xfer->tx_buf || xfer->rx_buf)) { + if (!(xfer->tx_buf || xfer->rx_buf) && xfer->len) { dev_dbg(&spi->dev, "missing rx or tx buf\n"); return -EINVAL; } -- cgit v1.2.3 From 5d9f3f6b7c4c9fe1706006f24f964e7c0fa49fb7 Mon Sep 17 00:00:00 2001 From: Andrea Paterniani Date: Mon, 28 Apr 2008 02:14:21 -0700 Subject: spi: spi_imx updates Updates to the i.MX SPI controller driver: 1) Some comments changed and/or added. 2) The end of a transfer is now handled on the TXFIFO empty interrupt that follows the last write to TXFIFO. This speeds up interrupt handling by removing the wait for TXFIFO to become empty. On the TXFIFO empty interrupt, the handler only needs to poll for the end of the ongoing transaction (SPI_CONTROL_XCH) to close the transfer. (2.1) Write-only transfers are closed by flushing RXFIFO. (2.2) Read transfers are closed by reading the trailing bytes from RXFIFO. (2.3) Read transfers where an RXFIFO overrun occurred are closed by flushing RXFIFO and aborting the message. 3) FIFOs are now flushed by disabling the SPI after the end of the ongoing transaction.
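A user-space model of the new flush() sequence may make its ordering easier to follow: poll the control register until XCH clears, then write the control value back with SPIEN cleared and restore it, relying on the controller to discard FIFO contents while it is disabled. The register model, the bit positions and every name below are hypothetical stand-ins for illustration; the real definitions live in spi_imx.c, and the chip-select release that flush() also performs is omitted.

```c
#include <stdint.h>

/* Hypothetical bit positions, for illustration only */
#define CTRL_XCH   (1u << 9)
#define CTRL_SPIEN (1u << 10)

/* Simulated controller: XCH stays set for a few polls, and clearing
 * SPIEN empties the receive FIFO. */
struct fake_spi {
	uint32_t control;
	int busy_polls;     /* polls remaining before XCH clears */
	int rx_fifo_level;  /* words sitting in RXFIFO */
	int spien_toggles;  /* how many times SPIEN was cleared */
};

static uint32_t read_control(struct fake_spi *s)
{
	if (s->busy_polls > 0 && --s->busy_polls == 0)
		s->control &= ~CTRL_XCH;  /* the exchange has finished */
	return s->control;
}

static void write_control(struct fake_spi *s, uint32_t v)
{
	if ((s->control & CTRL_SPIEN) && !(v & CTRL_SPIEN)) {
		s->rx_fifo_level = 0;     /* disabling drops FIFO contents */
		s->spien_toggles++;
	}
	s->control = v;
}

/* Mirrors the patched flush(): wait for the end of the transaction,
 * then clear and restore SPIEN to empty the FIFOs. */
static void flush_fifos(struct fake_spi *s)
{
	uint32_t control;

	do {
		control = read_control(s);
	} while (control & CTRL_XCH);

	write_control(s, control & ~CTRL_SPIEN);
	write_control(s, control);
}
```

Like the patched flush(), this polling loop has no iteration limit; the removed code bounded it with loops_per_jiffy << 1, so this version relies on the hardware always completing the exchange.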
Signed-off-by: Andrea Paterniani Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/spi/spi_imx.c | 223 ++++++++++++++++++++++++++------------------------ 1 file changed, 114 insertions(+), 109 deletions(-) diff --git a/drivers/spi/spi_imx.c b/drivers/spi/spi_imx.c index d4ba640366b6..c730d05bfeb6 100644 --- a/drivers/spi/spi_imx.c +++ b/drivers/spi/spi_imx.c @@ -270,19 +270,26 @@ struct chip_data { static void pump_messages(struct work_struct *work); -static int flush(struct driver_data *drv_data) +static void flush(struct driver_data *drv_data) { - unsigned long limit = loops_per_jiffy << 1; void __iomem *regs = drv_data->regs; - volatile u32 d; + u32 control; dev_dbg(&drv_data->pdev->dev, "flush\n"); + + /* Wait for end of transaction */ do { - while (readl(regs + SPI_INT_STATUS) & SPI_STATUS_RR) - d = readl(regs + SPI_RXDATA); - } while ((readl(regs + SPI_CONTROL) & SPI_CONTROL_XCH) && limit--); + control = readl(regs + SPI_CONTROL); + } while (control & SPI_CONTROL_XCH); + + /* Release chip select if requested, transfer delays are + handled in pump_transfers */ + if (drv_data->cs_change) + drv_data->cs_control(SPI_CS_DEASSERT); - return limit; + /* Disable SPI to flush FIFOs */ + writel(control & ~SPI_CONTROL_SPIEN, regs + SPI_CONTROL); + writel(control, regs + SPI_CONTROL); } static void restore_state(struct driver_data *drv_data) @@ -570,6 +577,7 @@ static void giveback(struct spi_message *message, struct driver_data *drv_data) writel(0, regs + SPI_INT_STATUS); writel(0, regs + SPI_DMA); + /* Unconditioned deselct */ drv_data->cs_control(SPI_CS_DEASSERT); message->state = NULL; @@ -592,13 +600,10 @@ static void dma_err_handler(int channel, void *data, int errcode) /* Disable both rx and tx dma channels */ imx_dma_disable(drv_data->rx_channel); imx_dma_disable(drv_data->tx_channel); - - if (flush(drv_data) == 0) - dev_err(&drv_data->pdev->dev, - "dma_err_handler - flush failed\n"); - 
unmap_dma_buffers(drv_data); + flush(drv_data); + msg->state = ERROR_STATE; tasklet_schedule(&drv_data->pump_transfers); } @@ -612,8 +617,7 @@ static void dma_tx_handler(int channel, void *data) imx_dma_disable(channel); /* Now waits for TX FIFO empty */ - writel(readl(drv_data->regs + SPI_INT_STATUS) | SPI_INTEN_TE, - drv_data->regs + SPI_INT_STATUS); + writel(SPI_INTEN_TE, drv_data->regs + SPI_INT_STATUS); } static irqreturn_t dma_transfer(struct driver_data *drv_data) @@ -621,19 +625,18 @@ static irqreturn_t dma_transfer(struct driver_data *drv_data) u32 status; struct spi_message *msg = drv_data->cur_msg; void __iomem *regs = drv_data->regs; - unsigned long limit; status = readl(regs + SPI_INT_STATUS); - if ((status & SPI_INTEN_RO) && (status & SPI_STATUS_RO)) { + if ((status & (SPI_INTEN_RO | SPI_STATUS_RO)) + == (SPI_INTEN_RO | SPI_STATUS_RO)) { writel(status & ~SPI_INTEN, regs + SPI_INT_STATUS); + imx_dma_disable(drv_data->tx_channel); imx_dma_disable(drv_data->rx_channel); unmap_dma_buffers(drv_data); - if (flush(drv_data) == 0) - dev_err(&drv_data->pdev->dev, - "dma_transfer - flush failed\n"); + flush(drv_data); dev_warn(&drv_data->pdev->dev, "dma_transfer - fifo overun\n"); @@ -649,20 +652,17 @@ static irqreturn_t dma_transfer(struct driver_data *drv_data) if (drv_data->rx) { /* Wait end of transfer before read trailing data */ - limit = loops_per_jiffy << 1; - while ((readl(regs + SPI_CONTROL) & SPI_CONTROL_XCH) && - limit--); - - if (limit == 0) - dev_err(&drv_data->pdev->dev, - "dma_transfer - end of tx failed\n"); - else - dev_dbg(&drv_data->pdev->dev, - "dma_transfer - end of tx\n"); + while (readl(regs + SPI_CONTROL) & SPI_CONTROL_XCH) + cpu_relax(); imx_dma_disable(drv_data->rx_channel); unmap_dma_buffers(drv_data); + /* Release chip select if requested, transfer delays are + handled in pump_transfers() */ + if (drv_data->cs_change) + drv_data->cs_control(SPI_CS_DEASSERT); + /* Calculate number of trailing data and read them */ 
dev_dbg(&drv_data->pdev->dev, "dma_transfer - test = 0x%08X\n", @@ -676,19 +676,12 @@ static irqreturn_t dma_transfer(struct driver_data *drv_data) /* Write only transfer */ unmap_dma_buffers(drv_data); - if (flush(drv_data) == 0) - dev_err(&drv_data->pdev->dev, - "dma_transfer - flush failed\n"); + flush(drv_data); } /* End of transfer, update total byte transfered */ msg->actual_length += drv_data->len; - /* Release chip select if requested, transfer delays are - handled in pump_transfers() */ - if (drv_data->cs_change) - drv_data->cs_control(SPI_CS_DEASSERT); - /* Move to next transfer */ msg->state = next_transfer(drv_data); @@ -711,44 +704,43 @@ static irqreturn_t interrupt_wronly_transfer(struct driver_data *drv_data) status = readl(regs + SPI_INT_STATUS); - while (status & SPI_STATUS_TH) { + if (status & SPI_INTEN_TE) { + /* TXFIFO Empty Interrupt on the last transfered word */ + writel(status & ~SPI_INTEN, regs + SPI_INT_STATUS); dev_dbg(&drv_data->pdev->dev, - "interrupt_wronly_transfer - status = 0x%08X\n", status); + "interrupt_wronly_transfer - end of tx\n"); - /* Pump data */ - if (write(drv_data)) { - writel(readl(regs + SPI_INT_STATUS) & ~SPI_INTEN, - regs + SPI_INT_STATUS); + flush(drv_data); - dev_dbg(&drv_data->pdev->dev, - "interrupt_wronly_transfer - end of tx\n"); + /* Update total byte transfered */ + msg->actual_length += drv_data->len; - if (flush(drv_data) == 0) - dev_err(&drv_data->pdev->dev, - "interrupt_wronly_transfer - " - "flush failed\n"); + /* Move to next transfer */ + msg->state = next_transfer(drv_data); - /* End of transfer, update total byte transfered */ - msg->actual_length += drv_data->len; + /* Schedule transfer tasklet */ + tasklet_schedule(&drv_data->pump_transfers); - /* Release chip select if requested, transfer delays are - handled in pump_transfers */ - if (drv_data->cs_change) - drv_data->cs_control(SPI_CS_DEASSERT); + return IRQ_HANDLED; + } else { + while (status & SPI_STATUS_TH) { + dev_dbg(&drv_data->pdev->dev, + 
"interrupt_wronly_transfer - status = 0x%08X\n", + status); - /* Move to next transfer */ - msg->state = next_transfer(drv_data); + /* Pump data */ + if (write(drv_data)) { + /* End of TXFIFO writes, + now wait until TXFIFO is empty */ + writel(SPI_INTEN_TE, regs + SPI_INT_STATUS); + return IRQ_HANDLED; + } - /* Schedule transfer tasklet */ - tasklet_schedule(&drv_data->pump_transfers); + status = readl(regs + SPI_INT_STATUS); - return IRQ_HANDLED; + /* We did something */ + handled = IRQ_HANDLED; } - - status = readl(regs + SPI_INT_STATUS); - - /* We did something */ - handled = IRQ_HANDLED; } return handled; @@ -758,45 +750,31 @@ static irqreturn_t interrupt_transfer(struct driver_data *drv_data) { struct spi_message *msg = drv_data->cur_msg; void __iomem *regs = drv_data->regs; - u32 status; + u32 status, control; irqreturn_t handled = IRQ_NONE; unsigned long limit; status = readl(regs + SPI_INT_STATUS); - while (status & (SPI_STATUS_TH | SPI_STATUS_RO)) { + if (status & SPI_INTEN_TE) { + /* TXFIFO Empty Interrupt on the last transfered word */ + writel(status & ~SPI_INTEN, regs + SPI_INT_STATUS); dev_dbg(&drv_data->pdev->dev, - "interrupt_transfer - status = 0x%08X\n", status); - - if (status & SPI_STATUS_RO) { - writel(readl(regs + SPI_INT_STATUS) & ~SPI_INTEN, - regs + SPI_INT_STATUS); - - dev_warn(&drv_data->pdev->dev, - "interrupt_transfer - fifo overun\n" - " data not yet written = %d\n" - " data not yet read = %d\n", - data_to_write(drv_data), - data_to_read(drv_data)); - - if (flush(drv_data) == 0) - dev_err(&drv_data->pdev->dev, - "interrupt_transfer - flush failed\n"); - - msg->state = ERROR_STATE; - tasklet_schedule(&drv_data->pump_transfers); + "interrupt_transfer - end of tx\n"); - return IRQ_HANDLED; - } - - /* Pump data */ - read(drv_data); - if (write(drv_data)) { - writel(readl(regs + SPI_INT_STATUS) & ~SPI_INTEN, - regs + SPI_INT_STATUS); + if (msg->state == ERROR_STATE) { + /* RXFIFO overrun was detected and message aborted */ + 
flush(drv_data); + } else { + /* Wait for end of transaction */ + do { + control = readl(regs + SPI_CONTROL); + } while (control & SPI_CONTROL_XCH); - dev_dbg(&drv_data->pdev->dev, - "interrupt_transfer - end of tx\n"); + /* Release chip select if requested, transfer delays are + handled in pump_transfers */ + if (drv_data->cs_change) + drv_data->cs_control(SPI_CS_DEASSERT); /* Read trailing bytes */ limit = loops_per_jiffy << 1; @@ -810,27 +788,54 @@ static irqreturn_t interrupt_transfer(struct driver_data *drv_data) dev_dbg(&drv_data->pdev->dev, "interrupt_transfer - end of rx\n"); - /* End of transfer, update total byte transfered */ + /* Update total byte transfered */ msg->actual_length += drv_data->len; - /* Release chip select if requested, transfer delays are - handled in pump_transfers */ - if (drv_data->cs_change) - drv_data->cs_control(SPI_CS_DEASSERT); - /* Move to next transfer */ msg->state = next_transfer(drv_data); + } - /* Schedule transfer tasklet */ - tasklet_schedule(&drv_data->pump_transfers); + /* Schedule transfer tasklet */ + tasklet_schedule(&drv_data->pump_transfers); - return IRQ_HANDLED; - } + return IRQ_HANDLED; + } else { + while (status & (SPI_STATUS_TH | SPI_STATUS_RO)) { + dev_dbg(&drv_data->pdev->dev, + "interrupt_transfer - status = 0x%08X\n", + status); + + if (status & SPI_STATUS_RO) { + /* RXFIFO overrun, abort message end wait + until TXFIFO is empty */ + writel(SPI_INTEN_TE, regs + SPI_INT_STATUS); + + dev_warn(&drv_data->pdev->dev, + "interrupt_transfer - fifo overun\n" + " data not yet written = %d\n" + " data not yet read = %d\n", + data_to_write(drv_data), + data_to_read(drv_data)); + + msg->state = ERROR_STATE; + + return IRQ_HANDLED; + } - status = readl(regs + SPI_INT_STATUS); + /* Pump data */ + read(drv_data); + if (write(drv_data)) { + /* End of TXFIFO writes, + now wait until TXFIFO is empty */ + writel(SPI_INTEN_TE, regs + SPI_INT_STATUS); + return IRQ_HANDLED; + } - /* We did something */ - handled = IRQ_HANDLED; 
+ status = readl(regs + SPI_INT_STATUS); + + /* We did something */ + handled = IRQ_HANDLED; + } } return handled; -- cgit v1.2.3 From 61711f8fd8180e458cfb6846bcf4fc636a95f3db Mon Sep 17 00:00:00 2001 From: Magnus Damm Date: Mon, 28 Apr 2008 02:14:22 -0700 Subject: sm501: add uart support This patch extends the sm501 mfd with 8250 uart support. We're currently doing this in the board-specific r2d-1 code, but it would be nicer to move it into the mfd since it's more chip specific than board specific. Signed-off-by: Magnus Damm Cc: Ben Dooks Cc: Paul Mundt Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/mfd/sm501.c | 84 ++++++++++++++++++++++++++++++++++++--------- include/linux/serial_8250.h | 1 + 2 files changed, 69 insertions(+), 16 deletions(-) diff --git a/drivers/mfd/sm501.c b/drivers/mfd/sm501.c index 13bac53db69a..6e655b4c6682 100644 --- a/drivers/mfd/sm501.c +++ b/drivers/mfd/sm501.c @@ -22,6 +22,7 @@ #include #include +#include #include @@ -723,13 +724,14 @@ static void sm501_device_release(struct device *dev) */ static struct platform_device * -sm501_create_subdev(struct sm501_devdata *sm, - char *name, unsigned int res_count) +sm501_create_subdev(struct sm501_devdata *sm, char *name, + unsigned int res_count, unsigned int platform_data_size) { struct sm501_device *smdev; smdev = kzalloc(sizeof(struct sm501_device) + - sizeof(struct resource) * res_count, GFP_KERNEL); + (sizeof(struct resource) * res_count) + + platform_data_size, GFP_KERNEL); if (!smdev) return NULL; @@ -737,11 +739,15 @@ sm501_create_subdev(struct sm501_devdata *sm, smdev->pdev.name = name; smdev->pdev.id = sm->pdev_id; - smdev->pdev.resource = (struct resource *)(smdev+1); - smdev->pdev.num_resources = res_count; - smdev->pdev.dev.parent = sm->dev; + if (res_count) { + smdev->pdev.resource = (struct resource *)(smdev+1); + smdev->pdev.num_resources = res_count; + } + if (platform_data_size) + smdev->pdev.dev.platform_data = (void *)(smdev+1); +
return &smdev->pdev; } @@ -829,7 +835,7 @@ static int sm501_register_usbhost(struct sm501_devdata *sm, { struct platform_device *pdev; - pdev = sm501_create_subdev(sm, "sm501-usb", 3); + pdev = sm501_create_subdev(sm, "sm501-usb", 3, 0); if (!pdev) return -ENOMEM; @@ -840,12 +846,55 @@ static int sm501_register_usbhost(struct sm501_devdata *sm, return sm501_register_device(sm, pdev); } +static void sm501_setup_uart_data(struct sm501_devdata *sm, + struct plat_serial8250_port *uart_data, + unsigned int offset) +{ + uart_data->membase = sm->regs + offset; + uart_data->mapbase = sm->io_res->start + offset; + uart_data->iotype = UPIO_MEM; + uart_data->irq = sm->irq; + uart_data->flags = UPF_BOOT_AUTOCONF | UPF_SKIP_TEST | UPF_SHARE_IRQ; + uart_data->regshift = 2; + uart_data->uartclk = (9600 * 16); +} + +static int sm501_register_uart(struct sm501_devdata *sm, int devices) +{ + struct platform_device *pdev; + struct plat_serial8250_port *uart_data; + + pdev = sm501_create_subdev(sm, "serial8250", 0, + sizeof(struct plat_serial8250_port) * 3); + if (!pdev) + return -ENOMEM; + + uart_data = pdev->dev.platform_data; + + if (devices & SM501_USE_UART0) { + sm501_setup_uart_data(sm, uart_data++, 0x30000); + sm501_unit_power(sm->dev, SM501_GATE_UART0, 1); + sm501_modify_reg(sm->dev, SM501_IRQ_MASK, 1 << 12, 0); + sm501_modify_reg(sm->dev, SM501_GPIO63_32_CONTROL, 0x01e0, 0); + } + if (devices & SM501_USE_UART1) { + sm501_setup_uart_data(sm, uart_data++, 0x30020); + sm501_unit_power(sm->dev, SM501_GATE_UART1, 1); + sm501_modify_reg(sm->dev, SM501_IRQ_MASK, 1 << 13, 0); + sm501_modify_reg(sm->dev, SM501_GPIO63_32_CONTROL, 0x1e00, 0); + } + + pdev->id = PLAT8250_DEV_SM501; + + return sm501_register_device(sm, pdev); +} + static int sm501_register_display(struct sm501_devdata *sm, resource_size_t *mem_avail) { struct platform_device *pdev; - pdev = sm501_create_subdev(sm, "sm501-fb", 4); + pdev = sm501_create_subdev(sm, "sm501-fb", 4, 0); if (!pdev) return -ENOMEM; @@ -963,6 
+1012,7 @@ static unsigned int sm501_mem_local[] = { static int sm501_init_dev(struct sm501_devdata *sm) { + struct sm501_initdata *idata; resource_size_t mem_avail; unsigned long dramctrl; unsigned long devid; @@ -980,6 +1030,9 @@ static int sm501_init_dev(struct sm501_devdata *sm) return -EINVAL; } + /* disable irqs */ + writel(0, sm->regs + SM501_IRQ_MASK); + dramctrl = readl(sm->regs + SM501_DRAM_CONTROL); mem_avail = sm501_mem_local[(dramctrl >> 13) & 0x7]; @@ -998,15 +1051,14 @@ static int sm501_init_dev(struct sm501_devdata *sm) /* check to see if we have some device initialisation */ - if (sm->platdata) { - struct sm501_platdata *pdata = sm->platdata; + idata = sm->platdata ? sm->platdata->init : NULL; + if (idata) { + sm501_init_regs(sm, idata); - if (pdata->init) { - sm501_init_regs(sm, sm->platdata->init); - - if (pdata->init->devices & SM501_USE_USB_HOST) - sm501_register_usbhost(sm, &mem_avail); - } + if (idata->devices & SM501_USE_USB_HOST) + sm501_register_usbhost(sm, &mem_avail); + if (idata->devices & (SM501_USE_UART0 | SM501_USE_UART1)) + sm501_register_uart(sm, idata->devices); } ret = sm501_check_clocks(sm); diff --git a/include/linux/serial_8250.h b/include/linux/serial_8250.h index 00b65c0a82ca..3d37c94abbc8 100644 --- a/include/linux/serial_8250.h +++ b/include/linux/serial_8250.h @@ -46,6 +46,7 @@ enum { PLAT8250_DEV_HUB6, PLAT8250_DEV_MCA, PLAT8250_DEV_AU1X00, + PLAT8250_DEV_SM501, }; /* -- cgit v1.2.3 From f7440b0ecdeb3a04d07c546d02d29700d2a574b7 Mon Sep 17 00:00:00 2001 From: "Robert P. J. Day" Date: Mon, 28 Apr 2008 02:14:24 -0700 Subject: mfd: use shorter set_current_state() Since this routine declares a separate "tsk" pointer for no other reason than to call set_task_state(), get rid of it and just invoke set_current_state(). Signed-off-by: Robert P. J. 
Day Cc: Russell King Cc: Dmitry Torokhov Cc: Nicolas Pitre Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/mfd/ucb1x00-ts.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-) diff --git a/drivers/mfd/ucb1x00-ts.c b/drivers/mfd/ucb1x00-ts.c index 5e859486eaf8..ad34e2d22524 100644 --- a/drivers/mfd/ucb1x00-ts.c +++ b/drivers/mfd/ucb1x00-ts.c @@ -204,8 +204,7 @@ static inline int ucb1x00_ts_pen_down(struct ucb1x00_ts *ts) static int ucb1x00_thread(void *_ts) { struct ucb1x00_ts *ts = _ts; - struct task_struct *tsk = current; - DECLARE_WAITQUEUE(wait, tsk); + DECLARE_WAITQUEUE(wait, current); int valid = 0; set_freezable(); @@ -234,7 +233,7 @@ static int ucb1x00_thread(void *_ts) if (ucb1x00_ts_pen_down(ts)) { - set_task_state(tsk, TASK_INTERRUPTIBLE); + set_current_state(TASK_INTERRUPTIBLE); ucb1x00_enable_irq(ts->ucb, UCB_IRQ_TSPX, machine_is_collie() ? UCB_RISING : UCB_FALLING); ucb1x00_disable(ts->ucb); @@ -262,7 +261,7 @@ static int ucb1x00_thread(void *_ts) valid = 1; } - set_task_state(tsk, TASK_INTERRUPTIBLE); + set_current_state(TASK_INTERRUPTIBLE); timeout = HZ / 100; } -- cgit v1.2.3 From 0341a4d0fdd2a0a3d9e2bb3a9afef9f8292c8502 Mon Sep 17 00:00:00 2001 From: Karl Dahlke Date: Mon, 28 Apr 2008 02:14:25 -0700 Subject: VT notifier extension for accessibility Some accessibility modules need to be able to catch the output on the console before the VT interpretation, and possibly swallow it. 
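The hook added to do_con_write() can be modeled outside the kernel as below: every translated character is offered to the VT notifier chain with action VT_PREWRITE, and if a callback returns NOTIFY_STOP the character is skipped. This is a hedged user-space sketch; the constants mirror <linux/notifier.h>, while vt_param, call_chain(), swallow_bell() and con_write() are hypothetical simplifications (real callbacks take a struct notifier_block * and are registered with register_vt_notifier()).

```c
#include <stddef.h>

/* Local mirrors of the <linux/notifier.h> values used by the patch */
#define NOTIFY_DONE      0x0000
#define NOTIFY_OK        0x0001
#define NOTIFY_STOP_MASK 0x8000
#define NOTIFY_STOP      (NOTIFY_OK | NOTIFY_STOP_MASK)
#define VT_PREWRITE      0x0005

struct vt_param { unsigned int c; };  /* stands in for vt_notifier_param */

typedef int (*notifier_fn)(unsigned long action, void *data);

/* Call each callback in order until one sets NOTIFY_STOP_MASK,
 * returning the last result. */
static int call_chain(notifier_fn *chain, size_t n,
		      unsigned long action, void *data)
{
	int ret = NOTIFY_DONE;

	for (size_t i = 0; i < n; i++) {
		ret = chain[i](action, data);
		if (ret & NOTIFY_STOP_MASK)
			break;
	}
	return ret;
}

/* Hypothetical accessibility hook: swallow BEL (0x07) before the VT sees it */
static int swallow_bell(unsigned long action, void *data)
{
	struct vt_param *p = data;

	if (action == VT_PREWRITE && p->c == 0x07)
		return NOTIFY_STOP;
	return NOTIFY_DONE;
}

/* Model of the do_con_write() loop: keep only the characters that no
 * notifier stopped. Returns how many characters reached "screen". */
static size_t con_write(const unsigned char *buf, size_t count,
			notifier_fn *chain, size_t n, unsigned char *screen)
{
	struct vt_param param;
	size_t kept = 0;

	for (size_t i = 0; i < count; i++) {
		param.c = buf[i];
		if (call_chain(chain, n, VT_PREWRITE, &param) == NOTIFY_STOP)
			continue;  /* swallowed, like the patch's continue */
		screen[kept++] = buf[i];
	}
	return kept;
}
```

In the real patch the value passed in param.c is the translated character tc, not the raw input byte, so a hook sees output after character-set translation.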
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Samuel Thibault Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/char/vt.c | 8 ++++++++ include/linux/notifier.h | 1 + 2 files changed, 9 insertions(+) diff --git a/drivers/char/vt.c b/drivers/char/vt.c index 9b58b894f823..df4c3ead9e2b 100644 --- a/drivers/char/vt.c +++ b/drivers/char/vt.c @@ -2054,6 +2054,7 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co unsigned long draw_from = 0, draw_to = 0; struct vc_data *vc; unsigned char vc_attr; + struct vt_notifier_param param; uint8_t rescan; uint8_t inverse; uint8_t width; @@ -2113,6 +2114,8 @@ static int do_con_write(struct tty_struct *tty, const unsigned char *buf, int co if (IS_FG(vc)) hide_cursor(vc); + param.vc = vc; + while (!tty->stopped && count) { int orig = *buf; c = orig; @@ -2201,6 +2204,11 @@ rescan_last_byte: tc = vc->vc_translate[vc->vc_toggle_meta ? (c | 0x80) : c]; } + param.c = tc; + if (atomic_notifier_call_chain(&vt_notifier_list, VT_PREWRITE, + ¶m) == NOTIFY_STOP) + continue; + /* If the original code was a control character we * only allow a glyph to be displayed if the code is * not normally used (such as for cursor movement) or diff --git a/include/linux/notifier.h b/include/linux/notifier.h index f4df40038f0c..20dfed590183 100644 --- a/include/linux/notifier.h +++ b/include/linux/notifier.h @@ -247,6 +247,7 @@ extern struct blocking_notifier_head reboot_notifier_list; #define VT_DEALLOCATE 0x0002 /* Console will be deallocated */ #define VT_WRITE 0x0003 /* A char got output */ #define VT_UPDATE 0x0004 /* A bigger update occurred */ +#define VT_PREWRITE 0x0005 /* A char is about to be written to the console */ #endif /* __KERNEL__ */ #endif /* _LINUX_NOTIFIER_H */ -- cgit v1.2.3 From 3d8d996e0ca5b4093203d3f050b0f70b5c949ae8 Mon Sep 17 00:00:00 2001 From: Srinivasa Ds Date: Mon, 28 Apr 2008 02:14:26 -0700 Subject: kprobes: prevent probing of preempt_schedule() Prohibit users 
from probing preempt_schedule(). One way of prohibiting the user from probing functions is to mark such functions with __kprobes. But this method doesn't work for functions that are already placed in a different section, like preempt_schedule() (which belongs to the __sched section). So we use a blacklist approach to refuse user attempts to probe these functions. In the blacklist approach, we populate the blacklisted function's starting address and its size in the kprobe_blacklist structure. Then we verify the user-specified address against the start and end of the blacklisted function, so any attempt to register a probe on a blacklisted function is rejected. [akpm@linux-foundation.org: build fix] [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Srinivasa DS Signed-off-by: Ananth N Mavinakayanahalli Signed-off-by: Jim Keniston Cc: Dave Hansen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kprobes.h | 7 +++++++ kernel/kprobes.c | 52 +++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 59 insertions(+) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 0f28486f6360..cd507ab4fed7 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -173,6 +173,13 @@ struct kretprobe_blackpoint { const char *name; void *addr; }; + +struct kprobe_blackpoint { + const char *name; + unsigned long start_addr; + unsigned long range; +}; + extern struct kretprobe_blackpoint kretprobe_blacklist[]; static inline void kretprobe_assert(struct kretprobe_instance *ri, diff --git a/kernel/kprobes.c b/kernel/kprobes.c index fcfb580c3afc..f02a4311768b 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -72,6 +72,18 @@ DEFINE_MUTEX(kprobe_mutex); /* Protects kprobe_table */ DEFINE_SPINLOCK(kretprobe_lock); /* Protects kretprobe_inst_table */ static DEFINE_PER_CPU(struct kprobe *, kprobe_instance) = NULL; +/* + * Normally, functions that we'd want to prohibit kprobes in, are marked + * __kprobes.
But, there are cases where such functions already belong to + * a different section (__sched for preempt_schedule) + * + * For such cases, we now have a blacklist + */ +struct kprobe_blackpoint kprobe_blacklist[] = { + {"preempt_schedule",}, + {NULL} /* Terminator */ +}; + #ifdef __ARCH_WANT_KPROBES_INSN_SLOT /* * kprobe->ainsn.insn points to the copy of the instruction to be @@ -492,9 +504,22 @@ static int __kprobes register_aggr_kprobe(struct kprobe *old_p, static int __kprobes in_kprobes_functions(unsigned long addr) { + struct kprobe_blackpoint *kb; + if (addr >= (unsigned long)__kprobes_text_start && addr < (unsigned long)__kprobes_text_end) return -EINVAL; + /* + * If there exists a kprobe_blacklist, verify and + * fail any probe registration in the prohibited area + */ + for (kb = kprobe_blacklist; kb->name != NULL; kb++) { + if (kb->start_addr) { + if (addr >= kb->start_addr && + addr < (kb->start_addr + kb->range)) + return -EINVAL; + } + } return 0; } @@ -811,6 +836,11 @@ void __kprobes unregister_kretprobe(struct kretprobe *rp) static int __init init_kprobes(void) { int i, err = 0; + unsigned long offset = 0, size = 0; + char *modname, namebuf[128]; + const char *symbol_name; + void *addr; + struct kprobe_blackpoint *kb; /* FIXME allocate the probe table, currently defined statically */ /* initialize all list heads */ @@ -819,6 +849,28 @@ static int __init init_kprobes(void) INIT_HLIST_HEAD(&kretprobe_inst_table[i]); } + /* + * Lookup and populate the kprobe_blacklist. + * + * Unlike the kretprobe blacklist, we'll need to determine + * the range of addresses that belong to the said functions, + * since a kprobe need not necessarily be at the beginning + * of a function. 
+ */ + for (kb = kprobe_blacklist; kb->name != NULL; kb++) { + kprobe_lookup_name(kb->name, addr); + if (!addr) + continue; + + kb->start_addr = (unsigned long)addr; + symbol_name = kallsyms_lookup(kb->start_addr, + &size, &offset, &modname, namebuf); + if (!symbol_name) + kb->range = 0; + else + kb->range = size; + } + if (kretprobe_blacklist_size) { /* lookup the function address from its name */ for (i = 0; kretprobe_blacklist[i].name != NULL; i++) { -- cgit v1.2.3 From 99602572812442d47403d85f376ad51298dd82a6 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 28 Apr 2008 02:14:27 -0700 Subject: list.h: add list_is_singular() Add list_is_singular() to check whether a list has just one entry. list_is_singular() is useful for checking whether a list_head that has been temporarily allocated for listing objects can be released. Signed-off-by: Masami Hiramatsu Cc: Peter Zijlstra Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/list.h | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/include/linux/list.h b/include/linux/list.h index dac16f99c701..b4a939b6b625 100644 --- a/include/linux/list.h +++ b/include/linux/list.h @@ -319,6 +319,15 @@ static inline int list_empty_careful(const struct list_head *head) return (next == head) && (next == head->prev); } +/** + * list_is_singular - tests whether a list has just one entry. + * @head: the list to test. + */ +static inline int list_is_singular(const struct list_head *head) +{ + return !list_empty(head) && (head->next == head->prev); +} + static inline void __list_splice(struct list_head *list, struct list_head *head) { -- cgit v1.2.3 From 9861668f747895608cea425f8457989d8dd2edf2 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 28 Apr 2008 02:14:28 -0700 Subject: kprobes: add (un)register_kprobes for batch registration Introduce unregister_/register_kprobes() for kprobe batch registration.
This can reduce waiting time for synchronize_sched() when a lot of probes have to be unregistered at once. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Prasanna S Panchamukhi Cc: Shaohua Li Cc: David Miller Cc: "Frank Ch. Eigler" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kprobes.h | 9 ++++ kernel/kprobes.c | 124 +++++++++++++++++++++++++++++++++--------------- 2 files changed, 96 insertions(+), 37 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index cd507ab4fed7..2ba7df645a84 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -234,6 +234,8 @@ static inline struct kprobe_ctlblk *get_kprobe_ctlblk(void) int register_kprobe(struct kprobe *p); void unregister_kprobe(struct kprobe *p); +int register_kprobes(struct kprobe **kps, int num); +void unregister_kprobes(struct kprobe **kps, int num); int setjmp_pre_handler(struct kprobe *, struct pt_regs *); int longjmp_break_handler(struct kprobe *, struct pt_regs *); int register_jprobe(struct jprobe *p); @@ -261,9 +263,16 @@ static inline int register_kprobe(struct kprobe *p) { return -ENOSYS; } +static inline int register_kprobes(struct kprobe **kps, int num) +{ + return -ENOSYS; +} static inline void unregister_kprobe(struct kprobe *p) { } +static inline void unregister_kprobes(struct kprobe **kps, int num) +{ +} static inline int register_jprobe(struct jprobe *p) { return -ENOSYS; diff --git a/kernel/kprobes.c b/kernel/kprobes.c index f02a4311768b..76275fc025a5 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -580,6 +580,7 @@ static int __kprobes __register_kprobe(struct kprobe *p, } p->nmissed = 0; + INIT_LIST_HEAD(&p->list); mutex_lock(&kprobe_mutex); old_p = get_kprobe(p->addr); if (old_p) { @@ -606,35 +607,28 @@ out: return ret; } -int __kprobes register_kprobe(struct kprobe *p) -{ - return __register_kprobe(p, (unsigned long)__builtin_return_address(0)); -} - -void __kprobes
unregister_kprobe(struct kprobe *p) +/* + * Unregister a kprobe without a scheduler synchronization. + */ +static int __kprobes __unregister_kprobe_top(struct kprobe *p) { - struct module *mod; struct kprobe *old_p, *list_p; - int cleanup_p; - mutex_lock(&kprobe_mutex); old_p = get_kprobe(p->addr); - if (unlikely(!old_p)) { - mutex_unlock(&kprobe_mutex); - return; - } + if (unlikely(!old_p)) + return -EINVAL; + if (p != old_p) { list_for_each_entry_rcu(list_p, &old_p->list, list) if (list_p == p) /* kprobe p is a valid probe */ goto valid_p; - mutex_unlock(&kprobe_mutex); - return; + return -EINVAL; } valid_p: if (old_p == p || (old_p->pre_handler == aggr_pre_handler && - p->list.next == &old_p->list && p->list.prev == &old_p->list)) { + list_is_singular(&old_p->list))) { /* * Only probe on the hash list. Disarm only if kprobes are * enabled - otherwise, the breakpoint would already have @@ -643,43 +637,97 @@ valid_p: if (kprobe_enabled) arch_disarm_kprobe(p); hlist_del_rcu(&old_p->hlist); - cleanup_p = 1; } else { + if (p->break_handler) + old_p->break_handler = NULL; + if (p->post_handler) { + list_for_each_entry_rcu(list_p, &old_p->list, list) { + if ((list_p != p) && (list_p->post_handler)) + goto noclean; + } + old_p->post_handler = NULL; + } +noclean: list_del_rcu(&p->list); - cleanup_p = 0; } + return 0; +} - mutex_unlock(&kprobe_mutex); +static void __kprobes __unregister_kprobe_bottom(struct kprobe *p) +{ + struct module *mod; + struct kprobe *old_p; - synchronize_sched(); if (p->mod_refcounted) { mod = module_text_address((unsigned long)p->addr); if (mod) module_put(mod); } - if (cleanup_p) { - if (p != old_p) { - list_del_rcu(&p->list); + if (list_empty(&p->list) || list_is_singular(&p->list)) { + if (!list_empty(&p->list)) { + /* "p" is the last child of an aggr_kprobe */ + old_p = list_entry(p->list.next, struct kprobe, list); + list_del(&p->list); kfree(old_p); } arch_remove_kprobe(p); - } else { - mutex_lock(&kprobe_mutex); - if (p->break_handler) - 
old_p->break_handler = NULL; - if (p->post_handler){ - list_for_each_entry_rcu(list_p, &old_p->list, list){ - if (list_p->post_handler){ - cleanup_p = 2; - break; - } - } - if (cleanup_p == 0) - old_p->post_handler = NULL; + } +} + +static int __register_kprobes(struct kprobe **kps, int num, + unsigned long called_from) +{ + int i, ret = 0; + + if (num <= 0) + return -EINVAL; + for (i = 0; i < num; i++) { + ret = __register_kprobe(kps[i], called_from); + if (ret < 0 && i > 0) { + unregister_kprobes(kps, i); + break; } - mutex_unlock(&kprobe_mutex); } + return ret; +} + +/* + * Registration and unregistration functions for kprobe. + */ +int __kprobes register_kprobe(struct kprobe *p) +{ + return __register_kprobes(&p, 1, + (unsigned long)__builtin_return_address(0)); +} + +void __kprobes unregister_kprobe(struct kprobe *p) +{ + unregister_kprobes(&p, 1); +} + +int __kprobes register_kprobes(struct kprobe **kps, int num) +{ + return __register_kprobes(kps, num, + (unsigned long)__builtin_return_address(0)); +} + +void __kprobes unregister_kprobes(struct kprobe **kps, int num) +{ + int i; + + if (num <= 0) + return; + mutex_lock(&kprobe_mutex); + for (i = 0; i < num; i++) + if (__unregister_kprobe_top(kps[i]) < 0) + kps[i]->addr = NULL; + mutex_unlock(&kprobe_mutex); + + synchronize_sched(); + for (i = 0; i < num; i++) + if (kps[i]->addr) + __unregister_kprobe_bottom(kps[i]); } static struct notifier_block kprobe_exceptions_nb = { @@ -1118,6 +1166,8 @@ module_init(init_kprobes); EXPORT_SYMBOL_GPL(register_kprobe); EXPORT_SYMBOL_GPL(unregister_kprobe); +EXPORT_SYMBOL_GPL(register_kprobes); +EXPORT_SYMBOL_GPL(unregister_kprobes); EXPORT_SYMBOL_GPL(register_jprobe); EXPORT_SYMBOL_GPL(unregister_jprobe); #ifdef CONFIG_KPROBES -- cgit v1.2.3 From 4a296e07c3a410c09b9155da4c2fa84a07964f38 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 28 Apr 2008 02:14:29 -0700 Subject: kprobes: add (un)register_kretprobes for batch registration Introduce 
unregister_/register_kretprobes() for kretprobe batch registration. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Prasanna S Panchamukhi Cc: Shaohua Li Cc: David Miller Cc: "Frank Ch. Eigler" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kprobes.h | 9 ++++ kernel/kprobes.c | 108 +++++++++++++++++++++++++++++++++++++++--------- 2 files changed, 97 insertions(+), 20 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 2ba7df645a84..94c855a236ae 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -245,6 +245,8 @@ unsigned long arch_deref_entry_point(void *); int register_kretprobe(struct kretprobe *rp); void unregister_kretprobe(struct kretprobe *rp); +int register_kretprobes(struct kretprobe **rps, int num); +void unregister_kretprobes(struct kretprobe **rps, int num); void kprobe_flush_task(struct task_struct *tk); void recycle_rp_inst(struct kretprobe_instance *ri, struct hlist_head *head); @@ -287,9 +289,16 @@ static inline int register_kretprobe(struct kretprobe *rp) { return -ENOSYS; } +static inline int register_kretprobes(struct kretprobe **rps, int num) +{ + return -ENOSYS; +} static inline void unregister_kretprobe(struct kretprobe *rp) { } +static inline void unregister_kretprobes(struct kretprobe **rps, int num) +{ +} static inline void kprobe_flush_task(struct task_struct *tk) { } diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 76275fc025a5..5e3144ad9b64 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -429,6 +429,21 @@ static inline void free_rp_inst(struct kretprobe *rp) } } +static void __kprobes cleanup_rp_inst(struct kretprobe *rp) +{ + unsigned long flags; + struct kretprobe_instance *ri; + struct hlist_node *pos, *next; + /* No race here */ + spin_lock_irqsave(&kretprobe_lock, flags); + hlist_for_each_entry_safe(ri, pos, next, &rp->used_instances, uflist) { + ri->rp = NULL; + hlist_del(&ri->uflist); + } + 
spin_unlock_irqrestore(&kretprobe_lock, flags); + free_rp_inst(rp); +} + /* * Keep all fields in the kprobe consistent */ @@ -798,7 +813,8 @@ static int __kprobes pre_handler_kretprobe(struct kprobe *p, return 0; } -int __kprobes register_kretprobe(struct kretprobe *rp) +static int __kprobes __register_kretprobe(struct kretprobe *rp, + unsigned long called_from) { int ret = 0; struct kretprobe_instance *inst; @@ -844,43 +860,93 @@ int __kprobes register_kretprobe(struct kretprobe *rp) rp->nmissed = 0; /* Establish function entry probe point */ - if ((ret = __register_kprobe(&rp->kp, - (unsigned long)__builtin_return_address(0))) != 0) + ret = __register_kprobe(&rp->kp, called_from); + if (ret != 0) free_rp_inst(rp); return ret; } +static int __register_kretprobes(struct kretprobe **rps, int num, + unsigned long called_from) +{ + int ret = 0, i; + + if (num <= 0) + return -EINVAL; + for (i = 0; i < num; i++) { + ret = __register_kretprobe(rps[i], called_from); + if (ret < 0 && i > 0) { + unregister_kretprobes(rps, i); + break; + } + } + return ret; +} + +int __kprobes register_kretprobe(struct kretprobe *rp) +{ + return __register_kretprobes(&rp, 1, + (unsigned long)__builtin_return_address(0)); +} + +void __kprobes unregister_kretprobe(struct kretprobe *rp) +{ + unregister_kretprobes(&rp, 1); +} + +int __kprobes register_kretprobes(struct kretprobe **rps, int num) +{ + return __register_kretprobes(rps, num, + (unsigned long)__builtin_return_address(0)); +} + +void __kprobes unregister_kretprobes(struct kretprobe **rps, int num) +{ + int i; + + if (num <= 0) + return; + mutex_lock(&kprobe_mutex); + for (i = 0; i < num; i++) + if (__unregister_kprobe_top(&rps[i]->kp) < 0) + rps[i]->kp.addr = NULL; + mutex_unlock(&kprobe_mutex); + + synchronize_sched(); + for (i = 0; i < num; i++) { + if (rps[i]->kp.addr) { + __unregister_kprobe_bottom(&rps[i]->kp); + cleanup_rp_inst(rps[i]); + } + } +} + #else /* CONFIG_KRETPROBES */ int __kprobes register_kretprobe(struct kretprobe 
*rp) { return -ENOSYS; } -static int __kprobes pre_handler_kretprobe(struct kprobe *p, - struct pt_regs *regs) +int __kprobes register_kretprobes(struct kretprobe **rps, int num) { - return 0; + return -ENOSYS; } -#endif /* CONFIG_KRETPROBES */ - void __kprobes unregister_kretprobe(struct kretprobe *rp) { - unsigned long flags; - struct kretprobe_instance *ri; - struct hlist_node *pos, *next; +} - unregister_kprobe(&rp->kp); +void __kprobes unregister_kretprobes(struct kretprobe **rps, int num) +{ +} - /* No race here */ - spin_lock_irqsave(&kretprobe_lock, flags); - hlist_for_each_entry_safe(ri, pos, next, &rp->used_instances, uflist) { - ri->rp = NULL; - hlist_del(&ri->uflist); - } - spin_unlock_irqrestore(&kretprobe_lock, flags); - free_rp_inst(rp); +static int __kprobes pre_handler_kretprobe(struct kprobe *p, + struct pt_regs *regs) +{ + return 0; } +#endif /* CONFIG_KRETPROBES */ + static int __init init_kprobes(void) { int i, err = 0; @@ -1177,4 +1243,6 @@ EXPORT_SYMBOL_GPL(jprobe_return); #ifdef CONFIG_KPROBES EXPORT_SYMBOL_GPL(register_kretprobe); EXPORT_SYMBOL_GPL(unregister_kretprobe); +EXPORT_SYMBOL_GPL(register_kretprobes); +EXPORT_SYMBOL_GPL(unregister_kretprobes); #endif -- cgit v1.2.3 From 26b31c1908e02a316edfba08080373342e662c14 Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 28 Apr 2008 02:14:29 -0700 Subject: kprobes: add (un)register_jprobes for batch registration Introduce unregister_/register_jprobes() for jprobe batch registration. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Prasanna S Panchamukhi Cc: Shaohua Li Cc: David Miller Cc: "Frank Ch. 
Eigler" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/kprobes.h | 9 +++++++ kernel/kprobes.c | 65 ++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 65 insertions(+), 9 deletions(-) diff --git a/include/linux/kprobes.h b/include/linux/kprobes.h index 94c855a236ae..1036631ff4fa 100644 --- a/include/linux/kprobes.h +++ b/include/linux/kprobes.h @@ -240,6 +240,8 @@ int setjmp_pre_handler(struct kprobe *, struct pt_regs *); int longjmp_break_handler(struct kprobe *, struct pt_regs *); int register_jprobe(struct jprobe *p); void unregister_jprobe(struct jprobe *p); +int register_jprobes(struct jprobe **jps, int num); +void unregister_jprobes(struct jprobe **jps, int num); void jprobe_return(void); unsigned long arch_deref_entry_point(void *); @@ -279,9 +281,16 @@ static inline int register_jprobe(struct jprobe *p) { return -ENOSYS; } +static inline int register_jprobes(struct jprobe **jps, int num) +{ + return -ENOSYS; +} static inline void unregister_jprobe(struct jprobe *p) { } +static inline void unregister_jprobes(struct jprobe **jps, int num) +{ +} static inline void jprobe_return(void) { } diff --git a/kernel/kprobes.c b/kernel/kprobes.c index 5e3144ad9b64..1e0250cb9486 100644 --- a/kernel/kprobes.c +++ b/kernel/kprobes.c @@ -755,24 +755,69 @@ unsigned long __weak arch_deref_entry_point(void *entry) return (unsigned long)entry; } -int __kprobes register_jprobe(struct jprobe *jp) +static int __register_jprobes(struct jprobe **jps, int num, + unsigned long called_from) { - unsigned long addr = arch_deref_entry_point(jp->entry); + struct jprobe *jp; + int ret = 0, i; - if (!kernel_text_address(addr)) + if (num <= 0) return -EINVAL; + for (i = 0; i < num; i++) { + unsigned long addr; + jp = jps[i]; + addr = arch_deref_entry_point(jp->entry); + + if (!kernel_text_address(addr)) + ret = -EINVAL; + else { + /* Todo: Verify probepoint is a function entry point */ + jp->kp.pre_handler = setjmp_pre_handler; + 
jp->kp.break_handler = longjmp_break_handler; + ret = __register_kprobe(&jp->kp, called_from); + } + if (ret < 0 && i > 0) { + unregister_jprobes(jps, i); + break; + } + } + return ret; +} - /* Todo: Verify probepoint is a function entry point */ - jp->kp.pre_handler = setjmp_pre_handler; - jp->kp.break_handler = longjmp_break_handler; - - return __register_kprobe(&jp->kp, +int __kprobes register_jprobe(struct jprobe *jp) +{ + return __register_jprobes(&jp, 1, (unsigned long)__builtin_return_address(0)); } void __kprobes unregister_jprobe(struct jprobe *jp) { - unregister_kprobe(&jp->kp); + unregister_jprobes(&jp, 1); +} + +int __kprobes register_jprobes(struct jprobe **jps, int num) +{ + return __register_jprobes(jps, num, + (unsigned long)__builtin_return_address(0)); +} + +void __kprobes unregister_jprobes(struct jprobe **jps, int num) +{ + int i; + + if (num <= 0) + return; + mutex_lock(&kprobe_mutex); + for (i = 0; i < num; i++) + if (__unregister_kprobe_top(&jps[i]->kp) < 0) + jps[i]->kp.addr = NULL; + mutex_unlock(&kprobe_mutex); + + synchronize_sched(); + for (i = 0; i < num; i++) { + if (jps[i]->kp.addr) + __unregister_kprobe_bottom(&jps[i]->kp); + } } #ifdef CONFIG_KRETPROBES @@ -1236,6 +1281,8 @@ EXPORT_SYMBOL_GPL(register_kprobes); EXPORT_SYMBOL_GPL(unregister_kprobes); EXPORT_SYMBOL_GPL(register_jprobe); EXPORT_SYMBOL_GPL(unregister_jprobe); +EXPORT_SYMBOL_GPL(register_jprobes); +EXPORT_SYMBOL_GPL(unregister_jprobes); #ifdef CONFIG_KPROBES EXPORT_SYMBOL_GPL(jprobe_return); #endif -- cgit v1.2.3 From 3b0cb4caefeca6fe6b05c6c5a76e9c633b44c58f Mon Sep 17 00:00:00 2001 From: Masami Hiramatsu Date: Mon, 28 Apr 2008 02:14:30 -0700 Subject: kprobes: update document about batch registration Add the description of batch registration interfaces to Documentation/kprobes.txt. Signed-off-by: Masami Hiramatsu Cc: Ananth N Mavinakayanahalli Cc: Jim Keniston Cc: Prasanna S Panchamukhi Cc: Shaohua Li Cc: David Miller Cc: "Frank Ch. 
Eigler" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/kprobes.txt | 51 +++++++++++++++++++++++++++++++++++++++++++---- 1 file changed, 47 insertions(+), 4 deletions(-) diff --git a/Documentation/kprobes.txt b/Documentation/kprobes.txt index be89f393274f..6877e7187113 100644 --- a/Documentation/kprobes.txt +++ b/Documentation/kprobes.txt @@ -37,6 +37,11 @@ registration function such as register_kprobe() specifies where the probe is to be inserted and what handler is to be called when the probe is hit. +There are also register_/unregister_*probes() functions for batch +registration/unregistration of a group of *probes. These functions +can speed up unregistration process when you have to unregister +a lot of probes at once. + The next three subsections explain how the different types of probes work. They explain certain things that you'll need to know in order to make the best use of Kprobes -- e.g., the @@ -190,10 +195,11 @@ code mapping. 4. API Reference The Kprobes API includes a "register" function and an "unregister" -function for each type of probe. Here are terse, mini-man-page -specifications for these functions and the associated probe handlers -that you'll write. See the files in the samples/kprobes/ sub-directory -for examples. +function for each type of probe. The API also includes "register_*probes" +and "unregister_*probes" functions for (un)registering arrays of probes. +Here are terse, mini-man-page specifications for these functions and +the associated probe handlers that you'll write. See the files in the +samples/kprobes/ sub-directory for examples. 4.1 register_kprobe @@ -319,6 +325,43 @@ void unregister_kretprobe(struct kretprobe *rp); Removes the specified probe. The unregister function can be called at any time after the probe has been registered. +NOTE: +If the functions find an incorrect probe (ex. an unregistered probe), +they clear the addr field of the probe. 
+ +4.5 register_*probes + +#include <linux/kprobes.h> +int register_kprobes(struct kprobe **kps, int num); +int register_kretprobes(struct kretprobe **rps, int num); +int register_jprobes(struct jprobe **jps, int num); + +Registers each of the num probes in the specified array. If any +error occurs during registration, all probes in the array, up to +the bad probe, are safely unregistered before the register_*probes +function returns. +- kps/rps/jps: an array of pointers to *probe data structures +- num: the number of the array entries. + +NOTE: +You have to allocate(or define) an array of pointers and set all +of the array entries before using these functions. + +4.6 unregister_*probes + +#include <linux/kprobes.h> +void unregister_kprobes(struct kprobe **kps, int num); +void unregister_kretprobes(struct kretprobe **rps, int num); +void unregister_jprobes(struct jprobe **jps, int num); + +Removes each of the num probes in the specified array at once. + +NOTE: +If the functions find some incorrect probes (ex. unregistered +probes) in the specified array, they clear the addr field of those +incorrect probes. However, other probes in the array are +unregistered correctly. + 5. Kprobes Features and Limitations Kprobes allows multiple probes at the same address. Currently, -- cgit v1.2.3 From 338bf9afda91ec005a1e9a0de4af0271cc167d56 Mon Sep 17 00:00:00 2001 From: Andrew Perepechko Date: Mon, 28 Apr 2008 02:14:31 -0700 Subject: quota: do not allow setting of quota limits to too high values We should check that quota limits set via Q_SETQUOTA do not exceed the limits that the quota format is able to handle.
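The new check in do_set_dqblk() can be modelled in isolation as follows. This is a hedged userspace sketch: the struct names, check_dqblk_limits() and fmt32_info are hypothetical, the QIF_BLIMITS/QIF_ILIMITS bits are defined locally for the example, and qsize_t is assumed to be a 64-bit unsigned type. It shows that a limit is range-checked only when its QIF_* bit is set in the valid mask, and that the v1 and v2 formats cap both limits at 0xffffffff.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-bit unsigned limits. */
typedef uint64_t qsize_t;

/* Flag bits defined locally for this sketch (see include/linux/quota.h). */
#define QIF_BLIMITS 0x1
#define QIF_ILIMITS 0x4

/* Hypothetical subset of struct if_dqblk: the limits and the valid mask. */
struct dqblk_request {
	qsize_t bhardlimit, bsoftlimit;
	qsize_t ihardlimit, isoftlimit;
	unsigned int valid;
};

/* Hypothetical subset of struct mem_dqinfo: the new per-format maxima. */
struct dqinfo_limits {
	qsize_t maxblimit, maxilimit;
};

/* v1 and v2 both store limits as unsigned 32-bit values (per the patch). */
static const struct dqinfo_limits fmt32_info = { 0xffffffffULL, 0xffffffffULL };

/* Mirror of the check added to do_set_dqblk(): only the limits the caller
 * actually sets (per the valid mask) are compared against the maxima. */
static int check_dqblk_limits(const struct dqinfo_limits *dqi,
			      const struct dqblk_request *di)
{
	assert(dqi != NULL && di != NULL);
	if ((di->valid & QIF_BLIMITS) &&
	    (di->bhardlimit > dqi->maxblimit || di->bsoftlimit > dqi->maxblimit))
		return -ERANGE;
	if ((di->valid & QIF_ILIMITS) &&
	    (di->ihardlimit > dqi->maxilimit || di->isoftlimit > dqi->maxilimit))
		return -ERANGE;
	return 0;
}
```

Because the comparison is strictly greater-than, a limit of exactly 0xffffffff is still accepted, while 0x100000000 is rejected with -ERANGE. An out-of-range block limit is ignored when only QIF_ILIMITS is set, since that limit is not being changed.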
Signed-off-by: Andrew Perepechko Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dquot.c | 22 +++++++++++++++++----- fs/quota_v1.c | 3 +++ fs/quota_v2.c | 3 +++ include/linux/quota.h | 2 ++ 4 files changed, 25 insertions(+), 5 deletions(-) diff --git a/fs/dquot.c b/fs/dquot.c index 41b9dbd68b0e..24eef582d2a0 100644 --- a/fs/dquot.c +++ b/fs/dquot.c @@ -1709,10 +1709,19 @@ int vfs_get_dqblk(struct super_block *sb, int type, qid_t id, struct if_dqblk *d } /* Generic routine for setting common part of quota structure */ -static void do_set_dqblk(struct dquot *dquot, struct if_dqblk *di) +static int do_set_dqblk(struct dquot *dquot, struct if_dqblk *di) { struct mem_dqblk *dm = &dquot->dq_dqb; int check_blim = 0, check_ilim = 0; + struct mem_dqinfo *dqi = &sb_dqopt(dquot->dq_sb)->info[dquot->dq_type]; + + if ((di->dqb_valid & QIF_BLIMITS && + (di->dqb_bhardlimit > dqi->dqi_maxblimit || + di->dqb_bsoftlimit > dqi->dqi_maxblimit)) || + (di->dqb_valid & QIF_ILIMITS && + (di->dqb_ihardlimit > dqi->dqi_maxilimit || + di->dqb_isoftlimit > dqi->dqi_maxilimit))) + return -ERANGE; spin_lock(&dq_data_lock); if (di->dqb_valid & QIF_SPACE) { @@ -1744,7 +1753,7 @@ static void do_set_dqblk(struct dquot *dquot, struct if_dqblk *di) clear_bit(DQ_BLKS_B, &dquot->dq_flags); } else if (!(di->dqb_valid & QIF_BTIME)) /* Set grace only if user hasn't provided his own... */ - dm->dqb_btime = get_seconds() + sb_dqopt(dquot->dq_sb)->info[dquot->dq_type].dqi_bgrace; + dm->dqb_btime = get_seconds() + dqi->dqi_bgrace; } if (check_ilim) { if (!dm->dqb_isoftlimit || dm->dqb_curinodes < dm->dqb_isoftlimit) { @@ -1752,7 +1761,7 @@ static void do_set_dqblk(struct dquot *dquot, struct if_dqblk *di) clear_bit(DQ_INODES_B, &dquot->dq_flags); } else if (!(di->dqb_valid & QIF_ITIME)) /* Set grace only if user hasn't provided his own... 
*/ - dm->dqb_itime = get_seconds() + sb_dqopt(dquot->dq_sb)->info[dquot->dq_type].dqi_igrace; + dm->dqb_itime = get_seconds() + dqi->dqi_igrace; } if (dm->dqb_bhardlimit || dm->dqb_bsoftlimit || dm->dqb_ihardlimit || dm->dqb_isoftlimit) clear_bit(DQ_FAKE_B, &dquot->dq_flags); @@ -1760,21 +1769,24 @@ static void do_set_dqblk(struct dquot *dquot, struct if_dqblk *di) set_bit(DQ_FAKE_B, &dquot->dq_flags); spin_unlock(&dq_data_lock); mark_dquot_dirty(dquot); + + return 0; } int vfs_set_dqblk(struct super_block *sb, int type, qid_t id, struct if_dqblk *di) { struct dquot *dquot; + int rc; mutex_lock(&sb_dqopt(sb)->dqonoff_mutex); if (!(dquot = dqget(sb, id, type))) { mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex); return -ESRCH; } - do_set_dqblk(dquot, di); + rc = do_set_dqblk(dquot, di); dqput(dquot); mutex_unlock(&sb_dqopt(sb)->dqonoff_mutex); - return 0; + return rc; } /* Generic routine for getting common part of quota file information */ diff --git a/fs/quota_v1.c b/fs/quota_v1.c index f3841f233069..a6cf9269105c 100644 --- a/fs/quota_v1.c +++ b/fs/quota_v1.c @@ -139,6 +139,9 @@ static int v1_read_file_info(struct super_block *sb, int type) goto out; } ret = 0; + /* limits are stored as unsigned 32-bit data */ + dqopt->info[type].dqi_maxblimit = 0xffffffff; + dqopt->info[type].dqi_maxilimit = 0xffffffff; dqopt->info[type].dqi_igrace = dqblk.dqb_itime ? dqblk.dqb_itime : MAX_IQ_TIME; dqopt->info[type].dqi_bgrace = dqblk.dqb_btime ? 
dqblk.dqb_btime : MAX_DQ_TIME; out: diff --git a/fs/quota_v2.c b/fs/quota_v2.c index c519a583e681..23b647f25d08 100644 --- a/fs/quota_v2.c +++ b/fs/quota_v2.c @@ -59,6 +59,9 @@ static int v2_read_file_info(struct super_block *sb, int type) sb->s_id); return -1; } + /* limits are stored as unsigned 32-bit data */ + info->dqi_maxblimit = 0xffffffff; + info->dqi_maxilimit = 0xffffffff; info->dqi_bgrace = le32_to_cpu(dinfo.dqi_bgrace); info->dqi_igrace = le32_to_cpu(dinfo.dqi_igrace); info->dqi_flags = le32_to_cpu(dinfo.dqi_flags); diff --git a/include/linux/quota.h b/include/linux/quota.h index eb560d031acd..326cb80e3867 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -206,6 +206,8 @@ struct mem_dqinfo { unsigned long dqi_flags; unsigned int dqi_bgrace; unsigned int dqi_igrace; + qsize_t dqi_maxblimit; + qsize_t dqi_maxilimit; union { struct v1_mem_dqinfo v1_i; struct v2_mem_dqinfo v2_i; -- cgit v1.2.3 From 8794b5b246cf6f67baf57bd9db386e79ca5cac33 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:32 -0700 Subject: quota: remove superfluous DQUOT_OFF() in fs/namespace.c We don't need to turn quotas off before remounting root ro, because do_remount_sb() already handles this. 
Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/namespace.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/fs/namespace.c b/fs/namespace.c index f48f98110c30..fe376805cf5f 100644 --- a/fs/namespace.c +++ b/fs/namespace.c @@ -14,7 +14,6 @@ #include #include #include -#include #include #include #include @@ -1084,7 +1083,6 @@ static int do_umount(struct vfsmount *mnt, int flags) down_write(&sb->s_umount); if (!(sb->s_flags & MS_RDONLY)) { lock_kernel(); - DQUOT_OFF(sb); retval = do_remount_sb(sb, MS_RDONLY, NULL, 0); unlock_kernel(); } -- cgit v1.2.3 From 03f6e92bdd467aed9d7571a571868563ae6ad288 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:32 -0700 Subject: quota: various style cleanups Cleanups in quota code: Change __inline__ to inline. Change some macros to inline functions. Remove vfs_quota_off_mount() macro. DQUOT_OFF() should be (0) if CONFIG_QUOTA is disabled. Move declaration of mark_dquot_dirty and dquot_dirty from quota.h to dquot.c [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dquot.c | 10 ++++++- fs/reiserfs/super.c | 2 +- include/linux/quota.h | 5 ---- include/linux/quotaops.h | 71 ++++++++++++++++++++++++++++++++++-------------- 4 files changed, 60 insertions(+), 28 deletions(-) diff --git a/fs/dquot.c b/fs/dquot.c index 24eef582d2a0..fc26d1097d3c 100644 --- a/fs/dquot.c +++ b/fs/dquot.c @@ -289,7 +289,15 @@ static void wait_on_dquot(struct dquot *dquot) mutex_unlock(&dquot->dq_lock); } -#define mark_dquot_dirty(dquot) ((dquot)->dq_sb->dq_op->mark_dirty(dquot)) +static inline int dquot_dirty(struct dquot *dquot) +{ + return test_bit(DQ_MOD_B, &dquot->dq_flags); +} + +static inline int mark_dquot_dirty(struct dquot *dquot) +{ + return dquot->dq_sb->dq_op->mark_dirty(dquot); +} int dquot_mark_dquot_dirty(struct dquot *dquot) { diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index
393cc22c1717..3e1972d31e44 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -304,7 +304,7 @@ static int finish_unfinished(struct super_block *s) /* Turn quotas off */ for (i = 0; i < MAXQUOTAS; i++) { if (sb_dqopt(s)->files[i]) - vfs_quota_off_mount(s, i); + vfs_quota_off(s, i); } if (ms_active_set) /* Restore the flag back */ diff --git a/include/linux/quota.h b/include/linux/quota.h index 326cb80e3867..48556b039b1c 100644 --- a/include/linux/quota.h +++ b/include/linux/quota.h @@ -331,11 +331,6 @@ struct quota_info { struct quota_format_ops *ops[MAXQUOTAS]; /* Operations for each type */ }; -/* Inline would be better but we need to dereference super_block which is not defined yet */ -int mark_dquot_dirty(struct dquot *dquot); - -#define dquot_dirty(dquot) test_bit(DQ_MOD_B, &(dquot)->dq_flags) - #define sb_has_quota_enabled(sb, type) ((type)==USRQUOTA ? \ (sb_dqopt(sb)->flags & DQUOT_USR_ENABLED) : (sb_dqopt(sb)->flags & DQUOT_GRP_ENABLED)) diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 5110201a4159..1aac25511f07 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -41,7 +41,6 @@ extern int vfs_quota_on(struct super_block *sb, int type, int format_id, char *p extern int vfs_quota_on_mount(struct super_block *sb, char *qf_name, int format_id, int type); extern int vfs_quota_off(struct super_block *sb, int type); -#define vfs_quota_off_mount(sb, type) vfs_quota_off(sb, type) extern int vfs_quota_sync(struct super_block *sb, int type); extern int vfs_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii); extern int vfs_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii); @@ -59,7 +58,7 @@ extern struct quotactl_ops vfs_quotactl_ops; /* It is better to call this function outside of any transaction as it might * need a lot of space in journal for dquot structure allocation. 
*/ -static __inline__ void DQUOT_INIT(struct inode *inode) +static inline void DQUOT_INIT(struct inode *inode) { BUG_ON(!inode->i_sb); if (sb_any_quota_enabled(inode->i_sb) && !IS_NOQUOTA(inode)) @@ -67,7 +66,7 @@ static __inline__ void DQUOT_INIT(struct inode *inode) } /* The same as with DQUOT_INIT */ -static __inline__ void DQUOT_DROP(struct inode *inode) +static inline void DQUOT_DROP(struct inode *inode) { /* Here we can get arbitrary inode from clear_inode() so we have * to be careful. OTOH we don't need locking as quota operations @@ -90,7 +89,7 @@ static __inline__ void DQUOT_DROP(struct inode *inode) /* The following allocation/freeing/transfer functions *must* be called inside * a transaction (deadlocks possible otherwise) */ -static __inline__ int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) +static inline int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { if (sb_any_quota_enabled(inode->i_sb)) { /* Used space is updated in alloc_space() */ @@ -102,7 +101,7 @@ static __inline__ int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t return 0; } -static __inline__ int DQUOT_PREALLOC_SPACE(struct inode *inode, qsize_t nr) +static inline int DQUOT_PREALLOC_SPACE(struct inode *inode, qsize_t nr) { int ret; if (!(ret = DQUOT_PREALLOC_SPACE_NODIRTY(inode, nr))) @@ -110,7 +109,7 @@ static __inline__ int DQUOT_PREALLOC_SPACE(struct inode *inode, qsize_t nr) return ret; } -static __inline__ int DQUOT_ALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) +static inline int DQUOT_ALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { if (sb_any_quota_enabled(inode->i_sb)) { /* Used space is updated in alloc_space() */ @@ -122,7 +121,7 @@ static __inline__ int DQUOT_ALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) return 0; } -static __inline__ int DQUOT_ALLOC_SPACE(struct inode *inode, qsize_t nr) +static inline int DQUOT_ALLOC_SPACE(struct inode *inode, qsize_t nr) { int ret; if (!(ret = DQUOT_ALLOC_SPACE_NODIRTY(inode, 
nr))) @@ -130,7 +129,7 @@ static __inline__ int DQUOT_ALLOC_SPACE(struct inode *inode, qsize_t nr) return ret; } -static __inline__ int DQUOT_ALLOC_INODE(struct inode *inode) +static inline int DQUOT_ALLOC_INODE(struct inode *inode) { if (sb_any_quota_enabled(inode->i_sb)) { DQUOT_INIT(inode); @@ -140,7 +139,7 @@ static __inline__ int DQUOT_ALLOC_INODE(struct inode *inode) return 0; } -static __inline__ void DQUOT_FREE_SPACE_NODIRTY(struct inode *inode, qsize_t nr) +static inline void DQUOT_FREE_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { if (sb_any_quota_enabled(inode->i_sb)) inode->i_sb->dq_op->free_space(inode, nr); @@ -148,19 +147,19 @@ static __inline__ void DQUOT_FREE_SPACE_NODIRTY(struct inode *inode, qsize_t nr) inode_sub_bytes(inode, nr); } -static __inline__ void DQUOT_FREE_SPACE(struct inode *inode, qsize_t nr) +static inline void DQUOT_FREE_SPACE(struct inode *inode, qsize_t nr) { DQUOT_FREE_SPACE_NODIRTY(inode, nr); mark_inode_dirty(inode); } -static __inline__ void DQUOT_FREE_INODE(struct inode *inode) +static inline void DQUOT_FREE_INODE(struct inode *inode) { if (sb_any_quota_enabled(inode->i_sb)) inode->i_sb->dq_op->free_inode(inode, 1); } -static __inline__ int DQUOT_TRANSFER(struct inode *inode, struct iattr *iattr) +static inline int DQUOT_TRANSFER(struct inode *inode, struct iattr *iattr) { if (sb_any_quota_enabled(inode->i_sb) && !IS_NOQUOTA(inode)) { DQUOT_INIT(inode); @@ -171,9 +170,12 @@ static __inline__ int DQUOT_TRANSFER(struct inode *inode, struct iattr *iattr) } /* The following two functions cannot be called inside a transaction */ -#define DQUOT_SYNC(sb) sync_dquots(sb, -1) +static inline void DQUOT_SYNC(struct super_block *sb) +{ + sync_dquots(sb, -1); +} -static __inline__ int DQUOT_OFF(struct super_block *sb) +static inline int DQUOT_OFF(struct super_block *sb) { int ret = -ENOSYS; @@ -194,7 +196,7 @@ static __inline__ int DQUOT_OFF(struct super_block *sb) #define DQUOT_ALLOC_INODE(inode) (0) #define DQUOT_FREE_INODE(inode) 
do { } while(0) #define DQUOT_SYNC(sb) do { } while(0) -#define DQUOT_OFF(sb) do { } while(0) +#define DQUOT_OFF(sb) (0) #define DQUOT_TRANSFER(inode, iattr) (0) static inline int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { @@ -235,11 +237,38 @@ static inline void DQUOT_FREE_SPACE(struct inode *inode, qsize_t nr) #endif /* CONFIG_QUOTA */ -#define DQUOT_PREALLOC_BLOCK_NODIRTY(inode, nr) DQUOT_PREALLOC_SPACE_NODIRTY(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) -#define DQUOT_PREALLOC_BLOCK(inode, nr) DQUOT_PREALLOC_SPACE(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) -#define DQUOT_ALLOC_BLOCK_NODIRTY(inode, nr) DQUOT_ALLOC_SPACE_NODIRTY(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) -#define DQUOT_ALLOC_BLOCK(inode, nr) DQUOT_ALLOC_SPACE(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) -#define DQUOT_FREE_BLOCK_NODIRTY(inode, nr) DQUOT_FREE_SPACE_NODIRTY(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) -#define DQUOT_FREE_BLOCK(inode, nr) DQUOT_FREE_SPACE(inode, ((qsize_t)(nr)) << (inode)->i_sb->s_blocksize_bits) +static inline int DQUOT_PREALLOC_BLOCK_NODIRTY(struct inode *inode, qsize_t nr) +{ + return DQUOT_PREALLOC_SPACE_NODIRTY(inode, + nr << inode->i_sb->s_blocksize_bits); +} + +static inline int DQUOT_PREALLOC_BLOCK(struct inode *inode, qsize_t nr) +{ + return DQUOT_PREALLOC_SPACE(inode, + nr << inode->i_sb->s_blocksize_bits); +} + +static inline int DQUOT_ALLOC_BLOCK_NODIRTY(struct inode *inode, qsize_t nr) +{ + return DQUOT_ALLOC_SPACE_NODIRTY(inode, + nr << inode->i_sb->s_blocksize_bits); +} + +static inline int DQUOT_ALLOC_BLOCK(struct inode *inode, qsize_t nr) +{ + return DQUOT_ALLOC_SPACE(inode, + nr << inode->i_sb->s_blocksize_bits); +} + +static inline void DQUOT_FREE_BLOCK_NODIRTY(struct inode *inode, qsize_t nr) +{ + DQUOT_FREE_SPACE_NODIRTY(inode, nr << inode->i_sb->s_blocksize_bits); +} + +static inline void DQUOT_FREE_BLOCK(struct inode *inode, qsize_t nr) +{ + 
DQUOT_FREE_SPACE(inode, nr << inode->i_sb->s_blocksize_bits); +} #endif /* _LINUX_QUOTAOPS_ */ -- cgit v1.2.3 From 0ff5af8340aa6be44220d7237ef4a654314cf795 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:33 -0700 Subject: quota: quota core changes for quotaon on remount Currently, we just turn quotas off on remount of filesystem to read-only state. The patch below adds necessary framework so that we can turn quotas off on remount RO but we are able to automatically reenable them again when filesystem is remounted to RW state. All we need to do is to keep references to inodes of quota files when remounting RO and using these references to reenable quotas when remounting RW. Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/dquot.c | 77 ++++++++++++++++++++++++++++++++++++++++++------ fs/quota.c | 5 ++-- fs/super.c | 10 +++++-- include/linux/quota.h | 14 +++++++-- include/linux/quotaops.h | 29 ++++++++++++++---- 5 files changed, 113 insertions(+), 22 deletions(-) diff --git a/fs/dquot.c b/fs/dquot.c index fc26d1097d3c..dfba1623cccb 100644 --- a/fs/dquot.c +++ b/fs/dquot.c @@ -1449,31 +1449,43 @@ static inline void set_enable_flags(struct quota_info *dqopt, int type) switch (type) { case USRQUOTA: dqopt->flags |= DQUOT_USR_ENABLED; + dqopt->flags &= ~DQUOT_USR_SUSPENDED; break; case GRPQUOTA: dqopt->flags |= DQUOT_GRP_ENABLED; + dqopt->flags &= ~DQUOT_GRP_SUSPENDED; break; } } -static inline void reset_enable_flags(struct quota_info *dqopt, int type) +static inline void reset_enable_flags(struct quota_info *dqopt, int type, + int remount) { switch (type) { case USRQUOTA: dqopt->flags &= ~DQUOT_USR_ENABLED; + if (remount) + dqopt->flags |= DQUOT_USR_SUSPENDED; + else + dqopt->flags &= ~DQUOT_USR_SUSPENDED; break; case GRPQUOTA: dqopt->flags &= ~DQUOT_GRP_ENABLED; + if (remount) + dqopt->flags |= DQUOT_GRP_SUSPENDED; + else + dqopt->flags &= ~DQUOT_GRP_SUSPENDED; break; } } + /* * Turn quota off on a device. 
type == -1 ==> quotaoff for all types (umount) */ -int vfs_quota_off(struct super_block *sb, int type) +int vfs_quota_off(struct super_block *sb, int type, int remount) { - int cnt; + int cnt, ret = 0; struct quota_info *dqopt = sb_dqopt(sb); struct inode *toputinode[MAXQUOTAS]; @@ -1483,9 +1495,17 @@ int vfs_quota_off(struct super_block *sb, int type) toputinode[cnt] = NULL; if (type != -1 && cnt != type) continue; + /* If we keep inodes of quota files after remount and quotaoff + * is called, drop kept inodes. */ + if (!remount && sb_has_quota_suspended(sb, cnt)) { + iput(dqopt->files[cnt]); + dqopt->files[cnt] = NULL; + reset_enable_flags(dqopt, cnt, 0); + continue; + } if (!sb_has_quota_enabled(sb, cnt)) continue; - reset_enable_flags(dqopt, cnt); + reset_enable_flags(dqopt, cnt, remount); /* Note: these are blocking operations */ drop_dquot_ref(sb, cnt); @@ -1501,7 +1521,8 @@ int vfs_quota_off(struct super_block *sb, int type) put_quota_format(dqopt->info[cnt].dqi_format); toputinode[cnt] = dqopt->files[cnt]; - dqopt->files[cnt] = NULL; + if (!remount) + dqopt->files[cnt] = NULL; dqopt->info[cnt].dqi_flags = 0; dqopt->info[cnt].dqi_igrace = 0; dqopt->info[cnt].dqi_bgrace = 0; @@ -1531,12 +1552,19 @@ int vfs_quota_off(struct super_block *sb, int type) mutex_unlock(&toputinode[cnt]->i_mutex); mark_inode_dirty(toputinode[cnt]); } - iput(toputinode[cnt]); mutex_unlock(&dqopt->dqonoff_mutex); + /* On remount RO, we keep the inode pointer so that we + * can reenable quota on the subsequent remount RW. + * But we have better not keep inode pointer when there + * is pending delete on the quota file... 
*/ + if (!remount) + iput(toputinode[cnt]); + else if (!toputinode[cnt]->i_nlink) + ret = -EBUSY; } if (sb->s_bdev) invalidate_bdev(sb->s_bdev); - return 0; + return ret; } /* @@ -1574,7 +1602,8 @@ static int vfs_quota_on_inode(struct inode *inode, int type, int format_id) invalidate_bdev(sb->s_bdev); mutex_lock(&inode->i_mutex); mutex_lock(&dqopt->dqonoff_mutex); - if (sb_has_quota_enabled(sb, type)) { + if (sb_has_quota_enabled(sb, type) || + sb_has_quota_suspended(sb, type)) { error = -EBUSY; goto out_lock; } @@ -1597,6 +1626,7 @@ static int vfs_quota_on_inode(struct inode *inode, int type, int format_id) dqopt->ops[type] = fmt->qf_ops; dqopt->info[type].dqi_format = fmt; + dqopt->info[type].dqi_fmt_id = format_id; INIT_LIST_HEAD(&dqopt->info[type].dqi_dirty_list); mutex_lock(&dqopt->dqio_mutex); if ((error = dqopt->ops[type]->read_file_info(sb, type)) < 0) { @@ -1632,12 +1662,41 @@ out_fmt: return error; } +/* Reenable quotas on remount RW */ +static int vfs_quota_on_remount(struct super_block *sb, int type) +{ + struct quota_info *dqopt = sb_dqopt(sb); + struct inode *inode; + int ret; + + mutex_lock(&dqopt->dqonoff_mutex); + if (!sb_has_quota_suspended(sb, type)) { + mutex_unlock(&dqopt->dqonoff_mutex); + return 0; + } + BUG_ON(sb_has_quota_enabled(sb, type)); + + inode = dqopt->files[type]; + dqopt->files[type] = NULL; + reset_enable_flags(dqopt, type, 0); + mutex_unlock(&dqopt->dqonoff_mutex); + + ret = vfs_quota_on_inode(inode, type, dqopt->info[type].dqi_fmt_id); + iput(inode); + + return ret; +} + /* Actual function called from quotactl() */ -int vfs_quota_on(struct super_block *sb, int type, int format_id, char *path) +int vfs_quota_on(struct super_block *sb, int type, int format_id, char *path, + int remount) { struct nameidata nd; int error; + if (remount) + return vfs_quota_on_remount(sb, type); + error = path_lookup(path, LOOKUP_FOLLOW, &nd); if (error < 0) return error; diff --git a/fs/quota.c b/fs/quota.c index 84f28dd72116..db1cc9f3c7aa 100644 
--- a/fs/quota.c +++ b/fs/quota.c @@ -69,7 +69,6 @@ static int generic_quotactl_valid(struct super_block *sb, int type, int cmd, qid switch (cmd) { case Q_GETFMT: case Q_GETINFO: - case Q_QUOTAOFF: case Q_SETINFO: case Q_SETQUOTA: case Q_GETQUOTA: @@ -229,12 +228,12 @@ static int do_quotactl(struct super_block *sb, int type, int cmd, qid_t id, void if (IS_ERR(pathname = getname(addr))) return PTR_ERR(pathname); - ret = sb->s_qcop->quota_on(sb, type, id, pathname); + ret = sb->s_qcop->quota_on(sb, type, id, pathname, 0); putname(pathname); return ret; } case Q_QUOTAOFF: - return sb->s_qcop->quota_off(sb, type); + return sb->s_qcop->quota_off(sb, type, 0); case Q_GETFMT: { __u32 fmt; diff --git a/fs/super.c b/fs/super.c index 4798350b2bc9..a5a4aca7e22f 100644 --- a/fs/super.c +++ b/fs/super.c @@ -179,7 +179,7 @@ void deactivate_super(struct super_block *s) if (atomic_dec_and_lock(&s->s_active, &sb_lock)) { s->s_count -= S_BIAS-1; spin_unlock(&sb_lock); - DQUOT_OFF(s); + DQUOT_OFF(s, 0); down_write(&s->s_umount); fs->kill_sb(s); put_filesystem(fs); @@ -608,6 +608,7 @@ retry: int do_remount_sb(struct super_block *sb, int flags, void *data, int force) { int retval; + int remount_rw; #ifdef CONFIG_BLOCK if (!(flags & MS_RDONLY) && bdev_read_only(sb->s_bdev)) @@ -625,8 +626,11 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force) mark_files_ro(sb); else if (!fs_may_remount_ro(sb)) return -EBUSY; - DQUOT_OFF(sb); + retval = DQUOT_OFF(sb, 1); + if (retval < 0 && retval != -ENOSYS) + return -EBUSY; } + remount_rw = !(flags & MS_RDONLY) && (sb->s_flags & MS_RDONLY); if (sb->s_op->remount_fs) { lock_super(sb); @@ -636,6 +640,8 @@ int do_remount_sb(struct super_block *sb, int flags, void *data, int force) return retval; } sb->s_flags = (sb->s_flags & ~MS_RMT_MASK) | (flags & MS_RMT_MASK); + if (remount_rw) + DQUOT_ON_REMOUNT(sb); return 0; } diff --git a/include/linux/quota.h b/include/linux/quota.h index 48556b039b1c..52e49dce6584 100644 --- 
a/include/linux/quota.h +++ b/include/linux/quota.h @@ -202,6 +202,8 @@ struct quota_format_type; struct mem_dqinfo { struct quota_format_type *dqi_format; + int dqi_fmt_id; /* Id of the dqi_format - used when turning + * quotas on after remount RW */ struct list_head dqi_dirty_list; /* List of dirty dquots */ unsigned long dqi_flags; unsigned int dqi_bgrace; @@ -298,8 +300,8 @@ struct dquot_operations { /* Operations handling requests from userspace */ struct quotactl_ops { - int (*quota_on)(struct super_block *, int, int, char *); - int (*quota_off)(struct super_block *, int); + int (*quota_on)(struct super_block *, int, int, char *, int); + int (*quota_off)(struct super_block *, int, int); int (*quota_sync)(struct super_block *, int); int (*get_info)(struct super_block *, int, struct if_dqinfo *); int (*set_info)(struct super_block *, int, struct if_dqinfo *); @@ -320,6 +322,10 @@ struct quota_format_type { #define DQUOT_USR_ENABLED 0x01 /* User diskquotas enabled */ #define DQUOT_GRP_ENABLED 0x02 /* Group diskquotas enabled */ +#define DQUOT_USR_SUSPENDED 0x04 /* User diskquotas are off, but + * we have necessary info in + * memory to turn them on */ +#define DQUOT_GRP_SUSPENDED 0x08 /* The same for group quotas */ struct quota_info { unsigned int flags; /* Flags for diskquotas on this device */ @@ -337,6 +343,10 @@ struct quota_info { #define sb_any_quota_enabled(sb) (sb_has_quota_enabled(sb, USRQUOTA) | \ sb_has_quota_enabled(sb, GRPQUOTA)) +#define sb_has_quota_suspended(sb, type) \ + ((type) == USRQUOTA ? 
(sb_dqopt(sb)->flags & DQUOT_USR_SUSPENDED) : \ + (sb_dqopt(sb)->flags & DQUOT_GRP_SUSPENDED)) + int register_quota_format(struct quota_format_type *fmt); void unregister_quota_format(struct quota_format_type *fmt); diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index 1aac25511f07..c97c8f3fa6ee 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -37,10 +37,11 @@ extern int dquot_release(struct dquot *dquot); extern int dquot_commit_info(struct super_block *sb, int type); extern int dquot_mark_dquot_dirty(struct dquot *dquot); -extern int vfs_quota_on(struct super_block *sb, int type, int format_id, char *path); +extern int vfs_quota_on(struct super_block *sb, int type, int format_id, + char *path, int remount); extern int vfs_quota_on_mount(struct super_block *sb, char *qf_name, int format_id, int type); -extern int vfs_quota_off(struct super_block *sb, int type); +extern int vfs_quota_off(struct super_block *sb, int type, int remount); extern int vfs_quota_sync(struct super_block *sb, int type); extern int vfs_get_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii); extern int vfs_set_dqinfo(struct super_block *sb, int type, struct if_dqinfo *ii); @@ -175,12 +176,27 @@ static inline void DQUOT_SYNC(struct super_block *sb) sync_dquots(sb, -1); } -static inline int DQUOT_OFF(struct super_block *sb) +static inline int DQUOT_OFF(struct super_block *sb, int remount) { int ret = -ENOSYS; - if (sb_any_quota_enabled(sb) && sb->s_qcop && sb->s_qcop->quota_off) - ret = sb->s_qcop->quota_off(sb, -1); + if (sb->s_qcop && sb->s_qcop->quota_off) + ret = sb->s_qcop->quota_off(sb, -1, remount); + return ret; +} + +static inline int DQUOT_ON_REMOUNT(struct super_block *sb) +{ + int cnt; + int ret = 0, err; + + if (!sb->s_qcop || !sb->s_qcop->quota_on) + return -ENOSYS; + for (cnt = 0; cnt < MAXQUOTAS; cnt++) { + err = sb->s_qcop->quota_on(sb, cnt, 0, NULL, 1); + if (err < 0 && !ret) + ret = err; + } return ret; } @@ -196,7 +212,8 @@ 
static inline int DQUOT_OFF(struct super_block *sb) #define DQUOT_ALLOC_INODE(inode) (0) #define DQUOT_FREE_INODE(inode) do { } while(0) #define DQUOT_SYNC(sb) do { } while(0) -#define DQUOT_OFF(sb) (0) +#define DQUOT_OFF(sb, remount) (0) +#define DQUOT_ON_REMOUNT(sb) (0) #define DQUOT_TRANSFER(inode, iattr) (0) static inline int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { -- cgit v1.2.3 From 2fd83a4f3cd5a725168e3a269746dfce2adfa56a Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:34 -0700 Subject: quota: ext3: make ext3 handle quotaon on remount Update ext3 to handle quotaon on remount RW. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ext3/super.c | 19 ++++++++++--------- 1 file changed, 10 insertions(+), 9 deletions(-) diff --git a/fs/ext3/super.c b/fs/ext3/super.c index ad5360664082..883ff965d984 100644 --- a/fs/ext3/super.c +++ b/fs/ext3/super.c @@ -685,7 +685,8 @@ static int ext3_acquire_dquot(struct dquot *dquot); static int ext3_release_dquot(struct dquot *dquot); static int ext3_mark_dquot_dirty(struct dquot *dquot); static int ext3_write_info(struct super_block *sb, int type); -static int ext3_quota_on(struct super_block *sb, int type, int format_id, char *path); +static int ext3_quota_on(struct super_block *sb, int type, int format_id, + char *path, int remount); static int ext3_quota_on_mount(struct super_block *sb, int type); static ssize_t ext3_quota_read(struct super_block *sb, int type, char *data, size_t len, loff_t off); @@ -1415,7 +1416,7 @@ static void ext3_orphan_cleanup (struct super_block * sb, /* Turn quotas off */ for (i = 0; i < MAXQUOTAS; i++) { if (sb_dqopt(sb)->files[i]) - vfs_quota_off(sb, i); + vfs_quota_off(sb, i, 0); } #endif sb->s_flags = s_flags; /* Restore MS_RDONLY status */ @@ -2743,17 +2744,17 @@ static int ext3_quota_on_mount(struct super_block *sb, int type) * Standard function to be called on
quota_on */ static int ext3_quota_on(struct super_block *sb, int type, int format_id, - char *path) + char *path, int remount) { int err; struct nameidata nd; if (!test_opt(sb, QUOTA)) return -EINVAL; - /* Not journalling quota? */ - if (!EXT3_SB(sb)->s_qf_names[USRQUOTA] && - !EXT3_SB(sb)->s_qf_names[GRPQUOTA]) - return vfs_quota_on(sb, type, format_id, path); + /* Not journalling quota or remount? */ + if ((!EXT3_SB(sb)->s_qf_names[USRQUOTA] && + !EXT3_SB(sb)->s_qf_names[GRPQUOTA]) || remount) + return vfs_quota_on(sb, type, format_id, path, remount); err = path_lookup(path, LOOKUP_FOLLOW, &nd); if (err) return err; @@ -2762,13 +2763,13 @@ static int ext3_quota_on(struct super_block *sb, int type, int format_id, path_put(&nd.path); return -EXDEV; } - /* Quotafile not of fs root? */ + /* Quotafile not in fs root? */ if (nd.path.dentry->d_parent->d_inode != sb->s_root->d_inode) printk(KERN_WARNING "EXT3-fs: Quota file not on filesystem root. " "Journalled quota will not work.\n"); path_put(&nd.path); - return vfs_quota_on(sb, type, format_id, path); + return vfs_quota_on(sb, type, format_id, path, remount); } /* Read data from quotafile - avoid pagecache and such because we cannot afford -- cgit v1.2.3 From 6f28e08794749f3431e89302728e612343d9d9e4 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:34 -0700 Subject: quota: ext4: make ext4 handle quotaon on remount Update ext4 to handle quotaon on remount RW. 
[akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/ext4/super.c | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/fs/ext4/super.c b/fs/ext4/super.c index 13383ba18f1d..c81a8e759bad 100644 --- a/fs/ext4/super.c +++ b/fs/ext4/super.c @@ -813,7 +813,8 @@ static int ext4_acquire_dquot(struct dquot *dquot); static int ext4_release_dquot(struct dquot *dquot); static int ext4_mark_dquot_dirty(struct dquot *dquot); static int ext4_write_info(struct super_block *sb, int type); -static int ext4_quota_on(struct super_block *sb, int type, int format_id, char *path); +static int ext4_quota_on(struct super_block *sb, int type, int format_id, + char *path, int remount); static int ext4_quota_on_mount(struct super_block *sb, int type); static ssize_t ext4_quota_read(struct super_block *sb, int type, char *data, size_t len, loff_t off); @@ -1632,7 +1633,7 @@ static void ext4_orphan_cleanup (struct super_block * sb, /* Turn quotas off */ for (i = 0; i < MAXQUOTAS; i++) { if (sb_dqopt(sb)->files[i]) - vfs_quota_off(sb, i); + vfs_quota_off(sb, i, 0); } #endif sb->s_flags = s_flags; /* Restore MS_RDONLY status */ @@ -3143,7 +3144,7 @@ static int ext4_quota_on_mount(struct super_block *sb, int type) * Standard function to be called on quota_on */ static int ext4_quota_on(struct super_block *sb, int type, int format_id, - char *path) + char *path, int remount) { int err; struct nameidata nd; @@ -3151,9 +3152,9 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, if (!test_opt(sb, QUOTA)) return -EINVAL; /* Not journalling quota? 
*/ - if (!EXT4_SB(sb)->s_qf_names[USRQUOTA] && - !EXT4_SB(sb)->s_qf_names[GRPQUOTA]) - return vfs_quota_on(sb, type, format_id, path); + if ((!EXT4_SB(sb)->s_qf_names[USRQUOTA] && + !EXT4_SB(sb)->s_qf_names[GRPQUOTA]) || remount) + return vfs_quota_on(sb, type, format_id, path, remount); err = path_lookup(path, LOOKUP_FOLLOW, &nd); if (err) return err; @@ -3168,7 +3169,7 @@ static int ext4_quota_on(struct super_block *sb, int type, int format_id, "EXT4-fs: Quota file not on filesystem root. " "Journalled quota will not work.\n"); path_put(&nd.path); - return vfs_quota_on(sb, type, format_id, path); + return vfs_quota_on(sb, type, format_id, path, remount); } /* Read data from quotafile - avoid pagecache and such because we cannot afford -- cgit v1.2.3 From 1b445a9c21f593be9d3c4ab912359d2c51c371dd Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Mon, 28 Apr 2008 02:14:35 -0700 Subject: quota: reiserfs: make reiserfs handle quotaon on remount Update reiserfs to handle quotaon on remount RW. Signed-off-by: Jan Kara Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- fs/reiserfs/super.c | 13 ++++++++----- 1 file changed, 8 insertions(+), 5 deletions(-) diff --git a/fs/reiserfs/super.c b/fs/reiserfs/super.c index 3e1972d31e44..f8f9a473e670 100644 --- a/fs/reiserfs/super.c +++ b/fs/reiserfs/super.c @@ -304,7 +304,7 @@ static int finish_unfinished(struct super_block *s) /* Turn quotas off */ for (i = 0; i < MAXQUOTAS; i++) { if (sb_dqopt(s)->files[i]) - vfs_quota_off(s, i); + vfs_quota_off(s, i, 0); } if (ms_active_set) /* Restore the flag back */ @@ -634,7 +634,7 @@ static int reiserfs_acquire_dquot(struct dquot *); static int reiserfs_release_dquot(struct dquot *); static int reiserfs_mark_dquot_dirty(struct dquot *); static int reiserfs_write_info(struct super_block *, int); -static int reiserfs_quota_on(struct super_block *, int, int, char *); +static int reiserfs_quota_on(struct super_block *, int, int, char *, int); static struct dquot_operations 
reiserfs_quota_operations = { .initialize = reiserfs_dquot_initialize, @@ -2015,13 +2015,16 @@ static int reiserfs_quota_on_mount(struct super_block *sb, int type) * Standard function to be called on quota_on */ static int reiserfs_quota_on(struct super_block *sb, int type, int format_id, - char *path) + char *path, int remount) { int err; struct nameidata nd; if (!(REISERFS_SB(sb)->s_mount_opt & (1 << REISERFS_QUOTA))) return -EINVAL; + /* No more checks needed? Path and format_id are bogus anyway... */ + if (remount) + return vfs_quota_on(sb, type, format_id, path, 1); err = path_lookup(path, LOOKUP_FOLLOW, &nd); if (err) return err; @@ -2041,7 +2044,7 @@ static int reiserfs_quota_on(struct super_block *sb, int type, int format_id, if (!REISERFS_SB(sb)->s_qf_names[USRQUOTA] && !REISERFS_SB(sb)->s_qf_names[GRPQUOTA]) { path_put(&nd.path); - return vfs_quota_on(sb, type, format_id, path); + return vfs_quota_on(sb, type, format_id, path, 0); } /* Quotafile not of fs root? */ if (nd.path.dentry->d_parent->d_inode != sb->s_root->d_inode) @@ -2049,7 +2052,7 @@ static int reiserfs_quota_on(struct super_block *sb, int type, int format_id, "reiserfs: Quota file not on filesystem root. 
" "Journalled quota will not work."); path_put(&nd.path); - return vfs_quota_on(sb, type, format_id, path); + return vfs_quota_on(sb, type, format_id, path, 0); } /* Read data from quotafile - avoid pagecache and such because we cannot afford -- cgit v1.2.3 From 50f8c370e77befe9121720bd7bdada2ac0d13915 Mon Sep 17 00:00:00 2001 From: Andrew Morton Date: Mon, 28 Apr 2008 02:14:35 -0700 Subject: quota: convert stub functions from macros into inlines Fixes things like this: fs/super.c: In function `deactivate_super': fs/super.c:182: warning: statement with no effect fs/super.c: In function `do_remount_sb': fs/super.c:644: warning: statement with no effect Cc: Jan Kara Cc: Al Viro Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/quotaops.h | 45 +++++++++++++++++++++++++++++++++++++-------- 1 file changed, 37 insertions(+), 8 deletions(-) diff --git a/include/linux/quotaops.h b/include/linux/quotaops.h index c97c8f3fa6ee..f86702053853 100644 --- a/include/linux/quotaops.h +++ b/include/linux/quotaops.h @@ -207,14 +207,43 @@ static inline int DQUOT_ON_REMOUNT(struct super_block *sb) */ #define sb_dquot_ops (NULL) #define sb_quotactl_ops (NULL) -#define DQUOT_INIT(inode) do { } while(0) -#define DQUOT_DROP(inode) do { } while(0) -#define DQUOT_ALLOC_INODE(inode) (0) -#define DQUOT_FREE_INODE(inode) do { } while(0) -#define DQUOT_SYNC(sb) do { } while(0) -#define DQUOT_OFF(sb, remount) (0) -#define DQUOT_ON_REMOUNT(sb) (0) -#define DQUOT_TRANSFER(inode, iattr) (0) + +static inline void DQUOT_INIT(struct inode *inode) +{ +} + +static inline void DQUOT_DROP(struct inode *inode) +{ +} + +static inline int DQUOT_ALLOC_INODE(struct inode *inode) +{ + return 0; +} + +static inline void DQUOT_FREE_INODE(struct inode *inode) +{ +} + +static inline void DQUOT_SYNC(struct super_block *sb) +{ +} + +static inline int DQUOT_OFF(struct super_block *sb, int remount) +{ + return 0; +} + +static inline int DQUOT_ON_REMOUNT(struct super_block *sb) +{ + return 0; 
+} + +static inline int DQUOT_TRANSFER(struct inode *inode, struct iattr *iattr) +{ + return 0; +} + static inline int DQUOT_PREALLOC_SPACE_NODIRTY(struct inode *inode, qsize_t nr) { inode_add_bytes(inode, nr); -- cgit v1.2.3 From 2f9e9b6db31d96fe4e8b519b8aab1ba172dd3ddf Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:37 -0700 Subject: capi: fix sparse warnings using integer as NULL pointer drivers/isdn/capi/kcapi.c:829:30: warning: Using plain integer as NULL pointer drivers/isdn/capi/kcapi.c:838:27: warning: Using plain integer as NULL pointer drivers/isdn/capi/kcapi.c:954:17: warning: Using plain integer as NULL pointer drivers/isdn/capi/kcapi.c:1007:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/kcapi.c:1009:33: warning: Using plain integer as NULL pointer drivers/isdn/capi/capiutil.c:453:24: warning: Using plain integer as NULL pointer drivers/isdn/capi/capilib.c:47:30: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:353:29: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:369:15: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:486:48: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:515:46: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:541:47: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:692:47: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:699:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:704:14: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:943:53: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:948:32: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:969:42: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:989:48: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:1026:69: warning: Using plain integer as NULL pointer 
drivers/isdn/capi/capi.c:1028:19: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:1061:20: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:1529:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/capi.c:1531:33: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:338:15: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:758:32: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:880:40: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:407:15: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:407:15: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:444:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:429:49: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:1664:61: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:1969:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:2294:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:2297:33: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:2338:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/capidrv.c:2341:33: warning: Using plain integer as NULL pointer drivers/isdn/capi/capifs.c:192:37: warning: Using plain integer as NULL pointer drivers/isdn/capi/capifs.c:194:33: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Cc: Karsten Keil Cc: Jeff 
Garzik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/capi/capi.c | 34 +++++++++++++++++----------------- drivers/isdn/capi/capidrv.c | 24 ++++++++++++------------ drivers/isdn/capi/capifs.c | 4 ++-- drivers/isdn/capi/capilib.c | 2 +- drivers/isdn/capi/capiutil.c | 2 +- drivers/isdn/capi/kcapi.c | 10 +++++----- 6 files changed, 38 insertions(+), 38 deletions(-) diff --git a/drivers/isdn/capi/capi.c b/drivers/isdn/capi/capi.c index 23ae66c76d47..24c6b7ca62be 100644 --- a/drivers/isdn/capi/capi.c +++ b/drivers/isdn/capi/capi.c @@ -350,7 +350,7 @@ static void capincci_free(struct capidev *cdev, u32 ncci) if (ncci == 0xffffffff || np->ncci == ncci) { *pp = (*pp)->next; #ifdef CONFIG_ISDN_CAPI_MIDDLEWARE - if ((mp = np->minorp) != 0) { + if ((mp = np->minorp) != NULL) { #if defined(CONFIG_ISDN_CAPI_CAPIFS) || defined(CONFIG_ISDN_CAPI_CAPIFS_MODULE) capifs_free_ncci(mp->minor); #endif @@ -366,7 +366,7 @@ static void capincci_free(struct capidev *cdev, u32 ncci) } #endif /* CONFIG_ISDN_CAPI_MIDDLEWARE */ kfree(np); - if (*pp == 0) return; + if (*pp == NULL) return; } else { pp = &(*pp)->next; } @@ -483,7 +483,7 @@ static int handle_recv_skb(struct capiminor *mp, struct sk_buff *skb) #endif goto bad; } - if ((nskb = gen_data_b3_resp_for(mp, skb)) == 0) { + if ((nskb = gen_data_b3_resp_for(mp, skb)) == NULL) { printk(KERN_ERR "capi: gen_data_b3_resp failed\n"); goto bad; } @@ -512,7 +512,7 @@ bad: static void handle_minor_recv(struct capiminor *mp) { struct sk_buff *skb; - while ((skb = skb_dequeue(&mp->inqueue)) != 0) { + while ((skb = skb_dequeue(&mp->inqueue)) != NULL) { unsigned int len = skb->len; mp->inbytes -= len; if (handle_recv_skb(mp, skb) < 0) { @@ -538,7 +538,7 @@ static int handle_minor_send(struct capiminor *mp) return 0; } - while ((skb = skb_dequeue(&mp->outqueue)) != 0) { + while ((skb = skb_dequeue(&mp->outqueue)) != NULL) { datahandle = mp->datahandle; len = (u16)skb->len; skb_push(skb, CAPI_DATA_B3_REQ_LEN); @@ -689,19 
+689,19 @@ capi_read(struct file *file, char __user *buf, size_t count, loff_t *ppos) if (!cdev->ap.applid) return -ENODEV; - if ((skb = skb_dequeue(&cdev->recvqueue)) == 0) { + if ((skb = skb_dequeue(&cdev->recvqueue)) == NULL) { if (file->f_flags & O_NONBLOCK) return -EAGAIN; for (;;) { interruptible_sleep_on(&cdev->recvwait); - if ((skb = skb_dequeue(&cdev->recvqueue)) != 0) + if ((skb = skb_dequeue(&cdev->recvqueue)) != NULL) break; if (signal_pending(current)) break; } - if (skb == 0) + if (skb == NULL) return -ERESTARTNOHAND; } if (skb->len > count) { @@ -940,12 +940,12 @@ capi_ioctl(struct inode *inode, struct file *file, return -EFAULT; mutex_lock(&cdev->ncci_list_mtx); - if ((nccip = capincci_find(cdev, (u32) ncci)) == 0) { + if ((nccip = capincci_find(cdev, (u32) ncci)) == NULL) { mutex_unlock(&cdev->ncci_list_mtx); return 0; } #ifdef CONFIG_ISDN_CAPI_MIDDLEWARE - if ((mp = nccip->minorp) != 0) { + if ((mp = nccip->minorp) != NULL) { count += atomic_read(&mp->ttyopencount); } #endif /* CONFIG_ISDN_CAPI_MIDDLEWARE */ @@ -966,7 +966,7 @@ capi_ioctl(struct inode *inode, struct file *file, return -EFAULT; mutex_lock(&cdev->ncci_list_mtx); nccip = capincci_find(cdev, (u32) ncci); - if (!nccip || (mp = nccip->minorp) == 0) { + if (!nccip || (mp = nccip->minorp) == NULL) { mutex_unlock(&cdev->ncci_list_mtx); return -ESRCH; } @@ -986,7 +986,7 @@ capi_open(struct inode *inode, struct file *file) if (file->private_data) return -EEXIST; - if ((file->private_data = capidev_alloc()) == 0) + if ((file->private_data = capidev_alloc()) == NULL) return -ENOMEM; return nonseekable_open(inode, file); @@ -1023,9 +1023,9 @@ static int capinc_tty_open(struct tty_struct * tty, struct file * file) struct capiminor *mp; unsigned long flags; - if ((mp = capiminor_find(iminor(file->f_path.dentry->d_inode))) == 0) + if ((mp = capiminor_find(iminor(file->f_path.dentry->d_inode))) == NULL) return -ENXIO; - if (mp->nccip == 0) + if (mp->nccip == NULL) return -ENXIO; tty->driver_data = 
(void *)mp; @@ -1058,7 +1058,7 @@ static void capinc_tty_close(struct tty_struct * tty, struct file * file) #ifdef _DEBUG_REFCOUNT printk(KERN_DEBUG "capinc_tty_close ocount=%d\n", atomic_read(&mp->ttyopencount)); #endif - if (mp->nccip == 0) + if (mp->nccip == NULL) capiminor_free(mp); } @@ -1526,9 +1526,9 @@ static int __init capi_init(void) char *compileinfo; int major_ret; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, sizeof(rev)); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/capi/capidrv.c b/drivers/isdn/capi/capidrv.c index cb42b690b45e..2e602dd07ffa 100644 --- a/drivers/isdn/capi/capidrv.c +++ b/drivers/isdn/capi/capidrv.c @@ -335,7 +335,7 @@ static capidrv_plci *new_plci(capidrv_contr * card, int chan) plcip = kzalloc(sizeof(capidrv_plci), GFP_ATOMIC); - if (plcip == 0) + if (plcip == NULL) return NULL; plcip->state = ST_PLCI_NONE; @@ -404,7 +404,7 @@ static inline capidrv_ncci *new_ncci(capidrv_contr * card, nccip = kzalloc(sizeof(capidrv_ncci), GFP_ATOMIC); - if (nccip == 0) + if (nccip == NULL) return NULL; nccip->ncci = ncci; @@ -426,7 +426,7 @@ static inline capidrv_ncci *find_ncci(capidrv_contr * card, u32 ncci) capidrv_plci *plcip; capidrv_ncci *p; - if ((plcip = find_plci_by_ncci(card, ncci)) == 0) + if ((plcip = find_plci_by_ncci(card, ncci)) == NULL) return NULL; for (p = plcip->ncci_list; p; p = p->next) @@ -441,7 +441,7 @@ static inline capidrv_ncci *find_ncci_by_msgid(capidrv_contr * card, capidrv_plci *plcip; capidrv_ncci *p; - if ((plcip = find_plci_by_ncci(card, ncci)) == 0) + if ((plcip = find_plci_by_ncci(card, ncci)) == NULL) return NULL; for (p = plcip->ncci_list; p; p = p->next) @@ -755,7 +755,7 @@ static inline int new_bchan(capidrv_contr * card) { int i; for (i = 0; i < card->nbchan; i++) { - if (card->bchans[i].plcip == 0) { + if 
(card->bchans[i].plcip == NULL) { card->bchans[i].disconnecting = 0; return i; } @@ -877,7 +877,7 @@ static void handle_incoming_call(capidrv_contr * card, _cmsg * cmsg) return; } bchan = &card->bchans[chan]; - if ((plcip = new_plci(card, chan)) == 0) { + if ((plcip = new_plci(card, chan)) == NULL) { printk(KERN_ERR "capidrv-%d: incoming call: no memory, sorry.\n", card->contrnr); return; } @@ -1661,7 +1661,7 @@ static int capidrv_command(isdn_ctrl * c, capidrv_contr * card) NULL, /* Useruserdata */ NULL /* Facilitydataarray */ ); - if ((plcip = new_plci(card, (c->arg % card->nbchan))) == 0) { + if ((plcip = new_plci(card, (c->arg % card->nbchan))) == NULL) { cmd.command = ISDN_STAT_DHUP; cmd.driver = card->myid; cmd.arg = (c->arg % card->nbchan); @@ -1966,7 +1966,7 @@ static void enable_dchannel_trace(capidrv_contr *card) card->name, errcode); return; } - if (strstr(manufacturer, "AVM") == 0) { + if (strstr(manufacturer, "AVM") == NULL) { printk(KERN_ERR "%s: not from AVM, no d-channel trace possible (%s)\n", card->name, manufacturer); return; @@ -2291,10 +2291,10 @@ static int __init capidrv_init(void) u32 ncontr, contr; u16 errcode; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strncpy(rev, p + 2, sizeof(rev)); rev[sizeof(rev)-1] = 0; - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); @@ -2335,10 +2335,10 @@ static void __exit capidrv_exit(void) char rev[32]; char *p; - if ((p = strchr(revision, ':')) != 0) { + if ((p = strchr(revision, ':')) != NULL) { strncpy(rev, p + 1, sizeof(rev)); rev[sizeof(rev)-1] = 0; - if ((p = strchr(rev, '$')) != 0) + if ((p = strchr(rev, '$')) != NULL) *p = 0; } else { strcpy(rev, " ??? 
"); diff --git a/drivers/isdn/capi/capifs.c b/drivers/isdn/capi/capifs.c index 6d7c47ec0367..eafe0e9daa7c 100644 --- a/drivers/isdn/capi/capifs.c +++ b/drivers/isdn/capi/capifs.c @@ -189,9 +189,9 @@ static int __init capifs_init(void) char *p; int err; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, sizeof(rev)); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/capi/capilib.c b/drivers/isdn/capi/capilib.c index 68409d971e73..34d8be2761c1 100644 --- a/drivers/isdn/capi/capilib.c +++ b/drivers/isdn/capi/capilib.c @@ -44,7 +44,7 @@ static inline void mq_init(struct capilib_ncci * np) static inline int mq_enqueue(struct capilib_ncci * np, u16 msgid) { struct capilib_msgidqueue *mq; - if ((mq = np->msgidfree) == 0) + if ((mq = np->msgidfree) == NULL) return 0; np->msgidfree = mq->next; mq->msgid = msgid; diff --git a/drivers/isdn/capi/capiutil.c b/drivers/isdn/capi/capiutil.c index 22379b94e88f..ebef4ce1b00c 100644 --- a/drivers/isdn/capi/capiutil.c +++ b/drivers/isdn/capi/capiutil.c @@ -450,7 +450,7 @@ static void pars_2_message(_cmsg * cmsg) cmsg->l += 4; break; case _CSTRUCT: - if (*(u8 **) OFF == 0) { + if (*(u8 **) OFF == NULL) { *(cmsg->m + cmsg->l) = '\0'; cmsg->l++; } else if (**(_cstruct *) OFF != 0xff) { diff --git a/drivers/isdn/capi/kcapi.c b/drivers/isdn/capi/kcapi.c index f55531869313..ef6de217b9fc 100644 --- a/drivers/isdn/capi/kcapi.c +++ b/drivers/isdn/capi/kcapi.c @@ -826,7 +826,7 @@ static int old_capi_manufacturer(unsigned int cmd, void __user *data) card = capi_ctr_get(card); if (!card) return -ESRCH; - if (card->load_firmware == 0) { + if (card->load_firmware == NULL) { printk(KERN_DEBUG "kcapi: load: no load function\n"); return -ESRCH; } @@ -835,7 +835,7 @@ static int old_capi_manufacturer(unsigned int cmd, void __user *data) printk(KERN_DEBUG "kcapi: load: 
invalid parameter: length of t4file is %d ?\n", ldef.t4file.len); return -EINVAL; } - if (ldef.t4file.data == 0) { + if (ldef.t4file.data == NULL) { printk(KERN_DEBUG "kcapi: load: invalid parameter: dataptr is 0\n"); return -EINVAL; } @@ -951,7 +951,7 @@ int capi20_manufacturer(unsigned int cmd, void __user *data) if (strcmp(driver->name, cdef.driver) == 0) break; } - if (driver == 0) { + if (driver == NULL) { printk(KERN_ERR "kcapi: driver \"%s\" not loaded.\n", cdef.driver); return -ESRCH; @@ -1004,9 +1004,9 @@ static int __init kcapi_init(void) return ret; kcapi_proc_init(); - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, sizeof(rev)); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); -- cgit v1.2.3 From 8e44b29da5300f4698c41b5fd2d1ce52c28e2148 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:38 -0700 Subject: avm: fix sparse warning using integer as NULL pointer drivers/isdn/hardware/avm/b1isa.c:206:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1isa.c:208:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1.c:664:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1.c:666:44: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1.c:668:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1.c:791:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1.c:793:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1pci.c:385:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1pci.c:387:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1dma.c:886:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1dma.c:888:44: warning: Using plain integer as NULL 
pointer drivers/isdn/hardware/avm/b1dma.c:890:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1dma.c:973:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1dma.c:975:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1pcmcia.c:204:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/b1pcmcia.c:206:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/t1isa.c:554:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/t1isa.c:556:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/t1pci.c:236:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/t1pci.c:238:33: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1091:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1093:44: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1095:42: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1170:21: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1294:37: warning: Using plain integer as NULL pointer drivers/isdn/hardware/avm/c4.c:1296:33: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Cc: Karsten Keil Cc: Jeff Garzik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/hardware/avm/b1.c | 10 +++++----- drivers/isdn/hardware/avm/b1dma.c | 10 +++++----- drivers/isdn/hardware/avm/b1isa.c | 4 ++-- drivers/isdn/hardware/avm/b1pci.c | 4 ++-- drivers/isdn/hardware/avm/b1pcmcia.c | 4 ++-- drivers/isdn/hardware/avm/c4.c | 12 ++++++------ drivers/isdn/hardware/avm/t1isa.c | 4 ++-- drivers/isdn/hardware/avm/t1pci.c | 4 ++-- 8 files changed, 26 insertions(+), 26 deletions(-) diff --git a/drivers/isdn/hardware/avm/b1.c b/drivers/isdn/hardware/avm/b1.c index 4484a6417235..abf05ec31760 100644 --- 
a/drivers/isdn/hardware/avm/b1.c +++ b/drivers/isdn/hardware/avm/b1.c @@ -661,11 +661,11 @@ int b1ctl_read_proc(char *page, char **start, off_t off, len += sprintf(page+len, "%-16s %s\n", "type", s); if (card->cardtype == avm_t1isa) len += sprintf(page+len, "%-16s %d\n", "cardnr", card->cardnr); - if ((s = cinfo->version[VER_DRIVER]) != 0) + if ((s = cinfo->version[VER_DRIVER]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_driver", s); - if ((s = cinfo->version[VER_CARDTYPE]) != 0) + if ((s = cinfo->version[VER_CARDTYPE]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_cardtype", s); - if ((s = cinfo->version[VER_SERIAL]) != 0) + if ((s = cinfo->version[VER_SERIAL]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_serial", s); if (card->cardtype != avm_m1) { @@ -788,9 +788,9 @@ static int __init b1_init(void) char *p; char rev[32]; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/hardware/avm/b1dma.c b/drivers/isdn/hardware/avm/b1dma.c index 669f6f67449c..da34b98e3de7 100644 --- a/drivers/isdn/hardware/avm/b1dma.c +++ b/drivers/isdn/hardware/avm/b1dma.c @@ -883,11 +883,11 @@ int b1dmactl_read_proc(char *page, char **start, off_t off, default: s = "???"; break; } len += sprintf(page+len, "%-16s %s\n", "type", s); - if ((s = cinfo->version[VER_DRIVER]) != 0) + if ((s = cinfo->version[VER_DRIVER]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_driver", s); - if ((s = cinfo->version[VER_CARDTYPE]) != 0) + if ((s = cinfo->version[VER_CARDTYPE]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_cardtype", s); - if ((s = cinfo->version[VER_SERIAL]) != 0) + if ((s = cinfo->version[VER_SERIAL]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_serial", s); if (card->cardtype != avm_m1) { @@ -970,9 +970,9 @@ 
static int __init b1dma_init(void) char *p; char rev[32]; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, sizeof(rev)); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/hardware/avm/b1isa.c b/drivers/isdn/hardware/avm/b1isa.c index 80fb488848b8..1e288eeb5e2a 100644 --- a/drivers/isdn/hardware/avm/b1isa.c +++ b/drivers/isdn/hardware/avm/b1isa.c @@ -203,9 +203,9 @@ static int __init b1isa_init(void) char rev[32]; int i; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/hardware/avm/b1pci.c b/drivers/isdn/hardware/avm/b1pci.c index 90e2e6643d19..5b314a2c4049 100644 --- a/drivers/isdn/hardware/avm/b1pci.c +++ b/drivers/isdn/hardware/avm/b1pci.c @@ -382,9 +382,9 @@ static int __init b1pci_init(void) char rev[32]; int err; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/hardware/avm/b1pcmcia.c b/drivers/isdn/hardware/avm/b1pcmcia.c index e479c0aef38d..7740403b40e1 100644 --- a/drivers/isdn/hardware/avm/b1pcmcia.c +++ b/drivers/isdn/hardware/avm/b1pcmcia.c @@ -201,9 +201,9 @@ static int __init b1pcmcia_init(void) char *p; char rev[32]; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git 
a/drivers/isdn/hardware/avm/c4.c b/drivers/isdn/hardware/avm/c4.c index 4bbbbe688077..9df1d3f66c87 100644 --- a/drivers/isdn/hardware/avm/c4.c +++ b/drivers/isdn/hardware/avm/c4.c @@ -1088,11 +1088,11 @@ static int c4_read_proc(char *page, char **start, off_t off, default: s = "???"; break; } len += sprintf(page+len, "%-16s %s\n", "type", s); - if ((s = cinfo->version[VER_DRIVER]) != 0) + if ((s = cinfo->version[VER_DRIVER]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_driver", s); - if ((s = cinfo->version[VER_CARDTYPE]) != 0) + if ((s = cinfo->version[VER_CARDTYPE]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_cardtype", s); - if ((s = cinfo->version[VER_SERIAL]) != 0) + if ((s = cinfo->version[VER_SERIAL]) != NULL) len += sprintf(page+len, "%-16s %s\n", "ver_serial", s); if (card->cardtype != avm_m1) { @@ -1167,7 +1167,7 @@ static int c4_add_card(struct capicardparams *p, struct pci_dev *dev, } card->mbase = ioremap(card->membase, 128); - if (card->mbase == 0) { + if (card->mbase == NULL) { printk(KERN_NOTICE "c4: can't remap memory at 0x%lx\n", card->membase); retval = -EIO; @@ -1291,9 +1291,9 @@ static int __init c4_init(void) char rev[32]; int err; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git a/drivers/isdn/hardware/avm/t1isa.c b/drivers/isdn/hardware/avm/t1isa.c index 6130724e46e7..e7724493738c 100644 --- a/drivers/isdn/hardware/avm/t1isa.c +++ b/drivers/isdn/hardware/avm/t1isa.c @@ -551,9 +551,9 @@ static int __init t1isa_init(void) char *p; int i; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); diff --git 
a/drivers/isdn/hardware/avm/t1pci.c b/drivers/isdn/hardware/avm/t1pci.c index d1e253c94db4..e6d298d75146 100644 --- a/drivers/isdn/hardware/avm/t1pci.c +++ b/drivers/isdn/hardware/avm/t1pci.c @@ -233,9 +233,9 @@ static int __init t1pci_init(void) char rev[32]; int err; - if ((p = strchr(revision, ':')) != 0 && p[1]) { + if ((p = strchr(revision, ':')) != NULL && p[1]) { strlcpy(rev, p + 2, 32); - if ((p = strchr(rev, '$')) != 0 && p > rev) + if ((p = strchr(rev, '$')) != NULL && p > rev) *(p-1) = 0; } else strcpy(rev, "1.0"); -- cgit v1.2.3 From dd58c0dd30ac761837b1d0d8cc434c7ec7b2df68 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:39 -0700 Subject: eicon: fix sparse integer as NULL pointer warnings drivers/isdn/hardware/eicon/message.c:745:47: warning: Using plain integer as NULL pointer drivers/isdn/hardware/eicon/message.c:761:45: warning: Using plain integer as NULL pointer drivers/isdn/hardware/eicon/message.c:9122:16: warning: Using plain integer as NULL pointer drivers/isdn/hardware/eicon/message.c:9147:16: warning: Using plain integer as NULL pointer drivers/isdn/hardware/eicon/message.c:9173:14: warning: Using plain integer as NULL pointer drivers/isdn/hardware/eicon/divasmain.c:396:23: warning: Using plain integer as NULL pointer Signed-off-by: Harvey Harrison Cc: Karsten Keil Cc: Jeff Garzik Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/hardware/eicon/divasmain.c | 2 +- drivers/isdn/hardware/eicon/message.c | 12 ++++++------ 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/drivers/isdn/hardware/eicon/divasmain.c b/drivers/isdn/hardware/eicon/divasmain.c index 6d39f9360766..5fcbdccd7a53 100644 --- a/drivers/isdn/hardware/eicon/divasmain.c +++ b/drivers/isdn/hardware/eicon/divasmain.c @@ -393,7 +393,7 @@ void diva_free_dma_map(void *hdev, struct _diva_dma_map_entry *pmap) dma_addr_t dma_handle; void *addr_handle; - for (i = 0; (pmap != 0); i++) { + for (i = 0; (pmap != NULL); i++) { 
diva_get_dma_map_entry(pmap, i, &cpu_addr, &phys_addr); if (!cpu_addr) { break; diff --git a/drivers/isdn/hardware/eicon/message.c b/drivers/isdn/hardware/eicon/message.c index 1ff98e7eb794..599fed88222d 100644 --- a/drivers/isdn/hardware/eicon/message.c +++ b/drivers/isdn/hardware/eicon/message.c @@ -742,7 +742,7 @@ static void start_internal_command (dword Id, PLCI *plci, t_std_internal_comma else { i = 1; - while (plci->internal_command_queue[i] != 0) + while (plci->internal_command_queue[i] != NULL) i++; plci->internal_command_queue[i] = command_function; } @@ -758,7 +758,7 @@ static void next_internal_command (dword Id, PLCI *plci) plci->internal_command = 0; plci->internal_command_queue[0] = NULL; - while (plci->internal_command_queue[1] != 0) + while (plci->internal_command_queue[1] != NULL) { for (i = 0; i < MAX_INTERNAL_COMMAND_LEVELS - 1; i++) plci->internal_command_queue[i] = plci->internal_command_queue[i+1]; @@ -9119,7 +9119,7 @@ word AdvCodecSupport(DIVA_CAPI_ADAPTER *a, PLCI *plci, APPL *appl, byte ho dbug(1,dprintf("AdvSigPlci=0x%x",a->AdvSignalPLCI)); return 0x2001; /* codec in use by another application */ } - if(plci!=0) + if(plci!=NULL) { a->AdvSignalPLCI = plci; plci->tel=ADV_VOICE; @@ -9144,7 +9144,7 @@ word AdvCodecSupport(DIVA_CAPI_ADAPTER *a, PLCI *plci, APPL *appl, byte ho } /* indicate D-ch connect if */ } /* codec is connected OK */ - if(plci!=0) + if(plci!=NULL) { a->AdvSignalPLCI = plci; plci->tel=ADV_VOICE; @@ -9170,7 +9170,7 @@ word AdvCodecSupport(DIVA_CAPI_ADAPTER *a, PLCI *plci, APPL *appl, byte ho { if(hook_listen) return 0x300B; /* Facility not supported */ /* no hook with SCOM */ - if(plci!=0) plci->tel = CODEC; + if(plci!=NULL) plci->tel = CODEC; dbug(1,dprintf("S/SCOM codec")); /* first time we use the scom-s codec we must shut down the internal */ /* handset application of the card. 
This can be done by an assign with */ @@ -14604,7 +14604,7 @@ static void channel_xmit_extended_xon (PLCI * plci) { int max_ch = ARRAY_SIZE(a->ch_flow_control); int i, one_requested = 0; - if ((!plci) || (!plci->Id) || ((a = plci->adapter) == 0)) { + if ((!plci) || (!plci->Id) || ((a = plci->adapter) == NULL)) { return; } -- cgit v1.2.3 From 156f1ed640170d70c9fc8e5f6f797ea1f2a1653b Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:40 -0700 Subject: isdn: replace remaining __FUNCTION__ occurrences __FUNCTION__ is gcc-specific, use __func__ Signed-off-by: Harvey Harrison Cc: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/capi/capidrv.c | 4 ++-- drivers/isdn/capi/capilib.c | 2 +- drivers/isdn/capi/kcapi.c | 4 ++-- drivers/isdn/capi/kcapi.h | 2 +- drivers/isdn/hisax/asuscom.c | 2 +- drivers/isdn/hisax/avm_pci.c | 2 +- drivers/isdn/hisax/diva.c | 2 +- drivers/isdn/hisax/elsa.c | 2 +- drivers/isdn/hisax/hfc_sx.c | 2 +- drivers/isdn/hisax/hfc_usb.c | 6 +++--- drivers/isdn/hisax/hfcscard.c | 2 +- drivers/isdn/hisax/hisax_debug.h | 6 +++--- drivers/isdn/hisax/hisax_fcpcipnp.c | 2 +- drivers/isdn/hisax/ix1_micro.c | 2 +- drivers/isdn/hisax/niccy.c | 2 +- drivers/isdn/hisax/sedlbauer.c | 2 +- drivers/isdn/hisax/st5481.h | 10 +++++----- drivers/isdn/hisax/teles3.c | 2 +- drivers/isdn/i4l/isdn_common.c | 2 +- drivers/isdn/i4l/isdn_net.h | 6 +++--- drivers/isdn/i4l/isdn_ppp.c | 32 ++++++++++++++++---------------- drivers/isdn/i4l/isdn_tty.c | 6 +++--- 22 files changed, 51 insertions(+), 51 deletions(-) diff --git a/drivers/isdn/capi/capidrv.c b/drivers/isdn/capi/capidrv.c index 2e602dd07ffa..d5b4cc357a3c 100644 --- a/drivers/isdn/capi/capidrv.c +++ b/drivers/isdn/capi/capidrv.c @@ -1388,12 +1388,12 @@ static void capidrv_recv_message(struct capi20_appl *ap, struct sk_buff *skb) _cdebbuf *cdb = capi_cmsg2str(&s_cmsg); if (cdb) { - printk(KERN_DEBUG "%s: applid=%d %s\n", __FUNCTION__, + printk(KERN_DEBUG "%s: 
applid=%d %s\n", __func__, ap->applid, cdb->buf); cdebbuf_free(cdb); } else printk(KERN_DEBUG "%s: applid=%d %s not traced\n", - __FUNCTION__, ap->applid, + __func__, ap->applid, capi_cmd2str(s_cmsg.Command, s_cmsg.Subcommand)); } if (s_cmsg.Command == CAPI_DATA_B3 diff --git a/drivers/isdn/capi/capilib.c b/drivers/isdn/capi/capilib.c index 34d8be2761c1..fcaa1241ee77 100644 --- a/drivers/isdn/capi/capilib.c +++ b/drivers/isdn/capi/capilib.c @@ -4,7 +4,7 @@ #include #define DBG(format, arg...) do { \ -printk(KERN_DEBUG "%s: " format "\n" , __FUNCTION__ , ## arg); \ +printk(KERN_DEBUG "%s: " format "\n" , __func__ , ## arg); \ } while (0) struct capilib_msgidqueue { diff --git a/drivers/isdn/capi/kcapi.c b/drivers/isdn/capi/kcapi.c index ef6de217b9fc..063de5a29fcf 100644 --- a/drivers/isdn/capi/kcapi.c +++ b/drivers/isdn/capi/kcapi.c @@ -154,7 +154,7 @@ static void register_appl(struct capi_ctr *card, u16 applid, capi_register_param if (card) card->register_appl(card, applid, rparam); else - printk(KERN_WARNING "%s: cannot get card resources\n", __FUNCTION__); + printk(KERN_WARNING "%s: cannot get card resources\n", __func__); } @@ -178,7 +178,7 @@ static void notify_up(u32 contr) printk(KERN_DEBUG "kcapi: notify up contr %d\n", contr); } if (!card) { - printk(KERN_WARNING "%s: invalid contr %d\n", __FUNCTION__, contr); + printk(KERN_WARNING "%s: invalid contr %d\n", __func__, contr); return; } for (applid = 1; applid <= CAPI_MAXAPPL; applid++) { diff --git a/drivers/isdn/capi/kcapi.h b/drivers/isdn/capi/kcapi.h index 1cb2c40f9921..244711f7f838 100644 --- a/drivers/isdn/capi/kcapi.h +++ b/drivers/isdn/capi/kcapi.h @@ -17,7 +17,7 @@ #ifdef KCAPI_DEBUG #define DBG(format, arg...) do { \ -printk(KERN_DEBUG "%s: " format "\n" , __FUNCTION__ , ## arg); \ +printk(KERN_DEBUG "%s: " format "\n" , __func__ , ## arg); \ } while (0) #else #define DBG(format, arg...) 
/* */ diff --git a/drivers/isdn/hisax/asuscom.c b/drivers/isdn/hisax/asuscom.c index b96f3184c2e5..1f879b500d83 100644 --- a/drivers/isdn/hisax/asuscom.c +++ b/drivers/isdn/hisax/asuscom.c @@ -344,7 +344,7 @@ setup_asuscom(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/avm_pci.c b/drivers/isdn/hisax/avm_pci.c index 0f1db1f669b2..7cabc5a19492 100644 --- a/drivers/isdn/hisax/avm_pci.c +++ b/drivers/isdn/hisax/avm_pci.c @@ -797,7 +797,7 @@ static int __devinit avm_pnp_setup(struct IsdnCardState *cs) err = pnp_activate_dev(pnp_avm_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } cs->hw.avm.cfg_reg = diff --git a/drivers/isdn/hisax/diva.c b/drivers/isdn/hisax/diva.c index 2d670856d141..018bd293e580 100644 --- a/drivers/isdn/hisax/diva.c +++ b/drivers/isdn/hisax/diva.c @@ -1088,7 +1088,7 @@ static int __devinit setup_diva_isapnp(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/elsa.c b/drivers/isdn/hisax/elsa.c index 2c3691fda300..aa29d1cf16af 100644 --- a/drivers/isdn/hisax/elsa.c +++ b/drivers/isdn/hisax/elsa.c @@ -937,7 +937,7 @@ setup_elsa_isapnp(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/hfc_sx.c b/drivers/isdn/hisax/hfc_sx.c index f4a213877e35..d92e8d6c2ae2 100644 --- a/drivers/isdn/hisax/hfc_sx.c +++ b/drivers/isdn/hisax/hfc_sx.c @@ -1417,7 +1417,7 @@ setup_hfcsx(struct IsdnCard *card) err = 
pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/hfc_usb.c b/drivers/isdn/hisax/hfc_usb.c index 98b0149bca68..8df889b0c1a9 100644 --- a/drivers/isdn/hisax/hfc_usb.c +++ b/drivers/isdn/hisax/hfc_usb.c @@ -905,7 +905,7 @@ rx_int_complete(struct urb *urb) if (status) { printk(KERN_INFO "HFC-S USB: %s error resubmitting URB fifo(%d)\n", - __FUNCTION__, fifon); + __func__, fifon); } } @@ -1543,14 +1543,14 @@ hfc_usb_disconnect(struct usb_interface *intf) stop_isoc_chain(&context->fifos[i]); DBG(HFCUSB_DBG_INIT, "HFC-S USB: %s stopping ISOC chain Fifo(%i)", - __FUNCTION__, i); + __func__, i); } } else { if (context->fifos[i].active > 0) { context->fifos[i].active = 0; DBG(HFCUSB_DBG_INIT, "HFC-S USB: %s unlinking URB for Fifo(%i)", - __FUNCTION__, i); + __func__, i); } usb_kill_urb(context->fifos[i].urb); usb_free_urb(context->fifos[i].urb); diff --git a/drivers/isdn/hisax/hfcscard.c b/drivers/isdn/hisax/hfcscard.c index 909d6709ec16..cf082665cc8b 100644 --- a/drivers/isdn/hisax/hfcscard.c +++ b/drivers/isdn/hisax/hfcscard.c @@ -193,7 +193,7 @@ setup_hfcs(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/hisax_debug.h b/drivers/isdn/hisax/hisax_debug.h index ceafecdb1037..5ed3b1c44184 100644 --- a/drivers/isdn/hisax/hisax_debug.h +++ b/drivers/isdn/hisax/hisax_debug.h @@ -27,14 +27,14 @@ #define DBG(level, format, arg...) 
do { \ if (level & __debug_variable) \ -printk(KERN_DEBUG "%s: " format "\n" , __FUNCTION__ , ## arg); \ +printk(KERN_DEBUG "%s: " format "\n" , __func__ , ## arg); \ } while (0) #define DBG_PACKET(level,data,count) \ - if (level & __debug_variable) dump_packet(__FUNCTION__,data,count) + if (level & __debug_variable) dump_packet(__func__,data,count) #define DBG_SKB(level,skb) \ - if ((level & __debug_variable) && skb) dump_packet(__FUNCTION__,skb->data,skb->len) + if ((level & __debug_variable) && skb) dump_packet(__func__,skb->data,skb->len) static void __attribute__((unused)) diff --git a/drivers/isdn/hisax/hisax_fcpcipnp.c b/drivers/isdn/hisax/hisax_fcpcipnp.c index 76043dedba5b..539b2e0c8254 100644 --- a/drivers/isdn/hisax/hisax_fcpcipnp.c +++ b/drivers/isdn/hisax/hisax_fcpcipnp.c @@ -935,7 +935,7 @@ static int __devinit fcpnp_probe(struct pnp_dev *pdev, const struct pnp_device_i pnp_disable_dev(pdev); retval = pnp_activate_dev(pdev); if (retval < 0) { - printk(KERN_WARNING "%s: pnp_activate_dev(%s) ret(%d)\n", __FUNCTION__, + printk(KERN_WARNING "%s: pnp_activate_dev(%s) ret(%d)\n", __func__, (char *)dev_id->driver_data, retval); goto err_free; } diff --git a/drivers/isdn/hisax/ix1_micro.c b/drivers/isdn/hisax/ix1_micro.c index 2d18d4f1e57e..a92bf0d2cab2 100644 --- a/drivers/isdn/hisax/ix1_micro.c +++ b/drivers/isdn/hisax/ix1_micro.c @@ -252,7 +252,7 @@ setup_ix1micro(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/niccy.c b/drivers/isdn/hisax/niccy.c index 421b8e6763d7..ef00633e1d2a 100644 --- a/drivers/isdn/hisax/niccy.c +++ b/drivers/isdn/hisax/niccy.c @@ -255,7 +255,7 @@ int __devinit setup_niccy(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err < 0) { printk(KERN_WARNING "%s: pnp_activate_dev " - "ret(%d)\n", __FUNCTION__, err); + "ret(%d)\n", 
__func__, err); return 0; } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/sedlbauer.c b/drivers/isdn/hisax/sedlbauer.c index 95425f3d2220..a10dfa82c734 100644 --- a/drivers/isdn/hisax/sedlbauer.c +++ b/drivers/isdn/hisax/sedlbauer.c @@ -555,7 +555,7 @@ setup_sedlbauer_isapnp(struct IsdnCard *card, int *bytecnt) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[1] = pnp_port_start(pnp_d, 0); diff --git a/drivers/isdn/hisax/st5481.h b/drivers/isdn/hisax/st5481.h index 04416bad611d..2044e7173ab4 100644 --- a/drivers/isdn/hisax/st5481.h +++ b/drivers/isdn/hisax/st5481.h @@ -218,13 +218,13 @@ enum { #define L1_EVENT_COUNT (EV_TIMER3 + 1) #define ERR(format, arg...) \ -printk(KERN_ERR "%s:%s: " format "\n" , __FILE__, __FUNCTION__ , ## arg) +printk(KERN_ERR "%s:%s: " format "\n" , __FILE__, __func__ , ## arg) #define WARN(format, arg...) \ -printk(KERN_WARNING "%s:%s: " format "\n" , __FILE__, __FUNCTION__ , ## arg) +printk(KERN_WARNING "%s:%s: " format "\n" , __FILE__, __func__ , ## arg) #define INFO(format, arg...) \ -printk(KERN_INFO "%s:%s: " format "\n" , __FILE__, __FUNCTION__ , ## arg) +printk(KERN_INFO "%s:%s: " format "\n" , __FILE__, __func__ , ## arg) #include "isdnhdlc.h" #include "fsm.h" @@ -406,7 +406,7 @@ struct st5481_adapter { /* * Submit an URB with error reporting. This is a macro so - * the __FUNCTION__ returns the caller function name. + * the __func__ returns the caller function name. 
*/ #define SUBMIT_URB(urb, mem_flags) \ ({ \ @@ -470,7 +470,7 @@ extern int st5481_debug; #ifdef CONFIG_HISAX_DEBUG #define DBG_ISO_PACKET(level,urb) \ - if (level & __debug_variable) dump_iso_packet(__FUNCTION__,urb) + if (level & __debug_variable) dump_iso_packet(__func__,urb) static void __attribute__((unused)) dump_iso_packet(const char *name, struct urb *urb) diff --git a/drivers/isdn/hisax/teles3.c b/drivers/isdn/hisax/teles3.c index 6a5e379e0774..5dc9f1a43629 100644 --- a/drivers/isdn/hisax/teles3.c +++ b/drivers/isdn/hisax/teles3.c @@ -301,7 +301,7 @@ setup_teles3(struct IsdnCard *card) err = pnp_activate_dev(pnp_d); if (err<0) { printk(KERN_WARNING "%s: pnp_activate_dev ret(%d)\n", - __FUNCTION__, err); + __func__, err); return(0); } card->para[3] = pnp_port_start(pnp_d, 2); diff --git a/drivers/isdn/i4l/isdn_common.c b/drivers/isdn/i4l/isdn_common.c index d4ad6992f776..0f3c66de69bc 100644 --- a/drivers/isdn/i4l/isdn_common.c +++ b/drivers/isdn/i4l/isdn_common.c @@ -1924,7 +1924,7 @@ isdn_free_channel(int di, int ch, int usage) if ((di < 0) || (ch < 0)) { printk(KERN_WARNING "%s: called with invalid drv(%d) or channel(%d)\n", - __FUNCTION__, di, ch); + __func__, di, ch); return; } for (i = 0; i < ISDN_MAX_CHANNELS; i++) diff --git a/drivers/isdn/i4l/isdn_net.h b/drivers/isdn/i4l/isdn_net.h index bc2f0dd962ea..be4949715d55 100644 --- a/drivers/isdn/i4l/isdn_net.h +++ b/drivers/isdn/i4l/isdn_net.h @@ -108,7 +108,7 @@ static __inline__ void isdn_net_add_to_bundle(isdn_net_dev *nd, isdn_net_local * lp = nd->queue; // printk(KERN_DEBUG "%s: lp:%s(%p) nlp:%s(%p) last(%p)\n", -// __FUNCTION__, lp->name, lp, nlp->name, nlp, lp->last); +// __func__, lp->name, lp, nlp->name, nlp, lp->last); nlp->last = lp->last; lp->last->next = nlp; lp->last = nlp; @@ -129,7 +129,7 @@ static __inline__ void isdn_net_rm_from_bundle(isdn_net_local *lp) master_lp = (isdn_net_local *) lp->master->priv; // printk(KERN_DEBUG "%s: lp:%s(%p) mlp:%s(%p) last(%p) next(%p) mndq(%p)\n", -// 
__FUNCTION__, lp->name, lp, master_lp->name, master_lp, lp->last, lp->next, master_lp->netdev->queue); +// __func__, lp->name, lp, master_lp->name, master_lp, lp->last, lp->next, master_lp->netdev->queue); spin_lock_irqsave(&master_lp->netdev->queue_lock, flags); lp->last->next = lp->next; lp->next->last = lp->last; @@ -141,7 +141,7 @@ static __inline__ void isdn_net_rm_from_bundle(isdn_net_local *lp) } lp->next = lp->last = lp; /* (re)set own pointers */ // printk(KERN_DEBUG "%s: mndq(%p)\n", -// __FUNCTION__, master_lp->netdev->queue); +// __func__, master_lp->netdev->queue); spin_unlock_irqrestore(&master_lp->netdev->queue_lock, flags); } diff --git a/drivers/isdn/i4l/isdn_ppp.c b/drivers/isdn/i4l/isdn_ppp.c index 9f5fe372f83d..127cfdad68e7 100644 --- a/drivers/isdn/i4l/isdn_ppp.c +++ b/drivers/isdn/i4l/isdn_ppp.c @@ -110,7 +110,7 @@ isdn_ppp_free(isdn_net_local * lp) if (lp->ppp_slot < 0 || lp->ppp_slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: ppp_slot(%d) out of range\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return 0; } @@ -127,7 +127,7 @@ isdn_ppp_free(isdn_net_local * lp) #endif /* CONFIG_ISDN_MPP */ if (lp->ppp_slot < 0 || lp->ppp_slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: ppp_slot(%d) now invalid\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return 0; } is = ippp_table[lp->ppp_slot]; @@ -226,7 +226,7 @@ isdn_ppp_wakeup_daemon(isdn_net_local * lp) { if (lp->ppp_slot < 0 || lp->ppp_slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: ppp_slot(%d) out of range\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return; } ippp_table[lp->ppp_slot]->state = IPPP_OPEN | IPPP_CONNECT | IPPP_NOBLOCK; @@ -245,7 +245,7 @@ isdn_ppp_closewait(int slot) if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: slot(%d) out of range\n", - __FUNCTION__, slot); + __func__, slot); return 0; } is = ippp_table[slot]; @@ -343,7 +343,7 @@ isdn_ppp_release(int min, struct file *file) is = file->private_data; if (!is) 
{ - printk(KERN_ERR "%s: no file->private_data\n", __FUNCTION__); + printk(KERN_ERR "%s: no file->private_data\n", __func__); return; } if (is->debug & 0x1) @@ -353,7 +353,7 @@ isdn_ppp_release(int min, struct file *file) isdn_net_dev *p = is->lp->netdev; if (!p) { - printk(KERN_ERR "%s: no lp->netdev\n", __FUNCTION__); + printk(KERN_ERR "%s: no lp->netdev\n", __func__); return; } is->state &= ~IPPP_CONNECT; /* -> effect: no call of wakeup */ @@ -1080,7 +1080,7 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff printk(KERN_DEBUG "isdn_ppp: VJC_UNCOMP\n"); if (net_dev->local->ppp_slot < 0) { printk(KERN_ERR "%s: net_dev->local->ppp_slot(%d) out of range\n", - __FUNCTION__, net_dev->local->ppp_slot); + __func__, net_dev->local->ppp_slot); goto drop_packet; } if (slhc_remember(ippp_table[net_dev->local->ppp_slot]->slcomp, skb->data, skb->len) <= 0) { @@ -1107,7 +1107,7 @@ isdn_ppp_push_higher(isdn_net_dev * net_dev, isdn_net_local * lp, struct sk_buff skb_old->len); if (net_dev->local->ppp_slot < 0) { printk(KERN_ERR "%s: net_dev->local->ppp_slot(%d) out of range\n", - __FUNCTION__, net_dev->local->ppp_slot); + __func__, net_dev->local->ppp_slot); goto drop_packet; } pkt_len = slhc_uncompress(ippp_table[net_dev->local->ppp_slot]->slcomp, @@ -1553,7 +1553,7 @@ static int isdn_ppp_mp_init( isdn_net_local * lp, ippp_bundle * add_to ) if (lp->ppp_slot < 0) { printk(KERN_ERR "%s: lp->ppp_slot(%d) out of range\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return(-EINVAL); } @@ -1604,7 +1604,7 @@ static void isdn_ppp_mp_receive(isdn_net_dev * net_dev, isdn_net_local * lp, slot = lp->ppp_slot; if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: lp->ppp_slot(%d)\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); stats->frame_drops++; dev_kfree_skb(skb); spin_unlock_irqrestore(&mp->lock, flags); @@ -1641,7 +1641,7 @@ static void isdn_ppp_mp_receive(isdn_net_dev * net_dev, isdn_net_local * lp, slot = 
lpq->ppp_slot; if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: lpq->ppp_slot(%d)\n", - __FUNCTION__, lpq->ppp_slot); + __func__, lpq->ppp_slot); } else { u32 lls = ippp_table[slot]->last_link_seqno; if (MP_LT(lls, minseq)) @@ -1875,7 +1875,7 @@ void isdn_ppp_mp_reassembly( isdn_net_dev * net_dev, isdn_net_local * lp, if (lp->ppp_slot < 0 || lp->ppp_slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: lp->ppp_slot(%d) out of range\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return; } if( MP_FLAGS(from) == (MP_BEGIN_FRAG | MP_END_FRAG) ) { @@ -2655,7 +2655,7 @@ static void isdn_ppp_receive_ccp(isdn_net_dev *net_dev, isdn_net_local *lp, lp->ppp_slot); if (lp->ppp_slot < 0 || lp->ppp_slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: lp->ppp_slot(%d) out of range\n", - __FUNCTION__, lp->ppp_slot); + __func__, lp->ppp_slot); return; } is = ippp_table[lp->ppp_slot]; @@ -2665,7 +2665,7 @@ static void isdn_ppp_receive_ccp(isdn_net_dev *net_dev, isdn_net_local *lp, int slot = ((isdn_net_local *) (lp->master->priv))->ppp_slot; if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: slot(%d) out of range\n", - __FUNCTION__, slot); + __func__, slot); return; } mis = ippp_table[slot]; @@ -2829,7 +2829,7 @@ static void isdn_ppp_send_ccp(isdn_net_dev *net_dev, isdn_net_local *lp, struct return; if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: lp->ppp_slot(%d) out of range\n", - __FUNCTION__, slot); + __func__, slot); return; } is = ippp_table[slot]; @@ -2852,7 +2852,7 @@ static void isdn_ppp_send_ccp(isdn_net_dev *net_dev, isdn_net_local *lp, struct slot = ((isdn_net_local *) (lp->master->priv))->ppp_slot; if (slot < 0 || slot >= ISDN_MAX_CHANNELS) { printk(KERN_ERR "%s: slot(%d) out of range\n", - __FUNCTION__, slot); + __func__, slot); return; } mis = ippp_table[slot]; diff --git a/drivers/isdn/i4l/isdn_tty.c b/drivers/isdn/i4l/isdn_tty.c index 133eb18e65cc..8af0df1d5b8c 100644 --- a/drivers/isdn/i4l/isdn_tty.c +++ 
b/drivers/isdn/i4l/isdn_tty.c @@ -1347,7 +1347,7 @@ isdn_tty_tiocmget(struct tty_struct *tty, struct file *file) modem_info *info = (modem_info *) tty->driver_data; u_char control, status; - if (isdn_tty_paranoia_check(info, tty->name, __FUNCTION__)) + if (isdn_tty_paranoia_check(info, tty->name, __func__)) return -ENODEV; if (tty->flags & (1 << TTY_IO_ERROR)) return -EIO; @@ -1372,7 +1372,7 @@ isdn_tty_tiocmset(struct tty_struct *tty, struct file *file, { modem_info *info = (modem_info *) tty->driver_data; - if (isdn_tty_paranoia_check(info, tty->name, __FUNCTION__)) + if (isdn_tty_paranoia_check(info, tty->name, __func__)) return -ENODEV; if (tty->flags & (1 << TTY_IO_ERROR)) return -EIO; @@ -1608,7 +1608,7 @@ isdn_tty_open(struct tty_struct *tty, struct file *filp) if (isdn_tty_paranoia_check(info, tty->name, "isdn_tty_open")) return -ENODEV; if (!try_module_get(info->owner)) { - printk(KERN_WARNING "%s: cannot reserve module\n", __FUNCTION__); + printk(KERN_WARNING "%s: cannot reserve module\n", __func__); return -ENODEV; } #ifdef ISDN_DEBUG_MODEM_OPEN -- cgit v1.2.3 From f3429545d03a553c6a3e9fcf60ddea31819848ad Mon Sep 17 00:00:00 2001 From: "Robert P. J. Day" Date: Mon, 28 Apr 2008 02:14:40 -0700 Subject: isdn: fix obvious cut-and-paste error in st5481_usb.c Fix a rather obvious cut-and-paste error, where earlier code for the controller URB got somehow mixed in with code for the interrupt URB. Signed-off-by: Robert P. J. 
Day Cc: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/hisax/st5481_usb.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/isdn/hisax/st5481_usb.c b/drivers/isdn/hisax/st5481_usb.c index 4ada66b8b679..427a8b0520f5 100644 --- a/drivers/isdn/hisax/st5481_usb.c +++ b/drivers/isdn/hisax/st5481_usb.c @@ -342,7 +342,7 @@ void st5481_release_usb(struct st5481_adapter *adapter) usb_kill_urb(intr->urb); kfree(intr->urb->transfer_buffer); usb_free_urb(intr->urb); - ctrl->urb = NULL; + intr->urb = NULL; } /* -- cgit v1.2.3 From 30d55e71a81b1f5a8136f191dc9f4c21f18e77e6 Mon Sep 17 00:00:00 2001 From: Bjorn Helgaas Date: Mon, 28 Apr 2008 02:14:41 -0700 Subject: hisax: depend on CONFIG_PNP, not __ISAPNP__ The PNP driver interfaces depend on CONFIG_PNP, so test that rather than __ISAPNP__. Signed-off-by: Bjorn Helgaas Cc: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/hisax/hisax_fcpcipnp.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/drivers/isdn/hisax/hisax_fcpcipnp.c b/drivers/isdn/hisax/hisax_fcpcipnp.c index 539b2e0c8254..c0b4db2f8364 100644 --- a/drivers/isdn/hisax/hisax_fcpcipnp.c +++ b/drivers/isdn/hisax/hisax_fcpcipnp.c @@ -68,7 +68,7 @@ static struct pci_device_id fcpci_ids[] = { MODULE_DEVICE_TABLE(pci, fcpci_ids); -#ifdef __ISAPNP__ +#ifdef CONFIG_PNP static struct pnp_device_id fcpnp_ids[] __devinitdata = { { .id = "AVM0900", @@ -914,7 +914,7 @@ static int __devinit fcpci_probe(struct pci_dev *pdev, return retval; } -#ifdef __ISAPNP__ +#ifdef CONFIG_PNP static int __devinit fcpnp_probe(struct pnp_dev *pdev, const struct pnp_device_id *dev_id) { struct fritz_adapter *adapter; @@ -974,6 +974,8 @@ static struct pnp_driver fcpnp_driver = { .remove = __devexit_p(fcpnp_remove), .id_table = fcpnp_ids, }; +#else +static struct pnp_driver fcpnp_driver; #endif static void __devexit fcpci_remove(struct pci_dev *pdev) @@ -1001,7 +1003,7 @@ 
static int __init hisax_fcpcipnp_init(void) retval = pci_register_driver(&fcpci_driver); if (retval) return retval; -#ifdef __ISAPNP__ +#ifdef CONFIG_PNP retval = pnp_register_driver(&fcpnp_driver); if (retval < 0) { pci_unregister_driver(&fcpci_driver); @@ -1013,7 +1015,7 @@ static int __init hisax_fcpcipnp_init(void) static void __exit hisax_fcpcipnp_exit(void) { -#ifdef __ISAPNP__ +#ifdef CONFIG_PNP pnp_unregister_driver(&fcpnp_driver); #endif pci_unregister_driver(&fcpci_driver); -- cgit v1.2.3 From c24e9b3fa3fdfca3834eba0bb217c8c197a43b7e Mon Sep 17 00:00:00 2001 From: Cyrill Gorcunov Date: Mon, 28 Apr 2008 02:14:41 -0700 Subject: capifs: fix memory leak on remount capifs_remount may reach 'return' statement without freeing of memory allocated by kstrdup call Signed-off-by: Cyrill Gorcunov Cc: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/capi/capifs.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/isdn/capi/capifs.c b/drivers/isdn/capi/capifs.c index eafe0e9daa7c..550e80f390a6 100644 --- a/drivers/isdn/capi/capifs.c +++ b/drivers/isdn/capi/capifs.c @@ -69,6 +69,7 @@ static int capifs_remount(struct super_block *s, int *flags, char *data) } else if (sscanf(this_char, "mode=%o%c", &n, &dummy) == 1) mode = n & ~S_IFMT; else { + kfree(new_opt); printk("capifs: called with bogus options\n"); return -EINVAL; } -- cgit v1.2.3 From 37772ac0fcc6728df47e6b0609766b7b77a8064b Mon Sep 17 00:00:00 2001 From: "Robert P. J. Day" Date: Mon, 28 Apr 2008 02:14:42 -0700 Subject: isdn: rename CONFIG_AVMB1_COMPAT to not look like a Kconfig variable Since CONFIG_AVMB1_COMPAT is not a Kconfig variable, move it out of the Kconfig namespace. Signed-off-by: Robert P. J. 
Day Acked-by: Karsten Keil Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/isdn/capi/kcapi.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/isdn/capi/kcapi.c b/drivers/isdn/capi/kcapi.c index 063de5a29fcf..75726ea0fbbd 100644 --- a/drivers/isdn/capi/kcapi.c +++ b/drivers/isdn/capi/kcapi.c @@ -10,7 +10,7 @@ * */ -#define CONFIG_AVMB1_COMPAT +#define AVMB1_COMPAT #include "kcapi.h" #include @@ -29,7 +29,7 @@ #include #include #include -#ifdef CONFIG_AVMB1_COMPAT +#ifdef AVMB1_COMPAT #include #endif #include @@ -740,7 +740,7 @@ u16 capi20_get_profile(u32 contr, struct capi_profile *profp) EXPORT_SYMBOL(capi20_get_profile); -#ifdef CONFIG_AVMB1_COMPAT +#ifdef AVMB1_COMPAT static int old_capi_manufacturer(unsigned int cmd, void __user *data) { avmb1_loadandconfigdef ldef; @@ -904,7 +904,7 @@ int capi20_manufacturer(unsigned int cmd, void __user *data) struct capi_ctr *card; switch (cmd) { -#ifdef CONFIG_AVMB1_COMPAT +#ifdef AVMB1_COMPAT case AVMB1_LOAD: case AVMB1_LOAD_AND_CONFIG: case AVMB1_RESETCARD: -- cgit v1.2.3 From 73fcdc9e15c27bb92595c611c8938a36645ea20d Mon Sep 17 00:00:00 2001 From: Ilpo Järvinen Date: Mon, 28 Apr 2008 02:14:43 -0700 Subject: i2o: remove static inline forward declarations MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Nothing in between of them and the later declaration with body needs them. 
Signed-off-by: Ilpo Järvinen Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/linux/i2o.h | 5 ----- 1 file changed, 5 deletions(-) diff --git a/include/linux/i2o.h b/include/linux/i2o.h index e92170dda245..f65e58a1d925 100644 --- a/include/linux/i2o.h +++ b/include/linux/i2o.h @@ -613,14 +613,9 @@ struct i2o_sys_tbl { extern struct list_head i2o_controllers; /* Message functions */ -static inline struct i2o_message *i2o_msg_get(struct i2o_controller *); extern struct i2o_message *i2o_msg_get_wait(struct i2o_controller *, int); -static inline void i2o_msg_post(struct i2o_controller *, struct i2o_message *); -static inline int i2o_msg_post_wait(struct i2o_controller *, - struct i2o_message *, unsigned long); extern int i2o_msg_post_wait_mem(struct i2o_controller *, struct i2o_message *, unsigned long, struct i2o_dma *); -static inline void i2o_flush_reply(struct i2o_controller *, u32); /* IOP functions */ extern int i2o_status_get(struct i2o_controller *); -- cgit v1.2.3 From 438d8908b379b6322fc3b28d45c9ebdddf58bc20 Mon Sep 17 00:00:00 2001 From: Guennadi Liakhovetski Date: Mon, 28 Apr 2008 02:14:44 -0700 Subject: gpiolib: better rmmod infrastructure As long as one or more GPIOs on a gpio chip are used its driver should not be unloaded. The existing mechanism (gpiochip_remove failure) doesn't address that, since rmmod can no longer be made to fail by having the cleanup code report errors. Module usecounts are the solution. Assuming standard "initialize struct to zero" policies, this change won't affect SOC platform drivers. However, drivers for external chips (on I2C and SPI busses) should be updated if they can be built as modules. 
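The usecount scheme described above can be modeled in plain userspace C. This is a sketch, not kernel code. `struct model_module`, `model_try_get()`, `model_put()` and the other `model_*` names are hypothetical stand-ins for `struct module`, `try_module_get()` and `module_put()`. A NULL owner behaves like built-in code and always succeeds, which is why zero-initialized SOC chips are unaffected.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
struct model_module {
	int refcnt;   /* models the module usecount */
	bool going;   /* set once unloading has begun */
};

struct model_chip {
	struct model_module *owner;   /* NULL models built-in (SOC) code */
};

struct model_desc {
	struct model_chip *chip;
	bool requested;
};

/* Models try_module_get(): a NULL owner always succeeds, and a
 * module that is already unloading cannot be pinned. */
static bool model_try_get(struct model_module *m)
{
	if (!m)
		return true;
	if (m->going)
		return false;
	m->refcnt++;
	return true;
}

/* Models module_put(): a no-op for a NULL owner. */
static void model_put(struct model_module *m)
{
	if (!m)
		return;
	assert(m->refcnt > 0);
	m->refcnt--;
}

/* Request path in the same order as the patch: pin the owner first,
 * then drop the reference again if the GPIO is already taken. */
static int model_request(struct model_desc *desc)
{
	if (!desc->chip || !model_try_get(desc->chip->owner))
		return -EINVAL;
	if (desc->requested) {
		model_put(desc->chip->owner);
		return -EBUSY;
	}
	desc->requested = true;
	return 0;
}

/* Free path: only a GPIO that was actually requested releases a reference. */
static void model_free(struct model_desc *desc)
{
	if (desc->chip && desc->requested) {
		desc->requested = false;
		model_put(desc->chip->owner);
	}
}

/* rmmod succeeds only when no requested GPIO still pins the module. */
static bool model_try_unload(struct model_module *m)
{
	if (m->refcnt != 0)
		return false;
	m->going = true;
	return true;
}
```

The ordering matters. Taking the reference before testing FLAG_REQUESTED means the failure path has to undo it, which is the `module_put()` the patch adds to the `-EBUSY` branch of gpio_request().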
Signed-off-by: Guennadi Liakhovetski [ gpio_ensure_requested() needs to update module usecounts too ] Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpio/gpiolib.c | 15 ++++++++++++--- include/asm-generic/gpio.h | 2 ++ 2 files changed, 14 insertions(+), 3 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index d8db2f8ee411..eb75d12e83b7 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -68,6 +68,9 @@ static void gpio_ensure_requested(struct gpio_desc *desc) if (test_and_set_bit(FLAG_REQUESTED, &desc->flags) == 0) { pr_warning("GPIO-%d autorequested\n", (int)(desc - gpio_desc)); desc_set_label(desc, "[auto]"); + if (!try_module_get(desc->chip->owner)) + pr_err("GPIO-%d: module can't be gotten \n", + (int)(desc - gpio_desc)); } } @@ -177,6 +180,9 @@ int gpio_request(unsigned gpio, const char *label) if (desc->chip == NULL) goto done; + if (!try_module_get(desc->chip->owner)) + goto done; + /* NOTE: gpio_request() can be called in early boot, * before IRQs are enabled. */ @@ -184,8 +190,10 @@ int gpio_request(unsigned gpio, const char *label) if (test_and_set_bit(FLAG_REQUESTED, &desc->flags) == 0) { desc_set_label(desc, label ? 
: "?"); status = 0; - } else + } else { status = -EBUSY; + module_put(desc->chip->owner); + } done: if (status) @@ -209,9 +217,10 @@ void gpio_free(unsigned gpio) spin_lock_irqsave(&gpio_lock, flags); desc = &gpio_desc[gpio]; - if (desc->chip && test_and_clear_bit(FLAG_REQUESTED, &desc->flags)) + if (desc->chip && test_and_clear_bit(FLAG_REQUESTED, &desc->flags)) { desc_set_label(desc, NULL); - else + module_put(desc->chip->owner); + } else WARN_ON(extra_checks); spin_unlock_irqrestore(&gpio_lock, flags); diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h index f29a502f4a6c..7e77b6ff45bb 100644 --- a/include/asm-generic/gpio.h +++ b/include/asm-generic/gpio.h @@ -17,6 +17,7 @@ #endif struct seq_file; +struct module; /** * struct gpio_chip - abstract a GPIO controller @@ -48,6 +49,7 @@ struct seq_file; */ struct gpio_chip { char *label; + struct module *owner; int (*direction_input)(struct gpio_chip *chip, unsigned offset); -- cgit v1.2.3 From d72cbed0c486e3db8b56380635f8e845073ce63a Mon Sep 17 00:00:00 2001 From: Guennadi Liakhovetski Date: Mon, 28 Apr 2008 02:14:45 -0700 Subject: gpiolib: i2c/spi drivers: handle rmmod better Use the newly introduced owner field in struct gpio_chip to protect the current (small) set of non-SOC GPIO drivers from being unloaded while any of their GPIOs are in use. 
Signed-off-by: Guennadi Liakhovetski [ add mcp23s08 and pcf857x ] Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpio/mcp23s08.c | 1 + drivers/gpio/pca953x.c | 1 + drivers/gpio/pcf857x.c | 1 + 3 files changed, 3 insertions(+) diff --git a/drivers/gpio/mcp23s08.c b/drivers/gpio/mcp23s08.c index bb60e8c1a1f0..7fb5b9d009d4 100644 --- a/drivers/gpio/mcp23s08.c +++ b/drivers/gpio/mcp23s08.c @@ -239,6 +239,7 @@ static int mcp23s08_probe(struct spi_device *spi) mcp->chip.base = pdata->base; mcp->chip.ngpio = 8; mcp->chip.can_sleep = 1; + mcp->chip.owner = THIS_MODULE; spi_set_drvdata(spi, mcp); diff --git a/drivers/gpio/pca953x.c b/drivers/gpio/pca953x.c index 6e72fd31184d..e0e0af536108 100644 --- a/drivers/gpio/pca953x.c +++ b/drivers/gpio/pca953x.c @@ -189,6 +189,7 @@ static void pca953x_setup_gpio(struct pca953x_chip *chip, int gpios) gc->base = chip->gpio_start; gc->ngpio = gpios; gc->label = chip->client->name; + gc->owner = THIS_MODULE; } static int __devinit pca953x_probe(struct i2c_client *client) diff --git a/drivers/gpio/pcf857x.c b/drivers/gpio/pcf857x.c index c6b3b5378384..1106aa15ac79 100644 --- a/drivers/gpio/pcf857x.c +++ b/drivers/gpio/pcf857x.c @@ -159,6 +159,7 @@ static int pcf857x_probe(struct i2c_client *client) gpio->chip.base = pdata->gpio_base; gpio->chip.can_sleep = 1; + gpio->chip.owner = THIS_MODULE; /* NOTE: the OnSemi jlc1562b is also largely compatible with * these parts, notably for output. It has a low-resolution -- cgit v1.2.3 From e6de1808f8ebfeb7e49f3c5a30cb8f2032beb287 Mon Sep 17 00:00:00 2001 From: Guennadi Liakhovetski Date: Mon, 28 Apr 2008 02:14:46 -0700 Subject: gpio: define gpio_is_valid() Introduce a gpio_is_valid() predicate; use it in gpiolib. 
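Below is a standalone copy of the gpiolib variant of the predicate. It uses a hypothetical board table (`struct board_led`, `count_wired_leds()`) to show the "negative errno means no GPIO" convention from the documentation hunk. The limit of 256 is the default that asm-generic/gpio.h uses when a platform does not define ARCH_NR_GPIOS.

```c
#include <assert.h>
#include <errno.h>

#define ARCH_NR_GPIOS 256   /* the asm-generic default shown in the patch */

/* The gpiolib variant from the patch: casting to unsigned maps every
 * negative number to a huge value, so a single comparison rejects
 * both negative numbers and numbers past the end of the table. */
static inline int gpio_is_valid(int number)
{
	return ((unsigned)number) < ARCH_NR_GPIOS;
}

/* Hypothetical board description: a negative errno marks "no GPIO". */
struct board_led {
	const char *name;
	int gpio;
};

/* Count the LEDs that actually have a usable GPIO number. */
static int count_wired_leds(const struct board_led *leds, int n)
{
	int wired = 0;

	assert(n >= 0);
	for (int i = 0; i < n; i++)
		if (gpio_is_valid(leds[i].gpio))
			wired++;
	return wired;
}
```

Without gpiolib, the patch falls back to the weaker `number >= 0` test. Under that fallback, the entry using GPIO 300 would also count as valid.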
Signed-off-by: Guennadi Liakhovetski [ use inline function; follow the gpio_* naming convention; work without gpiolib; all programming interfaces need docs ] Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/gpio.txt | 10 ++++++++++ drivers/gpio/gpiolib.c | 14 +++++++------- include/asm-generic/gpio.h | 12 ++++++++++++ 3 files changed, 29 insertions(+), 7 deletions(-) diff --git a/Documentation/gpio.txt b/Documentation/gpio.txt index 54630095aa3c..c35ca9e40d4c 100644 --- a/Documentation/gpio.txt +++ b/Documentation/gpio.txt @@ -107,6 +107,16 @@ type of GPIO controller, and on one particular board 80-95 with an FPGA. The numbers need not be contiguous; either of those platforms could also use numbers 2000-2063 to identify GPIOs in a bank of I2C GPIO expanders. +If you want to initialize a structure with an invalid GPIO number, use +some negative number (perhaps "-EINVAL"); that will never be valid. To +test if a number could reference a GPIO, you may use this predicate: + + int gpio_is_valid(int number); + +A number that's not valid will be rejected by calls which may request +or free GPIOs (see below). Other numbers may also be rejected; for +example, a number might be valid but unused on a given board. + Whether a platform supports multiple GPIO controllers is currently a platform-specific implementation issue. diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index eb75d12e83b7..623fcd9b547a 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -99,7 +99,7 @@ int gpiochip_add(struct gpio_chip *chip) * dynamic allocation. We don't currently support that. 
*/ - if (chip->base < 0 || (chip->base + chip->ngpio) >= ARCH_NR_GPIOS) { + if (chip->base < 0 || !gpio_is_valid(chip->base + chip->ngpio)) { status = -EINVAL; goto fail; } @@ -174,7 +174,7 @@ int gpio_request(unsigned gpio, const char *label) spin_lock_irqsave(&gpio_lock, flags); - if (gpio >= ARCH_NR_GPIOS) + if (!gpio_is_valid(gpio)) goto done; desc = &gpio_desc[gpio]; if (desc->chip == NULL) @@ -209,7 +209,7 @@ void gpio_free(unsigned gpio) unsigned long flags; struct gpio_desc *desc; - if (gpio >= ARCH_NR_GPIOS) { + if (!gpio_is_valid(gpio)) { WARN_ON(extra_checks); return; } @@ -245,7 +245,7 @@ const char *gpiochip_is_requested(struct gpio_chip *chip, unsigned offset) { unsigned gpio = chip->base + offset; - if (gpio >= ARCH_NR_GPIOS || gpio_desc[gpio].chip != chip) + if (!gpio_is_valid(gpio) || gpio_desc[gpio].chip != chip) return NULL; if (test_bit(FLAG_REQUESTED, &gpio_desc[gpio].flags) == 0) return NULL; @@ -276,7 +276,7 @@ int gpio_direction_input(unsigned gpio) spin_lock_irqsave(&gpio_lock, flags); - if (gpio >= ARCH_NR_GPIOS) + if (!gpio_is_valid(gpio)) goto fail; chip = desc->chip; if (!chip || !chip->get || !chip->direction_input) @@ -314,7 +314,7 @@ int gpio_direction_output(unsigned gpio, int value) spin_lock_irqsave(&gpio_lock, flags); - if (gpio >= ARCH_NR_GPIOS) + if (!gpio_is_valid(gpio)) goto fail; chip = desc->chip; if (!chip || !chip->set || !chip->direction_output) @@ -531,7 +531,7 @@ static int gpiolib_show(struct seq_file *s, void *unused) /* REVISIT this isn't locked against gpio_chip removal ... 
*/ - for (gpio = 0; gpio < ARCH_NR_GPIOS; gpio++) { + for (gpio = 0; gpio_is_valid(gpio); gpio++) { if (chip == gpio_desc[gpio].chip) continue; chip = gpio_desc[gpio].chip; diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h index 7e77b6ff45bb..464c5b334dc2 100644 --- a/include/asm-generic/gpio.h +++ b/include/asm-generic/gpio.h @@ -16,6 +16,12 @@ #define ARCH_NR_GPIOS 256 #endif +static inline int gpio_is_valid(int number) +{ + /* only some non-negative numbers are valid */ + return ((unsigned)number) < ARCH_NR_GPIOS; +} + struct seq_file; struct module; @@ -99,6 +105,12 @@ extern int __gpio_cansleep(unsigned gpio); #else +static inline int gpio_is_valid(int number) +{ + /* only non-negative numbers are valid */ + return number >= 0; +} + /* platforms that don't directly support access to GPIOs through I2C, SPI, * or other blocking infrastructure can use these wrappers. */ -- cgit v1.2.3 From 8d0aab2f16c4fa170f32e7a74a52cd0122bbafef Mon Sep 17 00:00:00 2001 From: Anton Vorontsov Date: Mon, 28 Apr 2008 02:14:46 -0700 Subject: gpiolib: dynamic gpio number allocation If gpio_chip->base is negative during registration, gpiolib performs dynamic base allocation. This is useful for devices that aren't always present, such as GPIOs on hotplugged devices rather than mainboards. (This behavior was previously specified but not implemented.) To avoid using any numbers that may have been explicitly assigned but not yet registered, this dynamic allocation assigns GPIO numbers from the biggest number on down, instead of from the smallest on up. 
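The top-down search can be modeled on a small table. `MODEL_NR_GPIOS`, `slot[]`, `model_register()` and `model_find_base()` are hypothetical names. The loop follows the same idea as the patch's gpiochip_find_base(). It counts consecutive free numbers starting from the top. When it reaches a registered chip, it jumps over that chip's whole range, because scanning downward always reaches a chip's highest number first.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MODEL_NR_GPIOS 16   /* small stand-in for ARCH_NR_GPIOS */

struct model_chip {
	int base;
	int ngpio;
};

/* slot[i] points at the chip that owns GPIO i, or NULL if it is free. */
static const struct model_chip *slot[MODEL_NR_GPIOS];

/* Register a chip at a fixed base (no overlap allowed in this model). */
static void model_register(const struct model_chip *chip)
{
	assert(chip->base >= 0 && chip->base + chip->ngpio <= MODEL_NR_GPIOS);
	for (int i = chip->base; i < chip->base + chip->ngpio; i++) {
		assert(slot[i] == NULL);
		slot[i] = chip;
	}
}

/* Return the lowest number of the highest free run of ngpio numbers,
 * or -ENOSPC if no such run exists. */
static int model_find_base(int ngpio)
{
	int spare = 0;

	for (int i = MODEL_NR_GPIOS - 1; i >= 0; i--) {
		const struct model_chip *chip = slot[i];

		if (!chip) {
			if (++spare == ngpio)
				return i;
		} else {
			spare = 0;
			/* i is this chip's highest number: land on its base,
			 * and the loop's i-- then steps just below it. */
			i -= chip->ngpio - 1;
		}
	}
	return -ENOSPC;
}
```

Scanning from the top leaves the low numbers free. Board code tends to assign those numbers statically to chips that have not registered yet.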
Signed-off-by: Anton Vorontsov Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpio/gpiolib.c | 52 +++++++++++++++++++++++++++++++++++++++++++------- 1 file changed, 45 insertions(+), 7 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 623fcd9b547a..2ba6127c4fae 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -80,6 +80,33 @@ static inline struct gpio_chip *gpio_to_chip(unsigned gpio) return gpio_desc[gpio].chip; } +/* dynamic allocation of GPIOs, e.g. on a hotplugged device */ +static int gpiochip_find_base(int ngpio) +{ + int i; + int spare = 0; + int base = -ENOSPC; + + for (i = ARCH_NR_GPIOS - 1; i >= 0 ; i--) { + struct gpio_chip *chip = gpio_desc[i].chip; + + if (!chip) { + spare++; + if (spare == ngpio) { + base = i; + break; + } + } else { + spare = 0; + i -= chip->ngpio - 1; + } + } + + if (gpio_is_valid(base)) + pr_debug("%s: found new base at %d\n", __func__, base); + return base; +} + /** * gpiochip_add() - register a gpio_chip * @chip: the chip to register, with chip->base initialized @@ -88,38 +115,49 @@ static inline struct gpio_chip *gpio_to_chip(unsigned gpio) * Returns a negative errno if the chip can't be registered, such as * because the chip->base is invalid or already associated with a * different chip. Otherwise it returns zero as a success code. + * + * If chip->base is negative, this requests dynamic assignment of + * a range of valid GPIOs. */ int gpiochip_add(struct gpio_chip *chip) { unsigned long flags; int status = 0; unsigned id; + int base = chip->base; - /* NOTE chip->base negative is reserved to mean a request for - * dynamic allocation. We don't currently support that. 
- */ - - if (chip->base < 0 || !gpio_is_valid(chip->base + chip->ngpio)) { + if ((!gpio_is_valid(base) || !gpio_is_valid(base + chip->ngpio)) + && base >= 0) { status = -EINVAL; goto fail; } spin_lock_irqsave(&gpio_lock, flags); + if (base < 0) { + base = gpiochip_find_base(chip->ngpio); + if (base < 0) { + status = base; + goto fail_unlock; + } + chip->base = base; + } + /* these GPIO numbers must not be managed by another gpio_chip */ - for (id = chip->base; id < chip->base + chip->ngpio; id++) { + for (id = base; id < base + chip->ngpio; id++) { if (gpio_desc[id].chip != NULL) { status = -EBUSY; break; } } if (status == 0) { - for (id = chip->base; id < chip->base + chip->ngpio; id++) { + for (id = base; id < base + chip->ngpio; id++) { gpio_desc[id].chip = chip; gpio_desc[id].flags = 0; } } +fail_unlock: spin_unlock_irqrestore(&gpio_lock, flags); fail: /* failures here can mean systems won't boot... */ -- cgit v1.2.3 From 169b6a7a6e91e1ea32136681b475cbaf2074bf35 Mon Sep 17 00:00:00 2001 From: Anton Vorontsov Date: Mon, 28 Apr 2008 02:14:47 -0700 Subject: gpiochip_reserve() Add a new function gpiochip_reserve() to reserve ranges of gpios that platform code has pre-allocated. That is, this marks gpio numbers which will be claimed by drivers that haven't yet been loaded, and thus are not available for dynamic gpio number allocation. 
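Reservation only matters if the dynamic search honors it. This sketch uses the hypothetical names `desc[]`, `model_reserve()` and `model_find_base()`. It adds a reserved bit next to the "owned by a chip" bit and skips both during the top-down search. Two details are this sketch's own, not the patch's. It validates the whole range before marking anything, so a failed call leaves no partial reservation. It also writes its bounds check directly instead of calling gpio_is_valid().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define MODEL_NR_GPIOS 16   /* small stand-in for ARCH_NR_GPIOS */

struct model_desc {
	bool has_chip;   /* number belongs to a registered chip */
	bool reserved;   /* number set aside by platform code */
};

static struct model_desc desc[MODEL_NR_GPIOS];

/* Reserve [start, start + ngpio): -EINVAL for a bad range, -EBUSY if
 * any number is already registered or reserved, 0 on success. */
static int model_reserve(int start, int ngpio)
{
	if (start < 0 || ngpio <= 0 || start + ngpio > MODEL_NR_GPIOS)
		return -EINVAL;

	/* First pass only checks, so failure changes nothing. */
	for (int i = start; i < start + ngpio; i++)
		if (desc[i].has_chip || desc[i].reserved)
			return -EBUSY;

	/* Second pass commits the reservation. */
	for (int i = start; i < start + ngpio; i++)
		desc[i].reserved = true;
	return 0;
}

/* Top-down search for ngpio consecutive numbers that are neither
 * registered nor reserved; returns the lowest number of the run. */
static int model_find_base(int ngpio)
{
	int spare = 0;

	assert(ngpio > 0);
	for (int i = MODEL_NR_GPIOS - 1; i >= 0; i--) {
		if (!desc[i].has_chip && !desc[i].reserved) {
			if (++spare == ngpio)
				return i;
		} else {
			spare = 0;
		}
	}
	return -ENOSPC;
}
```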
[akpm@linux-foundation.org: remove unneeded __must_check] [david-b@pacbell.net: don't export gpiochip_reserve (section fix)] Signed-off-by: Anton Vorontsov Signed-off-by: David Brownell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/gpio/gpiolib.c | 50 +++++++++++++++++++++++++++++++++++++++++++--- include/asm-generic/gpio.h | 1 + 2 files changed, 48 insertions(+), 3 deletions(-) diff --git a/drivers/gpio/gpiolib.c b/drivers/gpio/gpiolib.c index 2ba6127c4fae..24c62b848bf9 100644 --- a/drivers/gpio/gpiolib.c +++ b/drivers/gpio/gpiolib.c @@ -43,6 +43,7 @@ struct gpio_desc { /* flag symbols are bit numbers */ #define FLAG_REQUESTED 0 #define FLAG_IS_OUT 1 +#define FLAG_RESERVED 2 #ifdef CONFIG_DEBUG_FS const char *label; @@ -88,9 +89,10 @@ static int gpiochip_find_base(int ngpio) int base = -ENOSPC; for (i = ARCH_NR_GPIOS - 1; i >= 0 ; i--) { - struct gpio_chip *chip = gpio_desc[i].chip; + struct gpio_desc *desc = &gpio_desc[i]; + struct gpio_chip *chip = desc->chip; - if (!chip) { + if (!chip && !test_bit(FLAG_RESERVED, &desc->flags)) { spare++; if (spare == ngpio) { base = i; @@ -98,7 +100,8 @@ static int gpiochip_find_base(int ngpio) } } else { spare = 0; - i -= chip->ngpio - 1; + if (chip) + i -= chip->ngpio - 1; } } @@ -107,6 +110,47 @@ static int gpiochip_find_base(int ngpio) return base; } +/** + * gpiochip_reserve() - reserve range of gpios to use with platform code only + * @start: starting gpio number + * @ngpio: number of gpios to reserve + * Context: platform init, potentially before irqs or kmalloc will work + * + * Returns a negative errno if any gpio within the range is already reserved + * or registered, else returns zero as a success code. Use this function + * to mark a range of gpios as unavailable for dynamic gpio number allocation, + * for example because its driver support is not yet loaded. 
+ */ +int __init gpiochip_reserve(int start, int ngpio) +{ + int ret = 0; + unsigned long flags; + int i; + + if (!gpio_is_valid(start) || !gpio_is_valid(start + ngpio)) + return -EINVAL; + + spin_lock_irqsave(&gpio_lock, flags); + + for (i = start; i < start + ngpio; i++) { + struct gpio_desc *desc = &gpio_desc[i]; + + if (desc->chip || test_bit(FLAG_RESERVED, &desc->flags)) { + ret = -EBUSY; + goto err; + } + + set_bit(FLAG_RESERVED, &desc->flags); + } + + pr_debug("%s: reserved gpios from %d to %d\n", + __func__, start, start + ngpio - 1); +err: + spin_unlock_irqrestore(&gpio_lock, flags); + + return ret; +} + /** * gpiochip_add() - register a gpio_chip * @chip: the chip to register, with chip->base initialized diff --git a/include/asm-generic/gpio.h b/include/asm-generic/gpio.h index 464c5b334dc2..ecf675a59d21 100644 --- a/include/asm-generic/gpio.h +++ b/include/asm-generic/gpio.h @@ -74,6 +74,7 @@ struct gpio_chip { extern const char *gpiochip_is_requested(struct gpio_chip *chip, unsigned offset); +extern int __init __must_check gpiochip_reserve(int start, int ngpio); /* add/remove chips */ extern int gpiochip_add(struct gpio_chip *chip); -- cgit v1.2.3 From 6b745b6fd02213f4b2fef2f2635985929fc5b8cc Mon Sep 17 00:00:00 2001 From: Michal Januszewski Date: Mon, 28 Apr 2008 02:14:48 -0700 Subject: fbdev: make the best-fit section of fb_find_mode return the closest matching mode Currently, if a perfect match in terms of resolution is not found, fb_find_mode() only looks for a best-fit mode among modes with a higher resolution than the one requested. Thus, if the user requests a resolution higher than the largest supported one, they are dropped to the default mode (usually a low resolution one). Change this behaviour so that all valid video modes are considered when looking for a best-fit mode, while still preferring modes with a higher resolution. Signed-off-by: Michal Januszewski Cc: Krzysztof Helt Cc: "Antonino A. 
Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/modedb.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/drivers/video/modedb.c b/drivers/video/modedb.c index 08d072552233..640351c9a9e1 100644 --- a/drivers/video/modedb.c +++ b/drivers/video/modedb.c @@ -522,7 +522,7 @@ int fb_find_mode(struct fb_var_screeninfo *var, int res_specified = 0, bpp_specified = 0, refresh_specified = 0; unsigned int xres = 0, yres = 0, bpp = default_bpp, refresh = 0; int yres_specified = 0, cvt = 0, rb = 0, interlace = 0, margins = 0; - u32 best, diff; + u32 best, diff, tdiff; for (i = namelen-1; i >= 0; i--) { switch (name[i]) { @@ -651,19 +651,27 @@ done: return (refresh_specified) ? 2 : 1; } - diff = xres + yres; + diff = 2 * (xres + yres); best = -1; DPRINTK("Trying best-fit modes\n"); for (i = 0; i < dbsize; i++) { - if (xres <= db[i].xres && yres <= db[i].yres) { DPRINTK("Trying %ix%i\n", db[i].xres, db[i].yres); if (!fb_try_mode(var, info, &db[i], bpp)) { - if (diff > (db[i].xres - xres) + (db[i].yres - yres)) { - diff = (db[i].xres - xres) + (db[i].yres - yres); - best = i; - } + tdiff = abs(db[i].xres - xres) + + abs(db[i].yres - yres); + + /* + * Penalize modes with resolutions smaller + * than requested. + */ + if (xres > db[i].xres || yres > db[i].yres) + tdiff += xres + yres; + + if (diff > tdiff) { + diff = tdiff; + best = i; + } } - } } if (best != -1) { fb_try_mode(var, info, &db[best], bpp); -- cgit v1.2.3 From e4c690e061b909127ab0f12e929f82f3f39ec953 Mon Sep 17 00:00:00 2001 From: Anton Vorontsov Date: Mon, 28 Apr 2008 02:14:49 -0700 Subject: fb: add support for foreign endianness Add support for the framebuffers with non-native endianness. This is done via FBINFO_FOREIGN_ENDIAN flag that will be used by the drivers. Depending on the host endianness this flag will be overwritten by FBINFO_BE_MATH internal flag, or cleared. Tested to work on MPC8360E-RDK (BE) + Fujitsu MINT framebuffer (LE). 
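The flag handling described above can be sketched in user space as follows. This is a minimal illustration, not the kernel code. `struct fake_fb_info`, `resolve_endianness()` and the lower-case `fb_*` helpers are hypothetical stand-ins. The host byte order is passed in as a parameter instead of being tested with `__BIG_ENDIAN`. Only the CONFIG_FB_BOTH_ENDIAN case is modelled, where `fb_be_math()` reads FBINFO_BE_MATH at run time. The flag value is the one defined in include/linux/fb.h by this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values from the patch: both names deliberately share one bit. */
#define FBINFO_FOREIGN_ENDIAN 0x100000
#define FBINFO_BE_MATH        0x100000

/* Hypothetical stand-in for struct fb_info; only the flags matter here. */
struct fake_fb_info {
	int flags;
};

/*
 * Mirrors fb_check_foreignness(): drop the driver-supplied
 * FOREIGN_ENDIAN bit, then set BE_MATH when the framebuffer itself is
 * big-endian. host_is_be replaces the compile-time __BIG_ENDIAN test.
 */
static void resolve_endianness(struct fake_fb_info *fi, bool host_is_be)
{
	bool foreign = fi->flags & FBINFO_FOREIGN_ENDIAN;

	fi->flags &= ~FBINFO_FOREIGN_ENDIAN;
	if (host_is_be)
		fi->flags |= foreign ? 0 : FBINFO_BE_MATH;
	else
		fi->flags |= foreign ? FBINFO_BE_MATH : 0;
}

/* CONFIG_FB_BOTH_ENDIAN behaviour: decided per framebuffer at run time. */
static bool fb_be_math(const struct fake_fb_info *fi)
{
	return fi->flags & FBINFO_BE_MATH;
}

/* Run-time equivalents of FB_LEFT_POS, FB_SHIFT_HIGH and FB_SHIFT_LOW. */
static unsigned fb_left_pos(const struct fake_fb_info *fi, unsigned bpp)
{
	return fb_be_math(fi) ? 32 - bpp : 0;
}

static uint32_t fb_shift_high(const struct fake_fb_info *fi, uint32_t val,
			      unsigned bits)
{
	assert(bits < 32); /* shifting a 32-bit value by 32 is undefined in C */
	return fb_be_math(fi) ? val >> bits : val << bits;
}

static uint32_t fb_shift_low(const struct fake_fb_info *fi, uint32_t val,
			     unsigned bits)
{
	assert(bits < 32);
	return fb_be_math(fi) ? val << bits : val >> bits;
}
```

For the tested setup (big-endian host, little-endian framebuffer), the driver sets FBINFO_FOREIGN_ENDIAN and `resolve_endianness(&info, true)` leaves FBINFO_BE_MATH clear. An 8 bpp pixel value 0xab placed at bit offset 8 then lands at 0x0000ab00. A native big-endian framebuffer on the same host gets FBINFO_BE_MATH, and the same pixel lands at 0x00ab0000. Because the choice is made per `fb_info` rather than by the preprocessor, this is also why the patch turns the endian-specific `cfb_tab8`/`cfb_tab16` tables into `_be`/`_le` pairs selected with `fb_be_math(p)`.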
Signed-off-by: Anton Vorontsov Cc: "Antonino A. Daplas" Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Cc: Clemens Koller Cc: Krzysztof Helt Cc: Geert Uytterhoeven Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/Kconfig | 24 +++++++++++++++++++++ drivers/video/cfbcopyarea.c | 23 +++++++++++--------- drivers/video/cfbfillrect.c | 48 ++++++++++++++++++++++------------------- drivers/video/cfbimgblt.c | 52 ++++++++++++++++++++++----------------------- drivers/video/fb_draw.h | 31 +++++++++++++++------------ drivers/video/fbmem.c | 30 ++++++++++++++++++++++++++ drivers/video/syscopyarea.c | 20 ++++++++--------- drivers/video/sysfillrect.c | 49 +++++++++++++++++++++--------------------- drivers/video/sysimgblt.c | 47 +++++++++++++++++++--------------------- include/linux/fb.h | 44 ++++++++++++++++++++++++++++++-------- 10 files changed, 228 insertions(+), 140 deletions(-) diff --git a/drivers/video/Kconfig b/drivers/video/Kconfig index e3dc8f8d0c3e..da7b9705ca81 100644 --- a/drivers/video/Kconfig +++ b/drivers/video/Kconfig @@ -139,6 +139,30 @@ config FB_SYS_IMAGEBLIT blitting. This is used by drivers that don't provide their own (accelerated) version and the framebuffer is in system RAM. +menuconfig FB_FOREIGN_ENDIAN + bool "Framebuffer foreign endianness support" + depends on FB + ---help--- + This menu will let you enable support for the framebuffers with + non-native endianness (e.g. Little-Endian framebuffer on a + Big-Endian machine). Most probably you don't have such hardware, + so it's safe to say "n" here. 
+ +choice + prompt "Choice endianness support" + depends on FB_FOREIGN_ENDIAN + +config FB_BOTH_ENDIAN + bool "Support for Big- and Little-Endian framebuffers" + +config FB_BIG_ENDIAN + bool "Support for Big-Endian framebuffers only" + +config FB_LITTLE_ENDIAN + bool "Support for Little-Endian framebuffers only" + +endchoice + config FB_SYS_FOPS tristate depends on FB diff --git a/drivers/video/cfbcopyarea.c b/drivers/video/cfbcopyarea.c index b07e419b12d2..df03f3776dcc 100644 --- a/drivers/video/cfbcopyarea.c +++ b/drivers/video/cfbcopyarea.c @@ -44,15 +44,16 @@ */ static void -bitcpy(unsigned long __iomem *dst, int dst_idx, const unsigned long __iomem *src, - int src_idx, int bits, unsigned n, u32 bswapmask) +bitcpy(struct fb_info *p, unsigned long __iomem *dst, int dst_idx, + const unsigned long __iomem *src, int src_idx, int bits, + unsigned n, u32 bswapmask) { unsigned long first, last; int const shift = dst_idx-src_idx; int left, right; - first = fb_shifted_pixels_mask_long(dst_idx, bswapmask); - last = ~fb_shifted_pixels_mask_long((dst_idx+n) % bits, bswapmask); + first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask); + last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask); if (!shift) { // Same alignment for source and dest @@ -202,8 +203,9 @@ bitcpy(unsigned long __iomem *dst, int dst_idx, const unsigned long __iomem *src */ static void -bitcpy_rev(unsigned long __iomem *dst, int dst_idx, const unsigned long __iomem *src, - int src_idx, int bits, unsigned n, u32 bswapmask) +bitcpy_rev(struct fb_info *p, unsigned long __iomem *dst, int dst_idx, + const unsigned long __iomem *src, int src_idx, int bits, + unsigned n, u32 bswapmask) { unsigned long first, last; int shift; @@ -221,8 +223,9 @@ bitcpy_rev(unsigned long __iomem *dst, int dst_idx, const unsigned long __iomem shift = dst_idx-src_idx; - first = fb_shifted_pixels_mask_long(bits - 1 - dst_idx, bswapmask); - last = ~fb_shifted_pixels_mask_long(bits - 1 - ((dst_idx-n) % bits), 
bswapmask); + first = fb_shifted_pixels_mask_long(p, bits - 1 - dst_idx, bswapmask); + last = ~fb_shifted_pixels_mask_long(p, bits - 1 - ((dst_idx-n) % bits), + bswapmask); if (!shift) { // Same alignment for source and dest @@ -404,7 +407,7 @@ void cfb_copyarea(struct fb_info *p, const struct fb_copyarea *area) dst_idx &= (bytes - 1); src += src_idx >> (ffs(bits) - 1); src_idx &= (bytes - 1); - bitcpy_rev(dst, dst_idx, src, src_idx, bits, + bitcpy_rev(p, dst, dst_idx, src, src_idx, bits, width*p->var.bits_per_pixel, bswapmask); } } else { @@ -413,7 +416,7 @@ void cfb_copyarea(struct fb_info *p, const struct fb_copyarea *area) dst_idx &= (bytes - 1); src += src_idx >> (ffs(bits) - 1); src_idx &= (bytes - 1); - bitcpy(dst, dst_idx, src, src_idx, bits, + bitcpy(p, dst, dst_idx, src, src_idx, bits, width*p->var.bits_per_pixel, bswapmask); dst_idx += bits_per_line; src_idx += bits_per_line; diff --git a/drivers/video/cfbfillrect.c b/drivers/video/cfbfillrect.c index 23d70a12e4da..64b35766b2a2 100644 --- a/drivers/video/cfbfillrect.c +++ b/drivers/video/cfbfillrect.c @@ -36,16 +36,16 @@ */ static void -bitfill_aligned(unsigned long __iomem *dst, int dst_idx, unsigned long pat, - unsigned n, int bits, u32 bswapmask) +bitfill_aligned(struct fb_info *p, unsigned long __iomem *dst, int dst_idx, + unsigned long pat, unsigned n, int bits, u32 bswapmask) { unsigned long first, last; if (!n) return; - first = fb_shifted_pixels_mask_long(dst_idx, bswapmask); - last = ~fb_shifted_pixels_mask_long((dst_idx+n) % bits, bswapmask); + first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask); + last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask); if (dst_idx+n <= bits) { // Single word @@ -93,16 +93,16 @@ bitfill_aligned(unsigned long __iomem *dst, int dst_idx, unsigned long pat, */ static void -bitfill_unaligned(unsigned long __iomem *dst, int dst_idx, unsigned long pat, - int left, int right, unsigned n, int bits) +bitfill_unaligned(struct fb_info *p, unsigned 
long __iomem *dst, int dst_idx, + unsigned long pat, int left, int right, unsigned n, int bits) { unsigned long first, last; if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { // Single word @@ -147,8 +147,9 @@ bitfill_unaligned(unsigned long __iomem *dst, int dst_idx, unsigned long pat, * Aligned pattern invert using 32/64-bit memory accesses */ static void -bitfill_aligned_rev(unsigned long __iomem *dst, int dst_idx, unsigned long pat, - unsigned n, int bits, u32 bswapmask) +bitfill_aligned_rev(struct fb_info *p, unsigned long __iomem *dst, + int dst_idx, unsigned long pat, unsigned n, int bits, + u32 bswapmask) { unsigned long val = pat, dat; unsigned long first, last; @@ -156,8 +157,8 @@ bitfill_aligned_rev(unsigned long __iomem *dst, int dst_idx, unsigned long pat, if (!n) return; - first = fb_shifted_pixels_mask_long(dst_idx, bswapmask); - last = ~fb_shifted_pixels_mask_long((dst_idx+n) % bits, bswapmask); + first = fb_shifted_pixels_mask_long(p, dst_idx, bswapmask); + last = ~fb_shifted_pixels_mask_long(p, (dst_idx+n) % bits, bswapmask); if (dst_idx+n <= bits) { // Single word @@ -217,16 +218,17 @@ bitfill_aligned_rev(unsigned long __iomem *dst, int dst_idx, unsigned long pat, */ static void -bitfill_unaligned_rev(unsigned long __iomem *dst, int dst_idx, unsigned long pat, - int left, int right, unsigned n, int bits) +bitfill_unaligned_rev(struct fb_info *p, unsigned long __iomem *dst, + int dst_idx, unsigned long pat, int left, int right, + unsigned n, int bits) { unsigned long first, last, dat; if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { // Single word @@ -306,7 +308,8 @@ void 
cfb_fillrect(struct fb_info *p, const struct fb_fillrect *rect) p->fbops->fb_sync(p); if (!left) { u32 bswapmask = fb_compute_bswapmask(p); - void (*fill_op32)(unsigned long __iomem *dst, int dst_idx, + void (*fill_op32)(struct fb_info *p, + unsigned long __iomem *dst, int dst_idx, unsigned long pat, unsigned n, int bits, u32 bswapmask) = NULL; @@ -325,16 +328,17 @@ void cfb_fillrect(struct fb_info *p, const struct fb_fillrect *rect) while (height--) { dst += dst_idx >> (ffs(bits) - 1); dst_idx &= (bits - 1); - fill_op32(dst, dst_idx, pat, width*bpp, bits, bswapmask); + fill_op32(p, dst, dst_idx, pat, width*bpp, bits, + bswapmask); dst_idx += p->fix.line_length*8; } } else { int right; int r; int rot = (left-dst_idx) % bpp; - void (*fill_op)(unsigned long __iomem *dst, int dst_idx, - unsigned long pat, int left, int right, - unsigned n, int bits) = NULL; + void (*fill_op)(struct fb_info *p, unsigned long __iomem *dst, + int dst_idx, unsigned long pat, int left, + int right, unsigned n, int bits) = NULL; /* rotate pattern to correct start position */ pat = pat << rot | pat >> (bpp-rot); @@ -355,7 +359,7 @@ void cfb_fillrect(struct fb_info *p, const struct fb_fillrect *rect) while (height--) { dst += dst_idx >> (ffs(bits) - 1); dst_idx &= (bits - 1); - fill_op(dst, dst_idx, pat, left, right, + fill_op(p, dst, dst_idx, pat, left, right, width*bpp, bits); r = (p->fix.line_length*8) % bpp; pat = pat << (bpp-r) | pat >> r; diff --git a/drivers/video/cfbimgblt.c b/drivers/video/cfbimgblt.c index f598907b42ad..ff3136bd464b 100644 --- a/drivers/video/cfbimgblt.c +++ b/drivers/video/cfbimgblt.c @@ -43,30 +43,26 @@ #define DPRINTK(fmt, args...) 
#endif -static const u32 cfb_tab8[] = { -#if defined(__BIG_ENDIAN) +static const u32 cfb_tab8_be[] = { 0x00000000,0x000000ff,0x0000ff00,0x0000ffff, 0x00ff0000,0x00ff00ff,0x00ffff00,0x00ffffff, 0xff000000,0xff0000ff,0xff00ff00,0xff00ffff, 0xffff0000,0xffff00ff,0xffffff00,0xffffffff -#elif defined(__LITTLE_ENDIAN) +}; + +static const u32 cfb_tab8_le[] = { 0x00000000,0xff000000,0x00ff0000,0xffff0000, 0x0000ff00,0xff00ff00,0x00ffff00,0xffffff00, 0x000000ff,0xff0000ff,0x00ff00ff,0xffff00ff, 0x0000ffff,0xff00ffff,0x00ffffff,0xffffffff -#else -#error FIXME: No endianness?? -#endif }; -static const u32 cfb_tab16[] = { -#if defined(__BIG_ENDIAN) +static const u32 cfb_tab16_be[] = { 0x00000000, 0x0000ffff, 0xffff0000, 0xffffffff -#elif defined(__LITTLE_ENDIAN) +}; + +static const u32 cfb_tab16_le[] = { 0x00000000, 0xffff0000, 0x0000ffff, 0xffffffff -#else -#error FIXME: No endianness?? -#endif }; static const u32 cfb_tab32[] = { @@ -98,7 +94,8 @@ static inline void color_imageblit(const struct fb_image *image, val = 0; if (start_index) { - u32 start_mask = ~fb_shifted_pixels_mask_u32(start_index, bswapmask); + u32 start_mask = ~fb_shifted_pixels_mask_u32(p, + start_index, bswapmask); val = FB_READL(dst) & start_mask; shift = start_index; } @@ -108,20 +105,21 @@ static inline void color_imageblit(const struct fb_image *image, color = palette[*src]; else color = *src; - color <<= FB_LEFT_POS(bpp); - val |= FB_SHIFT_HIGH(color, shift ^ bswapmask); + color <<= FB_LEFT_POS(p, bpp); + val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask); if (shift >= null_bits) { FB_WRITEL(val, dst++); val = (shift == null_bits) ? 
0 : - FB_SHIFT_LOW(color, 32 - shift); + FB_SHIFT_LOW(p, color, 32 - shift); } shift += bpp; shift &= (32 - 1); src++; } if (shift) { - u32 end_mask = fb_shifted_pixels_mask_u32(shift, bswapmask); + u32 end_mask = fb_shifted_pixels_mask_u32(p, shift, + bswapmask); FB_WRITEL((FB_READL(dst) & end_mask) | val, dst); } @@ -152,8 +150,8 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info * u32 bswapmask = fb_compute_bswapmask(p); dst2 = (u32 __iomem *) dst1; - fgcolor <<= FB_LEFT_POS(bpp); - bgcolor <<= FB_LEFT_POS(bpp); + fgcolor <<= FB_LEFT_POS(p, bpp); + bgcolor <<= FB_LEFT_POS(p, bpp); for (i = image->height; i--; ) { shift = val = 0; @@ -164,7 +162,8 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info * /* write leading bits */ if (start_index) { - u32 start_mask = ~fb_shifted_pixels_mask_u32(start_index, bswapmask); + u32 start_mask = ~fb_shifted_pixels_mask_u32(p, + start_index, bswapmask); val = FB_READL(dst) & start_mask; shift = start_index; } @@ -172,13 +171,13 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info * while (j--) { l--; color = (*s & (1 << l)) ? fgcolor : bgcolor; - val |= FB_SHIFT_HIGH(color, shift ^ bswapmask); + val |= FB_SHIFT_HIGH(p, color, shift ^ bswapmask); /* Did the bitshift spill bits to the next long? */ if (shift >= null_bits) { FB_WRITEL(val, dst++); val = (shift == null_bits) ? 
0 : - FB_SHIFT_LOW(color,32 - shift); + FB_SHIFT_LOW(p, color, 32 - shift); } shift += bpp; shift &= (32 - 1); @@ -187,7 +186,8 @@ static inline void slow_imageblit(const struct fb_image *image, struct fb_info * /* write trailing bits */ if (shift) { - u32 end_mask = fb_shifted_pixels_mask_u32(shift, bswapmask); + u32 end_mask = fb_shifted_pixels_mask_u32(p, shift, + bswapmask); FB_WRITEL((FB_READL(dst) & end_mask) | val, dst); } @@ -223,13 +223,13 @@ static inline void fast_imageblit(const struct fb_image *image, struct fb_info * u32 __iomem *dst; const u32 *tab = NULL; int i, j, k; - + switch (bpp) { case 8: - tab = cfb_tab8; + tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le; break; case 16: - tab = cfb_tab16; + tab = fb_be_math(p) ? cfb_tab16_be : cfb_tab16_le; break; case 32: default: diff --git a/drivers/video/fb_draw.h b/drivers/video/fb_draw.h index a2a0618d86a5..1db622192bde 100644 --- a/drivers/video/fb_draw.h +++ b/drivers/video/fb_draw.h @@ -94,41 +94,44 @@ static inline unsigned long fb_rev_pixels_in_long(unsigned long val, return val; } -static inline u32 fb_shifted_pixels_mask_u32(u32 index, u32 bswapmask) +static inline u32 fb_shifted_pixels_mask_u32(struct fb_info *p, u32 index, + u32 bswapmask) { u32 mask; if (!bswapmask) { - mask = FB_SHIFT_HIGH(~(u32)0, index); + mask = FB_SHIFT_HIGH(p, ~(u32)0, index); } else { - mask = 0xff << FB_LEFT_POS(8); - mask = FB_SHIFT_LOW(mask, index & (bswapmask)) & mask; - mask = FB_SHIFT_HIGH(mask, index & ~(bswapmask)); + mask = 0xff << FB_LEFT_POS(p, 8); + mask = FB_SHIFT_LOW(p, mask, index & (bswapmask)) & mask; + mask = FB_SHIFT_HIGH(p, mask, index & ~(bswapmask)); #if defined(__i386__) || defined(__x86_64__) /* Shift argument is limited to 0 - 31 on x86 based CPU's */ if(index + bswapmask < 32) #endif - mask |= FB_SHIFT_HIGH(~(u32)0, + mask |= FB_SHIFT_HIGH(p, ~(u32)0, (index + bswapmask) & ~(bswapmask)); } return mask; } -static inline unsigned long fb_shifted_pixels_mask_long(u32 index, u32 bswapmask) +static 
inline unsigned long fb_shifted_pixels_mask_long(struct fb_info *p, + u32 index, + u32 bswapmask) { unsigned long mask; if (!bswapmask) { - mask = FB_SHIFT_HIGH(~0UL, index); + mask = FB_SHIFT_HIGH(p, ~0UL, index); } else { - mask = 0xff << FB_LEFT_POS(8); - mask = FB_SHIFT_LOW(mask, index & (bswapmask)) & mask; - mask = FB_SHIFT_HIGH(mask, index & ~(bswapmask)); + mask = 0xff << FB_LEFT_POS(p, 8); + mask = FB_SHIFT_LOW(p, mask, index & (bswapmask)) & mask; + mask = FB_SHIFT_HIGH(p, mask, index & ~(bswapmask)); #if defined(__i386__) || defined(__x86_64__) /* Shift argument is limited to 0 - 31 on x86 based CPU's */ if(index + bswapmask < BITS_PER_LONG) #endif - mask |= FB_SHIFT_HIGH(~0UL, + mask |= FB_SHIFT_HIGH(p, ~0UL, (index + bswapmask) & ~(bswapmask)); } return mask; @@ -158,8 +161,8 @@ static inline unsigned long fb_rev_pixels_in_long(unsigned long val, return val; } -#define fb_shifted_pixels_mask_u32(i, b) FB_SHIFT_HIGH(~(u32)0, (i)) -#define fb_shifted_pixels_mask_long(i, b) FB_SHIFT_HIGH(~0UL, (i)) +#define fb_shifted_pixels_mask_u32(p, i, b) FB_SHIFT_HIGH((p), ~(u32)0, (i)) +#define fb_shifted_pixels_mask_long(p, i, b) FB_SHIFT_HIGH((p), ~0UL, (i)) #define fb_compute_bswapmask(...) 0 #endif /* CONFIG_FB_CFB_REV_PIXELS_IN_BYTE */ diff --git a/drivers/video/fbmem.c b/drivers/video/fbmem.c index 01072f4b3e8f..279c2dbef8f8 100644 --- a/drivers/video/fbmem.c +++ b/drivers/video/fbmem.c @@ -1352,6 +1352,32 @@ static const struct file_operations fb_fops = { struct class *fb_class; EXPORT_SYMBOL(fb_class); + +static int fb_check_foreignness(struct fb_info *fi) +{ + const bool foreign_endian = fi->flags & FBINFO_FOREIGN_ENDIAN; + + fi->flags &= ~FBINFO_FOREIGN_ENDIAN; + +#ifdef __BIG_ENDIAN + fi->flags |= foreign_endian ? 0 : FBINFO_BE_MATH; +#else + fi->flags |= foreign_endian ? 
FBINFO_BE_MATH : 0; +#endif /* __BIG_ENDIAN */ + + if (fi->flags & FBINFO_BE_MATH && !fb_be_math(fi)) { + pr_err("%s: enable CONFIG_FB_BIG_ENDIAN to " + "support this framebuffer\n", fi->fix.id); + return -ENOSYS; + } else if (!(fi->flags & FBINFO_BE_MATH) && fb_be_math(fi)) { + pr_err("%s: enable CONFIG_FB_LITTLE_ENDIAN to " + "support this framebuffer\n", fi->fix.id); + return -ENOSYS; + } + + return 0; +} + /** * register_framebuffer - registers a frame buffer device * @fb_info: frame buffer info structure @@ -1371,6 +1397,10 @@ register_framebuffer(struct fb_info *fb_info) if (num_registered_fb == FB_MAX) return -ENXIO; + + if (fb_check_foreignness(fb_info)) + return -ENOSYS; + num_registered_fb++; for (i = 0 ; i < FB_MAX; i++) if (!registered_fb[i]) diff --git a/drivers/video/syscopyarea.c b/drivers/video/syscopyarea.c index 37af10ab8f52..a352d5f46bbf 100644 --- a/drivers/video/syscopyarea.c +++ b/drivers/video/syscopyarea.c @@ -26,15 +26,15 @@ */ static void -bitcpy(unsigned long *dst, int dst_idx, const unsigned long *src, - int src_idx, int bits, unsigned n) +bitcpy(struct fb_info *p, unsigned long *dst, int dst_idx, + const unsigned long *src, int src_idx, int bits, unsigned n) { unsigned long first, last; int const shift = dst_idx-src_idx; int left, right; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (!shift) { /* Same alignment for source and dest */ @@ -167,8 +167,8 @@ bitcpy(unsigned long *dst, int dst_idx, const unsigned long *src, */ static void -bitcpy_rev(unsigned long *dst, int dst_idx, const unsigned long *src, - int src_idx, int bits, unsigned n) +bitcpy_rev(struct fb_info *p, unsigned long *dst, int dst_idx, + const unsigned long *src, int src_idx, int bits, unsigned n) { unsigned long first, last; int shift; @@ -186,8 +186,8 @@ bitcpy_rev(unsigned long *dst, int dst_idx, const unsigned long *src, 
shift = dst_idx-src_idx; - first = FB_SHIFT_LOW(~0UL, bits - 1 - dst_idx); - last = ~(FB_SHIFT_LOW(~0UL, bits - 1 - ((dst_idx-n) % bits))); + first = FB_SHIFT_LOW(p, ~0UL, bits - 1 - dst_idx); + last = ~(FB_SHIFT_LOW(p, ~0UL, bits - 1 - ((dst_idx-n) % bits))); if (!shift) { /* Same alignment for source and dest */ @@ -353,7 +353,7 @@ void sys_copyarea(struct fb_info *p, const struct fb_copyarea *area) dst_idx &= (bytes - 1); src += src_idx >> (ffs(bits) - 1); src_idx &= (bytes - 1); - bitcpy_rev(dst, dst_idx, src, src_idx, bits, + bitcpy_rev(p, dst, dst_idx, src, src_idx, bits, width*p->var.bits_per_pixel); } } else { @@ -362,7 +362,7 @@ void sys_copyarea(struct fb_info *p, const struct fb_copyarea *area) dst_idx &= (bytes - 1); src += src_idx >> (ffs(bits) - 1); src_idx &= (bytes - 1); - bitcpy(dst, dst_idx, src, src_idx, bits, + bitcpy(p, dst, dst_idx, src, src_idx, bits, width*p->var.bits_per_pixel); dst_idx += bits_per_line; src_idx += bits_per_line; diff --git a/drivers/video/sysfillrect.c b/drivers/video/sysfillrect.c index a261e9e6a675..f94d6b6e29ee 100644 --- a/drivers/video/sysfillrect.c +++ b/drivers/video/sysfillrect.c @@ -22,16 +22,16 @@ */ static void -bitfill_aligned(unsigned long *dst, int dst_idx, unsigned long pat, - unsigned n, int bits) +bitfill_aligned(struct fb_info *p, unsigned long *dst, int dst_idx, + unsigned long pat, unsigned n, int bits) { unsigned long first, last; if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { /* Single word */ @@ -78,16 +78,16 @@ bitfill_aligned(unsigned long *dst, int dst_idx, unsigned long pat, */ static void -bitfill_unaligned(unsigned long *dst, int dst_idx, unsigned long pat, - int left, int right, unsigned n, int bits) +bitfill_unaligned(struct fb_info *p, unsigned long *dst, int dst_idx, + unsigned long pat, int left, int right, 
unsigned n, int bits) { unsigned long first, last; if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { /* Single word */ @@ -132,8 +132,8 @@ bitfill_unaligned(unsigned long *dst, int dst_idx, unsigned long pat, * Aligned pattern invert using 32/64-bit memory accesses */ static void -bitfill_aligned_rev(unsigned long *dst, int dst_idx, unsigned long pat, - unsigned n, int bits) +bitfill_aligned_rev(struct fb_info *p, unsigned long *dst, int dst_idx, + unsigned long pat, unsigned n, int bits) { unsigned long val = pat; unsigned long first, last; @@ -141,8 +141,8 @@ bitfill_aligned_rev(unsigned long *dst, int dst_idx, unsigned long pat, if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { /* Single word */ @@ -188,16 +188,17 @@ bitfill_aligned_rev(unsigned long *dst, int dst_idx, unsigned long pat, */ static void -bitfill_unaligned_rev(unsigned long *dst, int dst_idx, unsigned long pat, - int left, int right, unsigned n, int bits) +bitfill_unaligned_rev(struct fb_info *p, unsigned long *dst, int dst_idx, + unsigned long pat, int left, int right, unsigned n, + int bits) { unsigned long first, last; if (!n) return; - first = FB_SHIFT_HIGH(~0UL, dst_idx); - last = ~(FB_SHIFT_HIGH(~0UL, (dst_idx+n) % bits)); + first = FB_SHIFT_HIGH(p, ~0UL, dst_idx); + last = ~(FB_SHIFT_HIGH(p, ~0UL, (dst_idx+n) % bits)); if (dst_idx+n <= bits) { /* Single word */ @@ -267,9 +268,9 @@ void sys_fillrect(struct fb_info *p, const struct fb_fillrect *rect) if (p->fbops->fb_sync) p->fbops->fb_sync(p); if (!left) { - void (*fill_op32)(unsigned long *dst, int dst_idx, - unsigned long pat, unsigned n, int bits) = - NULL; + void 
(*fill_op32)(struct fb_info *p, unsigned long *dst, + int dst_idx, unsigned long pat, unsigned n, + int bits) = NULL; switch (rect->rop) { case ROP_XOR: @@ -287,16 +288,16 @@ void sys_fillrect(struct fb_info *p, const struct fb_fillrect *rect) while (height--) { dst += dst_idx >> (ffs(bits) - 1); dst_idx &= (bits - 1); - fill_op32(dst, dst_idx, pat, width*bpp, bits); + fill_op32(p, dst, dst_idx, pat, width*bpp, bits); dst_idx += p->fix.line_length*8; } } else { int right; int r; int rot = (left-dst_idx) % bpp; - void (*fill_op)(unsigned long *dst, int dst_idx, - unsigned long pat, int left, int right, - unsigned n, int bits) = NULL; + void (*fill_op)(struct fb_info *p, unsigned long *dst, + int dst_idx, unsigned long pat, int left, + int right, unsigned n, int bits) = NULL; /* rotate pattern to correct start position */ pat = pat << rot | pat >> (bpp-rot); @@ -318,7 +319,7 @@ void sys_fillrect(struct fb_info *p, const struct fb_fillrect *rect) while (height--) { dst += dst_idx >> (ffs(bits) - 1); dst_idx &= (bits - 1); - fill_op(dst, dst_idx, pat, left, right, + fill_op(p, dst, dst_idx, pat, left, right, width*bpp, bits); r = (p->fix.line_length*8) % bpp; pat = pat << (bpp-r) | pat >> r; diff --git a/drivers/video/sysimgblt.c b/drivers/video/sysimgblt.c index bd7e7e9d155f..88daa9b6f69a 100644 --- a/drivers/video/sysimgblt.c +++ b/drivers/video/sysimgblt.c @@ -23,30 +23,26 @@ #define DPRINTK(fmt, args...) 
#endif -static const u32 cfb_tab8[] = { -#if defined(__BIG_ENDIAN) +static const u32 cfb_tab8_be[] = { 0x00000000,0x000000ff,0x0000ff00,0x0000ffff, 0x00ff0000,0x00ff00ff,0x00ffff00,0x00ffffff, 0xff000000,0xff0000ff,0xff00ff00,0xff00ffff, 0xffff0000,0xffff00ff,0xffffff00,0xffffffff -#elif defined(__LITTLE_ENDIAN) +}; + +static const u32 cfb_tab8_le[] = { 0x00000000,0xff000000,0x00ff0000,0xffff0000, 0x0000ff00,0xff00ff00,0x00ffff00,0xffffff00, 0x000000ff,0xff0000ff,0x00ff00ff,0xffff00ff, 0x0000ffff,0xff00ffff,0x00ffffff,0xffffffff -#else -#error FIXME: No endianness?? -#endif }; -static const u32 cfb_tab16[] = { -#if defined(__BIG_ENDIAN) +static const u32 cfb_tab16_be[] = { 0x00000000, 0x0000ffff, 0xffff0000, 0xffffffff -#elif defined(__LITTLE_ENDIAN) +}; + +static const u32 cfb_tab16_le[] = { 0x00000000, 0xffff0000, 0x0000ffff, 0xffffffff -#else -#error FIXME: No endianness?? -#endif }; static const u32 cfb_tab32[] = { @@ -72,7 +68,7 @@ static void color_imageblit(const struct fb_image *image, struct fb_info *p, val = 0; if (start_index) { - u32 start_mask = ~(FB_SHIFT_HIGH(~(u32)0, + u32 start_mask = ~(FB_SHIFT_HIGH(p, ~(u32)0, start_index)); val = *dst & start_mask; shift = start_index; @@ -83,20 +79,20 @@ static void color_imageblit(const struct fb_image *image, struct fb_info *p, color = palette[*src]; else color = *src; - color <<= FB_LEFT_POS(bpp); - val |= FB_SHIFT_HIGH(color, shift); + color <<= FB_LEFT_POS(p, bpp); + val |= FB_SHIFT_HIGH(p, color, shift); if (shift >= null_bits) { *dst++ = val; val = (shift == null_bits) ? 
0 : - FB_SHIFT_LOW(color, 32 - shift); + FB_SHIFT_LOW(p, color, 32 - shift); } shift += bpp; shift &= (32 - 1); src++; } if (shift) { - u32 end_mask = FB_SHIFT_HIGH(~(u32)0, shift); + u32 end_mask = FB_SHIFT_HIGH(p, ~(u32)0, shift); *dst &= end_mask; *dst |= val; @@ -125,8 +121,8 @@ static void slow_imageblit(const struct fb_image *image, struct fb_info *p, u32 i, j, l; dst2 = dst1; - fgcolor <<= FB_LEFT_POS(bpp); - bgcolor <<= FB_LEFT_POS(bpp); + fgcolor <<= FB_LEFT_POS(p, bpp); + bgcolor <<= FB_LEFT_POS(p, bpp); for (i = image->height; i--; ) { shift = val = 0; @@ -137,7 +133,8 @@ static void slow_imageblit(const struct fb_image *image, struct fb_info *p, /* write leading bits */ if (start_index) { - u32 start_mask = ~(FB_SHIFT_HIGH(~(u32)0,start_index)); + u32 start_mask = ~(FB_SHIFT_HIGH(p, ~(u32)0, + start_index)); val = *dst & start_mask; shift = start_index; } @@ -145,13 +142,13 @@ static void slow_imageblit(const struct fb_image *image, struct fb_info *p, while (j--) { l--; color = (*s & (1 << l)) ? fgcolor : bgcolor; - val |= FB_SHIFT_HIGH(color, shift); + val |= FB_SHIFT_HIGH(p, color, shift); /* Did the bitshift spill bits to the next long? */ if (shift >= null_bits) { *dst++ = val; val = (shift == null_bits) ? 0 : - FB_SHIFT_LOW(color,32 - shift); + FB_SHIFT_LOW(p, color, 32 - shift); } shift += bpp; shift &= (32 - 1); @@ -160,7 +157,7 @@ static void slow_imageblit(const struct fb_image *image, struct fb_info *p, /* write trailing bits */ if (shift) { - u32 end_mask = FB_SHIFT_HIGH(~(u32)0, shift); + u32 end_mask = FB_SHIFT_HIGH(p, ~(u32)0, shift); *dst &= end_mask; *dst |= val; @@ -199,10 +196,10 @@ static void fast_imageblit(const struct fb_image *image, struct fb_info *p, switch (bpp) { case 8: - tab = cfb_tab8; + tab = fb_be_math(p) ? cfb_tab8_be : cfb_tab8_le; break; case 16: - tab = cfb_tab16; + tab = fb_be_math(p) ? 
cfb_tab16_be : cfb_tab16_le; break; case 32: default: diff --git a/include/linux/fb.h b/include/linux/fb.h index 58c57a33e5dd..72295b099228 100644 --- a/include/linux/fb.h +++ b/include/linux/fb.h @@ -791,6 +791,17 @@ struct fb_tile_ops { */ #define FBINFO_MISC_ALWAYS_SETPAR 0x40000 +/* + * Host and GPU endianness differ. + */ +#define FBINFO_FOREIGN_ENDIAN 0x100000 +/* + * Big endian math. This is the same flags as above, but with different + * meaning, it is set by the fb subsystem depending FOREIGN_ENDIAN flag + * and host endianness. Drivers should not use this flag. + */ +#define FBINFO_BE_MATH 0x100000 + struct fb_info { int node; int flags; @@ -899,15 +910,11 @@ struct fb_info { #endif -#if defined (__BIG_ENDIAN) -#define FB_LEFT_POS(bpp) (32 - bpp) -#define FB_SHIFT_HIGH(val, bits) ((val) >> (bits)) -#define FB_SHIFT_LOW(val, bits) ((val) << (bits)) -#else -#define FB_LEFT_POS(bpp) (0) -#define FB_SHIFT_HIGH(val, bits) ((val) << (bits)) -#define FB_SHIFT_LOW(val, bits) ((val) >> (bits)) -#endif +#define FB_LEFT_POS(p, bpp) (fb_be_math(p) ? (32 - (bpp)) : 0) +#define FB_SHIFT_HIGH(p, val, bits) (fb_be_math(p) ? (val) >> (bits) : \ + (val) << (bits)) +#define FB_SHIFT_LOW(p, val, bits) (fb_be_math(p) ? 
(val) << (bits) : \ + (val) >> (bits)) /* * `Generic' versions of the frame buffer device operations @@ -970,6 +977,25 @@ extern void fb_deferred_io_cleanup(struct fb_info *info); extern int fb_deferred_io_fsync(struct file *file, struct dentry *dentry, int datasync); +static inline bool fb_be_math(struct fb_info *info) +{ +#ifdef CONFIG_FB_FOREIGN_ENDIAN +#if defined(CONFIG_FB_BOTH_ENDIAN) + return info->flags & FBINFO_BE_MATH; +#elif defined(CONFIG_FB_BIG_ENDIAN) + return true; +#elif defined(CONFIG_FB_LITTLE_ENDIAN) + return false; +#endif /* CONFIG_FB_BOTH_ENDIAN */ +#else +#ifdef __BIG_ENDIAN + return true; +#else + return false; +#endif /* __BIG_ENDIAN */ +#endif /* CONFIG_FB_FOREIGN_ENDIAN */ +} + /* drivers/video/fbsysfs.c */ extern struct fb_info *framebuffer_alloc(size_t size, struct device *dev); extern void framebuffer_release(struct fb_info *info); -- cgit v1.2.3 From 7f29b87a7779505288a31df16ba84a85fc1ae93c Mon Sep 17 00:00:00 2001 From: Anton Vorontsov Date: Mon, 28 Apr 2008 02:14:50 -0700 Subject: powerpc: offb: add support for foreign endianness Signed-off-by: Anton Vorontsov Cc: "Antonino A. 
Daplas" Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/offb.c | 15 ++++++++++++--- 1 file changed, 12 insertions(+), 3 deletions(-) diff --git a/drivers/video/offb.c b/drivers/video/offb.c index 452433d46973..d7b3dcc0dc43 100644 --- a/drivers/video/offb.c +++ b/drivers/video/offb.c @@ -248,7 +248,7 @@ static void __iomem *offb_map_reg(struct device_node *np, int index, static void __init offb_init_fb(const char *name, const char *full_name, int width, int height, int depth, int pitch, unsigned long address, - struct device_node *dp) + int foreign_endian, struct device_node *dp) { unsigned long res_size = pitch * height * (depth + 7) / 8; struct offb_par *par = &default_par; @@ -397,7 +397,7 @@ static void __init offb_init_fb(const char *name, const char *full_name, info->screen_base = ioremap(address, fix->smem_len); info->par = par; info->pseudo_palette = (void *) (info + 1); - info->flags = FBINFO_DEFAULT; + info->flags = FBINFO_DEFAULT | foreign_endian; fb_alloc_cmap(&info->cmap, 256, 0); @@ -424,6 +424,15 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node) u64 rstart, address = OF_BAD_ADDR; const u32 *pp, *addrp, *up; u64 asize; + int foreign_endian = 0; + +#ifdef __BIG_ENDIAN + if (of_get_property(dp, "little-endian", NULL)) + foreign_endian = FBINFO_FOREIGN_ENDIAN; +#else + if (of_get_property(dp, "big-endian", NULL)) + foreign_endian = FBINFO_FOREIGN_ENDIAN; +#endif pp = of_get_property(dp, "linux,bootx-depth", &len); if (pp == NULL) @@ -509,7 +518,7 @@ static void __init offb_init_nodriver(struct device_node *dp, int no_real_node) offb_init_fb(no_real_node ? "bootx" : dp->name, no_real_node ? "display" : dp->full_name, width, height, depth, pitch, address, - no_real_node ? NULL : dp); + foreign_endian, no_real_node ? 
NULL : dp); } } -- cgit v1.2.3 From 416e74ea7813597b586eafc24f67779eeb86e12f Mon Sep 17 00:00:00 2001 From: Julia Lawall Date: Mon, 28 Apr 2008 02:14:51 -0700 Subject: fbdev: use DIV_ROUND_UP or roundup The kernel.h macro DIV_ROUND_UP performs the computation (((n) + (d) - 1) / (d)) but is perhaps more readable. An extract of the semantic patch that makes this change is as follows: (http://www.emn.fr/x-info/coccinelle/) // @haskernel@ @@ #include @depends on haskernel@ expression n,d; @@ ( - (n + d - 1) / d + DIV_ROUND_UP(n,d) | - (n + (d - 1)) / d + DIV_ROUND_UP(n,d) ) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP((n),d) + DIV_ROUND_UP(n,d) @depends on haskernel@ expression n,d; @@ - DIV_ROUND_UP(n,(d)) + DIV_ROUND_UP(n,d) // Signed-off-by: Julia Lawall Cc: "Antonino A. Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/atafb.c | 2 +- drivers/video/cirrusfb.c | 2 +- drivers/video/console/fbcon.c | 3 +-- drivers/video/gxt4500.c | 2 +- 4 files changed, 4 insertions(+), 5 deletions(-) diff --git a/drivers/video/atafb.c b/drivers/video/atafb.c index 5d4fbaa53a6c..dff35474b854 100644 --- a/drivers/video/atafb.c +++ b/drivers/video/atafb.c @@ -1270,7 +1270,7 @@ again: gstart = (prescale / 2 + plen * left_margin) / prescale; /* gend1 is for hde (gend-gstart multiple of align), shifter's xres */ - gend1 = gstart + ((xres + align - 1) / align) * align * plen / prescale; + gend1 = gstart + roundup(xres, align) * plen / prescale; /* gend2 is for hbb, visible xres (rest to gend1 is cut off by hblank) */ gend2 = gstart + xres * plen / prescale; par->HHT = plen * (left_margin + xres + right_margin) / diff --git a/drivers/video/cirrusfb.c b/drivers/video/cirrusfb.c index f7e2d5add831..ccfbdc5a40db 100644 --- a/drivers/video/cirrusfb.c +++ b/drivers/video/cirrusfb.c @@ -3117,7 +3117,7 @@ static void bestclock(long freq, long *best, long *nom, } } } - d = ((143181 * n) + f - 1) / f; + d = DIV_ROUND_UP(143181 * n, f); if ((d >= 7) && (d 
<= 63)) { if (d > 31) d = (d / 2) * 2; diff --git a/drivers/video/console/fbcon.c b/drivers/video/console/fbcon.c index 022282494d3f..025d4f5e1f6f 100644 --- a/drivers/video/console/fbcon.c +++ b/drivers/video/console/fbcon.c @@ -620,8 +620,7 @@ static void fbcon_prepare_logo(struct vc_data *vc, struct fb_info *info, if (fb_get_color_depth(&info->var, &info->fix) == 1) erase &= ~0x400; logo_height = fb_prepare_logo(info, ops->rotate); - logo_lines = (logo_height + vc->vc_font.height - 1) / - vc->vc_font.height; + logo_lines = DIV_ROUND_UP(logo_height, vc->vc_font.height); q = (unsigned short *) (vc->vc_origin + vc->vc_size_row * rows); step = logo_lines * cols; diff --git a/drivers/video/gxt4500.c b/drivers/video/gxt4500.c index e92337bef50d..564557792bed 100644 --- a/drivers/video/gxt4500.c +++ b/drivers/video/gxt4500.c @@ -238,7 +238,7 @@ static int calc_pll(int period_ps, struct gxt4500_par *par) for (pdiv1 = 1; pdiv1 <= 8; ++pdiv1) { for (pdiv2 = 1; pdiv2 <= pdiv1; ++pdiv2) { postdiv = pdiv1 * pdiv2; - pll_period = (period_ps + postdiv - 1) / postdiv; + pll_period = DIV_ROUND_UP(period_ps, postdiv); /* keep pll in range 350..600 MHz */ if (pll_period < 1666 || pll_period > 2857) continue; -- cgit v1.2.3 From 2ae09f0da1cd0c8c646edea2e68356e76789461c Mon Sep 17 00:00:00 2001 From: Krzysztof Helt Date: Mon, 28 Apr 2008 02:14:51 -0700 Subject: pm2fb: correct error values returned from probe function Fix error values returned in some code branches in the pm2fb_probe() function. Signed-off-by: Krzysztof Helt Cc: "Antonino A. 
Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/pm2fb.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/video/pm2fb.c b/drivers/video/pm2fb.c index 30181b593829..82aa8242f441 100644 --- a/drivers/video/pm2fb.c +++ b/drivers/video/pm2fb.c @@ -1687,10 +1687,12 @@ static int __devinit pm2fb_probe(struct pci_dev *pdev, if (!err || err == 4) info->var = pm2fb_var; - if (fb_alloc_cmap(&info->cmap, 256, 0) < 0) + retval = fb_alloc_cmap(&info->cmap, 256, 0); + if (retval < 0) goto err_exit_both; - if (register_framebuffer(info) < 0) + retval = register_framebuffer(info); + if (retval < 0) goto err_exit_all; printk(KERN_INFO "fb%d: %s frame buffer device, memory = %dK.\n", -- cgit v1.2.3 From 22af89aa0c0b4012a7431114a340efd3665a7617 Mon Sep 17 00:00:00 2001 From: Harvey Harrison Date: Mon, 28 Apr 2008 02:14:53 -0700 Subject: fbcon: replace mono_col macro with static inline Use __u32 for max_len to match the declaration of length in the struct fb_bitfield. Suppresses sparse shadowed variable warnings from the nested max() macros: drivers/video/console/fbcon.h:130:8: warning: symbol '_x' shadows an earlier one drivers/video/console/fbcon.h:130:8: originally declared here drivers/video/console/fbcon.h:130:8: warning: symbol '_x' shadows an earlier one drivers/video/console/fbcon.h:130:8: originally declared here drivers/video/console/fbcon.h:130:8: warning: symbol '_y' shadows an earlier one drivers/video/console/fbcon.h:130:8: originally declared here [akpm@linux-foundation.org: fix constness] Signed-off-by: Harvey Harrison Cc: "Antonino A. 
Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/console/fbcon.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/drivers/video/console/fbcon.h b/drivers/video/console/fbcon.h index 3706307e70ed..0135e0395456 100644 --- a/drivers/video/console/fbcon.h +++ b/drivers/video/console/fbcon.h @@ -104,10 +104,14 @@ struct fbcon_ops { #define attr_blink(s) \ ((s) & 0x8000) -#define mono_col(info) \ - (~(0xfff << (max((info)->var.green.length, \ - max((info)->var.red.length, \ - (info)->var.blue.length)))) & 0xff) + +static inline int mono_col(const struct fb_info *info) +{ + __u32 max_len; + max_len = max(info->var.green.length, info->var.red.length); + max_len = max(info->var.blue.length, max_len); + return ~(0xfff << (max_len & 0xff)); +} static inline int attr_col_ec(int shift, struct vc_data *vc, struct fb_info *info, int is_fg) -- cgit v1.2.3 From 32bf87e3697cf2f730b8fbf47cad903ceef718a2 Mon Sep 17 00:00:00 2001 From: Andres Salomon Date: Mon, 28 Apr 2008 02:14:53 -0700 Subject: x86: geode: MSR cleanup This cleans up a few MSR-using drivers in the following manner: - Ensures MSRs are all defined in asm/geode.h, rather than in misc places - Makes the naming consistent; cs553[56] ones begin with MSR_, GX-specific ones start with MSR_GX_, and LX-specific ones start with MSR_LX_. Also, make the names match the data sheet. - Use MSR names rather than numbers in source code - Document the fact that the LX's MSR_PADSEL has the wrong value in the data sheet. That's, uh, good to note. 
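One way to get the debugging benefit of consistent MSR names is a table that maps raw addresses back to their names. The sketch below is an illustration only and is not part of this patch. The addresses are copied from the include/asm-x86/geode.h hunk of this commit. MSR_ENTRY, struct geode_msr_desc, geode_msr_name() and geode_msr_addr() are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Addresses copied from the include/asm-x86/geode.h hunk of this patch. */
#define MSR_LX_GLD_MSR_CONFIG 0x48002001
#define MSR_LX_MSR_PADSEL     0x48002011 /* NOT 0x48000011 (data sheet is wrong) */
#define MSR_GLCP_SYS_RSTPLL   0x4C000014
#define MSR_GLCP_DOTPLL       0x4C000015
#define MSR_MFGPT_IRQ         0x51400028
#define MSR_MFGPT_NR          0x51400029
#define MSR_MFGPT_SETUP       0x5140002B
#define MSR_LX_SPARE_MSR      0x80000011
#define MSR_GX_GLD_MSR_CONFIG 0xC0002001
#define MSR_GX_MSR_PADSEL     0xC0002011

/*
 * Hypothetical debug table. '#m' stringizes the macro name itself (no
 * expansion), while '(m)' expands to its address, so name and value
 * cannot drift apart.
 */
#define MSR_ENTRY(m) { (uint32_t)(m), #m }

struct geode_msr_desc {
	uint32_t addr;
	const char *name;
};

static const struct geode_msr_desc geode_msrs[] = {
	MSR_ENTRY(MSR_LX_GLD_MSR_CONFIG),
	MSR_ENTRY(MSR_LX_MSR_PADSEL),
	MSR_ENTRY(MSR_GLCP_SYS_RSTPLL),
	MSR_ENTRY(MSR_GLCP_DOTPLL),
	MSR_ENTRY(MSR_MFGPT_IRQ),
	MSR_ENTRY(MSR_MFGPT_NR),
	MSR_ENTRY(MSR_MFGPT_SETUP),
	MSR_ENTRY(MSR_LX_SPARE_MSR),
	MSR_ENTRY(MSR_GX_GLD_MSR_CONFIG),
	MSR_ENTRY(MSR_GX_MSR_PADSEL),
};

#define GEODE_NR_MSRS (sizeof(geode_msrs) / sizeof(geode_msrs[0]))

/* Return the symbolic name for an MSR address, or NULL if it is not listed. */
static const char *geode_msr_name(uint32_t addr)
{
	size_t i;

	for (i = 0; i < GEODE_NR_MSRS; i++)
		if (geode_msrs[i].addr == addr)
			return geode_msrs[i].name;
	return NULL;
}

/* Return the address for a symbolic MSR name, or 0 if it is not listed. */
static uint32_t geode_msr_addr(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < GEODE_NR_MSRS; i++)
		if (strcmp(geode_msrs[i].name, name) == 0)
			return geode_msrs[i].addr;
	return 0;
}
```

With this table, a lookup of the data sheet's incorrect PADSEL address (0x48000011) returns NULL. A lookup of 0x48002011 returns "MSR_LX_MSR_PADSEL".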
Signed-off-by: Andres Salomon Acked-by: Jordan Crouse Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- arch/x86/kernel/mfgpt_32.c | 8 ++++---- drivers/video/geode/display_gx.h | 1 - drivers/video/geode/gxfb_core.c | 3 ++- drivers/video/geode/lxfb.h | 8 -------- drivers/video/geode/lxfb_ops.c | 19 ++++++++++--------- drivers/video/geode/video_gx.c | 5 +++-- drivers/video/geode/video_gx.h | 3 --- include/asm-x86/geode.h | 16 +++++++++++++--- 8 files changed, 32 insertions(+), 31 deletions(-) diff --git a/arch/x86/kernel/mfgpt_32.c b/arch/x86/kernel/mfgpt_32.c index cfc2648d25ff..3cad17fe026b 100644 --- a/arch/x86/kernel/mfgpt_32.c +++ b/arch/x86/kernel/mfgpt_32.c @@ -63,7 +63,7 @@ static int __init mfgpt_fix(char *s) /* The following udocumented bit resets the MFGPT timers */ val = 0xFF; dummy = 0; - wrmsr(0x5140002B, val, dummy); + wrmsr(MSR_MFGPT_SETUP, val, dummy); return 1; } __setup("mfgptfix", mfgpt_fix); @@ -127,17 +127,17 @@ int geode_mfgpt_toggle_event(int timer, int cmp, int event, int enable) * 6; that is, resets for 7 and 8 will be ignored. Is this * a problem? 
-dilinger */ - msr = MFGPT_NR_MSR; + msr = MSR_MFGPT_NR; mask = 1 << (timer + 24); break; case MFGPT_EVENT_NMI: - msr = MFGPT_NR_MSR; + msr = MSR_MFGPT_NR; mask = 1 << (timer + shift); break; case MFGPT_EVENT_IRQ: - msr = MFGPT_IRQ_MSR; + msr = MSR_MFGPT_IRQ; mask = 1 << (timer + shift); break; diff --git a/drivers/video/geode/display_gx.h b/drivers/video/geode/display_gx.h index 0af33f329e88..df94e4fc6626 100644 --- a/drivers/video/geode/display_gx.h +++ b/drivers/video/geode/display_gx.h @@ -17,7 +17,6 @@ int gx_line_delta(int xres, int bpp); extern struct geode_dc_ops gx_dc_ops; /* MSR that tells us if a TFT or CRT is attached */ -#define GLD_MSR_CONFIG 0xC0002001 #define GLD_MSR_CONFIG_DM_FP 0x40 /* Display controller registers */ diff --git a/drivers/video/geode/gxfb_core.c b/drivers/video/geode/gxfb_core.c index cf841efa229a..4e22ee0377e7 100644 --- a/drivers/video/geode/gxfb_core.c +++ b/drivers/video/geode/gxfb_core.c @@ -30,6 +30,7 @@ #include #include #include +#include #include "geodefb.h" #include "display_gx.h" @@ -326,7 +327,7 @@ static int __init gxfb_probe(struct pci_dev *pdev, const struct pci_device_id *i /* Figure out if this is a TFT or CRT part */ - rdmsrl(GLD_MSR_CONFIG, val); + rdmsrl(MSR_GX_GLD_MSR_CONFIG, val); if ((val & GLD_MSR_CONFIG_DM_FP) == GLD_MSR_CONFIG_DM_FP) par->enable_crt = 0; diff --git a/drivers/video/geode/lxfb.h b/drivers/video/geode/lxfb.h index ca13c48d19b0..8c83a1b4439b 100644 --- a/drivers/video/geode/lxfb.h +++ b/drivers/video/geode/lxfb.h @@ -31,14 +31,6 @@ void lx_set_palette_reg(struct fb_info *, unsigned int, unsigned int, /* MSRS */ -#define MSR_LX_GLD_CONFIG 0x48002001 -#define MSR_LX_GLCP_DOTPLL 0x4c000015 -#define MSR_LX_DF_PADSEL 0x48002011 -#define MSR_LX_DC_SPARE 0x80000011 -#define MSR_LX_DF_GLCONFIG 0x48002001 - -#define MSR_LX_GLIU0_P2D_RO0 0x10000029 - #define GLCP_DOTPLL_RESET (1 << 0) #define GLCP_DOTPLL_BYPASS (1 << 15) #define GLCP_DOTPLL_HALFPIX (1 << 24) diff --git a/drivers/video/geode/lxfb_ops.c 
b/drivers/video/geode/lxfb_ops.c index 4fbc99be96ef..a52c180062c8 100644 --- a/drivers/video/geode/lxfb_ops.c +++ b/drivers/video/geode/lxfb_ops.c @@ -13,6 +13,7 @@ #include #include #include +#include #include "lxfb.h" @@ -101,7 +102,7 @@ static void lx_set_dotpll(u32 pllval) u32 dotpll_lo, dotpll_hi; int i; - rdmsr(MSR_LX_GLCP_DOTPLL, dotpll_lo, dotpll_hi); + rdmsr(MSR_GLCP_DOTPLL, dotpll_lo, dotpll_hi); if ((dotpll_lo & GLCP_DOTPLL_LOCK) && (dotpll_hi == pllval)) return; @@ -110,7 +111,7 @@ static void lx_set_dotpll(u32 pllval) dotpll_lo &= ~(GLCP_DOTPLL_BYPASS | GLCP_DOTPLL_HALFPIX); dotpll_lo |= GLCP_DOTPLL_RESET; - wrmsr(MSR_LX_GLCP_DOTPLL, dotpll_lo, dotpll_hi); + wrmsr(MSR_GLCP_DOTPLL, dotpll_lo, dotpll_hi); /* Wait 100us for the PLL to lock */ @@ -119,7 +120,7 @@ static void lx_set_dotpll(u32 pllval) /* Now, loop for the lock bit */ for (i = 0; i < 1000; i++) { - rdmsr(MSR_LX_GLCP_DOTPLL, dotpll_lo, dotpll_hi); + rdmsr(MSR_GLCP_DOTPLL, dotpll_lo, dotpll_hi); if (dotpll_lo & GLCP_DOTPLL_LOCK) break; } @@ -127,7 +128,7 @@ static void lx_set_dotpll(u32 pllval) /* Clear the reset bit */ dotpll_lo &= ~GLCP_DOTPLL_RESET; - wrmsr(MSR_LX_GLCP_DOTPLL, dotpll_lo, dotpll_hi); + wrmsr(MSR_GLCP_DOTPLL, dotpll_lo, dotpll_hi); } /* Set the clock based on the frequency specified by the current mode */ @@ -255,7 +256,7 @@ static void lx_graphics_enable(struct fb_info *info) msrlo = DF_DEFAULT_TFT_PAD_SEL_LOW; msrhi = DF_DEFAULT_TFT_PAD_SEL_HIGH; - wrmsr(MSR_LX_DF_PADSEL, msrlo, msrhi); + wrmsr(MSR_LX_MSR_PADSEL, msrlo, msrhi); } if (par->output & OUTPUT_CRT) { @@ -321,7 +322,7 @@ void lx_set_mode(struct fb_info *info) /* Set output mode */ - rdmsrl(MSR_LX_DF_GLCONFIG, msrval); + rdmsrl(MSR_LX_GLD_MSR_CONFIG, msrval); msrval &= ~DF_CONFIG_OUTPUT_MASK; if (par->output & OUTPUT_PANEL) { @@ -335,7 +336,7 @@ void lx_set_mode(struct fb_info *info) msrval |= DF_OUTPUT_CRT; } - wrmsrl(MSR_LX_DF_GLCONFIG, msrval); + wrmsrl(MSR_LX_GLD_MSR_CONFIG, msrval); /* Clear the various 
buffers */ /* FIXME: Adjust for panning here */ @@ -383,13 +384,13 @@ void lx_set_mode(struct fb_info *info) /* Set default watermark values */ - rdmsrl(MSR_LX_DC_SPARE, msrval); + rdmsrl(MSR_LX_SPARE_MSR, msrval); msrval &= ~(DC_SPARE_DISABLE_CFIFO_HGO | DC_SPARE_VFIFO_ARB_SELECT | DC_SPARE_LOAD_WM_LPEN_MASK | DC_SPARE_WM_LPEN_OVRD | DC_SPARE_DISABLE_INIT_VID_PRI | DC_SPARE_DISABLE_VFIFO_WM); msrval |= DC_SPARE_DISABLE_VFIFO_WM | DC_SPARE_DISABLE_INIT_VID_PRI; - wrmsrl(MSR_LX_DC_SPARE, msrval); + wrmsrl(MSR_LX_SPARE_MSR, msrval); gcfg = DC_GCFG_DFLE; /* Display fifo enable */ gcfg |= 0xB600; /* Set default priority */ diff --git a/drivers/video/geode/video_gx.c b/drivers/video/geode/video_gx.c index febf09c63492..d0885370675d 100644 --- a/drivers/video/geode/video_gx.c +++ b/drivers/video/geode/video_gx.c @@ -16,6 +16,7 @@ #include #include #include +#include #include "geodefb.h" #include "video_gx.h" @@ -184,10 +185,10 @@ gx_configure_tft(struct fb_info *info) /* Set up the DF pad select MSR */ - rdmsrl(GX_VP_MSR_PAD_SELECT, val); + rdmsrl(MSR_GX_MSR_PADSEL, val); val &= ~GX_VP_PAD_SELECT_MASK; val |= GX_VP_PAD_SELECT_TFT; - wrmsrl(GX_VP_MSR_PAD_SELECT, val); + wrmsrl(MSR_GX_MSR_PADSEL, val); /* Turn off the panel */ diff --git a/drivers/video/geode/video_gx.h b/drivers/video/geode/video_gx.h index ce28d8f382dc..d21bca020594 100644 --- a/drivers/video/geode/video_gx.h +++ b/drivers/video/geode/video_gx.h @@ -14,7 +14,6 @@ extern struct geode_vid_ops gx_vid_ops; /* GX Flatpanel control MSR */ -#define GX_VP_MSR_PAD_SELECT 0xC0002011 #define GX_VP_PAD_SELECT_MASK 0x3FFFFFFF #define GX_VP_PAD_SELECT_TFT 0x1FFFFFFF @@ -59,12 +58,10 @@ extern struct geode_vid_ops gx_vid_ops; /* Geode GX clock control MSRs */ -#define MSR_GLCP_SYS_RSTPLL 0x4c000014 # define MSR_GLCP_SYS_RSTPLL_DOTPREDIV2 (0x0000000000000002ull) # define MSR_GLCP_SYS_RSTPLL_DOTPREMULT2 (0x0000000000000004ull) # define MSR_GLCP_SYS_RSTPLL_DOTPOSTDIV3 (0x0000000000000008ull) -#define MSR_GLCP_DOTPLL 
0x4c000015 # define MSR_GLCP_DOTPLL_DOTRESET (0x0000000000000001ull) # define MSR_GLCP_DOTPLL_BYPASS (0x0000000000008000ull) # define MSR_GLCP_DOTPLL_LOCK (0x0000000002000000ull) diff --git a/include/asm-x86/geode.h b/include/asm-x86/geode.h index 9870cc1f2f8f..b1bdf6378563 100644 --- a/include/asm-x86/geode.h +++ b/include/asm-x86/geode.h @@ -30,7 +30,11 @@ extern int geode_get_dev_base(unsigned int dev); /* MSRS */ -#define GX_GLCP_SYS_RSTPLL 0x4C000014 +#define MSR_LX_GLD_MSR_CONFIG 0x48002001 +#define MSR_LX_MSR_PADSEL 0x48002011 /* NOT 0x48000011; the data + * sheet has the wrong value */ +#define MSR_GLCP_SYS_RSTPLL 0x4C000014 +#define MSR_GLCP_DOTPLL 0x4C000015 #define MSR_LBAR_SMB 0x5140000B #define MSR_LBAR_GPIO 0x5140000C @@ -45,8 +49,14 @@ extern int geode_get_dev_base(unsigned int dev); #define MSR_PIC_ZSEL_LOW 0x51400022 #define MSR_PIC_ZSEL_HIGH 0x51400023 -#define MFGPT_IRQ_MSR 0x51400028 -#define MFGPT_NR_MSR 0x51400029 +#define MSR_MFGPT_IRQ 0x51400028 +#define MSR_MFGPT_NR 0x51400029 +#define MSR_MFGPT_SETUP 0x5140002B + +#define MSR_LX_SPARE_MSR 0x80000011 /* DC-specific */ + +#define MSR_GX_GLD_MSR_CONFIG 0xC0002001 +#define MSR_GX_MSR_PADSEL 0xC0002011 /* Resource Sizes */ -- cgit v1.2.3 From e9338364e6989ca2707638c7c70ae22975b0bb6c Mon Sep 17 00:00:00 2001 From: Andres Salomon Date: Mon, 28 Apr 2008 02:14:54 -0700 Subject: x86: GEODE: add Virtual Systems Architecture detection This is generic VSA2 detection. It's used by OLPC to determine whether or not the BIOS contains VSA2, but since other BIOSes are coming out that don't use the VSA (ie, tinybios), it might end up being useful for others. 
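The detection added here reads one 16-bit word from VSA_VRC_DATA and compares it against VSA_SIG (0x4132). A single inw() can carry only two ASCII characters. The sketch below decodes the constant to show which two. The helper vsa_sig_chars() is hypothetical. Only the VSA_SIG value comes from this patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value copied from the include/asm-x86/geode.h hunk of this patch. */
#define VSA_SIG 0x4132

/*
 * Hypothetical helper: split a 16-bit signature word into its two ASCII
 * characters, high byte first, and NUL-terminate the result.
 */
static void vsa_sig_chars(uint16_t sig, char out[3])
{
	assert(out != NULL);
	out[0] = (char)(sig >> 8);   /* 0x41 is 'A' */
	out[1] = (char)(sig & 0xff); /* 0x32 is '2' */
	out[2] = '\0';
}
```

Applied to VSA_SIG, the buffer holds "A2", the tail of "VSA2". A successful match therefore checks those two characters, not all four that the header comment mentions.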
Signed-off-by: Andres Salomon Acked-by: Alan Cox Cc: Jordan Crouse Cc: Ingo Molnar Cc: Thomas Gleixner Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-x86/geode.h | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/include/asm-x86/geode.h b/include/asm-x86/geode.h index b1bdf6378563..3978200f126c 100644 --- a/include/asm-x86/geode.h +++ b/include/asm-x86/geode.h @@ -103,6 +103,14 @@ extern int geode_get_dev_base(unsigned int dev); #define PM_AWKD 0x50 #define PM_SSC 0x54 +/* VSA2 magic values */ + +#define VSA_VRC_INDEX 0xAC1C +#define VSA_VRC_DATA 0xAC1E +#define VSA_VR_UNLOCK 0xFC53 /* unlock virtual register */ +#define VSA_VR_SIGNATURE 0x0003 +#define VSA_SIG 0x4132 /* signature is ascii 'VSA2' */ + /* GPIO */ #define GPIO_OUTPUT_VAL 0x00 @@ -174,6 +182,17 @@ static inline int is_geode(void) return (is_geode_gx() || is_geode_lx()); } +/* + * The VSA has virtual registers that we can query for a signature. + */ +static inline int geode_has_vsa2(void) +{ + outw(VSA_VR_UNLOCK, VSA_VRC_INDEX); + outw(VSA_VR_SIGNATURE, VSA_VRC_INDEX); + + return (inw(VSA_VRC_DATA) == VSA_SIG); +} + /* MFGPTs */ #define MFGPT_MAX_TIMERS 8 -- cgit v1.2.3 From f0a0c1f20f837221c0d990a54ae5426acf039036 Mon Sep 17 00:00:00 2001 From: Jordan Crouse Date: Mon, 28 Apr 2008 02:14:55 -0700 Subject: gxfb: set the right registers to tweak the sync polarity While running in flatpanel mode it is important to change the FP sync bits (VG register 0x408) rather than the CRT sync bits (VG register 0x008). This patch keeps the CRT sync bits at default when a flatpanel exists. Note that this also fixes inverted logic; we want CRT_VSYNC_POL to be set (ie, vsync is normally high) when FB_SYNC_VERT_HIGH_ACT is unset. Signed-off-by: Jordan Crouse Signed-off-by: Andres Salomon Cc: "Antonino A.
Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/geode/video_gx.c | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/video/geode/video_gx.c b/drivers/video/geode/video_gx.c index d0885370675d..06245a8400c7 100644 --- a/drivers/video/geode/video_gx.c +++ b/drivers/video/geode/video_gx.c @@ -208,7 +208,7 @@ gx_configure_tft(struct fb_info *info) fp = 0x0F100000; - /* Add sync polarity */ + /* Configure sync polarity */ if (!(info->var.sync & FB_SYNC_VERT_HIGH_ACT)) fp |= GX_FP_PT2_VSP; @@ -269,11 +269,15 @@ static void gx_configure_display(struct fb_info *info) /* Enable hsync and vsync. */ dcfg |= GX_DCFG_HSYNC_EN | GX_DCFG_VSYNC_EN; - /* Sync polarities. */ - if (info->var.sync & FB_SYNC_HOR_HIGH_ACT) - dcfg |= GX_DCFG_CRT_HSYNC_POL; - if (info->var.sync & FB_SYNC_VERT_HIGH_ACT) - dcfg |= GX_DCFG_CRT_VSYNC_POL; + /* Only change the sync polarities if we are running + * in CRT mode. The FP polarities will be handled in + * gxfb_configure_tft */ + if (par->enable_crt) { + if (!(info->var.sync & FB_SYNC_HOR_HIGH_ACT)) + dcfg |= GX_DCFG_CRT_HSYNC_POL; + if (!(info->var.sync & FB_SYNC_VERT_HIGH_ACT)) + dcfg |= GX_DCFG_CRT_VSYNC_POL; + } /* Enable the display logic */ /* Set up the DACS to blank normally */ -- cgit v1.2.3 From e2b118090969f153f134647acbcbbf01a9005e64 Mon Sep 17 00:00:00 2001 From: Jordan Crouse Date: Mon, 28 Apr 2008 02:14:56 -0700 Subject: gxfb: don't enable the CRT DACs when we are in flatpanel mode When the FP strap is enabled, don't turn on the CRT DACs - that will save about 35 mA of power. Updated/cleaned up by Andres Salomon. Signed-off-by: Andres Salomon Signed-off-by: Jordan Crouse Cc: "Antonino A. 
Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/geode/video_gx.c | 32 +++++++++++++++++--------------- 1 file changed, 17 insertions(+), 15 deletions(-) diff --git a/drivers/video/geode/video_gx.c b/drivers/video/geode/video_gx.c index 06245a8400c7..cfe2c80b025d 100644 --- a/drivers/video/geode/video_gx.c +++ b/drivers/video/geode/video_gx.c @@ -239,18 +239,6 @@ static void gx_configure_display(struct fb_info *info) struct geodefb_par *par = info->par; u32 dcfg, misc; - /* Set up the MISC register */ - - misc = readl(par->vid_regs + GX_MISC); - - /* Power up the DAC */ - misc &= ~(GX_MISC_A_PWRDN | GX_MISC_DAC_PWRDN); - - /* Disable gamma correction */ - misc |= GX_MISC_GAM_EN; - - writel(misc, par->vid_regs + GX_MISC); - /* Write the display configuration */ dcfg = readl(par->vid_regs + GX_DCFG); @@ -269,14 +257,28 @@ static void gx_configure_display(struct fb_info *info) /* Enable hsync and vsync. */ dcfg |= GX_DCFG_HSYNC_EN | GX_DCFG_VSYNC_EN; - /* Only change the sync polarities if we are running - * in CRT mode. The FP polarities will be handled in - * gxfb_configure_tft */ + misc = readl(par->vid_regs + GX_MISC); + + /* Disable gamma correction */ + misc |= GX_MISC_GAM_EN; + if (par->enable_crt) { + + /* Power up the CRT DACs */ + misc &= ~(GX_MISC_A_PWRDN | GX_MISC_DAC_PWRDN); + writel(misc, par->vid_regs + GX_MISC); + + /* Only change the sync polarities if we are running + * in CRT mode. 
The FP polarities will be handled in + * gxfb_configure_tft */ if (!(info->var.sync & FB_SYNC_HOR_HIGH_ACT)) dcfg |= GX_DCFG_CRT_HSYNC_POL; if (!(info->var.sync & FB_SYNC_VERT_HIGH_ACT)) dcfg |= GX_DCFG_CRT_VSYNC_POL; + } else { + /* Power down the CRT DACs if in FP mode */ + misc |= (GX_MISC_A_PWRDN | GX_MISC_DAC_PWRDN); + writel(misc, par->vid_regs + GX_MISC); } /* Enable the display logic */ -- cgit v1.2.3 From 0a5e79098799a4bead070a9bd7f1a2213ba5eef5 Mon Sep 17 00:00:00 2001 From: Andres Salomon Date: Mon, 28 Apr 2008 02:14:57 -0700 Subject: gxfb: use PCI_DEVICE() for gxfb's pci device table Drop the class/class_mask stuff; it's unnecessary as long as the vendor and device IDs match. Signed-off-by: Andres Salomon Cc: "Antonino A. Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- drivers/video/geode/gxfb_core.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/video/geode/gxfb_core.c b/drivers/video/geode/gxfb_core.c index 4e22ee0377e7..546e53038faf 100644 --- a/drivers/video/geode/gxfb_core.c +++ b/drivers/video/geode/gxfb_core.c @@ -398,9 +398,7 @@ static void gxfb_remove(struct pci_dev *pdev) } static struct pci_device_id gxfb_id_table[] = { - { PCI_VENDOR_ID_NS, PCI_DEVICE_ID_NS_GX_VIDEO, - PCI_ANY_ID, PCI_ANY_ID, PCI_BASE_CLASS_DISPLAY << 16, - 0xff0000, 0 }, + { PCI_DEVICE(PCI_VENDOR_ID_NS, PCI_DEVICE_ID_NS_GX_VIDEO) }, { 0, } }; -- cgit v1.2.3 From fa20c8a6e520d9ccd68c8101155ffdbc19c977c3 Mon Sep 17 00:00:00 2001 From: Andres Salomon Date: Mon, 28 Apr 2008 02:14:57 -0700 Subject: gxfb: replace FBSIZE config option with a module parameter Use a command line option (vram) rather than hardcoding the vram size. LxFB already does this; it's useful for machines that can't query the BIOS for fb size. This patch originated from David Woodhouse, was modified by Jordan Crouse, and was then modified further by me. This also adds some gxfb documentation in Documentation/fb. 
Signed-off-by: Andres Salomon Cc: Jordan Crouse Cc: "Antonino A. Daplas" Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- Documentation/fb/gxfb.txt | 51 ++++++++++++++++++++++++++++++++++++++++ drivers/video/geode/Kconfig | 20 ---------------- drivers/video/geode/display_gx.c | 7 ------ drivers/video/geode/gxfb_core.c | 14 ++++++----- 4 files changed, 59 insertions(+), 33 deletions(-) create mode 100644 Documentation/fb/gxfb.txt diff --git a/Documentation/fb/gxfb.txt b/Documentation/fb/gxfb.txt new file mode 100644 index 000000000000..b56096142017 --- /dev/null +++ b/Documentation/fb/gxfb.txt @@ -0,0 +1,51 @@ +[This file is cloned from VesaFB/aty128fb] + +What is gxfb? +================= + +This is a graphics framebuffer driver for AMD Geode GX2 based processors. + +Advantages: + + * No need to use AMD's VSA code (or other VESA emulation layer) in the + BIOS. + * It provides a nice large console (128 cols + 48 lines with 1024x768) + without using tiny, unreadable fonts. + * You can run XF68_FBDev on top of /dev/fb0 + * Most important: boot logo :-) + +Disadvantages: + + * graphic mode is slower than text mode... + + +How to use it? +============== + +Switching modes is done using gxfb.mode_option=... boot +parameter or using `fbset' program. + +See Documentation/fb/modedb.txt for more information on modedb +resolutions. + + +X11 +=== + +XF68_FBDev should generally work fine, but it is non-accelerated. + + +Configuration +============= + +You can pass kernel command line options to gxfb with gxfb.