summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-blockLinus Torvalds2009-06-162-0/+46
|\ | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git.kernel.dk/linux-2.6-block: block: remove some includings of blktrace_api.h mg_disk: seperate mg_disk.h again block: Introduce helper to reset queue limits to default values cfq: remove extraneous '\n' in blktrace output ubifs: register backing_dev_info btrfs: properly register fs backing device block: don't overwrite bdi->state after bdi_init() has been run cfq: cleanup for last_end_request in cfq_data
| * mg_disk: seperate mg_disk.h againunsik Kim2009-06-161-0/+45
| | | | | | | | | | | | | | | | | | | | eec9462088a26c046d4db3100796a340a50890b8 fold mg_disk.h into mg_disk.c, but mg_disk platform driver needs private data for operation. This also make mg_disk.c as machine independent. Seperate only needed structure and defines to mg_disk.h Signed-off-by: unsik Kim <donari75@gmail.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
| * block: Introduce helper to reset queue limits to default valuesMartin K. Petersen2009-06-161-0/+1
| | | | | | | | | | | | | | | | | | | | | | DM reuses the request queue when swapping in a new device table Introduce blk_set_default_limits() which can be used to reset the the queue_limits prior to stacking devices. Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Acked-by: Alasdair G Kergon <agk@redhat.com> Acked-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* | Merge branch 'merge' of ↵Linus Torvalds2009-06-162-0/+47
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (38 commits) ps3flash: Always read chunks of 256 KiB, and cache them ps3flash: Cache the last accessed FLASH chunk ps3: Replace direct file operations by callback ps3: Switch ps3_os_area_[gs]et_rtc_diff to EXPORT_SYMBOL_GPL() ps3: Correct debug message in dma_ioc0_map_pages() drivers/ps3: Add missing annotations ps3fb: Use ps3_system_bus_[gs]et_drvdata() instead of direct access ps3flash: Use ps3_system_bus_[gs]et_drvdata() instead of direct access ps3: shorten ps3_system_bus_[gs]et_driver_data to ps3_system_bus_[gs]et_drvdata ps3: Use dev_[gs]et_drvdata() instead of direct access for system bus devices block/ps3: remove driver_data direct access of struct device ps3vram: Make ps3vram_priv.reports a void * ps3vram: Remove no longer used ps3vram_priv.ddr_base ps3vram: Replace mutex by spinlock + bio_list block: Add bio_list_peek() powerpc: Use generic atomic64_t implementation on 32-bit processors lib: Provide generic atomic64_t implementation powerpc: Add compiler memory barrier to mtmsr macro powerpc/iseries: Mark signal_vsp_instruction() as maybe unused powerpc/iseries: Fix unused function warning in iSeries DT code ...
| * | block: Add bio_list_peek()Geert Uytterhoeven2009-06-151-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce bio_list_peek(), to obtain a pointer to the first bio on the bio_list without actually removing it from the list. This is needed when you want to serialize based on the list being empty or not. Signed-off-by: Geert Uytterhoeven <Geert.Uytterhoeven@sonycom.com> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | lib: Provide generic atomic64_t implementationPaul Mackerras2009-06-151-0/+42
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Many processor architectures have no 64-bit atomic instructions, but we need atomic64_t in order to support the perf_counter subsystem. This adds an implementation of 64-bit atomic operations using hashed spinlocks to provide atomicity. For each atomic operation, the address of the atomic64_t variable is hashed to an index into an array of 16 spinlocks. That spinlock is taken (with interrupts disabled) around the operation, which can then be coded non-atomically within the lock. On UP, all the spinlock manipulation goes away and we simply disable interrupts around each operation. In fact gcc eliminates the whole atomic64_lock variable as well. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
| * | Merge commit 'origin/master' into nextBenjamin Herrenschmidt2009-06-15136-608/+5820
| |\ \
* | \ \ Merge branch 'acpica' of ↵Linus Torvalds2009-06-164-41/+55
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6 * 'acpica' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-acpi-2.6: (27 commits) ACPICA: Update version to 20090521. ACPICA: Disable preservation of SCI enable bit (SCI_EN) ACPICA: Region deletion: Ensure region object is removed from handler list ACPICA: Eliminate extra call to NsGetParentNode ACPICA: Simplify internal operation region interface ACPICA: Update Load() to use operation region interfaces ACPICA: New: AcpiInstallMethod - install a single control method ACPICA: Invalidate DdbHandle after table unload ACPICA: Fix reference count issues for DdbHandle object ACPICA: Simplify and optimize NsGetNextNode function ACPICA: Additional validation of _PRT packages (resource mgr) ACPICA: Fix DebugObject output for DdbHandle objects ACPICA: Fix allowable release order for ASL mutex objects ACPICA: Mutex support: Fix release ordering issue and current sync level ACPICA: Update version to 20090422. ACPICA: Linux OSL: cleanup/update/merge ACPICA: Fix implementation of AML BreakPoint operator (break to debugger) ACPICA: Fix miscellaneous warnings under gcc 4+ ACPICA: Miscellaneous lint changes ACPICA: Fix possible dereference of null pointer ...
| * | | | ACPICA: Update version to 20090521.Bob Moore2009-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Update version number. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: New: AcpiInstallMethod - install a single control methodLin Ming2009-05-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This interface enables the override or creation of a single control method. Useful to repair a bug or install a missing method. Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: Update version to 20090422.Bob Moore2009-05-271-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Version 20090422. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: Linux OSL: cleanup/update/mergeBob Moore2009-05-271-25/+38
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Merge the OSL with the actual file used by Linux, so that the file does not require patching when integrated with Linux. General cleanup and some restructuring. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: Cleanup byte/word/dword extraction macros, fix possible warningsBob Moore2009-05-271-14/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Removed unnecessary masking. For the 64-bit macros, removed the structure overlay. Fixes aliasing warnings seen with gcc 4+ compilers. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: Fix a few warnings for gcc 3.4.4Bob Moore2009-05-271-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Mostly for acpiexec, one in the core subsystem. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
| * | | | ACPICA: Update error/warning interfacesBob Moore2009-05-271-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Moved the module name and line number to the end of the message. Signed-off-by: Bob Moore <robert.moore@intel.com> Signed-off-by: Lin Ming <ming.m.lin@intel.com> Signed-off-by: Len Brown <len.brown@intel.com>
* | | | | printk: Add KERN_DEFAULT printk log-levelLinus Torvalds2009-06-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds a KERN_DEFAULT loglevel marker, for when you cannot decide which loglevel you want, and just want to keep an existing printk with the default loglevel. The difference between having KERN_DEFAULT and having no log-level marker at all is two-fold: - having the log-level marker will now force a new-line if the previous printout had not added one (perhaps because it forgot, but perhaps because it expected a continuation) - having a log-level marker is required if you are printing out a message that otherwise itself could perhaps otherwise be mistaken for a log-level. Signed-of-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | | printk: clean up handling of log-levels and newlinesLinus Torvalds2009-06-161-1/+1
| |_|_|/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It used to be that we would only look at the log-level in a printk() after explicit newlines, which can cause annoying problems when the previous printk() did not end with a '\n'. In that case, the log-level marker would be just printed out in the middle of the line, and be seen as just noise rather than change the logging level. This changes things to always look at the log-level in the first bytes of the printout. If a log level marker is found, it is always used as the log-level. Additionally, if no newline existed, one is added (unless the log-level is the explicit KERN_CONT marker, to explicitly show that it's a continuation of a previous line). Acked-by: Arjan van de Ven <arjan@infradead.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'timers-for-linus-migration' of ↵Linus Torvalds2009-06-154-2/+30
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-migration' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: timers: Logic to move non pinned timers timers: /proc/sys sysctl hook to enable timer migration timers: Identifying the existing pinned timers timers: Framework for identifying pinned timers timers: allow deferrable timers for intervals tv2-tv5 to be deferred Fix up conflicts in kernel/sched.c and kernel/timer.c manually
| * | | | timers: Logic to move non pinned timersArun R Bharadwaj2009-05-132-0/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: This patch migrates all non pinned timers and hrtimers to the current idle load balancer, from all the idle CPUs. Timers firing on busy CPUs are not migrated. While migrating hrtimers, care should be taken to check if migrating a hrtimer would result in a latency or not. So we compare the expiry of the hrtimer with the next timer interrupt on the target cpu and migrate the hrtimer only if it expires *after* the next interrupt on the target cpu. So, added a clockevents_get_next_event() helper function to return the next_event on the target cpu's clock_event_device. [ tglx: cleanups and simplifications ] Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | timers: /proc/sys sysctl hook to enable timer migrationArun R Bharadwaj2009-05-131-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: This patch creates the /proc/sys sysctl interface at /proc/sys/kernel/timer_migration Timer migration is enabled by default. To disable timer migration, when CONFIG_SCHED_DEBUG = y, echo 0 > /proc/sys/kernel/timer_migration Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | timers: Framework for identifying pinned timersArun R Bharadwaj2009-05-132-2/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Arun R Bharadwaj <arun@linux.vnet.ibm.com> [2009-04-16 12:11:36]: This patch creates a new framework for identifying cpu-pinned timers and hrtimers. This framework is needed because pinned timers are expected to fire on the same CPU on which they are queued. So it is essential to identify these and not migrate them, in case there are any. For regular timers, the currently existing add_timer_on() can be used queue pinned timers and subsequently mod_timer_pinned() can be used to modify the 'expires' field. For hrtimers, new modes HRTIMER_ABS_PINNED and HRTIMER_REL_PINNED are added to queue cpu-pinned hrtimer. [ tglx: use .._PINNED mode argument instead of creating tons of new functions ] Signed-off-by: Arun R Bharadwaj <arun@linux.vnet.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | Merge branch 'timers-for-linus-clocksource' of ↵Linus Torvalds2009-06-151-0/+3
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-clocksource' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: clocksource: prevent selection of low resolution clocksourse also for nohz=on clocksource: sanity check sysfs clocksource changes
| * | | | | clocksource: prevent selection of low resolution clocksourse also for nohz=onThomas Gleixner2009-06-132-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 3f68535adad (clocksource: sanity check sysfs clocksource changes) prevents selection of non high resolution capable clocksources when high resolution mode is active, but did not take into account that the same rules apply for highres=off nohz=on. Check the tick device mode instead of hrtimer_hres_active() to verify whether the system needs to be protected from a switch to jiffies or other non highres capable clock sources. Reported-by: Luming Yu <luming.yu@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| * | | | | clocksource: sanity check sysfs clocksource changesjohn stultz2009-06-111-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Thomas, Andrew and Ingo pointed out that we don't have any safety checks in the clocksource sysfs entries to make sure sysadmins don't try to change the clocksource to a non high-res timer capable clocksource (such as jiffies) when high-res timers (HRT) is enabled. Doing so will likely hang a system. Correct this by filtering non HRT clocksources from available_clocksources and not accepting non HRT clocksources with HRT enabled. Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | | | Merge branch 'timers-for-linus-ntp' of ↵Linus Torvalds2009-06-151-11/+31
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-ntp' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: ntp: fix comment typos ntp: adjust SHIFT_PLL to improve NTP convergence
| * | | | | | ntp: fix comment typosjohn stultz2009-05-121-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Bernhard Schiffner noticed I had a few comment typos in this patch, (note: to save embarrassment, when making typos, avoid copying and pasting them) so this patch corrects them. [ Impact: cleanup ] Reported-by: Bernhard Schiffner <bernhard@schiffner-limbach.de> Signed-off-by: John Stultz <johnstul@us.ibm.com> Cc: riel@redhat.com Cc: akpm@linux-foundation.org LKML-Reference: <1242090794.7214.131.camel@localhost.localdomain> Signed-off-by: Ingo Molnar <mingo@elte.hu>
| * | | | | | ntp: adjust SHIFT_PLL to improve NTP convergencejohn stultz2009-05-061-11/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The conversion to the ntpv4 reference model f19923937321244e7dc334767eb4b67e0e3d5c74 ("ntp: convert to the NTP4 reference model") in 2.6.19 added nanosecond resolution the adjtimex interface, but also changed the "stiffness" of the frequency adjustments, causing NTP convergence time to greatly increase. SHIFT_PLL, which reduces the stiffness of the freq adjustments, was designed to be inversely linked to HZ, and the reference value of 4 was designed for Unix systems using HZ=100. However Linux's clock steering code mostly independent of HZ. So this patch reduces the SHIFT_PLL value from 4 to 2, which causes NTPd behavior to match kernels prior to 2.6.19, greatly reducing convergence times, and improving close synchronization through environmental thermal changes. The patch also changes some l's to L's in nearby code to avoid misreading 50l as 501. [ Impact: tweak NTP algorithm for faster convergence ] Signed-off-by: John Stultz <johnstul@us.ibm.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: zippel@linux-m68k.org Signed-off-by: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <200905051956.n45JuVo9025575@imap1.linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
* | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6Linus Torvalds2009-06-15101-1305/+3741
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6: (1244 commits) pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2US ipv4: Fix fib_trie rebalancing Bluetooth: Fix issue with uninitialized nsh.type in DTL-1 driver Bluetooth: Fix Kconfig issue with RFKILL integration PIM-SM: namespace changes ipv4: update ARPD help text net: use a deferred timer in rt_check_expire ieee802154: fix kconfig bool/tristate muckup bonding: initialization rework bonding: use is_zero_ether_addr bonding: network device names are case sensative bonding: elminate bad refcount code bonding: fix style issues bonding: fix destructor bonding: remove bonding read/write semaphore bonding: initialize before registration bonding: bond_create always called with default parameters x_tables: Convert printk to pr_err netfilter: conntrack: optional reliable conntrack event delivery list_nulls: add hlist_nulls_add_head and hlist_nulls_del ...
| * \ \ \ \ \ \ Merge branch 'master' of ↵David S. Miller2009-06-15256-1716/+10381
| |\ \ \ \ \ \ \ | | | |_|_|_|_|/ | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 Conflicts: Documentation/feature-removal-schedule.txt drivers/scsi/fcoe/fcoe.c net/core/drop_monitor.c net/core/net-traces.c
| * | | | | | | pkt_sched: Rename PSCHED_US2NS and PSCHED_NS2USJarek Poplawski2009-06-151-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Let's use TICKS instead of US, so PSCHED_TICKS2NS and PSCHED_NS2TICKS (like in PSCHED_TICKS_PER_SEC already) to avoid misleading. Signed-off-by: Jarek Poplawski <jarkao2@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | | | netfilter: conntrack: optional reliable conntrack event deliveryPablo Neira Ayuso2009-06-133-17/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch improves ctnetlink event reliability if one broadcast listener has set the NETLINK_BROADCAST_ERROR socket option. The logic is the following: if an event delivery fails, we keep the undelivered events in the missed event cache. Once the next packet arrives, we add the new events (if any) to the missed events in the cache and we try a new delivery, and so on. Thus, if ctnetlink fails to deliver an event, we try to deliver them once we see a new packet. Therefore, we may lose state transitions but the userspace process gets in sync at some point. At worst case, if no events were delivered to userspace, we make sure that destroy events are successfully delivered. Basically, if ctnetlink fails to deliver the destroy event, we remove the conntrack entry from the hashes and we insert them in the dying list, which contains inactive entries. Then, the conntrack timer is added with an extra grace timeout of random32() % 15 seconds to trigger the event again (this grace timeout is tunable via /proc). The use of a limited random timeout value allows distributing the "destroy" resends, thus, avoiding accumulating lots "destroy" events at the same time. Event delivery may re-order but we can identify them by means of the tuple plus the conntrack ID. The maximum number of conntrack entries (active or inactive) is still handled by nf_conntrack_max. Thus, we may start dropping packets at some point if we accumulate a lot of inactive conntrack entries that did not successfully report the destroy event to userspace. During my stress tests consisting of setting a very small buffer of 2048 bytes for conntrackd and the NETLINK_BROADCAST_ERROR socket flag, and generating lots of very small connections, I noticed very few destroy entries on the fly waiting to be resend. A simple way to test this patch consist of creating a lot of entries, set a very small Netlink buffer in conntrackd (+ a patch which is not in the git tree to set the BROADCAST_ERROR flag) and invoke `conntrack -F'. For expectations, no changes are introduced in this patch. Currently, event delivery is only done for new expectations (no events from expectation expiration, removal and confirmation). In that case, they need a per-expectation event cache to implement the same idea that is exposed in this patch. This patch can be useful to provide reliable flow-accouting. We still have to add a new conntrack extension to store the creation and destroy time. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | | | list_nulls: add hlist_nulls_add_head and hlist_nulls_delPablo Neira Ayuso2009-06-131-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds the hlist_nulls_add_head() function which is based on hlist_nulls_add_head_rcu() but without the use of rcu_assign_pointer(). It also adds hlist_nulls_del which is exactly the same like hlist_nulls_del_rcu(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | | | netfilter: conntrack: move helper destruction to nf_ct_helper_destroy()Pablo Neira Ayuso2009-06-131-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the helper destruction to a function that lives in nf_conntrack_helper.c. This new function is used in the patch to add ctnetlink reliable event delivery. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | | | netfilter: conntrack: move event caching to conntrack extension infrastructurePablo Neira Ayuso2009-06-133-64/+73
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch reworks the per-cpu event caching to use the conntrack extension infrastructure. The main drawback is that we consume more memory per conntrack if event delivery is enabled. This patch is required by the reliable event delivery that follows to this patch. BTW, this patch allows you to enable/disable event delivery via /proc/sys/net/netfilter/nf_conntrack_events in runtime, although you can still disable event caching as compilation option. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | | | Merge branch 'master' of ↵David S. Miller2009-06-1215-118/+270
| |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/kaber/nf-next-2.6
| | * \ \ \ \ \ \ Merge branch 'master' of ↵Patrick McHardy2009-06-1143-232/+1072
| | |\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next-2.6
| | * | | | | | | | netfilter: nf_conntrack: use per-conntrack locks for protocol dataPatrick McHardy2009-06-102-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce per-conntrack locks and use them instead of the global protocol locks to avoid contention. Especially tcp_lock shows up very high in profiles on larger machines. This will also allow to simplify the upcoming reliable event delivery patches. Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: xt_socket: added new revision of the 'socket' match supporting flagsLaszlo Attila Toth2009-06-091-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the XT_SOCKET_TRANSPARENT flag is set, enabled 'transparent' socket option is required for the socket to be matched. Signed-off-by: Laszlo Attila Toth <panther@balabit.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: passive OS fingerprint xtables matchEvgeniy Polyakov2009-06-083-1/+136
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Passive OS fingerprinting netfilter module allows to passively detect remote OS and perform various netfilter actions based on that knowledge. This module compares some data (WS, MSS, options and it's order, ttl, df and others) from packets with SYN bit set with dynamically loaded OS fingerprints. Fingerprint matching rules can be downloaded from OpenBSD source tree or found in archive and loaded via netfilter netlink subsystem into the kernel via special util found in archive. Archive contains library file (also attached), which was shipped with iptables extensions some time ago (at least when ipt_osf existed in patch-o-matic). Following changes were made in this release: * added NLM_F_CREATE/NLM_F_EXCL checks * dropped _rcu list traversing helpers in the protected add/remove calls * dropped unneded structures, debug prints, obscure comment and check Fingerprints can be downloaded from http://www.openbsd.org/cgi-bin/cvsweb/src/etc/pf.os or can be found in archive Example usage: -d switch removes fingerprints Please consider for inclusion. Thank you. Passive OS fingerprint homepage (archives, examples): http://www.ioremap.net/projects/osf Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: nf_ct_icmp: keep the ICMP ct entries longerJan Kasprzak2009-06-083-21/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Current conntrack code kills the ICMP conntrack entry as soon as the first reply is received. This is incorrect, as we then see only the first ICMP echo reply out of several possible duplicates as ESTABLISHED, while the rest will be INVALID. Also this unnecessarily increases the conntrackd traffic on H-A firewalls. Make all the ICMP conntrack entries (including the replied ones) last for the default of nf_conntrack_icmp{,v6}_timeout seconds. Signed-off-by: Jan "Yenya" Kasprzak <kas@fi.muni.cz> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: xt_NFQUEUE: queue balancing supportFlorian Westphal2009-06-051-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Adds support for specifying a range of queues instead of a single queue id. Flows will be distributed across the given range. This is useful for multicore systems: Instead of having a single application read packets from a queue, start multiple instances on queues x, x+1, .. x+n. Each instance can process flows independently. Packets for the same connection are put into the same queue. Signed-off-by: Holger Eitzenberger <heitzenberger@astaro.com> Signed-off-by: Florian Westphal <fwestphal@astaro.com> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: x_tables: added hook number into match extension parameter structure.Evgeniy Polyakov2009-06-041-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Signed-off-by: Evgeniy Polyakov <zbr@ioremap.net> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: conntrack: replace notify chain by function pointerPablo Neira Ayuso2009-06-032-21/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch removes the notify chain infrastructure and replace it by a simple function pointer. This issue has been mentioned in the mailing list several times: the use of the notify chain adds too much overhead for something that is only used by ctnetlink. This patch also changes nfnetlink_send(). It seems that gfp_any() returns GFP_KERNEL for user-context request, like those via ctnetlink, inside the RCU read-side section which is not valid. Using GFP_KERNEL is also evil since netlink may schedule(), this leads to "scheduling while atomic" bug reports. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: conntrack: simplify event caching systemPablo Neira Ayuso2009-06-021-30/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch simplifies the conntrack event caching system by removing several events: * IPCT_[*]_VOLATILE, IPCT_HELPINFO and IPCT_NATINFO has been deleted since the have no clients. * IPCT_COUNTER_FILLING which is a leftover of the 32-bits counter days. * IPCT_REFRESH which is not of any use since we always include the timeout in the messages. After this patch, the existing events are: * IPCT_NEW, IPCT_RELATED and IPCT_DESTROY, that are used to identify addition and deletion of entries. * IPCT_STATUS, that notes that the status bits have changes, eg. IPS_SEEN_REPLY and IPS_ASSURED. * IPCT_PROTOINFO, that reports that internal protocol information has changed, eg. the TCP, DCCP and SCTP protocol state. * IPCT_HELPER, that a helper has been assigned or unassigned to this entry. * IPCT_MARK and IPCT_SECMARK, that reports that the mark has changed, this covers the case when a mark is set to zero. * IPCT_NATSEQADJ, to report that there's updates in the NAT sequence adjustment. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: conntrack: remove events flags from userspace exposed filePablo Neira Ayuso2009-06-022-69/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch moves the event flags from linux/netfilter/nf_conntrack_common.h to net/netfilter/nf_conntrack_ecache.h. This flags are not of any use from userspace. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: conntrack: don't report events on module removalPablo Neira Ayuso2009-06-021-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | During the module removal there are no possible event listeners since ctnetlink must be removed before to allow removing nf_conntrack. This patch removes the event reporting for the module removal case which is not of any use in the existing code. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: ctnetlink: rename tuple() by nf_ct_tuple() macro definitionPablo Neira Ayuso2009-06-021-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch move the internal tuple() macro definition to the header file as nf_ct_tuple(). Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
| | * | | | | | | | netfilter: nf_ct_tcp: TCP simultaneous open supportJozsef Kadlecsik2009-06-021-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The patch below adds supporting TCP simultaneous open to conntrack. The unused LISTEN state is replaced by a new state (SYN_SENT2) denoting the second SYN sent from the reply direction in the new case. The state table is updated and the function tcp_in_window is modified to handle simultaneous open. The functionality can fairly easily be tested by socat. A sample tcpdump recording 23:21:34.244733 IP (tos 0x0, ttl 64, id 49224, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xe75f (correct), 3383710133:3383710133(0) win 5840 <mss 1460,sackOK,timestamp 173445629 0,nop,wscale 7> 23:21:34.244783 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 40) 192.168.0.1.2020 > 192.168.0.254.2020: R, cksum 0x0253 (correct), 0:0(0) ack 3383710134 win 0 23:21:36.038680 IP (tos 0x0, ttl 64, id 28092, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.1.2020 > 192.168.0.254.2020: S, cksum 0x704b (correct), 2634546729:2634546729(0) win 5840 <mss 1460,sackOK,timestamp 824213 0,nop,wscale 1> 23:21:36.038777 IP (tos 0x0, ttl 64, id 49225, offset 0, flags [DF], proto TCP (6), length 60) 192.168.0.254.2020 > 192.168.0.1.2020: S, cksum 0xb179 (correct), 3383710133:3383710133(0) ack 2634546730 win 5840 <mss 1460,sackOK,timestamp 173447423 824213,nop,wscale 7> 23:21:36.038847 IP (tos 0x0, ttl 64, id 28093, offset 0, flags [DF], proto TCP (6), length 52) 192.168.0.1.2020 > 192.168.0.254.2020: ., cksum 0xebad (correct), ack 3383710134 win 2920 <nop,nop,timestamp 824213 173447423> and the corresponding netlink events: [NEW] tcp 6 120 SYN_SENT src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 [UNREPLIED] src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [UPDATE] tcp 6 120 LISTEN src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [UPDATE] tcp 6 60 SYN_RECV src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [UPDATE] tcp 6 432000 ESTABLISHED src=192.168.0.254 dst=192.168.0.1 sport=2020 dport=2020 src=192.168.0.1 dst=192.168.0.254 sport=2020 dport=2020 [ASSURED] The RST packet was dropped in the raw table, thus it did not reach conntrack. nfnetlink_conntrack is unpatched so it shows the new SYN_SENT2 state as the old unused LISTEN. With TCP simultaneous open support we satisfy REQ-2 in RFC 5382 ;-) . Additional minor correction in this patch is that in order to catch uninitialized reply directions, "td_maxwin == 0" is used instead of "td_end == 0" because the former can't be true except in uninitialized state while td_end may accidentally be equal to zero in the mid of a connection. Signed-off-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu> Signed-off-by: Patrick McHardy <kaber@trash.net>
| | * | | | | | | | netfilter: conntrack: add support for DCCP handshake sequence to ctnetlinkPablo Neira Ayuso2009-05-272-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds CTA_PROTOINFO_DCCP_HANDSHAKE_SEQ that exposes the u64 handshake sequence number to user-space. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: Patrick McHardy <kaber@trash.net>
| * | | | | | | | | Merge branch 'linux-2.6.31.y' of ↵David S. Miller2009-06-121-1/+1
| |\ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/inaky/wimax