summaryrefslogtreecommitdiffstats
path: root/include (follow)
Commit message (Collapse)AuthorAgeFilesLines
* drm/doc: Add function reference documentation for drm_mm.cDaniel Vetter2014-03-131-29/+125
| | | | | | | | | | | | While at it do a tiny bit of interface cleanup and convert boolean return values to bool. With this patch all exported functions and inline helpers which are part of the drm_mm public interface are documented. Also drop superflous extern function modifiers since most of drm_mm.h doesn't use them - more consistent that way. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* drm/docs: Include hdmi infoframe helper referenceDaniel Vetter2014-03-131-0/+12
| | | | | | | | | | | | Thierry created such nice kerneldocs, it's a shame we've left them lingering! For the fun of it also add a bit of kerneldoc to the header so that we can also include that. Just in case someone adds kerneldoc in there. Cc: Thierry Reding <thierry.reding@gmail.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* Merge branch 'drm-next-3.15' of ↵Dave Airlie2014-03-051-0/+15
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~deathsimple/linux into drm-next this is the second pull request for 3.15 radeon changes. Highlights this time: - Better VRAM usage - VM page table rework - Enabling different UVD clocks again - Some general cleanups and improvements * 'drm-next-3.15' of git://people.freedesktop.org/~deathsimple/linux: drm/radeon: remove struct radeon_bo_list drm/radeon: drop non blocking allocations from sub allocator drm/radeon: remove global vm lock drm/radeon: use normal BOs for the page tables v4 drm/radeon: further cleanup vm flushing & fencing drm/radeon: separate gart and vm functions drm/radeon: fix VCE suspend/resume drm/radeon: fix missing bo reservation drm/radeon: limit how much memory TTM can move per IB according to VRAM usage drm/radeon: validate relocations in the order determined by userspace v3 drm/radeon: add buffers to the LRU list from smallest to largest drm/radeon: deduplicate code in radeon_gem_busy_ioctl drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves v2 drm/radeon: add a way to get and set initial buffer domains v2 drm/radeon: use variable UVD clocks drm/radeon: cleanup the fence ring locking code drm/radeon: improve ring lockup detection code v2
| * drm/radeon: track memory statistics about VRAM and GTT usage and buffer moves v2Marek Olšák2014-03-031-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The statistics are: - VRAM usage in bytes - GTT usage in bytes - number of bytes moved by TTM The last one is actually a counter, so you need to sample it before and after command submission and take the difference. This is useful for finding performance bottlenecks. Userspace queries are also added. v2: use atomic64_t Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
| * drm/radeon: add a way to get and set initial buffer domains v2Marek Olšák2014-03-031-0/+12
| | | | | | | | | | | | | | | | | | | | When passing buffers between processes, the receiving process needs to know the original buffer domain, so that it doesn't accidentally move the buffer. v2: reserve the buffer Signed-off-by: Marek Olšák <marek.olsak@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com>
* | Merge tag 'drm-intel-next-2014-02-14' of ↵Dave Airlie2014-03-032-0/+7
|\ \ | |/ |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://git.freedesktop.org/git/drm-intel into drm-next - Fix the execbuf rebind performance regression due to topic/ppgtt (Chris). - Fix up the connector cleanup ordering for sdvod i2c and dp aux devices (Imre). - Try to preserve the firmware modeset config on driver load. And a bit of prep work for smooth takeover of the fb contents (Jesse). - Prep cleanup for larger gtt address spaces on bdw (Ben). - Improve our vblank_wait code to make hsw modesets faster (Paulo). - Display debugfs file (Jesse). - DRRS prep work from Vandana Kannan. - pipestat interrupt handler to fix a few races around vblank/pageflip handling on byt (Imre). - Improve display fuse handling for display-less SKUs (Damien). - Drop locks while stalling for the gpu when serving pagefaults to improve interactivity (Chris). - And as usual piles of other improvements and small fixes all over. * tag 'drm-intel-next-2014-02-14' of ssh://git.freedesktop.org/git/drm-intel: (65 commits) drm/i915: fix NULL deref in the load detect code drm/i915: Only bind each object rather than for every execbuffer drm/i915: Directly return the vma from bind_to_vm drm/i915: Simplify i915_gem_object_ggtt_unpin drm/i915: Allow blocking in the PDE alloc when running low on gtt space drm/i915: Don't allocate context pages as mappable drm/i915: Handle set_cache_level errors in the status page setup drm/i915: Don't pin the status page as mappable drm/i915: Don't set PIN_MAPPABLE for legacy ringbuffers drm/i915: Handle set_cache_level errors in the pipe control scratch setup drm/i915: split PIN_GLOBAL out from PIN_MAPPABLE drm/i915: Consolidate binding parameters into flags drm/i915: sdvo: add i2c sysfs symlink to the connector's directory drm/i915: sdvo: fix error path in sdvo_connector_init drm/i915: dp: fix order of dp aux i2c device cleanup drm/i915: add unregister callback to connector drm/i915: don't reference null pointer at i915_sink_crc drm/i915/lvds: Remove dead code from failing case drm/i915: don't preserve inherited configs with nothing on v2 drm/i915/bdw: Split up PPGTT cleanup ...
| * drm: export cmdline and preferred mode functions from fb helperJesse Barnes2014-02-121-0/+6
| | | | | | | | | | | | | | | | This allows drivers to use them in custom initial_config functions. Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| * drm: expose subpixel order name routine v3Jesse Barnes2014-02-121-0/+1
| | | | | | | | | | | | | | | | | | | | | | Just like we have for connector type etc. v2: drop static array (Chris) v3: add kdoc (Daniel) Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* | Merge branch 'drm-next-3.15' of ↵Dave Airlie2014-02-271-0/+5
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~deathsimple/linux into drm-next So this is the initial pull request for radeon drm-next 3.15. Highlights: - VCE bringup including DPM support - Few cleanups for the ring handling code * 'drm-next-3.15' of git://people.freedesktop.org/~deathsimple/linux: drm/radeon: cleanup false positive lockup handling drm/radeon: drop radeon_ring_force_activity drm/radeon: drop drivers copy of the rptr drm/radeon/cik: enable/disable vce cg when encoding v2 drm/radeon: add support for vce 2.0 clock gating drm/radeon/dpm: properly enable/disable vce when vce pg is enabled drm/radeon/dpm: enable dynamic vce state switching v2 drm/radeon: add vce dpm support for KV/KB drm/radeon: enable vce dpm on CI drm/radeon: add vce dpm support for CI drm/radeon: fill in set_vce_clocks for CIK asics drm/radeon/dpm: fetch vce states from the vbios drm/radeon/dpm: fill in some initial vce infrastructure drm/radeon/dpm: move platform caps fetching to a separate function drm/radeon: add callback for setting vce clocks drm/radeon: add VCE version parsing and checking drm/radeon: add VCE ring query drm/radeon: initial VCE support v4 drm/radeon: fix CP semaphores on CIK
| * | drm/radeon: add VCE version parsing and checkingChristian König2014-02-181-0/+4
| | | | | | | | | | | | | | | | | | Also make the result available to userspace. Signed-off-by: Christian König <christian.koenig@amd.com>
| * | drm/radeon: initial VCE support v4Christian König2014-02-181-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only VCE 2.0 support so far. v2: squashing multiple patches into this one v3: add IRQ support for CIK, major cleanups, basic code documentation v4: remove HAINAN from chipset list Signed-off-by: Christian König <christian.koenig@amd.com>
* | | Merge tag 'drm-intel-next-2014-02-07' of ↵Dave Airlie2014-02-271-0/+10
|\ \ \ | | |/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ssh://git.freedesktop.org/git/drm-intel into drm-next - Yet more steps towards atomic modeset from Ville. - DP panel power sequencing improvements from Paulo. - irq code cleanups from Ville. - 5.4 GHz dp lane clock support for bdw/hsw from Todd. - Clock readout support for hsw/bdw (aka fastboot) from Jesse. - Make pipe underruns report at ERROR level (Ville). This is to check our improved watermarks code. - Full ppgtt support from Ben for gen7. - More fbc fixes and improvements from Ville all over the place, unfortunately not yet enabled by default on more platforms. - w/a cleanups from Ville. - HiZ stall optimization settings (Chia-I Wu). - Display register mmio offset refactor patch from Antti. - RPS improvements for corner-cases from Jeff McGee. * tag 'drm-intel-next-2014-02-07' of ssh://git.freedesktop.org/git/drm-intel: (166 commits) drm/i915: Update rps interrupt limits drm/i915: Restore rps/rc6 on reset drm/i915: Prevent recursion by retiring requests when the ring is full drm/i915: Generate a hang error code drm/i915: unify FLIP_DONE macro names drm/i915: vlv: s/spin_lock_irqsave/spin_lock/ in irq handler drm/i915: factor out valleyview_pipestat_irq_handler drm/i915: vlv: don't unmask IIR[DISPLAY_PIPE_A/B_VBLANK] interrupt drm/i915: Reorganize display pipe register accesses drm/i915: Treat using a purged buffer as a source of EFAULT drm/i915: Convert EFAULT into a silent SIGBUS drm/i915: release mutex in i915_gem_init()'s error path drm/i915: check for oom when allocating private_default_ctx drm/i915/vlv: WA to fix Voltage not getting dropped to Vmin when Gfx is power gated. drm/i915: Get rid of acthd based guilty batch search drm/i915: Use hangcheck score to find guilty context drm/i915: Drop WaDisablePSDDualDispatchEnable:ivb for IVB GT2 drm/i915: Fix IVB GT2 WaDisableDopClockGating and WaDisablePSDDualDispatchEnable drm/i915: Don't access snooped pages through the GTT (even for error capture) drm/i915: Only print information for filing bug reports once ... Conflicts: drivers/gpu/drm/i915/intel_dp.c
| * | Merge remote-tracking branch 'airlied/drm-next' into drm-intel-next-queuedDaniel Vetter2014-01-305-10/+48
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Backmerge drm-next - I need to backmerge drm-intel-fixes patches touching the error capture code to be able to merge Ben's cleanup patches. Conflicts: drivers/gpu/drm/i915/i915_gpu_error.c Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| * | | drm: dp helper: Add DP test sink CRC definition.Rodrigo Vivi2014-01-271-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This address will be used to verify panel CRC for test and validation purposes. Signed-off-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Fix whitespace fail.] Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
* | | | Merge tag 'drm/dp-aux-for-3.15-rc1' of ↵Dave Airlie2014-02-271-0/+111
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://anongit.freedesktop.org/tegra/linux into drm-next drm: DisplayPort AUX framework for v3.15-rc1 This series of patches implements a small framework that abstracts away some of the functionality that the DisplayPort AUX channel provides. It comes with a set of generic helpers that use the driver implementations to reduce code duplication. * tag 'drm/dp-aux-for-3.15-rc1' of git://anongit.freedesktop.org/tegra/linux: drm/dp: Allow registering AUX channels as I2C busses drm/dp: Add DisplayPort link helpers drm/dp: Add drm_dp_dpcd_read_link_status() drm/dp: Add AUX channel infrastructure
| * | | | drm/dp: Allow registering AUX channels as I2C bussesThierry Reding2014-02-261-0/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Implements an I2C-over-AUX I2C adapter on top of the generic drm_dp_aux infrastructure. It extracts the retry logic from existing drivers, which should help in porting those drivers to this new helper. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com> --- Changes in v5: - move comments partially to to header file - keep MOT set between I2C messages - return -EPROTO on short reads Changes in v4: - fix typo "bitrate" -> "bit rate" Changes in v3: - add back DRM_DEBUG_KMS and DRM_ERROR messages - embed i2c_adapter within struct drm_dp_aux - fix typo in comment
| * | | | drm/dp: Add DisplayPort link helpersThierry Reding2014-02-261-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a helper to probe a DP link (read out the supported DPCD revision, maximum rate, link count and capabilities) as well as power up the DP link and configure it accordingly. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com> --- Changes in v5: - export helpers Changes in v4: - fix a couple of typos in comments as pointed out by Alex Deucher Changes in v3: - split into drm_dp_link_power_up() and drm_dp_link_configure() - do not change sink state for DPCD versions earlier than 1.1 - sleep for 1-2 ms after setting local sink to D0 state - read and write consecutive registers where possible - read DPCD revision when link is probed - remove duplicate kerneldoc
| * | | | drm/dp: Add drm_dp_dpcd_read_link_status()Thierry Reding2014-02-261-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The function reads the link status (6 bytes starting at offset 0x202) from the DPCD so that it can be conveniently passed to other DPCD helpers. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com>
| * | | | drm/dp: Add AUX channel infrastructureThierry Reding2014-02-261-0/+79
| | |_|/ | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This is a superset of the current i2c_dp_aux bus functionality and can be used to transfer native AUX in addition to I2C-over-AUX messages. Helpers are provided to read and write the DPCD, either blockwise or byte-wise. Many of the existing helpers for DisplayPort take a copy of a portion of the DPCD and operate on that, without a way to write data back to the DPCD (e.g. for configuration of the link). Subsequent patches will build upon this infrastructure to provide common functionality in a generic way. Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Thierry Reding <treding@nvidia.com> --- Changes in v5: - move comments partially to struct drm_dp_aux_msg in header file - return -EPROTO on short reads in DPCD helpers Changes in v4: - fix a typo in a comment Changes in v3: - reorder drm_dp_dpcd_writeb() arguments to be more intuitive - return number of bytes transferred in drm_dp_dpcd_write() - factor out drm_dp_dpcd_access() - describe error codes
* | | | sched: Add 'flags' argument to sched_{set,get}attr() syscallsPeter Zijlstra2014-02-211-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Because of a recent syscall design debate; its deemed appropriate for each syscall to have a flags argument for future extension; without immediately requiring new syscalls. Cc: juri.lelli@gmail.com Cc: Ingo Molnar <mingo@redhat.com> Suggested-by: Michael Kerrisk <mtk.manpages@gmail.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140214161929.GL27965@twins.programming.kicks-ass.net Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* | | | Merge tag 'pci-v3.14-fixes-1' of ↵Linus Torvalds2014-02-201-0/+20
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci Pull PCI updates from Bjorn Helgaas: "The most interesting thing here is the change to enable INTx (by clearing PCI_COMMAND_INTX_DISABLE) if the BIOS left INTx disabled. Apparently the Baytrail BIOS does this, which means EHCI doesn't work. Also, fix an AHCI MSI regression and other issues with the recent MSI changes. This also adds pci_enable_msi_exact() and pci_enable_msix_exact(), which aren't regression fixes, but will keep us from touching drivers twice (once to stop using the deprecated pci_enable_msi(), etc., and again to use the *_exact() variants). There's also a minor MVEBU fix. Summary: MSI: - Fix AHCI single-MSI fallback (Alexander Gordeev) - Fix populate_msi_sysfs() error paths (Greg Kroah-Hartman) - Fix htmldocs problem (Masanari Iida) - Add pci_enable_msi_exact() and pci_enable_msix_exact() (Alexander Gordeev) - Update documentation (Alexander Gordeev) Miscellaneous: - mvebu: expose device ID & revision via lspci (Andrew Lunn) - Enable INTx if the BIOS left them disabled (Bjorn Helgaas)" * tag 'pci-v3.14-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci: ahci: Fix broken fallback to single MSI mode PCI: Enable INTx if BIOS left them disabled PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact() PCI/MSI: Fix cut-and-paste errors in documentation PCI/MSI: Add pci_enable_msi() documentation back PCI/MSI: Fix pci_msix_vec_count() htmldocs failure PCI/MSI: Fix leak of msi_attrs PCI/MSI: Check kmalloc() return value, fix leak of name PCI: mvebu: Use Device ID and revision from underlying endpoint
| * | | | PCI/MSI: Add pci_enable_msi_exact() and pci_enable_msix_exact()Alexander Gordeev2014-02-131-0/+20
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The new functions are special cases for pci_enable_msi_range() and pci_enable_msix_range() when a particular number of MSI or MSI-X is needed. By contrast with pci_enable_msi_range() and pci_enable_msix_range() functions, pci_enable_msi_exact() and pci_enable_msix_exact() return zero in case of success, which indicates MSI or MSI-X interrupts have been successfully allocated. Signed-off-by: Alexander Gordeev <agordeev@redhat.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
* | | | Merge branch 'for-3.14-fixes' of ↵Linus Torvalds2014-02-201-0/+2
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Quite a few fixes this time. Three locking fixes, all marked for -stable. A couple error path fixes and some misc fixes. Hugh found a bug in memcg offlining sequence and we thought we could fix that from cgroup core side but that turned out to be insufficient and got reverted. A different fix has been applied to -mm" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: update cgroup_enable_task_cg_lists() to grab siglock Revert "cgroup: use an ordered workqueue for cgroup destruction" cgroup: protect modifications to cgroup_idr with cgroup_mutex cgroup: fix locking in cgroup_cfts_commit() cgroup: fix error return from cgroup_create() cgroup: fix error return value in cgroup_mount() cgroup: use an ordered workqueue for cgroup destruction nfs: include xattr.h from fs/nfs/nfs3proc.c cpuset: update MAINTAINERS entry arm, pm, vmpressure: add missing slab.h includes
| * | | | cgroup: protect modifications to cgroup_idr with cgroup_mutexLi Zefan2014-02-111-0/+2
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Setup cgroupfs like this: # mount -t cgroup -o cpuacct xxx /cgroup # mkdir /cgroup/sub1 # mkdir /cgroup/sub2 Then run these two commands: # for ((; ;)) { mkdir /cgroup/sub1/tmp && rmdir /mnt/sub1/tmp; } & # for ((; ;)) { mkdir /cgroup/sub2/tmp && rmdir /mnt/sub2/tmp; } & After seconds you may see this warning: ------------[ cut here ]------------ WARNING: CPU: 1 PID: 25243 at lib/idr.c:527 sub_remove+0x87/0x1b0() idr_remove called for id=6 which is not allocated. ... Call Trace: [<ffffffff8156063c>] dump_stack+0x7a/0x96 [<ffffffff810591ac>] warn_slowpath_common+0x8c/0xc0 [<ffffffff81059296>] warn_slowpath_fmt+0x46/0x50 [<ffffffff81300aa7>] sub_remove+0x87/0x1b0 [<ffffffff810f3f02>] ? css_killed_work_fn+0x32/0x1b0 [<ffffffff81300bf5>] idr_remove+0x25/0xd0 [<ffffffff810f2bab>] cgroup_destroy_css_killed+0x5b/0xc0 [<ffffffff810f4000>] css_killed_work_fn+0x130/0x1b0 [<ffffffff8107cdbc>] process_one_work+0x26c/0x550 [<ffffffff8107eefe>] worker_thread+0x12e/0x3b0 [<ffffffff81085f96>] kthread+0xe6/0xf0 [<ffffffff81570bac>] ret_from_fork+0x7c/0xb0 ---[ end trace 2d1577ec10cf80d0 ]--- It's because allocating/removing cgroup ID is not properly synchronized. The bug was introduced when we converted cgroup_ida to cgroup_idr. While synchronization is already done inside ida_simple_{get,remove}(), users are responsible for concurrent calls to idr_{alloc,remove}(). tj: Refreshed on top of b58c89986a77 ("cgroup: fix error return from cgroup_create()"). Fixes: 4e96ee8e981b ("cgroup: convert cgroup_ida to cgroup_idr") Cc: <stable@vger.kernel.org> #3.12+ Reported-by: Michal Hocko <mhocko@suse.cz> Signed-off-by: Li Zefan <lizefan@huawei.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | Merge branch 'for-3.14-fixes' of ↵Linus Torvalds2014-02-201-4/+1
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq Pull workqueue fixes from Tejun Heo: "Two workqueue fixes. One for an unlikely but possible critical bug during kworker shutdown and the other to make lockdep names a bit more descriptive" * 'for-3.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: workqueue: ensure @task is valid across kthread_stop() workqueue: add args to workqueue lockdep name
| * | | | workqueue: add args to workqueue lockdep nameLi Zhong2014-02-141-4/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Tommi noticed a 'funny' lock class name: "%s#5" from a lock acquired in process_one_work(). Maybe #fmt plus #args could be used as the lock_name to give some more information for some fmt string like the above. __builtin_constant_p() check is removed (as there seems no good way to check all the variables in args list). However, by removing the check, it only adds two additional "s for those constants. Some lockdep name examples printed out after the change: lockdep name wq->name "events_long" events_long "%s"("khelper") khelper "xfs-data/%s"mp->m_fsname xfs-data/dm-3 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Signed-off-by: Tejun Heo <tj@kernel.org>
* | | | Merge tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfdLinus Torvalds2014-02-193-4/+4
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull MFD fixes from Lee Jones: "Couple of small issues solved: - Suspend/Resume call-backs require CONFIG_PM_SLEEP - Some drivers written for 32bit architectures fail when compiled with a 64bit compiler. The fixes will future proof the drivers" * tag 'mfd-fixes-3.14-1' of git://git.linaro.org/people/lee.jones/mfd: mfd: sec-core: sec_pmic_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: max14577: max14577_{suspend,resume}() should depend on CONFIG_PM_SLEEP mfd: tps65217: Naturalise cross-architecture discrepancies mfd: wm8994-core: Naturalise cross-architecture discrepancies mfd: max8998: Naturalise cross-architecture discrepancies mfd: max8997: Naturalise cross-architecture discrepancies
| * | | | mfd: tps65217: Naturalise cross-architecture discrepanciesLee Jones2014-02-191-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we compile the TPS65217 for a 64bit architecture we receive the following warnings: drivers/mfd/tps65217.c: In function ‘tps65217_probe’: drivers/mfd/tps65217.c:173:13: warning: cast from pointer to integer of different size chip_id = (unsigned int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | | | mfd: max8998: Naturalise cross-architecture discrepanciesLee Jones2014-02-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we compile the MAX8998 for a 64bit architecture we receive the following warnings: drivers/mfd/max8998.c: In function ‘max8998_i2c_get_driver_data’: drivers/mfd/max8998.c:178:10: warning: cast from pointer to integer of different size return (int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
| * | | | mfd: max8997: Naturalise cross-architecture discrepanciesLee Jones2014-02-191-1/+1
| |/ / / | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we compile the MAX8997 for a 64bit architecture we receive the following warnings: drivers/mfd/max8997.c: In function ‘max8997_i2c_get_driver_data’: drivers/mfd/max8997.c:173:10: warning: cast from pointer to integer of different size return (int)match->data; ^ Signed-off-by: Lee Jones <lee.jones@linaro.org>
* | | | Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2014-02-194-0/+8
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull drm fixes from Dave Airlie: "Lots of little small things, nothing too major: nouveau regression fixes, vmware fixes for the new hw support, memory leaks in error path fixes" * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (31 commits) drm/radeon/ni: fix typo in dpm sq ramping setup drm/radeon/si: fix typo in dpm sq ramping setup drm/radeon: fix CP semaphores on CIK drm/radeon: delete a stray tab drm/radeon: fix display tiling setup on SI drm/radeon/dpm: reduce r7xx vblank mclk threshold to 200 drm/radeon: fill in DRM_CAPs for cursor size drm: add DRM_CAPs for cursor size drm/radeon: unify bpc handling drm/ttm: Fix memory leak in ttm_agp_backend.c drm/ttm: declare 'struct device' in ttm_page_alloc.h drm/nouveau: fix TTM_PL_TT memtype on pre-nv50 drm/nv50/disp: use correct register to determine DP display bpp drm/nouveau/fb: use correct ram oclass for nv1a hardware drm/nv50/gr: add missing nv_error parameter priv drm/nouveau: fix ENG_RUNLIST register address drm/nv4c/bios: disallow retrieving from prom on nv4x igp's drm/nv4c/vga: decode register is in a different place on nv4x igp's drm/nv4c/mc: nv4x igp's have a different msi rearm register drm/nouveau: set irq_enabled manually ...
| * \ \ \ Merge tag 'ttm-fixes-3.14-2014-02-18' of ↵Dave Airlie2014-02-181-0/+2
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~thomash/linux into drm-fixes Pull request of 2014-02-18 One compile fix and one memory leak. * tag 'ttm-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux: drm/ttm: Fix memory leak in ttm_agp_backend.c drm/ttm: declare 'struct device' in ttm_page_alloc.h
| | * | | | drm/ttm: declare 'struct device' in ttm_page_alloc.hAlexandre Courbot2014-02-181-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Declare 'struct device' explicitly in ttm_page_alloc.h as this file does not include any file declaring it. This removes the following warning: warning: 'struct device' declared inside parameter list Signed-off-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Thierry Reding <treding@nvidia.com>
| * | | | | Merge tag 'vmwgfx-fixes-3.14-2014-02-18' of ↵Dave Airlie2014-02-181-0/+1
| |\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~thomash/linux into drm-fixes Pull request of 2014-02-18. Nothing special. The biggest change is adding a couple of command defines and packing the command data correctly. * tag 'vmwgfx-fixes-3.14-2014-02-18' of git://people.freedesktop.org/~thomash/linux: drm/vmwgfx: Fix command defines and checks drm/vmwgfx: Fix possible integer overflow drm/vmwgfx: Remove stray const drm/vmwgfx: unlock on error path in vmw_execbuf_process() drm/vmwgfx: Get maximum mob size from register SVGA_REG_MOB_MAX_SIZE drm/vmwgfx: Fix a couple of sparse warnings and errors
| | * | | | | drm/vmwgfx: Get maximum mob size from register SVGA_REG_MOB_MAX_SIZECharmaine Lee2014-02-121-0/+1
| | | |/ / / | | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch queries the register SVGA_REG_MOB_MAX_SIZE for the maximum size of a single mob. Signed-off-by: Charmaine Lee <charmainel@vmware.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com>
| * | | | | drm: add DRM_CAPs for cursor sizeAlex Deucher2014-02-182-0/+5
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some hardware may not support standard 64x64 cursors. Add a drm cap to query the cursor size from the kernel. Some examples include radeon CIK parts (128x128 cursors) and armada (32x64 or 64x32). This allows things like device specific ddxes to remove asics specific logic and also allows xf86-video-modesetting to work properly with hw cursors on this hardware. Default to 64 if the driver doesn't specify a size. Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Rob Clark <robdclark@gmail.com>
* | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/netLinus Torvalds2014-02-193-17/+50
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull networking fixes from David Miller: 1) kvaser CAN driver has fixed limits of some of it's table, validate that we won't exceed those limits at probe time. Fix from Olivier Sobrie. 2) Fix rtl8192ce disabling interrupts for too long, from Olivier Langlois. 3) Fix botched shift in ath5k driver, from Dan Carpenter. 4) Fix corruption of deferred packets in TIPC, from Erik Hugne. 5) Fix newlink error path in macvlan driver, from Cong Wang. 6) Fix netpoll deadlock in bonding, from Ding Tianhong. 7) Handle GSO packets properly in forwarding path when fragmentation is necessary on egress, from Florian Westphal. 8) Fix axienet build errors, from Michal Simek. 9) Fix refcounting of ubufs on tx in vhost net driver, from Michael S Tsirkin. 10) Carrier status isn't set properly in hyperv driver, from Haiyang Zhang. 11) Missing pci_disable_device() in tulip_remove_one), from Ingo Molnar. 12) AF_PACKET qdisc bypass mode doesn't adhere to driver provided TX queue selection method. Add a fallback method mechanism to fix this bug, from Daniel Borkmann. 13) Fix regression in link local route handling on GRE tunnels, from Nicolas Dichtel. 14) Bonding can assign dup aggregator IDs in some sequences of configuration, fix by making the allocation counter per-bond instead of global. From Jiri Bohac. 15) sctp_connectx() needs compat translations, from Daniel Borkmann. 16) Fix of_mdio PHY interrupt parsing, from Ben Dooks * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (62 commits) MAINTAINERS: add entry for the PHY library of_mdio: fix phy interrupt passing net: ethernet: update dependency and help text of mvneta NET: fec: only enable napi if we are successful af_packet: remove a stray tab in packet_set_ring() net: sctp: fix sctp_connectx abi for ia32 emulation/compat mode ipv4: fix counter in_slow_tot irtty-sir.c: Do not set_termios() on irtty_close() bonding: 802.3ad: make aggregator_identifier bond-private usbnet: remove generic hard_header_len check gre: add link local route when local addr is any batman-adv: fix potential kernel paging error for unicast transmissions batman-adv: avoid double free when orig_node initialization fails batman-adv: free skb on TVLV parsing success batman-adv: fix TT CRC computation by ensuring byte order batman-adv: fix potential orig_node reference leak batman-adv: avoid potential race condition when adding a new neighbour batman-adv: properly check pskb_may_pull return value batman-adv: release vlan object after checking the CRC batman-adv: fix TT-TVLV parsing on OGM reception ...
| * | | | | netdevice: move netdev_cap_txqueue for shared usage to headerDaniel Borkmann2014-02-171-0/+20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In order to allow users to invoke netdev_cap_txqueue, it needs to be moved into netdevice.h header file. While at it, also add kernel doc header to document the API. Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | netdevice: add queue selection fallback handler for ndo_select_queueDaniel Borkmann2014-02-171-3/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add a new argument for ndo_select_queue() callback that passes a fallback handler. This gets invoked through netdev_pick_tx(); fallback handler is currently __netdev_pick_tx() as most drivers invoke this function within their customized implementation in case for skbs that don't need any special handling. This fallback handler can then be replaced on other call-sites with different queue selection methods (e.g. in packet sockets, pktgen etc). This also has the nice side-effect that __netdev_pick_tx() is then only invoked from netdev_pick_tx() and export of that function to modules can be undone. Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: sctp: Fix a_rwnd/rwnd management to reflect real state of the ↵Matija Glavinic Pecotic2014-02-171-13/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | receiver's buffer Implementation of (a)rwnd calculation might lead to severe performance issues and associations completely stalling. These problems are described and solution is proposed which improves lksctp's robustness in congestion state. 1) Sudden drop of a_rwnd and incomplete window recovery afterwards Data accounted in sctp_assoc_rwnd_decrease takes only payload size (sctp data), but size of sk_buff, which is blamed against receiver buffer, is not accounted in rwnd. Theoretically, this should not be the problem as actual size of buffer is double the amount requested on the socket (SO_RECVBUF). Problem here is that this will have bad scaling for data which is less then sizeof sk_buff. E.g. in 4G (LTE) networks, link interfacing radio side will have a large portion of traffic of this size (less then 100B). An example of sudden drop and incomplete window recovery is given below. Node B exhibits problematic behavior. Node A initiates association and B is configured to advertise rwnd of 10000. A sends messages of size 43B (size of typical sctp message in 4G (LTE) network). On B data is left in buffer by not reading socket in userspace. Lets examine when we will hit pressure state and declare rwnd to be 0 for scenario with above stated parameters (rwnd == 10000, chunk size == 43, each chunk is sent in separate sctp packet) Logic is implemented in sctp_assoc_rwnd_decrease: socket_buffer (see below) is maximum size which can be held in socket buffer (sk_rcvbuf). current_alloced is amount of data currently allocated (rx_count) A simple expression is given for which it will be examined after how many packets for above stated parameters we enter pressure state: We start by condition which has to be met in order to enter pressure state: socket_buffer < currently_alloced; currently_alloced is represented as size of sctp packets received so far and not yet delivered to userspace. x is the number of chunks/packets (since there is no bundling, and each chunk is delivered in separate packet, we can observe each chunk also as sctp packet, and what is important here, having its own sk_buff): socket_buffer < x*each_sctp_packet; each_sctp_packet is sctp chunk size + sizeof(struct sk_buff). socket_buffer is twice the amount of initially requested size of socket buffer, which is in case of sctp, twice the a_rwnd requested: 2*rwnd < x*(payload+sizeof(struc sk_buff)); sizeof(struct sk_buff) is 190 (3.13.0-rc4+). Above is stated that rwnd is 10000 and each payload size is 43 20000 < x(43+190); x > 20000/233; x ~> 84; After ~84 messages, pressure state is entered and 0 rwnd is advertised while received 84*43B ~= 3612B sctp data. This is why external observer notices sudden drop from 6474 to 0, as it will be now shown in example: IP A.34340 > B.12345: sctp (1) [INIT] [init tag: 1875509148] [rwnd: 81920] [OS: 10] [MIS: 65535] [init TSN: 1096057017] IP B.12345 > A.34340: sctp (1) [INIT ACK] [init tag: 3198966556] [rwnd: 10000] [OS: 10] [MIS: 10] [init TSN: 902132839] IP A.34340 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.34340: sctp (1) [COOKIE ACK] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057017] [SID: 0] [SSEQ 0] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057017] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057018] [SID: 0] [SSEQ 1] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057018] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057019] [SID: 0] [SSEQ 2] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057019] [a_rwnd 9914] [#gap acks 0] [#dup tsns 0] <...> IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057098] [SID: 0] [SSEQ 81] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057098] [a_rwnd 6517] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057099] [SID: 0] [SSEQ 82] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057099] [a_rwnd 6474] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057100] [SID: 0] [SSEQ 83] [PPID 0x18] --> Sudden drop IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] At this point, rwnd_press stores current rwnd value so it can be later restored in sctp_assoc_rwnd_increase. This however doesn't happen as condition to start slowly increasing rwnd until rwnd_press is returned to rwnd is never met. This condition is not met since rwnd, after it hit 0, must first reach rwnd_press by adding amount which is read from userspace. Let us observe values in above example. Initial a_rwnd is 10000, pressure was hit when rwnd was ~6500 and the amount of actual sctp data currently waiting to be delivered to userspace is ~3500. When userspace starts to read, sctp_assoc_rwnd_increase will be blamed only for sctp data, which is ~3500. Condition is never met, and when userspace reads all data, rwnd stays on 3569. IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 1505] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057100] [a_rwnd 3010] [#gap acks 0] [#dup tsns 0] IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057101] [SID: 0] [SSEQ 84] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057101] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] --> At this point userspace read everything, rwnd recovered only to 3569 IP A.34340 > B.12345: sctp (1) [DATA] (B)(E) [TSN: 1096057102] [SID: 0] [SSEQ 85] [PPID 0x18] IP B.12345 > A.34340: sctp (1) [SACK] [cum ack 1096057102] [a_rwnd 3569] [#gap acks 0] [#dup tsns 0] Reproduction is straight forward, it is enough for sender to send packets of size less then sizeof(struct sk_buff) and receiver keeping them in its buffers. 2) Minute size window for associations sharing the same socket buffer In case multiple associations share the same socket, and same socket buffer (sctp.rcvbuf_policy == 0), different scenarios exist in which congestion on one of the associations can permanently drop rwnd of other association(s). Situation will be typically observed as one association suddenly having rwnd dropped to size of last packet received and never recovering beyond that point. Different scenarios will lead to it, but all have in common that one of the associations (let it be association from 1)) nearly depleted socket buffer, and the other association blames socket buffer just for the amount enough to start the pressure. This association will enter pressure state, set rwnd_press and announce 0 rwnd. When data is read by userspace, similar situation as in 1) will occur, rwnd will increase just for the size read by userspace but rwnd_press will be high enough so that association doesn't have enough credit to reach rwnd_press and restore to previous state. This case is special case of 1), being worse as there is, in the worst case, only one packet in buffer for which size rwnd will be increased. Consequence is association which has very low maximum rwnd ('minute size', in our case down to 43B - size of packet which caused pressure) and as such unusable. Scenario happened in the field and labs frequently after congestion state (link breaks, different probabilities of packet drop, packet reordering) and with scenario 1) preceding. Here is given a deterministic scenario for reproduction: >From node A establish two associations on the same socket, with rcvbuf_policy being set to share one common buffer (sctp.rcvbuf_policy == 0). On association 1 repeat scenario from 1), that is, bring it down to 0 and restore up. Observe scenario 1). Use small payload size (here we use 43). Once rwnd is 'recovered', bring it down close to 0, as in just one more packet would close it. This has as a consequence that association number 2 is able to receive (at least) one more packet which will bring it in pressure state. E.g. if association 2 had rwnd of 10000, packet received was 43, and we enter at this point into pressure, rwnd_press will have 9957. Once payload is delivered to userspace, rwnd will increase for 43, but conditions to restore rwnd to original state, just as in 1), will never be satisfied. --> Association 1, between A.y and B.12345 IP A.55915 > B.12345: sctp (1) [INIT] [init tag: 836880897] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 4032536569] IP B.12345 > A.55915: sctp (1) [INIT ACK] [init tag: 2873310749] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3799315613] IP A.55915 > B.12345: sctp (1) [COOKIE ECHO] IP B.12345 > A.55915: sctp (1) [COOKIE ACK] --> Association 2, between A.z and B.12346 IP A.55915 > B.12346: sctp (1) [INIT] [init tag: 534798321] [rwnd: 10000] [OS: 10] [MIS: 65535] [init TSN: 2099285173] IP B.12346 > A.55915: sctp (1) [INIT ACK] [init tag: 516668823] [rwnd: 81920] [OS: 10] [MIS: 10] [init TSN: 3676403240] IP A.55915 > B.12346: sctp (1) [COOKIE ECHO] IP B.12346 > A.55915: sctp (1) [COOKIE ACK] --> Deplete socket buffer by sending messages of size 43B over association 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315613] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315613] [a_rwnd 9957] [#gap acks 0] [#dup tsns 0] <...> IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315696] [a_rwnd 6388] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315697] [SID: 0] [SSEQ 84] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315697] [a_rwnd 6345] [#gap acks 0] [#dup tsns 0] --> Sudden drop on 1 IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315698] [SID: 0] [SSEQ 85] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315698] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Here userspace read, rwnd 'recovered' to 3698, now deplete again using association 1 so there is place in buffer for only one more packet IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315799] [SID: 0] [SSEQ 186] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315799] [a_rwnd 86] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315800] [SID: 0] [SSEQ 187] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] --> Socket buffer is almost depleted, but there is space for one more packet, send them over association 2, size 43B IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403240] [SID: 0] [SSEQ 0] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403240] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Immediate drop IP A.60995 > B.12346: sctp (1) [SACK] [cum ack 387491510] [a_rwnd 0] [#gap acks 0] [#dup tsns 0] --> Read everything from the socket, both association recover up to maximum rwnd they are capable of reaching, note that association 1 recovered up to 3698, and association 2 recovered only to 43 IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 1548] [#gap acks 0] [#dup tsns 0] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315800] [a_rwnd 3053] [#gap acks 0] [#dup tsns 0] IP B.12345 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3799315801] [SID: 0] [SSEQ 188] [PPID 0x18] IP A.55915 > B.12345: sctp (1) [SACK] [cum ack 3799315801] [a_rwnd 3698] [#gap acks 0] [#dup tsns 0] IP B.12346 > A.55915: sctp (1) [DATA] (B)(E) [TSN: 3676403241] [SID: 0] [SSEQ 1] [PPID 0x18] IP A.55915 > B.12346: sctp (1) [SACK] [cum ack 3676403241] [a_rwnd 43] [#gap acks 0] [#dup tsns 0] A careful reader might wonder why it is necessary to reproduce 1) prior reproduction of 2). It is simply easier to observe when to send packet over association 2 which will push association into the pressure state. Proposed solution: Both problems share the same root cause, and that is improper scaling of socket buffer with rwnd. Solution in which sizeof(sk_buff) is taken into concern while calculating rwnd is not possible due to fact that there is no linear relationship between amount of data blamed in increase/decrease with IP packet in which payload arrived. Even in case such solution would be followed, complexity of the code would increase. Due to nature of current rwnd handling, slow increase (in sctp_assoc_rwnd_increase) of rwnd after pressure state is entered is rationale, but it gives false representation to the sender of current buffer space. Furthermore, it implements additional congestion control mechanism which is defined on implementation, and not on standard basis. Proposed solution simplifies whole algorithm having on mind definition from rfc: o Receiver Window (rwnd): This gives the sender an indication of the space available in the receiver's inbound buffer. Core of the proposed solution is given with these lines: sctp_assoc_rwnd_update: if ((asoc->base.sk->sk_rcvbuf - rx_count) > 0) asoc->rwnd = (asoc->base.sk->sk_rcvbuf - rx_count) >> 1; else asoc->rwnd = 0; We advertise to sender (half of) actual space we have. Half is in the braces depending whether you would like to observe size of socket buffer as SO_RECVBUF or twice the amount, i.e. size is the one visible from userspace, that is, from kernelspace. In this way sender is given with good approximation of our buffer space, regardless of the buffer policy - we always advertise what we have. Proposed solution fixes described problems and removes necessity for rwnd restoration algorithm. Finally, as proposed solution is simplification, some lines of code, along with some bytes in struct sctp_association are saved. Version 2 of the patch addressed comments from Vlad. Name of the function is set to be more descriptive, and two parts of code are changed, in one removing the superfluous call to sctp_assoc_rwnd_update since call would not result in update of rwnd, and the other being reordering of the code in a way that call to sctp_assoc_rwnd_update updates rwnd. Version 3 corrected change introduced in v2 in a way that existing function is not reordered/copied in line, but it is correctly called. Thanks Vlad for suggesting. Signed-off-by: Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nsn.com> Reviewed-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: ip, ipv6: handle gso skbs in forwarding pathFlorian Westphal2014-02-131-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Marcelo Ricardo Leitner reported problems when the forwarding link path has a lower mtu than the incoming one if the inbound interface supports GRO. Given: Host <mtu1500> R1 <mtu1200> R2 Host sends tcp stream which is routed via R1 and R2. R1 performs GRO. In this case, the kernel will fail to send ICMP fragmentation needed messages (or pkt too big for ipv6), as GSO packets currently bypass dstmtu checks in forward path. Instead, Linux tries to send out packets exceeding the mtu. When locking route MTU on Host (i.e., no ipv4 DF bit set), R1 does not fragment the packets when forwarding, and again tries to send out packets exceeding R1-R2 link mtu. This alters the forwarding dstmtu checks to take the individual gso segment lengths into account. For ipv6, we send out pkt too big error for gso if the individual segments are too big. For ipv4, we either send icmp fragmentation needed, or, if the DF bit is not set, perform software segmentation and let the output path create fragments when the packet is leaving the machine. It is not 100% correct as the error message will contain the headers of the GRO skb instead of the original/segmented one, but it seems to work fine in my (limited) tests. Eric Dumazet suggested to simply shrink mss via ->gso_size to avoid sofware segmentation. However it turns out that skb_segment() assumes skb nr_frags is related to mss size so we would BUG there. I don't want to mess with it considering Herbert and Eric disagree on what the correct behavior should be. Hannes Frederic Sowa notes that when we would shrink gso_size skb_segment would then also need to deal with the case where SKB_MAX_FRAGS would be exceeded. This uses sofware segmentation in the forward path when we hit ipv4 non-DF packets and the outgoing link mtu is too small. Its not perfect, but given the lack of bug reports wrt. GRO fwd being broken this is a rare case anyway. Also its not like this could not be improved later once the dust settles. Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Reported-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
| * | | | | net: core: introduce netif_skb_dev_featuresFlorian Westphal2014-02-131-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Will be used by upcoming ipv4 forward path change that needs to determine feature mask using skb->dst->dev instead of skb->dev. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-02-171-2/+3
|\ \ \ \ \ \ | |_|/ / / / |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "We have some patches fixing up ACL support issues from Zheng and Guangliang and a mount option to enable/disable this support. (These fixes were somewhat delayed by the Chinese holiday.) There is also a small fix for cached readdir handling when directories are fragmented" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: ceph: fix __dcache_readdir() ceph: add acl, noacl options for cephfs mount ceph: make ceph_forget_all_cached_acls() static inline ceph: add missing init_acl() for mkdir() and atomic_open() ceph: fix ceph_set_acl() ceph: fix ceph_removexattr() ceph: remove xattr when null value is given to setxattr() ceph: properly handle XATTR_CREATE and XATTR_REPLACE
| * | | | | ceph: remove xattr when null value is given to setxattr()Yan, Zheng2014-02-171-2/+3
| | |/ / / | |/| | | | | | | | | | | | | | | | | | | | | | | | | | | | For the setxattr request, introduce a new flag CEPH_XATTR_REMOVE to distinguish null value case from the zero-length value case. Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
* | | | | Merge tag 'dma-buf-for-3.14' of ↵Linus Torvalds2014-02-171-1/+1
|\ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf Pull dma-buf fix from Sumit Semwal: "Just some debugfs output updates. There's another patch related to dma-buf, but it'll get upstreamed via Greg KH's pull request" * tag 'dma-buf-for-3.14' of git://git.kernel.org/pub/scm/linux/kernel/git/sumits/dma-buf: dma-buf: update debugfs output
| * | | | | dma-buf: update debugfs outputSumit Semwal2014-02-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Russell King observed 'wierd' looking output from debugfs, and also suggested better ways of getting device names (use KBUILD_MODNAME, dev_name()) This patch addresses these issues to make the debugfs output correct and better looking. While at it, replace seq_printf with seq_puts to remove the checkpatch.pl warnings. Reported-by: Russell King - ARM Linux <linux@arm.linux.org.uk> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org>
* | | | | | Merge branch 'merge' of ↵Linus Torvalds2014-02-171-0/+39
|\ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc Pull powerpc fixes from Ben Herrenschmidt: "Here are some more powerpc fixes for 3.14 The main one is a nasty issue with the NUMA balancing support which requires a small generic change and the addition of a new accessor to set _PAGE_NUMA. Both have been reviewed and acked by Mel and Rik. The changelog should have plenty of details but basically, without this fix, we get random user segfaults and/or corruptions due to missing TLB/hash flushes. Aneesh series of 3 patches fixes it. We have some vDSO vs. perf fixes from Anton, some small EEH fixes from Gavin, a ppc32 regression vs the stack overflow detector, and a fix for the way we handle PCIe host bridge speed settings on pseries (which is needed for proper operations of AMD graphics cards on Power8)" * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: powerpc/eeh: Disable EEH on reboot powerpc/eeh: Cleanup on eeh_subsystem_enabled powerpc/powernv: Rework EEH reset powerpc: Use unstripped VDSO image for more accurate profiling data powerpc: Link VDSOs at 0x0 mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bit mm: Dirty accountable change only apply to non prot numa case powerpc/mm: Add new "set" flag argument to pte/pmd update function powerpc/pseries: Add Gen3 definitions for PCIE link speed powerpc/pseries: Fix regression on PCI link speed powerpc: Set the correct ksp_limit on ppc32 when switching to irq stack
| * | | | | | mm: Use ptep/pmdp_set_numa() for updating _PAGE_NUMA bitAneesh Kumar K.V2014-02-171-0/+39
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Archs like ppc64 doesn't do tlb flush in set_pte/pmd functions when using a hash table MMU for various reasons (the flush is handled as part of the PTE modification when necessary). ppc64 thus doesn't implement flush_tlb_range for hash based MMUs. Additionally ppc64 require the tlb flushing to be batched within ptl locks. The reason to do that is to ensure that the hash page table is in sync with linux page table. We track the hpte index in linux pte and if we clear them without flushing hash and drop the ptl lock, we can have another cpu update the pte and can end up with duplicate entry in the hash table, which is fatal. We also want to keep set_pte_at simpler by not requiring them to do hash flush for performance reason. We do that by assuming that set_pte_at() is never *ever* called on a PTE that is already valid. This was the case until the NUMA code went in which broke that assumption. Fix that by introducing a new pair of helpers to set _PAGE_NUMA in a way similar to ptep/pmdp_set_wrprotect(), with a generic implementation using set_pte_at() and a powerpc specific one using the appropriate mechanism needed to keep the hash table in sync. Acked-by: Mel Gorman <mgorman@suse.de> Reviewed-by: Rik van Riel <riel@redhat.com> Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
* | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2014-02-161-1/+0
|\ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs Pull btrfs fixes from Chris Mason: "We have a small collection of fixes in my for-linus branch. The big thing that stands out is a revert of a new ioctl. Users haven't shipped yet in btrfs-progs, and Dave Sterba found a better way to export the information" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: Btrfs: use right clone root offset for compressed extents btrfs: fix null pointer deference at btrfs_sysfs_add_one+0x105 Btrfs: unset DCACHE_DISCONNECTED when mounting default subvol Btrfs: fix max_inline mount option Btrfs: fix a lockdep warning when cleaning up aborted transaction Revert "btrfs: add ioctl to export size of global metadata reservation"
| * | | | | | | Revert "btrfs: add ioctl to export size of global metadata reservation"Chris Mason2014-02-141-1/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 01e219e8069516cdb98594d417b8bb8d906ed30d. David Sterba found a different way to provide these features without adding a new ioctl. We haven't released any progs with this ioctl yet, so I'm taking this out for now until we finalize things. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: David Sterba <dsterba@suse.cz> CC: Jeff Mahoney <jeffm@suse.com>