summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* qlcnic: dont assume NET_IP_ALIGN is 2Eric Dumazet2010-09-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | qlcnic driver allocates rx skbs and gives to hardware too bytes of extra storage, allowing for corruption of kernel data. NET_IP_ALIGN being 0 on some platforms (including x86), drivers should not assume it's 2. rds_ring->skb_size = rds_ring->dma_size + NET_IP_ALIGN; ... skb = dev_alloc_skb(rds_ring->skb_size); skb_reserve(skb, 2); pci_map_single(pdev, skb->data, rds_ring->dma_size, PCI_DMA_FROMDEVICE); (and rds_ring->skb_size == rds_ring->dma_size) -> bug Because of extra alignment (1500 + 32) -> four extra bytes are available before the struct skb_shared_info, so corruption is not noticed. Note: this driver could use netdev_alloc_skb_ip_align() Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* dca: disable dca on IOAT ver.3.0 multiple-IOH platformsSosnowski, Maciej2010-09-181-6/+79
| | | | | | | | Direct Cache Access is not supported on IOAT ver.3.0 multiple-IOH platforms. This patch blocks registering of dca providers when multiple IOH detected with IOAT ver.3.0. Signed-off-by: Maciej Sosnowski <maciej.sosnowski@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* netpoll: Disable IRQ around RCU dereference in netpoll_rxHerbert Xu2010-09-181-4/+4
| | | | | | | | | | | | | We cannot use rcu_dereference_bh safely in netpoll_rx as we may be called with IRQs disabled. We could however simply disable IRQs as that too causes BH to be disabled and is safe in either case. Thanks to John Linville for discovering this bug and providing a patch. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
* sctp: Do not reset the packet during sctp_packet_config().Vlad Yasevich2010-09-181-1/+0
| | | | | | | | | | | | sctp_packet_config() is called when getting the packet ready for appending of chunks. The function should not touch the current state, since it's possible to ping-pong between two transports when sending, and that can result packet corruption followed by skb overlfow crash. Reported-by: Thomas Dreibholz <dreibh@iem.uni-due.de> Signed-off-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/llc: storing negative error codes in unsigned shortDan Carpenter2010-09-171-1/+1
| | | | | | | | If the alloc_skb() fails then we return 65431 instead of -ENOBUFS (-105). Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* MAINTAINERS: move atlx discussions to netdevChris Snook2010-09-171-1/+1
| | | | | | | | | The atlx drivers are sufficiently mature that we no longer need a separate mailing list for them. Move the discussion to netdev, so we can decommission atl1-devel, which is now mostly spam. Signed-off-by: Chris Snook <chris.snook@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net/cxgb3/cxgb3_main.c: prevent reading uninitialized stack memoryDan Rosenberg2010-09-171-0/+2
| | | | | | | | | | | | | Fixed formatting (tabs and line breaks). The CHELSIO_GET_QSET_NUM device ioctl allows unprivileged users to read 4 bytes of uninitialized stack memory, because the "addr" member of the ch_reg struct declared on the stack in cxgb_extension_ioctl() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net/eql.c: prevent reading uninitialized stack memoryDan Rosenberg2010-09-171-0/+2
| | | | | | | | | | | | | Fixed formatting (tabs and line breaks). The EQL_GETMASTRCFG device ioctl allows unprivileged users to read 16 bytes of uninitialized stack memory, because the "master_name" member of the master_config_t struct declared on the stack in eql_g_master_cfg() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* drivers/net/usb/hso.c: prevent reading uninitialized memoryDan Rosenberg2010-09-171-0/+2
| | | | | | | | | | | | | Fixed formatting (tabs and line breaks). The TIOCGICOUNT device ioctl allows unprivileged users to read uninitialized stack memory, because the "reserved" member of the serial_icounter_struct struct declared on the stack in hso_get_count() is not altered or zeroed before being copied back to the user. This patch takes care of it. Signed-off-by: Dan Rosenberg <dan.j.rosenberg@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* xfrm: dont assume rcu_read_lock in xfrm_output_one()Eric Dumazet2010-09-171-1/+1
| | | | | | | | | ip_local_out() is called with rcu_read_lock() held from ip_queue_xmit() but not from other call sites. Reported-and-bisected-by: Nick Bowler <nbowler@elliptictech.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* r8169: Handle rxfifo errors on 8168 chipsMatthew Garrett2010-09-161-3/+2
| | | | | | | | | | | | | | The Thinkpad X100e seems to have some odd behaviour when the display is powered off - the onboard r8169 starts generating rxfifo overflow errors. The root cause of this has not yet been identified and may well be a hardware design bug on the platform, but r8169 should be more resiliant to this. This patch enables the rxfifo interrupt on 8168 devices and removes the MAC version check in the interrupt handler, and the machine no longer crashes when under network load while the screen turns off. Signed-off-by: Matthew Garrett <mjg@redhat.com> Acked-by: Francois Romieu <romieu@fr.zoreil.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* 3c59x: Remove atomic context inside vortex_{set|get}_wolDenis Kirjanov2010-09-151-4/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no need to use spinlocks in vortex_{set|get}_wol. This also fixes a bug: [ 254.214993] 3c59x 0000:00:0d.0: PME# enabled [ 254.215021] BUG: sleeping function called from invalid context at kernel/mutex.c:94 [ 254.215030] in_atomic(): 0, irqs_disabled(): 1, pid: 4875, name: ethtool [ 254.215042] Pid: 4875, comm: ethtool Tainted: G W 2.6.36-rc3+ #7 [ 254.215049] Call Trace: [ 254.215050] [] __might_sleep+0xb1/0xb6 [ 254.215050] [] mutex_lock+0x17/0x30 [ 254.215050] [] acpi_enable_wakeup_device_power+0x2b/0xb1 [ 254.215050] [] acpi_pm_device_sleep_wake+0x42/0x7f [ 254.215050] [] acpi_pci_sleep_wake+0x5d/0x63 [ 254.215050] [] platform_pci_sleep_wake+0x1d/0x20 [ 254.215050] [] __pci_enable_wake+0x90/0xd0 [ 254.215050] [] acpi_set_WOL+0x8e/0xf5 [3c59x] [ 254.215050] [] vortex_set_wol+0x4e/0x5e [3c59x] [ 254.215050] [] dev_ethtool+0x1cf/0xb61 [ 254.215050] [] ? debug_mutex_free_waiter+0x45/0x4a [ 254.215050] [] ? __mutex_lock_common+0x204/0x20e [ 254.215050] [] ? __mutex_lock_slowpath+0x12/0x15 [ 254.215050] [] ? mutex_lock+0x23/0x30 [ 254.215050] [] dev_ioctl+0x42c/0x533 [ 254.215050] [] ? _cond_resched+0x8/0x1c [ 254.215050] [] ? lock_page+0x1c/0x30 [ 254.215050] [] ? page_address+0x15/0x7c [ 254.215050] [] ? filemap_fault+0x187/0x2c4 [ 254.215050] [] sock_ioctl+0x1d4/0x1e0 [ 254.215050] [] ? sock_ioctl+0x0/0x1e0 [ 254.215050] [] vfs_ioctl+0x19/0x33 [ 254.215050] [] do_vfs_ioctl+0x424/0x46f [ 254.215050] [] ? selinux_file_ioctl+0x3c/0x40 [ 254.215050] [] sys_ioctl+0x40/0x5a [ 254.215050] [] sysenter_do_call+0x12/0x22 vortex_set_wol protected with a spinlock, but nested acpi_set_WOL acquires a mutex inside atomic context. Ethtool operations are already serialized by RTNL mutex, so it is safe to drop the locks. Signed-off-by: Denis Kirjanov <dkirjanov@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* tcp: Prevent overzealous packetization by SWS logic.Alexey Kuznetsov2010-09-151-2/+16
| | | | | | | | | | | | | | | | | If peer uses tiny MSS (say, 75 bytes) and similarly tiny advertised window, the SWS logic will packetize to half the MSS unnecessarily. This causes problems with some embedded devices. However for large MSS devices we do want to half-MSS packetize otherwise we never get enough packets into the pipe for things like fast retransmit and recovery to work. Be careful also to handle the case where MSS > window, otherwise we'll never send until the probe timer. Reported-by: ツ Leandro Melo de Sales <leandroal@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: RPS needs to depend upon USE_GENERIC_SMP_HELPERSDavid S. Miller2010-09-151-1/+1
| | | | | | | | You cannot invoke __smp_call_function_single() unless the architecture sets this symbol. Reported-by: Daniel Hellstrom <daniel@gaisler.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* phylib: fix PAL state machine restart on resumeSimon Guinot2010-09-141-2/+2
| | | | | | | | | | | | | On resume, before starting the PAL state machine, check if the adjust_link() method is well supplied. If not, this would lead to a NULL pointer dereference in the phy_state_machine() function. This scenario can happen if the Ethernet driver call manually the PHY functions instead of using the PAL state machine. The mv643xx_eth driver is a such example. Signed-off-by: Simon Guinot <sguinot@lacie.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net: use rcu_barrier() in rollback_registered_manyEric Dumazet2010-09-141-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | netdev_wait_allrefs() waits that all references to a device vanishes. It currently uses a _very_ pessimistic 250 ms delay between each probe. Some users reported that no more than 4 devices can be dismantled per second, this is a pretty serious problem for some setups. Most of the time, a refcount is about to be released by an RCU callback, that is still in flight because rollback_registered_many() uses a synchronize_rcu() call instead of rcu_barrier(). Problem is visible if number of online cpus is one, because synchronize_rcu() is then a no op. time to remove 50 ipip tunnels on a UP machine : before patch : real 11.910s after patch : real 1.250s Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reported-by: Octavian Purdila <opurdila@ixiacom.com> Reported-by: Benjamin LaHaise <bcrl@kvack.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* bonding: correctly process non-linear skbsAndy Gospodarek2010-09-142-0/+6
| | | | | | | | | | | | | | | | | | | | | | | It was recently brought to my attention that 802.3ad mode bonds would no longer form when using some network hardware after a driver update. After snooping around I realized that the particular hardware was using page-based skbs and found that skb->data did not contain a valid LACPDU as it was not stored there. That explained the inability to form an 802.3ad-based bond. For balance-alb mode bonds this was also an issue as ARPs would not be properly processed. This patch fixes the issue in my tests and should be applied to 2.6.36 and as far back as anyone cares to add it to stable. Thanks to Alexander Duyck <alexander.h.duyck@intel.com> and Jesse Brandeburg <jesse.brandeburg@intel.com> for the suggestions on this one. Signed-off-by: Andy Gospodarek <andy@greyhouse.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Jesse Brandeburg <jesse.brandeburg@intel.com> CC: stable@kerne.org Signed-off-by: Jay Vosburgh <fubar@us.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: enable getsockopt() for IP_NODEFRAGMichael Kerrisk2010-09-141-0/+3
| | | | | | | | | | | While integrating your man-pages patch for IP_NODEFRAG, I noticed that this option is settable by setsockopt(), but not gettable by getsockopt(). I suppose this is not intended. The (untested, trivial) patch below adds getsockopt() support. Signed-off-by: Michael kerrisk <mtk.manpages@gmail.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* ipv4: force_igmp_version ignored when a IGMPv3 query receivedBob Arendt2010-09-131-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After all these years, it turns out that the /proc/sys/net/ipv4/conf/*/force_igmp_version parameter isn't fully implemented. *Symptom*: When set force_igmp_version to a value of 2, the kernel should only perform multicast IGMPv2 operations (IETF rfc2236). An host-initiated Join message will be sent as a IGMPv2 Join message. But if a IGMPv3 query message is received, the host responds with a IGMPv3 join message. Per rfc3376 and rfc2236, a IGMPv2 host should treat a IGMPv3 query as a IGMPv2 query and respond with an IGMPv2 Join message. *Consequences*: This is an issue when a IGMPv3 capable switch is the querier and will only issue IGMPv3 queries (which double as IGMPv2 querys) and there's an intermediate switch that is only IGMPv2 capable. The intermediate switch processes the initial v2 Join, but fails to recognize the IGMPv3 Join responses to the Query, resulting in a dropped connection when the intermediate v2-only switch times it out. *Identifying issue in the kernel source*: The issue is in this section of code (in net/ipv4/igmp.c), which is called when an IGMP query is received (from mainline 2.6.36-rc3 gitweb): ... A IGMPv3 query has a length >= 12 and no sources. This routine will exit after line 880, setting the general query timer (random timeout between 0 and query response time). This calls igmp_gq_timer_expire(): ... .. which only sends a v3 response. So if a v3 query is received, the kernel always sends a v3 response. IGMP queries happen once every 60 sec (per vlan), so the traffic is low. A IGMPv3 query *is* a strict superset of a IGMPv2 query, so this patch properly short circuit's the v3 behaviour. One issue is that this does not address force_igmp_version=1. Then again, I've never seen any IGMPv1 multicast equipment in the wild. However there is a lot of v2-only equipment. If it's necessary to support the IGMPv1 case as well: 837 if (len == 8 || IGMP_V2_SEEN(in_dev) || IGMP_V1_SEEN(in_dev)) { Signed-off-by: David S. Miller <davem@davemloft.net>
* ppp: potential NULL dereference in ppp_mp_explode()Dan Carpenter2010-09-131-2/+7
| | | | | | | | | | Smatch complains because we check whether "pch->chan" is NULL and then dereference it unconditionally on the next line. Partly the reason this bug was introduced is because code was too complicated. I've simplified it a little. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* net/llc: make opt unsigned in llc_ui_setsockopt()Dan Carpenter2010-09-131-1/+2
| | | | | | | | | | The members of struct llc_sock are unsigned so if we pass a negative value for "opt" it can cause a sign bug. Also it can cause an integer overflow when we multiply "opt * HZ". CC: stable@kernel.org Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* sch_atm: Fix potential NULL deref.David S. Miller2010-09-121-4/+0
| | | | | | | | | The list_head conversion unearther an unnecessary flow check. Since flow is always NULL here we don't need to see if a matching flow exists already. Reported-by: Jiri Slaby <jirislaby@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* Merge branch 'vhost-net' of ↵David S. Miller2010-09-103-27/+73
|\ | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
| * vhost: error handling fixMichael S. Tsirkin2010-09-061-0/+1
| | | | | | | | | | | | | | vhost should set worker to NULL on cgroups attach failure, so that we won't try to destroy the worker again on close. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * vhost: fix attach to cgroups regressionMichael S. Tsirkin2010-09-061-22/+57
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Since 2.6.36-rc1, non-root users of vhost-net fail to attach if they are in any cgroups. The reason is that when qemu uses vhost, vhost wants to attach its thread to all cgroups that qemu has. But we got the API backwards, so a non-priveledged process (Qemu) tried to control the priveledged one (vhost), which fails. Fix this by switching to the new cgroup_attach_task_all, and running it from the vhost thread. Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
| * cgroups: fix API thinkoMichael S. Tsirkin2010-09-052-5/+15
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | cgroup_attach_task_current_cg API that have upstream is backwards: we really need an API to attach to the cgroups from another process A to the current one. In our case (vhost), a priveledged user wants to attach it's task to cgroups from a less priveledged one, the API makes us run it in the other task's context, and this fails. So let's make the API generic and just pass in 'from' and 'to' tasks. Add an inline wrapper for cgroup_attach_task_current_cg to avoid breaking bisect. Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Li Zefan <lizf@cn.fujitsu.com> Acked-by: Paul Menage <menage@google.com>
* | ipheth: remove incorrect devtype to WWANDan Williams2010-09-101-6/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 'wwan' devtype is meant for devices that require preconfiguration and *every* time setup before the ethernet interface can be used, like cellular modems which require a series of setup commands on serial ports or other mechanisms before the ethernet interface will handle packets. As ipheth only requires one-per-hotplug pairing setup with no preconfiguration (like APN, phone #, etc) and the network interface is usable at any time after that initial setup, remove the incorrect devtype wwan. Signed-off-by: Dan Williams <dcbw@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | MAINTAINERS: Add CAIFJoe Perches2010-09-101-0/+10
| | | | | | | | | | Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | sctp: fix test for end of loopJoe Perches2010-09-101-23/+23
| | | | | | | | | | | | | | | | | | | | Add a list_has_sctp_addr function to simplify loop Based on a patches by Dan Carpenter and David Miller Signed-off-by: Joe Perches <joe@perches.com> Acked-by: Vlad Yasevich <vladislav.yasevich@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
* | Merge branch 'master' of ↵David S. Miller2010-09-096129-470316/+351884
|\ \ | | | | | | | | | master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
| * \ Merge branch 'for-linus' of ↵Linus Torvalds2010-09-0824-291/+281
| |\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: bus speed strings should be const PCI hotplug: Fix build with CONFIG_ACPI unset PCI: PCIe: Remove the port driver module exit routine PCI: PCIe: Move PCIe PME code to the pcie directory PCI: PCIe: Disable PCIe port services during port initialization PCI: PCIe: Ask BIOS for control of all native services at once ACPI/PCI: Negotiate _OSC control bits before requesting them ACPI/PCI: Do not preserve _OSC control bits returned by a query ACPI/PCI: Make acpi_pci_query_osc() return control bits ACPI/PCI: Reorder checks in acpi_pci_osc_control_set() PCI: PCIe: Introduce commad line switch for disabling port services PCI: PCIe AER: Introduce pci_aer_available() x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are set PCI: provide stub pci_domain_nr function for !CONFIG_PCI configs
| | * | PCI: bus speed strings should be constStephen Hemminger2010-09-011-1/+1
| | | | | | | | | | | | | | | | | | | | Signed-off-by: Stephen Hemminger <shemminger@vyatta.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI hotplug: Fix build with CONFIG_ACPI unsetRafael J. Wysocki2010-08-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of the recent changes caused complilation of drivers/pci/hotplug/pciehp_core.c to fail. Fix this issue. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe: Remove the port driver module exit routineKenji Kaneshige2010-08-241-7/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PCIe port driver's module exit routine is never used, so drop it. Signed-off-by: Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com> Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe: Move PCIe PME code to the pcie directoryRafael J. Wysocki2010-08-243-8/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The PCIe PME code only consists of one file, so it doesn't need to occupy its own directory. Move it to drivers/pci/pcie/pme.c and remove the contents of drivers/pci/pcie/pme . Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe: Disable PCIe port services during port initializationRafael J. Wysocki2010-08-241-3/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In principle PCIe port services may be enabled by the BIOS, so it's better to disable them during port initialization to avoid spurious events from being generated. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe: Ask BIOS for control of all native services at onceRafael J. Wysocki2010-08-2417-237/+155
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | After commit 852972acff8f10f3a15679be2059bb94916cba5d (ACPI: Disable ASPM if the platform won't provide _OSC control for PCIe) control of the PCIe Capability Structure is unconditionally requested by acpi_pci_root_add(), which in principle may cause problems to happen in two ways. First, the BIOS may refuse to give control of the PCIe Capability Structure if it is not asked for any of the _OSC features depending on it at the same time. Second, the BIOS may assume that control of the _OSC features depending on the PCIe Capability Structure will be requested in the future and may behave incorrectly if that doesn't happen. For this reason, control of the PCIe Capability Structure should always be requested along with control of any other _OSC features that may depend on it (ie. PCIe native PME, PCIe native hot-plug, PCIe AER). Rework the PCIe port driver so that (1) it checks which native PCIe port services can be enabled, according to the BIOS, and (2) it requests control of all these services simultaneously. In particular, this causes pcie_portdrv_probe() to fail if the BIOS refuses to grant control of the PCIe Capability Structure, which means that no native PCIe port services can be enabled for the PCIe Root Complex the given port belongs to. If that happens, ASPM is disabled to avoid problems with mishandling it by the part of the PCIe hierarchy for which control of the PCIe Capability Structure has not been received. Make it possible to override this behavior using 'pcie_ports=native' (use the PCIe native services regardless of the BIOS response to the control request), or 'pcie_ports=compat' (do not use the PCIe native services at all). Accordingly, rework the existing PCIe port service drivers so that they don't request control of the services directly. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | ACPI/PCI: Negotiate _OSC control bits before requesting them Rafael J. Wysocki2010-08-245-30/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It is possible that the BIOS will not grant control of all _OSC features requested via acpi_pci_osc_control_set(), so it is recommended to negotiate the final set of _OSC features with the query flag set before calling _OSC to request control of these features. To implement it, rework acpi_pci_osc_control_set() so that the caller can specify the mask of _OSC control bits to negotiate and the mask of _OSC control bits that are absolutely necessary to it. Then, acpi_pci_osc_control_set() will run _OSC queries in a loop until the mask of _OSC control bits returned by the BIOS is equal to the mask passed to it. Also, before running the _OSC request acpi_pci_osc_control_set() will check if the caller's required control bits are present in the final mask. Using this mechanism we will be able to avoid situations in which the BIOS doesn't grant control of certain _OSC features, because they depend on some other _OSC features that have not been requested. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | ACPI/PCI: Do not preserve _OSC control bits returned by a query Rafael J. Wysocki2010-08-242-16/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is the assumption in acpi_pci_osc_control_set() that it is always sufficient to compare the mask of _OSC control bits to be requested with the result of an _OSC query where all of the known control bits have been checked. However, in general, that need not be the case. For example, if an _OSC feature A depends on an _OSC feature B and control of A, B plus another _OSC feature C is requested simultaneously, the BIOS may return A, B, C, while it would only return C if A and C were requested without B. That may result in passing a wrong mask of _OSC control bits to an _OSC control request, in which case the BIOS may only grant control of a subset of the requested features. Moreover, acpi_pci_run_osc() will return error code if that happens and the caller of acpi_pci_osc_control_set() will not know that it's been granted control of some _OSC features. Consequently, the system will generally not work as expected. Apart from this acpi_pci_osc_control_set() always uses the mask of _OSC control bits returned by the very first invocation of acpi_pci_query_osc(), but that is done with the second argument equal to OSC_PCI_SEGMENT_GROUPS_SUPPORT which generally happens to affect the returned _OSC control bits. For these reasons, make acpi_pci_osc_control_set() always check if control of the requested _OSC features will be granted before making the final control request. As a result, the osc_control_qry and osc_queried members of struct acpi_pci_root are not necessary any more, so drop them and remove the remaining code referring to them. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | ACPI/PCI: Make acpi_pci_query_osc() return control bitsRafael J. Wysocki2010-08-241-11/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make acpi_pci_query_osc() use an additional pointer argument to return the mask of control bits obtained from the BIOS to the caller. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | ACPI/PCI: Reorder checks in acpi_pci_osc_control_set()Rafael J. Wysocki2010-08-241-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Make acpi_pci_osc_control_set() attempt to find the handle of the _OSC object under the given PCI root bridge object after verifying that its second argument is correct and that there is a struct acpi_pci_root object for the given root bridge handle, which is more logical than the old code. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Reviewed-by: Hidetoshi Seto <seto.hidetoshi@jp.fujitsu.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe: Introduce commad line switch for disabling port servicesRafael J. Wysocki2010-08-244-0/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce kernel command line switch pcie_ports= allowing one to disable all of the native PCIe port services, so that PCIe ports are treated like PCI-to-PCI bridges. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: PCIe AER: Introduce pci_aer_available()Rafael J. Wysocki2010-08-242-3/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Introduce a function allowing the caller to check whether to try to enable PCIe AER. Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | x86/PCI: only define pci_domain_nr if PCI and PCI_DOMAINS are setJesse Barnes2010-08-171-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Otherwise we'll duplicate definitions with the pci.h stubs. Reported-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Randy Dunlap <randy.dunlap@oracle.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| | * | PCI: provide stub pci_domain_nr function for !CONFIG_PCI configsDave Airlie2010-08-141-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allows the new PCI domain aware DRM code to compile on m68k. Reported-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | Merge branch 'for-linus' of git://oss.sgi.com/xfs/xfsLinus Torvalds2010-09-087-14/+35
| |\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://oss.sgi.com/xfs/xfs: xfs: Make fiemap work with sparse files xfs: prevent 32bit overflow in space reservation xfs: Disallow 32bit project quota id xfs: improve buffer cache hash scalability
| | * \ \ Merge branch '2.6.36-xfs-misc' of ↵Alex Elder2010-09-033-11/+11
| | |\ \ \ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfsdev
| | | * | | xfs: prevent 32bit overflow in space reservationDave Chinner2010-09-031-3/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we attempt to preallocate more than 2^32 blocks of space in a single syscall, the transaction block reservation will overflow leading to a hangs in the superblock block accounting code. This is trivially reproduced with xfs_io. Fix the problem by capping the allocation reservation to the maximum number of blocks a single xfs_bmapi() call can allocate (2^21 blocks). Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| | | * | | xfs: improve buffer cache hash scalabilityDave Chinner2010-09-022-8/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When doing large parallel file creates on a 16p machines, large amounts of time is being spent in _xfs_buf_find(). A system wide profile with perf top shows this: 1134740.00 19.3% _xfs_buf_find 733142.00 12.5% __ticket_spin_lock The problem is that the hash contains 45,000 buffers, and the hash table width is only 256 buffers. That means we've got around 200 buffers per chain, and searching it is quite expensive. The hash table size needs to increase. Secondly, every time we do a lookup, we promote the buffer we find to the head of the hash chain. This is causing cachelines to be dirtied and causes invalidation of cachelines across all CPUs that may have walked the hash chain recently. hence every walk of the hash chain is effectively a cold cache walk. Remove the promotion to avoid this invalidation. The results are: 1045043.00 21.2% __ticket_spin_lock 326184.00 6.6% _xfs_buf_find A 70% drop in the CPU usage when looking up buffers. Unfortunately that does not result in an increase in performance underthis workload as contention on the inode_lock soaks up most of the reduction in CPU usage. Signed-off-by: Dave Chinner <dchinner@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de>
| | * | | | xfs: Make fiemap work with sparse filesTao Ma2010-09-033-3/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In xfs_vn_fiemap, we set bvm_count to fi_extent_max + 1 and want to return fi_extent_max extents, but actually it won't work for a sparse file. The reason is that in xfs_getbmap we will calculate holes and set it in 'out', while out is malloced by bmv_count(fi_extent_max+1) which didn't consider holes. So in the worst case, if 'out' vector looks like [hole, extent, hole, extent, hole, ... hole, extent, hole], we will only return half of fi_extent_max extents. This patch add a new parameter BMV_IF_NO_HOLES for bvm_iflags. So with this flags, we don't use our 'out' in xfs_getbmap for a hole. The solution is a bit ugly by just don't increasing index of 'out' vector. I felt that it is not easy to skip it at the very beginning since we have the complicated check and some function like xfs_getbmapx_fix_eof_hole to adjust 'out'. Cc: Dave Chinner <david@fromorbit.com> Signed-off-by: Tao Ma <tao.ma@oracle.com> Signed-off-by: Alex Elder <aelder@sgi.com>