Commit message | Author | Age | Files | Lines
* i7core: add support for Lynnfield alternate address | Mauro Carvalho Chehab | 2010-05-10 | 2 | -2/+12
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Add initial support for Lynnfield | Mauro Carvalho Chehab | 2010-05-10 | 2 | -2/+52
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: do not export static functions | Stephen Rothwell | 2010-05-10 | 1 | -1/+0
    Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* edac: fix i7core build | Randy Dunlap | 2010-05-10 | 1 | -0/+3
    Fix build warning (missing header file) and build error when CONFIG_SMP=n.

    drivers/edac/i7core_edac.c:860: error: implicit declaration of function 'msleep'
    drivers/edac/i7core_edac.c:1700: error: 'struct cpuinfo_x86' has no member named 'phys_proc_id'

    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* edac: i7core_edac produces undefined behaviour on 32bit | Alan Cox | 2010-05-10 | 1 | -12/+12
    Fix the shifts up

    Signed-off-by: Alan Cox <alan@linux.intel.com>
    Acked-by: Doug Thompson <dougthompson@xmission.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
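    Note: the commit message does not say which shifts were affected. A common cause of this class of bug is shifting a 32-bit value by 32 bits or more, which is undefined in C. The sketch below is an illustration only, not code from i7core_edac.c; `field_mask` and `field_get` are hypothetical helpers that widen to 64 bits before shifting.

```c
#include <stdint.h>

/* Hypothetical helper, not taken from the driver.
 * Builds a mask with bits [lo, lo + width) set.
 * Assumes 1 <= width <= 64 and lo + width <= 64. */
static uint64_t field_mask(unsigned lo, unsigned width)
{
    /* The constant is 64 bits wide, so the shift is defined for width < 64.
     * width == 64 is handled separately because shifting a 64-bit value
     * by 64 is undefined as well. */
    uint64_t ones = (width >= 64) ? UINT64_MAX
                                  : ((UINT64_C(1) << width) - 1);
    return ones << lo;
}

/* Extracts the unsigned field [lo, lo + width) from a 64-bit value. */
static uint64_t field_get(uint64_t reg, unsigned lo, unsigned width)
{
    return (reg & field_mask(lo, width)) >> lo;
}
```

    With a plain `1 << width` the constant would have type `int`, and any `width` of 32 or more would be undefined on platforms where `int` is 32 bits wide.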
* i7core_edac: Use a more generic approach for probing PCI devices | Mauro Carvalho Chehab | 2010-05-10 | 1 | -39/+40
    Currently, only one set of PCI tables is allowed. This prevents using the driver for other devices like Lynnfield, which have a different set of PCI IDs.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: PCI device is called NONCORE, instead of NOCORE | Mauro Carvalho Chehab | 2010-05-10 | 2 | -5/+5
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Fix ringbuffer maxsize | Mauro Carvalho Chehab | 2010-05-10 | 1 | -6/+6
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: First store, then increment | Mauro Carvalho Chehab | 2010-05-10 | 1 | -7/+6
    Fix ringbuffer store logic.

    While here, add a few comments to the code and remove the undesired printk that could otherwise be called during NMI time.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
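    Note: the ordering named in the subject line ("first store, then increment") can be sketched with a single-producer, single-consumer ring buffer. This is a userspace illustration using C11 atomics, not the driver's code; the names `struct ring`, `ring_put`, `ring_get` and the capacity `RING_SIZE` are assumptions made for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define RING_SIZE 8u  /* hypothetical capacity; one slot always stays unused */

struct ring {
    int slots[RING_SIZE];
    atomic_uint in;   /* advanced only by the producer */
    atomic_uint out;  /* advanced only by the consumer */
};

static void ring_init(struct ring *r)
{
    atomic_init(&r->in, 0);
    atomic_init(&r->out, 0);
}

/* Producer side: returns false (drops the value) when the ring is full. */
static bool ring_put(struct ring *r, int value)
{
    unsigned in = atomic_load_explicit(&r->in, memory_order_relaxed);
    unsigned next = (in + 1) % RING_SIZE;

    if (next == atomic_load_explicit(&r->out, memory_order_acquire))
        return false;

    r->slots[in] = value;                                      /* first store */
    atomic_store_explicit(&r->in, next, memory_order_release); /* then increment */
    return true;
}

/* Consumer side: returns false when the ring is empty. */
static bool ring_get(struct ring *r, int *value)
{
    unsigned out = atomic_load_explicit(&r->out, memory_order_relaxed);

    if (out == atomic_load_explicit(&r->in, memory_order_acquire))
        return false;

    *value = r->slots[out];
    atomic_store_explicit(&r->out, (out + 1) % RING_SIZE, memory_order_release);
    return true;
}
```

    If the index were advanced before the slot was written, the consumer could observe the new index and read a slot that does not yet hold valid data. The producer path also never blocks or prints, which matters in a context such as NMI time.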
* i7core_edac: Better parse "any" addrmask | Mauro Carvalho Chehab | 2010-05-10 | 1 | -1/+1
    Instead of accepting just "any", accept also "any\n"

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
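    Note: values written with `echo` arrive with a trailing newline, which is why a strict comparison against "any" fails. A minimal userspace sketch of a newline-tolerant comparison follows; `streq_ignoring_newline` is a hypothetical helper written for this example and is not taken from the patch.

```c
#include <stdbool.h>
#include <string.h>

/* Returns true when buf holds exactly `word`, optionally followed by
 * a single trailing '\n'. */
static bool streq_ignoring_newline(const char *buf, const char *word)
{
    size_t n = strlen(word);

    if (strncmp(buf, word, n) != 0)
        return false;
    return buf[n] == '\0' || (buf[n] == '\n' && buf[n + 1] == '\0');
}
```

    Inside the kernel, `sysfs_streq()` offers this kind of comparison for sysfs store handlers; whether this commit used it is not stated in the message.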
* i7core_edac: Use a lockless ringbuffer | Mauro Carvalho Chehab | 2010-05-10 | 1 | -28/+55
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* edac: Create an unique instance for each kobj | Mauro Carvalho Chehab | 2010-05-10 | 2 | -32/+64
    Current code only works when there's just one memory controller, since we need one kobj for each instance.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* Documentation/edac.txt: Reflect the sysfs changes at the document | Mauro Carvalho Chehab | 2010-05-10 | 1 | -27/+29
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Convert UDIMM error counters into a proper sysfs group | Mauro Carvalho Chehab | 2010-05-10 | 1 | -37/+44
    Instead of displaying 3 values in the same variable, break them into 3 different sysfs nodes:

    /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm0
    /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm1
    /sys/devices/system/edac/mc/mc0/all_channel_counts/udimm2

    For registered DIMMs, however, the error counters are already being displayed at:

    /sys/devices/system/edac/mc/mc0/csrow*/ce_count

    So, there's no need to add any extra sysfs nodes.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
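    Note: a userspace reader for the per-UDIMM nodes listed above could look like the sketch below. The path layout comes from the commit message; the helper names `udimm_counter_path` and `read_counter` are hypothetical, and the assumption that each node holds a single unsigned decimal number is not confirmed by the message.

```c
#include <stdio.h>

/* Formats the sysfs path of one UDIMM counter into buf.
 * Returns the snprintf() result: the length the full path needs. */
static int udimm_counter_path(char *buf, size_t len, unsigned mc, unsigned dimm)
{
    return snprintf(buf, len,
                    "/sys/devices/system/edac/mc/mc%u/all_channel_counts/udimm%u",
                    mc, dimm);
}

/* Reads one unsigned decimal value from path (assumed file format).
 * Returns 0 on success, -1 if the file cannot be opened or parsed. */
static int read_counter(const char *path, unsigned long *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f)
        return -1;
    ok = (fscanf(f, "%lu", value) == 1);
    fclose(f);
    return ok ? 0 : -1;
}
```

    Having one value per node means a reader needs no knowledge of how several counters were packed into a single string.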
* edac: Don't create csrow entries on instance groups | Mauro Carvalho Chehab | 2010-05-10 | 1 | -2/+2
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* edac: store/show methods for device groups weren't working | Mauro Carvalho Chehab | 2010-05-10 | 3 | -9/+88
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Add support for sysfs addrmatch group | Mauro Carvalho Chehab | 2010-05-10 | 1 | -103/+70
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* edac_core: Allow the creation of sysfs groups | Mauro Carvalho Chehab | 2010-05-10 | 2 | -28/+67
    Currently, all sysfs nodes are stored at /sys/.*/mc. (regex)

    However, attribute groups are sometimes needed. This patch extends edac_core to allow the creation of groups.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Avoid printing a warning when debug is disabled | Mauro Carvalho Chehab | 2010-05-10 | 1 | -2/+1
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: We need to use list_for_each_entry_safe to avoid errors | Mauro Carvalho Chehab | 2010-05-10 | 1 | -2/+3
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
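    Note: `list_for_each_entry_safe` exists for loops that delete the current entry, because it keeps a second cursor pointing at the next entry. The same idea is sketched below in plain userspace C on a singly linked list; `struct node`, `push` and `free_all` are hypothetical names for this illustration and the code does not use the kernel list API.

```c
#include <stdlib.h>

struct node {
    int id;
    struct node *next;
};

/* Pushes a new node at the head and returns the new head.
 * On allocation failure the list is returned unchanged. */
static struct node *push(struct node *head, int id)
{
    struct node *n = malloc(sizeof(*n));

    if (!n)
        return head;
    n->id = id;
    n->next = head;
    return n;
}

/* Frees every node and returns how many were freed. */
static size_t free_all(struct node *head)
{
    struct node *pos, *tmp;
    size_t freed = 0;

    for (pos = head; pos != NULL; pos = tmp) {
        tmp = pos->next;  /* saved first: pos must not be read after free() */
        free(pos);
        freed++;
    }
    return freed;
}
```

    A loop that advanced with `pos = pos->next` after `free(pos)` would read freed memory, which is the kind of error the safe iterator avoids.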
* i7core_edac: change remove module strategy | Mauro Carvalho Chehab | 2010-05-10 | 1 | -20/+35
    The old remove module strategy didn't work on devices with multiple cores, since only one PCI device is used to open all mc's, owing to the Nehalem design.

    Also, it was based on the pdev value. However, this doesn't point to the PCI device used at mci->dev.

    So, instead, it now unregisters all devices at once, deleting them from the device list.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: remove static counter for max sockets | Mauro Carvalho Chehab | 2010-05-10 | 1 | -1/+0
    The number of sockets is now fully dynamic. Get rid of this obsolete var.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: at remove, don't remove all pci devices at once | Mauro Carvalho Chehab | 2010-05-10 | 1 | -17/+19
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Fix a bug when printing error counts with RDIMMs | Mauro Carvalho Chehab | 2010-05-10 | 1 | -5/+8
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* Documentation/edac.txt: Improve it to reflect the latest changes at the driver | Mauro Carvalho Chehab | 2010-05-10 | 1 | -16/+56
    Signed-off-by: Mauro Carvalho Chehab <mcheahb@redhat.com>
* i7core_edac: a few fixes for multiple mc's | Mauro Carvalho Chehab | 2010-05-10 | 1 | -9/+12
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: sanity check: print a warning if a mcelog is ignored | Mauro Carvalho Chehab | 2010-05-10 | 1 | -1/+6
    In theory, the other memory controller should handle it.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: create one mc per socket/QPI | Mauro Carvalho Chehab | 2010-05-10 | 1 | -279/+228
    Instead of creating just one memory controller, create one per socket (e.g. per QuickPath Interconnect). This better reflects the Nehalem architecture.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* Dynamically allocate memory for PCI devices | Mauro Carvalho Chehab | 2010-05-10 | 1 | -61/+114
    Instead of using a static table assuming always 2 CPU sockets, allocate space dynamically for Nehalem PCI devs.

    This patch is part of a series of patches that changes i7core_edac to allow more than 2 sockets and to properly report one memory controller per socket.
* i7core: temporary workaround to allow it to compile against 2.6.30 | Mauro Carvalho Chehab | 2010-05-10 | 1 | -2/+4
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Improve corrected_error_counts output for RDIMM | Mauro Carvalho Chehab | 2010-05-10 | 1 | -3/+3
    Just cosmetics. Instead of showing something like:

    socket 0, channel 2dimm0: 1 dimm1: 0 dimm2: 0
    socket 1, channel 2dimm0: 0 dimm1: 0 dimm2: 0

    Show:

    socket 0, channel 2 RDIMM0: 1 RDIMM1: 0 RDIMM2: 0
    socket 0, channel 2 RDIMM0: 0 RDIMM1: 0 RDIMM2: 0

    This is more concise and easier to parse.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Probe on Xeons eariler | Keith Mannthey | 2010-05-10 | 1 | -13/+19
    On the Xeon 55XX series CPUs the PCI devices are not exposed via ACPI, so we must explicitly probe them to make them usable as Linux PCI devices.

    This moves the detection of this state to before pci_register_driver is called. Its present position was not working on my systems; the driver would complain about not finding a specific device.

    This patch allows the driver to load on my systems.

    Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: Use registered memories per processor | Mauro Carvalho Chehab | 2010-05-10 | 1 | -16/+23
    Instead of assuming that the entire machine has either registered or unregistered memories, decide this per CPU socket.

    While here, fix a bug in i7core_mce_output_error(), where we were using m->cpu directly as if it represented a socket. Instead, the proper socket_id is given by cpu_data[m->cpu].phys_proc_id.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Use Device 3 function 2 to report errors with RDIMM's | Mauro Carvalho Chehab | 2010-05-10 | 1 | -30/+178
    Nehalem and newer chipsets provide a special device that holds counters of corrected memory errors detected on registered DIMMs. This device is only seen if there are registered memories plugged in.

    After this patch, on a machine fully equipped with RDIMMs, the driver will use Device 3 function 2 to count corrected errors instead of relying on mcelog.

    For unregistered DIMMs, it will keep the old behavior, counting errors via mcelog.

    This patch was developed together with Keith Mannthey <kmannth@us.ibm.com>

    Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Fix ecc enable shift | Keith Mannthey | 2010-05-10 | 1 | -1/+1
    From: Keith Mannthey <kmannth@us.ibm.com>

    Simple correction to a shift value.

    ECC_ENABLED is bit 4 of MC_STATUS, Dev 3 Fun 0 Offset 0x4c

    This correctly identifies the state of the ECC at the machine.

    Signed-off-by: Keith Mannthey <kmannth@us.ibm.com>
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
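    Note: testing the bit named in the message can be sketched as follows. The bit position (4) and register offset (0x4c) come from the commit message; the macro and function names are hypothetical, and the 32-bit register value is assumed to have been read from PCI configuration space already.

```c
#include <stdbool.h>
#include <stdint.h>

#define MC_STATUS_OFFSET          0x4c  /* Dev 3 Fun 0, per the commit message */
#define MC_STATUS_ECC_ENABLED_BIT 4     /* bit position per the commit message */

/* Returns true when the ECC_ENABLED bit is set in an MC_STATUS value. */
static bool ecc_enabled(uint32_t mc_status)
{
    return ((mc_status >> MC_STATUS_ECC_ENABLED_BIT) & 1u) != 0;
}
```

    A shift that is off by one reads a neighbouring bit instead, so the driver would report the wrong ECC state without any other visible failure.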
* i7core_edac: Print an error message if pci register fails | Mauro Carvalho Chehab | 2010-05-10 | 1 | -1/+7
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: CodingSyle fixes/cleanups | Mauro Carvalho Chehab | 2010-05-10 | 1 | -27/+23
    No functional changes.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* Documentation/edac.txt: Add Nehalem specific EDAC characteristics | Mauro Carvalho Chehab | 2010-05-10 | 1 | -0/+110
    As Nehalem has a different binding to the EDAC API, and its own different error injection code, document it.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: fix error injection | Mauro Carvalho Chehab | 2010-05-10 | 1 | -15/+12
    There were two stupid error injection bugs introduced by wrong cut-and-paste: one at socket store, and another at the error inject register. The last one was causing the code to not work at all.

    While here, add debug messages to allow seeing which registers are being set while sending error injection.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: fix error codes for sysfs error injection interface | Mauro Carvalho Chehab | 2010-05-10 | 1 | -4/+4
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: some fixes at error injection code | Mauro Carvalho Chehab | 2010-05-10 | 1 | -53/+51
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core_edac: Some cleanups at displayed info | Mauro Carvalho Chehab | 2010-05-10 | 1 | -12/+9
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: remove some uneeded noisy debug messages | Mauro Carvalho Chehab | 2010-05-10 | 1 | -4/+0
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: add socket info at the debug msg | Mauro Carvalho Chehab | 2010-05-10 | 1 | -2/+2
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: better document i7core_get_active_channels() | Mauro Carvalho Chehab | 2010-05-10 | 1 | -1/+17
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: fix get_devices routine for Xeon55xx | Mauro Carvalho Chehab | 2010-05-10 | 1 | -78/+108
    i7core_get_devices() was prepared to get just the first found device of each type. Due to that, on Xeon 55xx, only socket 1 was retrieved.

    Rework i7core_get_devices() to clean it up and to properly support Xeon 55xx.

    While here, fix a small typo.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: enrich error information based on memory transaction type | Mauro Carvalho Chehab | 2010-05-10 | 1 | -5/+27
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: check if the memory error is fatal or non-fatal | Mauro Carvalho Chehab | 2010-05-10 | 1 | -3/+13
    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
* i7core: fix probing on Xeon55xx | Mauro Carvalho Chehab | 2010-05-10 | 2 | -3/+21
    Xeon55xx fails to probe with this error message:

    EDAC DEBUG: in drivers/edac/i7core_edac.c, line at 1660: MC: drivers/edac/i7core_edac.c: i7core_init()
    EDAC i7core: Device not found: dev 00:00.0 PCI ID 8086:2c41
    i7core_edac: probe of 0000:00:14.0 failed with error -22

    This is due to the fact that, on Xeon35xx (and i7core), device 00.0 has PCI ID 8086:2c40.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
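    Note: one way to tolerate a device that carries different PCI IDs on different processors is to match against a small table of accepted IDs. The sketch below is an assumption about the general technique, not a description of what this commit changed; the two IDs are the ones quoted in the message, while `struct pci_id`, `dev0_ids` and `dev0_id_known` are hypothetical names.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct pci_id {
    uint16_t vendor;
    uint16_t device;
};

/* Both IDs quoted in the commit message for device 00.0.
 * The table itself is illustrative. */
static const struct pci_id dev0_ids[] = {
    { 0x8086, 0x2c40 },  /* ID the message gives for Xeon35xx and i7core */
    { 0x8086, 0x2c41 },  /* ID the driver reported as not found */
};

/* Returns true when the vendor/device pair appears in dev0_ids. */
static bool dev0_id_known(uint16_t vendor, uint16_t device)
{
    for (size_t i = 0; i < sizeof(dev0_ids) / sizeof(dev0_ids[0]); i++)
        if (dev0_ids[i].vendor == vendor && dev0_ids[i].device == device)
            return true;
    return false;
}
```

    Keeping the alternatives in data rather than in separate code paths makes it cheap to add another ID when a new processor variant appears.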
* i7core_edac: some fixes at memory error parser | Mauro Carvalho Chehab | 2010-05-10 | 1 | -8/+14
    m->bank is not related to the memory bank but, instead, to the MCA Error register bank. Fix it accordingly.

    While here, improve the comments for the Nehalem bank.

    A later fix is needed in order to get bank/rank information from the MCA error log.

    Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>