path: root/drivers/edac/mce_amd.c
Commit message | Author | Age | Files | Lines
* EDAC, mce_amd: Detect SMCA using X86_FEATURE_SMCA | Yazen Ghannam | 2016-05-12 | 1 | -6/+3
    Use X86_FEATURE_SMCA when detecting if SMCA is available instead of directly using CPUID 0x80000007_EBX.

    Signed-off-by: Yazen Ghannam <Yazen.Ghannam@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Link: http://lkml.kernel.org/r/1462971509-3856-7-git-send-email-bp@alien8.de
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
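    To illustrate what the replaced open-coded check amounts to, here is a minimal userspace sketch. It assumes that SMCA is reported in bit 3 of CPUID 0x80000007 EBX; the macro and helper names are hypothetical, and the raw EBX value is passed in so the logic runs anywhere. In the kernel, the commit replaces this kind of test with the X86_FEATURE_SMCA feature flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: SMCA is reported in bit 3 of CPUID 0x80000007 EBX. */
#define SMCA_EBX_BIT 3u

/* Hypothetical helper: tests the SMCA bit in a raw EBX value. */
static bool ebx_reports_smca(uint32_t ebx)
{
    return ((ebx >> SMCA_EBX_BIT) & 1u) != 0;
}
```

    Keeping the bit position behind one feature flag means callers no longer need to know which CPUID leaf or register carries it.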
* x86/mce/AMD, EDAC: Enable error decoding of Scalable MCA errors | Aravind Gopalakrishnan | 2016-03-08 | 1 | -3/+332
    For Scalable MCA enabled processors, errors are listed per IP block. And since it is not required for an IP to map to a particular bank, we need to use HWID and McaType values from the MCx_IPID register to figure out which IP a given bank represents.

    We also have a new bit (TCC) in the MCx_STATUS register to indicate Task context is corrupt.

    Add logic here to decode errors from all known IP blocks for Fam17h Model 00-0fh and to print TCC errors.

    [ Minor fixups. ]
    Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Tony Luck <tony.luck@intel.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Link: http://lkml.kernel.org/r/1457021458-2522-3-git-send-email-Aravind.Gopalakrishnan@amd.com
    Signed-off-by: Ingo Molnar <mingo@kernel.org>
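    The field extraction described above can be sketched as follows. The bit positions are assumptions rather than something this log states: HWID in MCx_IPID[43:32], McaType in MCx_IPID[63:48], and TCC in MCx_STATUS bit 55. All helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: HWID lives in MCx_IPID[43:32]. */
static uint16_t ipid_hwid(uint64_t ipid)
{
    return (uint16_t)((ipid >> 32) & 0xfff);
}

/* Assumed layout: McaType lives in MCx_IPID[63:48]. */
static uint16_t ipid_mca_type(uint64_t ipid)
{
    return (uint16_t)((ipid >> 48) & 0xffff);
}

/* Assumed position: TCC (task context corrupt) is MCx_STATUS bit 55. */
static bool status_tcc(uint64_t status)
{
    return ((status >> 55) & 1u) != 0;
}
```

    A decoder would then look up the (HWID, McaType) pair in a table of known IP blocks instead of relying on the bank number.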
* EDAC, mce_amd: Don't emit 'CE' for Deferred error | Aravind Gopalakrishnan | 2015-07-14 | 1 | -1/+2
    Currently, when decoding an MCE, we display 'CE' for a Deferred error, like this:

      [Hardware Error]: CPU:0 (15:2:0) MC4_STATUS[Over|CE|MiscV|-|AddrV|Deferred|-|UECC]: 0xdc04b00095080813

    When the 'UC' bit in the MCx_STATUS register is clear, the error status is either a Corrected error or Deferred error as determined by the 'Deferred' bit. So do not print 'CE' on a deferred error.

    Refer to AMD Error Scope Hierarchy table in a newer BKDG (example: 49125_15h_Models_30h-3Fh_BKDG.pdf, section "RAS Features").

    Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
    Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com>
    Cc: linux-edac <linux-edac@vger.kernel.org>
    Link: http://lkml.kernel.org/r/1436788382-6463-1-git-send-email-aravind.gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov <bp@suse.de>
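    The rule in this message can be sketched as a small classifier. It assumes the UC bit is MCx_STATUS bit 61 and the Deferred bit is bit 44; the function name and the choice of "-" as the placeholder are illustrative, not taken from the driver.

```c
#include <stdint.h>

#define STATUS_UC       (1ULL << 61) /* assumed position of the UC bit */
#define STATUS_DEFERRED (1ULL << 44) /* assumed position of the Deferred bit */

/* Hypothetical helper: picks the text for the UE/CE slot of the status line. */
static const char *severity_flag(uint64_t status)
{
    if (status & STATUS_UC)
        return "UE";
    if (status & STATUS_DEFERRED)
        return "-";  /* deferred errors are neither UE nor CE */
    return "CE";
}
```

    With the status value quoted above, 0xdc04b00095080813, UC is clear and Deferred is set under these assumptions, so the sketch yields "-" rather than "CE".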
* EDAC, MCE, AMD: Correct formatting of decoded text | Borislav Petkov | 2014-11-25 | 1 | -3/+3
    Write out MCx_ADDR into the more humanly readable "MCx Error Address" and remove double colon in the output.

    Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
    Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, MCE, AMD: Add decoding table for MC6 xec | Aravind Gopalakrishnan | 2014-11-04 | 1 | -30/+11
    Extended error code meanings are tabulated for other banks. Extend that tradition for MC6 too.

    Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
    Link: http://lkml.kernel.org/r/1415122868-10969-1-git-send-email-aravind.gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, MCE, AMD: Add MCE decoding for F15h M60h | Aravind Gopalakrishnan | 2014-07-14 | 1 | -4/+40
    Add decoding logic for new Fam15h model 60h. Tested using mce_amd_inj module and works fine.

    Signed-off-by: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
    Link: http://lkml.kernel.org/r/1405098795-4678-1-git-send-email-Aravind.Gopalakrishnan@amd.com
    [ Boris: simplify a bit. ]
    Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, MCE, AMD: Remove leftover unused mask | Borislav Petkov | 2014-05-08 | 1 | -2/+0
    295d8cda2689 ("EDAC, MCE, AMD: Drop local coreid reporting") removed the code snippet which used that mask but forgot to drop the mask itself. Do that now.

    Signed-off-by: Borislav Petkov <bp@suse.de>
* MCE, AMD: Fix decoding module loading on unsupported hw | Borislav Petkov | 2014-02-24 | 1 | -32/+33
    We want to still be able to issue some error information on systems for which there is no decoding support (think older distro kernels here, for example). Therefore, we allow module registration but skip the per-family bank-specific decoders and issue the general information only, i.e.:

      [ 46.822828] [Hardware Error]: Error Status: Uncorrected, software containable error.
      [ 46.822846] [Hardware Error]: CPU:0 (15:30:0) MC0_STATUS[-|UE|-|-|-|-|-]: 0xa000000000010f0f
      [ 46.822858] [Hardware Error]: cache level: L3/GEN, mem/io: GEN, mem-tx: GEN, part-proc: GEN (timed out)

    with the hope that it still contains helpful useful bits.

    Suggested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
    Tested-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
    Link: http://lkml.kernel.org/r/1392659391-2411-1-git-send-email-Aravind.Gopalakrishnan@amd.com
    Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, MCE, AMD: Add an MCE signature for new Fam15h models | Aravind Gopalakrishnan | 2013-06-08 | 1 | -2/+3
    Add a new error signature for Family 15h, models 30h-3fh. Patch has been tested on Fam15h using mce_amd_inj facility and has been verified to work correctly.

    Signed-off-by: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
    [ cleanup commit message and error string ]
    Signed-off-by: Borislav Petkov <bp@suse.de>
* EDAC, MCE, AMD: Remove unneeded exports | Borislav Petkov | 2013-01-22 | 1 | -11/+6
    Initially, those strings describing different parts of an MCE message were shared with amd64_edac and were therefore exported to modules. However, all except pp_msgs are used only in one place right now so hide them and make them static.

    No functionality change.

    Reported-by: Fengguang Wu <fengguang.wu@intel.com>
    Signed-off-by: Borislav Petkov <bp@alien8.de>
* EDAC, MCE, AMD: Add MCE decoding support for Family 16h | Jacob Shin | 2013-01-22 | 1 | -17/+78
    Add MCE decoding logic for AMD Family 16h processors.

    Boris:
    - drop unneeded uu_msgs export
    - exit early in cat_mc1_mce and save us an indentation level

    Signed-off-by: Jacob Shin <jacob.shin@amd.com>
    Signed-off-by: Borislav Petkov <bp@alien8.de>
* EDAC, MCE, AMD: Make MC2 decoding per-family | Jacob Shin | 2013-01-22 | 1 | -27/+29
    Currently only AMD Family 15h processors have special handling for MC2 errors. Since upcoming Family 16h will also need unique handling, let's make MC2 handling part of amd_decoder_ops.

    Signed-off-by: Jacob Shin <jacob.shin@amd.com>
    Signed-off-by: Borislav Petkov <bp@alien8.de>
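    As a rough illustration of per-family dispatch through an ops structure, consider the sketch below. It is not the driver's amd_decoder_ops: the struct, its name field, the handlers, and select_ops() are hypothetical, and the handlers are placeholders that do not reproduce any real error signatures.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-in for a per-family ops structure. */
struct decoder_ops {
    const char *name;
    bool (*mc2_mce)(uint16_t ec, uint8_t xec);
};

/* Placeholder handlers: real signature tables are not reproduced here. */
static bool generic_mc2_mce(uint16_t ec, uint8_t xec) { (void)ec; (void)xec; return true; }
static bool f15h_mc2_mce(uint16_t ec, uint8_t xec) { (void)ec; (void)xec; return true; }

static const struct decoder_ops generic_ops = { "generic", generic_mc2_mce };
static const struct decoder_ops f15h_ops = { "F15h", f15h_mc2_mce };

/* Pick the ops for a CPU family; only 0x15 gets special MC2 handling here. */
static const struct decoder_ops *select_ops(uint8_t family)
{
    return family == 0x15 ? &f15h_ops : &generic_ops;
}

/* Dispatch through the ops; report "not decoded" when no handler exists. */
static bool decode_mc2(const struct decoder_ops *ops, uint16_t ec, uint8_t xec)
{
    if (!ops || !ops->mc2_mce)
        return false;
    return ops->mc2_mce(ec, xec);
}
```

    Adding another family then means adding one more ops instance and one more case in the selector, leaving the dispatch site unchanged.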
* MCE, AMD: Dump error status | Borislav Petkov | 2012-11-28 | 1 | -2/+20
    Dump error status after decoding the error which describes the error disposition.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* MCE, AMD: Report decoded error type first | Borislav Petkov | 2012-11-28 | 1 | -25/+25
    Instead of starting with the error details, report the decoded, readable error type first.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* MCE, AMD: Dump CPU f/m/s triple with the error | Borislav Petkov | 2012-11-28 | 1 | -4/+6
    It is very useful to have the family/model/stepping with the reported error so dump it. This saves us asking the bug reporter about it.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* MCE, AMD: Remove functional unit references | Borislav Petkov | 2012-11-28 | 1 | -94/+92
    Having the functional unit names in each bank decode is only misleading as this code supports multiple families and there's no guarantee the mapping between FUs and MCE banks will stay the same.

    And also, knowing the functional unit name doesn't help much since you end up looking at the respective BKDG anyway.

    So drop all FU references and use the MC bank numbers instead.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* MCE, AMD: Drop too granulary family model checks | Borislav Petkov | 2012-04-04 | 1 | -4/+2
    MCA details seldom change inbetween the models of a family so don't be too conservative and enable decoding on everything starting from K8 onwards. Minor adjustments can come in later but most importantly, we have some decoding infrastructure in place for upcoming models by default.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* MCE, AMD: Constify error tables | Borislav Petkov | 2012-03-19 | 1 | -7/+7
    ... so that checkpatch can chill out.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* MCE, AMD: Correct bank 5 error signatures | Borislav Petkov | 2012-03-19 | 1 | -4/+1
    ... and remove superfluous ErrorCodeExt check.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* MCE, AMD: Rework NB MCE signatures | Borislav Petkov | 2012-03-19 | 1 | -128/+48
    Correct their formulation, replace per-family functions with a single, unified lookup table.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* MCE, AMD: Correct VB data error description | Borislav Petkov | 2012-03-19 | 1 | -1/+1
    Sync with latest BKDG error types.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* MCE, AMD: Correct ucode patch buffer description | Borislav Petkov | 2012-03-19 | 1 | -2/+6
    This MC1 error signature is called differently now, fix it.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* MCE, AMD: Correct some MC0 error types | Borislav Petkov | 2012-03-19 | 1 | -3/+2
    Use "System Read Data Error" as a more general name for MC0 bus errors on F15h and update some error definitions.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
    Reviewed-by: Andreas Herrmann <andreas.herrmann3@amd.com>
* x86, mce: Add wrappers for registering on the decode chain | Borislav Petkov | 2011-12-14 | 1 | -2/+2
    No functionality change, this is done so that in a follow-on patch all queued-up MCEs can be decoded after registering on the chain.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE, AMD: Simplify NB MCE decoder interface | Borislav Petkov | 2011-10-06 | 1 | -10/+10
    Drop third nbcfg argument which is old remains and not required anymore.

    No functionality change.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE, AMD: Drop local coreid reporting | Borislav Petkov | 2011-10-06 | 1 | -19/+1
    MCE decoding code is reporting the core which encountered the error unconditionally now so drop this piece. Besides, it reported the coreid in the local processor package which is not that valuable as a datapoint.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE, AMD: Print valid addr when reporting an error | Borislav Petkov | 2011-10-06 | 1 | -1/+3
    The MCi_STATUS bank has a AddrV bit which, when set, denotes that the corresponding MCi_ADDR MSR contains a valid address belonging to the MCE currently being reported. Dump it since it is definitely relevant information.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
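    A minimal sketch of that gating is shown below. It assumes AddrV is MCi_STATUS bit 58; the helper name and the exact output format are illustrative and not copied from the driver.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATUS_ADDRV (1ULL << 58) /* assumed position of the AddrV bit */

/*
 * Hypothetical helper: formats the address line into buf only when AddrV
 * is set. Returns the snprintf() result, or 0 when the address is not valid.
 */
static int format_error_addr(char *buf, size_t len, int bank,
                             uint64_t status, uint64_t addr)
{
    if (!(status & STATUS_ADDRV))
        return 0;
    return snprintf(buf, len, "MC%d_ADDR: 0x%016" PRIx64, bank, addr);
}
```

    Skipping the line when AddrV is clear avoids printing a stale or meaningless MCi_ADDR value next to the error.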
* EDAC, MCE, AMD: Print CPU number when reporting the error | Borislav Petkov | 2011-10-06 | 1 | -2/+2
    Currently, correctable ECCs go through mcelog and do not print the scary MCE banner. In that case, however, reporting the core where the CECC happened is important information so dump it along with the decoded string albeit at risk of having a minor redundancy.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: Enable driver on F15h | Borislav Petkov | 2011-03-17 | 1 | -3/+3
    Add the PCI device ids required for driver registration. Remove pvt->ctl_name and use the family descriptor directly, instead. Then, bump driver version and fixup its format. Finally, enable DRAM ECC decoding on F15h.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* amd64_edac: Cleanup NBSH cruft | Borislav Petkov | 2011-03-17 | 1 | -1/+1
    Remove reporting of errors with UC bit set - this is done by the MCE decoding code anyway and this driver deals with DRAM ECC errors only. UC (NB uncorrectable error) doesn't necessarily mean it is a DRAM error. Remove unused macros while at it.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Fix NB error formatting | Borislav Petkov | 2011-01-07 | 1 | -7/+10
    Minor formatting fixup since the information which core was associated with the MCE is not always valid.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Use BIT_64() to eliminate warnings on 32-bit | Randy Dunlap | 2011-01-07 | 1 | -2/+2
    Building for X86_32 produces shift count warnings, so use BIT_64() to eliminate the warnings.

      drivers/edac/mce_amd.c:778: warning: left shift count >= width of type
      drivers/edac/mce_amd.c:778: warning: left shift count >= width of type

    Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
    Cc: Doug Thompson <dougthompson@xmission.com>
    Cc: bluesmoke-devel@lists.sourceforge.net
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
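    The warning arises when a constant of type long, which is 32 bits wide on X86_32, is shifted by 32 or more. A minimal userspace sketch of the fix's idea follows; the macro below is a stand-in that forces a 64-bit constant and is not the kernel's definition, and bit_is_set() is a hypothetical helper.

```c
#include <stdint.h>

/* Stand-in for the idea behind BIT_64(): a 64-bit-wide 1 shifted left by n. */
#define BIT_64(n) (UINT64_C(1) << (n))

/* Hypothetical helper: tests bit n of a 64-bit register value. */
static int bit_is_set(uint64_t value, unsigned int n)
{
    return (value & BIT_64(n)) != 0;
}
```

    Because the left operand is always 64 bits wide, shift counts up to 63 are well defined regardless of the target's word size.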
* EDAC, MCE: Enable MCE decoding on F15h | Borislav Petkov | 2011-01-07 | 1 | -6/+8
    Now that everything is inplace, enable MCE decoding on F15h. Make initcall routine a bit more readable.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Shorten error report formatting | Borislav Petkov | 2011-01-07 | 1 | -22/+32
    Shorten up MCi_STATUS flags and add BD's new deferred and poison types. Also, simplify formatting.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Overhaul error fields extraction macros | Borislav Petkov | 2011-01-07 | 1 | -47/+36
    Make macro names shorter thus making code shorter and more clear.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F15h FP MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -0/+44
    Add decoder for FP MCEs.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F15 EX MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -7/+34
    Integrate the single FIROB signature into an expanded table along with the new BD MCE types.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add an F15h NB MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -0/+10
    by (almost) reusing the F10h one since the signatures are the same.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: No F15h LS MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -1/+1
    F15h BD doesn't generate LS MCEs so warn about it.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F15h CU MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -1/+61
    MCE bank 2 is redefined from a BU to a CU (Combined Unit) bank on F15h. Add a decoder function for CU MCEs.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F15h IC MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -3/+50
    Add support for decoding F15h IC MCEs.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F15h DC MCE decoder | Borislav Petkov | 2011-01-07 | 1 | -18/+61
    Add a decoder for F15h DC MCEs to support the new types of DC MCEs introduced by the BD microarchitecture.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Select extended error code mask | Borislav Petkov | 2011-01-07 | 1 | -4/+9
    F15h enlarges the extended error code of an MCE to a 5-bit field (MCi_STATUS[20:16]). Add a mask variable which default 0xf is overridden on F15h.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
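    The masking described here can be sketched as follows. The 0xf default and the [20:16] field come from the message above; the 0x1f value for F15h is inferred from "5-bit field", and both function names are hypothetical.

```c
#include <stdint.h>

/* Pick the extended error code mask: 4 bits by default, 5 bits on F15h. */
static uint8_t xec_mask_for_family(uint8_t family)
{
    return family == 0x15 ? 0x1f : 0xf;
}

/* Extract the extended error code, which starts at MCi_STATUS bit 16. */
static uint8_t extract_xec(uint64_t status, uint8_t mask)
{
    return (uint8_t)((status >> 16) & mask);
}
```

    For a status value with bits 20, 18 and 16 set (0x150000), the F15h mask yields 0x15, while the default mask drops bit 20 and yields 0x5.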
* EDAC, MCE: Fix shift warning on 32-bit | Borislav Petkov | 2010-10-21 | 1 | -1/+1
    Fix

      drivers/edac/mce_amd.c:262: warning: left shift count >= width of type

    on 32-bit builds.

    Reported-by: Randy Dunlap <randy.dunlap@oracle.com>
    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Enable MCE decoding on F12h | Borislav Petkov | 2010-10-21 | 1 | -1/+1
    Turn on MCE decoding on F12h.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F12h NB MCE decoder | Borislav Petkov | 2010-10-21 | 1 | -2/+3
    F12h is completely covered by the generic path.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F12h IC MCE decoder | Borislav Petkov | 2010-10-21 | 1 | -0/+1
    ... which is the same as for K8 and F10h.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add F12h DC MCE decoder | Borislav Petkov | 2010-10-21 | 1 | -7/+17
    F12h DC MCE signatures are a subset of F10h's so reuse them.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Add support for F11h MCEs | Borislav Petkov | 2010-10-21 | 1 | -3/+12
    F11h has almost the same MCE signatures as K8 except DRAM ECC and MC5 bank errors. Reuse functionality from the other families.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
* EDAC, MCE: Enable MCE decoding on F14h | Borislav Petkov | 2010-10-21 | 1 | -4/+5
    Now that all decoders have been taught about F14h, models < 0x10 MCEs, enable decoding on this family of CPUs. Also, issue a short informational message upon boot that MCE decoding gets enabled.

    Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>