Tracing Dell R720 hardware errors in ESXi 6

DevilWAH Member Posts: 2,997 ■■■■■■■■□□
Hi,

We are running some R720s for VMware View and they are all having the same issue: they purple screen (PSOD), and in the logs we have the following errors:

A bus fatal error was detected on a component at bus 0 device 3 function 0.
A bus fatal error was detected on a component at slot 6.

I tried lspci and I do get output for the device, but all I see is:

0000:00:03.0 Bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]

But I can't get any further than this; the -s switch some people have suggested is not accepted.

Can anyone help me determine which device is actually causing the issue? What physical slot does 3a refer to?

I would be very very grateful for some help here.
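
For reference, the address lspci prints has the form segment:bus:device.function in hexadecimal, so "bus 0 device 3 function 0" in the log lines up with 0000:00:03.0. Below is a minimal Python sketch of that matching step. It assumes the numbers in the log message are decimal and that the segment is 0; the helper names `log_to_address` and `find_device` are made up for illustration. It only ties the log entry to the lspci line and does not tell you which physical slot sits behind the root port.

```python
import re

# Matches the "bus N device N function N" part of the log message.
# Assumption: the numbers in the log message are decimal.
LOG_RE = re.compile(r"bus (\d+) device (\d+) function (\d+)")


def log_to_address(message, segment=0):
    """Turn the log message into an lspci-style address (hex fields)."""
    m = LOG_RE.search(message)
    if m is None:
        return None
    bus, dev, func = (int(g) for g in m.groups())
    return f"{segment:04x}:{bus:02x}:{dev:02x}.{func:x}"


def find_device(lspci_output, address):
    """Return the first lspci line that starts with the given address."""
    for line in lspci_output.splitlines():
        if line.strip().startswith(address):
            return line.strip()
    return None


message = ("A bus fatal error was detected on a component "
           "at bus 0 device 3 function 0.")
lspci_output = (
    "0000:00:03.0 Bridge: Intel Corporation Xeon E7 v2/Xeon E5 v2/Core i7 "
    "PCI Express Root Port 3a [PCIe RP[0000:00:03.0]]"
)

address = log_to_address(message)
print(address)
print(find_device(lspci_output, address))
```

Messages that only name a slot (like the "slot 6" one) have no bus/device/function to parse, so `log_to_address` returns None for those.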
  • If you can't explain it simply, you don't understand it well enough. Albert Einstein
  • An arrow can only be shot by pulling it backward. So when life is dragging you back with difficulties. It means that its going to launch you into something great. So just focus and keep aiming.

Comments

  • Mike7 Member Posts: 1,107 ■■■■□□□□□□
    Do you have Dell OMSA installed?
  • DevilWAH Member Posts: 2,997 ■■■■■■■■□□
    I believe so (pretty sure), but I am not a server engineer, so I have not really used it apart from what it shows in the hardware tab of VMware.
  • kiki162 Member Posts: 635 ■■■■■□□□□□
    Update the firmware on the NIC(s) in that Dell server. I've had similar issues with servers crashing for no apparent reason at various times throughout the day.

    Any time you deal with Dell servers, you need to stay on top of all the firmware updates.
  • tbgree00 Member Posts: 553 ■■■■□□□□□□
    Dell sells a vCenter plugin that monitors your hardware in the web client and tells you when there's a critical firmware update. It also lets you remote control a server without going to the iDRAC/CMC, and tells you whether your firmware/driver/hardware versions are compatible with the version of ESXi you're running. I'm sure you can talk to your sales rep to get a demo. It's really handy.
    I finally started that blog - www.thomgreene.com
  • DevilWAH Member Posts: 2,997 ■■■■■■■■□□
    Umm, this is the fun part: we upgraded firmware for another reason and then the issues started. We have now been told to update X, Y and Z firmware, but all the servers still have the same issue.

    But no one seems able to tell us which device is causing the issue. I want to know how to trace the device as much as how to fix the issue.
  • Mike7 Member Posts: 1,107 ■■■■□□□□□□
    DevilWAH wrote: »
    I believe so (pretty sure) but I am not a server engineer so not really used it apart from what it shows in the hardware tab of vmware .

    You probably need to use OpenManage Server Administrator on a Windows/Linux server to access it (use the Manage Remote Node option). That will show you all the hardware details, including any hardware failure logs. Google for the steps.

    It could be a NIC firmware issue, as the other poster mentioned. Maybe give that a try first.

    I assume the company is not using OME (OpenManage Essentials). This is a consolidated systems management tool for viewing Dell server hardware details and collecting hardware faults. We used it to schedule server firmware, BIOS and driver updates. There is also Dell SupportAssist for OME: your server configuration and hardware failure details are sent to Dell's 24x7 enterprise support, and they will inform you of any faults and send replacement hardware to you automatically. All for free.