Mysterious storage question

undomielundomiel Member Posts: 2,818
We had a problem yesterday with the server attached to our MD3000. The server restarted unexpectedly and went into a chkdsk, which found a lot of corruption on one of our volumes on the MD3000. When chkdsk finished, all of the data on that particular volume was trashed, but all other volumes were fine. Checking the event logs, we saw several hours' worth of NTFS errors: "The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume New Volume."

One of the admins here got on the phone with Dell, and he claims that because no lun 31 was established for that server, the server could no longer see the data, and that is why it got trashed. I am not a storage expert, so I am asking here. Could the loss of lun 31 affect access to the data at all? Furthermore, would the loss of lun 31 cause a server to reboot unexpectedly? We also had a user on Friday see deleted data mysteriously reappear on that volume, which may have been a related symptom.

I'm not buying the explanation being presented to me, that someone removing lun 31 would cause a server reboot and data corruption. Management is done out-of-band. Anyone care to take a shot at this one?
Jumping on the IT blogging band wagon -- http://www.jefferyland.com/

Comments

  • tierstentiersten Member Posts: 4,505
    If an FS is sufficiently corrupted then it can make the OS panic. It will see things that don't make sense or shouldn't ever happen. There are only so many safeguards and checks you can put in before it will start to negatively impact performance which isn't something you really want in your FS code.
  • undomielundomiel Member Posts: 2,818
    And would having lun 31 not set up cause a corrupt file system on the storage array?
  • tierstentiersten Member Posts: 4,505
    Is LUN 31 the volume that is damaged?
  • astorrsastorrs Member Posts: 3,139 ■■■■■■□□□□
    tiersten wrote:
    Is LUN 31 the volume that is damaged?
    Yeah what is LUN 31 :)
  • undomielundomiel Member Posts: 2,818
    lun 31 is the management lun, which as best I can figure out is used for in-band management. I do not know, though, what role it plays in the day-to-day operation of the SAN. It is not a storage volume; the storage volume that went bye-bye was a different lun.
  • tierstentiersten Member Posts: 4,505
    If LUN 31 is an internal system volume that the disk array uses then I'm not surprised it was a tad unhappy when somebody deleted it...
  • dynamikdynamik Banned Posts: 12,314 ■■■■■■■■□□
    Yea, I'm a novice when it comes to storage, but just out of curiosity, what is special about LUN 31?
  • skrpuneskrpune Member Posts: 1,409
    I did a little googling and found an IBM help article that references LUN 31...but I'll be darned if I can make heads or tails of most of it. From what I can gather, the default setting is for the access logical drive to be mapped to LUN 31, or at least it is in the case of the storage setup in that help article. So if it's the same in this case, then I guess it would make sense that if the "map" is wrong or a part of it is missing, things would get all wonky.
    Currently Studying For: Nothing (cert-wise, anyway)
    Next Up: Security+, 291?

    Enrolled in Masters program: CS 2011 expected completion
  • ULWizULWiz Member Posts: 722
    I believe this is what you are looking for. Not 100% sure.

    In computer storage, a logical unit number or LUN is simply the number assigned to a logical unit. A logical unit is a SCSI protocol entity, the only one which may be addressed by the actual input/output (I/O) operations. Each SCSI target provides one or more logical units, and does not perform I/O as itself, but only on behalf of a specific logical unit.

    Examples
    To provide a practical example, a typical disk array has multiple physical SCSI ports, each with one SCSI target address assigned. Then the disk array is formatted as a RAID and then this RAID is partitioned into several separate storage volumes. To represent each volume, a SCSI target is configured to provide a LUN. Each SCSI target may provide multiple LUNs and thus represent multiple volumes, which does not mean that those volumes are concatenated.

    Another example is a single disk drive with one physical SCSI port. It usually provides just a single target, which in turn usually provides just a single LUN numbered zero. This LUN represents the entire storage of the disk drive.
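
    The two examples above can be modelled as a toy sketch. This is only an illustration of the target/LUN relationship, not any real SCSI library; the class name `ScsiTarget`, its methods, and the volume labels are all made up for this example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ScsiTarget:
    """Hypothetical model of a SCSI target that exposes logical units."""
    target_id: int
    # Maps a LUN number to the label of the volume it represents.
    luns: Dict[int, str] = field(default_factory=dict)

    def map_lun(self, lun: int, volume: str) -> None:
        # Each LUN number identifies exactly one logical unit on a target.
        if lun in self.luns:
            raise ValueError(f"LUN {lun} is already mapped on target {self.target_id}")
        self.luns[lun] = volume

    def volume_for(self, lun: int) -> str:
        # I/O is addressed to a logical unit, never to the target itself.
        return self.luns[lun]


# A disk array port: one target providing several LUNs, one per volume.
array_port = ScsiTarget(target_id=0)
array_port.map_lun(0, "volume-A")
array_port.map_lun(1, "volume-B")

# A single disk drive: one target providing a single LUN numbered zero.
single_disk = ScsiTarget(target_id=3)
single_disk.map_lun(0, "whole-disk")
```

    Note that the two volumes behind `array_port` stay separate; mapping them under one target does not concatenate them.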


    Other protocols
    The term LUN applies not only to traditional parallel SCSI, but also to its descendants, such as Fibre Channel Protocol (FCP), iSCSI, HyperSCSI, and others.


    cXtXdXsX nomenclature in Unix
    From the computer's perspective, a SCSI LUN is only one part of the full SCSI address. The full device address is made up of:

    controller ID of the host bus adapter,
    target ID identifying the SCSI target on that bus,
    disk ID identifying a LUN on that target,
    an optional (and largely obsolete) slice ID identifying a specific slice on that disk.
    In the Unix family of operating systems, these IDs are often combined into a single "name". For example, /dev/dsk/c1t2d3s4 would refer to controller 1, target 2, disk 3, slice 4. Presently Solaris, HP-UX, NCR, and others continue to use "cXtXdXsX" nomenclature, while AIX has abandoned it in favor of more familiar names.
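
    As a rough illustration of that naming scheme, here is a small parser sketch. The function name `parse_ctds` and the `ScsiDeviceName` type are invented for this example, and it assumes the plain cNtNdN[sN] form with decimal numbers; real systems have variants (for instance names without a target part) that it does not handle.

```python
import re
from typing import NamedTuple, Optional


class ScsiDeviceName(NamedTuple):
    controller: int
    target: int
    disk: int                # the LUN on that target
    slice_id: Optional[int]  # None when the name has no sN suffix


# cN tN dN with an optional sN suffix, decimal numbers only (an assumption).
_CTDS = re.compile(r"c(\d+)t(\d+)d(\d+)(?:s(\d+))?")


def parse_ctds(path: str) -> ScsiDeviceName:
    """Split a name such as /dev/dsk/c1t2d3s4 into its numeric parts."""
    name = path.rsplit("/", 1)[-1]  # drop any directory prefix
    match = _CTDS.fullmatch(name)
    if match is None:
        raise ValueError(f"not a cXtXdXsX-style name: {path!r}")
    controller, target, disk, slice_id = match.groups()
    return ScsiDeviceName(
        int(controller),
        int(target),
        int(disk),
        int(slice_id) if slice_id is not None else None,
    )
```

    For the name used above, `parse_ctds("/dev/dsk/c1t2d3s4")` gives controller 1, target 2, disk 3, slice 4.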


    Other uses
    The term logical unit number also applies to a file access channel within certain programming languages. For example in FORTRAN, the WRITE statement has a form which identifies the LUN of the target file and the FORMAT of the data to be written as in WRITE(5,32) where 5 is the LUN of the file and 32 is the FORMAT statement for the write.
    CompTIA A+ Nov 25, 1997
    CompTIA Network+ March 7, 2008
    MCTS Vista 620 June 14, 2008
    MCP Server 290 Nov 15, 2008
    MCP Server 291 In Progress (Exam 12/28/09)
    Cisco CCENT In Progress
    MCP Server 291 In Progress
    C|EH In Progress
  • jibbajabbajibbajabba Member Posts: 4,317 ■■■■■■■■□□
    I am not a storage expert myself, but LUN 31 on Dells is indeed the management (access) LUN ...

    (just know that from vmware)

    http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004069
    My own knowledge base made public: http://open902.com :p
  • undomielundomiel Member Posts: 2,818
    I found that article as well, skrpune, thank you very much. It was probably the most helpful article I've found so far. As far as I can tell, not having lun 31 (the management lun) mapped for access from the host just prevents in-band management, and it doesn't seem like it would have an impact on the day-to-day operation of the device. But as I mentioned earlier, I am not clear on the overall role of the management lun and what impact removing the mapping would have. If anybody can clear up that part it would be greatly appreciated.
  • undomielundomiel Member Posts: 2,818
    Ok finally came across someone addressing a similar question: http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/fastt-lun-31-an-idle-logical-drive-647408?cv=expanded

    So it looks like lack of access to the management lun would only affect in-band management and would have no impact upon the daily use of the storage luns. Now to see if I can find some official documentation on that somewhere to convince people.
  • astorrsastorrs Member Posts: 3,139 ■■■■■■□□□□
    undomiel wrote:
    Ok finally came across someone addressing a similar question: http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/fastt-lun-31-an-idle-logical-drive-647408?cv=expanded

    So it looks like lack of access to the management lun would only affect in-band management and would have no impact upon the daily use of the storage luns. Now to see if I can find some official documentation on that somewhere to convince people.
    I sent an email to some guys I know at Dell to see if they had a pointer. Might be a few days with US Thanksgiving and all.
  • undomielundomiel Member Posts: 2,818
    Thanks for checking astorrs, I appreciate it!
  • astorrsastorrs Member Posts: 3,139 ■■■■■■□□□□
    Got a reply from Scott Hanson (a senior engineering consultant on the Dell Enterprise Technology Center team) at Dell.

    He says "LUN31 is for in-band management. When you use Modular Disk Storage Manager (MDSM) and search for in-band controllers that's what it's looking for. If you only use out-of-band management you can delete it."

    I can provide you with his contact info if that's not good enough to convince your peers at work.
  • undomielundomiel Member Posts: 2,818
    Thanks astorrs! I appreciate it. Though if they are not convinced by that then I'm just letting them drop the whole matter and sit back thinking they "fixed" the problem until it occurs again. Then I'll be laughing, and laughing hard.
  • KaminskyKaminsky Member Posts: 1,235
    I've read the thread a couple of times and I am still confused!

    Is this because someone deleted something they shouldn't have, or did a RAID member disk just die?

    I'm confused because I don't understand why, if a disk died (in an array), you don't just replace the disk with a new one and let the RAID 5 backplane rebuild it?

    No offence but I wouldn't believe a bloody word that comes out of Dell's Hardware support response... I work for a large corporate too and I know the resolve/productivity quotient game as well. Rule #2 ... come the weekend, if it's too hard to figure out ... blind the buggers with science and waffle... at bare minimum, it will get the SLA past the weekend whilst they think about the crap we just gave them...
    Kam.
  • astorrsastorrs Member Posts: 3,139 ■■■■■■□□□□
    Kaminsky wrote:
    I've read the thread a couple of times and I am still confused!

    Is this because someone deleted something they shouldn't have, or did a RAID member disk just die?

    I'm confused because I don't understand why, if a disk died (in an array), you don't just replace the disk with a new one and let the RAID 5 backplane rebuild it?

    No offence but I wouldn't believe a bloody word that comes out of Dell's Hardware support response... I work for a large corporate too and I know the resolve/productivity quotient game as well. Rule #2 ... come the weekend, if it's too hard to figure out ... blind the buggers with science and waffle... at bare minimum, it will get the SLA past the weekend whilst they think about the crap we just gave them...
    LUN31 (think of a LUN as a virtual disk - it just happens to span more than one physical one) was not present and the Dell support flake was blaming that as the cause of their corruption (on a different LUN).

    undomiel was trying to determine if LUN31 was actually required for normal operation of the array. Some of the people in his IT group were pressing that it was, since that's what the Dell Support guy said. But he wanted confirmation if it was and clarification on why it would be necessary.

    I was able to get a reply from a friend at Dell ETC (in between him taking bites of turkey) who confirmed that the Dell Support guy was off his rocker: LUN31 doesn't matter unless you are trying to manage the array over TCP/IP on the iSCSI network (instead of using a separate management network), and its being missing doesn't make a difference in this case. I trust his opinion over that of some lowly paid phone support tech. His team is the one that does performance testing of the hardware in combination with things like VMware/Hyper-V/SQL/Exchange/etc., and then writes best practices documents, whitepapers, etc. They know their stuff.

    Basically, what it boils down to is that undomiel is correct: in his company's environment, LUN31 is not required, and its absence had nothing to do with the problem they encountered.
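
    To spell that reasoning out, here is a toy sketch. It is not a Dell or MDSM API; the function name `affected_functions`, its parameters, and the constant are invented for illustration, and the only rule it encodes is the one quoted from Dell above: the access LUN matters for in-band management and nothing else.

```python
from typing import Iterable, List

# Default access (management) LUN number discussed in this thread.
ACCESS_LUN = 31


def affected_functions(mapped_luns: Iterable[int], uses_in_band: bool) -> List[str]:
    """Toy model: list what a host loses when the access LUN is not mapped."""
    affected: List[str] = []
    if ACCESS_LUN not in set(mapped_luns) and uses_in_band:
        # Only in-band management looks for the access LUN.
        affected.append("in-band management")
    # Data LUNs are addressed independently, so nothing is ever added for them.
    return affected
```

    With out-of-band management, as in undomiel's setup, `affected_functions([0, 1, 2], uses_in_band=False)` comes back empty: under this model a missing LUN 31 costs the host nothing.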