HP VA7400 Storage Fault Diagnosis: Data Recovery May Be Possible

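Environment: VA7400
           Two disk enclosures with 14 disks each, 28 disks in total, configured as two RAID groups. Each RAID group uses AutoRAID (RAID 0+1).
           Data on one VG cannot be read. Whenever certain specific files in an LV of this VG are read, the host hangs and the array scans its disks continuously; hardware diagnostics have already found more than one disk with bad sectors. The VG is made up of two LUNs, one on each of the array's two RAID groups. We ran a dd test: reading the LUN on one RAID group with dd completes normally, but reading the LUN on the other RAID group hangs the host while the array scans its disks continuously (the same symptom as reading the data in that VG).
          We can therefore be sure that one of the two RAID groups is completely healthy and the other is faulty, and that one of the VG's two LUNs sits on the faulty RAID group.
          In addition, the faulty RAID group has had two disks fail at the same time (as reported by the controller).
          The data we need is on that VG.
          The hardware logs and the LUN layout information are attached for your reference.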

{This article was written by 覃廷良, chief engineer at 达思. Please credit the source when reposting (http://www.bnuol.com 达思 Data Recovery Technology Blog)}

Excerpts from the log follow.

SUB-SYSTEM SETTINGS

  RAID Level:___________________________HPAutoRAID
  Auto Format Drive:____________________On
  Hang Detection:_______________________On
  Capacity Depletion Threshold:_________100%
  Queue Full Threshold Maximum:_________4096
  Enable Optimize Policy:_______________True
  Enable Manual Override:_______________False
  Manual Override Destination:__________False
  Read Cache Disable:___________________False
  Rebuild Priority:_____________________Low
  Security Enabled:_____________________False
  Shutdown Completion:__________________0
  Subsystem Type ID:____________________1
  Unit Attention:_______________________True
  Volume Set Partition (VSpart):________False
  Write Cache Enable:___________________True
  Write Working Set Interval:___________8640
  Enable Prefetch:______________________False
  Disable Secondary Path Presentation:__False


  Enclosure at M
  Enclosure ID__________________________0
  Enclosure Status______________________Failed
  Enclosure Type________________________HP StorageWorks Virtual Array 7400
  Node WWN______________________________50060b000014e7d6

  FRU       HW COMPONENT   IDENTIFICATION                   ID STATUS
  ===========================================================================
  M         Enclosure      00SG223J0074                        Failed
  M/P1      Power Supply   94020HE00808                        Good
  M/P2      Power Supply   94020HE00717                        Good
  M/MP1     MidPlane       000601310041                        Good
  M/C2      Controller     00PR05B50445                        Good
  M/C2.H1   Host Port      <none>                              Good
  M/C2.J1   BackEnd Port   <none>                              Good
  M/C2.B1   Battery        40133:MOLTECHPS:NI2040:2002/7/19    Good
  M/C2.PM1  Processor      HP:A6189A:HP19                      Good
  M/C2.M1   DIMM           512                                 Good
  M/C1      Controller                                         Failed
  M/D1      Disk           3EK1NM33                            Good
  M/D2      Disk           3EK0MF81                            Good
  M/D3      Disk           3EK1NXQ6                            Good
  M/D4      Disk           3HZ0G1QD                            Good
  M/D5      Disk           3EK1NQEM                            Good
  M/D6      Disk           3EK1NX69                            Good
  M/D7      Disk           3EK1NMZT                            Good
  M/D8      Disk           3EK10AZS                            Good
  M/D9      Disk           3KP17QL80000                        Good
  M/D10     Disk           3HZ92CQ9                            Good
  M/D11     Disk           3EK1KDSJ                            Good
  M/D12     Disk           3HZ0MVX7                            Good
  M/D13     Disk           3EK24C4H                            Good
  M/D14     Disk           3EK1NHSA                            Good

Enclosure at JA0
  Enclosure ID__________________________0
  Enclosure Status______________________Good
  Enclosure Type________________________HP StorageWorks Disk System DS2405
  Node WWN______________________________50060b0000195066

  FRU       HW COMPONENT   IDENTIFICATION                   ID STATUS
  ===========================================================================
  JA0       Enclosure      SG22200001                          Good
  JA0/MP1   MidPlane       SG22200001                          Good
  JA0/P1    Power Supply   62020FD01285                        Good
  JA0/P2    Power Supply   62020FD01267                        Good
  JA0/C2    LCC            R25DK1444151                        Good
  JA0/C2.H1 Front Port     <none>                              Good
  JA0/D1    Disk           3EK1MCCP                            Good
  JA0/D2    Disk           3EK01ZQN                            Good
  JA0/D3    Disk           3EK1NJNS                            Good
  JA0/D4    Disk           3EK1NL2T                            Good
  JA0/D5    Disk           3EK1NFRN                            Good
  JA0/D6    Disk           3EK1N23S                            Good
  JA0/D7    Disk           3EK1NLZL                            Good
  JA0/D8    Disk           3EK1NFJM                            Good
  JA0/D9    Disk           3EK1SBD8                            Good
  JA0/D10   Disk           3HZY5F6L                            Good
  JA0/D11   Disk           3EK1NVJZ                            Good
  JA0/D12   Disk           3EK1NQ2J                            Good
  JA0/D13   Disk           3EK1NLX5                            Good
  JA0/D14   Disk           3EK16N2S                            Good

 

Disk at JA0/D9:
  Status:_______________________________Good
  Disk State:___________________________Included
  Vendor ID:____________________________HP 73.4G
  Product ID:___________________________ST373405FC
  Product Revision:_____________________HP09
  Data Capacity:________________________66.757 GB (140000000 blocks)
  Block Length:_________________________520 bytes
  Address:______________________________8
  Node WWN:_____________________________20000004cfa1a362
  Initialize State:_____________________Ready
  Redundancy Group:_____________________1
  Volume Set Serial Number:_____________000027C200000003
  Serial Number:________________________3EK1SBD8
  Firmware Revision:____________________HP09
  Recovery Maps are on this disk.

  Disk at JA0/D13:
  Status:_______________________________Good
  Disk State:___________________________Included
  Vendor ID:____________________________HP 73.4G
  Product ID:___________________________ST373405FC
  Product Revision:_____________________HP09
  Data Capacity:________________________66.757 GB (140000000 blocks)
  Block Length:_________________________520 bytes
  Address:______________________________12
  Node WWN:_____________________________20000004cf98f82c
  Initialize State:_____________________Ready
  Redundancy Group:_____________________1
  Volume Set Serial Number:_____________000027C200000003
  Serial Number:________________________3EK1NLX5
  Firmware Revision:____________________HP09
  Recovery Maps are on this disk.

 

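     A first look at the log shows that the disks in the HP VA7400 are formatted with 520-byte blocks
(Block Length:_________________________520 bytes). To recover the data, every disk must first be imaged, and the RAID must then be reassembled from the images.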
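As a rough illustration of what the 520-byte format means for imaging, the sketch below copies a raw image while keeping only the first 512 bytes of each 520-byte block. The function name `strip_sectors` and the assumption that the 8 extra bytes sit at the end of each block are illustrative only; the log does not say where the VA7400 places them. The capacity figures in the log are at least consistent with 512 data bytes per block: 140000000 blocks of 512 bytes is 66.757 GiB, whereas 520 bytes per block would give about 67.8 GiB.

```python
import io

RAW_SECTOR = 520        # block length reported in the log
DATA_SECTOR = 512       # assumed payload per block (not confirmed by the log)
BLOCKS = 140_000_000    # block count reported for each disk


def strip_sectors(src, dst, raw=RAW_SECTOR, data=DATA_SECTOR, batch=4096):
    """Copy src to dst, keeping the first `data` bytes of every `raw`-byte block.

    Assumes the extra bytes trail the payload. Returns the number of complete
    blocks written; a trailing partial block is ignored.
    """
    written = 0
    while True:
        chunk = src.read(raw * batch)
        if not chunk:
            break
        full = len(chunk) // raw
        for i in range(full):
            dst.write(chunk[i * raw:i * raw + data])
        written += full
        if len(chunk) % raw:
            break  # incomplete final block, nothing more to read
    return written


def capacity_gib(blocks, bytes_per_block):
    """Capacity in GiB (2**30 bytes) for a given block count and block size."""
    return blocks * bytes_per_block / 2**30


# Small in-memory demonstration: three blocks, each with 8 trailing extra bytes.
demo_src = io.BytesIO(b"".join(bytes([i]) * 512 + b"\xEE" * 8 for i in range(3)))
demo_dst = io.BytesIO()
print(strip_sectors(demo_src, demo_dst), "blocks copied")
print(round(capacity_gib(BLOCKS, DATA_SECTOR), 3), "GiB at 512 bytes per block")
```

On real images the same function can be given two files opened in binary mode. Keeping the untouched 520-byte image alongside the stripped copy is the safer choice, since the extra bytes may matter for later analysis.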

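The HP VA7400 uses AutoRAID and carves LUNs out of it. LUN space is not allocated as a simple linear, contiguous range; the allocation addresses are recorded in a Block Map. Even if the RAID is reassembled exactly as it was, the LUN layout is therefore still not fully known. To work it out, the disks holding the MetaData have to be examined and analyzed. Usually two disks store the MetaData (they are flagged with "Recovery Maps are on this disk."). Only the engineers who designed the HP VA series know how this MetaData is stored; anyone else cannot obtain accurate information without a test environment to study it in.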
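     Judging from the fault information, it is very likely that a MetaData disk has developed a problem, leaving the information held by the controller inconsistent with the information on disk. When a LUN is read, the map information is then inaccurate or an address goes out of range, and a hang or an automatic restart is the expected result.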
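The idea of a map-based LUN layout, and the way a bad map entry can send a read out of range, can be sketched as follows. Everything here is hypothetical: the real VA7400 MetaData format is not public, so the extent size `EXTENT_BLOCKS`, the `Extent` record, and the `resolve` function are invented purely to illustrate the concept.

```python
from dataclasses import dataclass

EXTENT_BLOCKS = 512          # hypothetical extent size, in blocks
DISK_BLOCKS = 140_000_000    # block count per disk, taken from the log


@dataclass(frozen=True)
class Extent:
    disk: str    # FRU location of the disk, e.g. "JA0/D9"
    start: int   # first physical block of the extent on that disk


class MapError(Exception):
    """Raised when a map entry is missing or points outside the disk."""


def resolve(block_map, lun, lba, disk_blocks=DISK_BLOCKS):
    """Translate a LUN block address to (disk, physical block) via the map."""
    index, offset = divmod(lba, EXTENT_BLOCKS)
    try:
        extent = block_map[(lun, index)]
    except KeyError:
        raise MapError(f"LUN {lun}: extent {index} is not mapped") from None
    # A stale or damaged entry can point past the end of the disk. Checking
    # here turns that into an explicit error instead of an endless retry.
    if extent.start < 0 or extent.start + EXTENT_BLOCKS > disk_blocks:
        raise MapError(f"LUN {lun}: extent {index} is out of range on {extent.disk}")
    return extent.disk, extent.start + offset


# Consecutive LUN extents need not be adjacent, or even on the same disk.
example_map = {
    (0, 0): Extent("JA0/D9", 1000),
    (0, 1): Extent("JA0/D13", 0),
    (0, 2): Extent("JA0/D9", 139_999_900),   # deliberately damaged entry
}
print(resolve(example_map, 0, 5))
print(resolve(example_map, 0, 513))
```

In this toy model a read of LUN block 1024 raises `MapError`, because the third entry runs past the end of the disk. A controller that follows such an entry without a check has no valid block to return.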
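     With a probable cause identified, the next step is to verify whether these two MetaData disks are actually healthy, and to plan the data recovery work from there.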
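One crude way to check an imaged disk is to look for blocks that the imaging tool could not read. The sketch below assumes the tool was configured to fill every unreadable 520-byte block with a single known byte value; that fill value, and the function name `filled_ranges`, are assumptions for illustration. A block of real data could happen to match the fill value, so any hits are only candidates for closer inspection.

```python
import io


def filled_ranges(stream, fill_byte, sector_size=520):
    """Return (first, last) block ranges that consist entirely of fill_byte."""
    marker = bytes([fill_byte]) * sector_size
    ranges = []
    sector = 0
    while True:
        buf = stream.read(sector_size)
        if len(buf) < sector_size:
            break  # end of image (a trailing partial block is ignored)
        if buf == marker:
            if ranges and ranges[-1][1] == sector - 1:
                # Extend the current run of suspect blocks.
                ranges[-1] = (ranges[-1][0], sector)
            else:
                ranges.append((sector, sector))
        sector += 1
    return ranges


# In-memory demonstration with 0xFF as the assumed fill value.
good = b"\x01" * 520
bad = b"\xFF" * 520
image = io.BytesIO(good + bad + bad + good + bad)
print(filled_ranges(image, 0xFF))
```

Running this over the images of JA0/D9 and JA0/D13, the two disks flagged as holding the Recovery Maps, would show whether the suspect ranges fall on them.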