Hello,
I'm dealing with an ESX server whose VMs are performing slowly, specifically the SBS server VM. I know SBS is difficult on a good day, but very odd errors have been occurring lately, which prompted me to take a look at the MegaRAID / PERC log.
Back in late June, drives were replaced in the array. The log shows uncorrectable errors during the rebuild, along with several messages along the lines of "Puncturing bad block on PD..."
I am familiar with WHY a punctured array needs to be flushed and rebuilt from scratch, and am definitely ready to do so if this is the case. Performing this task, and the massive amount of time involved in migrating the VMs, rebuilding, then migrating back, is why I'm here asking for others' opinions. I'd like to be almost sure that this is the case before we proceed. The array did finish rebuilding and is in an Optimal state at the moment. Several recent Patrol Reads have passed fine.
Thank you.
Here are some of the messages from the log; I chopped out some of the binary / hex dumps for readability:
Event Sequence Number : 191944
Timestamp : 6/29/2016 ; 20:26:51
Event code : 113
Locale : PD event
Class : Information
Description of the event : Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 6d 00 00 13 00, Sense: 3/11/00
Argument Type Value: 1
Argument Type : MR_EVT_ARGS_CDB_SENSE
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Length of CDB : 10
Length of request sense data : 18
CDB :
28 00 02 CF FB 6D 00 00 13 00 00 00 00 00 00 00
Event Sequence Number : 191945
Timestamp : 6/29/2016 ; 20:26:51
Event code : 111
Locale : PD event
Class : Fatal
Description of the event : Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6d
Argument Type Value: 12
Argument Type : MR_EVT_ARGS_PD_LBA
LBA : 23039 MB
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Event Sequence Number : 191946
Timestamp : 6/29/2016 ; 20:26:51
Event code : 111
Locale : PD event
Class : Fatal
Description of the event : Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6d
Argument Type Value: 12
Argument Type : MR_EVT_ARGS_PD_LBA
LBA : 23039 MB
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Event Sequence Number : 191947
Timestamp : 6/29/2016 ; 20:26:51
Event code : 97
Locale : PD event
Class : Fatal
Description of the event : Puncturing bad block on PD 00(e0x20/s0) at 2cffb6d
Argument Type Value: 12
Argument Type : MR_EVT_ARGS_PD_LBA
LBA : 23039 MB
Physical Device ID : 0x0
Enclosure Index : 32
Slot Number : 0
Event Sequence Number : 191948
Timestamp : 6/29/2016 ; 20:26:51
Event code : 97
Locale : PD event
Class : Fatal
Description of the event : Puncturing bad block on PD 01(e0x20/s1) at 2cffb6d
Argument Type Value: 12
Argument Type : MR_EVT_ARGS_PD_LBA
LBA : 23039 MB
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Event Sequence Number : 191949
Timestamp : 6/29/2016 ; 20:26:53
Event code : 113
Locale : PD event
Class : Information
Description of the event : Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 d0 17 24 00 00 0c 00, Sense: 3/11/00
Argument Type Value: 1
Argument Type : MR_EVT_ARGS_CDB_SENSE
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Length of CDB : 10
Length of request sense data : 18
CDB :
28 00 02 D0 17 24 00 00 0C 00 00 00 00 00 00 00
Event Sequence Number : 191950
Timestamp : 6/29/2016 ; 20:26:53
Event code : 111
Locale : PD event
Class : Fatal
Description of the event : Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2d01725
Argument Type Value: 12
Argument Type : MR_EVT_ARGS_PD_LBA
LBA : 23042 MB
Physical Device ID : 0x1
Enclosure Index : 32
Slot Number : 1
Event Sequence Number : 192303
Timestamp : 6/29/2016 ; 21:7:58
Event code : 271
Locale : LD event
Class : Fatal
Description of the event : Uncorrectable medium error logged for VD 00/0 at 870b05c (on PD 00(e0x20/s0) at 2d03adc)
Argument Type Value: 6
Argument Type : MR_EVT_ARGS_LD_LBA_PD_LBA
LD LBA : 69142 MB
PD LBA : 23047 MB
Logical Drive Target ID : 0
Firmware LD Number : 0
Physical Device ID : 0x0
Enclosure Index : 32
Slot Number : 0
Event Sequence Number : 192400
Timestamp : 6/30/2016 ; 5:23:48
Event code : 99
Locale : PD event
Class : Information
Description of the event : Rebuild complete on VD 00/0
Argument Type Value: 2
Argument Type : MR_EVT_ARGS_LD
Logical Drive Target ID : 0
Firmware LD Number : 0
Event Sequence Number : 192401
Timestamp : 6/30/2016 ; 5:23:48
Event code : 100
Locale : PD event
Class : Information
Description of the event : Rebuild complete on PD 00(e0x20/s0)
Argument Type Value: 10
Argument Type : MR_EVT_ARGS_PD
Physical Device ID : 0x0
Enclosure DeviceID : 32
Slot Number : 0
Event Sequence Number : 192402
Timestamp : 6/30/2016 ; 5:23:51
Event code : 114
Locale : PD event
Class : Information
Description of the event : State change on PD 00(e0x20/s0) from REBUILD(14) to ONLINE(18)
Argument Type Value: 15
Argument Type : MR_EVT_ARGS_PD_STATE
Physical Device ID : 0x0
Enclosure Index : 32
Slot Number : 0
Previous State : 20
New State : 24
Event Sequence Number : 192403
Timestamp : 6/30/2016 ; 5:23:52
Event code : 81
Locale : LD event
Class : Information
Description of the event : State change on VD 00/0 from DEGRADED(2) to OPTIMAL(3)
Argument Type Value: 8
Argument Type : MR_EVT_ARGS_LD_STATE
Logical Drive Target ID : 0
Firmware LD Number : 0
Previous State : 2
New State : 3
06/29/16 19:02:17: ErrLBAOffset (31) LBA(2cffe98) BadLba=2cffec9
06/29/16 19:02:17: EVT#191424-06/29/16 19:02:17: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 2cffec9
06/29/16 19:02:17: BBMMarkBadBlock: pd=02, pdLBA=2cffec9
06/29/16 19:02:17: EVT#191425-06/29/16 19:02:17: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffec9
06/29/16 19:02:18: EVT#191426-06/29/16 19:02:18: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 40 00 00 40 00, Sense: 3/11/00
06/29/16 19:02:18: Raw Sense for PD 1: f0 00 03 02 cf fb 6c 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:02:18: DEV_REC:Medium Error DevId[1] devHandle c RDM=807a0800 retires=0
06/29/16 19:02:18: DM_PerformSenseDataRecovery: MedErr is for: cmdId=246e, ld=0, src=4, cmd=1, lba=870a9c0, cnt=8, rmwOp=0
06/29/16 19:02:18: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:02:18: -> RecParent: cmdId=2482, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/29/16 19:02:18: ErrLBAOffset (2c) LBA(2cffb40) BadLba=2cffb6c
06/29/16 19:02:18: EVT#191427-06/29/16 19:02:18: 111=Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6c
06/29/16 19:02:18: BBMMarkBadBlock: pd=01, pdLBA=2cffb6c
06/29/16 19:02:18: EVT#191428-06/29/16 19:02:18: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6c
06/29/16 19:02:19: Invalidating Line due to failed XOR!!! cmd=843f8560
06/29/16 19:02:19: EVT#191429-06/29/16 19:02:19: 113=Unexpected sense: PD 02(e0x20/s2) Path 5000c50053c1e929, CDB: 28 00 02 cf fe ca 00 00 36 00, Sense: 3/11/00
06/29/16 19:02:19: Raw Sense for PD 2: f0 00 03 02 cf fe ca 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:02:19: DEV_REC:Medium Error DevId[2] devHandle b RDM=80785600 retires=0
06/29/16 19:02:19: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2441, ld=0, src=4, cmd=1, lba=8700dc8, cnt=8, rmwOp=0
06/29/16 19:02:19: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=7 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:02:19: -> RecParent: cmdId=241e, src=7, cmd=2, lba=86ffb98, cnt=168, rmwOp=0, refs=18/7f
06/29/16 19:02:19: ErrLBAOffset (0) LBA(2cffeca) BadLba=2cffeca
06/29/16 19:02:19: EVT#191430-06/29/16 19:02:19: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 2cffeca
06/29/16 19:02:19: BBMMarkBadBlock: pd=02, pdLBA=2cffeca
06/29/16 19:02:19: EVT#191431-06/29/16 19:02:19: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffeca
06/29/16 19:02:19: Invalidating Line due to failed XOR!!! cmd=843f77e0
06/29/16 19:02:20: EVT#191432-06/29/16 19:02:20: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 6d 00 00 13 00, Sense: 3/11/00
06/29/16 19:02:20: Raw Sense for PD 1: f0 00 03 02 cf fb 6d 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:02:20: DEV_REC:Medium Error DevId[1] devHandle c RDM=807a0800 retires=0
06/29/16 19:02:20: DM_PerformSenseDataRecovery: MedErr is for: cmdId=246e, ld=0, src=4, cmd=1, lba=870a9c0, cnt=8, rmwOp=0
06/29/16 19:02:20: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:02:20: -> RecParent: cmdId=2482, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/29/16 19:02:20: ErrLBAOffset (0) LBA(2cffb6d) BadLba=2cffb6d
06/29/16 19:02:20: EVT#191433-06/29/16 19:02:20: 111=Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6d
06/29/16 19:02:20: BBMMarkBadBlock: pd=01, pdLBA=2cffb6d
06/29/16 19:02:20: EVT#191434-06/29/16 19:02:20: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6d
06/29/16 19:02:22: EVT#191435-06/29/16 19:02:22: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 6e 00 00 12 00, Sense: 3/11/00
06/29/16 19:02:22: Raw Sense for PD 1: f0 00 03 02 cf fb 6e 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:02:22: DEV_REC:Medium Error DevId[1] devHandle c RDM=807a0800 retires=0
06/29/16 19:02:22: DM_PerformSenseDataRecovery: MedErr is for: cmdId=246e, ld=0, src=4, cmd=1, lba=870a9c0, cnt=8, rmwOp=0
06/29/16 19:02:22: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:02:22: -> RecParent: cmdId=2482, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/29/16 19:02:22: ErrLBAOffset (0) LBA(2cffb6e) BadLba=2cffb6e
06/29/16 19:02:22: EVT#191436-06/29/16 19:02:22: 111=Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6e
06/29/16 19:02:22: BBMMarkBadBlock: pd=01, pdLBA=2cffb6e
06/29/16 19:02:22: EVT#191437-06/29/16 19:02:22: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6e
06/29/16 19:02:24: EVT#191438-06/29/16 19:02:24: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 6f 00 00 11 00, Sense: 3/11/00
06/29/16 19:02:24: Raw Sense for PD 1: f0 00 03 02 cf fb 6f 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:02:24: DEV_REC:Medium Error DevId[1] devHandle c RDM=807a0800 retires=0
06/29/16 19:02:24: DM_PerformSenseDataRecovery: MedErr is for: cmdId=246e, ld=0, src=4, cmd=1, lba=870a9c0, cnt=8, rmwOp=0
06/29/16 19:02:24: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:02:24: -> RecParent: cmdId=2482, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/29/16 19:02:24: ErrLBAOffset (0) LBA(2cffb6f) BadLba=2cffb6f
06/29/16 19:02:24: EVT#191439-06/29/16 19:02:24: 111=Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6f
06/29/16 19:02:24: BBMMarkBadBlock: pd=01, pdLBA=2cffb6f
06/29/16 19:02:24: EVT#191440-06/29/16 19:02:24: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6f
06/29/16 19:02:24: Invalidating Line due to failed XOR!!! cmd=843fdd20
06/29/16 19:09:30: EVT#191441-06/29/16 19:09:30: 113=Unexpected sense: PD 02(e0x20/s2) Path 5000c50053c1e929, CDB: 28 00 02 cf fe 98 00 00 68 00, Sense: 3/11/00
06/29/16 19:09:30: Raw Sense for PD 2: f0 00 03 02 cf fe c9 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:09:30: DEV_REC:Medium Error DevId[2] devHandle b RDM=806d6e00 retires=0
06/29/16 19:09:30: DM_PerformSenseDataRecovery: MedErr is for: cmdId=249f, ld=0, src=4, cmd=1, lba=6900c00, cnt=180, rmwOp=0
06/29/16 19:09:30: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=7 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:09:30: -> RecParent: cmdId=24ac, src=7, cmd=2, lba=86ffb98, cnt=168, rmwOp=0, refs=18/7f
06/29/16 19:09:30: ErrLBAOffset (31) LBA(2cffe98) BadLba=2cffec9
06/29/16 19:09:30: EVT#191442-06/29/16 19:09:30: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 2cffec9
06/29/16 19:09:30: BBMMarkBadBlock: pd=02, pdLBA=2cffec9
06/29/16 19:09:30: EVT#191443-06/29/16 19:09:30: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffec9
06/29/16 19:09:32: EVT#191444-06/29/16 19:09:32: 113=Unexpected sense: PD 02(e0x20/s2) Path 5000c50053c1e929, CDB: 28 00 02 cf fe ca 00 00 36 00, Sense: 3/11/00
06/29/16 19:09:32: Raw Sense for PD 2: f0 00 03 02 cf fe ca 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:09:32: DEV_REC:Medium Error DevId[2] devHandle b RDM=806d6e00 retires=0
06/29/16 19:09:32: DM_PerformSenseDataRecovery: MedErr is for: cmdId=249f, ld=0, src=4, cmd=1, lba=6900c00, cnt=180, rmwOp=0
06/29/16 19:09:32: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=7 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:09:32: -> RecParent: cmdId=24ac, src=7, cmd=2, lba=86ffb98, cnt=168, rmwOp=0, refs=18/7f
06/29/16 19:09:32: ErrLBAOffset (0) LBA(2cffeca) BadLba=2cffeca
06/29/16 19:09:32: EVT#191445-06/29/16 19:09:32: 111=Unrecoverable medium error during recovery on PD 02(e0x20/s2) at 2cffeca
06/29/16 19:09:32: BBMMarkBadBlock: pd=02, pdLBA=2cffeca
06/29/16 19:09:32: EVT#191446-06/29/16 19:09:32: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffeca
06/29/16 19:09:32: EVT#191447-06/29/16 19:09:32: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 40 00 00 40 00, Sense: 3/11/00
06/29/16 19:09:32: Raw Sense for PD 1: f0 00 03 02 cf fb 6c 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:09:32: DEV_REC:Medium Error DevId[1] devHandle c RDM=80712a00 retires=0
06/29/16 19:09:32: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2449, ld=0, src=4, cmd=1, lba=86f9f10, cnt=c8, rmwOp=0
06/29/16 19:09:32: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:09:32: -> RecParent: cmdId=2493, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/29/16 19:09:32: ErrLBAOffset (2c) LBA(2cffb40) BadLba=2cffb6c
06/29/16 19:09:33: EVT#191448-06/29/16 19:09:33: 111=Unrecoverable medium error during recovery on PD 01(e0x20/s1) at 2cffb6c
06/29/16 19:09:33: BBMMarkBadBlock: pd=01, pdLBA=2cffb6c
06/29/16 19:09:33: EVT#191449-06/29/16 19:09:33: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6c
06/29/16 19:09:33: Invalidating Line due to failed XOR!!! cmd=84404b60
06/29/16 19:09:34: EVT#191450-06/29/16 19:09:34: 113=Unexpected sense: PD 01(e0x20/s1) Path 5000c5007f5273ad, CDB: 28 00 02 cf fb 6d 00 00 13 00, Sense: 3/11/00
06/29/16 19:09:34: Raw Sense for PD 1: f0 00 03 02 cf fb 6d 0a 00 00 00 00 11 00 81 80 00 97
06/29/16 19:09:34: DEV_REC:Medium Error DevId[1] devHandle c RDM=80712a00 retires=0
06/29/16 19:09:34: DM_PerformSenseDataRecovery: MedErr is for: cmdId=2449, ld=0, src=4, cmd=1, lba=86f9f10, cnt=c8, rmwOp=0
06/29/16 19:09:34: -> recoveryChild: ld=0 orgLi=0 recPhysArm=0 badPhysArm=ff doneFun=5 sRef=ff00 eRef=0 recFlags=0
06/29/16 19:09:34: -> RecParent: cmdId=2493, src=7, cmd=2, lba=86ff140, cnt=140, rmwOp=0, refs=40/7f
06/30/16 5:21:36: GroupCmds: Fail cmd c=c0f48380 ld=0 start_block=86ff268 num_blocks=28 cmd=1
06/30/16 5:21:36: GroupCmds: Fail cmd c=c0f48380 ld=0 start_block=86ff268 num_blocks=28 cmd=1
06/30/16 5:21:36: GroupCmds: Fail cmd c=c0f48380 ld=0 start_block=86ff268 num_blocks=28 cmd=1
06/30/16 5:21:53: EVT#192397-06/30/16 5:21:53: 103=Rebuild progress on PD 00(e0x20/s0) is 97.94%(32509s)
06/30/16 5:22:47: EVT#192398-06/30/16 5:22:47: 103=Rebuild progress on PD 00(e0x20/s0) is 98.94%(32563s)
06/30/16 5:23:45: EVT#192399-06/30/16 5:23:45: 103=Rebuild progress on PD 00(e0x20/s0) is 99.94%(32621s)
06/30/16 5:23:48: EVT#192400-06/30/16 5:23:48: 99=Rebuild complete on VD 00/0
06/30/16 5:23:48: EVT#192401-06/30/16 5:23:48: 100=Rebuild complete on PD 00(e0x20/s0)
06/30/16 5:23:52: EVT#192402-06/30/16 5:23:51: 114=State change on PD 00(e0x20/s0) from REBUILD(14) to ONLINE(18)
06/30/16 5:23:52: EVT#192403-06/30/16 5:23:52: 81=State change on VD 00/0 from DEGRADED(2) to OPTIMAL(3)
06/30/16 5:23:52: EVT#192404-06/30/16 5:23:52: 249=VD 00/0 is now OPTIMAL
06/30/16 5:23:52: ld sync: 01 unsync'd lds remaining
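For anyone who would rather tally the punctures than eyeball them, here is a rough Python sketch that pulls the event 97 lines out of a firmware log in the format shown above and groups the punctured LBAs by physical disk. The function name `punctured_lbas` and the regular expression are my own assumptions based only on the lines pasted here, not anything from the MegaRAID tooling, so other firmware versions may need a different pattern.

```python
import re
from collections import defaultdict

# Matches firmware-log lines such as:
#   "... EVT#191425-06/29/16 19:02:17: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffec9"
# The pattern is an assumption derived from the log excerpt in this post.
PUNCTURE_RE = re.compile(
    r"EVT#(\d+)-.*?97=Puncturing bad block on PD (\w+)\(e0x[0-9a-f]+/s\d+\) at ([0-9a-f]+)"
)

def punctured_lbas(lines):
    """Return {pd_id: sorted list of unique punctured PD LBAs as ints}."""
    found = defaultdict(set)
    for line in lines:
        m = PUNCTURE_RE.search(line)
        if m:
            _seq, pd, lba_hex = m.groups()
            found[pd].add(int(lba_hex, 16))  # LBAs are logged in hex
    return {pd: sorted(lbas) for pd, lbas in found.items()}

# A few lines copied from the log above.
sample = [
    "06/29/16 19:02:17: EVT#191425-06/29/16 19:02:17: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffec9",
    "06/29/16 19:02:18: EVT#191428-06/29/16 19:02:18: 97=Puncturing bad block on PD 01(e0x20/s1) at 2cffb6c",
    "06/29/16 19:02:19: EVT#191431-06/29/16 19:02:19: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffeca",
    "06/29/16 19:09:30: EVT#191443-06/29/16 19:09:30: 97=Puncturing bad block on PD 02(e0x20/s2) at 2cffec9",
]

for pd, lbas in punctured_lbas(sample).items():
    print("PD", pd, [hex(x) for x in lbas])
```

Repeated punctures of the same LBA (for example 2cffec9 at 19:02:17 and again at 19:09:30) collapse into a single entry, so the result reflects distinct bad blocks rather than the number of log lines.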
Event Sequence Number : 192422
Timestamp : 7/23/2016 ; 3:0:0
Event code : 39
Locale : Boot/Shutdown event
Class : Information
Description of the event : Patrol Read started
Argument Type Value: 0
Event Sequence Number : 192423
Timestamp : 7/23/2016 ; 8:34:24
Event code : 35
Locale : Boot/Shutdown event
Class : Information
Description of the event : Patrol Read complete
Argument Type Value: 0
Event Sequence Number : 192424
Timestamp : 7/30/2016 ; 3:0:0
Event code : 39
Locale : Boot/Shutdown event
Class : Information
Description of the event : Patrol Read started
Argument Type Value: 0
Event Sequence Number : 192425
Timestamp : 7/30/2016 ; 17:29:56
Event code : 35
Locale : Boot/Shutdown event
Class : Information
Description of the event : Patrol Read complete
Argument Type Value: 0
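As a sanity check on how I'm reading the event 113 entries: the CDBs start with opcode 28, which is a SCSI READ(10), and sense 3/11/00 is a Medium Error with "unrecovered read error". The small Python sketch below decodes one of those CDBs and converts the LBA to the "MB" figure the event log prints. The helper names `decode_read10` and `lba_to_mib` are hypothetical, and the 512-byte sector size is an assumption on my part, although it does reproduce the 23039 MB value shown in the log.

```python
def decode_read10(cdb_hex):
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    b = bytes.fromhex(cdb_hex.replace(" ", ""))[:10]
    if b[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(b[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, length

def lba_to_mib(lba, sector_bytes=512):
    """Convert an LBA to whole MiB, assuming sector_bytes per block."""
    return lba * sector_bytes // (1024 * 1024)

# CDB taken from event 191944 above.
lba, length = decode_read10("28 00 02 cf fb 6d 00 00 13 00")
print(hex(lba), length, lba_to_mib(lba))
```

For that CDB the read starts at LBA 2cffb6d, the same address the following event 111 and event 97 entries report for PD 01.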