Disk failures are very common in storage environments, and as storage administrators we run into them often. How often depends on how many disks your storage systems have: the more disks you manage, the more often you will face this situation.
Getting a replacement disk relies on a handy NetApp feature, AutoSupport: an IT engineer can use AutoSupport to claim a new drive, NetApp opens a ticket, and a local representative ships the new drive, usually within business hours.
Now let us see how to find out whether a drive has failed. For this blog post I used a test simulator and simulated a failed drive, using a spare drive rather than one that holds data or parity. On a physical filer, a failed drive is easy to spot: its LED lights up amber (orange).
First, run the disk show command and look at its output.
We can also check the volume status with the vol status -f command, which lists broken (failed) disks.
The drive's status is "admin removed" because that is how I simulated the failure.
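If you capture the command output as text (for example, from an SSH session), a short script can pick out the failed disk names. The sketch below is a minimal Python example under stated assumptions: it assumes each data row of the vol status -f output names the device as its own token shaped like adapter.id (such as v4.16), the sample text is illustrative rather than copied from a real filer, and the function name failed_disks is hypothetical.

```python
import re

# Assumption: the device name appears as its own whitespace-separated token
# shaped like "<adapter>.<number>" or "<adapter>.<number>.<number>" (e.g. "v4.16").
DISK_NAME = re.compile(r"(?<!\S)([0-9A-Za-z]+\.\d+(?:\.\d+)?)(?!\S)")


def failed_disks(vol_status_f_output: str) -> list[str]:
    """Return the disk names found in captured 'vol status -f' text, without duplicates."""
    names: list[str] = []
    for line in vol_status_f_output.splitlines():
        for match in DISK_NAME.finditer(line):
            name = match.group(1)
            if name not in names:
                names.append(name)
    return names


# Illustrative sample only, loosely modelled on the simulated failure in this post.
SAMPLE = """\
Broken disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM
---------       ------          ------------- ---- ---- ---- -----
admin removed   v4.16           v4    1   0   FC:B   -  FCAL  N/A
"""

print(failed_disks(SAMPLE))  # ['v4.16']
```

Matching on the token shape rather than on column positions keeps the sketch tolerant of spacing differences, at the cost of trusting that no other column contains a dotted token.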
The drive that we are interested in is named v4.16. If its LED is not lit, you can switch to advanced privilege mode and use the following commands to turn the LED on (to locate the drive) and off again:
priv set advanced
led_on <disk id identified above>
led_off <disk id identified above>
priv set admin
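If you drive the console from a script, a tiny helper can keep these commands in the right order. This is a minimal Python sketch: the function name led_commands is hypothetical, it only builds the command strings (sending them to the filer is left out), and it ends with priv set admin to drop back to the normal privilege level.

```python
def led_commands(disk_id: str) -> list[str]:
    """Build, in order, the console commands that blink and then clear a disk LED."""
    if not disk_id or any(ch.isspace() for ch in disk_id):
        raise ValueError(f"invalid disk id: {disk_id!r}")
    return [
        "priv set advanced",    # led_on / led_off are run in advanced mode
        f"led_on {disk_id}",    # light the LED so the drive can be found in the shelf
        f"led_off {disk_id}",   # turn it off again once the drive is located
        "priv set admin",       # return to the normal admin privilege level
    ]


for command in led_commands("v4.16"):
    print(command)
```

In practice you would pause between led_on and led_off while someone locates the drive in the shelf.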
Now remove the failed drive, wait about two minutes for the system to synchronize, and then insert the new drive. Once the new drive is in place, run the following command to list unowned disks and check whether the disk you have just fitted is among them:
disk show -n
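Rather than re-running the command by hand, you could poll it. The Python sketch below assumes a hypothetical callable, run_disk_show_n, that executes disk show -n on the filer (for example over SSH) and returns its text output; the sketch only checks whether the new disk's name appears as a token in that output. The timeout and polling interval defaults are arbitrary choices, not values from NetApp.

```python
import time
from typing import Callable


def wait_for_unowned_disk(
    disk_id: str,
    run_disk_show_n: Callable[[], str],
    timeout_s: float = 180.0,
    interval_s: float = 15.0,
) -> bool:
    """Poll 'disk show -n' output until disk_id is listed or timeout_s elapses.

    run_disk_show_n is a hypothetical callable supplied by you: it should run
    'disk show -n' on the filer and return the command's text output.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if disk_id in run_disk_show_n().split():
            return True   # the new disk is visible and not yet owned
        if time.monotonic() >= deadline:
            return False  # not listed as unowned: already assigned, or not detected yet
        time.sleep(interval_s)


# Usage with a stand-in for the filer (illustrative output, not from a real system):
fake_output = "v4.16        Not Owned   NONE"
print(wait_for_unowned_disk("v4.16", lambda: fake_output, timeout_s=0, interval_s=0))  # True
```

A False result is not necessarily an error: with auto-assignment enabled the disk may already be owned and therefore absent from the unowned list.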
If disk auto-assignment is enabled, the new disk will be assigned to the filer head (controller) that had the failed disk; if not, you will have to assign it manually:
disk assign <disk id>
In our case, the disk ID is v4.16.
If the command is rejected, the disk may have been auto-assigned to the wrong controller. In that case, clear its ownership with the following command and then run disk assign again:
disk assign <disk id> -s unowned -f
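The ownership steps above can be summarised as a small decision function. This Python sketch is illustrative only: assign_commands is a hypothetical name, the caller supplies the disk's current owner (None when it shows up in disk show -n) and the name of the head that should own it, and it assumes the commands are run on that head, so plain disk assign gives the disk to it.

```python
from typing import Optional


def assign_commands(disk_id: str, current_owner: Optional[str], target_head: str) -> list[str]:
    """Return the commands that leave disk_id owned by target_head.

    current_owner is None when the disk is unowned, otherwise the name of the
    head that currently owns it. Assumption: the commands are run on target_head.
    """
    if current_owner == target_head:
        return []  # auto-assign already gave the disk to the right head
    commands = []
    if current_owner is not None:
        # Owned by the wrong controller: clear the ownership first.
        commands.append(f"disk assign {disk_id} -s unowned -f")
    commands.append(f"disk assign {disk_id}")
    return commands


print(assign_commands("v4.16", None, "filer1"))      # ['disk assign v4.16']
print(assign_commands("v4.16", "filer2", "filer1"))  # ['disk assign v4.16 -s unowned -f', 'disk assign v4.16']
```

The head names filer1 and filer2 are placeholders.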
The new disk will now be assigned as a spare, replacing the spare that was used when the original disk failed. You can check its status with the following command:
aggr status -s
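To confirm the result from a script, you can search the spare listing for the new disk. The Python sketch below assumes that each spare row in the aggr status -s output begins with the lowercase word spare and names the device as a separate token; the sample text and the function name is_listed_as_spare are illustrative.

```python
def is_listed_as_spare(aggr_status_s_output: str, disk_id: str) -> bool:
    """Return True if some spare row of 'aggr status -s' output names disk_id."""
    for line in aggr_status_s_output.splitlines():
        tokens = line.split()
        # Assumption: spare rows start with the lowercase word "spare".
        if tokens and tokens[0] == "spare" and disk_id in tokens[1:]:
            return True
    return False


# Illustrative sample only, not captured from a real filer.
SAMPLE = """\
Pool0 spare disks

RAID Disk       Device          HA  SHELF BAY CHAN Pool Type  RPM
---------       ------          ------------- ---- ---- ---- -----
Spare disks for block checksum
spare           v4.16           v4    1   0   FC:B   0  FCAL  N/A
"""

print(is_listed_as_spare(SAMPLE, "v4.16"))  # True
print(is_listed_as_spare(SAMPLE, "v4.17"))  # False
```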
Finally, check whether the disk auto-assign feature is on; you can display the setting with options disk.auto_assign. In this test scenario it is on by default.
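If you want a script to read this setting, the Python sketch below parses captured output from options disk.auto_assign. It assumes the relevant line starts with the option name followed by its value (on or off); auto_assign_enabled is a hypothetical helper name, and the sample line is illustrative.

```python
from typing import Optional


def auto_assign_enabled(options_output: str) -> Optional[bool]:
    """Return True/False for the disk.auto_assign value, or None if it is not found."""
    for line in options_output.splitlines():
        tokens = line.split()
        # Assumption: the line reads "disk.auto_assign <value> ...".
        if len(tokens) >= 2 and tokens[0] == "disk.auto_assign":
            return tokens[1].lower() == "on"
    return None


print(auto_assign_enabled("disk.auto_assign             on"))  # True
```

Returning None when the option line is missing lets the caller tell "off" apart from "could not read the setting".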
This completes this short tutorial on replacing a failed drive in a NetApp filer.