[ahimsa-tech] [IMC-Tech] failed/reset raid controller in disko
jeff moe
jeff at indymedia.org
Sat Nov 11 14:13:00 PST 2006
jeff moe wrote:
> One of the RAID controllers reset in disko, the main ahimsa* NFS server, today.
> It /may/ just need a reboot or something to clear it out.
>
> So at this moment, disko is running on one stripe of the RAID-10. A drive
> failure now would be catastrophic. Some details of the setup:
> http://lists.indymedia.org/pipermail/imc-tech/2006-March/0308-yk.html
>
> Again, everything is still working and running, but there is no failover
> capability. Basically there are 12 drives in the setup: two sets of 6 that
> mirror each other, each set on its own controller. Right now the system
> is running on one controller.
>
> I am syncing non-mir sites from disko back to jojojojojojo, where the existing
> copies are a few months old. I would like to do this in order of "most
> needed", so if you /do/ have a recent backup of your site, let me know and I
> will make sure the other sites get backed up first.
>
> When the jojojojojojo rsync/backup finishes, we will likely power the server
> down and back up to reset the card. Once that is done, we will rebuild the
> disko RAID.
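A minimal Python sketch of why a further drive failure would be fatal while one controller is out. It assumes, hypothetically, that drive N on one controller mirrors drive N on the other and that the six mirror pairs are striped together; the real md layout is in the post linked above, and the drive names `a0`..`b5` are made up for illustration.

```python
from typing import Iterable, List

# Hypothetical layout (an assumption, not disko's documented setup):
# 6 drives per controller, drive aN on controller A mirrors drive bN on
# controller B, and the 6 mirror pairs are striped together (RAID-10).
DRIVES_PER_CONTROLLER = 6
MIRROR_PAIRS = [(f"a{i}", f"b{i}") for i in range(DRIVES_PER_CONTROLLER)]


def controller_drives(prefix: str) -> List[str]:
    """All drives on one hypothetical controller ('a' or 'b')."""
    return [f"{prefix}{i}" for i in range(DRIVES_PER_CONTROLLER)]


def array_survives(failed: Iterable[str]) -> bool:
    """True if every mirror pair still has at least one working drive.

    A stripe of mirrors loses data as soon as both halves of any one pair
    are gone, because that pair's slice of every stripe has no copy left.
    """
    down = set(failed)
    return all(a not in down or b not in down for a, b in MIRROR_PAIRS)


if __name__ == "__main__":
    lost_controller = controller_drives("b")  # controller B has dropped out
    print(array_survives([]))                        # True: healthy
    print(array_survives(lost_controller))           # True: degraded, still running
    print(array_survives(lost_controller + ["a2"]))  # False: pair (a2, b2) is gone
```

With one controller gone, every surviving drive is the last copy of its pair, so any single additional failure breaks a stripe.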
OK. It turns out the box had actually switched to read-only mounts. I didn't
know this until I got on IRC today.
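A silent switch to read-only is something a periodic check can catch. A minimal Python sketch, assuming a Linux box with `/proc/mounts` (fields: device, mount point, fs type, options, dump, pass); how or whether such a check gets scheduled is left open, and the `ext3` filter is just an illustrative choice.

```python
from typing import List, Optional, Sequence, Tuple


def read_only_mounts(
    mounts_text: str, fstypes: Optional[Sequence[str]] = None
) -> List[Tuple[str, str]]:
    """Return (device, mount point) for each mount whose options include 'ro'.

    mounts_text uses the /proc/mounts layout: device, mount point, fs type,
    options, dump, pass. If fstypes is given, only those types are checked.
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # blank or malformed line
        device, mountpoint, fstype, options = fields[:4]
        if fstypes is not None and fstype not in fstypes:
            continue
        # Compare whole options so "errors=remount-ro" does not count as "ro".
        if "ro" in options.split(","):
            found.append((device, mountpoint))
    return found


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as f:
            text = f.read()
    except OSError:
        text = ""  # no /proc/mounts (not Linux): nothing to check
    for device, mountpoint in read_only_mounts(text, fstypes=["ext3"]):
        print(f"WARNING: {device} on {mountpoint} is mounted read-only")
```

Filtering by filesystem type avoids flagging mounts that are read-only on purpose (pseudo-filesystems, CD-ROMs and the like).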
On reboot, the controller cards appeared fine in the BIOS. The system ran an
`fsck` that didn't require answering "y" to any prompts, which is good, but
when it came back online it gave this error:
EXT3-fs error (device md6): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
Youch. Anyway, I remounted it read-write and that appears fine for now, but
we're on dodgy ground. I'm waiting for micah or someone else who really knows
md before I move forward with rebuilding the degraded array.
-Jeff