[ahimsa-tech] [IMC-Tech] failed/reset raid controller in disko

jeff moe jeff at indymedia.org
Sat Nov 11 14:13:00 PST 2006


jeff moe wrote:
> One of the RAID controllers reset in disko, the main ahimsa* NFS server, today. 
> It /may/ just need a reboot or something to clear it out.
> 
> So at this moment, disko is running on one stripe of the RAID-10. A drive 
> failure now would be catastrophic. Some details of the setup:
> http://lists.indymedia.org/pipermail/imc-tech/2006-March/0308-yk.html
> 
> Again, everything is still working and running, but there is no failover 
> capability. Basically there are 12 drives in the setup--two sets of 6 that 
> mirror each other, with each set on its own controller. Right now the system 
> is running on one controller.
> 
> I am syncing non-mir sites from disko back to jojojojojojo, where there are 
> syncs from a few months ago. I would like to prioritize this in the order of 
> "most needed", so if you /do/ have a recent backup of your site, let me know so 
> I can make sure I get backups of the others going first.
> 
> When the jojojojojojo rsync/backup finishes, we will likely reboot and power 
> down the server to reset the card. When this is done we will rebuild the disko 
> RAID.

Ok. So the box actually had switched to read-only mounts. I didn't know this 
until I came into IRC today.
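
A rough sketch of how the read-only state could be spotted without waiting 
for someone to mention it on IRC. It assumes the text being parsed has the 
layout of /proc/mounts on a Linux box. The function name, the sample text 
and the /srv mount point are made up for illustration; this is not 
something that runs on disko today.

```python
def read_only_mounts(mounts_text):
    """Return (device, mountpoint) pairs whose mount options include 'ro'."""
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mountpoint, _fstype, options = fields[:4]
        # Options are comma-separated; compare whole tokens so that
        # something like "errors=remount-ro" is not mistaken for "ro".
        if "ro" in options.split(","):
            found.append((device, mountpoint))
    return found


# Hypothetical sample in /proc/mounts layout (not real output from disko).
SAMPLE_MOUNTS = """\
/dev/md0 / ext3 rw,errors=remount-ro 0 0
/dev/md6 /srv ext3 ro,data=ordered 0 0
"""

for device, mountpoint in read_only_mounts(SAMPLE_MOUNTS):
    print(f"{device} on {mountpoint} is mounted read-only")
```

On a live system the same function could be fed the contents of 
/proc/mounts from a cron job that mails the list when anything shows up.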

On reboot the controller cards appeared fine in the BIOS. The system ran an 
`fsck` that didn't require any "y"s, which is good, but when it came back 
online it gave this error:
EXT3-fs error (device md6): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only


Youch. Anyway, I remounted it r/w and that appears fine for now, but we're on 
some dodgy ground. I'm waiting for micah or someone who really knows md before 
I move forward with rebuilding the degraded array.
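
In the meantime, here is a rough sketch of how a degraded md array could be 
flagged from the usual /proc/mdstat layout, where each array has a status 
line such as "[2/1] [U_]" (members expected / members active, with "_" 
marking a missing member). The sample text, device names and function name 
are invented for illustration and are not disko's actual output.

```python
import re

# An array entry starts with its name, e.g. "md6 : active raid1 ..."
ARRAY_RE = re.compile(r"^(md\S+)\s*:")
# The status line carries "[expected/active] [U_...]"
STATUS_RE = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text):
    """Map array name -> (expected, active) for arrays missing a member."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_RE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS_RE.search(line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                degraded[current] = (expected, active)
            current = None  # only the first status line belongs to the array
    return degraded


# Hypothetical sample in /proc/mdstat layout (not real output from disko).
SAMPLE_MDSTAT = """\
Personalities : [raid1]
md6 : active raid1 sda1[0]
      1000 blocks [2/1] [U_]
md0 : active raid1 sdc1[0] sdd1[1]
      1000 blocks [2/2] [UU]
unused devices: <none>
"""

for name, (expected, active) in degraded_arrays(SAMPLE_MDSTAT).items():
    print(f"{name} is degraded: {active} of {expected} members active")
```

This only reports state. It does not touch the array, so it should be safe 
to run before anyone decides how to do the rebuild.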

-Jeff

