[Deskaheh-sysadmin] Deskaheh: The daily crash

john milton john at johnmilton.ca
Mon Jan 31 10:28:45 PST 2005


So: on Thursday, Friday, and Saturday of last week Deskaheh "crashed", 
i.e. was not serving content, at about noon Eastern Standard Time, our 
peak load period. We made it through Sunday o.k., and today we had 
another one at about 11:00 EST, though a brief one that lasted only a 
few minutes. The pattern was the same: server load spikes from about 80% 
to 600% (the big one a few days ago saw load spike to 1200%), "free swap" 
drops from 900K to zero, and the HTTPD process count drops to zero.
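
A minimal sketch of how the three symptoms above could be logged side by 
side, so the order in which they happen is on record. It assumes a Linux 
host with /proc mounted and a web server process named "httpd"; the 
function names are hypothetical, and the load figure it reads (the 
1-minute load average) may not be the same number as the percentages 
quoted above.

```python
import subprocess
import time


def parse_loadavg(text):
    """Return the 1-minute load average from /proc/loadavg contents."""
    return float(text.split()[0])


def parse_swap_free_kb(text):
    """Return the SwapFree value (in kB) from /proc/meminfo contents."""
    for line in text.splitlines():
        if line.startswith("SwapFree:"):
            return int(line.split()[1])
    raise ValueError("SwapFree not found")


def count_processes(ps_text, name="httpd"):
    """Count lines equal to `name` in the output of `ps -e -o comm=`."""
    return sum(1 for line in ps_text.splitlines() if line.strip() == name)


def format_record(timestamp, load1, swap_free_kb, httpd_count):
    """Format one log line holding all three measurements."""
    return (f"{timestamp} load1={load1:.2f} "
            f"swap_free_kb={swap_free_kb} httpd={httpd_count}")


def snapshot():
    """Take one reading from the live system (assumes Linux and ps)."""
    with open("/proc/loadavg") as f:
        load1 = parse_loadavg(f.read())
    with open("/proc/meminfo") as f:
        swap_free = parse_swap_free_kb(f.read())
    ps = subprocess.run(["ps", "-e", "-o", "comm="],
                        capture_output=True, text=True, check=True)
    stamp = time.strftime("%Y-%m-%dT%H:%M:%S")
    return format_record(stamp, load1, swap_free, count_processes(ps.stdout))
```

Calling snapshot() once a minute (from cron, say) and appending the 
result to a file would show whether swap runs out before or after the 
load climbs.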

It seems as if our collective bandwidth load has dropped somewhat; folks 
may have left for a while as a result of our almost week-long outage 
last week, but we can assume (hope...) that they will be drifting back, 
and that the magnitude of the crash problem will grow with them.

My assumption from last week, that this is a result of a T1 bandwidth 
overload, is still unconfirmed by hard data, and I don't have the 
capability of gathering that data.
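
One way the bandwidth assumption could be checked, sketched under some 
loud assumptions: the server runs Linux, all of its traffic on one 
network interface ends up on the T1, and two readings of that 
interface's byte counters are taken a known number of seconds apart. The 
function names are hypothetical; the T1 rate of 1.544 Mbit/s is the 
nominal line rate, so usable capacity is somewhat lower.

```python
T1_BITS_PER_SEC = 1_544_000  # nominal T1 line rate


def read_counters(proc_net_dev_text, iface):
    """Return (rx_bytes, tx_bytes) for `iface` from /proc/net/dev contents."""
    for line in proc_net_dev_text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, rest = line.split(":", 1)
        if name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {iface!r} not found")


def utilization(before, after, seconds, link_bps=T1_BITS_PER_SEC):
    """Fraction of link capacity used (rx, tx) between two counter samples."""
    rx = (after[0] - before[0]) * 8 / seconds / link_bps
    tx = (after[1] - before[1]) * 8 / seconds / link_bps
    return rx, tx
```

If the transmit figure sits near 1.0 at noon Eastern, the link is full; 
if it stays well below that while the server falls over, the cause is 
more likely somewhere else. The counters only see this server's traffic, 
so anything else sharing the T1 would have to be measured separately.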

Do folks perceive this as a thing that we need to "fix", which I assume 
would mean increasing bandwidth (say by going from a T1 to 2 T1's or a 
T3) or reducing load, i.e. moving sites to other servers? Or is it 
something we want to live with? "Deskaheh is out for lunch, please 
'become the media' later"

The urgency of this question also depends on whether our load is 
causing service interruptions to the servers of our hosts at the ACLU, 
who are very kindly giving us the space on their T1 line, which is 
another thing I don't know...

Comments?
-- 
Peace: John Milton

Email: john at johnmilton.ca
Web: johnmilton.ca

PGP encrypted mail is welcome, my key is here: 
http://www.hwcn.org/~aa492/security.htm


