[Deskaheh-sysadmin] Deskaheh: The daily crash
john milton
john at johnmilton.ca
Mon Jan 31 10:28:45 PST 2005
So: on Thursday, Friday, and Saturday of last week Deskaheh "crashed",
i.e. stopped serving content, at about noon Eastern Standard Time, our
peak load period. We made it through Sunday o.k., but today we had
another crash at about 11:00 EST. It was brief, lasting only a few minutes.
The pattern was the same: server load spikes from about 80% to 600% (the
big one a few days ago saw load spike to 1200%), "free swap" drops from
900K to zero, and the HTTPD process count drops to zero.
It seems as if our collective bandwidth load has dropped somewhat. Folks
may have left for a while as a result of our almost week-long outage
last week, but we can assume (hope...) that they will be drifting back,
and the magnitude of the crash problem will grow with them.
My assumption from last week, that this is the result of a T1 bandwidth
overload, is still unconfirmed by hard data, and I don't have the means
to collect that data.
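
One way the missing hard data might be gathered is sketched below. It
is only a sketch under assumptions that are not in the message above:
that the server runs Linux and exposes interface byte counters in
/proc/net/dev, and that the uplink interface is called "eth0" (a
hypothetical name). The only outside fact used is the nominal T1 line
rate of 1.544 Mbit/s. Two counter samples taken a known number of
seconds apart give the fraction of T1 capacity in use.

```python
# Sketch only: assumes Linux and /proc/net/dev; "eth0" is a hypothetical
# interface name, not something stated in the original message.

T1_BITS_PER_SECOND = 1_544_000  # nominal T1 line rate


def parse_net_dev(text, interface):
    """Return (rx_bytes, tx_bytes) for one interface from /proc/net/dev text."""
    for line in text.splitlines():
        name, sep, counters = line.partition(":")
        if sep and name.strip() == interface:
            fields = counters.split()
            # Field 0 is bytes received, field 8 is bytes transmitted.
            return int(fields[0]), int(fields[8])
    raise ValueError(f"interface {interface!r} not found")


def read_counters(interface="eth0", path="/proc/net/dev"):
    """Read the current byte counters for the interface from the live system."""
    with open(path) as handle:
        return parse_net_dev(handle.read(), interface)


def t1_utilization(before, after, seconds):
    """Fraction of T1 capacity used in the busier direction between two samples."""
    rx_bits = (after[0] - before[0]) * 8
    tx_bits = (after[1] - before[1]) * 8
    return max(rx_bits, tx_bits) / (seconds * T1_BITS_PER_SECOND)
```

Calling read_counters() once a minute around noon EST and logging
t1_utilization() for each pair of samples would show whether the line
sits near 1.0 just before a crash. If it does not, bandwidth is
probably not the cause and the swap figures deserve a closer look.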
Do folks perceive this as a thing that we need to "fix"? I assume that
would mean either increasing bandwidth (say by going from one T1 to two
T1s or a T3) or reducing load, i.e. moving sites to other servers. Or is
it something we want to live with? "Deskaheh is out for lunch, please
'become the media' later."
The urgency of this question also depends on whether our load is
causing service interruptions to the servers of our hosts at the ACLU,
who are very kindly giving us space on their T1 line. That is another
thing I don't know...
Comments?
--
Peace: John Milton
Email: john at johnmilton.ca
Web: johnmilton.ca
PGP encrypted mail is welcome, my key is here:
http://www.hwcn.org/~aa492/security.htm