[Deskaheh-sysadmin] Re: [IMC-Tech] Deskaheh: The daily crash
jeff
jeff at indymedia.org
Mon Jan 31 10:45:13 PST 2005
john milton wrote:
> So: on Thursday, Friday, and Saturday of last week Deskaheh
> "crashed" i.e. was not serving content, at about noon Eastern
> Std. Time, our peak load period. We made it through Sunday
> o.k., and today we had another one at about 11:00 EST, brief
> though, only lasted for a few minutes. Pattern was the same:
> Server load spikes from about 80% to 600% (the big one a few
> days ago saw load spike to 1200%), "free swap" drops from
> 900K to zero, HTTPD process count drops to zero.
>
> Seems as if our collective bandwidth load has dropped
> somewhat, folks may have left for a while as a result of our
> almost week long outage last week, but we can assume
> (hope...) that they will be drifting back and with them the
> magnitude of the crash problem.
>
> My assumption from last week, that this is a result of a T1
> bandwidth overload, is still unconfirmed by hard data, and I
> don't have the means to confirm it.
>
> Do folks see this as something we need to "fix", which I
> assume would mean increasing bandwidth (say, going from one
> T1 to two T1s or a T3) or reducing load, i.e. moving sites to
> other servers? Or is it something we want to live
> with? "Deskaheh is out for lunch, please 'become the media'
> later"
>
> The urgency of this question also depends on whether our load
> is causing service interruptions for the servers of our hosts
> at the ACLU, who are very kindly giving us space on their T1
> line, which is another thing I don't know...
>
> Comments?
If the server is basically crashing, it doesn't sound like it's
due to running out of bandwidth. If it were out of bandwidth,
things would be slow, but the box itself wouldn't go down.
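Whether the T1 is actually full can be checked with numbers rather than guesswork: a T1 has a nominal line rate of 1.544 Mbit/s, so averaging the server's outbound byte counter over an interval shows how close the link is to saturation (the same figure an MRTG graph plots). A minimal sketch, assuming you can read the outbound byte count twice, for example from the transmit column of /proc/net/dev on Linux; the function name is hypothetical.

```python
# Rough link-utilization check against a single T1.
# Assumption: bytes_start/bytes_end are outbound interface byte counters
# read `seconds` apart (e.g. from /proc/net/dev on Linux).

T1_BITS_PER_SEC = 1_544_000  # nominal T1 line rate; usable payload is ~1.536 Mbit/s


def t1_utilization(bytes_start: int, bytes_end: int, seconds: float) -> float:
    """Return average utilization of one T1 as a fraction (1.0 == saturated)."""
    if seconds <= 0:
        raise ValueError("interval must be positive")
    if bytes_end < bytes_start:
        # Assumption: 32-bit counters that wrapped exactly once in the interval.
        bytes_end += 2**32
    bits_per_sec = (bytes_end - bytes_start) * 8 / seconds
    return bits_per_sec / T1_BITS_PER_SEC


# 11,580,000 bytes in 60 s is 1,544,000 bit/s, i.e. a fully saturated T1.
print(t1_utilization(0, 11_580_000, 60))
```

Sustained values near 1.0 during the noon crashes would support the bandwidth theory; values well below it point back at the box itself.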
We saw something like this (where the box runs out of ram) on
ahimsa-web1 in December or so. It looked very much like some
sort of memory DoS, but I couldn't pinpoint it. The system
would go from having plenty of free RAM/swap, to having /no/
free ram/swap in less than a minute.
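To pin down a memory problem like this, it helps to log free RAM plus free swap every few seconds and flag the sudden collapse described above. A rough sketch, assuming a Linux box exposing /proc/meminfo in its usual "Key:   value kB" format; the function names, the 60-second window and the 90% drop threshold are illustrative assumptions, not taken from any existing tool.

```python
# Detect the "plenty of free RAM/swap to none within a minute" pattern.
# Assumption: input text follows Linux /proc/meminfo ("Key:   value kB").


def parse_meminfo(text: str) -> dict[str, int]:
    """Map each /proc/meminfo key to its numeric value (kB for memory fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def free_kb(info: dict[str, int]) -> int:
    """Free physical memory plus free swap, in kB."""
    return info.get("MemFree", 0) + info.get("SwapFree", 0)


def sudden_collapse(samples, window_s=60, drop_fraction=0.9):
    """samples: list of (timestamp_seconds, free_kb) tuples, oldest first.

    Return True if free memory fell by at least drop_fraction of an
    earlier value within any span of window_s seconds.
    """
    for i, (t0, f0) in enumerate(samples):
        for t1, f1 in samples[i + 1:]:
            if t1 - t0 > window_s:
                break  # later samples are even further away from t0
            if f0 > 0 and (f0 - f1) / f0 >= drop_fraction:
                return True
    return False
```

In practice you would append `(time.time(), free_kb(parse_meminfo(open("/proc/meminfo").read())))` to the sample list every few seconds, and log the top memory consumers whenever `sudden_collapse` fires, to catch the culprit in the act.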
Turning off swap 100% fixed the problem. If you have enough RAM
in the box (say 256M or so) you should be able to run fine
without swap.
Do you have mrtg graphs of bandwidth usage? Also, is the ACLU on
the same box or just the same network? Is their site accessible
when yours isn't? If your box is completely crashed, their
sites on a separate box should be fully accessible. If they
aren't, perhaps there's a bandwidth DoS.
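The reasoning in the paragraph above can be sketched as a small check run during an outage: probe our site and a site on the ACLU's network, then read the combination. The URLs in the usage line are placeholders (the `.example` names are hypothetical, not the real hosts), and "reachable" here means any HTTP response arrived within the timeout.

```python
# Compare our site with a neighbour on the same network to tell a dead box
# from a saturated shared link. Hostnames used with this are placeholders.
import http.client
import urllib.request


def reachable(url: str, timeout: float = 10.0) -> bool:
    """True if an HTTP GET to url returns a successful response within timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except (OSError, http.client.HTTPException):
        # URLError/HTTPError, timeouts and connection resets are all OSErrors.
        return False


def diagnose(ours_up: bool, neighbor_up: bool) -> str:
    """Interpret the two probe results as described in the reply above."""
    if ours_up:
        return "our site is serving"
    if neighbor_up:
        return "our box is down; the shared link is fine"
    return "both down: suspect the shared link (e.g. a bandwidth DoS)"
```

Usage during an outage: `diagnose(reachable("http://deskaheh.example/"), reachable("http://aclu-neighbor.example/"))`. Run it from a machine outside the ACLU's network, or it only tests the local LAN.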
If you want, we could mirror the site on ahimsa* and do
round-robin DNS to share the load. Or if it's running software
that can't be mirrored very easily, we could put a squid caching
proxy in front of it.
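Round-robin DNS works by publishing several A records for one name. Many DNS servers rotate or shuffle the order of those records between answers, and clients usually connect to the first address, so successive visitors are spread across the mirrors. The sketch below only models a strict cyclic rotation; the addresses come from the 192.0.2.0/24 documentation range and are hypothetical stand-ins for Deskaheh and an ahimsa mirror.

```python
# Model of round-robin DNS: every answer holds the same record set, rotated
# by one position per query, and each client uses the first address.


def answer_for_query(records: list[str], query_number: int) -> list[str]:
    """Record order returned for the nth query (0-based), cyclic rotation."""
    if not records:
        return []
    k = query_number % len(records)
    return records[k:] + records[:k]


def spread(records: list[str], queries: int) -> dict[str, int]:
    """Count how many clients land on each address."""
    counts = {r: 0 for r in records}
    for q in range(queries):
        counts[answer_for_query(records, q)[0]] += 1
    return counts


mirrors = ["192.0.2.10", "192.0.2.20"]  # hypothetical: Deskaheh, an ahimsa mirror
print(spread(mirrors, 1000))  # {'192.0.2.10': 500, '192.0.2.20': 500}
```

In practice caching resolvers hand the same answer to many clients until its TTL expires, so the real split is lumpier than this even model suggests.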
-Jeff