[mir-coders] [Fwd: Re: [imc-uk-tech] rsync replacement - sorta working]

mish mish at aktivix.org
Sun Nov 19 05:05:22 PST 2006


Email that isn't showing up fully in the archives ...

-------- Forwarded Message --------
> From: Zak <zak at riseup.net>
> To: imc-uk-tech at lists.indymedia.org
> Subject: Re: [imc-uk-tech] rsync replacement - sorta working
> Date: Sat, 18 Nov 2006 19:15:54 +0000
> 
> yossarian wrote:
> 
> > Zak wrote a script that can be put on
> > the mirrors to pull only the necessary content to the mirrors.
> 
> ... building on what mish had already written :)
> 
> The current version is in Mir CVS at scripts/mirror-scripts/update.pl
> 
> Typical usage would be:
> 
>   update.pl --remoteroot=https://publish.indymedia.org.uk \
>             --workingdir=/var/www/www.indymedia.org.uk
> 
> But try --help for further options. In particular --lastupdate allows
> you to specify a time to start from (by default it will start from "last
> midnight" if it hasn't been run previously).
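
The "last midnight" default could be computed as in the Python sketch below. This is not taken from update.pl, which is Perl. The function name `default_lastupdate` is hypothetical, and the sketch assumes local time unless you pass in a timezone-aware datetime.

```python
from datetime import datetime, time


def default_lastupdate(now=None):
    """Return "last midnight" (start of the current day) as a Unix timestamp.

    Hypothetical helper, not the real script's code. Assumption: with no
    argument, the machine's local time zone decides when midnight was; pass
    an aware datetime to pick a specific zone instead.
    """
    if now is None:
        now = datetime.now()
    # Same calendar date as `now`, clock set to 00:00, same tzinfo (or naive).
    midnight = datetime.combine(now.date(), time.min, tzinfo=now.tzinfo)
    # .timestamp() treats a naive datetime as local time.
    return midnight.timestamp()
```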
> 
> More eyes welcome on this, especially from a security point of view --
> I've prevented it from mirroring paths with ".." in them, which would be an
> obvious exploit if the mirroring user has access to other parts of the
> filesystem, and it doesn't invoke shell processes anywhere, but there
> may be other issues I haven't spotted.
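
Here is one way to do the ".." check, sketched in Python for anyone reviewing the idea. It is not the code in update.pl, and `safe_local_path` is a hypothetical name. It rejects absolute paths and any path component that is exactly "..". It also resolves symlinks and checks that the result still lies inside the working directory, which goes slightly beyond what the email describes.

```python
import os


def safe_local_path(workingdir, remote_path):
    """Map a server-supplied relative path onto workingdir, or raise ValueError.

    Hypothetical helper: rejects absolute paths and ".." components, then
    confirms (after resolving symlinks) that the target stays inside workingdir.
    """
    parts = remote_path.split("/")
    if remote_path.startswith("/") or ".." in parts:
        raise ValueError(f"refusing unsafe path: {remote_path!r}")
    root = os.path.realpath(workingdir)
    # Drop empty and "." components so "a//./b" maps to "a/b".
    cleaned = [p for p in parts if p not in ("", ".")]
    target = os.path.realpath(os.path.join(root, *cleaned))
    # Catches escapes via symlinks that already exist under workingdir.
    if os.path.commonpath([root, target]) != root:
        raise ValueError(f"path escapes working directory: {remote_path!r}")
    return target
```

Matching whole components means a legitimate file name such as "..notes" is still allowed, while "../etc/passwd" is refused.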
> 
> From an efficiency point of view, the script tries to be sensible about
> not re-fetching things (including change lists) if they haven't been
> modified since they were last fetched. All mirrored files have their
> mtimes set from the Last-Modified header sent by the server, and when
> requesting a file that already exists on the mirror, its mtime is sent
> in an If-Modified-Since header. This means that the upstream server
> should produce a 304 (Not Modified) response when appropriate, rather
> than repeatedly sending old content.
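
The conditional-GET scheme described above can be sketched in Python using only the standard library. This is not update.pl, and `mirror_file` is a hypothetical name. Unlike the real script, this sketch uses urllib, which opens a fresh connection for every request, so it does not reproduce the keep-alive behaviour mentioned below.

```python
import os
import urllib.error
import urllib.request
from datetime import timezone
from email.utils import formatdate, parsedate_to_datetime


def mirror_file(url, local_path, timeout=30):
    """Fetch url into local_path unless the server reports it unchanged.

    Returns the HTTP status: a 2xx code if content was written, 304 if not.
    """
    request = urllib.request.Request(url)
    if os.path.exists(local_path):
        # Send the local copy's mtime as an RFC 1123 HTTP-date.
        mtime = os.path.getmtime(local_path)
        request.add_header("If-Modified-Since", formatdate(mtime, usegmt=True))
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read()
            last_modified = response.headers.get("Last-Modified")
    except urllib.error.HTTPError as err:
        if err.code == 304:  # Not Modified: keep the existing copy
            return 304
        raise
    os.makedirs(os.path.dirname(local_path) or ".", exist_ok=True)
    # Write to a temporary name first so readers never see a partial file.
    tmp_path = local_path + ".tmp"
    with open(tmp_path, "wb") as fh:
        fh.write(body)
    os.replace(tmp_path, local_path)
    if last_modified:
        # Copy the server's Last-Modified time onto the mirrored file.
        stamp = parsedate_to_datetime(last_modified)
        if stamp.tzinfo is None:
            stamp = stamp.replace(tzinfo=timezone.utc)
        ts = stamp.timestamp()
        os.utime(local_path, (ts, ts))
    return status
```

Because the local mtime is copied from Last-Modified, the next request's If-Modified-Since matches the server's own timestamp exactly, and that match is what lets the server answer 304.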
> 
> One improvement that could probably be made would be to use request
> pipelining so that individual request/response round trips aren't
> required for each new file, which would increase throughput under
> high-latency conditions. (The script already enables HTTP keep-alive to
> avoid the overhead of renegotiating the TCP -- and possibly SSL --
> connection for each request.)
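
A deliberately crude latency model in Python shows why pipelining would matter on high-latency links. The function is hypothetical and assumes a single kept-alive connection. It ignores TCP slow start, server processing time and bandwidth limits, so treat its numbers as an illustration only.

```python
def estimated_fetch_time(n_files, rtt, transfer_per_file, pipelined):
    """Very rough time (seconds) to fetch n_files over one kept-alive connection.

    Sequential: every request waits a full round trip before the next is sent.
    Pipelined: requests go out back-to-back, so the round trip is paid ~once.
    """
    if n_files == 0:
        return 0.0
    if pipelined:
        return rtt + n_files * transfer_per_file
    return n_files * (rtt + transfer_per_file)
```

Take 50 new files, a 200 ms round trip and 10 ms of transfer per file. The model gives about 10.5 s sequentially against about 0.7 s pipelined. Most of the gap is waiting, not transfer.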
> 
> 
> > * we should be able to get content to the mirrors a lot faster.  Not
> > sure how fast, maybe the mirrors could check once every two minutes or
> > something - Zak or Zapata, is this practical?  Faster? Slower?
> 
> Certainly we should be able to distribute content faster than at
> present, as it's an inexpensive GET request/304 response to find out
> nothing has changed. Exact figures will probably have to wait until
> we're testing it on a live site though.
> 
> 
> > * the new code also handles file deletions properly (I think this is the
> > case - can somebody confirm this?).
> 
> It certainly should on the mirror side, but then so does rsync if you
> give it the --delete option (some sites do, some don't). I've only seen
> the Mir side tested on a UK-based install, which never does file
> deletion anyway AFAIK.
> 
> 
> > I am not sure when this code can be deployed, it probably needs a bit
> > more testing, but I think it is basically working.
> 
> As soon as we're satisfied that the Mir side isn't going to break
> anything (even if there may be bugs in the output it generates) I
> suggest we get it running on at least one of the major sites on Traven
> (e.g. UK) so that we can be testing the mirror script on a more
> realistically loaded site, and also periodically checking for any
> modified files that Mir has failed to report correctly.
> 
> 
> Zak.
> _______________________________________________
> imc-uk-tech mailing list
> imc-uk-tech at lists.indymedia.org
> http://lists.indymedia.org/mailman/listinfo/imc-uk-tech