[IMC-Tech] Preventing HTTP based spam by Captchas and content filters

Alster alster at indymedia.org
Tue Jan 31 22:35:15 PST 2006



Hi!

I've got a couple additional notes on this.

Simon Shine wrote:
> MJ Ray wrote:
>> Summary:

Thanks for the summary.

>>   image captchas - locks out users with vision problems

See 'audio tests' below.

>>   cookies in form - locks out users with secured browsers

Personally, I prefer websites that set cookies (and I allow them to do
so), as I know this lets them store my unique ID information (which is
needed to re-identify my session) in a more secure way than passing it
via GET or POST. I have my web browser set up to delete all cookies
whenever I exit it, and I do this regularly (a couple of times a day).
This way, I benefit from the more secure (read: less insecure) storage
of the UID in a cookie while making sure no cookie can be used to track
my usage for more than a couple of hours.

In other words: cookies *can* be a privacy issue, and those set by badly
written applications (containing login information or never-ending
sessions) can even be a security risk, but you gain more by using them
(in a sane way) than by not doing so. That is, treating the use of
cookies as a security issue in itself is nonsense.

>>   javascript in form - locks out users with secured browsers

Apart from that, the availability of JavaScript (which is only assumed;
the server can merely guess here) proves pretty much nothing, and
neither do tests that are both conducted and evaluated on the client side.

>>   audio tests - probably locks out even more
> 
> and old/alternative/non-feature-rich browsers...

Two notes:
1. Audio captchas were never meant (at least not by me) as an
alternative to image captchas, but as an addition to them. Seen this
way, it makes no sense to discredit them for being as unusable as image
captchas; one should only judge the combined use of image and audio
captchas.

2. I do not understand why people suggest that you cannot use a .wav
file if it cannot be played back by a browser plugin. If you cannot play
it back in your web browser, it will just be downloaded, and after that
you can still listen to it with your favourite audio player. And audio
players incapable of playing back WAV files are probably rare (I hope).

Nevertheless, and as stated before, I'd personally discourage the use of
captchas altogether.

>>   blank field - can't see a problem with it
> 
> This would be really good. Are the bots actually filling out random
> forms with random data, or do they attack sites because they run a
> certain codebase that works in a certain way?

All the ones I know of are crafted for or adjusted to a certain
codebase. Text browsers (which may show the hidden field to the user) as
well as the special HTML instructions used to hide the field (which
spamware may be able to identify) pose a problem for this approach.
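
A minimal server-side sketch of the blank field check, assuming the
submitted form arrives as a Python dict. The field name `website`, the
markup and the CSS class `hp` are hypothetical. The visible "Leave this
field empty" label is there for text browsers, which ignore the CSS that
hides the field. Spamware crafted for the specific codebase can of course
simply leave the field empty.

```python
from typing import Mapping

HONEYPOT_FIELD = "website"  # hypothetical name of the hidden field

# Hypothetical markup; the stylesheet would contain: .hp { display: none; }
HONEYPOT_HTML = (
    '<div class="hp">'
    '<label for="website">Leave this field empty:</label>'
    '<input type="text" id="website" name="website" value="" autocomplete="off">'
    "</div>"
)


def looks_like_bot(form: Mapping[str, str]) -> bool:
    """Return True if the hidden field was filled in.

    Graphical browsers never show the field to humans, and text-browser
    users are told to leave it empty, so a non-blank value suggests a bot
    that fills in every input it finds.
    """
    return form.get(HONEYPOT_FIELD, "").strip() != ""
```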

>>   Q&As - needs care, but I expect no better than blank field
>>   spamvertised blacklisting - can't see a problem with it
> 
> Centralized blacklists can pose power problems, and having local
> blacklists might not be efficient enough. Also, there is the threat of
> false positives if people spam from public Internet such as libraries.

Originally, this was not about blacklists of spammers' IP addresses but
about blacklists of spamvertised websites, i.e. partial/wildcarded or
fully qualified domain names or URIs. In this case, public access points
should not pose a problem unless they allow you to run a customizable
server. ;-)
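
A sketch of checking a post against such a blacklist of spamvertised
sites (not spammers' IPs), assuming the list is a plain Python list of
fully qualified and wildcarded host patterns. The entries and names are
hypothetical examples. Host names are taken from the links in the post
and matched with shell-style wildcards.

```python
import re
from fnmatch import fnmatchcase
from typing import Iterable, List
from urllib.parse import urlsplit

# Hypothetical blacklist entries: fully qualified names and wildcards.
# Note that "*.casino.example" does not match the bare "casino.example".
BLACKLIST = ["cheap-pills.example", "*.casino.example"]

URL_RE = re.compile(r"https?://[^\s\"'<>]+", re.IGNORECASE)


def spamvertised_hosts(text: str, blacklist: Iterable[str] = BLACKLIST) -> List[str]:
    """Return the blacklisted host names linked from a post, in order."""
    patterns = [p.lower() for p in blacklist]
    hits = []
    for url in URL_RE.findall(text):
        try:
            parts = urlsplit(url)
        except ValueError:  # e.g. a malformed IPv6 literal
            continue
        # hostname drops any port and user info and is already lower-cased
        host = (parts.hostname or "").rstrip(".")
        if any(fnmatchcase(host, p) for p in patterns):
            hits.append(host)
    return hits
```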

>>   scrambled field names - may hinder legitimate posting tools
> 
> What is the difference between "scrambled field names" and "blank field"
> precisely? Aren't they both basically about bots filling out data that
> shouldn't be filled out, and humans not even being presented with the
> choice, thus capturing those suspiciously entering data into fields that
> are for example invisible?

To my understanding, this is an attempt to fool generic spamware that
tries to guess the content it needs to put into an input field from the
field name given in the <input> tag via the 'name' and 'title'
attributes. In current HTML, such input tags are usually preceded by an
additional prompt such as "Enter your name: ".

As it is assumed that generic spamware will only examine (or only be
able to examine) the <input> attributes to determine the expected
contents, while the user will rely on the preceding prompt, the approach
in this case seems to be to trick generic spamware into submitting
values that cause a type mismatch.

So the 'blank field' approach is different but related. Both seem to
share one weakness, though: they are defeated by spamware customized for
the specific codebase.
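
A sketch of the scrambled field names idea as I understand it, with
hypothetical field names: the prompt shown to humans says "Your name",
but the input is named `email`, and vice versa. The server maps the
names back and flags a type mismatch, e.g. an e-mail address where a
name belongs. The simple address regex is an assumption, not a full
validator.

```python
import re
from typing import Dict, Mapping

# Hypothetical scrambled form: the visible prompt tells humans what to
# enter, while the name attribute deliberately suggests something else.
#   Your name:               <input name="email">
#   Your e-mail (optional):  <input name="author">
FIELD_MEANING = {"email": "author_name", "author": "author_email"}

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def unscramble(form: Mapping[str, str]) -> Dict[str, str]:
    """Map scrambled field names back to what the fields really hold."""
    return {FIELD_MEANING.get(k, k): v for k, v in form.items()}


def type_mismatch(form: Mapping[str, str]) -> bool:
    """Return True if the submitted values fit the names, not the prompts.

    Generic spamware that trusts the name attributes puts an address into
    "email" and a name into "author", which fails these checks.
    """
    data = unscramble(form)
    name = data.get("author_name", "").strip()
    address = data.get("author_email", "").strip()
    if EMAIL_RE.match(name):
        return True  # an e-mail address where a name belongs
    if address and not EMAIL_RE.match(address):
        return True  # something else where an address belongs
    return False
```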

>>   posts per time limit - can't see a problem with it
> 
> This method is good because of few false positives, but bots can still
> predict it - especially if a certain delay becomes part of the codebase.

... and if the code is open source, which will likely be the case for
IMC software. Obviously, this won't defeat all spamware, but it will
defeat all spamware that has no option for a delay, all spamware whose
user is too lazy to look up and configure the delay, and most generic
spamware. I think this is somewhat comparable to email greylisting: a
lot of spam is filtered out by default, but some (which should not) will
always make it through.
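
A minimal in-memory sketch of a posts-per-time limit using a sliding
window per client. The client key (e.g. the poster's IP address), the
defaults of 3 posts per 10 minutes and the class name are hypothetical;
a real site would need storage shared between server processes. The
clock is injectable so the behaviour can be tested.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict


class PostRateLimiter:
    """Allow at most `max_posts` posts per client within `window` seconds."""

    def __init__(self, max_posts: int = 3, window: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_posts = max_posts
        self.window = window
        self.clock = clock
        self._posts: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, client: str) -> bool:
        """Record and allow a post, or refuse it if the limit is reached."""
        now = self.clock()
        times = self._posts[client]
        # Forget posts that have dropped out of the sliding window.
        while times and now - times[0] >= self.window:
            times.popleft()
        if len(times) >= self.max_posts:
            return False  # refused posts are not recorded
        times.append(now)
        return True
```

As with greylisting, spamware that simply waits out the window still
gets through; the limit only raises the cost of flooding.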

Alster
--
GPG key
http://keys.indymedia.org/cgi-bin/lookup?op=get&search=05059C17
Fingerprint    1B8B 128F 8435 541C B3A5 1B7E CF5A 9D55 0505 9C17
All other      http://docs.indymedia.org/view/Main/AlsteR


