on strategies for automoderation

Prasad Chodavarapu (prasad@acm6.me.uiuc.edu)
Sat, 17 Feb 1996 14:41:53 -0600 (CST)

i did read the soc.religion.vaishnava FAQ.
as suresh pointed out, they are being very successful
in their efforts. however, their bot takes care
of much more than cross-postings. one cannot
post more than 4 articles in a day, the subject
and keyword headers must contain at least one
recognised keyword (they have a huge list of them),
the length of the quoted text in followup articles
cannot exceed 2/3rds of the total length, etc.
some of these just can't be adopted for scit,
e.g. a keyword check is impossible. however,
limiting the no. of posts per day from a single a/c
to 5 or 6 can be considered.

on prevention of flooding and spamming:
--------------------------------------

cross-posting: posting the same article to a
number of groups by including
them in the Newsgroups header.

spamming: posting the same article separately
to a number of groups. this can be
done manually or by shell scripts.

flooding: posting the same article a number of
times to the same group.

i was surprised to find well documented
FAQs on preventing these forms of net abuse.
one such is the news.admin.net-abuse FAQ
found at http://www.bluemarble.net/~scotty/acena.html
bapa rao gari's assertion that spamming is frowned
upon by the system administrators seems true.

auto-moderation can take care of all these
abuses. i suggest the following strategy.

cross-posts: just check the Newsgroups header line.
reject if more than one group is mentioned.
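
a rough sketch of that check, in python. this is only an illustration, not the code of any existing bot: the function names (newsgroups_of, is_cross_posted) and the sample article are made up here, and it assumes the bot sees each article as raw text with its headers intact.

```python
from email import message_from_string


def newsgroups_of(raw_article):
    """Return the list of groups named in the article's Newsgroups header."""
    msg = message_from_string(raw_article)
    header = msg.get("Newsgroups", "")
    # Groups are comma separated; strip stray whitespace around each name.
    return [g.strip() for g in header.split(",") if g.strip()]


def is_cross_posted(raw_article):
    """True if the article names more than one group (hypothetical reject rule)."""
    return len(newsgroups_of(raw_article)) > 1


# Made-up sample article, for illustration only.
sample = (
    "From: someone@example.com\n"
    "Newsgroups: soc.culture.indian.telugu,soc.culture.indian\n"
    "Subject: test\n"
    "\n"
    "body text\n"
)

if is_cross_posted(sample):
    print("reject: cross-posted to", newsgroups_of(sample))
```

an article with no Newsgroups header at all comes back as an empty list here, so a real bot would have to decide separately what to do with those.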

spamming: here is how to detect and measure spamming.

[from news.admin.net-abuse FAQ]
3.5) How can I tell how many newsgroups an article was posted to?

For people who can't use the classic "grepping the newsspool" method,
nn or nngrab may be able to help. (The following is adapted from a
posting by Lee Rudolph--thanks.)

You can force the Unix newsreader nn to ignore your .newsrc and create
a "merged newsgroup" consisting only of articles containing a certain
word in their subject line. For instance, to gather all articles at
your site containing the word "spam" in their subject line, use this
command:

% nngrab spam

That's basically a faster version of

% nn -i -s"spam" -mXx

Caution: this latter method can be a long, tedious process. See the nn
man page for more details.

3.2) What is the Breidbart Index (BI)?

The Breidbart Index (BI) is a measure of the breadth of any
multi-posting, cross-posting, or combination of the two. BI is defined
as the sum of the square roots of how many newsgroups each article was
posted to. If that number approaches 20, then the posts will probably
be cancelled by somebody.

For instance, four identical posts to nine newsgroups each (4 times 3)
has a BI of 12. However, nine identical posts to four newsgroups each
(9 times 2) has a BI of 18.
[END extract from news.admin.net-abuse FAQ]
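
the BI definition quoted above is easy to compute once the number of groups for each copy is known. a small python sketch follows; the function name breidbart_index and the list-of-counts input are my own choices for illustration, not something taken from the FAQ.

```python
import math


def breidbart_index(groups_per_copy):
    """Sum of the square roots of the number of groups each copy was posted to.

    groups_per_copy: one integer per identical article, giving how many
    newsgroups that copy was sent to (hypothetical input format).
    """
    return sum(math.sqrt(n) for n in groups_per_copy)


# The two cases from the FAQ extract.
print(breidbart_index([9, 9, 9, 9]))  # four posts to nine groups each -> 12.0
print(breidbart_index([4] * 9))       # nine posts to four groups each -> 18.0
```

a bot could compare this number against the threshold of about 20 mentioned in the extract, though counting the copies across groups still needs something like the newsspool grep or nn method described there.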

flooding: this can be easily prevented by checking the
subject line and From header against earlier posts. also,
limiting the number of posts per a/c per day to 5 or 6
will reduce flooding to a large extent.
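
a minimal python sketch of both flooding checks together. everything named here (the FloodFilter class, its accept method, the day string) is hypothetical, and it assumes the bot has already pulled the From and Subject values out of each article. duplicates are only matched within the same day, which is a simplification.

```python
from collections import defaultdict


class FloodFilter:
    """Rejects repeated (From, Subject) pairs and enforces a daily cap per account."""

    def __init__(self, max_posts_per_day=5):
        self.max_posts_per_day = max_posts_per_day
        self.seen = set()               # (day, sender, subject) already accepted
        self.counts = defaultdict(int)  # (day, sender) -> accepted posts so far

    def accept(self, sender, subject, day):
        """Return True if the post passes both checks, False if it should be rejected."""
        sender = sender.strip().lower()
        # Normalise case and whitespace so trivial variations still match.
        subject = " ".join(subject.lower().split())
        if (day, sender, subject) in self.seen:
            return False  # same account, same subject: treat as flooding
        if self.counts[(day, sender)] >= self.max_posts_per_day:
            return False  # account has used up its posts for the day
        self.seen.add((day, sender, subject))
        self.counts[(day, sender)] += 1
        return True


flt = FloodFilter(max_posts_per_day=5)
print(flt.accept("someone@example.com", "hello scit", "1996-02-17"))
print(flt.accept("someone@example.com", "Hello  SCIT", "1996-02-17"))
```

one side effect of matching on subject alone is that a second genuine followup by the same person in the same thread on the same day would also be rejected, so a real bot might compare the article body as well.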

so, auto-moderation fails only when someone keeps
posting just 4 to 5 irrelevant posts per day, without
cross-posting, spamming or flooding. i guess that is
a satisfactory solution for now.

prasad