Version January 2008
~ Bots lore ~
BOT WRITING, BOT TRAPPING & BOT WARS
Part of searchlore
This is
a 'living' workshop on bot trapping and reversing; elsewhere on my
site you will find other, "broader" web searching and data mining
lore.
As ~S~ deep wrote in his bot-essay:
"There are
many Perl bots available on the
net, but I'm fairly certain that you will not find one
that does exactly what you want.
There's also a "convention" among bot writers not to
give bots source code to people who do not
understand them - it's considered irresponsible. Of
course, once you've learned how to
build bots, you can be as irresponsible as you like". That is exactly right.
Anyway, knowledge runs downhill on the web: we will find more knowledge only if we create it ourselves at the same time.
Your own contributions and work
are necessary. The material presented here should be more than enough to "get you started" on the bot path. Write your own bots,
publish the code so that others may improve them. Reverse the code of the existing bots.
Awaiting your own contributions...
["Our" essays!]
recent
and old
A good robots.txt
If you do not know what a robots.txt is, just leave now
and come back later :)
More (older) stuff
[Iliad fetchbot]
[Juno autoresponder]
[our friends' essays!]
An older introduction - explanation
The term
"bot" is, according to DeadelviS, short for "robot",
which sounds much cooler than "program". As Andrew Leonard
explains, like mechanical robots, bots are guided by algorithmic
rules of behavior - if this happens, do that; if that happens, do
this. But instead of clanking around a laboratory bumping into
walls, software robots are executable programs that maneuver through
cyberspace bouncing off communications protocols. Strings of code
written by everyone from teenage chat-room lurkers to top-flight
computer scientists, bots are variously designed to carry on
conversations, act as human surrogates, or achieve specific tasks -
such as seeking out and retrieving information. And...
bots can also be used as weapons: great fun assured.
These pages of mine may cover all sorts of web bots: spiders, wanderers, and worms. Cancelbots, Lazarus,
Automoose. Chatterbots, softbots, userbots, taskbots, knowbots,
mailbots, searchbots. MrBot and MrsBot. Warbots, clonebots, floodbots,
annoybots, hackbots, and Vladbots. Gaybots, gossipbots,
gamebots. Skeleton bots, spybots, and sloth bots. Xbots,
meta-bots. Eggdrop bots... as you can see the terminology is far from
being simple... basically, though, the idea is to allow you to learn enough to WRITE
your own search bots. Information searching through commercial engines (Google included) is less
and less efficient, alas: it is of paramount importance, for a ~S~, to learn
how to build his OWN 'home made and tailored' specific bots. You'll be amazed at the wealth
of really strong signals you'll be able to find among the noise as soon as you use your own bots,
even with imperfect and surely not 100% state of the art, tiny, simple bots...
It's up to you to help us with your own work or, alternatively, to keep what you'll develop guarded for yourself:
it is my intention to offer
enough material on this
section to allow anyone to start. Choose the knowledge path or choose the dark path, it's up to you.
I'll NEVER charge money for
accessing my site, hence I ask you the only "money" that's worth something
on this web of ours: knowledge for all.
Contribute with YOUR knowledge:
if you build on other people's shoulders, you should imho offer your
own shoulders for others
to build upon
Hey! I wanna see a real bot in action before joining this section!
Yessir! Here is a good and powerful 'fetchbot', very useful for seekers and searchers alike.
And if you knew nothing of this stuff you'll be fascinated (and even if you
already knew... :-)
You'll now (at once if I were you) approach the
"iliad" Searchbot (a very useful one, btw, was at iliad@algol.jsc.nasa.gov, is now at iliad@prime.jsc.nasa.gov):
Send an email to:
iliad@prime.jsc.nasa.gov
write into the SUBJECT part of your email (into the subject field, duh!):
iliad query
write into the TEXT part of your email (that's your letter, duh!):
?Q: internet bots automated retrieval
(for instance...) and you'll most probably get quite
a lot of interesting material about
bots from this mighty useful Searchbot... whatt'd'ya say?
If you're stuck, email the same address
with the word help both in the Subject and
in the text (pretty poor help you will get, though :-(
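For those who would rather script the recipe above than type it, here is a minimal Python sketch that only builds the query email (address, subject and body as given above). The sender address and the function name are assumptions made up for illustration, and nothing is sent: whether the iliad bot still answers is not something this sketch can tell you.

```python
from email.message import EmailMessage


def build_bot_query(bot_address, subject, body, sender="you@example.org"):
    """Build (but do not send) an email query for a mail-driven bot.

    The sender default is a placeholder: put your own address there.
    """
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = bot_address
    msg["Subject"] = subject   # the bot reads its command from here
    msg.set_content(body)      # the bot reads the query from here
    return msg


msg = build_bot_query(
    "iliad@prime.jsc.nasa.gov",
    "iliad query",
    "?Q: internet bots automated retrieval",
)
print(msg)
```

To actually send it you would hand `msg` to `smtplib.SMTP(...).send_message(msg)` using your own mail server, which is deliberately left out here.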
The old Juno autoresponder
Hey, this is great! I wanna taste another email-bot, just for fun!
Yessir! Please go ahead: have a look at friend autobot:
Send an email to:
autobot@junoaccmail.org
write into the SUBJECT part of your email (into the subject field, duh!):
send index
or if you want to have a laugh at some 'scarecrow' copyright propaganda, write - always
in the SUBJECT field - the following:
send Copyrights
Hey, this is gorgeous! Now, before I start working on my own, let
me please see and touch the code of a "real" bot!
Yessir! Please go ahead: enjoy the following essays!
You'll find here all the code you may
need to start working on your own!
[mhyst_w3s.htm]: W3S: Web Personal Spyder.
by Mhyst, January 2008
"The aim of this document is to put forward the structure and functionality of W3S and, at the same time,
to describe a basic searching web spider.
I hope this essay will bring somebody the possibility of making his own web spider."
Part of the bots section.
[termisearch.htm]: A proof of concept: a pre-search filter/bot.
by fravia+, May 2007
Just an example of a possible application of a simple, but effective, Google pre-filtering approach
Part of the bots
section (even if it is just a pre-filter and not a bot stricto sensu).
This is done here simply by
adding, subtracting or ORing automatically some ad hoc search terms to whatever query you may have.
This is the whole point of this example. You don't need to go linguistic. You modify or create your own forms at leisure.
You may want special effective forms in order to search for books or images and just eliminate all those idiot sites that try to
'trap' searchers into advertisement hells or crippled items for zombies and guinea pigs.
Or maybe you want mp3s without having to wade knee-deep into morons trying "to sell"
you those very mp3s (quelle vulgarité!). Or whatever... I'm sure you get the infinite
possibilities now in your own hands :-)
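The idea described above (automatically adding, subtracting or ORing terms onto whatever query you type) can be sketched in a few lines of Python. This is not the code of the termisearch essay, just an illustration: the function names, the example terms and the base search URL are all assumptions.

```python
from urllib.parse import urlencode


def prefilter(query, add=(), subtract=(), any_of=()):
    """Decorate a raw query with ad hoc terms.

    add      -> terms appended as they are (phrases get quoted)
    any_of   -> terms joined with OR inside parentheses
    subtract -> terms excluded with a leading minus
    """
    parts = [query.strip()]
    parts += [f'"{t}"' if " " in t else t for t in add]
    if any_of:
        parts.append("(" + " OR ".join(any_of) + ")")
    parts += [f"-{t}" for t in subtract]
    return " ".join(p for p in parts if p)


def search_url(query, base="https://www.google.com/search"):
    """Turn the filtered query into a URL (the base URL is an assumption)."""
    return base + "?" + urlencode({"q": query})


filtered = prefilter(
    "moby dick",
    add=["filetype:pdf"],
    any_of=["melville", "whale"],
    subtract=["buy", "price"],
)
print(filtered)
print(search_url(filtered))
```

A 'form' for books, images or music is then nothing more than a fixed set of `add`, `any_of` and `subtract` lists that you tune once and reuse for every query.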
winky_stripper.htm: Winky strips for Yahoo (A Yahoo results stripper)
by Winky ;-), April 2004
A further introduction to the power of python, by Winky.
Part of the bots
section.
...on the board they were
talking about "stripping" and searching html.
Anyway I decided to make one for yahoo; it is primitive, a learning tool,
but could be expanded to handle next queries etc.
yahoo_stripper.py -> is the actual script itself
clean.html -> is the "output from the script"
To create the "docs" which reside in the html directory just run the
script thru epydoc.
This sort of tactic of stripping webpages is much more effective than
just blindly using regular expressions.
Not for beginners!
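To give a taste of what "stripping" with a parser (instead of blind regular expressions) looks like, here is a minimal sketch using only Python's standard `html.parser`. It is not Winky's yahoo_stripper.py: the class name, the function name and the sample HTML are invented for illustration, and a real results page will need its own rules for deciding which links are actual hits.

```python
from html.parser import HTMLParser


class LinkStripper(HTMLParser):
    """Collect (href, link text) pairs and throw everything else away."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None   # href of the <a> we are currently inside, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = href
                self._text = []

    def handle_data(self, data):
        # Keep text only while inside a link that has an href
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def strip_links(html):
    parser = LinkStripper()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<html><body><p>junk</p>'
    '<a href="http://example.org/a">First <b>hit</b></a>'
    '<a name="x">no href</a></body></html>'
)
print(strip_links(sample))
```

The parser copes with nested tags inside the link text, which is exactly where a quick regular expression tends to fall apart.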
Older Essays: how to build your own bots
PHASE ONE (16 July 1999)
this essay (perl_es1.htm): Perl@usa.net
~ How to reverse a "free" service has been written
by [blue] in July 1999 for the removing banners section, read and enjoy,
let's hope
you'll write afterwards your own perl-bots and send them here so that others can ameliorate
and give feedback...
PHASE TWO (22 July 1999)
this essay (rt_bot1.htm): The HCUbot: a simple Web Retrieval Bot in Perl has been written
by deep in July 1999, read and enjoy! Let's hope
you'll write afterwards your own perl-bots and send them here so that others can ameliorate
and give feedback...
PHASE THREE (14 September 1999)
this essay (botcgi.htm): Mirbot 1.0:
a very special kind of a Robot has been written
by The Mystical Friend in September 1999, read and enjoy! Let's hope
you'll write afterwards your own perl-bots and send them here so that others can ameliorate
and give feedback...
PHASE FOUR (14 September 1999)
this essay (rt_bot2.htm): The HCUbot (Version 2.0): a simple Web Retrieval Bot in Perl has been written
by deep in July 1999 and updated and ameliorated in September 1999, read and enjoy! Let's hope
you'll write afterwards your own perl-bots and send them here so that others can ameliorate
and give feedback...
PHASE FIVE (21 September 1999)
this essay (sono_bot.htm): spider.r: a handy search tool and intro to REBOL has been written
by sonofsamiam in September 1999, read and enjoy! Let's hope
you'll write afterwards your own rebol-bots and send them over here so that others can ameliorate
and give feedback...
PHASE SIX (March 2000)
[ftpbot1.htm]:
A small ftp fetcher
bot
by DarkWyrm
This bot searches a FTP site for a particular file (in Perl)
PHASE SEVEN (May 2000)
[plbtgrab.htm]:
Source code for a spam bot
(Kevin's spider) (in Perl)
by Kevin Jobson
Automatic link searching
PHASE EIGHT (September 2000)
[scan_reb.htm]:
A simple REBOL scanner
ways to retrieve hidden files, pages, zips, images
by -Sp!ke
Automatic link sniffing
PHASE NINE (October 2000)
Check [mysearch.zip]: ~ 20233 bytes
A search bot in Visual Basic
by Shoki (see [shokiwcd.htm])
PHASE TEN (February 2001)
Check [wf_add.htm]:
Adding engines to WebFerret
by Laurent (The guts of a search engines parser) Advanced
PHASE ELEVEN (April 2001)
[perlbot.htm]:
HOW TO FOOL SSL DOWNLOAD OBSTACLES
(spelunking into https "secure" servers)
by DigJim,
Very Advanced essay
PHASE TWELVE (April 2001)
- [lexi_wot.htm]:
the lexibot essay (600 engines for next to nothing - part ONE - first steps) by WayOutThere,
Advanced essay, part of the [bots],
and of the [Essays] sections.
- [lexi_lau.htm]:
the lexibot essay (600 engines for next to nothing - part TWO - delving deeper) by Laurent,
Advanced essay, part of the [bots],
and of the [Essays] sections.
PHASE THIRTEEN (April 2001)
- [cope_wot.htm]:
Reversing to Enhance and Expand (754 engines into the pot)
by WayOutThere,
Advanced essay, part of the [malware],
and of the [Essays] sections.
PHASE FOURTEEN (May 2001)
- [dolmen_1.htm]:
A PHP reformater for the PALM
by Dolmen, "So it was time for me to build a bot that, at
the AvantGo proxy request, will download the original page, extract the
info, and give it to AvantGo formatted
with a basic layout...", part of the [bots],
and [essays] sections.
- [dolmen_2.htm]:
Java Bots introduction
by Dolmen, "Here is a simple bot that downloads a page from the URL given on the command line
and outputs it preceded by the HTTP headers (if it is an http:// URL).", part of the [bots],
and [essays] sections.
PHASE FIFTEEN (May 2001)
- [bullseye.htm]:
Hitting The BullsEye
by CiNiX, "I took a little peek in the 'hidden' engine
directory and found about 897 engine files, mucho
interesting information for the people that work on the oslse project!",
part of the [bots],
and [essays] sections.
PHASE XVI (February 2002)
-
A "flamebot": A perl script that makes automatic replies to one or more usenet posters of
your choice
by Dr. Flonkenstein
part of the [Trolling lore]
and of the [bots lore]
PHASE XVII (September 2004)
-
[phpregexspider.htm], by Frank Mitchell
Learn how to write your own bots! Here you have a PHP web-spider, and a well commented one!
"It was in this sort of scenario that I found myself,
and it was from here that my spider was born. Its job: harvest a list of user names from a website"
Part of the bots
section and of the PHP Lab.
PHASE XVIII (November 2004)
-
[mhyst_del.htm]:
Deliverer: Distributed Framework for Bots
by Mhyst
Mhyst offers here, to every reader, a quite powerful 'queuer' bot, a sort of 'sandwich' for the small java snippet bots that you may want
to write yourself and that
Deliverer will schedule and send around. This is not an essay for newbies.
Sourcecode is provided.
I am confident that whoever takes the time to study (and implement) this interesting java bot will for sure
learn quite a lot of things about (java) bots. I hope that your feedback will flow, and that this first step will spawn
a lot of new small searching bots.
Part of the bots
section.
Friends' essays: how to build your own bots (WWW::Search modules in perl)
-
[bot-block.php.txt]
Alex Kemp's BLOCK BAD BOTS & anti-scrapers routine (in PHP) -- April 2006
In fact, SEO spammers may even produce something useful. Alle Achtung Alex for your BBB routine!
Not for beginners!
- [nbbw.c]: Scroogle Scraper
"Google provides a version of their main index to the public that is
free of ads. This is at www.google.com/ie on all of their data centers.
It is apparently used by some versions of Explorer for some feature.
The point is, Google provides this for public use"
by Daniel Brandt, Public Information Research, January 2005, Part of the bots section.
-
[test_for_ip_blocks.php.txt]
Hanu's test_for_ip_blocks routine (in PHP)
Not for beginners!
Advanced ~ For programmer-seekers
only.
Started in July 2000
If you are
really serious about advanced searching, the following WWW::Search modules in perl, which
are constantly tuned to return results from the
major search engines and news indexes, are
a (basic) MUST READ. It is difficult to overestimate how important this stuff is for
any Seeker. Believe me, the time you'll invest reading their code will be WELL SPENT...
oh yeah, actually
very well spent. Pay special attention to some of the comments in the code :-)
Feedback with
your own bots built on this stuff is mandatory.
[AdvancedWeb.pm.txt]
by Jim Smyser & USC/ISI, v 2.02 2000/04/04
[AlltheWeb.pm.txt]
by Jim Smyser v 1.4 2000/04/03
[AltaVista.pm.txt]
by John Heidemann v 1.6 2000/05/03
[Deja.pm.txt]
by Martin Thurn v 1.2 2000/02/24
[Dejanews.pm.txt]
by Martin Thurn v 1.25 2000/06/23
[Excite.pm.txt]
by Martin Thurn v 1.24 2000/06/19
[Google.pm.txt]
by Jim Smyser v 2.20 2000/06/08
[HotBot.pm.txt]
by Wm. L. Scheding and Martin Thurn v 1.58 2000/06/26
[Infoseek.pm.txt]
by Martin Thurn v 1.26 1999/12/10
[Lycos.pm.txt]
by Wm. L. Scheding and Martin Thurn v 1.18 2000/06/15
[Magellan.pm.txt]
by Martin Thurn v 1.19 2000/05/22
[NorthernLight.pm.txt]
by Jim Smyser, v 2.06 2000/06/16
[Opendirectory.pm.txt]
by Jim Smyser, v 1.4 2000/02/04
[Yahoo.pm.txt]
by Wm. L. Scheding and Martin Thurn, v 1.29 2000/05/10
Conclusion
Ehm, yes, you should learn [Perl] , by all means... but if you prefer to learn
[PHP] instead, all these classes
can be easily ported there, of course...
Err, this could be quite useful as well, come to think of it :-)
[X-Search.pl.txt]
by Jim Smyser, v 1.06 2000/06/14
Helping hands needed!
Enjoy!
Just take your time, there is no
hurry whatsoever,
reverse some of the scripts above, understand how they work,
try some slightly different models... implement your
own ideas... and finally write some (good) essays on this stuff
yourself!
(c) 1952-2032: [fravia+], all rights reserved