Introduction: The importance of specificity and of local & specialised searching
The importance of getting to your signal 'behind' the main search engines should be quite evident. You can obviously go regional, visit usenet, use your own bots or many other tools, but you will also find various possibilities to query specific search engines on the web.
Of course each big site may (and probably will) also have its own specific 'hyper-local' search engine. For www.searchlores.org, for instance, we have our amazing "namazu" search engine: [search @ fravia]
So-called "local" (or specialised) search engines are extremely important to fetch info: freepages providers, counters, stats applications and all other finalized search engines... that you can use at leisure... for fun, study purposes, "strafing", slowbombing and/or knowledge profit. This page is just a proof of concept, meant only to give you an idea, a scent, of the many 'local' treasures you can and will dig out on the Web.
Files and images repositories (by Kane and Fravia+)
§
This section is obsolete. For a more recent treatment of these matters, see [files repositories]
Searching file repositories can be VERY useful. As an example, try the following (banal) "bookz" search: rapidshare O'Reilly
Here is a short list of file and image repositories. Note also how, using the names of these repositories as queries, you'll discover many a juicy messageboard...
http://www.filedepartment.com/:
(Does not seem very trustworthy):
Up to 1000 downloads per file per day for FreeShare file,
Up to 5000 downloads per file per day for TotalShare file,
Up to 5 files for upload per day,
Up to 50 MB for upload per file,
10 seconds of time limitation for FreeShare file,
5 seconds of time limitation for TotalShare file,
Auto file deletion after 10 days since last download.
§
This section is obsolete. For a more recent treatment of these matters, see [deep web searching]
The main search engines cover -at best- one third of the web. This is due to the fact that a lot of information on the web is not actually sitting on the web as we know it. Data are stored in formats other than HTML, and as such cannot be indexed by spiders and crawlers. Though many databases can be searched through an internet html-interface, the data themselves sit outside the web, and are linked "on the fly" via PHP, Perl, JavaScript, CGI scripts or any other scripting technology you can fancy. This is true for the result pages of the search engines themselves, btw: every time you perform a query the search engine sends back SERPs to your browser, formatted into HTML on the fly by a script. Those specific SERPs you called into html-life did not exist before and will disappear again the moment you move on: redoubtable powers of the nethervoid.
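To make the "on the fly" point concrete, here is a minimal sketch in Python of a result page that exists only while the query is being answered. Everything in it is a stand-in chosen for illustration: the in-memory SQLite table, the sample records and the function names `open_database` and `render_serp` are assumptions, not the way any real search engine works.

```python
import html
import sqlite3

# Hypothetical records standing in for a database that sits "outside the web".
RECORDS = [
    ("Directory of Open Access Journals", "free scholarly journals"),
    ("Acronym finder", "acronym lookups"),
    ("World of Webrings", "webring directory"),
]


def open_database():
    """Create a throwaway in-memory database holding the sample records."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE resources (name TEXT, description TEXT)")
    db.executemany("INSERT INTO resources VALUES (?, ?)", RECORDS)
    return db


def render_serp(db, query):
    """Build an HTML result page for `query`; the page is never stored."""
    pattern = f"%{query}%"
    rows = db.execute(
        "SELECT name, description FROM resources "
        "WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    ).fetchall()
    items = "\n".join(
        f"<li>{html.escape(name)}: {html.escape(desc)}</li>"
        for name, desc in rows
    )
    return (
        f"<html><body><h1>Results for {html.escape(query)}</h1>\n"
        f"<ul>\n{items}\n</ul></body></html>"
    )


if __name__ == "__main__":
    # The HTML below did not exist before this call and is gone after it.
    print(render_serp(open_database(), "journals"))
```

A crawler that only follows links never issues the query, so it never sees this HTML: the records stay in the database, invisible until somebody asks for them.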
The 'invisible web' is thus made mostly out of huge commercial and non-commercial (scientific, educational, legal) databases. Moreover there are specific databases that are not fully indexed: repositories of phone numbers; product catalogs; lairs of e-mails, credit card numbers & addresses; collections of dictionaries, vocabularies, thesauri, electronic books and journals, bulletin boards, mailing lists, and so on.
Some of these 'invisible' databases are password protected. Obtaining access could involve a moderate amount of work.
[Complete planet]
"Discover over 70,000+ searchable databases and specialty search engines"
Topics:
Agriculture           Games & Hobbies        Military               Religion
Arts & Design         Government             Music                  Science
Business              Health                 News                   Search Engines
Computing & Internet  Home & Garden          Newspapers             Shopping
Education             Humanities             People                 Social Sciences
Energy                Jobs & Careers         Places                 Sports
Engineering           Law                    Politics               Transportation
Environment           Literature             Products & Technology  Travel
Family                Living things          Recreation             Weather
Finance & Economics   Magazines & Journals   References
Food & Drink          Media & Entertainment  Regional
[Info Mine]
"Infomine is a virtual library of Internet resources relevant to faculty, students, and research staff at the university level.
It contains useful Internet resources such as databases, electronic journals, electronic books, bulletin boards, mailing lists, online library card catalogs, articles, directories of researchers, and many other types of information.
Infomine is librarian built. Librarians from the University of California, Wake Forest University, California State University, the University of Detroit - Mercy, and other universities and colleges have contributed"
[Academic Info]
"Academic Info is an online subject directory of over 25,000 hand picked educational resources for high school and college students as well as a directory of online degree programs"
Don't underestimate smut-search engines as learning tools! Go to the Webmasters / chatboard & lounges part of this smut search engine in order to learn how these idiots are trapping lusers into their ads-infernos...
Acronym finder
Type in the acronym to search for without periods or quotes (example: ASAP)
Cut and paste the following example (and edit):
http://www.acronymfinder.com/af-query.asp?Acronym=ASAP&String=exact&search.x=63&search.y=9
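Instead of editing the URL by hand, the same querystring can be assembled by a few lines of Python. This is only a sketch built from the example URL above: the function name `acronym_query_url` is made up, the `search.x` / `search.y` values are copied verbatim from the example (they look like the click coordinates an image submit button sends, which is an assumption), and nothing here checks whether the endpoint still answers.

```python
from urllib.parse import urlencode

# Endpoint taken verbatim from the example URL in the text.
BASE_URL = "http://www.acronymfinder.com/af-query.asp"


def acronym_query_url(acronym, match="exact"):
    """Return the query URL for `acronym`, stripped of periods and quotes."""
    cleaned = acronym.strip()
    for unwanted in (".", '"', "'"):
        cleaned = cleaned.replace(unwanted, "")
    params = [
        ("Acronym", cleaned),
        ("String", match),
        # Copied from the example; assumed to be arbitrary click coordinates.
        ("search.x", "63"),
        ("search.y", "9"),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(acronym_query_url("A.S.A.P."))
```

Called with "A.S.A.P." the function rebuilds exactly the example URL, so you can loop it over a whole list of acronyms.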
OLGA Search Engine (courtesy of P.Cook)
If searching for a song or a band, simply enter the name, e.g. 'dont think twice' or 'pearl jam'. If searching for an individual artist (or a band named after an individual artist), enter the last name and then, if you want, the first name, e.g. 'cohen leonard', 'piaf edith', 'conte paolo'.
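The "last name first" rule is easy to apply mechanically. A small Python sketch follows; the helper name `olga_query` is hypothetical, lowercasing is an assumption drawn only from the lowercase examples above, and the last word is naively treated as the surname, so multi-word surnames would need manual care.

```python
def olga_query(name, individual_artist=False):
    """Return `name` in the word order the OLGA search form expects.

    Songs and bands are passed through as typed; for an individual
    artist the last word (assumed to be the surname) is moved to the front.
    """
    words = name.lower().split()
    if individual_artist and len(words) > 1:
        words = [words[-1]] + words[:-1]
    return " ".join(words)


if __name__ == "__main__":
    print(olga_query("Pearl Jam"))
    print(olga_query("Leonard Cohen", individual_artist=True))
```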
The following three Russian pointers have been ported here from a closed pagetool of mine: they are quite useful in order to find a lot of interesting stuff (and Bilibin's images). Enjoy!
How important this can be! A thousand "voluntary operators" at your disposal!
See also [The importance of Webrings for combing purposes]. You'll use this approach again and again once you have understood the limits of the 'main' search engines :-)
World of webrings
http://www.webringworld.org/:
World of Webrings
"This web site is the result of the collaborative effort of some experienced ringmasters engaged
in the webring community"
The RingSurf Directory search engine is SLOW, be patient...
Yahoo's webrings
Yahoo has 'phagocytosed' various webring centrals, but you can still search Yahoo's webrings!
Crickrock rings
http://www.crickrock.com/ringlist.html 'crickrock' rings:
"Face it, Yahoo! screwed you and your webring when they acquired webring.org. Move to CrickRock and solve your problems"
Recall is, strictly speaking, not a real search engine. But it gives you the possibility to check the spread and the fortunes of a given 'term' on the web. So it is, after all, a local search engine.
An interesting trick (by sonofsamiam) in order to rank, for instance, all geocities pages (are they really 132150?) through the following querystring:
http://us.geocities.yahoo.com/search?p=a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z+_+0+1+2+3+4+5+6+7+8+9&o=o&h=s
Note the rank order... and note that we lose only relatively few pages (less than 4%!) if we omit the numbers in our searchstring:
http://us.geocities.yahoo.com/search?p=a+b+c+d+e+f+g+h+i+j+k+l+m+n+o+p+q+r+s+t+u+v+w+x+y+z&o=o&h=s
This "number rarity" (only 7% of the pages: 9242) proves once more how PREDICTABLE the stringpatterns are inside webpages...
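Both querystrings can be generated rather than typed, which makes it easy to try other character sets. The Python sketch below rebuilds the two URLs given above; the function names are made up for illustration, and dropping the underscore together with the digits simply mirrors what the second URL does. The last helper works out the share of pages from the two figures quoted in the text (9242 out of 132150), taken at face value.

```python
import string
from urllib.parse import urlencode

# Endpoint taken verbatim from the querystrings in the text.
BASE_URL = "http://us.geocities.yahoo.com/search"


def catch_all_url(include_digits=True):
    """Build the 'every single character' query, with or without 0-9."""
    terms = list(string.ascii_lowercase)
    if include_digits:
        # The first URL adds an underscore and the ten digits after a-z.
        terms += ["_"] + list(string.digits)
    # urlencode turns the spaces between terms into '+' signs.
    params = [("p", " ".join(terms)), ("o", "o"), ("h", "s")]
    return BASE_URL + "?" + urlencode(params)


def share_percent(part, total):
    """Return `part` as a percentage of `total`."""
    return 100.0 * part / total


if __name__ == "__main__":
    print(catch_all_url(include_digits=True))
    print(catch_all_url(include_digits=False))
    print(round(share_percent(9242, 132150), 1))
```

With the figures quoted above, 9242 pages out of 132150 come to roughly 7%.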
Fortunecity is a collection of free pages, launched in November 1996 and since transformed from a UK-based (London) company into a full-fledged international operator with main strongholds in the UK, Sweden and Germany. It recently bought France's Citeweb. 3.5 million sites, growing.
The search engine is an Inktomi-type. Geocities pages are often used as 'repositories' for files you may be searching for (often with faked endings to avoid sysads snooping): mp3 files abound.
Directory of Open Access Journals http://www.doaj.org/:
Directory of Open Access Journals.
This service covers free, full text, quality controlled
scientific and scholarly journals. We aim to cover
all subjects and languages. There are now 2209 journals
in the directory. Currently 604 journals are searchable
at article level. As of today 95820 articles are included in the DOAJ service.
http://www.itcompany.com/inforetriever/: Internet library for librarians,
"A Portal Designed for Librarians to Locate Internet Resources Related to Their Profession", very americanocentric.
CUI: Computer Science Library: (Centre Universitaire d'Informatique, Uni Genève),
"This database lists all the publications in the CUI's Computer Science library."