essays |
---|
Old friend, I have been busy. I've written several new items which may contribute to our community. Please feel free to post them on your site as you see fit... |
Google has a wild side and you didn't know it!; Search Engines that do More, Part One: Vivisimo; Wisenut Vagaries; Search Engines that Do More, Part Two: Teoma; Fuzzy Logic Searching, The NEAR Command; Searching With Fuzzy Logic: The ADJ Command |
Usually search engines allow you to use an asterisk for wildcard searching. This means that looking for manage* will find instances of manager, manages, managed, managing, and so on. Google has taken the position that its search is so vast and accurate that there is no need for a wildcard, and therefore it does not allow its use. Well, we discovered an undocumented way to use so-called wildcard searching with Google. As with many other search engine concepts, Google again has broken the rules. In Google terms the * happens to be a wildcard that replaces an entire word, not just the last part of it as in the above example. If you use * connected to a keyword by any of the characters Google ignores, like = , ; \ / < and >, then it acts as a placeholder or wildcard for "any word."
To compare: my resume gets over 2MM results, but "my resume" gets 353k results and is exactly the same as my=resume, my/resume, myresume and so on. With a single connector and asterisk between the keywords, one word separates my from resume and the search returns 47k results. The most useful aspect of this discovery is that if you use one more connector and asterisk, the results have two words between my and resume, like this: my-*-*-resume, which returns only 28k results.
Try this search using different keywords used to name resumes (vitae, CV, skills, experience), or combinations of unique skills, or city and state duets when the city name is found in many states, like Rochester, which is found in 27 states.
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Search Engines that do More, Part One: Vivisimo: Clustering Search Engine
Searching isn't always an absolute matter. Sometimes there is a need for search engines that do more than just bring back surgical results. There are four examples of new breed engines that provide additional benefits for recruiters besides long pages of links. Today we will cover the first of four examples.
Although it may look like a meta-search, Vivisimo goes miles above and beyond simply getting content from other search services. What it does so differently is automatically cluster results into topics. Similar to the former (not so former, hehe) Northern Light and InFind, Vivisimo is more advanced because it's totally automated and extremely fast. It dynamically returns search results in relevant topic clusters. This differentiates Vivisimo from meta-searching because you can drill down through layers of categorization organized far more intelligently. I find a major selling point of this new service is that Vivisimo does not report results from "pay for placement" search engines like other meta-searches do, hence you will find fewer commercial site results.
Clustering is indispensable when you want a complete overview of a topic or when you would like help in narrowing your search. This is the only automated, hierarchical, conceptual, just-in-time clustering engine available today. Because it is automated, not manual, the categories are created on the fly, and they are much narrower and particularly accurate. That's good for Competitive Intelligence and Recruitment Research, but read on to learn about other advantages.
Vivisimo removes "most likely" duplicates. Other meta-searches attempt to remove duplicates, but near-duplicates often slip into the results because they are not exact duplicates. They could be a newer version or for some reason have slightly different content. Vivisimo broadens the definition of a duplicate to cleverly remove results that would otherwise slip by meta-search scrutiny.
Another reason to take a serious look at Vivisimo is that it offers total control. Best results are obtained when searching with total control. Traditional meta-searches frequently fail to meet our expectations because they don't offer the granular control afforded by advanced field search commands like image:, title:, url:, link:, host:, site:, domain:, related:, and text:.
In addition to every form of Boolean like "AND," "OR," "AND NOT" and even "NEAR," Vivisimo also handles all those field search commands. If I haven't turned you on to Vivisimo yet then this will cinch the deal: you can Save and Email your search results! Imagine how useful that is. In the past saving and emailing was accomplished only with heavy artillery. Click on the SAVE link in the yellow frame at the bottom right corner of the Vivisimo screen and a new page is loaded that contains all the data in one file. Save the entire page to disk, not just the link, or email it directly from your browser. Netscape 4 does not save the page well, so use another browser to save it, but you can use I.E. 5 or higher and Netscape 4 or higher to view the results.
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Wisenut Vagaries
This recently launched search engine has grown quickly. It is easy to use, with a simple interface and powerful features. Wisenut uses a relevance engine similar to Google's PageRank, looking at both link structure and popularity. That is not the end of the comparison, but I will state that in my opinion Google is not at risk of losing its number one spot to Wisenut any time soon.
Wisenut offers some things to help you refine your search, along with some features that make it an excellent additional search engine. Primarily, it automatically categorizes your results into "WiseGuides" that are related to words in your query. Each WiseGuide displays the number of results it contains. When you run a search, open up a WiseGuide category by clicking on a white text link on the black background above your search results. Clicking the plus icon next to the category opens the search results for that category and reveals any additional subcategories. Each category has a link to its right called "Search This," which runs a new search using the category itself as the new query.
Like many other search engines Wisenut compresses results from individual sites; the difference is that they created a very convenient "See X more pages from this site!" format. Instead of the plain old "more results from this site" link, Wisenut lists the exact number of pages on a site that it has determined are relevant to your query. The niftiest innovation on their results pages is the ability to Sneak-a-Peek, which opens the target page in a small window below the result URL. These peeks may be cached pages, and they eliminate some mouse movements, thus saving time.
One of the main reasons this new search engine is very useful for recruiters is its size and freshness. Although it's not as big as Google, Wisenut's index is growing quickly. Because Wisenut's robot can allegedly read 100 million URLs a day, it's likely that it will be able to give you fresh results even from pages only recently added to the Internet.
Thanks to www.searchenginewatch.com and www.researchbuzz.com for inspiration in this series. Also visit www.jobmachine.net to see the Search Engine Rankings for Recruiters and the brand new Spyglass.
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Search Engines that Do More, Part Two: Teoma
Size is important when judging a search engine database, but without good relevance ranking, a large index is practically useless for recruiters. Teoma joins the game with a new way to analyze and rank pages. Searching Teoma's index brings back pages that contain the search terms, like any other search engine, but also pages that may be very relevant based on link context. Matching documents are organized into what are called communities, then ranked based on link popularity within the community. Fine, but what relevance does that have to recruiting?
Looking at the Resources section we find that search results include a list of collections from "Experts and Enthusiasts" on the bottom right side of the page. These "Resources" are pages listing relevant collections of links to other sites and resources for your search topics. Or, in other words, Weblogs with subjects related to the search keywords. Weblogs are created and maintained by people who consider themselves experts, or at least connoisseurs, on a particular subject. They spend vast amounts of time organizing the links they consider relevant to their chosen field. These pages make great resources for our recruitment efforts: their authors are excellent additions to our network, and their collections are incredibly useful lists of links in their chosen field. Effectively, what this means is we are searching "other people's searches," or searching what others have searched before.
For example, a search for "optical network" brings up http://www.gmpls.org/ as one of the top resources. Going to the site we see a plethora of white papers, MPLS tutorials, presentations, standards organizations, and many technical documents from hundreds of optical organizations. It's like a set of perfectly customized search results! You can also contact Mr. Vinay Ravuri via his email address listed on the site. It's quite likely he would welcome questions and be helpful. Other link collections include a page linking to all SONET vendors, a list of all telecom companies from the United States Telecom Association, and an article about Supercom 2000 describing and linking to all the upstart and major competitors in the Optical space.
Besides the traditional search engine results you would expect, one more reason to check in with Teoma along with other popular search engines is its ability to provide highly relevant, or authoritative, results. Teoma's results ranking is based on what they call Subject-Specific Popularity.
According to Teoma, Subject-Specific Popularity analyzes the relationship of sites within the list of results, ranking each site based on the number of same-subject pages that reference it. In other words, Teoma claims they provide the best answer to search queries because by analyzing site peers they can establish authority for the search result. Keep in mind that while results are highly accurate and relevant, as with Teoma's owner Ask Jeeves, they lack in volume. Our "optical network" search yields an estimated 300,000 results in Teoma but over one million in Google.
Interestingly, Teoma's search results were some of the most authoritative Optical Networking sites around, like the National Transparent Optical Network Consortium (NTONC), Sycamore, Ciena and the All-Optical Networking Consortium. In contrast, Google displayed a great selection of unique sites, but on the first page of results it was hard to pick out authoritative sites like those on Teoma's first page. There was, however, a small amount of overlap - both engines picked up NTONC and Salira in their first page of results.
Refining a search is easy because Teoma provides a section on the top right specifically designed to suggest additional relevant keywords, which can be added to your search. Clicking on one of the links under the "Refine" section automatically adds those words to the search and returns a new set of refined results. Our example search on "optical network" offered five refining choices. The last one in the refine list was "Fiber, Manufacture." Clicking on it brought back results on a new search for "Fiber, Manufacture, optical network."
Teoma does not support special syntaxes, advanced Booleans, wildcards, stemming, or field commands like inurl: and such. A search for "optical network intitle:resume," for example, is not possible.
One final treat: just like at Google, it's clear who the advertisers are on Teoma because their ads are placed under a section dedicated to paid results - the "Sponsored Links" - making them easier to avoid.
The Teoma advantage is being able to find out which sites are the most relevant to a search. It is by no means a conclusive search, or a way to locate rare jewels and hard to find pages, but it is an invaluable tool in the CyberSleuth's ToolBag.
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Fuzzy Logic Searching, The NEAR Command
Fuzzy Logic as related to Booleans and Internet Searching refers to commands that are not necessarily precise or based on definite mathematics. The Boolean search command AND is very precise, for example, requiring search results to include both terms. Each term must be present, regardless of location on the page. In comparison, the fuzzy command NEAR requests results where one term is close to the other. It is considered a fuzzy logic term because the definition of "close" is left to be interpreted.
Fuzzy terms can be interpreted in many different ways. For example, how near is NEAR? How many words apart can the search terms be and still be considered? Direction can also be left open to interpretation. Does NEAR mean close by on the right, left or both sides?
The NEAR fuzzy logic command can be used on the AltaVista search engine. There are other search engines besides AltaVista that are better at handling this kind of fuzzy logic searching. Because most readers have used AltaVista, let's review the use of NEAR on that familiar search engine for the purposes of demonstration.
NEAR searching is very useful for opening up a narrow search to include other possible combinations of a set of words. AltaVista interprets NEAR as within 10 words to the left or right of the first term.
Like this: Nurse NEAR licensed
That will return pages containing the term "Nurse" where it appears within ten words of "licensed." Results include all the types of licensed nurses like "Licensed Vocational Nurse" and "Licensed Practical Nurse." But also included are other results like "Registered nurse in emergency room. Provided and supervised licensed..." where Nurse is 7 words away from Licensed. In contrast, the use of the fuzzy logic search term NEAR excludes results like a "Licensed Driver" who was a "Sketch Nurse" in a play in Wisconsin (read her resume at http://suzanneadams.com/resume.htm).
To further illustrate, in AltaVista a search for "nurse NEAR licensed AND title:resume" returned 86 documents, while "nurse AND licensed AND title:resume" returned 141. There are fewer results with the use of the NEAR command. Fewer results may signify a more accurate search, especially when the narrower search is successful in eliminating a large percentage of the undesirable results. In the above example using the NEAR command proved to be a more accurate search, eliminating pages similar to the Denham Personnel Services page and the Rehabilitation Recruitment Center page.
Other search engines define NEAR differently. On AOL Search NEAR can be defined by the user. At Lycos NEAR is defined to be within 25 words.
Fuzzy terms like NEAR can assist in making many searches more accurate. If you would like examples of how to apply the NEAR command to your search, drop us a line describing one of your current searches and which search engine you favor. Join us in two weeks when we will explore other fuzzy terms.
~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-~-
Searching With Fuzzy Logic: The ADJ Command
Fuzzy search fun doesn't end with just the NEAR command we reviewed in the last issue. To broaden the horizon a little we use two other extremely powerful search engines, one old and one new: AOL and Vivisimo.
SEARCH AOL: http://search.aol.com
AOL has the little known ability to search with the Boolean NEAR, which we have used for many years, but also the ability to use the search commands ADJ and W/n. "What is that?" you ask. ADJ means directly adjacent. With it we find documents that contain specific keywords directly in front of or behind a primary keyword.
ADJ is different from "double quotes" for three reasons. First, ADJ in AOL Search automatically allows for root word variants or truncation as in "program," "programming," "programmer" and so on. Second, ADJ can connect complex expressions. For example: (engineer or developer or architect) ADJ software finds items containing either software engineer, software developer or software architect. Finally, unlike "quoted phrases," your words can be on either side of each other, not necessarily in the exact order found within the quotations. To illustrate, if we were to use quotations to find both versions of database next to design we would have to use ("design database" OR "database design"), but by using the ADJ command all we need to do is search for: design ADJ database
That's not just easier, it's also a bit more accurate!
WITHIN
On Search AOL, within is a command expressed as W/n where n is any number. W/n is a proximity operator that gives us the power to manually set how close we want things to be. It will find documents where specific keywords occur within a specified number of words - n words - to the right of the primary keyword. Any whole number can be used for "n".
Example: optical W/5 engineer finds documents in which engineer occurs within five words after, to the right of, optical - as in optical systems engineer, optical board level design engineer, optical long-haul systems engineer, etc. It will look only for words in the order of "optical" first, then any other words numbering up to five, and finally "engineer," but not the inverse.
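To make the differences between NEAR, ADJ and W/n concrete, here is a small Python sketch of the matching rules as this article describes them. It is only an illustration, not how AltaVista or AOL Search actually work under the hood. The function names (near, adj, within) and the simple tokenizer are made up for the example. It matches exact words only, so it leaves out the root word variants that AOL's ADJ allows, and it assumes W/n means the second word sits at most n word positions after the first.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, term):
    """Return every index at which `term` appears in the token list."""
    term = term.lower()
    return [i for i, tok in enumerate(tokens) if tok == term]


def near(text, a, b, window=10):
    """NEAR: `a` and `b` occur within `window` words of each other, on either side.

    The default of 10 follows the AltaVista behavior described in the article.
    """
    tokens = tokenize(text)
    pos_a, pos_b = positions(tokens, a), positions(tokens, b)
    return any(0 < abs(i - j) <= window for i in pos_a for j in pos_b)


def adj(text, a, b):
    """ADJ: `a` and `b` are directly next to each other, in either order."""
    return near(text, a, b, window=1)


def within(text, first, second, n):
    """W/n (assumed reading): `second` appears at most n words to the right of `first`."""
    tokens = tokenize(text)
    pos_first, pos_second = positions(tokens, first), positions(tokens, second)
    return any(0 < j - i <= n for i in pos_first for j in pos_second)


if __name__ == "__main__":
    snippet = "Registered nurse in emergency room. Provided and supervised licensed staff"
    print(near(snippet, "nurse", "licensed"))                           # True (7 words apart)
    print(adj("database design", "design", "database"))                 # True (order does not matter)
    print(within("optical systems engineer", "optical", "engineer", 5)) # True
    print(within("engineer of optical systems", "optical", "engineer", 5))  # False (wrong order)
```

The only real difference between the three functions is the size of the window and whether direction matters, which is exactly what makes these commands "fuzzy": each engine gets to pick its own numbers.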
http://www.vivisimo.com
An automated, hierarchical, conceptual, just-in-time clustering engine, Vivisimo is much more than a meta-search. There are many reasons, but the most relevant for this article is its ability to offer total control. Vivisimo offers the use of advanced commands like image:, title:, url:, link:, linktext:, host:, site:, domain:, related:, and text:, in addition to every form of Boolean, both traditional and Fuzzy, like AND, +, OR, |, AND NOT, -, NEAR and ~.
Since this is not a search engine of its own but rather gets results from Yahoo, MSN, Fast, Netscape, Open Directory, Direct Hit, Looksmart, AskJeeves, Lycos, AOL and HotBot, the advanced commands are used as they would be with the search engines directly. The absence of Google and AltaVista is purposeful. Also, be aware that NEAR is only used by AOL and Lycos, and that on Lycos NEAR means within 25 words. Vivisimo should handle command translation, so that "host:" translates to "url.host:" for Fast and to "domain:" for HotBot.
Clarification on who uses what commands and how can be found on Danny Sullivan's easy reference chart at: http://www.searchenginewatch.com/facts/ataglance.html
ADVthanksANCE!
Shally Steckerl (JobMachine, Inc ~ jobmachine.net)