Search tips, FAQs and access to documentation for WebFinder V5.0 and previous versions.

In this page you will find:
General search tips
About formulating queries
About selecting search engines
How to increase the speed
How to get more results returned
Engine plug-ins tips
About plug Editors
Important settings
About stripping distracting info

 

Documentation in pdf format
Detailed documentation for v. 5.0 and 4.5
Troubleshooting V5.0
FAQ
Troubleshooting former versions
FAQ for V4.5, 4.0, 3.x and Factory 2
Filtering results with V5.0
All WebFinder's filtering options
Result plug-ins
What can Result plug-ins do?

General search tips

Queries

Choose your keywords carefully and use synonyms
Unless you are searching for something very specific, you can miss interesting pages if you don't describe what you want in several different ways. You can do this with a long query made of several keywords and synonyms or, better, with WebFinder's multi-query feature, which submits several queries in a single operation.
In the first case you would not apply search conditions such as "Exact phrase"; in the second, you could use sentences relevant to the topic you're searching for.

Keep the proportion of noise low
Your queries must mainly consist of important keywords. Non specific words (such as "the", "for", "to", "some" etc.) are referred to as noise. Many engines do not take into account some "noisy" words in order to make their service faster. These ignored words are called stop words and vary from one engine to another. Some engines use a list edited by a human, while other engines use a dynamic list based on frequency.
Another point is to avoid using very popular words in your queries because engines will have hundreds of thousands of possible results. One such example is "music". Try to be as specific as possible. If you don't know the title of the music piece try with the author's name plus the musicians' names plus the type of music and so on.
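As a rough illustration of the "noise" idea above, here is a minimal Python sketch that measures what share of a query is made of stop words and strips them out. The STOP_WORDS list and both function names are assumptions made for this example; each engine uses its own list, and this is not how WebFinder itself processes queries.

```python
# Hypothetical stop-word list; real engines each maintain their own.
STOP_WORDS = {"the", "for", "to", "some", "a", "an", "of", "and", "in", "on"}


def noise_ratio(query):
    """Return the fraction of words in the query that are stop words."""
    words = query.lower().split()
    if not words:
        return 0.0
    noisy = sum(1 for word in words if word in STOP_WORDS)
    return noisy / len(words)


def strip_noise(query):
    """Drop stop words, keeping the remaining keywords in their original order."""
    return " ".join(w for w in query.split() if w.lower() not in STOP_WORDS)
```

A query with a high ratio is a good candidate for rewording with more specific keywords before it is submitted.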


Search engines selection

WebFinder has the ability to search many engines, either individually or simultaneously. However, depending on what you're looking for, some engines will be relevant, others nearly pointless.
Some engines are just a branded version of a main engine. Including them in a set with the original engine is unlikely to produce anything useful and will only slow down your searches.
Knowing better how each engine responds will allow you to build smaller, better targeted sets that cover your needs and respond with speed and accuracy.

Avoid compiling sets using engines which exhibit different behaviors
Web-based engines, file archives, online shopping sites, electronic encyclopedias and other types of search engines all return different types of results.
It is not a good idea to combine engines differing in their behavior in a single set. It would only provide you with inconsistent results and would also minimize the potential effectiveness of any filters you are currently using. Besides, any post processing by means of Result plug-ins might also not function properly.
For instance, do not place in the same set a general search engine such as Altavista, whose plug-in only keeps results, and an online retail shop's engine, whose plug-in is set to fetch the full result page without any alteration. With Altavista the links correspond to web pages that you can have scanned by the WebFinder Page Crawler; with the e-shop the links correspond to a shopping cart.
Building homogeneous sets is a key point for search efficiency.

Meta search engines are useful to prepare a complex search
Meta search engines search several engines at a time. Some, such as Dogpile.com, bring back raw results; others, such as Mamma.com, apply some post-processing and present more relevant results.
So, if you have to run a complex and comprehensive search, it is a good plan to have a set made of a few clever meta search engines which can quickly test your queries and help you determine which kind of filtering to use. Then you can build a list of appropriate queries and launch a big search with a large set of engines, using the multi-query feature provided with WebFinder.


How to increase the speed

There are two basic factors which condition the speed with which results are found:
Turn on "True Simultaneous Searching" in the Preferences
This option can be switched on in the Advanced pane of preferences. When enabled, WebFinder will search all engines in the current set at the same time, instead of one after the other. Because this option is downloading results from all engines simultaneously it will use more memory but the time taken to search is significantly shorter.
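The difference between searching engines one after the other and searching them at the same time can be sketched in Python as follows. This is only an illustration of the principle: query_engine is a stand-in that sleeps to imitate network latency, and none of these names come from WebFinder.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def query_engine(engine, query, delay=0.05):
    """Stand-in for one engine request; sleeps to imitate network latency."""
    time.sleep(delay)
    return [f"{engine}: result for '{query}'"]


def search_sequential(engines, query):
    """Query each engine in turn; total time is the sum of all delays."""
    results = []
    for engine in engines:
        results.extend(query_engine(engine, query))
    return results


def search_simultaneous(engines, query):
    """Query all engines at once; total time is close to the slowest engine."""
    # One worker per engine so that every request is in flight together.
    with ThreadPoolExecutor(max_workers=max(1, len(engines))) as pool:
        batches = pool.map(lambda engine: query_engine(engine, query), engines)
        return [result for batch in batches for result in batch]
```

Both functions return the same results in the same order. The simultaneous version holds every engine's response in memory at once, which matches the trade-off described above: more memory used, less time spent waiting.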

Turn off the Page Crawler
As the Page Crawler has to download and scan every web page linked to by the results to find your keywords in the body text and produce a summary, searching takes much more time when this option is switched on.
If you still want a filter that provides relevant results, although fewer of them, you can choose a fast filtering option instead.

Note: some engines such as HotBot or AltaVista offer a text-only version which is faster, so it's a good plan to have a plug-in for these text versions.


How to increase the number of results returned

There are several factors which help determine the number of results that are found:

Increase the number of result pages per channel in the Preferences
A "page" of results is the set of results that a particular engine shows you at one time. Most engines limit you to 10-20 results per page, with a "Next" link at the bottom to move to the next set (or page) of results.
The number of result pages to be downloaded from each search engine can be modified in the Results pane of preferences. Increase it by a few units*, especially if you intend to apply strict filtering conditions to your search.
Relevant pages may be buried deeper than the first two or three pages.
*Note that not all engines return more than one page of results.

Give your filtering a wider base to work from
Filtering by "Link titles" is an extremely restrictive criterion.
Searching for an "Exact phrase" is not always the best means to get a large amount of results and is only justified if you are looking for something as specific as a title (movie, song, book, etc.)
Prefer filtering criteria that will allow a larger amount of pages to match, for instance filtering on "Summary plus link title" instead of just Link title, crawling for at least one keyword rather than "Every keyword".

Use several queries in a single operation
Very often, finding pages of interest is a matter of formulation. Languages offer many ways to build a query. Unless you are looking for something extremely specific (such as a name or a trademark) you can use several different words to describe what you are looking for.
For example:
You have installed a home theater system and you are not satisfied with the acoustical result due to reverberation in your room.
Simultaneously search for acoustical control, reverberation control, acoustical correction, sound reverberation absorption, sound absorbing materials or coatings and so on. It is almost certain that your search will produce enough results for you to find the best adapted to your particular case, from "add carpets and thick curtains" to sophisticated technical solutions including special sound absorbent coatings, foam panels trapping reverberation, etc.
To enter more than one line of search terms, expand the search pane in the application main window, enter a first query in the query box, press the "Add" button, introduce a second query, add it and go on until you think the subject is sufficiently covered, then press the "Search" button.

Using several queries increases the number of results while allowing stricter filtering conditions to be applied for increasing relevancy of results.

Understand the engines you are interrogating
Engines such as file archives, encyclopedias, and other specialized databases may only give a brief description (sometimes none) with each result and may link to downloadable files, pictures, sounds and other types of data that can not be scanned by WebFinder's Page Crawler.
In such cases filtering may produce a loss of relevant results.
If you want to better understand how WebFinder's filtering options work, you can read some explanations now.


Plug-in creation tips

Simple Editor vs Advanced Editor

Simple plug-in Editor makes it easy to create plug-ins to perform specific tasks
The Simple plug-in Editor (SPE) creates single field plug-ins, that is, plug-ins incorporating given search parameters. The Advanced plug-in Editor is intended to create multi-field plug-ins, that is to say, offering the user the possibility to set some search parameters, from within the WebFinder main window.

A multi-field plug-in is one that contains more fields than just the standard search field (extra text fields, check boxes, popup menus, etc.). Although it may sound more attractive to have multi-field plug-ins, you must understand that such plug-ins are difficult to combine into sets because their default parameters may not match. As a matter of fact, you can only set parameters different from the default ones (default values are set when you create the multi-field plug-in) if the multi-field plug-in is in a set by itself.
Let's say that you intend to search the German web (pages hosted on .de servers or pages hosted in Germany, whatever their top level domain) for pages in the Spanish language.
Unless you set default parameters for all multi-field plug-ins in the set to do that, you may get unpredictable results, as the engines may not accommodate your query correctly.
Understand that multi-field plug-ins are convenient and flexible when they are in a set by themselves, so if you have a preferred search engine and you fully rely on it to spot information, it may be ok.

Another difference between the two plug-in Editors is the way they work:

  • the Simple plug-in Editor requires you to perform a search with your browser where you set the search parameters and enter the word "test" as the query. Then you copy the url for the result page displayed in the browser to the action url field in the SPE window.
  • the Advanced plug-in Editor is intended to automatically decode the engines' search forms. However, due to the extreme diversity of engines and the total lack of standards in the way engines encode their forms, the APE may show some stability problems on certain computers running certain versions of Mac OS.
    The APE is included in the WebFinder distribution as a "left over" from previous versions, and is not supported, unlike the SPE.
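To make the role of the "test" query clearer, here is a minimal Python sketch of how a result-page url captured this way could serve as a template: every parameter whose value is "test" is replaced by the real query, and the other parameters (the search settings chosen in the browser) are kept. This is an assumption about the general principle, not a description of WebFinder's internals; the function name and the example.com url are hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def build_search_url(action_url, query, placeholder="test"):
    """Swap the placeholder query value for the real query, keeping other parameters."""
    parts = urlsplit(action_url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    swapped = [(key, query if value == placeholder else value) for key, value in pairs]
    # urlencode escapes the query, e.g. spaces become "+".
    return urlunsplit(parts._replace(query=urlencode(swapped)))


# Hypothetical action url copied from a browser after searching for "test".
url = build_search_url(
    "http://www.example.com/search?q=test&lang=es", "sound absorbing panels"
)
```

Because the fixed parameters travel with the url, a plug-in built this way always searches with the settings that were chosen in the browser.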

Professional-level searching requires specific search conditions
Intelligence searching (e.g. tracking your main competitors' activities on the Internet), corporate research, or monitoring web site rankings in search engines over a period of time are just a few examples of what some businesses do on a daily basis.
For instance your goal might be to watch what your French competitors X and Y do on the web in Spanish speaking countries.
WebFinder's Simple plug-in Editor makes it easy to set up this task. You can quickly create plug-ins that search the French web for pages in Spanish, plus plug-ins that search the World Wide Web for pages in Spanish, plus plug-ins that search regional search engines dedicated to Spanish speaking countries (Spain, USA, Mexico, Argentina, etc.). You can then create a giant set or three more specific sets.
Results will be homogeneous enough to allow data extraction, post processing and storing by helper applications (e.g. filtering and exporting to a database the results that link to your competitors' main sites, plus their subsidiaries', plus their main distributors').


Important settings

Watch out for search engines which contain their own Search Condition options
WebFinder has built-in support for applying standard conditions such as "Search for exact phrase", "Search for Every Word" etc. However, many engines also provide these types of options on their search page (e.g. a pop-up menu containing "Any", "All", "Exact").
To prevent your searches being corrupted by incorrect condition syntax make sure (when creating plug-ins for engines which have these options) to select the option that corresponds to "Any" when performing the search for "test". This option is the same as WebFinder's 'Basic' parameter and does not change the search terms in any way.

When to use the option "This plug-in ignores all filters"?
Not all engines work well with WebFinder's built-in page filters. When creating plug-ins for file archives, online dictionaries/encyclopedias, and other types of engines which usually provide little more than a follow-up link, it is recommended that you check this option, as it bypasses the Page Crawler, Summary Filters and the Strip Duplicates option. This tells WebFinder that the plug-in searches for very specific information that might link to downloadable files instead of web pages and may not provide summaries.


Stripping undesired information

One of the main difficulties in searching the Internet is the large number of result pages that contain distracting, useless or unwanted information. Both plug-in Editors incorporate features to strip out this type of content.

WebFinder lets you decide which information you want to keep

Some result pages sent by engines need to be kept as they were produced (for instance a table listing items references and descriptions, prices, etc.).
Other result pages are stuffed with advertising, links to other categories, links to possible searches, links to localized versions of the same engines or links to other partner search engines etc, all of which can be distracting, or, worse, make it impossible to automatically post-process results.
Provided the service conditions of the search engines you want to create a plug-in for do not forbid or restrict it, you can use two different methods to strip out the unwanted content:
  • Using fuzzy logic
    WebFinder will try to guess how results are encoded. This method provides a satisfying way to remove distracting information. But it won't produce 100% pure results at the first go.
    Therefore you can fine tune your plug-in by manually entering information in a box titled "Remove links containing". For instance if Fuzzy logic did not guess that "www.adlinks.com/clickthru?tg=92384" is a distracting link and provided that no result url is based on adlinks.com, you would type "adlinks.com" in the Remove Links box.
  • Using tags
    Some engines use marker tags to clearly separate a result section from the rest of the page (these tags are often called List Start and List End) and results between themselves (often called Item Start and Item End).
    By examining the result page source code you can figure out what these tags are and copy them into the corresponding boxes in the plug-in Editor window. Another possibility is that no marker tags are used, but the html code has something specific (different cell format in a table, different style parameters, etc.) to the result section. A string containing a portion of html code can be used as List and Item tags in the plug-in Editors.
Using Tags when possible is a straightforward way to produce clean results, but keep in mind that search engines change their presentation from time to time. When such a change occurs, a plug-in using the Tags method may fail completely to extract results and will need to be edited.
On the other hand, Fuzzy Logic makes maintenance easier, as a plug-in will still work after a change, although the "purity" of results may suffer.
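The Tags method and the "Remove links containing" box described above can be pictured with the following Python sketch. It is a simplified illustration under stated assumptions, not WebFinder's actual extraction code: the function names are invented, and the markers in the usage example are hypothetical.

```python
def extract_items(html, list_start, list_end, item_start, item_end):
    """Return the text between each Item Start/Item End pair inside the list section."""
    begin = html.find(list_start)
    if begin == -1:
        return []  # Marker not found, e.g. after the engine changed its layout.
    begin += len(list_start)
    end = html.find(list_end, begin)
    section = html[begin:end] if end != -1 else html[begin:]

    items = []
    position = 0
    while True:
        start = section.find(item_start, position)
        if start == -1:
            break
        start += len(item_start)
        stop = section.find(item_end, start)
        if stop == -1:
            break
        items.append(section[start:stop].strip())
        position = stop + len(item_end)
    return items


def remove_links_containing(links, fragments):
    """Drop every link that contains one of the unwanted fragments."""
    return [link for link in links if not any(f in link for f in fragments)]


page = "<div>ads</div><!--list--><li>A</li><li>B</li><!--/list--><li>footer</li>"
items = extract_items(page, "<!--list-->", "<!--/list-->", "<li>", "</li>")
```

The empty result returned when a marker is missing shows why a Tags-based plug-in can stop extracting anything once an engine changes its page layout.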

Documentation in pdf format

Troubleshooting V 5.0

Former version FAQs

 

Filtering results with WebFinder V 5.0

In this section:
Fast filtering:
Strip duplicates filter
Summary filter
Link title filter
Summary + link title filter
In depth filtering:
Page Crawler intro
Passive page crawler
Active page crawler

Fast filtering

Eliminating pages already found earlier in the search (strip duplicates filter)
"Strip Duplicates" removes results that have already been found in the current search.
Unless you want to see exactly which pages each engine of a search set has returned (for instance if you check the position of your page(s) or your competitors' on several engines for given keywords), you should usually have this option enabled as it reduces clutter in the results returned.
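In outline, stripping duplicates amounts to keeping only the first result seen for each url. The Python sketch below assumes that results are dictionaries with "url" and "engine" keys and that two results are duplicates when their urls are identical; both points are assumptions made for the example, not documented WebFinder behavior.

```python
def strip_duplicates(results):
    """Keep the first result for each url, preserving the original order."""
    seen = set()
    unique = []
    for result in results:
        if result["url"] not in seen:
            seen.add(result["url"])
            unique.append(result)
    return unique


found = [
    {"engine": "EngineA", "url": "http://www.example.com/acoustics"},
    {"engine": "EngineB", "url": "http://www.example.com/acoustics"},
    {"engine": "EngineB", "url": "http://www.example.com/panels"},
]
unique = strip_duplicates(found)
```

Note that the second engine's copy of the first page is dropped, which is exactly the information you lose when checking page positions across engines.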

Checking Summaries for relevant pages (summary filter)
WebFinder has the ability to scan the summaries returned with each result for the keywords or phrases searched for. With this option on, any result whose corresponding summary fails to contain at least one keyword will be discarded.
Search engines apply their own criteria to produce this summary. Sometimes it is based on the meta description provided by the page author, sometimes it is based on the first words found on the page, or on a supposedly relevant extract. Page authors are often conscious that search engine summaries are critical to them, so the "Summaries Filter" is a good way to find pages that have a good probability of being relevant to your queries - of course it eliminates those pages whose authors don't pay attention to search engine summaries.


Checking Link Titles for relevant pages (link title filter)
"Link title" is the text of the link to pages found by search engines. Very often it is an extract of the page title, not the url of the page.
So if you are looking for chocolate and the link title is "Largest choice of fine chocolate" the corresponding page is likely to deal with chocolate. However this approach is quite restrictive as any page which does not incorporate at least one required keyword in its link title will be stripped off.
Unless you specifically want to check only pages with keywords in their link title, it is suggested you use the next filtering option, which is a combination of this filter and the former.


Checking Summaries and Link Titles for relevant pages (summary + link title filter)
This option keeps only the pages whose summary OR link title contains at least one of your keywords.
It is a combination of the two former options and will provide more results with a high probability of relevancy.
If your search engine set incorporates a lot of channels and you set preferences to bring back several pages from each channel you take the risk of getting too many results to browse. This is the time to use the Strip Duplicates filter combined with the Summary and Link Title filter to target pages with a high probability of relevancy.
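The three fast filters that check text (summary, link title, and the combination of the two) follow the same rule: keep a result if the text being checked contains at least one keyword. A minimal Python sketch, assuming results are dictionaries with "title" and "summary" keys and that matching is case-insensitive (both are assumptions for this example):

```python
def contains_any_keyword(text, keywords):
    """True if at least one keyword appears in the text, ignoring case."""
    lowered = text.lower()
    return any(keyword.lower() in lowered for keyword in keywords)


def fast_filter(results, keywords, mode="summary+title"):
    """Keep results whose summary, link title, or either one contains a keyword."""
    kept = []
    for result in results:
        in_summary = contains_any_keyword(result.get("summary", ""), keywords)
        in_title = contains_any_keyword(result.get("title", ""), keywords)
        if mode == "summary":
            keep = in_summary
        elif mode == "title":
            keep = in_title
        else:  # "summary+title": either one is enough
            keep = in_summary or in_title
        if keep:
            kept.append(result)
    return kept
```

Because the combined mode accepts a match in either place, it can never return fewer results than the summary filter or the link title filter alone.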

 

In-depth filtering: the page crawler

WebFinder has a built-in page crawler that, when activated, scans all html pages returned as a result of your search to check for the presence of your keywords in the body text. For each page scanned, the Page Crawler then builds a summary displaying the keywords it found in their context, so that you can have a pretty good idea of the page content. This makes it easier to decide which pages you want to visit thereby saving a considerable amount of time.
The Page Crawler has three different states:

Page crawler turned off
If you just want to perform a quick search, turn the Page Crawler off.
Crawling pages takes much longer than the fast filtering options because of both server response time and full page download. As an indication, expect 10 to 15 seconds per page with a 33k connection.
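To get a feel for what this means for a large search, the following Python sketch turns the 10 to 15 seconds per page figure into a rough total. It assumes pages are crawled one after the other and that every engine returns full pages of results with no duplicates, so it gives an upper bound rather than a prediction; the function and parameter names are made up for this example.

```python
def crawl_time_estimate(engines, pages_per_engine, results_per_page,
                        seconds_per_page=(10, 15)):
    """Return (pages to crawl, low estimate in seconds, high estimate in seconds)."""
    total_pages = engines * pages_per_engine * results_per_page
    low, high = seconds_per_page
    return total_pages, total_pages * low, total_pages * high


# 5 engines, 2 result pages each, 10 results per page.
pages, low_seconds, high_seconds = crawl_time_estimate(5, 2, 10)
```

With these example figures there are 100 pages to crawl, which works out to roughly 17 to 25 minutes on such a connection.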


Page crawler in passive mode
With this option on, Page Crawler will scan result pages and write a summary displaying the keywords it found in their context, after the search engine summary. This way you can easily decide whether a page is worth your visit or not.
Page Crawler has three possible conditions when looking for your keywords within the pages' body text:

  • page contains at least one keyword
  • page contains all your keywords
  • page contains the exact phrase of your query
These conditions are set by the "Parameter" pull down menu in the Search pane.
In passive mode a summary is produced showing each instance found in its context, that is to say your keyword(s) or sentence with surrounding text, but no page is discarded from the WebFinder result page.
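The three conditions and the "keyword in context" summary can be sketched in Python as follows. This is an illustration only: the condition names "any", "all" and "exact", the function names, and the fixed context width are assumptions, and matching here is a simple case-insensitive substring test on plain ASCII text.

```python
def page_matches(body, query, condition):
    """Check a page's body text against one of the three crawler conditions."""
    text = body.lower()
    words = query.lower().split()
    if condition == "any":
        return any(word in text for word in words)
    if condition == "all":
        return all(word in text for word in words)
    if condition == "exact":
        return query.lower() in text
    raise ValueError(f"unknown condition: {condition}")


def keyword_contexts(body, query, width=30):
    """Return every keyword occurrence with up to `width` characters on each side."""
    text = body.lower()
    snippets = []
    for word in query.lower().split():
        start = text.find(word)
        while start != -1:
            low = max(0, start - width)
            high = min(len(body), start + len(word) + width)
            snippets.append(body[low:high])
            start = text.find(word, start + len(word))
    return snippets
```

In passive mode only the snippets would be shown next to each result; a result for which page_matches returns False would still be listed.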

Page crawler in active mode
With this option on, Page Crawler works exactly as in passive mode but, in addition, automatically discards the pages which do not meet your conditions.
Active Page Crawler filtering produces highly relevant results and effectively combats search engine spamming.
Take into consideration that the wider and deeper your search, the more time it will take to complete. However, all this is performed automatically so you can do something else in the meantime.

 

Post processing results

Result plug-ins

WebFinder, through AppleScript, is able to deliver information to collaborating applications. For instance you may wish to browse the WebFinder result page in the browser of your choice and optionally to have results processed in a manner that suits your specific needs.
Professional searches often require analysis and storage of critical information.

Create or download result plug-ins that process the results of your searches

To create your own Result plug-ins you need some basic knowledge of AppleScript and The WebFinder Toolbox.
The WebFinder Toolbox is a scripting addition that gives you access to some extra commands for scripting WebFinder.

Some examples of what Result plug-ins can do for you:

  • Extract data from search results, such as date and time, query, engines, position, page url, etc. and write them to a file
  • Keep only results with a given characteristic (for instance results ranked in the top 5 on each engine) and write them to a file
  • Keep only given parts of the information returned for a result, for instance the page link and the crawler summary
  • Pass search results to scriptable applications
    such as Text Filtering applications, Databases, or web archiving tools.
  • Add custom filtering features to WebFinder, such as removing from the WebFinder result page links to sexually explicit content, or results containing common words that have nothing to do with your business but are often associated with it (e.g. if you manufacture anticorrosion coatings using glass flakes you may wish to eliminate pages with "corn" to avoid pages talking about corn flakes).
    The advantage of this type of post filtering is that you do not depend on search engine syntax for your query (e.g. "+glass +flakes -corn" will work on some engines and fail on others that require NOT instead of the minus sign).
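The logic of such a post filter is sketched below in Python for readability. Actual Result plug-ins are written in AppleScript with the WebFinder Toolbox, whose commands are not shown here; the result structure (dictionaries with "title" and "summary" keys) and the function name are assumptions made for this example.

```python
import re


def post_filter(results, required=(), excluded=()):
    """Keep results containing every required word and none of the excluded words."""
    kept = []
    for result in results:
        text = (result.get("title", "") + " " + result.get("summary", "")).lower()
        # Compare whole words so that "corn" does not reject "corner".
        words = set(re.findall(r"[a-z0-9]+", text))
        has_required = all(word.lower() in words for word in required)
        has_excluded = any(word.lower() in words for word in excluded)
        if has_required and not has_excluded:
            kept.append(result)
    return kept


results = [
    {"title": "Glass flakes coatings", "summary": "Anticorrosion lining"},
    {"title": "Corn flakes recipes", "summary": "Served in a glass bowl of flakes"},
    {"title": "Glass flakes for corner joints", "summary": ""},
]
kept = post_filter(results, required=["glass", "flakes"], excluded=["corn"])
```

The same rule is applied to the results of every engine in the set, whatever query syntax each engine happens to support.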