What you need to know about the new RHGateway search engine


How to use the new engine to search for terms:

  • Words: multiple words are OR-ed together (condom OR IUD). You get documents that contain either term as well as documents that contain both. This usually returns a lot of hits. Words that are rarer across the whole collection count more toward relevance.
  • Phrases: enclose phrases in “double quotes” to keep the words together and return fewer hits.
  • Case: UPPERcase only matches uppercase; lowercase matches either case. Type acronyms such as AIDS, SIDA and HIV in uppercase to get better search results.
  • Require (+) or Reject (-) Operators: typing +"quality of care" specifies that every returned document must contain the phrase “quality of care”. Typing -"Johns Hopkins" specifies that none of the returned documents will contain the phrase “Johns Hopkins”.
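The matching rules above can be sketched as a small Python function. This is a toy illustration, not Ultraseek's implementation, and the function names are hypothetical. Two behaviours are assumptions: a term containing any uppercase letter is matched case-sensitively, and when a query has no + terms, at least one plain word or phrase must appear.

```python
import re

# Toy illustration of the search-box rules; not Ultraseek's actual code.
# A token is either an optionally signed "quoted phrase" or a signed word.
TOKEN_RE = re.compile(r'([+-]?)"([^"]+)"|([+-]?)(\S+)')


def parse_query(query):
    """Return (operator, term) pairs; operator is '+', '-' or ''."""
    terms = []
    for m in TOKEN_RE.finditer(query):
        if m.group(2) is not None:          # quoted phrase
            terms.append((m.group(1), m.group(2)))
        else:                               # single word
            terms.append((m.group(3), m.group(4)))
    return terms


def contains(text, term):
    """Case rule (assumed): an all-lowercase term matches either case;
    any uppercase letter makes the match case-sensitive."""
    flags = re.IGNORECASE if term == term.lower() else 0
    return re.search(r"\b" + re.escape(term) + r"\b", text, flags) is not None


def matches(text, query):
    """Apply +/- operators, then OR the remaining words and phrases."""
    terms = parse_query(query)
    required = [t for op, t in terms if op == "+"]
    rejected = [t for op, t in terms if op == "-"]
    optional = [t for op, t in terms if op == ""]
    if any(not contains(text, t) for t in required):
        return False
    if any(contains(text, t) for t in rejected):
        return False
    # Plain words and phrases are OR-ed; assumed optional when + terms exist.
    return bool(required) or any(contains(text, t) for t in optional)
```

For example, contains("first aids kit", "AIDS") is False, while contains("AIDS research", "aids") is True, which is why typing acronyms in uppercase narrows the results.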

How to use Field Specifiers:

  • site: Matches all or part of a site name. site:www.jhuccp.org matches only documents that have “www.jhuccp.org” as part of the site URL, and site:rhgateway.org matches both search.rhgateway.org and www.rhgateway.org. You cannot include specific directory names.
  • url: Matches part of a site name, directory or file name. url:org finds all URLs that include ‘org’, and url:pdf finds PDF files (any URL containing ‘pdf’).
  • title: Finds documents with specific words in the TITLE field. title:"adolescent health" finds all documents with the phrase “adolescent health” in the title.
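A minimal sketch of the three field specifiers, assuming a document is a dict with hypothetical "url" and "title" keys. site: is compared against the host name only, which is why directory names never match; title: is compared case-insensitively here for simplicity.

```python
from urllib.parse import urlparse


def field_match(doc, spec):
    """Check one field specifier such as 'site:rhgateway.org' against a doc.

    doc is a hypothetical dict with 'url' and 'title' keys.
    """
    field, _, value = spec.partition(":")
    value = value.strip('"').lower()
    if field == "site":
        # Only the host name is compared, so directory names cannot match.
        return value in urlparse(doc["url"]).netloc.lower()
    if field == "url":
        # Any part of the URL: host, directories or file name.
        return value in doc["url"].lower()
    if field == "title":
        # Phrase must appear in the title; case-insensitive for simplicity.
        return value in doc["title"].lower()
    raise ValueError("unknown field specifier: " + field)
```

With doc = {"url": "http://search.rhgateway.org/docs/report.pdf", ...}, site:docs does not match but url:docs and url:pdf do.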

Topics
To use TOPICS, click on ADVANCED, then scroll to the bottom of the screen. You will see a list of topics such as adolescent health and HIV/AIDS. Each topic is a predefined query built from filters designed to find the most relevant and useful documents in RHGateway on that subject.

  • Examples of some topic filters used in RHGateway:

    HIV/AIDS ->
      require title:HIV or
      require title:AIDS or
      require keywords:HIV or
      require keywords:AIDS

    Adolescent Health: Abstinence ->
      require title:adolescent and require title:abstinence or
      require keywords:adolescent and require keywords:abstinence or
      require keywords:abstinence and adolescent (text word) or
      require title:abstinence and youth (text word) or
      require keywords:abstinence and youth (text word) or
      require title:abstinence and student (text word) or
      require title:abstinence and young (text word)
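The topic filters read as an OR of AND-clauses. The sketch below encodes the two examples that way; the TOPICS structure and the "title", "keywords" and "text" field names are hypothetical, and words are matched with the same case rule as the search box.

```python
import re

# Hypothetical encoding of the two example topics: each topic is a list of
# alternative clauses (OR-ed), each clause a list of (field, word) pairs (AND-ed).
# "text" stands for the body text, i.e. the "(text word)" conditions.
TOPICS = {
    "HIV/AIDS": [
        [("title", "HIV")],
        [("title", "AIDS")],
        [("keywords", "HIV")],
        [("keywords", "AIDS")],
    ],
    "Adolescent Health: Abstinence": [
        [("title", "adolescent"), ("title", "abstinence")],
        [("keywords", "adolescent"), ("keywords", "abstinence")],
        [("keywords", "abstinence"), ("text", "adolescent")],
        [("title", "abstinence"), ("text", "youth")],
        [("keywords", "abstinence"), ("text", "youth")],
        [("title", "abstinence"), ("text", "student")],
        [("title", "abstinence"), ("text", "young")],
    ],
}


def has_word(doc, field, word):
    """Whole-word test; lowercase words match either case, others exact case."""
    words = re.findall(r"\w+", doc.get(field, ""))
    if word.islower():
        return word in (w.lower() for w in words)
    return word in words


def in_topic(doc, topic):
    """True if every condition of at least one clause holds."""
    return any(all(has_word(doc, f, w) for f, w in clause) for clause in TOPICS[topic])
```

Writing filters as lists of clauses keeps each "or" line of the topic definition as one entry, so a topic can be extended without touching the matching code.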

How to control relevancy ranking and how it is computed

  • Ranking is controlled by:
    -frequency of the term(s) within the document
    -location of the term(s) within the document: The title is most significant (8 points), then keyword or summary (4 points), and finally the text or body (1 point).
    -rarity of the individual terms, including phrases, in the whole collection
    -occurrence of multiple query terms in the same document
    -proprietary factors
  • See FAQ 064, “How does Ultraseek Server come up with the scores for each item in the search results?”, for more details.
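The ranking factors can be combined roughly as below. Only the 8/4/1 field weights come from the text; the log-based rarity weight and the multiplier for matching several query terms are assumptions standing in for Ultraseek's proprietary formula, and query terms are expected in lowercase.

```python
import math
import re

# The 8/4/1 weights come from the text; the rest of this formula is assumed.
FIELD_WEIGHTS = {"title": 8, "keywords": 4, "summary": 4, "body": 1}


def tokens(text):
    return re.findall(r"\w+", text.lower())


def weighted_tf(doc, term):
    """Occurrences of a lowercase term in each field, times the field's weight."""
    return sum(w * tokens(doc.get(f, "")).count(term) for f, w in FIELD_WEIGHTS.items())


def score(doc, terms, collection):
    n = len(collection)
    total, matched = 0.0, 0
    for term in terms:
        tf = weighted_tf(doc, term)
        if tf == 0:
            continue
        df = sum(1 for d in collection if weighted_tf(d, term) > 0)
        matched += 1
        total += tf * math.log(1 + n / max(df, 1))  # rarer terms weigh more
    # Reward documents that contain several of the query terms.
    return total * matched
```

In this sketch a term in the title outweighs the same term in the body eight to one, which is why putting key terms in titles and keyword meta tags is the main lever an author has over ranking.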

What forms the document summary?

  • From the ‘Description’ Meta tag: create a Meta tag with a name attribute equal to “description” and a content attribute equal to the summary you would like to show for that document. For example: <meta name="description" content="CONDOMS CD-ROM is a searchable, multimedia, reference database allowing direct access to the most comprehensive, international collection of information, education and communication materials on condoms.">
  • If there is no ‘Description’ Meta tag, the spider collects plain text, ignoring HTML tags, up to 255 characters.
  • Since there is no ‘Description’ Meta tag for PDF documents, summaries are formed from the first 255 characters.
  • See FAQ 059, “How does Ultraseek Server create summaries?”, for more information.
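The summary rule can be sketched with Python's standard html.parser: use the description meta tag if present, otherwise fall back to the page's plain text cut at 255 characters. Skipping script and style content and collapsing whitespace are assumptions, not documented spider behaviour, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser

SUMMARY_LIMIT = 255  # characters, as described above


class SummaryParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.description = None
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style> (assumed not summarised)

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.description = attrs.get("content")
        elif tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def make_summary(html):
    """Description meta tag if present, else plain text up to the limit."""
    parser = SummaryParser()
    parser.feed(html)
    parser.close()
    if parser.description:
        return parser.description
    text = " ".join(" ".join(parser.text_parts).split())  # tags already dropped
    return text[:SUMMARY_LIMIT]
```

For example, make_summary("<p>Hello <b>world</b></p>") returns "Hello world", because the tags are discarded and only their text is kept.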

How to control indexing

  • Use the <!--stopindexing--> and <!--startindexing--> comment tags to exclude specific text in a document: indexing pauses at stopindexing and resumes at startindexing.
  • Use a robots.txt file at the site root to exclude directories and files.
  • See FAQ 092, “How does Ultraseek Server interact with sitelist.txt?”, for more information.
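Both indexing controls can be sketched in Python. The stopindexing/startindexing comment spelling in the regular expression is an assumption based on the tag names above, and the default user-agent string is hypothetical; the robots.txt check uses the standard library's urllib.robotparser.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed comment-tag spelling; check the Ultraseek documentation for the exact form.
EXCLUDED_REGION = re.compile(
    r"<!--\s*stopindexing\s*-->.*?<!--\s*startindexing\s*-->",
    re.DOTALL | re.IGNORECASE,
)


def indexable_text(html):
    """Drop everything between a stopindexing tag and the next startindexing tag."""
    return EXCLUDED_REGION.sub(" ", html)


def allowed(robots_txt, url, agent="*"):
    """Check a URL against the rules in a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

A robots.txt containing "User-agent: *" and "Disallow: /private/" would keep every page under /private/ out of the index while leaving the rest of the site spiderable.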

How to control the RHGateway search engine

  • Add URL [Click on HELP, then on ADD URL]: If your site is already part of RHGateway and you’ve just created a new document, add the URL of your new document here to have it spidered immediately.
  • URL status [Click on HELP, then on URL STATUS]: Type in a specific URL of a document to find out how recently it was spidered and when it is scheduled for re-spidering.
  • View Sites [Click on HELP, then on VIEW SITES]: Use this to see a list of the sites being spidered. RHGateway is currently indexing 52 sites and includes 41,620 documents.
  • Revisit Site [Click on HELP, then on REVISIT SITE]: If you’ve just made major changes to your site, you can instruct the spider to reindex it here.