Glossary: Sorting and Ranking

When we search a database, the results are displayed in some order. Putting results in order is called ranking or sorting. The two most common orders are most recent first and relevance.

When a most-recent-first rank is used, the items that were entered into the database last (or published last, depending on which criterion the database uses) appear at the beginning of the list, and the oldest ones appear at the end. TOPCAT uses this ranking in its keyword search display.
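As a rough illustration of most-recent-first ordering, the sketch below sorts a few made-up records by date, newest first. The record titles, the dates, and the function name most_recent_first are all invented for this example; it is not how TOPCAT or any particular database implements its display.

```python
from datetime import date

# Hypothetical records; titles and dates are invented for illustration.
records = [
    {"title": "Cataloguing basics", "published": date(2001, 5, 14)},
    {"title": "Search interfaces", "published": date(2008, 11, 2)},
    {"title": "Indexing methods", "published": date(2005, 1, 30)},
]


def most_recent_first(items):
    """Return a new list with the newest publication date at the top."""
    return sorted(items, key=lambda r: r["published"], reverse=True)


for record in most_recent_first(records):
    print(record["published"].isoformat(), record["title"])
```

A database that ranks by entry date instead of publication date would sort on a different field, but the idea is the same.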

When a relevance rank is used, the database implements some method of identifying which results most closely represent what you are searching for and shows those at the beginning of the list. Criteria used to determine the relevance of an item may be:

  • where in the record the keyword appears: it will be more relevant if it appears in a title or abstract rather than a list of references used;
  • how many times the keyword appears: the more times it appears, the more relevant it will be;
  • whether the keyword appears as part of a phrase or on its own: if you enter a phrase as a keyword, items with the full phrase will be more relevant than those with only part of the phrase.
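
The three criteria above can be combined into a single score. The sketch below is one simple way to do that, assuming invented field weights, an invented phrase bonus, and made-up records; real databases and search engines use their own, usually more elaborate, methods.

```python
# Hypothetical weights: a match in the title counts more than one in the
# abstract, and far more than one in the list of references.
FIELD_WEIGHTS = {"title": 3.0, "abstract": 2.0, "references": 0.5}
PHRASE_BONUS = 5.0  # assumed extra credit when the full phrase appears


def relevance(record, query):
    """Score a record by where, how often, and how completely the query appears."""
    query = query.lower()
    words = query.split()
    score = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        text = record.get(field, "").lower()
        tokens = text.split()  # naive split; punctuation stays attached to words
        # Criteria 1 and 2: each occurrence counts, weighted by its field.
        for word in words:
            score += weight * tokens.count(word)
        # Criterion 3: the whole phrase is worth more than its separate parts.
        if len(words) > 1 and query in text:
            score += PHRASE_BONUS * weight
    return score


def rank_by_relevance(records, query):
    """Return records ordered from highest to lowest relevance score."""
    return sorted(records, key=lambda r: relevance(r, query), reverse=True)


# Made-up records for illustration only.
sample = [
    {"title": "Ranking athletes", "abstract": "Sports statistics", "references": ""},
    {"title": "Catalogue design", "abstract": "Ranking is discussed briefly",
     "references": "Smith, Relevance ranking methods"},
    {"title": "Relevance ranking in library catalogues",
     "abstract": "How relevance ranking orders search results", "references": ""},
]

for record in rank_by_relevance(sample, "relevance ranking"):
    print(relevance(record, "relevance ranking"), record["title"])
```

Changing the weights changes the order of the results, which is one reason two databases can rank the same items differently.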

Relevance ranking is difficult for a system to accomplish because it depends on the logic the system uses to judge what the searcher is looking for. Some search engines are better at it than others, which can make it appear as though their databases contain more useful information. The strength of Google as a web search engine lies in its ability to sort by relevance, although the criteria used to establish relevance are not always scholarly and are sometimes controversial.

Research has shown that few students look beyond the first or second page of results because they assume that the most important items will have appeared by then. This isn't always the case, and we should always consider the sorting/ranking order used before we make such an assumption.