Cheshire3 Configuration: Index

Introduction

An index must be configured so that it knows where to find the data to extract, how to process that data once extracted, and where to store the processed results.

Example

<subConfig id="zrx-idx-9">
  <objectType>index.ProximityIndex</objectType>
  <paths>
    <object type="indexStore" ref="zrxIndexStore"/>
  </paths>
  <source>
    <preprocess>
       <object type="transformer" ref="zeerexTxr"/>
       <object type="parser" ref="SaxParser"/>
    </preprocess>
    <xpath>name/value</xpath>
    <xpath xmlns:zrx="http://explain.z3950.org/dtd/2.0">zrx:name/zrx:value</xpath>
    <process>
      <object type="extracter" ref="ExactParentProximityExtracter"/>
      <object type="normaliser" ref="CaseNormaliser"/>
    </process>
  </source>
  <options>
    <setting type="sortStore">true</setting>
    <setting type="lr_constant0">-3.7</setting>
  </options>
</subConfig>

<source>

An index configuration must contain at least one source element. Each source block configures one or more XPaths used to extract data from the record, a workflow of objects to process the results of the XPath evaluation, and optionally a workflow of objects to preprocess the record into a state suitable for XPath evaluation. During indexing, the system processes each source block in turn for each record.
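As a rough illustration of several source blocks in one index, the fragment below sketches two sources, each with its own XPath and process chain. The XPaths title and description are hypothetical; the object references are reused from the example above and assume those objects are configured elsewhere.

```xml
<!-- Sketch only: the XPaths are hypothetical. -->
<source>
  <xpath>title</xpath>
  <process>
    <object type="extracter" ref="ExactParentProximityExtracter"/>
    <object type="normaliser" ref="CaseNormaliser"/>
  </process>
</source>
<source>
  <xpath>description</xpath>
  <process>
    <!-- A chain may end with the extracter; its result is already a hash. -->
    <object type="extracter" ref="ExactParentProximityExtracter"/>
  </process>
</source>
```

Both blocks would be evaluated for every record, in the order in which they appear.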

<xpath>

This element contains an XPath expression used to extract data from a record. It may appear more than once within a source, in which case the results of each expression are passed through the same process chain (as described below). If the XPath uses XML namespaces, the mappings for its namespace prefixes must be declared on the xpath element itself.
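For instance, an expression that uses a prefix carries the prefix declaration on the same xpath element. The sketch below uses the Dublin Core element namespace purely for illustration; that the records contain such elements is an assumption.

```xml
<source>
  <!-- The dc prefix is declared on the xpath element that uses it. -->
  <xpath xmlns:dc="http://purl.org/dc/elements/1.1/">dc:title</xpath>
  <!-- A second, un-namespaced expression; both feed the same process chain. -->
  <xpath>title</xpath>
  <process>
    <object type="extracter" ref="ExactParentProximityExtracter"/>
  </process>
</source>
```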

<process> and <preprocess>

These elements contain an ordered list of objects. The result of the first object is passed to the second, and so on down the chain.

The first object in a process chain must be an Extracter, because the input data is a string, a DOM node or a SAX event list, depending on how the XPath was evaluated. The result of a process chain must be a hash, typically produced by an Extracter or a Normaliser. However, if the last object is an IndexStore, that store will be used to store the terms instead of the default one.
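A process chain that ends with an IndexStore might look like the sketch below. It assumes the object element takes the same form here as in the paths block of the example above; the identifier altIndexStore is hypothetical.

```xml
<process>
  <!-- First object: an Extracter turns the XPath result into a hash. -->
  <object type="extracter" ref="ExactParentProximityExtracter"/>
  <!-- Middle objects: Normalisers take a hash and return a hash. -->
  <object type="normaliser" ref="CaseNormaliser"/>
  <!-- Last object (optional): store the terms here instead of the default. -->
  <!-- altIndexStore is a hypothetical identifier. -->
  <object type="indexStore" ref="altIndexStore"/>
</process>
```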

The input to a preprocess chain is a Record, so the first object is most likely to be a Transformer. The result must also be a Record, so the last object is most likely to be a Parser.
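The preprocess chain from the example above can be read with that flow in mind. The comments below describe the intended data flow as stated in this section and are not taken from the object documentation.

```xml
<preprocess>
  <!-- Input: the Record being indexed. -->
  <!-- The Transformer converts the Record into an intermediate form. -->
  <object type="transformer" ref="zeerexTxr"/>
  <!-- The Parser turns the transformer's output back into a Record, -->
  <!-- which is then the target of the xpath expressions. -->
  <object type="parser" ref="SaxParser"/>
</preprocess>
```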

For existing processing objects that can be used in these fields, see the object documentation.

Paths

Settings