Extracter and Normaliser Configurations

Introduction

Extracters locate and extract data of a given format from a string, a DOM node tree, or a list of SAX events. They must be the first object in an index's workflow. Normalisers then process the extracted terms into a standard form for storage in an index store.

Unless you're using a new extracter or normaliser class, these objects should already be built by the default server configuration, but for completeness we'll go through their configuration below.

Example

Example extracter and normaliser configurations:

<subConfig type="extracter" id="ExactExtracter">
  <objectType>extracter.SimpleExtracter</objectType>
</subConfig>

<subConfig type="normaliser" id="CaseNormaliser">
  <objectType>normaliser.CaseNormaliser</objectType>
</subConfig>
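As a rough sketch of where the objects defined above are used, an index configuration might reference them in the processing chain for one of its sources, with the extracter first and any normalisers after it. This is not taken from this page: the element names `source`, `xpath`, `process` and `paths`, the class `index.SimpleIndex`, and the ids `idx-title` and `indexStore` are assumptions for illustration, so check them against the index configuration documentation for your version before relying on them.

```xml
<!-- Hypothetical index: ids "idx-title" and "indexStore" are illustrative only -->
<subConfig type="index" id="idx-title">
  <!-- Assumed index class -->
  <objectType>index.SimpleIndex</objectType>
  <paths>
    <object type="indexStore" ref="indexStore"/>
  </paths>
  <source>
    <!-- Assumed element: selects the part of the record to index -->
    <xpath>/record/title</xpath>
    <process>
      <!-- The extracter must come first in the workflow -->
      <object type="extracter" ref="ExactExtracter"/>
      <!-- Normalisers then process the extracted terms, in order -->
      <object type="normaliser" ref="CaseNormaliser"/>
    </process>
  </source>
</subConfig>
```

Because the extracter runs first, each normaliser in this sketch receives extracted terms rather than raw record data; a further normaliser would go on its own `object` line after `CaseNormaliser`.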


Explanation

There's not much to say here, as each of these objects does only one thing and has few options or paths to set.

Currently available extracters, of which the first four are the most commonly used:

Currently available normalisers: