CQL Protocol Map Configuration

Introduction

Once all of our processing is set up, we need to create a map from our objects to a query language. The default query language used internally by Cheshire3 is CQL, the Common Query Language (since renamed the Contextual Query Language). CQL indexes map almost directly onto Cheshire3 indexes, but a CQL index can be qualified by a relation, to request exact string matching or keyword matching, and further by relation modifiers, for example to specify whether or not stemming should be used.

We specify the mapping between CQL indexes and Cheshire3 indexes using a ZeeRex record. This record is then used for the Explain response for SRW. To include the information about the objects used internally, we have to extend the schema with a 'c3' namespace.

The CQL and ZeeRex home pages have full documentation on the query language and the ZeeRex record schema, so please refer to them if you want more information.

Example

Example protocol map configuration:

<subConfig type="protocolMap" id="CQLProtocolMap">
  <objectType>protocolMap.CQLProtocolMap</objectType>
  <paths>
    <path type="zeerexPath">zeerex_srw.xml</path>
  </paths>
</subConfig>

And a sample (minimal) ZeeRex file to go with it:

01 <explain xmlns="http://explain.z3950.org/dtd/2.0/" xmlns:c3="http://www.cheshire3.org/schemas/explain/">
02   <serverInfo protocol="srw/u" version="1.1" transport="http">
03     <host>myhostname.mydomain.com</host>
04     <port>8080</port>
05     <database>services/databasename</database>
06   </serverInfo>
07   <indexInfo>
08     <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
09     <index c3:index="title-idx">
10       <title>Title</title>
11       <map><name set="dc">title</name></map>
12       <configInfo>
13         <supports type="relationModifier" c3:index="titleword-idx">word</supports>
14         <supports type="relationModifier" c3:index="titlewordstem-idx">stem</supports>
15       </configInfo>
16     </index>
17   </indexInfo>
18   <schemaInfo>
19     <schema identifier="info:srw/schema/1/dc-v1.1" name="dc" c3:transformer="dublinCoreTransformer">
20       <title>Simple Dublin Core</title>
21     </schema>
22   </schemaInfo>
23 </explain>

Explanation

The mapping object itself is quite simple. It has an identifier and objectType like all other objects, and then one path called 'zeerexPath'. This should point to the ZeeRex XML file that contains the mapping information.

The easiest way to set up the ZeeRex file is to copy an existing record and modify it. Some quick pointers first:

You can find the context sets for the set elements here
You can find the indexes linked in the context set references from the above page. If none of the indexes have the right semantics, then you're perfectly at liberty to make up your own context set.
The identifiers for common record schemas are here. Again if your record schema isn't present, then you can make up your own identifier.

In line 1 of the ZeeRex file, please note the definition of the 'c3' namespace. This prefix is then used throughout the rest of the file to distinguish Cheshire3 information from the basic ZeeRex.

If you are going to enable the SRW/U service, then you should correct the host, port and database name in lines 3 through 5. In particular, take care that the path in the database tag matches the location at which Apache is listening for SRW/U requests.

The most important mapping in the file is from CQL index names and relations to Cheshire3 index objects. First of all, you need to define a short name for the context set that the index is part of. For example, the Dublin Core context set has the identifier 'info:srw/cql-context-set/1/dc-v1.1', but we call it 'dc' for short. This is done in the set element, as at line 8.
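As a rough sketch of declaring more than one context set, the fragment below adds a second set element next to the 'dc' one. The second identifier, 'info:srw/cql-context-set/1/cql-v1.1', is assumed here to be the identifier of the core CQL context set and should be checked against the context set registry before use. The namespace declarations are repeated on the indexInfo element only so that the fragment stands alone; in a full file they live on the explain element.

```xml
<!-- Fragment of the indexInfo section of a ZeeRex file -->
<indexInfo xmlns="http://explain.z3950.org/dtd/2.0/"
           xmlns:c3="http://www.cheshire3.org/schemas/explain/">
  <!-- Short name 'dc' for the Dublin Core context set, as in the sample above -->
  <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
  <!-- Assumed identifier for the core CQL context set, given the short name 'cql' -->
  <set identifier="info:srw/cql-context-set/1/cql-v1.1" name="cql"/>
</indexInfo>
```

Each short name given in a 'name' attribute can then be used in the 'set' attribute of the name elements inside the index definitions.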

Once all of the sets you're going to use have been defined, you can define indexes. Each index element represents a single semantic concept, not split by word, string, stem or other such normalisation. The primary index is given in a 'c3:index' attribute on the index element, as at line 9. Then there is a human-readable title in line 10, and a map in line 11. Within the map, the short name for the context set is given in a 'set' attribute, and the content of the name element is the name of the index, in this case 'title'. This makes the index available as 'dc.title' in CQL.

The configInfo section for the index then allows us to map to different objects based on the type of relation or relation modifiers used. Line 13 maps word queries to the 'titleword-idx' instead of the default 'title-idx'. Line 14 maps stemmed queries to 'titlewordstem-idx'.
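To illustrate how the same pattern extends to another index, here is a minimal sketch that maps 'dc.creator' in the same way as the title index. The Cheshire3 index identifiers 'creator-idx' and 'creatorword-idx' are hypothetical and would need to match index objects actually configured in your database. As before, the namespace declarations are repeated only so that the fragment stands alone.

```xml
<!-- Fragment of the indexInfo section of a ZeeRex file -->
<indexInfo xmlns="http://explain.z3950.org/dtd/2.0/"
           xmlns:c3="http://www.cheshire3.org/schemas/explain/">
  <set identifier="info:srw/cql-context-set/1/dc-v1.1" name="dc"/>
  <!-- 'creator-idx' is a hypothetical identifier for the primary (exact) index -->
  <index c3:index="creator-idx">
    <title>Creator</title>
    <!-- Makes the index available as 'dc.creator' in CQL -->
    <map><name set="dc">creator</name></map>
    <configInfo>
      <!-- 'creatorword-idx' is a hypothetical identifier for a keyword index -->
      <supports type="relationModifier" c3:index="creatorword-idx">word</supports>
    </configInfo>
  </index>
</indexInfo>
```

Under these assumptions, a query on 'dc.creator' would go to 'creator-idx' by default, while a query using the 'word' relation modifier would be sent to 'creatorword-idx' instead.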

Finally, in line 19, note the 'c3:transformer' attribute on the schema element. It names the transformer object used to turn the record in the recordStore into the schema being published, in this case simple Dublin Core.
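If more than one record schema is to be published, each one would presumably get its own schema element with its own transformer. The sketch below adds a second schema next to the Dublin Core one. The identifier 'info:srw/schema/1/marcxml-v1.1' is assumed to be the registered identifier for MARC XML and should be verified against the list of record schemas, and 'marcXmlTransformer' is a hypothetical transformer identifier that would have to be configured elsewhere.

```xml
<!-- Fragment of the schemaInfo section of a ZeeRex file -->
<schemaInfo xmlns="http://explain.z3950.org/dtd/2.0/"
            xmlns:c3="http://www.cheshire3.org/schemas/explain/">
  <!-- The schema from the sample above -->
  <schema identifier="info:srw/schema/1/dc-v1.1" name="dc" c3:transformer="dublinCoreTransformer">
    <title>Simple Dublin Core</title>
  </schema>
  <!-- Assumed schema identifier; 'marcXmlTransformer' is a hypothetical transformer object -->
  <schema identifier="info:srw/schema/1/marcxml-v1.1" name="marcxml" c3:transformer="marcXmlTransformer">
    <title>MARC XML</title>
  </schema>
</schemaInfo>
```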

Please note that even if you do not define any indexes in the ZeeRex file, you can still use CQL queries to search. Any index is available under the 'c3' prefix followed by the index object's identifier. For example, we could search the exact title index with 'c3.title-idx' instead of the 'dc.title' mapping above. For this to work, the object identifier must consist entirely of lower case characters.