Search Interfaces
This is a sample, fairly straightforward script that searches a database given a CQL query. We go through it section by section and explain how things work. It can be used as a template for other scripts, or as a starting point for more complicated versions.
01 #!/home/cheshire/install/bin/python
02
03 import sys
04 osp = sys.path
05 sys.path = ["/home/cheshire/cheshire3/code"]
06 sys.path.extend(osp)
07
08 from server import SimpleServer
09 from PyZ3950 import CQLParser
10 from baseObjects import Session
The first thing to do in any script is to set up Python so that it can use the various Cheshire3 objects. Lines 3 to 6 put the Cheshire3 code directory at the front of the module search path, so that Python finds the Cheshire3 code first, before any other similarly named modules that might be installed. Line 8 imports the base server code needed in the next section, and line 9 imports the parser for CQL (Common Query Language). Line 10 imports the base Session object used to maintain contextual information.
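The effect of putting a directory at the front of the search path can be sketched in isolation. This is a minimal illustration written for Python 3, not part of the Cheshire3 script: the helper `import_from_front`, the temporary directory and the module name `demo_c3_module` are all hypothetical stand-ins for the Cheshire3 code directory and its modules.

```python
import importlib
import os
import sys
import tempfile


def import_from_front(directory, module_name):
    """Import module_name with `directory` searched before anything else."""
    original_path = list(sys.path)
    sys.path.insert(0, directory)  # same idea as lines 4-6 of the script
    try:
        sys.modules.pop(module_name, None)  # forget any cached copy
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path[:] = original_path  # restore the original search path


# Hypothetical stand-in for the Cheshire3 code directory.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "demo_c3_module.py"), "w") as handle:
    handle.write("WHERE = 'local copy'\n")

module = import_from_front(demo_dir, "demo_c3_module")
```

Unlike the script, this sketch restores the search path afterwards. The script leaves the Cheshire3 directory in place for the rest of the run, which is what lets the later imports resolve.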
11 # Build environment...
12 session = Session()
13 serv = SimpleServer(session, "../configs/serverConfig.xml")
14 db = serv.get_object(session, 'db_tei')
15 recStore = db.get_object(session, 'TeiRecordStore')
16 resultStore = db.get_object(session, 'TempResultStore')
17 txr = db.get_object(session, 'TeiToDCTransformer')
18 idx = db.get_object(session, 'l5r-idx-1')
19
This example uses a lot of different objects within the Cheshire3 framework. After creating a session (12), we build a server from the configuration file (13). From that server we then retrieve the database (14), and from the database the recordStore (15), a result set store (16), a transformer (17) and an index (18). These will all be used in the next section.
20 query = 'dc.title any "sword fist steel"'
21 clause = CQLParser.parse(query)
22 result = db.search(session, clause)
23 hits = len(result)
24 if hits:
25     result.order(session, idx)
26     rsid = resultStore.create_resultSet(session, result)
27     for i in range(min(hits, 10)):
28         rec = recStore.fetch_record(session, result[i].docid)
29         doc = txr.process_record(session, rec)
30         print doc.get_raw()
Now that we're set up, we can actually do some work. First we define a query in CQL (20). This would typically come from the command line or another interface rather than being hard-coded as it is in this example. To process the query we need to turn it into a parsed tree, which is done by the CQLParser in line 21. The search is then carried out in line 22, returning a result set object.
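As a rough sketch of taking the query from the command line instead, something like the following could replace line 20. The helper `query_from_args` and the constant `DEFAULT_QUERY` are hypothetical names introduced for illustration, and joining the arguments with spaces is an assumption about how the query would be typed.

```python
import sys

# The static query from line 20, kept as a fallback.
DEFAULT_QUERY = 'dc.title any "sword fist steel"'


def query_from_args(argv, default=DEFAULT_QUERY):
    """Return the CQL query given on the command line, or the default."""
    words = argv[1:]  # argv[0] is the script name
    if not words:
        return default
    # Join the words so the query can be typed with or without shell quoting.
    return " ".join(words)


query = query_from_args(sys.argv)
```

The resulting string would then be handed to the parser exactly as in line 21.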
ResultSets are a combination of set and list: they have a fixed order, but can also be combined with boolean operators such as AND or OR. First we check whether there were any matches for our search (23-24). If there were, we order the result set according to an index (25). This is one of the indexes built during the build process, and hence it knows how to extract data from a record. If configured, it may have a pre-generated database of these extracted values per record. In line 26 we then store the sorted result set in a ResultSetStore for later reference.
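The "set and list" idea can be illustrated with plain Python lists. This is only a toy sketch of the concept, not how Cheshire3 implements ResultSets: the functions `and_sets` and `or_sets`, the record ids and the `title_keys` lookup (standing in for an index's pre-extracted values) are all invented for the example.

```python
def and_sets(left, right):
    """Keep the items of `left` that also appear in `right`, in left's order."""
    allowed = set(right)
    return [item for item in left if item in allowed]


def or_sets(left, right):
    """Return the items of `left`, followed by items found only in `right`."""
    seen = set(left)
    merged = list(left)
    for item in right:
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged


# Hypothetical record ids matched by two separate searches.
title_hits = [3, 7, 1]
author_hits = [7, 2, 3]

both = and_sets(title_hits, author_hits)    # [3, 7]
either = or_sets(title_hits, author_hits)   # [3, 7, 1, 2]

# Hypothetical sort keys, standing in for values an index has extracted.
title_keys = {1: "steel", 2: "fist", 3: "sword", 7: "blade"}
ordered = sorted(either, key=title_keys.get)  # [7, 2, 1, 3]
```

The last step mirrors line 25 in spirit: the ordering comes from values looked up per record rather than from the records themselves.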
Finally, we step through the first 10 records (or however many are in the set, if fewer). We first need to retrieve the actual record from the recordStore (28), as the set consists of pointers. Then we transform the record into a document (29) containing, in this example, the simple Dublin Core form of the data. The last line (30) then prints the transformed XML.
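The shape of that loop (pointer, then record, then document, capped at ten) can be sketched without Cheshire3. Everything here is a stand-in chosen for illustration: `first_documents`, the dictionary used as a record store and the `to_dc` function are hypothetical and do not reflect the real store or transformer objects.

```python
def first_documents(pointers, fetch, transform, limit=10):
    """Fetch and transform at most `limit` records, in result order."""
    documents = []
    for i in range(min(len(pointers), limit)):
        record = fetch(pointers[i])           # resolve the pointer (line 28)
        documents.append(transform(record))   # convert the record (line 29)
    return documents


def to_dc(record):
    """Toy transformer: rename the wrapping element from tei to dc."""
    return record.replace("tei", "dc")


# Hypothetical store mapping record ids to raw records.
pointers = ["rec-%d" % n for n in range(12)]
store = dict((p, "<tei>%s</tei>" % p) for p in pointers)

docs = first_documents(pointers, store.__getitem__, to_dc)
```

Using `min(len(pointers), limit)` keeps the loop from running past the end of a short result set, which is the same guard the script applies in line 27.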
#!/home/cheshire/install/bin/python

import sys
osp = sys.path
sys.path = ["/home/cheshire/cheshire3/code"]
sys.path.extend(osp)

from server import SimpleServer
from PyZ3950 import CQLParser
from baseObjects import Session

# Build environment...
session = Session()
serv = SimpleServer(session, "../configs/serverConfig.xml")
db = serv.get_object(session, 'db_tei')
recStore = db.get_object(session, 'TeiRecordStore')
resultStore = db.get_object(session, 'TempResultStore')
txr = db.get_object(session, 'TeiToDCTransformer')
idx = db.get_object(session, 'l5r-idx-1')

query = 'dc.title any "sword fist steel"'
clause = CQLParser.parse(query)
result = db.search(session, clause)
hits = len(result)
if hits:
    result.order(session, idx)
    rsid = resultStore.create_resultSet(session, result)
    for i in range(min(hits, 10)):
        rec = recStore.fetch_record(session, result[i].docid)
        doc = txr.process_record(session, rec)
        print doc.get_raw()