Cheshire3 Installation

Introduction

The following instructions will hopefully walk you through installing Cheshire3 and its prerequisites from scratch under Linux (or any other Unix). If you have trouble at any stage, feel free to contact us.

Alternatively, a shell script is available that compiles everything as described in the instructions below: download ftp://ftp.cheshire3.org/pub/cheshire3/cheshire3-0.8-FULL.tgz (100 MB), unpack it, and run build.sh.

Requirements

All of the code used is available together at: ftp://ftp.cheshire3.org/pub/cheshire3/
The links in the table below let you check whether a more recent version is available than the one we provide.

Package          Minimum    Current   Optional    Location
expat            1.95.8     1.95.8    (see note)  http://sourceforge.net/projects/expat/
BerkeleyDB       4.0        4.3.28                http://www.sleepycat.com/
apache           2.0.42     2.0.54    Yes         http://httpd.apache.org/
Python           2.3.0      2.4                   http://www.python.org/
mod_python       3.1.0      3.1.4     Yes         http://www.modpython.org/
4Suite           1.0a3-cvs  1.0b1                 http://sourceforge.net/projects/foursuite/
ZSI              1.5-cvs    1.7                   http://sourceforge.net/projects/pywebsvcs/
PyZ3950                     2.04                  With Cheshire3 (http://www.panix.com/~asl2/software/PyZ3950/)
python-dateutil             0.9       Yes         https://moin.conectiva.com.br/DateUtil
SRW              1.1        1.1-2                 With Cheshire3
Cheshire3        0.6        0.8                   http://www.cheshire3.org/
libxml2          2.6.10     2.6.19    Yes         http://www.xmlsoft.org/
libxslt          1.1.8      1.1.14    Yes         http://www.xmlsoft.org/
PVM              3.4.4      3.4.5     Yes         http://www.netlib.org/pvm3/index.html
PyPVM            0.92       0.94      Yes         http://pypvm.sourceforge.net/
TextIndexNG      2.1        2.0.8     Yes         http://www.zopyx.com/OpenSource/TextIndexNG (only the Stemmer library is used)
PyStemmer        0.1        0.1       Yes         http://sourceforge.net/projects/pystemmer/ (older stemmer library; may not work with Python 2.4)
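To check an installed version against the Minimum column, compare version numbers numerically rather than as strings. The following is a minimal sketch in Python (the language Cheshire3 itself is written in); parse_version and meets_minimum are hypothetical helpers, not part of Cheshire3, and they assume purely numeric dotted versions, so tagged versions such as 1.0a3-cvs or 1.1-2 are rejected rather than guessed at.

```python
import re
import sys


def parse_version(text):
    """Turn a dotted version such as '2.6.10' into a tuple of ints.

    Assumption: only purely numeric components are handled; tagged
    versions such as '1.0a3-cvs' raise ValueError instead of being guessed.
    """
    if not re.fullmatch(r"\d+(\.\d+)*", text):
        raise ValueError("unsupported version string: %r" % (text,))
    return tuple(int(part) for part in text.split("."))


def meets_minimum(installed, minimum):
    """True if `installed` is the same as or newer than `minimum`."""
    have, need = parse_version(installed), parse_version(minimum)
    # Pad with zeros so that '2.4' and '2.4.0' compare as equal.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


if __name__ == "__main__":
    # Check the running interpreter against the table's Python minimum.
    running = "%d.%d.%d" % tuple(sys.version_info[:3])
    print("Python", running, "meets 2.3.0 minimum:", meets_minimum(running, "2.3.0"))
```

Numeric comparison matters here: compared as plain strings, "2.6.9" would sort after "2.6.10".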

Installing

The instructions below install all of the requirements in user space rather than globally. To install globally, omit the --prefix option from the configure commands. The examples install into /home/cheshire/install and unpack the source in /home/cheshire/build.

Before starting, you'll need a C compiler and a make utility installed, along with the appropriate libraries; you probably have these already. We strongly recommend GCC.

If you don't install everything in one session, you'll need to set these environment variables again at the start of each new session:

export CPPFLAGS=-I/home/cheshire/install/include
export LDFLAGS=-L/home/cheshire/install/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
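If you drive the build from a script rather than by hand, the same three variables can be derived from the install prefix. This sketch assumes a Python driver script and uses a hypothetical build_env helper; it returns a modified copy of the environment rather than changing the current one, and unlike the shell line above it does not leave an empty LD_LIBRARY_PATH entry (which the dynamic linker treats as the current directory) when the variable was previously unset.

```python
import os

PREFIX = "/home/cheshire/install"  # the example prefix used throughout this guide


def build_env(prefix, base=None):
    """Return a copy of `base` (default: os.environ) with the build variables set."""
    env = dict(os.environ if base is None else base)
    lib_dir = prefix + "/lib"
    env["CPPFLAGS"] = "-I" + prefix + "/include"
    env["LDFLAGS"] = "-L" + lib_dir
    # Append to an existing LD_LIBRARY_PATH without leaving an empty entry,
    # which the dynamic linker would treat as the current directory.
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        env["LD_LIBRARY_PATH"] = existing + ":" + lib_dir
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


if __name__ == "__main__":
    # Print the equivalent shell export lines for the example prefix.
    env = build_env(PREFIX)
    for name in ("CPPFLAGS", "LDFLAGS", "LD_LIBRARY_PATH"):
        print("export %s=%s" % (name, env[name]))
```

The returned dictionary can be passed as the env argument of subprocess.run() when invoking ./configure and make.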

  1. Install Expat
    Expat is the XML parser library that everything links to, so you'll need to install this first. Libxml2 is an alternative parser, but you'll need expat regardless as it's linked by Apache, Python and 4Suite.
    ./configure --prefix=/home/cheshire/install
    make
    make install

    Python 2.4+ and 4Suite 1.0a4+ both include expat 1.95.8. Earlier releases bundled different expat versions, which could cause problems when running under Apache. The recommended minimum versions have been updated accordingly, but earlier versions will be fine if you patch the same expat version into Python and 4Suite, or if you don't run under Apache. You do not need to install this package if it is already present on your system; most *nix distributions include Expat.

  2. Install BerkeleyDB
    BerkeleyDB is a very fast transactional database system, used by such giants as eBay, Amazon and IBM. Cheshire3 uses it behind all of its Store interfaces (indexes, records, configurations and other objects), where it is more than 10 times faster than a relational database doing the same job.
    cd build_unix
    ../dist/configure --prefix=/home/cheshire/install
    make
    make install
    BerkeleyDB is present on most Linux systems; if version 4.0 or later is already installed, you can skip this step.
  3. Install Apache (Optional)
    Apache is only required if you want a remote interface to your Cheshire3 databases, e.g. via SRW/U, OAI or Z39.50. Check the output of configure to make sure that it links against the expat you just installed. You could use plain CGI instead of the mod_python handlers described below, but it will be much slower, as the infrastructure takes a second or so to configure and instantiate on every request.
    export CPPFLAGS=-I/home/cheshire/install/include
    export LDFLAGS=-L/home/cheshire/install/lib
    export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/cheshire/install/lib
    ./configure --prefix=/home/cheshire/install --enable-mods=all --with-berkeley-db=/home/cheshire/install --enable-suexec
    make
    make install
    Apache is also present on most systems; however, you must ensure that it runs with the environment variables above so that it links against the libraries you have installed. You'll also need to ensure that the user Apache runs as has read (and potentially write) access to the databases in which the index and record data are stored.
  4. Install Python
    All of Cheshire3's main operational functions are written in Python, while the raw number crunching is mostly done by C libraries. Python is easy to understand and maintain, letting developers get into the nitty-gritty if desired without significant sacrifices in performance.

    Mac OS X note: add --enable-framework to the configure command to build Python as a framework.

    ./configure --prefix=/home/cheshire/install
    make
    make install
  5. Install mod_python (Optional)
    If you haven't installed Apache, skip this step. mod_python lets Apache run Python code internally to handle connections and requests. Each Apache process gets its own Python interpreter, which is started once and left running, so the Cheshire3 architecture only needs to be built once per process rather than on every request.
    ./configure --prefix=/home/cheshire/install --with-python=/home/cheshire/install/bin/python2.4 --with-apxs=/home/cheshire/install/bin/apxs
    make
    make install
  6. Install 4Suite
    4Suite is, at the time of writing, the best XML processing library available for Python. We use it for XPath and XSLT processing and for most DOM creation. libxml2 is an alternative, but its Python bindings currently support neither SAX2 nor Unicode objects (they return plain strings).
    python ./setup.py build
    python ./setup.py install
  7. Install ZSI
    ZSI is the best SOAP toolkit for Python. Version 1.5 includes a WSDL compiler, but it cannot yet quite handle SRW.

    The most recent release of ZSI requires PyXML, which Cheshire3 does not otherwise need and which can conflict with 4Suite (the better package). The alternative is to use the CVS version of ZSI, which is available on the Cheshire3 FTP site.

    python ./setup.py build
    python ./setup.py install
  8. Install PyStemmer or TextIndexNG (Optional)
    PyStemmer is a wrapper around the Snowball stemming language. It provides stemmers that reduce a word to its lexical stem, e.g. 'princesses' to 'princess' or 'understanding' to 'understand'.

    PyStemmer is no longer maintained, but a newer version of the stemmer is included in TextIndexNG. However, the newer version sometimes fails to compile on older platforms (e.g. Red Hat 7.2), while the older version fails to compile on newer platforms (e.g. Fedora Core 3). The stemmer library is the only part of TextIndexNG that Cheshire3 uses.

    python ./setup.py install

    Or for TextIndexNG:

    python ./setup.py build
    cd build/lib*
    cp Stemmer.so /home/cheshire/install/lib/python2.4/site-packages/
  9. Install PyZ3950
    This package is required even if you don't want to enable Z39.50 interfaces, as it contains the CQL libraries used in all Cheshire3 queries. It's also used as a Z39.50 client by the Z3950SearchDocumentGroup.

    You may need to install the Python lex and yacc modules (lex.py and yacc.py) by hand first, as they are required to build the ASN.1 compiler:

    cp lex.py yacc.py /home/cheshire/install/lib/python2.4/site-packages/
    python ./setup.py install
  10. Install SRW (Optional)
    This very small package contains the stubs for ZSI as well as a quick SRW demo client in Python. If you're not going to enable an SRW/U interface or use the SRWSearchDocumentGroup, you can omit it.
    python ./setup.py install
  11. Install python-dateutil (Optional)
    python-dateutil provides an excellent free-text date parser, though it doesn't currently handle multiple dates in the same block of text.
    python ./setup.py install
  12. Install libxml2 (Optional)
    This library is faster than expat and comes with its own XPath implementation; XSLT processing is provided by its companion library, libxslt. Neither is required.
    ./configure --prefix=/home/cheshire/install --with-python
    make
    make install
    cd python
    python ./setup.py install
  13. Install libxslt (Optional)
    A companion library to libxml2 to process XSLT.
    ./configure --prefix=/home/cheshire/install --with-python
    make
    make install
  14. Install PVM (Optional)
    PVM (Parallel Virtual Machine) is a very fast parallelization system with a low transaction cost. It lets you run processes on multiple machines and compiles on many platforms. Since Python, and hence Cheshire3, also runs on many platforms without any additional effort, you can build a completely heterogeneous cluster without difficulty.
    [Coming]
  15. Install PyPVM (Required if PVM is installed)
    This is the Python wrapper around the PVM library.
    [Coming]
  16. Install textutils (Optional)
    If the 'sort' utility breaks, check that you have the latest version of textutils installed.
    ./configure --prefix=/home/cheshire/install
    make
    make install
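Once the steps above are complete, a quick sanity check is to confirm that the interpreter under /home/cheshire/install can import each package and to see which expat version it was built with. The module names below (Ft for 4Suite, Stemmer for the stemmer, and so on) are assumptions about what each package installs; adjust them if yours differ. mod_python is left out because it can only be imported inside Apache. The sketch avoids newer syntax, so it should also run under the Python 2.4 used in this guide.

```python
import sys
from xml.parsers import expat

# Assumed mapping from package name to the top-level module it installs.
EXPECTED_MODULES = {
    "4Suite": "Ft",
    "ZSI": "ZSI",
    "PyZ3950": "PyZ3950",
    "python-dateutil (optional)": "dateutil",
    "libxml2 (optional)": "libxml2",
    "libxslt (optional)": "libxslt",
    "Stemmer (optional)": "Stemmer",
}


def check_modules(names):
    """Return a dict mapping each module name to True if it imports, else False."""
    results = {}
    for name in names:
        try:
            __import__(name)  # __import__ rather than importlib, for old Pythons
            results[name] = True
        except ImportError:
            results[name] = False
    return results


def main():
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    # expat.version_info is the (major, minor, micro) version Python was built with.
    print("expat %d.%d.%d (table minimum: 1.95.8)" % tuple(expat.version_info))
    status = check_modules(EXPECTED_MODULES.values())
    for package in sorted(EXPECTED_MODULES):
        module = EXPECTED_MODULES[package]
        if status[module]:
            state = "ok"
        else:
            state = "MISSING"
        print("%-28s %-10s %s" % (package, module, state))


if __name__ == "__main__":
    main()
```

Run it with the Python built in step 4, e.g. /home/cheshire/install/bin/python check_install.py (the file name is arbitrary).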
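To make step 8's description of stemming concrete, here is a toy suffix stripper that reproduces the two examples given there. It is not Snowball, PyStemmer or TextIndexNG: it applies only a handful of simplified rules modelled on step 1 of the Porter algorithm (the vowel test ignores Porter's special handling of 'y', and the later steps are omitted entirely), so treat it as an illustration only.

```python
VOWELS = "aeiou"


def has_vowel(stem):
    """Simplified Porter '*v*' test: only a, e, i, o, u count as vowels."""
    return any(ch in VOWELS for ch in stem)


def toy_stem(word):
    """Strip a few English suffixes; a toy, not a real Snowball stemmer."""
    word = word.lower()
    # Plural endings, after Porter step 1a.
    if word.endswith("sses"):
        word = word[:-2]        # princesses -> princess
    elif word.endswith("ies"):
        word = word[:-2]        # ponies -> poni
    elif word.endswith("ss"):
        pass                    # princess is left alone
    elif word.endswith("s"):
        word = word[:-1]        # cats -> cat
    # -ing / -ed, after Porter step 1b: strip only if a vowel remains.
    for suffix in ("ing", "ed"):
        if word.endswith(suffix) and has_vowel(word[:-len(suffix)]):
            word = word[:-len(suffix)]
            break
    return word


for example in ("princesses", "understanding", "sing"):
    print(example, "->", toy_stem(example))
```

The vowel check is what stops short words such as 'sing' from being stripped down to 's'.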

Configuration

  1. Set the following environment variables if you have installed everything as a local user:
    export LD_LIBRARY_PATH=/home/cheshire/install/lib
    export LD_RUN_PATH=/home/cheshire/install/lib
    Also ensure that Apache runs with these environment variables, e.g. by setting them in the envvars (or envvars-std) file alongside the httpd binary.
  2. Configure Apache.
    • The standard configuration is usually sufficient to start with. Add this line to httpd.conf:
      Include conf/cheshire3.conf
    • Then put the following in conf/cheshire3.conf:
      # Load mod_python
      LoadModule python_module modules/mod_python.so
      
      # SRW/U interface at /srw/dbname
      <Directory /home/cheshire/install/htdocs/srw>
        SetHandler mod_python
        PythonDebug On
        PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
        PythonHandler srwApacheHandler
      </Directory>
      
      # Z3950 interface on 2100
      Listen 2100
      <VirtualHost *:2100>
        PythonPath "['/home/cheshire/cheshire3/code']+sys.path"
        PythonConnectionHandler zApacheHandler
        PythonDebug On
      </VirtualHost>
      
  3. Configure Cheshire3.
    See further documentation.

Troubleshooting

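If you get strange errors from mod_python under Linux, first try restarting Apache. If the restart fails with a 'No space left on device' error even though there is obviously free disk space, you've hit the semaphore problem: the system has reached its limit on System V semaphores.
The fix (run as root) is to raise the kernel's semaphore limits: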

echo "512 32000 32 512" > /proc/sys/kernel/sem
Or see: http://clarens.sourceforge.net/index.php?docs+faq
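Before writing new values, it is worth checking the current limits. The four numbers in /proc/sys/kernel/sem are SEMMSL, SEMMNS, SEMOPM and SEMMNI, and on recent kernels the defaults may already exceed the values above, in which case the echo command would lower some of them. This Python sketch (the helper names are hypothetical) parses the file and builds a replacement line that only ever raises each limit.

```python
import os

SEM_PATH = "/proc/sys/kernel/sem"
FIELDS = ("SEMMSL", "SEMMNS", "SEMOPM", "SEMMNI")
RECOMMENDED = (512, 32000, 32, 512)  # the values from the echo command above


def parse_sem(text):
    """Parse the four limits in /proc/sys/kernel/sem into a dict."""
    values = [int(field) for field in text.split()]
    if len(values) != len(FIELDS):
        raise ValueError("expected 4 numbers, got %d" % len(values))
    return dict(zip(FIELDS, values))


def too_low(current, recommended=RECOMMENDED):
    """Names of the limits that are below the recommended values."""
    return [name for name, want in zip(FIELDS, recommended) if current[name] < want]


def safe_setting(current, recommended=RECOMMENDED):
    """A line to write to /proc/sys/kernel/sem that never lowers a limit."""
    return " ".join(str(max(current[name], want))
                    for name, want in zip(FIELDS, recommended))


if __name__ == "__main__" and os.path.exists(SEM_PATH):
    with open(SEM_PATH) as handle:
        limits = parse_sem(handle.read())
    print("current:", limits)
    print("below recommended:", too_low(limits) or "none")
    print('as root: echo "%s" > %s' % (safe_setting(limits), SEM_PATH))
```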

If compiling Apache 2.0.50 under Solaris fails with errors such as:

In file included from apr_dbm_berkeleydb.c:24:
/usr/include/stdlib.h:165: error: conflicting types for 'getsubopt'
/usr/local/lib/gcc/sparc-sun-solaris2.8/3.4.0/include/stdio.h:281: error:
previous declaration of 'getsubopt' was here
A (not very elegant) workaround from Giulia Hill at UC Berkeley: run configure first, then comment out lines 165 and 189 of /usr/include/stdlib.h (hoping that nobody else needs it in the meantime) and run make. Restore those lines before running 'make install'.