
  UU   UUU  CCCCCCC SSSSSSSSS        		 
  UU   UUU CCC     SSS               		 
  UU   UUU CC       SSSS   UCS Version 0.4
  UU   UUU CC         SSS                       
  UU  UUUU CCC        SSS    Copyright (C) 2004-2005 by Stefan Evert 
   UUUUUU   CCCCCC SSSSS                        



INTRODUCTION

The UCS toolkit is a collection of libraries and scripts for the statistical
analysis of cooccurrence data.  It can be thought of as a simple and highly
specialised database, storing data sets of word pairs and frequency
information in a tabular format in plain (compressed) text files.  The data
sets can be viewed, printed, manipulated in various ways, annotated with
association scores, ranked, and sorted.  In addition, there are library
functions for the graphical evaluation of association measures in a
collocation extraction task and for measuring intercoder agreement.

UCS is based on two programming languages: Perl (www.perl.com) for data
manipulation, and R (www.r-project.org) for statistical processing.  The
UCS/Perl system with its command-line tools is the main workhorse of the UCS
toolkit.  Most of its functionality is encapsulated in Perl modules and can be
used in custom scripts.  The UCS/R system provides a number of R libraries
with convenience functions, evaluation graphs, measures of intercoder
agreement, and models for the distribution of cooccurrence frequencies.


SUPPORTED PLATFORMS

UCS should work on all Unix-like platforms, provided that the prerequisites
listed below are met.  So far, it has been tested on:

  Linux 2.4 / i386  (SuSE 9.0, RedHat 9)
  Solaris 2.8 / SPARC
  Mac OS X / PPC

If you are running some version of Windows, you may be able to work with UCS
under the Cygwin emulation [http://www.cygwin.com/].  See "doc/install.txt"
for more information about the experimental Cygwin support.


PREREQUISITES

 * Perl version 5.6.1 or newer   [http://www.perl.com/]

 * Additional Perl modules       [http://www.cpan.org/]

    - Expect
    - Pod::Perldoc (Perl versions before 5.8.1 only)

 * The R statistical environment version 1.6 or newer
   [http://www.r-project.org/]

Recommended, but not mandatory:

 * Additional Perl modules

    - Tk            (required by Tk::Pod)
    - Tk::Pod       (GUI for the UCS manpages)
    - Term::ReadKey (better viewing of data sets)

 * a2ps                          [http://www.gnu.org/software/a2ps/]

Notice: Future releases of the UCS toolkit are expected to require Perl
version 5.8.0 or newer!


QUICK INSTALLATION

1. Make sure that all prerequisites are installed.  Installing missing
   programs will probably require root access; otherwise, ask your system
   administrator to install them.  Perl modules can easily be installed with
   the interactive "cpan" shell (if you have root access).

2. Run the auto-configuration script, which tests whether all prerequisites
   are installed and configures UCS to work on your system.  If there are
   multiple versions of Perl on your computer, be sure to run the script with
   an appropriate one (e.g. by typing "perl5.6.1" or "perl5" instead of "perl"
   in the command below).

     perl System/Install.perl

   If some of the recommended (optional) modules are missing, the
   configuration script will stop with a warning and ask you for confirmation
   before the installation continues.

3. Add the directory containing the UCS/Perl command-line utilities
   (System/bin/) to your command search path (the PATH environment variable).
   You can also copy or link the programs into an appropriate directory that
   is already in your path (e.g. ~/bin/).
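Step 3 might look as follows in a Bourne-compatible shell (sh, bash, ksh).
This is only a sketch: it assumes the distribution was unpacked to $HOME/UCS,
a hypothetical location, so set UCS_HOME to the actual directory.  The "case"
test avoids adding the directory twice if the lines are run more than once.

```shell
#!/bin/sh
# Hypothetical location of the unpacked UCS distribution; adjust as needed.
UCS_HOME="$HOME/UCS"

# Option 1: prepend System/bin/ to the search path unless it is already there.
# Put these lines in a shell startup file (e.g. ~/.profile) to make the
# change permanent.
case ":$PATH:" in
  *":$UCS_HOME/System/bin:"*) ;;
  *) PATH="$UCS_HOME/System/bin:$PATH"; export PATH ;;
esac

# Option 2 (alternative): link each program into ~/bin/, assuming that
# ~/bin/ is already in your PATH.
# mkdir -p "$HOME/bin"
# ln -s "$UCS_HOME"/System/bin/* "$HOME/bin/"
```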

If UCS does not work out of the box, consult the file "doc/install.txt" for
configuration options and some installation hints.


GETTING STARTED

Printable versions of the complete UCS documentation are provided in the doc/
directory, in both PostScript and PDF format.  There are separate documents
for UCS/Perl (UCS-Perl.ps.gz, UCS-Perl.pdf) and UCS/R (UCS-R.ps.gz,
UCS-R.pdf).  HTML versions can be found in the subdirectories
doc/UCS-Perl-html/ and doc/UCS-R-html/.  On-line documentation for UCS/Perl is
accessed with the "ucsdoc" command ("ucsdoc ucsintro" is a good starting
point).  The UCS/R documentation is available through the R help system: once
the UCS/R library has been loaded, type "?UCS" to get started.

If you are new to the UCS system, you should step through the UCS/Perl and
UCS/R tutorials.  The UCS/Perl tutorial can be found in the doc/ directory and
is available in text ("UCS-Perl-tutorial.txt") and HTML
("UCS-Perl-tutorial.html") versions.  The UCS/R tutorial takes the form of an R
script and can be found in the System/R/script/ directory (change to the
System/R/ directory and follow the instructions in the README file there).  An
HTML version ("UCS-R-tutorial.html") is provided in the doc/ directory.

You should also have a look at "file-tree.txt" in the doc/ directory, which
explains the directory structure of the UCS distribution.  Answers to some
frequently asked questions are given in "faq.txt".  If you would like to
contribute useful UCS/Perl scripts (or make them easily available to other
users on your own computer), have a look at "contrib.txt".

If you are upgrading from a previous version of the UCS system, it is
recommended that you check the file "doc/changes.txt" for new or modified
features in the current UCS release.


QUESTIONS, COMMENTS, BUG REPORTS

The UCS system is BETA software.  It has not yet been tested extensively, so
it may still contain bugs, some of which may be serious.  Please send
questions about the system, comments, suggestions, and bug reports to the
author.  His current e-mail address is: stefan.evert@uos.de


ACKNOWLEDGEMENTS

Brigitte Krenn (FAI, Vienna) kindly provided a large database of manual
annotations for German PP+verb pairs.  The annotations in the GLAW data set
were produced by Stefan Evert, Ulrich Heid (IMS, Stuttgart), and Wolfgang
Lezius.  

My heartfelt thanks are due to Brigitte Krenn for long discussions in Vienna's
coffee houses and for our inspiring collaboration on the evaluation of
collocation extraction methods, and to Marco Baroni (Forli, Bologna) for
extensive beta testing of the UCS toolkit, for encouraging colleagues and
students to use the software, and for many useful comments, bug reports, and
suggestions.  Marco Baroni and Patrick Watrin (UCL, Louvain-la-Neuve) included
the UCS toolkit in "live CD" Linux distributions focussing on corpus-related
software.



COPYRIGHT AND LICENSE

Copyright (C) 2004-2005 by Stefan Evert.

This software is provided AS IS and the author makes no warranty as to
its use and performance. You may use the software, redistribute and
modify it under the same terms as Perl (see "perldoc perlartistic").
