UCS                   package:UCS                   R Documentation

_I_n_t_r_o_d_u_c_t_i_o_n _t_o _U_C_S/_R

_D_e_s_c_r_i_p_t_i_o_n:

     UCS/R consists of a set of R libraries related to the
     visualisation of cooccurrence data and the evaluation of
     association measures. The current functionality includes:
     evaluation graphs for association measures (in terms of precision
     and recall), measures for inter-annotator agreement, and two
     population models for word frequency distributions.

_U_s_a_g_e:

     source("/path/to/UCS/System/R/lib/ucs.R")
     ucs.library()

_D_e_t_a_i_l_s:

     UCS/R is initialised by 'source'ing the file 'ucs.R' in the 'lib/'
     subdirectory of the UCS/R directory tree.  This will make the
     UCS/R documentation available in the R process and provide the
     'ucs.library' command, which is used to load individual UCS/R
     modules. Enter 'ucs.library()' now to display a list of available
     modules (see the 'ucs.library' manpage for details).
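
     As a minimal sketch of this start-up sequence, a script might
     guard the 'source' call so that a wrong path produces a clear
     warning rather than an error.  The path below is the placeholder
     from the Usage section, and the helper name 'load.ucs' is
     hypothetical (it is not part of UCS/R).

```r
# Hypothetical helper (not part of UCS/R): initialise UCS/R from a given path.
load.ucs <- function (path = "/path/to/UCS/System/R/lib/ucs.R") {
  if (!file.exists(path)) {
    warning("UCS/R start-up file not found: ", path)
    return(invisible(FALSE))
  }
  source(path)    # makes 'ucs.library' and the UCS/R documentation available
  ucs.library()   # called without arguments: list the available modules
  invisible(TRUE)
}

# Returns FALSE (with a warning) unless the placeholder path really exists
ok <- suppressWarnings(load.ucs())
```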

     Currently, the following modules are available.  The listing below
     also indicates the most important manpages for each module.
     Throughout the documentation, it is assumed that you are familiar
     with the UCS/Perl naming conventions and data set file format.


        *  'sfunc':  *Special Mathematical Functions*

           Convenience interfaces to the Gamma function ('Cgamma'), the
           incomplete (and regularized) Gamma function and its inverse
           ('Igamma', 'Rgamma'), the Beta function ('Cbeta'), the
           incomplete (and regularized) Beta function and its inverse
           ('Ibeta', 'Rbeta'), and binomial confidence intervals
           ('binom.conf.interval').

           All of these functions are computed from the 'pgamma' and
           'pbeta' distribution functions (and the corresponding
           quantile functions, 'qgamma' and 'qbeta') in the standard
           library of R.

        *  'base':  *Basic Functions for Loading and Managing UCS data
           sets*

           This module provides functions for loading UCS data set
           files ('read.ds.gz'), listing annotated association measures
           ('ds.find.am', 'am.key2var'), ranking by association scores 
           ('order.by.am', 'add.ranks'), and computing precision/recall
           tables for the evaluation of association measures
           ('precision.recall').

           The module also includes a listing of all built-in
           association measures in the UCS/Perl system, including
           add-on packages ('builtin.ams').

        *  'plots':  *Evaluation Graphs for Association Measures*

           This module plots precision-, recall-, and
           precision-by-recall graphs for the empirical evaluation of
           association measures (all combined in a single function,
           'evaluation.plot'). The graphs are highly configurable,
           either locally in each function call or by setting global
           defaults ('ucs.par'). The 'evaluation.plot' function
           supports confidence intervals, significance tests for result
           differences, and evaluation based on random samples (see
           Evert, 2004, Ch. 5). A simple text-mode version of the
           precision/recall-based evaluation is provided by the
           'evaluation.table' function in the 'base' module.

        *  'iaa':  *Measures of Inter-Annotator Agreement*

           Computes Cohen's kappa statistic with its standard deviation
           (Fleiss, Cohen & Everitt, 1969) or a confidence interval for
           the proportion of true agreement (Krenn, Evert &
           Zinsmeister, 2004) from a 2-by-2 contingency table (see
           'iaa.kappa' and 'iaa.pta').

        *  'gam':  *Generalised association measures (GAMs)*

           This module implements extensions of several association
           measures to continuous functions on a real-valued coordinate
           space (generalised association measures, GAMs).  For details
           and terminology, please refer to Evert (2004, Sec. 3.3). 
           The functions in this module compute GAM scores and
           iso-surfaces in standard or ebo-coordinates, and can add
           jitter to a given data set.  New GAMs can easily be added
           with the 'register.gam' function. Relevant help pages are
           'builtin.gams', 'gam.score', 'gam.iso', 'gamma.nbest',
           'add.jitter', 'add.gams', 'add.ebo', and 'gam.helpers'.

        *  'eo':  *Visualise GAMs in the (e,o) plane*

           This module implements 2-D visualisation of data sets and
           GAMs by plotting point clouds and iso-lines in the (e,o)
           plane (see Evert 2004, Sec. 3.3).  The recommended starting
           point is the documentation of the 'eo.setup' function, which
           initialises a new (e,o) plot.  Other relevant help pages are
           'eo.par', 'eo.points', 'eo.iso', 'eo.iso.diff', 'eo.legend',
           and 'eo.mark'.

        *  'lexstats':  *Utilities for lexical statistics*

           This module contains miscellaneous utility functions for
           word frequency distributions, including: an interface to
           file formats used by the 'lexstats' software (Baayen 2001);
           a range of common plots; goodness-of-fit evaluation for LNRE
           population models (cf. the 'zm' and 'fzm' modules below).
           Currently, the most useful functions in this module are
           'read.spectrum', 'spectrum.plot', and
           'lnre.goodness.of.fit'.

        *  'zm':  *The Zipf-Mandelbrot (ZM) Population Model*

           This module implements a simple population model for word
           frequency distributions (Baayen, 2001) based on the
           Zipf-Mandelbrot law.  See Evert (2004a) for details.
           Relevant help pages are 'zm', 'EV', 'EVm', 'VV', 'VVm',
           'write.lexstats', and 'lnre.goodness.of.fit'.

        *  'fzm':  *The Finite Zipf-Mandelbrot (fZM) Population Model*

           This module implements the finite Zipf-Mandelbrot model, an
           extension of the ZM model (Evert, 2004a). Relevant help
           pages are 'fzm', 'EV', 'EVm', 'VV', 'VVm',
           'write.lexstats', and 'lnre.goodness.of.fit'.
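
     The 'sfunc' module is described above as building on 'pgamma' and
     'pbeta'.  The following sketch illustrates that relationship with
     base R alone.  It does not call the UCS/R functions, whose
     argument conventions are documented on their own manpages; all
     function names below are hypothetical.  The Clopper-Pearson
     interval is shown as one standard construction from Beta
     quantiles, on the assumption that it is a useful illustration; it
     is not claimed to be the method used by 'binom.conf.interval'.

```r
# Regularized lower incomplete Gamma function P(a, x) and its inverse
reg.gamma     <- function (a, x) pgamma(x, shape = a)
reg.gamma.inv <- function (a, p) qgamma(p, shape = a)
# Unnormalised lower incomplete Gamma function: P(a, x) * Gamma(a)
inc.gamma <- function (a, x) pgamma(x, shape = a) * gamma(a)

# Regularized incomplete Beta function I_x(a, b) and its inverse
reg.beta     <- function (x, a, b) pbeta(x, a, b)
reg.beta.inv <- function (p, a, b) qbeta(p, a, b)
# Unnormalised incomplete Beta function: I_x(a, b) * B(a, b)
inc.beta <- function (x, a, b) pbeta(x, a, b) * beta(a, b)

# Exact (Clopper-Pearson) binomial confidence interval from Beta quantiles
clopper.pearson <- function (k, n, conf.level = 0.95) {
  alpha <- 1 - conf.level
  c(lower = if (k == 0) 0 else qbeta(alpha / 2, k, n - k + 1),
    upper = if (k == n) 1 else qbeta(1 - alpha / 2, k + 1, n - k))
}

# Sanity checks against closed forms
inc.gamma(1, 2)                      # equals 1 - exp(-2)
inc.beta(0.5, 1, 1)                  # equals 0.5
reg.gamma.inv(3, reg.gamma(3, 1.5))  # recovers 1.5
clopper.pearson(7, 20)               # agrees with binom.test(7, 20)$conf.int
```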

     The command 'help(package=UCS)' will give you a full index of
     available UCS/R help pages.  Use 'help.search()' for full-text
     search.
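
     To illustrate the kind of precision/recall table used for the
     evaluation of association measures (see the 'base' module above),
     the following base-R sketch ranks candidates by score and computes
     n-best precision and recall.  It is not the UCS/R implementation:
     the function name 'nbest.precision.recall' and the toy data are
     invented for illustration, and the actual 'precision.recall'
     function may differ in its arguments and output.

```r
# Hypothetical helper: precision and recall of every n-best list
nbest.precision.recall <- function (score, is.tp) {
  idx  <- order(score, decreasing = TRUE)  # best candidates first
  hits <- cumsum(is.tp[idx])               # true positives among the n best
  n    <- seq_along(idx)
  data.frame(n = n, precision = hits / n, recall = hits / sum(is.tp))
}

# Toy data: six candidates, two of which are true positives
score <- c(12.1, 3.4, 8.8, 0.5, 6.0, 9.7)
is.tp <- c(TRUE, FALSE, TRUE, FALSE, FALSE, FALSE)

pr <- nbest.precision.recall(score, is.tp)
pr[3, ]   # 3-best list: precision 2/3, recall 1
```

     Precision falls as the n-best list grows beyond the last true
     positive, while recall stays at 1; plotting these two columns
     against n gives the basic shape of an evaluation graph.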

_N_o_t_e:

     The correct 'source' path for the file 'ucs.R' can be set
     automatically with the UCS/Perl tool 'ucs-config'.  Simply insert
     the statement


         source("ucs.R")

     on a separate line in your R script file (say, 'my-script.R') and
     run the shell command


         ucs-config my-script.R


_R_e_f_e_r_e_n_c_e_s:

     Baayen, R. Harald (2001). _Word Frequency Distributions._ Kluwer,
     Dordrecht.

     Evert, Stefan (2004). _The Statistics of Word Cooccurrences: Word
     Pairs and Collocations._ PhD Thesis, IMS, University of Stuttgart.

     Evert, Stefan (2004a). A simple LNRE model for random character
     sequences. In _Proceedings of JADT 2004_, Louvain-la-Neuve,
     Belgium, pages 411-422.

     Fleiss, Joseph L.; Cohen, Jacob; Everitt, B. S. (1969). Large
     sample standard errors of kappa and weighted kappa. _Psychological
     Bulletin_, *72*(5), 323-327.

     Krenn, Brigitte; Evert, Stefan; Zinsmeister, Heike (2004).
     Determining intercoder agreement for a collocation identification
     task.  In preparation.

_S_e_e _A_l_s_o:

     'ucs.library', the UCS/R tutorial ('tutorial.R' in the 'script/'
     subdirectory) and the UCS/Perl documentation.

