


	FREQUENTLY ASKED QUESTIONS


Q: When I unpack the UCS distribution, tar prints a warning that "Cannot
   create symlink to `Perl/': File exists" and ends with the message "Error
   exit delayed from previous errors". What has gone wrong?

A: Unfortunately, your file system is dyslexic and cannot distinguish between
uppercase and lowercase letters.  Don't worry, tar has successfully unpacked
the distribution and you can just proceed with the installation.
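If you want to confirm the diagnosis first, the following POSIX shell sketch creates a lowercase file in a scratch directory and checks whether the uppercase name also resolves to it. The function name check_case and the scratch file names are made up for this example; the script cleans up after itself.

```shell
#!/bin/sh
# Report whether the file system holding the current directory is
# case-insensitive. Creates a scratch directory, tests one name, removes it.
check_case() {
  dir=$(mktemp -d ./casetest.XXXXXX) || return 2
  touch "$dir/perl"
  # On a case-insensitive file system, "PERL" refers to the same file as "perl".
  if [ -e "$dir/PERL" ]; then
    result="case-insensitive"
  else
    result="case-sensitive"
  fi
  rm -rf "$dir"
  echo "$result"
}

check_case
```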


Q: Why can't I just load a text file into UCS and get back a list of
   collocations?

A: Because UCS is designed for the statistical analysis and evaluation of data
sets.  Extracting cooccurrences from a text corpus is an entirely different
task, and it is far from trivial.  Accurate identification of relational
cooccurrences often requires specialised tools for a sophisticated syntactic
and morphological analysis of the corpus.  If you still want to get off to a
quick start from raw text, you should have a look at Ted Pedersen's N-gram
Statistics Package (NSP). [ http://www.d.umn.edu/~tpederse/nsp.html ]
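To see why the naive approach falls short, here is a deliberately simple POSIX shell sketch that counts adjacent word pairs (surface bigrams) in a plain-text file. It is not how NSP works, and its output is not a UCS data set; the function name count_bigrams is hypothetical. Tokens are simply runs of letters, and many tr implementations match character classes byte by byte, so non-ASCII letters may be split. Punctuation, sentence boundaries and syntax are ignored entirely, which is exactly why relational cooccurrences need proper tools.

```shell
#!/bin/sh
# Naive surface bigram counter (illustration only, not NSP, not UCS format).
# Output lines: "count word1 word2", most frequent pairs first.
count_bigrams() {
  tokens=$(mktemp) || return 1
  # One lowercase token per line; drop empty lines left by leading separators.
  tr -cs '[:alpha:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | sed '/^$/d' > "$tokens"
  # Pair each token with its successor, drop the final unpaired line, then count.
  tail -n +2 "$tokens" | paste -d ' ' "$tokens" - | sed '$d' | sort | uniq -c | sort -rn
  rm -f "$tokens"
}
```

For example, "count_bigrams corpus.txt | head" lists the ten most frequent adjacent word pairs in corpus.txt.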


Q: Can I use NSP to extract cooccurrences and then compute statistics with
   UCS/Perl and do the evaluation in UCS/R?

A: No problem. You can import cooccurrence data extracted with NSP's count.pl
tool by using the script "nsp2ucs.perl" in the System/Perl/tools/ directory.
A short example is given in "getting-started.txt".


Q: What about N-grams?

A: Easy. Let N=2. ;o)

Serious A: The UCS system is only designed for pair cooccurrences
(i.e. bigrams). I will turn to N-grams when I have finished my studies of the
bigram case, but they will probably require different statistical and
technical approaches.  When somebody asks me why I spend so much effort on a
comparatively simple special case, my answer is a quote from D. R. Cox:

  Nevertheless points remain for discussion, in particular so as to
  understand what to do in more complicated cases for which the single
  2x2 table is a prototype.


Q: Why doesn't UCS use an XML format for data set files?

A: The UCS data set file format was chosen to be compatible both with Perl and
R. It is compact, efficient, and can actually be read by a human being. XML is
a waste of space. That said, I'm planning to implement XML import and export
filters for UCS data sets in one of the next releases.


Q: I only want to use the UCS/Perl part.  Why does Install.perl insist that
   I have to install R?

A: UCS/Perl depends on an R backend for special mathematical functions
(including statistical distributions).  This backend is used in various places
throughout the system, including some of the built-in association measures.
UCS/Perl will work without R, but may suddenly abort when it attempts to use
a special function.  If you are willing to take the risk (and work around
those problems), you can force installation by passing the "--force" option
to "Install.perl" ("perl System/Install.perl --force").  See "install.txt"
for more information.  Actually, why didn't you read the detailed
installation instructions in the first place before turning to the FAQ?
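Before running the installer, a quick sketch like the one below checks whether an R executable is on your PATH and prints which of the two commands above applies. It only looks for a program named "R"; the function name check_r is made up here, and Install.perl does its own check, which this does not replace.

```shell
#!/bin/sh
# Pre-installation sketch: is an R executable on the PATH?
check_r() {
  if command -v R >/dev/null 2>&1; then
    echo "R found at $(command -v R); run: perl System/Install.perl"
  else
    echo "R not found; install R, or accept the risk with: perl System/Install.perl --force"
  fi
}

check_r
```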


Q: Can I use UCS with non-latin characters?

A: UCS, being implemented in Perl, uses utf-8 as the default character set.
Thus, as long as you can convert your non-latin data to utf-8 (e.g. with the
GNU recode utility [ http://www.gnu.org/directory/recode.html ]), UCS will
have no problems handling it.  In order to display the data correctly, you
will of course need a terminal and shell that understand utf-8 (e.g., the GNU
bash shell in a utf-8 capable terminal).  Moreover, since ucs-print uses "less" as a pager, you have to
make sure that your "less" handles utf-8 correctly.  If your locale is not set
to utf-8, you can do this by setting the LESSCHARSET environment variable to
the value "utf-8". For example, if you use the bash shell, type the following
command at the prompt:

  export LESSCHARSET=utf-8

or add it to your init file "~/.bashrc". UCS has been tested successfully with
utf-8 encoded Japanese data.
[Thanks to Marco Baroni for this entry.]
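Both steps can be sketched in POSIX shell as follows. This substitutes iconv for recode (iconv is available on most Unix-like systems), assumes the input file is Latin-1 (ISO-8859-1), and uses a hypothetical helper name, to_utf8. LESSCHARSET is only exported when "locale charmap" does not already report UTF-8.

```shell
#!/bin/sh
# Convert a Latin-1 file to UTF-8 (iconv used here instead of recode).
# Usage: to_utf8 INPUT OUTPUT
to_utf8() {
  iconv -f ISO-8859-1 -t UTF-8 "$1" > "$2"
}

# Tell less to treat its input as UTF-8 unless the locale already does.
case "$(locale charmap 2>/dev/null)" in
  UTF-8|utf8|UTF8) ;;
  *) export LESSCHARSET=utf-8 ;;
esac
```

The "case" part can go into "~/.bashrc" in place of the unconditional export shown above.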


Q: Can I use UCS in the Windows operating system? I know that many other
   Perl tools work with ActivePerl for Windows.

A: Wouldn't you rather install a decent operating system and forget about
Windows?  If you absolutely feel you need to help Bill Gates make even more
money, you can try your luck with Cygwin [ http://www.cygwin.com/ ], a Unix
emulation layer running under all recent versions of Windows (not Win95). 
The current release of UCS has experimental support for Win32/Cygwin, but
there are many pitfalls.  See "install.txt" for more information.


Q: Wouldn't it be nice to have a glossary for the UCS terminology somewhere?

A: Yes. I just haven't got round to writing it. 


Q: What does UCS stand for?

A: Try "ucsdoc ucsintro". Haven't you read the documentation? ;o)


Q: When I run "ucsdoc", the manual page is displayed on screen, one page at a
   time, but it is full of weird characters and "ESC"s.  What is happening?

A: You will probably experience the same problem when you call "perldoc"
directly (unless the "perldoc" script isn't installed on your system at all,
as seen on Mandrake Linux).  Perldoc uses a pager (typically "less") to display the
formatted manpage, which seems to get confused by the highlighting added to
the text.  Try setting the $LESS environment variable to "-R".  To do so, type
"export LESS=-R" in sh or bash, and "setenv LESS -R" in tcsh.  If this
variable is already defined, it is probably a good idea to preserve the
current settings and just add the "-R" flag.  If setting $LESS doesn't solve
the problem, you can run "ucsdoc" (and "perldoc") with the option "-t" to
display a plain-text version of the manpage.
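To add the "-R" flag while preserving any existing settings in sh or bash, a small sketch relies on the POSIX expansion ${var:+word}, which yields "word" only when var is set and non-empty. The helper name add_raw_flag is made up for this example.

```shell
#!/bin/sh
# Print the given LESS value with " -R" appended, or just "-R" if it is empty.
add_raw_flag() {
  # ${1:+$1 } expands to "$1 " only when $1 is non-empty.
  printf '%s\n' "${1:+$1 }-R"
}

LESS=$(add_raw_flag "$LESS")
export LESS
```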


Q: How can I find out how an association measure, say log-likelihood, is
   implemented?  I mean the actual Perl code?

A: First of all, it's implemented as a UCS expression, not in plain Perl. :o)
You can access the source code of a built-in association measure in two ways:
(a) look it up in the Perl module where it is implemented (UCS::AM, etc.); or
(b) be cool and type

  ucs-config -e 'print UCS::AM_Expression("log.likelihood")->string, "\n"'

After all, Perl wouldn't be Perl without one-liners, would it?


Q: "ucs-print --interactive" doesn't display the data set in paged mode, but
   rather as a single long table.  Typical symptoms are that column headers are
   missing from all but the first page, the formatting may be odd, and large data
   sets may take a long time before the first page of data is displayed.  What
   can I do about it?

A: You should install the optional Term::ReadKey module (which was recommended
by the installation script, but apparently you chose to continue without it).
Alternatively, on Linux it may be enough to type the command "export LINES"
before running ucs-print, or to add it to "~/.bashrc".
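A sketch of both remedies in POSIX shell: "perl -MTerm::ReadKey -e 1" is the usual way to test whether a Perl module can be loaded, and if it cannot, LINES is exported instead. The helper name have_readkey and the fallback of 24 rows (used when LINES is unset and tput gives no answer) are assumptions made for this example.

```shell
#!/bin/sh
# Succeeds if perl can load the optional Term::ReadKey module.
have_readkey() {
  perl -MTerm::ReadKey -e 1 2>/dev/null
}

if have_readkey; then
  echo "Term::ReadKey is installed"
else
  echo "Term::ReadKey is missing; exporting LINES as a workaround"
  # LINES may be unset in a non-interactive shell: ask tput, else assume 24 rows.
  : "${LINES:=$(tput lines 2>/dev/null || echo 24)}"
  export LINES
fi
```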