fzm                   package:UCS                   R Documentation

_T_h_e _F_i_n_i_t_e _Z_i_p_f-_M_a_n_d_e_l_b_r_o_t _L_N_R_E _M_o_d_e_l (_f_z_m)

_D_e_s_c_r_i_p_t_i_o_n:

     Object constructor for a finite Zipf-Mandelbrot (fZM) LNRE model
     with parameters alpha, A and B (Evert, 2004a).  Either the
     parameters are specified explicitly, or one or more of them can be
     estimated from an observed frequency spectrum.

_U_s_a_g_e:

     fzm(alpha, A, B)

     fzm(alpha, A, N, V)

     fzm(alpha, N, V, spc, m.max=15, stepmax=10, debug=FALSE)

     fzm(N, V, spc, m.max=15, stepmax=10, debug=FALSE)

_A_r_g_u_m_e_n_t_s:

   alpha: a number in the range (0,1), the shape parameter alpha of the
          fZM model.  'alpha' can automatically be estimated from 'N',
          'V', and 'spc'.

       A: a small positive number A << 1, the parameter A of the fZM
          model. 'A' can automatically be estimated from 'N', 'V', and
          'spc'.

       B: a large positive number B >> 1, the parameter B of the fZM
          model. 'B' can automatically be estimated from 'N' and 'V'.

       N: the sample size, i.e. the number of observed tokens.

       V: the vocabulary size, i.e. the number of observed types.

     spc: a vector of non-negative integers representing the class
          sizes V_m of the observed frequency spectrum.  The vector is
          usually read from a file in 'lexstats' format with the
          'read.spectrum' function.

   m.max: the number of ranks (i.e. the spectrum elements V_1, ...,
          V_m.max) from 'spc' that will be used to estimate the alpha
          parameter.

 stepmax: maximal step size of the 'nlm' function used for parameter
          estimation.  It should not be necessary to change the default
          value.

   debug: if 'TRUE', print debugging information during the parameter
          estimation process.  This feature can be useful to find out
          why parameter estimation fails.

_D_e_t_a_i_l_s:

     The fZM model with parameters alpha in (0,1) and 0 < A < B is
     defined by the type density function

                      g(p) := C * p^(-alpha - 1)

     for A <= p <= B.  The normalisation constant C is determined from
     the other parameters by the condition

                     integral_A^B p * g(p) dp = 1
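
     As a worked step, the integral above can be solved in closed
     form.  Since p * g(p) = C * p^(-alpha), the condition gives

          C = (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))

     The following R sketch checks this numerically.  The parameter
     values are arbitrary illustrations (not estimates from data), and
     none of the names below belong to the UCS package.

```r
# Arbitrary illustration values satisfying 0 < alpha < 1 and 0 < A < B
# (assumptions made for this sketch only).
alpha <- 0.5
A <- 1e-6
B <- 0.01

# Closed-form normalisation constant derived from integral_A^B p * g(p) dp = 1.
C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))

# Type density function g(p) of the fZM model.
g <- function(p) C * p^(-alpha - 1)

# Numerical check of the normalisation condition.  The substitution
# p = exp(u), dp = exp(u) du keeps the integrand smooth near p = A.
total <- integrate(function(u) exp(u)^2 * g(exp(u)),
                   lower = log(A), upper = log(B),
                   rel.tol = 1e-10)$value

print(C)
print(total)  # should be very close to 1
```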


     The parameters alpha and A are estimated simultaneously by
     nonlinear minimisation ('nlm') of a multinomial chi-squared
     statistic for the observed against the expected frequency
     spectrum. Note that this is different from the multivariate
     chi-squared test used to measure the goodness-of-fit of the final
     model (Baayen, 2001, Sec. 3.3).
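
     The estimation step can be sketched as follows.  This is a
     simplified illustration, not the package's implementation: it
     assumes Poisson sampling for the expected spectrum, uses a plain
     Pearson-type cost in place of the multinomial chi-squared
     statistic, holds B fixed, and fits noise-free synthetic data.
     The helpers 'fzm_C', 'fzm_EVm' and 'cost' are hypothetical names
     introduced here.

```r
# Hypothetical helper: normalisation constant C from alpha, A and B.
fzm_C <- function(alpha, A, B) {
  (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))
}

# Hypothetical helper: expected spectrum element E[V_m], assuming Poisson
# sampling.  gamma(a) * pgamma(x, shape = a) is the lower incomplete gamma
# function, so the difference below integrates over N*A <= t <= N*B.
fzm_EVm <- function(m, N, alpha, A, B) {
  a <- m - alpha
  inc <- gamma(a) * (pgamma(N * B, shape = a) - pgamma(N * A, shape = a))
  fzm_C(alpha, A, B) * N^alpha * inc / factorial(m)
}

# Simplified Pearson-type cost over the first m.max spectrum elements.
# alpha and A are transformed so that nlm can search without constraints.
cost <- function(theta, spc, N, B, m.max) {
  alpha <- plogis(theta[1])  # maps the real line to (0, 1)
  A <- exp(theta[2])         # maps the real line to (0, Inf)
  m <- seq_len(m.max)
  E <- fzm_EVm(m, N, alpha, A, B)
  sum((spc[m] - E)^2 / E)
}

# Noise-free synthetic "observed" spectrum from known parameters
# (arbitrary illustration values, not real data).
N <- 1e5
B <- 0.01
m.max <- 15
spc <- fzm_EVm(seq_len(m.max), N, alpha = 0.6, A = 1e-7, B = B)

# Minimise the cost, starting from a deliberately wrong guess.
start <- c(qlogis(0.5), log(1e-6))
fit <- nlm(cost, start, spc = spc, N = N, B = B, m.max = m.max,
           stepmax = 10)

est <- c(alpha = plogis(fit$estimate[1]), A = exp(fit$estimate[2]))
print(est)
print(fit$minimum)
```

     Setting 'stepmax' limits how far a single 'nlm' step can move in
     the transformed parameter space, which helps to keep the search
     away from degenerate values of alpha and A.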

     See Evert (2004, Ch. 4) for further mathematical details,
     especially concerning the expected vocabulary size, frequency
     spectrum and conditional parameter distribution, as well as their
     variances.

_V_a_l_u_e:

     An object of class '"fzm"' with the following components: 

   alpha: value of the alpha parameter

       A: value of the A parameter

       B: value of the B parameter

       C: value of the normalisation constant C

       S: population size S predicted by the model

       N: number of observed tokens (if specified)

       V: number of observed types (if specified)

     spc: observed frequency spectrum (if specified)

     This object 'print's a short summary, including the population
     size S and a comparison of the first ranks of the observed and
     expected frequency spectrum (if available).
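
     As a rough illustration of how S relates to the other
     components, the population size is the integral of the type
     density over its support, which for the density g(p) given under
     Details works out to

          S = C * (A^(-alpha) - B^(-alpha)) / alpha

     The sketch below evaluates this for arbitrary illustration
     values.  It does not use the UCS package, and the variable names
     are chosen for this example only.

```r
# Arbitrary illustration values (assumptions, not estimates from data).
alpha <- 0.5
A <- 1e-6
B <- 0.01

# Normalisation constant C, from the condition integral_A^B p * g(p) dp = 1.
C <- (1 - alpha) / (B^(1 - alpha) - A^(1 - alpha))

# Population size S = integral_A^B g(p) dp in closed form.
S <- C * (A^(-alpha) - B^(-alpha)) / alpha

# Numerical cross-check on the log scale: with p = exp(u), the integrand
# g(p) dp becomes C * exp(-alpha * u) du.
S.num <- integrate(function(u) C * exp(-alpha * u),
                   lower = log(A), upper = log(B),
                   rel.tol = 1e-10)$value

print(S)
print(S.num)
```

     For these values S comes out at about 10000 types.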

_R_e_f_e_r_e_n_c_e_s:

     Baayen, R. Harald (2001). _Word Frequency Distributions._ Kluwer,
     Dordrecht.

     Evert, Stefan (2004). _The Statistics of Word Cooccurrences: Word
     Pairs and Collocations._ PhD Thesis, IMS, University of Stuttgart.

     Evert, Stefan (2004a). A simple LNRE model for random character
     sequences. In _Proceedings of JADT 2004_, Louvain-la-Neuve,
     Belgium, pages 411-422.

_S_e_e _A_l_s_o:

     'zm', 'EV', 'EVm', 'VV', 'VVm', 'write.lexstats',
     'lnre.goodness.of.fit', 'read.spectrum', and 'spectrum.plot'

