All tools and classes used so far have a large number of options
that make them highly configurable. For instance, other
properties of a factory can be specified; please have a look at the
Javadoc of the document factory you are using. A common
property is wordreader, which makes it possible to
specify a different instance of WordReader, the
class that is used to segment text into words and non-words. The
standard WordReader
(FastBufferedReader) considers just letters and
digits as part of a word, but you can choose your variant, and even
specify it directly on the command line: for instance,
-pwordreader=FastBufferedReader\(_\) specifies that
underscores should be considered as part of a word. More generally, you
can specify an expression that follows dsiutils's
ObjectParser conventions and that will be used to
instantiate a WordReader.
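
To make the effect of such a choice concrete, the following is a minimal, self-contained sketch of the segmentation rule described above: letters and digits always belong to a word, and an optional set of extra characters (such as the underscore) can be added. It uses only the Java standard library and is not MG4J's implementation; the class WordSegmentationSketch and its methods isWordChar and words are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class WordSegmentationSketch {

    /** Returns true if c is a letter, a digit, or one of the extra word characters. */
    static boolean isWordChar(char c, String extraWordChars) {
        return Character.isLetterOrDigit(c) || extraWordChars.indexOf(c) >= 0;
    }

    /** Returns the maximal runs of word characters in text, skipping the non-word runs between them. */
    static List<String> words(String text, String extraWordChars) {
        List<String> result = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (isWordChar(c, extraWordChars)) {
                current.append(c);
            } else if (current.length() > 0) {
                // A non-word character closes the word accumulated so far.
                result.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) result.add(current.toString());
        return result;
    }

    public static void main(String[] args) {
        String text = "max_value = foo_bar2 + 1";
        // Letters and digits only: the underscore separates words.
        System.out.println(words(text, ""));   // prints [max, value, foo, bar2, 1]
        // Underscore treated as a word character.
        System.out.println(words(text, "_"));  // prints [max_value, foo_bar2, 1]
    }
}
```

With the underscore added to the word characters, identifiers such as max_value are indexed as a single term rather than as two, which is the kind of difference a custom word reader makes.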
All MG4J tools implement the standard --help
option, which will display a detailed help text.