Bash commands reference

List of wordseg commands

Once wordseg is installed on your platform, the commands are available from the terminal like any other command-line tool. The documented commands are:

  • wordseg-prep takes as input a text in phonological-like form with tags and prepares it for segmentation. It checks the format, then generates two versions of the text: a gold version, in which all tags but the word boundaries are removed, and a prepared version (which we will call prepared.txt), in which all tags but the minimal unit boundaries are removed. The prepared version is the input for segmentation.

  • wordseg-<algorithm> always takes a prepared.txt file as input and outputs the same text with word boundaries added. Please note that some algorithms require more input than just that file. For details, see the Overview page. The calls for the algorithms are:

    • wordseg-baseline for the random baseline,

    • wordseg-dibs for the diphone-based segmentation,

    • wordseg-tp for the transitional probabilities,

    • wordseg-puddle for PUDDLE,

    • wordseg-ag for the adaptor grammar,

    • wordseg-dpseg for DPSeg or DMCMC (please note that this algorithm is not functional at present).

  • wordseg-eval takes as input a segmented text and a gold version, and computes the precision, recall and f-score at the type, token and boundary levels. See more on the Overview page.

  • wordseg-stats takes as input a segmented or gold text and computes basic statistics.

Note

  • To get all the details of a command's arguments, run wordseg-<command> --help.

  • More information on the algorithms and functions is in the Overview.

  • For an example of use with all algorithms, see the Tutorial.

Commands input/output

All the commands read from standard input and write to standard output by default, which makes it easy to combine them with other tools. You can also specify input and output files as arguments. For example, the command:

cat my_input.txt | wordseg-prep > my_output.txt

is equivalent to:

wordseg-prep my_input.txt -o my_output.txt
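
Because every command reads standard input and writes standard output, two commands can also be chained with a pipe. The sketch below is only an illustration: prep_and_segment is a hypothetical helper (not part of wordseg), the file names are placeholders, and it assumes the chosen algorithm needs nothing more than the prepared text and runs with its default options.

```shell
# prep_and_segment ALGO INPUT OUTPUT (hypothetical helper, not part of wordseg)
# Prepare INPUT and segment it with wordseg-ALGO in a single pipeline, so that
# no intermediate prepared.txt is written to disk.
prep_and_segment() {
    algo=$1
    input=$2
    output=$3
    # wordseg-prep writes the prepared text to stdout; wordseg-ALGO reads it
    # from stdin and writes the segmented text to stdout.
    wordseg-prep "$input" | "wordseg-$algo" > "$output"
}

# Example call, assuming wordseg is installed and my_input.txt is a tagged text:
#   prep_and_segment tp my_input.txt segmented.tp.txt
```

Note that this pipeline only keeps the segmented output. If you need the prepared text or a gold version for evaluation, run wordseg-prep as a separate step and keep its output files.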

Logging messages

The commands write log messages to standard error. Each message is either an error, a warning, an info or a debug message. You can choose the level of logging to display with the following arguments (available for all commands):

  • -v | --verbose displays errors, warnings and infos,

  • -vv | --verbose --verbose displays all the messages,

  • -q | --quiet does not display any message.

When running several commands in scripts or in parallel, the messages on standard error can become hard to follow. You can redirect stderr to a file using:

wordseg-<command> 2> ./my_log.txt
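
When a script runs several commands, one way to keep the messages apart is to give each command its own log file. The following is a minimal sketch using plain shell redirection: run_logged and the logs directory are hypothetical names chosen for this example, not part of wordseg.

```shell
# run_logged NAME CMD [ARGS...] (hypothetical helper, not part of wordseg)
# Run CMD with its arguments, leaving stdout untouched and sending stderr
# (the log messages) to logs/NAME.log.
run_logged() {
    name=$1
    shift
    mkdir -p logs
    "$@" 2> "logs/$name.log"
}

# Example calls, assuming wordseg is installed and prepared.txt exists:
#   run_logged tp wordseg-tp -v prepared.txt -o segmented.tp.txt
#   run_logged puddle wordseg-puddle -v prepared.txt -o segmented.puddle.txt
```

Each log file is overwritten on every run. Use >> instead of > in the helper if you prefer to append.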