Collocations

Compute significant bigrams and trigrams.

Inputs

  • Corpus: A collection of documents.

Outputs

  • Table: A list of bigrams or trigrams.

Collocations finds frequently co-occurring words in a corpus and displays the resulting bigrams or trigrams ranked by score.

  1. Settings: choose whether to observe bigrams (sequences of two co-occurring words) or trigrams (sequences of three co-occurring words). Set the frequency threshold to remove n-grams that occur less often than the threshold.
  2. Scoring method:
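
The n-gram choice and frequency threshold from step 1 can be illustrated with a minimal sketch in plain Python. It is not the widget's own code: the helper names `count_ngrams` and `apply_threshold` and the toy sentence are assumptions made for illustration, and the sketch treats the corpus as a single flat list of tokens.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens (n=2 for bigrams, n=3 for trigrams)."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

def apply_threshold(counts, threshold):
    """Keep only n-grams whose frequency is at least the threshold."""
    return {gram: c for gram, c in counts.items() if c >= threshold}

# Toy input, assumed for illustration only.
tokens = "the old king and the old queen met the old king".split()

bigrams = apply_threshold(count_ngrams(tokens, 2), threshold=2)
trigrams = apply_threshold(count_ngrams(tokens, 3), threshold=2)
```

With a threshold of 2, only the n-grams that repeat in the toy sentence survive; everything seen once is dropped before any scoring would take place.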

Example

Collocations is intended mainly for data exploration. Here, we show how to observe bigrams that occur more than five times in the corpus. The bigrams are scored with the Pointwise Mutual Information statistic.

We load the grimm-tales-selected data in Corpus and send it to Collocations.
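
To make the scoring in this example more concrete, here is a rough sketch of ranking bigrams by Pointwise Mutual Information after a frequency filter. It is an illustration under stated assumptions, not the widget's implementation: the function `pmi_bigrams`, the toy text, and the choice of the token count as the normalising total are all hypothetical. The score used is log2(count(x, y) * N / (count(x) * count(y))).

```python
import math
from collections import Counter

def pmi_bigrams(tokens, min_freq):
    """Rank bigrams seen at least min_freq times by pointwise mutual information."""
    word_counts = Counter(tokens)
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)  # assumption: normalise by the number of tokens
    scores = {}
    for (w1, w2), joint in bigram_counts.items():
        if joint < min_freq:
            continue  # drop bigrams below the frequency threshold
        scores[(w1, w2)] = math.log2(
            joint * total / (word_counts[w1] * word_counts[w2])
        )
    # Highest-scoring bigrams first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Toy text, assumed for illustration; it is not the grimm-tales-selected data.
tokens = ("once upon a time " * 6 + "a king and a queen").split()

# "More than five times" corresponds to a minimum frequency of six.
ranked = pmi_bigrams(tokens, min_freq=6)
```

In this toy text, "once upon" ranks above "upon a" because "a" also appears elsewhere, which lowers the mutual information of any bigram that contains it.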

References

Manning, Christopher, and Hinrich Schütze. 1999. Collocations. Available at: https://nlp.stanford.edu/fsnlp/promo/colloc.pdf