Vitalii Kotliarenko — How we build language models at Grammarly (20 minutes, Lightning Talk)

Statistical language models (LMs) are one of the core concepts in natural language processing (NLP). In essence, an LM is a probability distribution over sequences of words. Simple yet powerful LMs such as n-gram models have found applications in machine translation, general error correction, and many other areas of NLP. This talk sheds light on the process of training an n-gram language model on large corpora and reveals the challenges encountered while implementing the pipeline in Scala and Apache Spark.
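
To make the idea of "a probability distribution over sequences of words" concrete, here is a minimal sketch of a bigram (n = 2) model estimated by plain relative frequency. It is an illustration only, not the pipeline described in the talk: it is written in Python for brevity rather than Scala and Spark, it uses no smoothing, and the function names and the `<s>` / `</s>` boundary markers are assumptions chosen for this example.

```python
from collections import Counter


def train_bigram_lm(sentences):
    """Count bigrams and their left contexts over tokenized sentences.

    Each sentence is padded with hypothetical boundary markers <s> and </s>.
    """
    context_counts = Counter()
    bigram_counts = Counter()
    for tokens in sentences:
        padded = ["<s>"] + list(tokens) + ["</s>"]
        for prev, word in zip(padded, padded[1:]):
            context_counts[prev] += 1
            bigram_counts[(prev, word)] += 1
    return context_counts, bigram_counts


def bigram_prob(context_counts, bigram_counts, prev, word):
    """Maximum-likelihood estimate of P(word | prev); 0.0 for unseen contexts."""
    if context_counts[prev] == 0:
        return 0.0
    return bigram_counts[(prev, word)] / context_counts[prev]


def sentence_prob(context_counts, bigram_counts, tokens):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    prob = 1.0
    for prev, word in zip(padded, padded[1:]):
        prob *= bigram_prob(context_counts, bigram_counts, prev, word)
    return prob
```

Without smoothing, any sentence containing an unseen bigram gets probability zero, which is one reason practical n-gram models are trained on large corpora and combined with smoothing or back-off.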


