Romansh Lemmatizer (BETA)
Basic Lemmatizer for Romansh Varieties (Beta): Demo
This demo illustrates the functionality of the "romansh_lemmatizer" package, available at:
https://github.com/ZurichNLP/romansh_lemmatizer
The underlying Python package provides a basic dictionary-based lemmatizer for the Romansh language. Given a Romansh text, the lemmatizer splits it into words and looks up each word in the Pledari Grond dictionaries for the five standard Romansh idioms (Sursilvan, Sutsilvan, Surmiran, Puter, and Vallader) as well as in the dictionary for Rumantsch Grischun.
For example, if a Romansh text contains the word lavuraiva, the lemmatizer traces the word back to the Vallader and Puter dictionaries.
Typical use cases for the lemmatizer include:
- Accessing potential German translations (glosses) of Romansh words
- Automatically detecting the variety of a Romansh text, based on how many words are found in the respective dictionaries
 
A limitation of the current version is that the lemmatizer does not disambiguate between multiple possible ways of lemmatizing a word. Specifically:
- If a word has multiple dictionary entries, all the dictionary entries are returned, irrespective of the context in which the word occurs.
- If there are multiple ways of morphologically analysing a given word form, all possible analyses are returned.
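
The behaviour described above can be illustrated with a minimal Python sketch. It does not use the romansh_lemmatizer API; the Analysis class, the toy_index mapping, the functions analyses_for and lemmatize, and all placeholder entries are hypothetical and exist only for illustration. Whitespace tokenization is a simplifying assumption as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    lemma: str       # dictionary headword
    gloss: str       # German translation
    morphology: str  # morphological annotation of the word form


# Hypothetical index from word form to every known analysis.
# The entries are placeholders, not real Pledari Grond data.
toy_index = {
    "form_x": [
        Analysis("lemma_1", "gloss_1", "tag_1"),
        Analysis("lemma_2", "gloss_2", "tag_2"),
    ],
    "form_y": [Analysis("lemma_3", "gloss_3", "tag_3")],
}


def analyses_for(token, index):
    """Return all analyses of a token, without any disambiguation."""
    return list(index.get(token.lower(), []))


def lemmatize(text, index):
    """Split on whitespace (a simplification) and look up each token."""
    return [(token, analyses_for(token, index)) for token in text.split()]


result = lemmatize("form_x form_y form_z", toy_index)
for token, analyses in result:
    print(token, [a.lemma for a in analyses])
```

In this sketch an ambiguous form such as form_x keeps both of its analyses, and a form that is missing from the index yields an empty list. Choosing between competing analyses is left to the caller.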
 
Demo Interface
In the top left corner, the demo interface provides a field for entering text. Upon clicking the "Analyze" button, the lemmatizer is applied to the text: the text is split into tokens, and the lemmas of each token are looked up.
The idiom scores in the top right corner are calculated as the number of tokens that have a lemma in a particular idiom's dictionary divided by the number of tokens in the sentence.
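
As a rough sketch of this calculation, the following Python snippet computes one score per idiom from a list of tokens. The function idiom_scores, the idiom names, and the placeholder word forms are hypothetical and are not taken from the package. Picking the highest-scoring idiom as the detected one is an assumption about how the demo selects the highlighted idiom.

```python
def idiom_scores(tokens, dictionaries):
    """Fraction of tokens that have an entry in each idiom's dictionary."""
    if not tokens:
        # Avoid division by zero for empty input.
        return {idiom: 0.0 for idiom in dictionaries}
    return {
        idiom: sum(1 for token in tokens if token.lower() in forms) / len(tokens)
        for idiom, forms in dictionaries.items()
    }


# Placeholder word forms per idiom, not real dictionary data.
toy_dictionaries = {
    "idiom_a": {"w1", "w2", "w3"},
    "idiom_b": {"w1", "w4"},
}

tokens = ["w1", "w2", "w3", "w5"]
scores = idiom_scores(tokens, toy_dictionaries)

# Assumption: the detected idiom is the one with the highest score.
detected = max(scores, key=scores.get)
print(scores, detected)
```

Here three of the four tokens are found in the idiom_a dictionary and one in the idiom_b dictionary, so the scores are 0.75 and 0.25. A token found in several dictionaries counts towards each of them, which is why the scores need not sum to 1.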
Underneath these two fields, the table displays the analysis of each token in the detected (i.e., the dark blue) idiom. This includes the lemma(s), if present, and a set of German translations as well as morphological annotations.
At the bottom of the page, a couple of example sentences in each idiom are provided.
Acknowledgements and Data Rights
This demo incorporates dictionary data from the Pledari Grond project.
- The dictionaries for Rumantsch Grischun, Surmiran, Sursilvan and Sutsilvan are openly licensed. © Lia Rumantscha 1980 – 2025
- The dictionaries for Vallader and Puter are kindly provided by Uniun dals Grischs and may only be used in the context of this lemmatizer. © Uniun dals Grischs. All rights reserved.
 
Analysis of Words