corpus
PARSEME corpus
This multilingual resource contains corpora in which verbal multiword expressions (VMWEs) have been manually annotated. The data covers 26 languages and combines the corpora from all three previous editions (1.0, 1.1 and 1.2). VMWEs were annotated according to the universal guidelines. The corpora are provided in the cupt format, inspired by the CoNLL-U format. Morphological and syntactic information, including parts of speech, lemmas, morphological features and/or syntactic dependencies, is also provided. Depending on the language, this information comes from treebanks (e.g., Universal Dependencies) or from automatic parsers trained on treebanks (e.g., UDPipe). All corpora are split into training, development and test data, following the splitting strategy adopted for the PARSEME Shared Task 1.2. More info: https://gitlab.com/parseme/corpora/-/wikis/home
release 1.0: http://hdl.handle.net/11372/LRT-2282
release 1.1: http://hdl.handle.net/11372/LRT-2842
release 1.2: http://hdl.handle.net/11234/1-3367
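
Reading the cupt format: the description above says the corpora use the cupt format, inspired by CoNLL-U. The sketch below shows one way such a file might be read in Python. It assumes the layout commonly described for cupt: tab-separated CoNLL-U columns plus a final PARSEME:MWE column, where `*` marks a token outside any VMWE, the first token of an expression carries `id:category`, and later tokens carry only the `id`. The function names and the sample sentence are invented for illustration and are not taken from the corpus; check the wiki linked above for the authoritative specification.

```python
from collections import defaultdict

# Toy sentence made up for this example (NOT corpus data).
# Assumed layout: 10 CoNLL-U columns + 1 PARSEME:MWE column, tab-separated.
SAMPLE = "\n".join([
    "# global.columns = ID FORM LEMMA UPOS XPOS FEATS HEAD DEPREL DEPS MISC PARSEME:MWE",
    "# text = She made a decision",
    "1\tShe\tshe\tPRON\t_\t_\t2\tnsubj\t_\t_\t*",
    "2\tmade\tmake\tVERB\t_\t_\t0\troot\t_\t_\t1:LVC.full",
    "3\ta\ta\tDET\t_\t_\t4\tdet\t_\t_\t*",
    "4\tdecision\tdecision\tNOUN\t_\t_\t2\tobj\t_\t_\t1",
])


def read_cupt_sentences(lines):
    """Yield each sentence as a list of token rows (lists of column strings)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):
            # Comment/metadata lines start with '#'; everything else is a token.
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def extract_vmwes(sentence, mwe_col=10):
    """Return {mwe_id: (category, [token forms])} for one sentence.

    mwe_col=10 assumes PARSEME:MWE is the 11th column.
    """
    forms = defaultdict(list)
    categories = {}
    for cols in sentence:
        tag = cols[mwe_col]
        if tag in ("*", "_"):
            # '*' = not part of a VMWE; '_' = annotation not specified.
            continue
        # A token may belong to several VMWEs, separated by ';'.
        for part in tag.split(";"):
            mwe_id, _, category = part.partition(":")
            forms[mwe_id].append(cols[1])
            if category:
                categories[mwe_id] = category
    return {i: (categories.get(i), toks) for i, toks in forms.items()}


if __name__ == "__main__":
    for sent in read_cupt_sentences(SAMPLE.splitlines()):
        print(extract_vmwes(sent))
```

To read a real file, the same generator can be fed an open file handle, e.g. `read_cupt_sentences(open(path, encoding="utf-8"))`, where `path` points to one of the training, development or test files.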