Identification of indigenous knowledge concepts through semantic networks, spelling tools and word embeddings

Renato Rocha Souza, Amelie Dorn, Barbara Piringer, Eveline Wandl-Vogt

Publication: Contribution to book, Contribution to conference proceedings, Peer reviewed


In order to access indigenous, regional knowledge contained in language corpora, semantic tools and network methods are typically employed. In this paper we present an approach for identifying dialectal variants of words, or words that do not belong to High German, using as an example the non-standard-language legacy questionnaires of the Database of Bavarian Dialects in Austria (DBÖ). Based on selected cultural categories relevant to the wider project context, we identified common words from each of these categories and derived their lemmas with GermaLemma. We then explored the semantic vicinity of each word with word embedding models, followed by lookups in the German wordnet GermaNet and in the Hunspell spell checker. Although none of these tools has comprehensive coverage of standard German words, they can indicate dialect words within specific semantic hierarchies. The methods and tools applied in this study may serve as an example for similar projects dealing with non-standard or endangered language collections that aim to access, analyze and ultimately preserve native regional language heritage.
© European Language Resources Association (ELRA), licensed under CC-BY-NC
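The core step of the approach, exploring the embedding neighbourhood of a seed word and checking those neighbours against standard-German resources, can be sketched in plain Python. This is a minimal, stdlib-only illustration under stated assumptions, not the authors' implementation: the vectors in TOY_VECTORS and the word list STANDARD_LEXICON are invented for illustration (they are not DBÖ data), the lexicon stands in for GermaNet and Hunspell lookups, and the GermaLemma step is skipped by assuming the tokens are already lowercased lemmas. The helper names cosine, nearest_neighbours and dialect_candidates are hypothetical.

```python
import math

# Hypothetical 3-dimensional word vectors, invented for illustration only.
# A real setup would load vectors trained on the questionnaire corpus.
TOY_VECTORS = {
    "kartoffel": [0.9, 0.1, 0.0],
    "erdapfel": [0.85, 0.15, 0.05],
    "grumbir": [0.8, 0.2, 0.1],
    "rübe": [0.6, 0.4, 0.0],
    "auto": [0.0, 0.1, 0.95],
}

# Hypothetical stand-in for a standard-German lexicon; in the described
# approach this role is played by GermaNet and Hunspell lookups.
STANDARD_LEXICON = {"kartoffel", "rübe", "auto"}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbours(word, vectors, k=3):
    """Return the k words most similar to `word` by cosine similarity, excluding itself."""
    target = vectors[word]
    scored = [(other, cosine(target, vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [other for other, _ in scored[:k]]


def dialect_candidates(seed, vectors, lexicon, k=3):
    """Neighbours of `seed` that the standard lexicon does not contain."""
    return [w for w in nearest_neighbours(seed, vectors, k) if w not in lexicon]


if __name__ == "__main__":
    print(nearest_neighbours("kartoffel", TOY_VECTORS))  # ['erdapfel', 'grumbir', 'rübe']
    print(dialect_candidates("kartoffel", TOY_VECTORS, STANDARD_LEXICON))  # ['erdapfel', 'grumbir']
```

With real data, the neighbour query would come from a trained embedding model (gensim's KeyedVectors, for instance, offers a most_similar query), and the membership test would be replaced by actual GermaNet and Hunspell lookups; a word missing from those resources is only a candidate, since their coverage of standard German is itself incomplete.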
Title of host publication: LREC 2020 - 12th International Conference on Language Resources and Evaluation, Conference Proceedings
Publisher: ELRA - European Language Resources Association
ISBN (Print): 979-109554634-4
Publication status: Published - 2020
Event: 12th International Conference on Language Resources and Evaluation, LREC 2020 - Marseille, France
Duration: 11 May 2020 - 16 May 2020


Conference: 12th International Conference on Language Resources and Evaluation, LREC 2020
Short title: LREC 2020

ÖFOS 2012

  • 605007 Digital Humanities
  • 602014 German studies