Namgyal Manuscript Collection Datasets

Markus Viehbeck (Editorial Journalist), Eric Werner (Developer)

Publications: Electronic/multimedia output > Software or database

Abstract

These are the official datasets created for the Tibetan Manuscript Project Vienna (TMPV) in 2023 and 2024. They contain:

  • OCR datasets (pairs of line images and line labels) created from the PageXML annotations
  • PageXML (Transkribus) annotations in Unicode and Wylie
  • PageXML layout annotations (lines, images, captions, margins) used for image segmentation training
  • OCR models (PyTorch checkpoints and ONNX model files)
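
As a rough illustration of how line labels might be read from a PageXML annotation, the sketch below collects the ID, polygon, and transcription of each text line. The element names (TextLine, Coords, TextEquiv, Unicode) follow the general PAGE XML schema; the function name `read_lines` and the inline sample are hypothetical, and the actual file layout of these datasets is not confirmed here.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature PageXML document, used only for illustration.
SAMPLE = """<PcGts xmlns="http://schema.primaresearch.org/PAGE/gts/pagecontent/2013-07-15">
  <Page imageFilename="page_0001.jpg" imageWidth="2000" imageHeight="600">
    <TextRegion id="r1">
      <TextLine id="r1l1">
        <Coords points="10,10 500,10 500,60 10,60"/>
        <TextEquiv><Unicode>bkra shis</Unicode></TextEquiv>
      </TextLine>
    </TextRegion>
  </Page>
</PcGts>"""


def read_lines(pagexml_text):
    """Return (line_id, polygon, label) tuples for every TextLine element."""
    root = ET.fromstring(pagexml_text)
    lines = []
    # "{*}" matches any namespace (Python 3.8+), so the schema version does not matter.
    for line in root.findall(".//{*}TextLine"):
        coords = line.find("{*}Coords")
        text = line.find("{*}TextEquiv/{*}Unicode")
        points = coords.get("points", "") if coords is not None else ""
        # "x1,y1 x2,y2 ..." becomes [(x1, y1), (x2, y2), ...]
        polygon = [tuple(int(float(v)) for v in p.split(",")) for p in points.split()]
        label = (text.text or "") if text is not None else ""
        lines.append((line.get("id"), polygon, label))
    return lines


lines = read_lines(SAMPLE)
```

Each returned polygon could then be used to crop the corresponding line image from the page scan, which would give one image and label pair per line.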
Original language: English
Media of output: Online
Size: 1.3 GB
DOIs
Publication status: Published - 29 Nov 2024

Austrian Fields of Science 2012

  • 602050 Tibetan studies
