
Text-Fabric Serialization of the General Missives of the VOC

Date: 2022 (pre-GLOBALISE)
URL: https://clariah.github.io/wp6-missieven-search/text/index.html and https://github.com/CLARIAH/wp6-missieven/
Status: Demo
People involved: Dirk Roorda, Sophie Arnoult, Lodewijk Petram, Piek Vossen, Jesse de Does, Jessica den Oudsten, Daniël Tuik

A Text-Fabric representation of the General Missives of the Dutch East India Company (VOC) offers a new way to explore and analyze these reports. The General Missives, sent from Batavia (Jakarta) to the Dutch Republic between 1610 and 1795, are now accessible for in-depth research thanks to work within the CLARIAH project by a team from VU University, the Huygens Institute, and the Dutch Language Institute. Using OCR and Named Entity Recognition techniques [1], the team enriched these documents with metadata and structural elements, including annotations for entities such as persons and locations.

The Text-Fabric serialization of the enriched texts is especially suited for linguistic analysis with computational methods. Users can explore the materials in the Text-Fabric search interface or by using the Text-Fabric Python package. A slightly less cleaned version of the same corpus is also available in a BlackLab search environment.
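As an illustration of the Python route, the following is a minimal sketch rather than the project's documented workflow. It assumes the `text-fabric` package is installed and that the corpus can be loaded under the name `CLARIAH/wp6-missieven` (taken from the GitHub URL above; the exact name may differ between Text-Fabric versions). The node types `letter` and `word` and the feature `trans` are assumptions made for illustration; the actual names should be checked against the corpus documentation. The helper `build_template` is hypothetical and only assembles a search template string in Text-Fabric's indentation-based template syntax.

```python
def build_template(outer_type, inner_type, feature, pattern):
    """Build a two-level Text-Fabric search template.

    The indented second line asks for nodes of `inner_type` that are
    embedded in a node of `outer_type` and whose `feature` matches the
    regular expression `pattern` (the `~` operator in template syntax).
    """
    return f"{outer_type}\n  {inner_type} {feature}~{pattern}\n"


def run_example(repo="CLARIAH/wp6-missieven"):
    """Load the corpus and count the matches of one query.

    Requires the text-fabric package and network access on first use.
    The node types and feature name below are ASSUMPTIONS, not confirmed
    names from the corpus.
    """
    # Imported here so that build_template works without Text-Fabric.
    from tf.app import use

    A = use(repo)  # downloads the corpus data if it is not cached yet
    template = build_template("letter", "word", "trans", "Batavia")
    results = A.search(template)  # list of (letter, word) node tuples
    print(f"{len(results)} matches")
    return results
```

Calling `run_example()` in a Jupyter notebook or a Python session would load the data and print the number of matches. The template string can also be inspected on its own with `print(build_template("letter", "word", "trans", "Batavia"))` before running any query.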

Screenshot of the Text-Fabric Search Interface for General Missives of the VOC
https://clariah.github.io/wp6-missieven-search/text/index.html

The General Missives summarize the information contained in the Overgekomen Brieven en Papieren series of documents from the VOC archives, which the GLOBALISE project aims to unlock for in-depth research. The corpus available in the BlackLab environment is a selection of General Missives from the period 1610-1767 that was transcribed, edited, and published in 14 (digital) book volumes by the Huygens Institute and its predecessors. The original volumes are also available online.

Please note that the General Missives contain labels, characterizations and information about persons, actions and events that may be offensive and troubling to individuals and communities.


  1. Sophie I. Arnoult, Lodewijk Petram, and Piek Vossen. 2021. Batavia asked for advice. Pretrained language models for Named Entity Recognition in historical texts. In Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature, pages 21–30, Punta Cana, Dominican Republic (online). Association for Computational Linguistics. https://doi.org/10.18653/v1/2021.latechclfl-1.3