Portal | CLARIN Centre voor Nederland en Vlaanderen

Last update: 04-02-2025

Deposition of data at the Dutch Language Institute

The repository of the Dutch Language Institute (Instituut voor de Nederlandse Taal – INT) gives access to the language resources and tools of the INT and of other organizations.

Collection Policy

An important part of the mission of the Dutch Language Institute is providing access to Dutch source material in the form of historical and contemporary corpora, dictionaries, digital lexical databases, grammars and similar resources, including the required technical tools. Apart from the data we produce ourselves, we also accept resources from other organizations.

Scope of the collection

We are interested in ingesting all kinds of resources that pertain to the Dutch language. However, we apply a number of selection criteria:
  • Size: The resource is large enough to be of interest for scientific research and/or commercial exploitation.
  • Non-Redundancy: The resource is sufficiently unique and cannot be found elsewhere.
  • Quality: The resource is well documented and the data is clearly organized.
  • Sustainability: The data is provided in a sustainable format (see ‘Supported Data Formats’).

Deposition Process

If you would like to deposit your data at the INT, please send a message to ‘servicedesk@ivdnt.org’. We will check with you whether the data fits in our collection. We might ask you for some additional information and, if necessary, will ask you to sign an agreement (see ‘Deposition Agreement’). The data will be safely archived at the INT, and we will create a product page for your resource and a PID that can be used for referencing. Furthermore, we will ensure that the language resource is findable in the CLARIN VLO (Virtual Language Observatory, https://vlo.clarin.eu), a central search engine for all kinds of language resources.

Guidelines for deposition

Data presented for deposition need to be supplied with all information that is essential for sustainable data management and future use.

Moreover, the data should be provided in standard formats (see ‘Supported Data Formats’). For archiving purposes, a minimum set of metadata in valid CMDI format must either be provided by the data producer or be extracted by the INT from the data and documentation.

Data producers are encouraged to supply additional documentation of the data or links to publications (using persistent identifiers) about the data.

Responsibilities

The data producer always remains the owner of the data. The INT receives a copy, which it must take good care of in accordance with the terms of the license contract and the terms and conditions for use. The INT also makes copies of its own, for example for backup purposes, and takes equally good care of them. In case of an emergency, we can rebuild the entire database from the backed-up files, which are stored safely at another location.

Preservation

To ensure the integrity of the data sets, an MD5 checksum is computed for every deposited file, which allows us to detect corruption of the data over the years. Once deposited, files in data sets are never changed, and only minor changes to the metadata are allowed, for example spelling corrections, minor changes to the documentation, or the addition of documentation. Changes to the data themselves are issued as a new version of the dataset, which receives a new persistent identifier. Such changes are only made in close collaboration with the producer of the dataset.
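As an illustration of this kind of integrity check, the following is a minimal Python sketch, not a description of the INT's actual tooling. The function names (md5_checksum, build_manifest, changed_files) and the idea of keeping a manifest as a dictionary are assumptions made for the example; only the use of MD5 comes from the text above.

```python
import hashlib
from pathlib import Path


def md5_checksum(path, chunk_size=1024 * 1024):
    """Return the hexadecimal MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(directory):
    """Map the relative path of every file under `directory` to its MD5 digest."""
    root = Path(directory)
    return {
        p.relative_to(root).as_posix(): md5_checksum(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(directory, manifest):
    """Return files from a stored manifest that are now missing or differ."""
    current = build_manifest(directory)
    return sorted(
        name for name, digest in manifest.items() if current.get(name) != digest
    )
```

A manifest built at deposit time can be stored next to the dataset; running changed_files against it later lists every file whose content no longer matches the recorded checksum.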

Authenticity

Data producers hand over the materials to us. We do not change the data, except by adding metadata if required. The repository maintains links to other relevant materials (e.g. articles, theses, documentation, related data) and, if applicable, to software and tools that were used in the production of the data.

Deposition Agreement

If the language resource is not released with an open license, a signed deposition agreement is required. This can be our own standard agreement, an agreement from the CLARIN Licensing Framework (https://www.clarin.eu/content/clarin-licensing-framework), or a tailor-made agreement.

Supported Data Formats

Formats are listed per type, with preferred formats first and acceptable formats after them.

  • Text Documents: PDF/A (.pdf); ODT (.odt); PDF other than PDF/A (.pdf); Rich Text File (.rtf); Office Open XML (.docx); Microsoft Word (.doc, .docx)
  • Plain text: Unicode text (.txt) [preferred]; ASCII (.txt) [acceptable]
  • Markup[1]: XML (.xml); HTML (.html); XHTML (.html); SGML (.sgml); Wikitext
  • Spreadsheets: ODS (.ods); CSV (.csv); Office Open XML Workbook (.xlsx)
  • Database files: SQL (.sql); Open Document Database (.odb)
  • Raster Images: JPEG (.jpg, .jpeg); TIFF (.tiff); PNG (.png); JPEG 2000 (.jp2)
  • Vector Images: SVG (.svg) [preferred]; EPS (.eps), PostScript (.ps) [acceptable]
  • Audio: Broadcast Wave Format (.bwf); Material Exchange Format (.mxf); FLAC (.flac); OPUS (.opus); Matroska Multimedia Container (.mka); WAVE (.wav); MP3 (.mp3); AIFF (.aif, .aiff); OGG (.ogg); AAC (.aac, .m4a)
  • Video: Material Exchange Format (.mxf); Matroska Multimedia Container (.mkv); MPEG-4 (.mp4); MPEG-2 (.mpg); AVI (.avi); QuickTime (.mov, .qt)
  • Geographic Information Systems (GIS): GML (.gml); MapInfo Interchange Format (.mif, .mid); MapInfo (.tab, related files)
  • RDF: RDF/XML (.rdf); TriG (.trig); Turtle (.ttl); N-Triples (.nt); JSON-LD

[1] Any files that are related to markup language files, such as .css and .js for HTML, .xslt for XML, etc., can be submitted along with any primary markup language files they are supplementary to.
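Before depositing, a data producer might want to pre-check a dataset against the list of supported formats. The Python sketch below is one hypothetical way to do that, not a tool provided by the INT. The names SUPPORTED_EXTENSIONS, COMPANION_EXTENSIONS and unsupported_files are assumptions made for the example. The extensions are copied from the list above, without distinguishing preferred from acceptable formats. Wikitext and JSON-LD are left out because the list gives no extension for them.

```python
from pathlib import Path

# File extensions copied from the list of supported formats.
# Preferred and acceptable formats are deliberately not distinguished.
SUPPORTED_EXTENSIONS = {
    ".pdf", ".odt", ".rtf", ".doc", ".docx", ".txt",
    ".xml", ".html", ".sgml",
    ".ods", ".csv", ".xlsx", ".sql", ".odb",
    ".jpg", ".jpeg", ".tiff", ".png", ".jp2",
    ".svg", ".eps", ".ps",
    ".bwf", ".mxf", ".flac", ".opus", ".mka",
    ".wav", ".mp3", ".aif", ".aiff", ".ogg", ".aac", ".m4a",
    ".mkv", ".mp4", ".mpg", ".avi", ".mov", ".qt",
    ".gml", ".mif", ".mid", ".tab",
    ".rdf", ".trig", ".ttl", ".nt",
}

# Supplementary files that footnote [1] allows alongside markup files.
COMPANION_EXTENSIONS = {".css", ".js", ".xslt"}


def unsupported_files(paths):
    """Return the paths whose extension is in neither set (case-insensitive)."""
    allowed = SUPPORTED_EXTENSIONS | COMPANION_EXTENSIONS
    return [str(p) for p in paths if Path(p).suffix.lower() not in allowed]
```

For example, unsupported_files(["corpus.XML", "notes.docx", "audio.wma", "style.css"]) returns only "audio.wma". A file flagged this way is not necessarily rejected; it is a prompt to convert it or to discuss it with the service desk.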

Notice

For copyright reasons, some products or tools are only accessible with a user ID and password. If you are employed by a university or scientific institute, you can log in with the user ID and password of your own organization. If your organization is not in the list, or if you do not have an account at an academic institution, you can open an account with CLARIN.EU.

About this portal

The Repository "CLARIN Centre INT" gives access to language resources and tools from the INT and other CLARIN Members. The INT has obtained the Data Seal of Approval.

About CLARIN

CLARIN wants to achieve an integrated, interoperable research infrastructure of language resources and language technology. This infrastructure must be stable, permanent, accessible and expandable; it should put an end to the current fragmentation, and promote the use of computational techniques in the humanities (eHumanities).

About the Dutch Language Institute

More information about the Dutch Language Institute (INT) can be found on our website. General information is also available in English.