The Boarnsterhim Corpus (BHC) is a spoken language corpus containing bilingual Frisian-Dutch data of four generations of speakers (born between 1898 and 2000). The corpus contains two main parts: one recorded between 1984 and 1987, collected by Tony Feitsma; and one recorded between 2017 and 2019, collected by Marjoleine Sloos. Almost 30 speakers have been recorded at both moments, which makes the data suitable for comparable panel and trend studies.

The corpus is still under construction and will be transcribed in Standard Frisian and Standard Dutch. It will also be POS-tagged. Eventually, the corpus will be part of the CLARIN infrastructure, hosted by the Dutch Language Union.

The data are suitable for research in sociolinguistic variation and change, phonological change, variation and change in bilingualism studies, historiographic description, and also anthropology.

Students who are interested in an internship on one of the following (or related) topics are welcome to contact dr. Marjoleine Sloos: bilingualism, phonetics, sociolinguistics, phonology, reading skills, language attitudes, corpus linguistics.

Interns and volunteers who are interested in the construction of the corpus (recordings, orthographic transcriptions in Dutch and/or Frisian, phonetic transcriptions, POS tagging) are also advised to contact dr. Marjoleine Sloos.


Does Frisian converge towards Dutch? That question has often been asked and some evidence seems to support that idea. To study whether the sound system of Frisian was really changing towards Dutch, the Boarnsterhim Corpus (henceforth BHC) was recorded in 1982-1984. The studies that followed from this suggest that the Frisian sound system was stable. In some respects, the distinction between Frisian and Dutch became even stronger. To further investigate whether this trend continues, the BHC2 is recorded in 2017-2019. Recordings and analyses of four generations of speech provide the opportunity to investigate the stability, variation, and change of the Frisian sound system over 100 years.

In both periods, speakers of three generations of the same families were recorded: grandmother, mother, and daughter; or grandfather, father, and grandson. The two younger generations of the first period overlap with the oldest two generations of the second period. A unique property of this corpus is that, as far as possible, half of the overlapping generations in the BHC1 and the BHC2 consists of speech from the same individuals.

All speakers were recorded twice: once in Frisian with a native interviewer to ensure informal Frisian speech, and once in Dutch with a monolingual Dutch interviewer to avoid Frisian. Each recording consists of 20 read sentences, a read story (2-3 minutes), and an interview of about 40 minutes about the speaker’s use of Frisian, language attitude, and daily life activities. In the BHC1, data were recorded on cassette tapes, which were digitized in 2016. The BHC2 is a replication of the BHC1, with the same number of speakers and same age groups.

With the assistance of research assistants, interns, and volunteers, the data are annotated in the Praat speech processing software. The annotation segments the recordings into phrases, words, and sounds (with millisecond accuracy). There are separate tiers (levels) for:

  • orthography
  • words
  • phonemes
  • phonetic realization
  • deletion of speech sounds
  • specific phonological processes
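
As an illustration of how time-aligned tiers of this kind can be used, the following is a minimal Python sketch, not the project's own tooling. It assumes each tier is simply a list of labelled time intervals; the `Interval` class, the helper functions, the tier keys, and the example word with its labels and timings are all invented for illustration and are not taken from the corpus.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """One labelled stretch of time on a tier (hypothetical structure)."""
    start: float  # seconds
    end: float    # seconds
    label: str


def intervals_within(tier, start, end):
    """Return the intervals of `tier` that lie entirely inside [start, end]."""
    return [iv for iv in tier if iv.start >= start and iv.end <= end]


def segments_per_word(tiers, segment_tier):
    """Map each word label to the labels of the segments aligned with it.

    Words with identical labels would overwrite each other here; a real
    analysis would key on the interval times instead.
    """
    return {
        word.label: [
            iv.label
            for iv in intervals_within(tiers[segment_tier], word.start, word.end)
        ]
        for word in tiers["words"]
    }


# Invented example: one word, with its final sound missing in the realization.
tiers = {
    "words": [Interval(0.00, 0.42, "taal")],
    "phonemes": [
        Interval(0.00, 0.10, "t"),
        Interval(0.10, 0.30, "a"),
        Interval(0.30, 0.42, "l"),
    ],
    "phonetic realization": [
        Interval(0.00, 0.10, "t"),
        Interval(0.10, 0.42, "a"),
    ],
}

expected = segments_per_word(tiers, "phonemes")
realized = segments_per_word(tiers, "phonetic realization")
print(expected["taal"])
print(realized["taal"])
```

Because the tiers share one time axis, comparing the phoneme tier with the phonetic realization tier for the same word shows where an expected sound has no realized counterpart.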


This corpus is highly suitable for research in the following fields:

  • bilingualism and code-switching
  • long-term language change
    • especially in bilingualism
    • and minority languages
  • the phonetics and phonology of Frisian
  • real-time vs. apparent-time studies into language change
  • studies into the development of reading competences of Frisian
  • frequency effects in language
  • language and ageing
  • language attitude over time



Nederlandse Organisatie voor Wetenschappelijk Onderzoek NWO “The Netherlands Organisation for Scientific Research” VENI grant for Dr. Marjoleine Sloos.


Nederlandse Organisatie voor Zuiver Wetenschappelijk Onderzoek (currently Nederlandse Organisatie voor Wetenschappelijk Onderzoek NWO “The Netherlands Organisation for Scientific Research”)

Stichting Taalwetenschap

Fryske Akademy, for the endowed chair of Frisian

Fryslân Bank


2016 – to date

Project leader

Dr Marjoleine Sloos


Ir. Eduard Drenth


Dr Wilbert Heeringa

Orthographic transcriptions Frisian

Eke Born, Truus Bremer, Edo Eisma, Kobe Flapper, Renske Hooijenga, Hilde de Jong BA, Dik Nauta, Wytse Willem Pel, Janneke Spoelstra MA, Wilma Stienstra, Tineke Tamminga, Helga Zandberg

Orthographic transcriptions Dutch

Grietje Keizer-Heeringa, Theresia Schreiber, Edmee Valk-Boon BA, Rick Weggen


Andrea Garcia Ariza MA, Tessa Hummel BA, Mirte Koppenberg, Bahar Soohani PhD


Grietje Keizer-Heeringa, Dik Nauta, Theresia Schreiber



Tony Feitsma


Els van der Geest M.A., dr. Frits van der Kuip, Irénke Meekma, M.A.


Sloos, Marjoleine, Eduard Drenth & Wilbert Heeringa. Forthcoming. The Boarnsterhim Corpus: A Bilingual Frisian-Dutch Panel and Trend Study. In Proceedings of the 11th edition of the Language Resources and Evaluation Conference, 7-12 May 2018, Miyazaki (Japan).

Feitsma, Antonia. 1989. Changes in the pronunciation of Frisian under the influence of Netherlandic. In Deprez, K. (ed.), Language and Intergroup Relations in Flanders and in the Netherlands, 181-193. Dordrecht: Foris.

Meekma, Irénke. 1989. Frouljuspraat en it lytse ferskil. Oer útspraakferoaring yn ‘e sandhi by froulju en manlju. It Beaken 51, 115-29.

Feitsma, Tony, Els van der Geest, Frits J. van der Kuip & Irénke Meekma. 1987. Variations and development in Frisian sandhi phenomena. International Journal of the Sociology of Language 64, 81-94.

van der Kuip, Frits J. 1986. Syllabisearring yn it Frysk en it Hollânsk fan Fryskpraters. Tydskrift foar Fryske Taalkunde 2, 69-92.