Archiving and using corpora [1/4] (FieldLing 2022)

This post accompanies a lecture on “Archiving and using corpora”, given as part of the FieldLing 2022 “International School in Linguistic Fieldwork” hosted by CNRS in Paris (INALCO).

Outline:
Context
Why archives
How to archive
Ethics of archiving

Context

Archiving in linguistics is a topic that grew out of the field’s recent reckoning with language endangerment. See Henke & Berez-Kroeker (2016) for a full history.

Language Endangerment: In the late 20th century, linguists started to become more aware that all over the world languages were no longer being transmitted to the next generation. They began to use the label “language endangerment” for this severe, unprecedented and largely irreversible reduction in the world’s linguistic diversity.

“Without intervention to increase language transmission to younger generations, we predict that by the end of the century there will be a nearly five-fold increase in Sleeping languages, with at least 1,500 languages ceasing to be spoken.”

(Bromham et al. 2021)

Language Documentation: One major concern for scholars was the loss of access to linguistic data. This led to increased efforts to put academic resources into various ways of collecting data. The most popular way of thinking about this new emphasis was articulated under the label “Language Documentation”.

“…a language documentation is a lasting, multipurpose record of a language.”

(Himmelmann 2006)

Archive: Since significant resources were being invested in creating datasets for researchers, there was also increasing interest in understanding how to preserve those datasets and make them available through digital language archives.

“An archive is a repository or institution that preserves materials so they can be accessed in the future.”

(Kung et al. 2020)

An essential point is that the institutional nature of archives makes the data they hold much more secure, and more likely to last, than data on a hard drive or a website server.

In addition to researchers, the other stakeholders in language archiving are the communities of people whose languages are being documented. The central role of the speakers/signers of these languages often gets overlooked.

“…language archives now present a veritable treasure trove of non linguistic information encoded in the signal of the subject language. And users of language archives are very often interested in this type of information.”

(Holton 2012)

Consider language work you have done or would like to do. How might a language corpus you archive be used by various groups?

Next: Why archives
