How to Build a Corpus of Legal Language: Ensuring its Representativeness

Glogar, Ondřej

Publication details

Authors	GLOGAR Ondřej
Year of publication	2023
Type	Appeared in Conference without Proceedings
MU Faculty or unit	Faculty of Law
Description	Although the importance of language for law has long resonated in legal theory, existing research on legal language either lacks findings supported by sufficient data or does not cover all aspects of legal language. In particular, it may seem problematic that legal theorists, with few exceptions, describe legal language based solely on their own linguistic experience and a random selection of examples (as noted, for instance, by Mouritsen, 2017). One way to avoid this reliance on intuition and the lack of empirical data is to use a language corpus that reflects the actual use of the language in everyday practice. A standard corpus collects a range of texts in a form accessible to software, so that (mainly linguistic) hypotheses can be easily tested. Although some corpora focused on legal language already exist, they usually capture only a narrow segment or a single genre (e.g. a corpus covering only case law or statutes). It is therefore advisable to design a comprehensive and balanced corpus that includes representatives of each genre of legal language. However, many intersections may arise when creating such a corpus, so a suitable methodology is needed first. In my paper, I therefore discuss the various risks and procedures to be considered when building such a corpus. Through an analysis of the applied linguistics literature (e.g. Meyer, 2002), I evaluate the individual criteria for sample collection and segmentation and adapt them to the specifics of legal language. Perhaps the most important of these is the representativeness of such a corpus, which is the focus of the paper. The criteria for selecting texts and utterances must necessarily differ from those used for general language, as the different legal branches, the speakers of legal language, and the genres of legal language all need to be taken into account (cf. Tiersma, 2000; Cao, 2007).
The main aim of this paper is to present reflections on the design and methodology for the creation of such a corpus of legal language, with a particular focus on its representativeness.
Related projects:	Construction of meaning in law
