Statistics for Linguistics with R

Quantitative Methods in Humanistic Studies


PhD School at the Faculty of Humanities, University of Copenhagen

The aim of the course is to introduce students to statistical techniques for use in experimental and data-driven analyses. Such techniques are fundamental to the analysis and interpretation of data in many domains. In the Humanities, as in many other fields, there is an increasing emphasis on data-driven techniques to complement more analytical perspectives, which makes a working knowledge of statistics essential. For example, literary analysis has benefited from quantitative methods in the study of stylistics, reader response and authorship attribution. In linguistics, the use of large corpora and of methods from experimental psychology has placed many theories on a firmer empirical footing. Historical research can also benefit from quantitative methods, for example in identifying the role of specific events in shaping demographic trends. The course will focus on the following areas:

  • An introduction to basic probability theory;
  • The notion of a distribution, with particular reference to fundamental distributions such as the normal and Zipfian distributions;
  • The concept of a variable and different types of numerical data;
  • Basic measures of central tendency and dispersion in samples and populations;
  • Correlation and regression techniques;
  • The foundations of hypothesis testing, and the use of inferential statistical methods to test, and potentially reject, the null hypothesis.

The course will introduce different tests with an emphasis on:

  • When they should be used, depending on the type of data in hand;
  • How they should be interpreted, in particular when a trend can be considered reliable.

Throughout, emphasis will be placed on practical applications, and students will have the opportunity to apply their newly acquired skills to data from their own PhD projects. An important feature of this course is that it also introduces students to the R software package. (Read more here:

Time: The course will take place from Monday to Friday, 9:00-16:00.

On the first four days of the course, the mornings will be devoted to lectures and online training with the R system. Most of the examples used in these sessions will be taken from linguistics. The afternoons will be devoted to work on the students' own data. On the last day, the students will present the analyses they have carried out.

Prerequisites: Students must bring their own laptops and must download and install R before the course. Instructions on how to do this will be provided.

Participants will also be asked to send a brief description of the project data they intend to work on during the course.

Max. number of participants: 20.

Textbook: Gries, Stefan T. (2009) Statistics for Linguistics with R. De Gruyter Mouton.

Additional readings: Baayen, R.H. (2008) Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. Cambridge UP. Online introduction to statistics for the humanities at

