Introduction to Advanced Statistics for Language Sciences

This workshop provides practical introductions to advanced statistical methods used in the language sciences, with a focus on research in bilingualism. It comprises three sessions covering mixed-effects models, generalised additive mixed models, and Bayesian analysis. Each session will include an introduction to the method and worked, practical examples in the statistical software package R.


No prior knowledge of mixed-effects models or Bayesian analysis is expected, but attendees should have a basic understanding of R and R syntax. This workshop is best suited to PhD students, early career researchers and other academics who work in bilingualism and are interested in learning about, or refreshing their understanding of, these methods. Attendees are directed to either Learning Statistics with R or Statistics for Linguistics as relevant preparation resources for this workshop.

Introduction to Mixed-Effects Models in Bilingualism Research (Ian Cunnings and George Pontikas)

The last decade or so has seen a step-change in the statistical analysis techniques used in different subfields of linguistics and psychology, including bilingualism research and second language acquisition. While statistical tests such as t-tests and ANOVAs were prevalent in bilingualism research only a decade ago, use of mixed-effects models has increased considerably over recent years (Baayen et al., 2008; Barr et al., 2013; Cunnings, 2012). The move to mixed-effects models has been motivated by several long-standing problems in research in bi-/multilingualism. For example, mixed-effects models address the problem of generalising study findings beyond both the participants and the linguistic materials tested (Clark, 1973). Mixed-effects models also provide a framework for analysing different types of data, including continuous and binary dependent variables, and for modelling different types of fixed and random effects (Baayen et al., 2008). This is especially useful from the perspective of second language acquisition research, where the ability to model nested random effects, such as those that arise when multiple second language learners are tested in different classrooms and schools, is crucial to assessing the generalisability of study findings.


While mixed-effects models are thus becoming standard within the field, researchers may still be unfamiliar with their use, or unsure how to deal with modelling issues, such as model convergence errors, that can arise during analysis. In this talk, we will provide a practical introduction to mixed-effects models for bilingualism researchers. We will give a conceptual introduction to mixed-effects modelling and go through some practical examples using the lme4 package in R. In so doing, we will provide guidance on best practice in their use for different types of data. Our talk will serve as an introduction to mixed-effects models for those unfamiliar with them, and as a refresher for those who have some experience with such models.

Dr Ian Cunnings is an Associate Professor of Psycholinguistics in the School of Psychology and Clinical Language Sciences at the University of Reading, UK. His research examines sentence processing in different populations of speakers, including first language users and second language learners. He has used mixed-effects models widely in his research and has written several introductions to their use in second language acquisition and bilingualism research (Cunnings, 2012; Cunnings & Finlayson, 2015; Linck & Cunnings, 2015).


Dr George Pontikas is a Lecturer in Clinical Language Sciences in the School of Psychology and Clinical Language Sciences at the University of Reading. His research examines the processing of complex morphosyntax in bilingual populations, including children, using the visual-world paradigm. He has used mixed-effects models in his research to model a range of data types, including response accuracy, reaction times and gaze data (Pontikas, Cunnings & Marinis, 2022), and teaches research methods and statistics on undergraduate and postgraduate degrees at Reading.

Introduction to Generalised Additive Models in R (Ste Coretta)

While most processes that involve linguistic phenomena can be analysed assuming a form of linear dependence between variables, there are particular cases in which this assumption might not hold. Non-linearity is typical of time-series or longitudinal data and of multi-dimensional data like EEG and MRI data.

This talk introduces Generalised Additive Models in R as one possible solution to the analysis of non-linear data. Generalised Additive Models (GAMs) are an extension of generalised linear models that allow researchers to fit so-called smoothers to non-linear data. The smoothers can estimate both linear and non-linear effects and are a computationally efficient way of modelling, for example, effects across trials, longitudinal changes, time-series data like ERPs and other physiological measures, and spatial data such as geographic variation data, EEG electrode amplitudes or MRI data.

After a conceptual introduction to non-linearity and GAMs, the talk will show how to fit GAMs in R using the mgcv package. Modelling of random (non-linear) effects will also be discussed. Real datasets relevant to bilingualism research will be used to illustrate the process of fitting and interpreting GAMs.

Dr Ste Coretta holds a tenured faculty position as Senior Teaching Coordinator for Statistics in the Linguistics and English Language department of the University of Edinburgh. While Ste's background is in general and historical linguistics and experimental phonetics, he has developed expertise in quantitative methods and statistics. Ste teaches and coordinates statistics courses and provides staff with training seminars on advanced statistical techniques.

Ste is involved in the delivery of several workshops on a variety of statistical topics (such as Bayesian Linear Models, Generalised Linear Models, Introduction to R) and he keeps an active research profile as lead in meta-scientific projects (like the Many Speech Analysis Project) and as statistical collaborator on other research projects in different branches of linguistics.

Bayesian Analyses for Bilingualism Research: A Primer (João Veríssimo)

The last decade has seen some dramatic methodological changes in the wider disciplines of psychology and linguistics, in response to concerns that many experimental results may not generalise across samples and conditions (e.g., Gibson & Fedorenko, 2013; Maxwell, Lau, & Howard, 2015). An important set of changes (some of which have already trickled down to the field of bilingualism; Veríssimo, 2021) relates to the statistical methods that we routinely use to draw inferences. For example, by-subject and by-item analyses have now been broadly replaced by mixed-effects models, which can be applied to trial-level data without any prior aggregation (Baayen, Davidson, & Bates, 2008). Another recent development is the increasing adoption of Bayesian analyses as an alternative framework that replaces traditional frequentist inference (Vasishth, Nicenboim, Beckman, Li, & Kong, 2018).

In this talk, I will provide an introduction to Bayesian (mixed-effects) regression models. I will first introduce the foundations of Bayesian statistics and show how this framework provides a very natural and informative quantification of effects and their uncertainty.

I will then highlight four specific advantages of Bayesian analyses for bilingualism research: (a) they allow incorporating prior information, so that more realistic and generalisable inferences can be drawn; for example, results obtained in previous research can inform effects in bilingual populations; (b) they allow multiple sources of variance to be accurately modelled, thus capturing the substantial within- and between-person variability that characterises bilingual populations; (c) they are flexible enough to fit virtually any kind of distribution, so that they can be applied to the many data types that are analysed in our field (e.g., response times and rating scales); and (d) they can be used to quantify the evidence for a null hypothesis over the alternative; for instance, we can assess how much evidence the data provide for the equality of experimental effects in two participant groups (e.g., L1 vs. L2 speakers), whereas non-significant frequentist statistics would be inconclusive.

These advantages will be demonstrated through examples with real datasets of bilingual language processing, using the R package brms. All data and code will be made available for further practice, so that researchers who are already familiar with mixed-effects models will feel at home, while newcomers can benefit from a first contact with the richness and informativeness of the Bayesian framework.

Dr João Veríssimo is an Assistant Professor at the University of Lisbon. His research has focussed on lexical and morphological processing in L1 and L2 speakers and, more recently, on modelling and explaining individual differences in language and cognition with Bayesian statistics. Dr Veríssimo teaches at the Potsdam SMLP Summer School, a leading workshop on advanced statistical methods, and has edited a mini-series on new statistical approaches and research practices for bilingualism research in Bilingualism: Language and Cognition (Veríssimo, 2021).

Draft schedule

5:00PM Introduction

5:05PM-6:00PM Introduction to Mixed-Effects Models in Bilingualism Research (Ian Cunnings and George Pontikas: LIVE)

6:00PM-6:30PM Q&A on Introduction to Mixed-Effects Models in Bilingualism Research (Ian Cunnings, George Pontikas and others: LIVE)

6:30PM-7:30PM Introduction to Generalised Additive Models in R (Stefano Coretta: RECORDED)

7:30PM-8:30PM Bayesian Analysis for Bilingualism Research: A Primer (João Veríssimo: LIVE)

8:30PM-8:55PM Q&A on Generalised Additive Models and Bayesian Analysis (João Veríssimo and others: LIVE)

8:55PM-9:00PM Close

(Australian Eastern Standard Time)

Workshop Detail

This is a satellite conference. Registration for this workshop is independent of attendance at the main conference.