Stefan Gries is a professor at the University of California, Santa Barbara, specializing in corpus and quantitative linguistics. http://www.linguistics.ucsb.edu/faculty/stgries/

The lectures, on the topic "Quantitative methods in corpus linguistics", will be given on 29 and 30 November. They are organized by the International Laboratory of Language Convergence jointly with the School of Linguistics of the National Research University Higher School of Economics (HSE).

29 November, 18:10–19:30

30 November, 18:10–19:30 and 19:40–21:00

Staraya Basmannaya St. 21/4, room 501

Quantitative methods in corpus linguistics
Talk 1: Spanish internet orthography (deletion & reduplication); recognition points in morphological blends (like brunch); rhythmic alternation in particle verbs; alliteration in idioms

This talk begins by discussing the often underestimated need for quantitative analyses in linguistics and proceeds by presenting several quantitative corpus-based studies that showcase the utility of even the simplest kinds of statistical analyses. Case study 1 explores language variation in creative spellings of Spanish on the internet; it shows that deletion and reduplication online are not haphazard but governed by a variety of factors such as frequency, pragmatics, and articulatory aspects, and that speakers ‘keep track’ of the ‘coolness’ of words that they modify. Case study 2 explores, on the basis of different kinds of type and token frequencies, where people split up words (such as breakfast and lunch or channel and tunnel) to create blends (such as brunch and chunnel). Case studies 3a and 3b discuss the role that phonology and articulation play in the syntactic alternation of particle placement (John picked up the book vs. John picked the book up) and in the formation of idioms and semi-idiomatic constructions (in the Construction Grammar sense of the term).

Talk 2: the change of third person sg. in English from 1400-1700; the change of genitives in Singaporean English (as compared to British English); the change of the use of Spanish sentir over a few centuries

This talk showcases a variety of more sophisticated statistical methods and their application to diachronic linguistics. Case study 1 combines exploratory and hypothesis-testing methods to model the development of the 3rd person singular in English using chronological clustering and mixed-effects regression modeling. Case study 2 critiques the kind of apparent-time ‘wanna-be diachronic’ analysis characteristic of much research on (English) indigenized varieties using different kinds of corpus data. Case study 3 is an application of exploratory methods such as multidimensional scaling to the development of the Spanish verb sentir.

Talk 3: corpus data in psycholinguistics and what that means for regression modeling (with reanalyses of published work); frequencies, contingency/association, dispersion, and entropy in corpus data; example: that complementation in L2 English.

This talk discusses threats to statistical modeling of corpus and experimental data. It first highlights a few common issues that require attention in the process of regression modeling and exemplifies them with a reanalysis of data published in a paper in Cognition. It then emphasizes what kinds of data corpora offer beyond the most elementary kinds of frequencies (contingency, dispersion, entropy) and thus argues for a multidimensional interpretation of co-occurrence data in corpus linguistics that goes beyond the current simplistic frequency and association measures. I conclude with a learner corpus research case study that exemplifies at least some of these aspects.

