Lectures by Stefan Gries at the National Research University Higher School of Economics (HSE)
Stefan Gries is a professor at the University of California, Santa Barbara, specializing in corpus and quantitative linguistics. http://www.linguistics.ucsb.edu/faculty/stgries/
The lectures will be given on November 29 and 30 on the topic "Quantitative methods in corpus linguistics". They are organized by the International Laboratory of Language Convergence together with the HSE School of Linguistics.
November 29, 18:10–19:30
November 30, 18:10–19:30 and 19:40–21:00
Staraya Basmannaya 21/4, room 501
Quantitative methods in corpus linguistics
Talk 1: Spanish internet orthography (deletion & reduplication); recognition points in morphological blends (like brunch); rhythmic alternation in particle verbs; alliteration in idioms
This talk begins by discussing the often underestimated need for quantitative analyses in linguistics and proceeds by presenting several quantitative corpus-based studies that showcase the utility of even the simplest kinds of statistical analyses. Case study 1 explores language variation in creative spellings of Spanish on the internet; it shows that deletion and reduplication online are not haphazard but governed by a variety of factors such as frequency, pragmatics, and articulatory aspects, and that speakers ‘keep track’ of the ‘coolness’ of words that they modify. Case study 2 explores, on the basis of different kinds of type and token frequencies, where people split up words (such as breakfast or channel and tunnel) to create blends (such as brunch). Case studies 3a and 3b discuss the role that phonology and articulation play in the syntactic alternation of particle placement (John picked up the book vs. John picked the book up) and in the formation of idioms and semi-idiomatic constructions (in the Construction Grammar sense of the term).
Talk 2: the change of third person sg. in English from 1400-1700; the change of genitives in Singaporean English (as compared to British English); the change of the use of Spanish sentir over a few centuries
This talk showcases a variety of more sophisticated statistical methods and their application to diachronic linguistics. Case study 1 combines exploratory and hypothesis-testing methods to model the development of the 3rd person singular in English using chronological clustering and mixed-effects regression modeling. Case study 2 critiques the kind of apparent-time ‘wanna-be diachronic’ analysis characteristic of much research on (English) indigenized varieties using different kinds of corpus data. Case study 3 is an application of exploratory methods such as multidimensional scaling to the development of the Spanish verb sentir.
Talk 3: corpus data in psycholinguistics and what that means for regression modeling (with reanalyses of published work); frequencies, contingency/association, dispersion, and entropy in corpus data; example: that complementation in L2 English.
This talk discusses threats to statistical modeling of corpus and experimental data. It first highlights a few common issues that require attention in the process of regression modeling and exemplifies them with a reanalysis of data published in a paper in Cognition. It continues to emphasize what kinds of data corpora offer beyond the most elementary kinds of frequencies (contingency, dispersion, entropy) and, thus, argues for a multidimensional interpretation of co-occurrence data in corpus linguistics that goes beyond the current simplistic frequency and association measures. I conclude with a learner corpus research case study that exemplifies at least some of these aspects.