14th SIKS/Twente Seminar on Searching and Ranking

Text as social and cultural data (Friday 10 March, 2017)

This symposium aims to bring together researchers from various disciplines to discuss approaches to using text for studying social and cultural phenomena.

Location: Horsttoren T1300, University of Twente (note: location has changed!)

Participation is free but please register to help us plan the required catering.


Preliminary program

10:30 Coffee and welcome
11:00 Doing sociolinguistics on Twitter? Things you need to be aware of
Anders Søgaard (University of Copenhagen)
11:30 Social networks and sociolinguistic variation
Jacob Eisenstein (Georgia Institute of Technology)
12:00 Lunch
13:00 The #Datagrant program: Developing and applying NLP methods to study the role of social identity in online health campaigns
Tijs van den Broek (University of Twente) and Anna Priante (University of Twente)
13:20 Tweeting in regional minority languages – investigating the language choice between Dutch and Frisian or Limburgish on Twitter
Lysbeth Jongbloed-Faber (De Fryske Akademy) and Leonie Cornips (Meertens Institute)
13:40 Anticipointment detection in event tweets
Florian Kunneman (Radboud University)
14:00 Break
14:20 Ad hoc monitoring of vocabulary shifts over time
Tom Kenter (University of Amsterdam)
14:40 Devils, fairies and dragons: Character bias in the cultural success of fairy tales
Folgert Karsdorp (Meertens Institute)
15:00 How social media studies can inform ideas about language variation
John Nerbonne (University of Groningen/Albert-Ludwigs-Universität Freiburg)
15:30 End
16:30 PhD defense of Dong Nguyen

Doing sociolinguistics on Twitter? Things you need to be aware of

by Anders Søgaard (University of Copenhagen)

Large-scale social media analysis can be used to validate some sociolinguistic hypotheses, but social media introduce several methodological problems. For starters, social media users are biased samples of the population, and the traces they leave on social media are influenced by writing, the perceived audience, etc. I discuss these problems in the context of specific sociolinguistic hypotheses.

Social networks and sociolinguistic variation

by Jacob Eisenstein (Georgia Institute of Technology)

Language is socially situated: both what we say and what we mean depend on our identities, our interlocutors, and the communicative setting. The first generation of research in computational sociolinguistics focused on large-scale social categories, such as gender. However, many of the most linguistically salient social distinctions are locally defined. Rather than attempt to annotate these social properties or extract them from metadata, we turn to social network analysis, which has been only lightly explored in traditional sociolinguistics. I will describe two projects at the intersection of language and social networks. First, I will discuss how the spread of linguistic innovations can serve as evidence for sociocultural influence, using a parametric Hawkes process to model the features that make dyads especially likely or unlikely to be conduits for language change. Second, I will show how social network structures can be exploited to make natural language processing more robust to sociolinguistic variation.

The #Datagrant program: Developing and applying NLP methods to study the role of social identity in online health campaigns

by Anna Priante (University of Twente) and Tijs van den Broek (University of Twente)

In 2014, the University of Twente received a datagrant from Twitter that offers a unique opportunity to study online campaigns as an instrument to raise cancer awareness and prevention behavior. The Twitter #DataGrant project is a concerted effort of Management and Public Administration scholars working together with computer scientists. This collaboration aims to analyze archival Twitter data covering the period between 2008 and 2014 to study multiple cancer awareness campaigns by developing and applying machine learning methods. In this presentation, we will show one of the many outcomes of our project that combines social theory and Natural Language Processing methods to build a classifier for English-speaking Twitter users’ social identity in profile descriptions. Our study shows how social theory can be used to guide NLP methods, and how such methods provide input to revisit traditional social theory that is strongly consolidated in offline settings.

Tweeting in regional minority languages – investigating the language choice between Dutch and Frisian or Limburgish on Twitter

by Lysbeth Jongbloed-Faber (De Fryske Akademy) and Leonie Cornips (Meertens Institute)

In the Dutch provinces of Limburg and Friesland, a large share of the population speaks a regional minority language besides Dutch: Frisian varieties in Friesland and Limburgish varieties in Limburg. Until recently, these regional minority languages were mainly used in spoken communication. However, the rise of social media has brought an increasing use of non-standard varieties in the written domain. In this presentation, the use of Dutch, Frisian and Limburgish on Twitter will be compared: when do people choose Dutch, and when Frisian or Limburgish? To this end, we will compare the tweets from 20 twitterers in Limburg and Friesland who use both Dutch and Frisian or Limburgish extensively. How does their use differ, and which patterns in language choice can be identified?

Anticipointment detection in event tweets

by Florian Kunneman (Radboud University)

Is it good to have positive expectations about a social event, such as a symposium? Or can it save us a lot of disappointment if we expect the worst? In this talk we present the first event anticipointment index by modelling the language used in the context of the emotions of positive expectation, satisfaction and disappointment in tweets. This talk might permanently change the way in which you anticipate events.

Ad hoc monitoring of vocabulary shifts over time

by Tom Kenter (University of Amsterdam)

Word meanings change over time. Detecting shifts in meaning for particular words has been the focus of much research recently. We address the complementary problem of monitoring shifts in vocabulary over time. That is, given a small seed set of words, we are interested in monitoring which terms are used over time to refer to the underlying concept denoted by the seed words.

Devils, fairies and dragons: Character bias in the cultural success of fairy tales

by Folgert Karsdorp (Meertens Institute)

Perhaps the most central question in fairy tale transmission research is why fairy tales seem to prevail so consistently in culture. Among the most popular stories in present-day Western culture are fairy tales like “Cinderella”, “Snow White”, or “Red Riding Hood” – which are not present-day creations but find their roots in centuries-old tales. What is it about fairy tales that makes them stick? In this talk, I report on research examining the cultural success of fairy tales from the famous Brothers Grimm collection Kinder- und Hausmärchen. Approaching the tales' success from a content-based perspective, I address the question of which story elements contribute to a story’s popularity, or, in other words, which story elements form attractors causing a fairy tale to 'stick' and gain popularity. In particular, I look into the question of whether the type of characters in a story correlates with its success, i.e. whether a character type bias is at play in story selection. The findings reported serve to aid our understanding of how the funnel-like selection process observed in the dissemination of the 210 stories in Kinder- und Hausmärchen has proceeded, which in turn furthers our understanding of the more general question of how to explain prevalent culture, in which some cultural artifacts are more likely to survive than others.

How social media studies can inform ideas about language variation

by John Nerbonne (University of Groningen/Albert-Ludwigs-Universität Freiburg)

It's very exciting how computational linguists have shown how to wring inferences about language variation from high-volume but noisy sources such as Twitter or blogs. This has the potential to inform ideas about language variation profoundly. For the first time it is possible to follow innovations in real time (not merely from one generation to the next), to examine how such innovations spread, who the main carriers of innovation are, and which innovations are adopted more generally (although there will be a lag with respect to the last point). An intriguing idea is to ask whether Bloomfield's “density of communication” can be shown to be efficacious even in the presence of physical barriers.

Organization

  • Antal van den Bosch (Radboud University/Meertens Institute)
  • Djoerd Hiemstra (University of Twente)
  • Franciska de Jong (University of Twente/Utrecht University)
  • Dong Nguyen (University of Twente/Alan Turing Institute/Edinburgh University)
  • Mariët Theune (University of Twente)

Sponsors

  • Research School for Information and Knowledge Systems
  • Centre for Telematics and Information Technology (CTIT)


colloquium/ssr14.txt · Last modified: 2017/02/24 13:34 by Djoerd Hiemstra