Evaluating hypotheses in geolocation on a very large sample of Twitter
Publication: Contribution to book/anthology/report › Article in proceedings › Research › Peer-reviewed
Standard
Evaluating hypotheses in geolocation on a very large sample of Twitter. / Salehi, Bahar; Søgaard, Anders.
Proceedings of the 3rd Workshop on Noisy User-generated Text. Association for Computational Linguistics, 2017. pp. 62-67.
RIS
TY - GEN
T1 - Evaluating hypotheses in geolocation on a very large sample of Twitter
AU - Salehi, Bahar
AU - Søgaard, Anders
PY - 2017
Y1 - 2017
N2 - Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
AB - Recent work in geolocation has made several hypotheses about what linguistic markers are relevant to detect where people write from. In this paper, we examine six hypotheses against a corpus consisting of all geo-tagged tweets from the US, or whose geo-tags could be inferred, in a 19% sample of Twitter history. Our experiments lend support to all six hypotheses, including that spelling variants and hashtags are strong predictors of location. We also study what kinds of common nouns are predictive of location after controlling for named entities such as dolphins or sharks.
M3 - Article in proceedings
SN - 978-1-945626-94-4
SP - 62
EP - 67
BT - Proceedings of the 3rd Workshop on Noisy User-generated Text
PB - Association for Computational Linguistics
T2 - 3rd Workshop on Noisy User-generated Text
Y2 - 7 September 2017 through 7 September 2017
ER -
ID: 195014345