Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis

Publication: Contribution to journal, Journal article, Research, Peer-reviewed

Standard

Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis. / Hoffmann, Matthias; Santos, Felipe G.; Neumayer, Christina; Mercea, Dan.

In: Communication Methods and Measures, 2022.


Harvard

Hoffmann, M, Santos, FG, Neumayer, C & Mercea, D 2022, 'Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis', Communication Methods and Measures. https://doi.org/10.1080/19312458.2022.2128099

APA

Hoffmann, M., Santos, F. G., Neumayer, C., & Mercea, D. (2022). Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis. Communication Methods and Measures. https://doi.org/10.1080/19312458.2022.2128099

Vancouver

Hoffmann M, Santos FG, Neumayer C, Mercea D. Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis. Communication Methods and Measures. 2022. https://doi.org/10.1080/19312458.2022.2128099

Author

Hoffmann, Matthias ; Santos, Felipe G. ; Neumayer, Christina ; Mercea, Dan. / Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis. In: Communication Methods and Measures. 2022.

Bibtex

@article{e0edf331b48843c6b131cb1f1c4debb6,
title = "Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis",
abstract = "This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how freely available datasets of this type require significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.",
author = "Matthias Hoffmann and Santos, {Felipe G.} and Christina Neumayer and Dan Mercea",
year = "2022",
doi = "10.1080/19312458.2022.2128099",
language = "English",
journal = "Communication Methods and Measures",
issn = "1931-2458",
publisher = "Routledge",

}

RIS

TY  - JOUR
T1  - Lifting the Veil on the Use of Big Data News Repositories: A Documentation and Critical Discussion of A Protest Event Analysis
AU  - Hoffmann, Matthias
AU  - Santos, Felipe G.
AU  - Neumayer, Christina
AU  - Mercea, Dan
PY  - 2022
Y1  - 2022
N2  - This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how freely available datasets of this type require significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.
AB  - This paper presents a critical discussion of the processing, reliability and implications of free big data repositories. We argue that big data is not only the starting point of scientific analyses but also the outcome of a long string of invisible or semi-visible tasks, often masked by the fetish of size that supposedly lends validity to big data. We unpack these notions by illustrating the process of extracting protest event data from the Global Database of Events, Language and Tone (GDELT) in six European countries over a period of seven years. To stand up to rigorous scientific scrutiny, we collected additional data by computational means and undertook large-scale neural-network translation tasks, dictionary-based content analyses, machine-learning classification tasks, and human coding. In a documentation and critical discussion of this process, we render visible opaque procedures that inevitably shape any dataset and show how freely available datasets of this type require significant additional resources of knowledge, labor, money, and computational power. We conclude that while these processes can ultimately yield more valid datasets, the supposedly free and ready-to-use big news data repositories should not be taken at face value.
U2  - 10.1080/19312458.2022.2128099
DO  - 10.1080/19312458.2022.2128099
M3  - Journal article
JO  - Communication Methods and Measures
JF  - Communication Methods and Measures
SN  - 1931-2458
ER  -
