Synthetic data: provocations and frictions of emerging data regimes

Synthetic data are defined as information created by computer simulations or algorithms that reproduce the structural and statistical properties of so-called “real-world” data. In recent years, synthetic data have emerged as a promised fix for several problems of the AI industry: the need for more and better-quality training data, the protection of personal data via anonymization, and the mitigation of data bias. A growing field of scholarship uncovers issues regarding the emergence of new data economies and labour practices in synthetic data industries. Furthermore, synthetic data affect policy, data regulation and notions of personal data. Although the idea of synthetic data has been known and used since the 1940s, its potential applications as training data for AI present new challenges and require thorough investigation from the social sciences and the humanities.

By surfacing frictions and provocations of synthetic data, the seminar aims to create a fruitful dialogue within a nascent field. We invite contributions on synthetic data and the synthetic turn from the social sciences and humanities, including critical data studies, cultural studies, cultural theory, philosophy, political economy, sociology and STS.

Attendance is free but requires registration. If you want to present a paper, please send a short abstract (around 150-200 words) to kpas@hum.ku.dk by 10 October.

Preliminary programme

09:00 – Coffee & croissants

09:30 – Intro

09:45 – Keynote (James Steinhoff)

11:30 – Lunch

12:30 – Presentations followed by Q&A (5-6 presentations)

14:00 – Coffee break

14:30-16:00 – Round-table discussion

 

Synthetic data: new challenges for critical data studies

This talk introduces critical research on synthetic data, the latest darling of the artificial intelligence industry. Synthetic data is data generated by a computational process rather than captured by recording phenomena from the real world or digital platforms. With a synthetic dataset consisting of procedurally generated 3D models of faces, for example, one can train a working facial recognition model that has never been exposed to a real human face (Wood et al. 2021). I contend that synthetic data has political, economic, epistemological and ontological implications for the critical study of data which have only just begun to be explored (Steinhoff 2022). It is no exaggeration to say that synthetic data is being positioned as the cure for all of the big problems facing AI. It is purported to be the solution to data bias, privacy and surveillance concerns, and the labour costs associated with data collection and labeling. I discuss the nascent synthetic data industry and its global scope, as well as the most popular applications of the technology. It will also be necessary to explore the “reality gap”, the central technical problem posed by synthetic data: whether a model trained on synthetic data will function when deployed in the real world. I discuss the critical implications of synthetic data in several dimensions: the ontological connection between synthetic data and the human and the possibility of a data-intensive capitalism which does not rely on surveillance; the novel spatial and temporal dynamics of synthetic data; and the labour process by which synthetic data is produced in simulations. I conclude with some considerations on the future of synthetic data in relation to the problem of model collapse.

References

Steinhoff, J., 2022. Toward a political economy of synthetic data: A data-intensive capitalism that is not a surveillance capitalism? New Media & Society 26(6).

Wood, E., Baltrušaitis, T., Hewitt, C., Dziadzio, S., Cashman, T.J. and Shotton, J., 2021. Fake it till you make it: face analysis in the wild using synthetic data alone. In Proceedings of the IEEE/CVF International Conference on Computer Vision (pp. 3681-3691).


Bio

James Steinhoff is Assistant Professor in the School of Information and Communication Studies at University College Dublin. His research focuses on automation and the political economy of AI and data. He is the author of Automation and Autonomy: Labour, Capital and Machines in the Artificial Intelligence Industry (Palgrave 2021) and co-author of Inhuman Power: Artificial Intelligence and the Future of Capitalism (Pluto 2019).