Much of my work is in pursuit of “data dignity”, an idea that stems in part from scholars arguing that we should sometimes think of “data as labor”. However, the data as labor comparison often elicits quite a bit of pushback. In this post, I’m going to describe a thought experiment regarding cartography and two seaside villages. My hope is that this story can illustrate some theoretical reasons to be hopeful about the potential for collective bargaining power deriving from data, and I hope to paint an optimistic picture of a path towards AI governance. Throughout this thought experiment, I’ll also make references to Nagaraj and Stern’s Journal of Economic Perspectives 2020 article on the Economics of Maps.
Ultimately, no metaphor for “data as X” is perfect (and there have been some great critiques of data labor specifically). The scope of human activities that are “data-generating” is vast and growing. But the comparison with labor, especially map-making labor, can be instructive.
Mapping a Seaside Village
Imagine a seaside village, at some point in time well before satellites, Google Maps, or OpenStreetMap. Many of the residents of the village use boats to fish and travel. It is of great value to the town to map out the coast and places where rocks jut upwards. In the early days of the village, the town council hires a few young people to act as cartographers. These freshly minted cartographers set out to boat, swim, dive, and walk, all in service of recording a first map.
As time goes on, more people become cartographers, and these cartographers begin to compete: they make more aesthetic maps, explore in larger radii, or make guarantees regarding how frequently their maps are updated. A market for cartographic labor develops.
A challenge for the map-makers trying to sell their maps is that it's not too difficult to make a copy or share key details (maps are a type of “information good”, as Nagaraj and Stern explain). A fisherman might use the map once, learn the key details, and share them with friends and colleagues who never pay a cent to the map-maker. Eventually, the town council might decide to just put up a large, permanent map in the town square, making it easy for each villager to reference it at will or make a personal copy, and making it hard for cartographers to make money.
Ultimately, maps have attributes that put them somewhere between public and club goods. In some cases maps can be made “excludable” – the mapmaker can try to make it hard for non-paying customers to access and copy the map – but doing so is a Sisyphean task (requiring, perhaps, a state that will protect intellectual property).
I've laid out this extended story because I believe the mapping metaphor is useful for highlighting the distinction between places where mapping labor has leverage and places where it does not. This provides a nice conceptual, well, mapping to the contexts in which data-generators (i.e., tech users, the public) are likely to be able to gain a powerful bargaining position with data users (i.e., tech companies).
Some Places are Calm, and in Other Places Volcanoes Erupt
In our seaside village with typical geological conditions, we might imagine things don't change too often. Erosion is slow, there aren't too many landslides, shifting tectonic plates, or volcanic eruptions that require large map changes. Every once in a while, residents might take note that a tree has fallen and update the communal map.
We can characterize this system as quite static. If we were to home in on a particular variable describing the map over time (for instance, the average elevation), that variable likely has low entropy. There's not likely to be a vibrant labor market of cartographers here. In modern terms, the level of OpenStreetMap contribution activity might be relatively low. Probably, the most ambitious cartographers will set off for greater heights or further shores.
However, imagine another seaside village, but with fantastically dynamic geology. On a monthly basis, volcanic eruptions sprout islands and mountains. Here, we can imagine a strong demand for cartographic labor, driven by a more robust set of map offerings ("Get your maps here, updated daily!", "The only map drawn with the help of a submarine!").
In the first seaside village, the cartographers have little labor leverage, and likely very little business. But the villagers reap the continuous benefits of near-perfect navigation. In the second village, cartographers wield immense leverage. Were they to go on strike, any seafaring commercial activity would be shut down or severely hindered. A potential Cartographers' Union in our dynamic seaside village could command great prices. Of course, the downside is that life is just more unpredictable. Presumably, more ships run into rocks and beaches.
Model Weights Map Systems, and Social Systems are Mostly Dynamic
I've belabored this example because I believe much of modern AI can be described in terms of technology companies conscripting the vast majority of tech users into becoming cartographers of both our physical and social spaces. This is, in fact, a very literal description of traffic prediction algorithms (traffic being a phenomenon that combines physical and social aspects). But more broadly, from a zoomed-out view most "AI" systems map inputs to outputs: a language model maps your prompt to an output, a recommender system maps your browsing history to a list of 10 items you might like, and an image classifier maps a picture to a likelihood score that the picture contains certain objects. Of course, there are internal technical details that imply conceptual differences between paper maps and modern AI, but I believe there is a striking resonance in this comparison.
Most importantly, it can be insightful to describe many of our “AI Advances” as techniques for conscripting more people to spend time mapping various physical and social systems in our world, or for getting existing cartographers (users) to map more things.
I've written at length about the idea that since we all contribute to AI, we have potential power. One common retort to this idea (which, to be fair, has yet to be tested in the field!) involves arguing that unwitting data-generating actions like using a search engine or writing social media posts shouldn't count as labor. I hope that this specific comparison with map-making labor might make the (powerful) analogy more palatable.
But a second critique is that all this theorizing is in vain, because collective action around data will never be able to make a noticeable impact on large-scale AI systems. Here, I think the extended thought experiment above is useful, because we can think of different systems as falling somewhere on the spectrum between our static, perfectly mapped village and our dynamic island-sprouting village. We could characterize this spectrum in terms of a measure of each region's geological complexity, such as the information entropy of the variables that describe the terrain (or something easy to interpret, like the monthly probability of significant geological activity).
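To make the "monthly probability of significant geological activity" measure a bit more concrete, here is a minimal sketch in Python. It treats "did the terrain change this month?" as a yes/no random variable and computes its Shannon entropy in bits. The two probabilities (0.01 and 0.40) and the function name binary_entropy are made up for illustration; they are not estimates of anything real.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy, in bits, of a yes/no event that occurs with probability p."""
    # A certain outcome (p = 0 or p = 1) carries no surprise, so its entropy is 0.
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


# Hypothetical monthly probabilities of significant geological activity.
villages = {
    "static village": 0.01,   # assumption: terrain almost never changes
    "dynamic village": 0.40,  # assumption: eruptions are common
}

for name, p in villages.items():
    bits = binary_entropy(p)
    print(f"{name}: p={p:.2f}, entropy={bits:.3f} bits per month")
```

Under these assumed numbers, the dynamic village's variable carries far more uncertainty per month than the static village's, which is one rough way to say there is more left for a cartographer to find out.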
A Hopeful Note for AI Governance
Most of the public-facing machine learning technologies that are centered in ongoing discussions of AI and society are in the business of mapping disordered systems with physical and social components. Each of these systems can be described in terms of random variables, which vary in their entropy and in other measures of complexity. Drawing on our thought experiment above, I'd argue that many of these technologies are trying to map systems much closer to our volcano-filled, plate-shifting, eroding region. Very low entropy systems can be mapped very easily; consider the case of an "AI system" which consists of many if-else code blocks. These are mapped, and that's good!
The near-term battles around AI systems will probably center on whether and how we should advance the capabilities of models that are mapping very dynamic and complex systems, characterized by variables with greater approximate entropy (has the volcano erupted? We won't know until somebody makes the effort to check!). While it's true that we've already done an unimaginable amount of mapping labor (all the activities that have led to the records filling data centers used by Google, Meta, Microsoft, etc.), it's also true that the labor we'll do tomorrow matters.
Time is on our side. New islands are sprouting and the old maps will fade; if the cartographers of the world (all of us) can muster collective action, they’ll have a loud voice going forward. And of course, we can all benefit from an up-to-date map.
Please note: this is meant to be a “beta test” of this idea. I’d like to refine, change, or bolster it, with more specific references to the arguments I’m addressing here and more supporting examples. Please let me know what you think!
Thanks to Vincent Guth and Daan Huttinga for the seaside photos, found via unsplash.com. The Martellus map photo is from the Nagaraj and Stern paper above, originally from www.myoldmaps.com.
Image credits:
* https://unsplash.com/photos/NUiQpGnj9rI
* https://unsplash.com/photos/Su27UvdJrgU
* https://www.myoldmaps.com/_Media/martellus-copy-2_med_hr.jpeg