Big Data – an environmental asset

Big Data has emerged as a game-changing presence in commerce and politics. What used to be the vast and unknown cosmos of individual behavior and preferences can now be parsed for patterns and trends to aid decision-making. Where policies were once based on gut-checks and intuition, Big Data is now translated into decisions that result in significant profit, political gain, or, according to its more optimistic proponents, even saving the world.

But forests don’t tweet, and whales don’t shop on Amazon. So what does Big Data mean for the environment and sustainability?

At Yale University, which created the Environmental Performance Index (EPI), we have yet to see the Big Data revolution enter the environmental domain. We sift through a plethora of globally available national datasets that measure a suite of environmental issues, ranging from climate change to air quality and forests. Despite the data available, we are still woefully plagued by gaps in knowledge, insufficient data, and uncertainty. We lack, for example, global datasets for national recycling rates, waste management, and toxic chemicals.

That frequently leaves us creating indicators based on incomplete or imperfect data. These indicators are meant to provoke policymakers to act on an environmental issue. One danger in creating these proxy measures is that they can mask data gaps, so the underlying problems are often ignored.

So how can we bring Big Data to environmental decision-making? What is needed to invigorate the same kind of massive data collection that tech companies and the private sector are harnessing to their advantage?

One challenge is that we don’t yet know what environmental Big Data will look like or where it will come from. However, there are a few emerging suggestions. Crowdsourcing and citizen science projects like Danger map, a crowdsourced environmental pollution map making ripples in China, are increasingly popular tools for creating information where there previously was none. Open hardware and the Arduino platform offer exciting prospects for widely distributed, inexpensive crowdsourced data collection. The World Resources Institute has teamed with the Center for Global Development to aggregate vast amounts of satellite data on forest cover, developing algorithms to detect when deforestation might be happening in any part of the world. The Global Forest Watch 2.0 platform allows users to contribute their own observations when those algorithms and data are wrong. The National Ecological Observatory Network, or NEON, aggregates and designs communicative media for the information we already have about climate change, land-use change, and invasive species impacts. These efforts aim to keep their resources open and adaptable to new information as it comes in.

Still, we have a long way to go. Unlike stock market data, which is updated in real time, the environment has no analogous platform or set of indicators. That’s a severe problem. Though many environmental phenomena manifest slowly over time, it is often already too late by the time we can perceive them.

In this 400 ppm era, with atmospheric carbon dioxide at 400 parts per million, we need to start thinking about enlisting Big Data for the environment.