How AI Helped Create the Latest Open Buildings Dataset to Address Global Data Gaps

AI to Track Urban Growth and Build Insights for the Global South

In 2021, the Google Research Africa team introduced the Open Buildings dataset, an open-source collection of building footprints across the Global South. Powered by artificial intelligence (AI) and high-resolution satellite imagery, this dataset was designed to address a significant gap in demographic data for developing regions.

The dataset has now progressed to its third iteration, covering building data for 1.8 billion structures spread across 58 million square kilometers in areas including Africa, South and Southeast Asia, Latin America, and the Caribbean.

This groundbreaking dataset has become an invaluable tool for governments, organizations like the UN, researchers, and nonprofits to better understand population distribution and density. 

These insights are crucial for planning vital services such as vaccination campaigns, disaster response, and infrastructure development. Additionally, the dataset has enriched services like Google Maps, adding millions of previously unmapped structures.

Expanding the Dataset to Capture Urban Growth Over Time

The Google Research Africa team, which is based in Ghana with members in locations such as Tel Aviv and Zurich, is always working to enhance the project’s impact. Abdoulaye Diack, program manager at Google Research, says, “We’re constantly experimenting with new ideas, tackling challenges, and asking ourselves, ‘How can we make this better?’” One significant limitation of the original dataset was its static nature: it offered only a snapshot of building locations, without capturing how these areas changed over time.

The challenge stemmed from the fact that commercial satellite providers often focus on high-demand areas, leaving large portions of the Global South—about 40% of the world’s surface—without regular, high-resolution imagery. Open-source imagery, such as that provided by the European Space Agency’s Sentinel-2 satellite, offered global coverage but at a lower resolution than was typically needed for detailed building detection.
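To make the resolution gap concrete, the short Python sketch below compares how many pixels a small building occupies at each resolution. It assumes a 10 m ground sampling distance for Sentinel-2's visible bands, a hypothetical 0.5 m figure for commercial high-resolution imagery, and a hypothetical 8 m by 8 m house. `pixels_covering` is an illustrative helper, not part of any dataset tooling.

```python
# Assumed ground sampling distances (metres per pixel). Sentinel-2's visible
# bands are 10 m; the 0.5 m value is a hypothetical stand-in for commercial imagery.
SENTINEL2_GSD_M = 10.0
HIGH_RES_GSD_M = 0.5


def pixels_covering(width_m: float, depth_m: float, gsd_m: float) -> float:
    """Return how many gsd_m x gsd_m pixels a rectangular footprint spans."""
    return (width_m * depth_m) / (gsd_m * gsd_m)


# Hypothetical small house: 8 m x 8 m.
print(pixels_covering(8.0, 8.0, SENTINEL2_GSD_M))  # 0.64
print(pixels_covering(8.0, 8.0, HIGH_RES_GSD_M))   # 256.0
```

At 10 m, such a house fills less than one pixel, so its outline cannot be read directly from a single image. At 0.5 m, it spans hundreds of pixels.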

Despite the low resolution, the team was hopeful and decided to test the potential of this imagery. They first tried inputting a single low-resolution image into their AI model and asked it to identify buildings within the frame. “It was a tough task, but the model showed potential,” Abdoulaye recalls. “It performed well enough that we knew it was worth refining.”

Introducing the Open Buildings 2.5D Temporal Dataset

After extensive experimentation and refining the model, the team released the Open Buildings 2.5D Temporal Dataset in late 2023. This new version covers building data from 2016 to 2023, providing annual snapshots that reflect the presence of buildings, population growth, and urban development. The dataset also includes building heights, which offer valuable insights into how cities evolve over time due to factors like development, population growth, and natural disasters.

Users can easily explore the dataset by selecting regions and toggling between different years to see how specific areas have changed. “By 2050, an estimated 2.5 billion people could move to cities in the Global South—this dataset will help governments and organizations plan for that population growth,” says Olivia Graham, product manager at Google Research. 

The dataset proves especially valuable for pinpointing rapidly growing areas, enabling city planners and service providers to focus essential resources such as healthcare and education.

For example, following the 2018 earthquake in Indonesia, the dataset clearly showed how the built environment along the coastline receded due to the disaster’s impact. Because the data comes as annual snapshots, users can track the effects of natural events on urban development from year to year. This makes the dataset a powerful tool for disaster recovery and long-term planning.
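Comparing annual snapshots can be sketched as a simple per-pixel difference. This is a minimal illustration, not the dataset's own tooling. It assumes each year is a small grid of building-presence scores between 0 and 1. The 0.5 threshold, the `change_summary` helper, and the sample grids are all hypothetical.

```python
from typing import Dict, List

Grid = List[List[float]]

# Hypothetical cut-off: a pixel counts as "built" when its score reaches 0.5.
PRESENCE_THRESHOLD = 0.5


def to_mask(scores: Grid, threshold: float = PRESENCE_THRESHOLD) -> List[List[bool]]:
    """Turn per-pixel presence scores into a True/False building mask."""
    return [[s >= threshold for s in row] for row in scores]


def change_summary(before: Grid, after: Grid) -> Dict[str, int]:
    """Count pixels that gained or lost buildings between two annual snapshots."""
    gained = lost = 0
    for row_b, row_a in zip(to_mask(before), to_mask(after)):
        for built_before, built_after in zip(row_b, row_a):
            if built_after and not built_before:
                gained += 1
            elif built_before and not built_after:
                lost += 1
    return {"gained": gained, "lost": lost}


# Hypothetical 2 x 2 snapshots for two years.
snap_2016 = [[0.1, 0.7], [0.2, 0.9]]
snap_2023 = [[0.8, 0.6], [0.7, 0.3]]
print(change_summary(snap_2016, snap_2023))  # {'gained': 2, 'lost': 1}
```

Pixels counted as `lost` correspond to the kind of receding built environment described above. Pixels counted as `gained` flag new construction.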

AI Model Enhancements: Super-Resolving Low-Resolution Images

So, how did the team overcome the challenge of working with low-resolution satellite imagery to accurately detect buildings? They turned to an advanced AI technique called a “teacher-student model.” Krishna Sapkota, a software engineer at Google Research, explains, “The teacher model is trained using high-resolution images to detect buildings and generate labels. The student model then learns from the teacher’s output and applies that knowledge to infer higher-resolution details from the low-res Sentinel-2 images.”
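The teacher-student idea can be illustrated with a toy distillation loop in Python. This is a deliberately simplified sketch, not Google's model. The "teacher" is a hand-written function that sees every roof pixel of a hypothetical high-resolution tile. The "student" is a one-feature logistic regression that sees only a single noisy low-resolution value per tile. All names (`teacher`, `to_low_res`, `train_student`) and numbers are assumptions. The student described in the article produces a higher-resolution building map, not one probability per tile.

```python
import math
import random

random.seed(0)  # reproducible toy data


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def make_tile(roof_fraction: float, n_pixels: int = 64) -> list:
    """Hypothetical high-res tile: 1 = roof pixel, 0 = background."""
    k = round(roof_fraction * n_pixels)
    return [1] * k + [0] * (n_pixels - k)


def teacher(high_res_tile: list) -> float:
    """Stand-in for a teacher trained on high-res imagery: it sees every roof
    pixel and returns a soft building probability for the tile."""
    roof_fraction = sum(high_res_tile) / len(high_res_tile)
    return sigmoid(10.0 * (roof_fraction - 0.3))


def to_low_res(high_res_tile: list) -> float:
    """Simulate a coarse sensor: the whole tile collapses to one noisy value."""
    return sum(high_res_tile) / len(high_res_tile) + random.gauss(0.0, 0.05)


def train_student(pairs, epochs: int = 200, lr: float = 0.5):
    """Fit a logistic-regression student on low-res inputs so that its output
    matches the teacher's soft labels (cross-entropy distillation)."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, soft_label in pairs:
            p = sigmoid(w * x + b)
            grad = p - soft_label  # dLoss/dz for cross-entropy with soft targets
            w -= lr * grad * x
            b -= lr * grad
    return w, b


# Training needs co-located pairs; inference needs only the low-res value.
tiles = [make_tile(random.random()) for _ in range(500)]
pairs = [(to_low_res(t), teacher(t)) for t in tiles]
w, b = train_student(pairs)


def student(low_res_value: float) -> float:
    return sigmoid(w * low_res_value + b)


print(round(student(0.05), 3), round(student(0.9), 3))
```

Training uses paired high- and low-resolution views of the same tiles. Once trained, however, `student` needs only the low-resolution input, which is what lets this kind of approach extend to areas covered only by Sentinel-2.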

To further improve the dataset’s accuracy, the model uses up to 32 different Sentinel-2 frames of the same location. Each frame is slightly offset in time, allowing the model to construct a clearer image of the area—similar to how modern smartphones capture sharper photos by combining multiple images.
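Some of the benefit of stacking frames shows up even without a learned model. The sketch below fuses 32 simulated acquisitions with a per-pixel median. It assumes a hypothetical one-dimensional strip of reflectance values, simulated sensor noise, and occasional saturated "cloud" pixels. This is plain temporal compositing, a swapped-in stand-in for illustration only. Recovering sub-pixel detail from the frames is the part the team's AI model adds on top.

```python
import random
import statistics

random.seed(42)  # reproducible simulation


def capture_frame(scene, noise_sd=0.2, cloud_prob=0.1):
    """Simulate one low-res acquisition: sensor noise plus occasional clouds."""
    frame = []
    for value in scene:
        if random.random() < cloud_prob:
            frame.append(1.0)  # a cloud saturates the pixel
        else:
            frame.append(value + random.gauss(0.0, noise_sd))
    return frame


def fuse_frames(frames):
    """Per-pixel median across frames, robust to noise and transient clouds."""
    return [statistics.median(pixel_stack) for pixel_stack in zip(*frames)]


def mean_abs_error(estimate, truth):
    return sum(abs(e - t) for e, t in zip(estimate, truth)) / len(truth)


scene = [0.1, 0.1, 0.8, 0.8, 0.1, 0.6, 0.6, 0.1]  # hypothetical reflectance strip
frames = [capture_frame(scene) for _ in range(32)]  # up to 32 frames, as above
fused = fuse_frames(frames)
print(mean_abs_error(frames[0], scene), mean_abs_error(fused, scene))
```

The fused strip sits much closer to the true scene than a typical single frame, because random noise and short-lived clouds do not repeat across acquisitions.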

While the original dataset focused on precise building polygons, the new dataset uses raster data to represent buildings as pixel-based masks. Additionally, the model now predicts building heights with remarkable precision, achieving a mean absolute error of just 1.5 meters, which is less than the height of a single story.
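The difference between polygons and raster masks, along with the height metric quoted above, can be sketched in a few lines of Python. The sketch marks a pixel as "building" when the pixel's centre falls inside the footprint polygon, using an even-odd ray-casting test. The footprint, grid size, and height values are all hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, poly: List[Point]) -> bool:
    """Even-odd ray casting: count edge crossings of a ray cast towards +x."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the ray's y coordinate
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterize(poly: List[Point], width: int, height: int, cell: float = 1.0) -> List[List[int]]:
    """Pixel-based mask: 1 where the pixel centre lies inside the footprint."""
    return [
        [int(point_in_polygon((c + 0.5) * cell, (r + 0.5) * cell, poly)) for c in range(width)]
        for r in range(height)
    ]


def mean_absolute_error(pred: List[float], truth: List[float]) -> float:
    """Average of |prediction - truth|, the metric quoted for building heights."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


footprint = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]  # hypothetical 2 x 2 roof
for row in rasterize(footprint, 4, 4):
    print(row)
print(mean_absolute_error([6.0, 9.5, 3.0], [7.0, 9.0, 4.5]))  # 1.0
```

The printed mask shows the 2 × 2 roof as four filled pixels. A real raster product may store fractional coverage per pixel rather than a hard 0/1 value.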

Real-World Impact and Future Enhancements

The Open Buildings 2.5D Temporal Dataset has already been put to work by trusted partners. Sunbird AI, a nonprofit based in Uganda, is leveraging the dataset to pinpoint regions that could benefit from solar panels and microgrids. 

This initiative is aimed at improving electricity access for the 74% of Uganda’s population that currently lacks reliable power sources. By using this data, the organization can better target areas for sustainable energy solutions.

According to Olivia, the dataset also allows local governments to better understand urban growth patterns and adjust infrastructure plans to accommodate these changes.

The same drive that led to the creation of the temporal dataset continues to propel the team’s work. “Living in Ghana, I can see firsthand how our work is making a difference,” says Abdoulaye. “Many regions lack resources, which in turn leads to data gaps with far-reaching consequences. Being part of a team that is helping to fill those gaps is a privilege, and it’s incredibly rewarding to see the positive impact our work is having.”

The team is focused on further enhancing the dataset and discovering new methods to broaden its coverage and functionality. Their goal is to strengthen its importance as a key resource for tracking urban development and promoting sustainable progress in the Global South.
