Abstract
The introduction and widespread use of foundation models have made identifying geographic bias in AI-generated content increasingly urgent. In response, we operationalize geographic diversity as a countermeasure. We refine the notion of geographic diversity as the quality of including data from various places and maintaining a balance across these places in both learning and generation processes. Drawing from information theory, ecology, and prior work in AI evaluation, we provide an entropy-based definition of geographic diversity and propose measuring it as the effective number of places. We apply this measurement to content generated by six large language models: GPT-3.5, GPT-4o, Mistral 7B, Mistral Large, Claude 3 Haiku, and Claude 3.5 Sonnet. Our case study reveals that prompt variations, such as modifying concept mentions or scale mentions in a user prompt, can increase the geographic diversity of the generated content. In addition, we observe that less advanced models can generate more geographically diverse content than state-of-the-art ones. Furthermore, certain places dominate the generated content of these models, yet their prominence does not reflect that of their real-world counterparts. Our work stresses the importance of quantifying geographic information in AI-generated content to support GeoAI and broader AI evaluation in the age of foundation models.
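As a rough illustration of the entropy-based measure described in the abstract, the sketch below computes an effective number of places from a list of place mentions. It assumes the common ecological formulation, the exponential of Shannon entropy (the Hill number of order 1); the abstract does not state the exact formula, so this choice, the function name `effective_number_of_places`, and the sample place lists are all assumptions made for illustration.

```python
import math
from collections import Counter


def effective_number_of_places(places):
    """Return exp(Shannon entropy) of the place-mention distribution.

    Assumption: the Hill number of order 1 from ecology; the paper's
    exact definition may differ.
    """
    counts = Counter(places)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no mentions, so no places are represented
    # Shannon entropy in nats over the relative frequencies of places.
    entropy = -sum(
        (c / total) * math.log(c / total) for c in counts.values()
    )
    return math.exp(entropy)


# Hypothetical place mentions extracted from generated text.
balanced = ["Paris", "Tokyo", "Lima", "Cairo"]
skewed = ["Paris", "Paris", "Paris", "Tokyo"]

print(effective_number_of_places(balanced))
print(effective_number_of_places(skewed))
```

Under this formulation, four equally frequent places yield an effective number of four, while a distribution dominated by one place yields a value closer to one. The measure therefore captures both aspects of the definition above: how many places are included and how balanced their representation is.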
| Field | Value |
|---|---|
| Original language | English |
| Article number | e70057 |
| Journal | Transactions in GIS |
| Volume | 29 |
| Issue number | 3 |
| DOIs | |
| Publication status | Published - May 2025 |
Austrian Fields of Science 2012
- 507003 Geoinformatics
Keywords
- GeoAI
- geographic diversity
- generative AI
- geographic bias
- data diversity
- information theory