Leaf Lens
Overview¶
Leaf Lens is one of three companion platforms for the study Advancing Paleobotany with AI-guided Expert Fossil Leaf Identification. This site focuses on explainability: how the network organizes concepts and families when classifying leaves, using cleared and x-rayed imagery so overlays stay easy to read.
Companion apps¶
- Hugging Face Fossil App: run the same family-level model on your own fossil leaf images in the browser. Fossil training is currently dominated by Florissant; images from other sites are run through the model, but predictions may be more variable.
- Fossil Leaf Lens: browse Predicted Fossil Identifications and per-specimen pages for Florissant fossils (catalog numbers, similar training images, and model concepts for each specimen).
Our deep learning framework mitigates data scarcity by augmenting sparse fossil data with synthetic examples and by aligning extant and fossil leaf domains through representational learning. In the main article, we apply this approach to the late Eocene Florissant flora of Colorado and report well over 90% accuracy for family-level classification across 142 dicot families, compared with a chance level of 3.5%.
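The domain-alignment idea can be illustrated with a toy mean-matching penalty between batches of extant and fossil features. This is a minimal NumPy sketch of the general concept, not the paper's actual loss or architecture; all names and data here are made up:

```python
import numpy as np

def mean_alignment_penalty(extant_feats: np.ndarray, fossil_feats: np.ndarray) -> float:
    """Squared distance between the two domains' mean feature vectors.

    A toy stand-in for representational alignment: minimizing a term
    like this during training pulls the extant and fossil embedding
    distributions toward each other.
    """
    mu_extant = extant_feats.mean(axis=0)
    mu_fossil = fossil_feats.mean(axis=0)
    return float(np.sum((mu_extant - mu_fossil) ** 2))

rng = np.random.default_rng(0)
extant = rng.normal(loc=0.0, size=(64, 128))  # pretend extant-leaf embeddings
fossil = rng.normal(loc=0.5, size=(16, 128))  # pretend fossil embeddings, shifted domain
penalty = mean_alignment_penalty(extant, fossil)
```

A nonzero penalty indicates the two domains occupy different regions of feature space; identical batches incur zero penalty.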
For dataset details, model training, fossil analyses, and the rest of the scientific story, see the paper—an arXiv preprint is coming soon (we will add the link here). The dataset is cited in the Citation section below.
Training data and Leaf Lens¶
Our models are trained on the Extant and Fossil Leaves dataset introduced by Wilf et al. (2021, PhytoKeys), a curated collection of cleared, x-rayed, and fossil leaf images spanning more than 150 angiosperm families. Leaf Lens, however, uses cleared and x-rayed images only. Fossil leaves differ sharply in contrast, breakage, matrix, and preservation, which would add variation unrelated to the taxonomic signal we aim to interpret.
Project goals¶
Our primary objective is to use explainable AI to characterize the concepts that matter most when neural networks classify leaves. By tracing how these networks encode and organize visual information, we aim to:
- Reveal how the model makes decisions and which features it relies on for classification.
- Clarify how biological taxonomy relates to structure in the learned representation space.
- Provide visual, interactive tools for exploring how concepts and families are organized in those representations.
By leveraging explainability methods, we surface internal visual “concepts” that highlight diagnostic patterns often hard for human observers to see.
Explore the site¶
Use the navigation (sidebar) to open any family or concept, or start with the maps below.
Interactive maps (UMAP)¶
| Map | Description |
| --- | --- |
| Families | 142 families in 2D feature space; hover points for details. |
| Concepts | 2000+ learned concepts; hover to see how patterns cluster. |
Family UMAP
Concept UMAP
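The maps above come from projecting high-dimensional family and concept embeddings down to 2D. As a rough illustration of the idea, here is a minimal NumPy sketch using PCA as a linear stand-in (the site's actual maps use UMAP, e.g. umap-learn's `UMAP(n_components=2).fit_transform`, which better preserves local neighborhoods; the data below is synthetic):

```python
import numpy as np

def project_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project (n, d) embeddings to (n, 2) via PCA.

    Illustrative only: a linear projection, unlike the nonlinear
    UMAP embedding used for the interactive maps.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Top-2 right singular vectors are the principal axes
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(1)
concept_embeddings = rng.normal(size=(2000, 512))  # e.g. one row per learned concept
coords = project_2d(concept_embeddings)
```

Each row of `coords` becomes one hoverable point on the map.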
Family (class) pages¶
Per family: representative samples, concept visualizations (what matters for that family), and activation heatmaps.
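An activation heatmap of the kind shown on family pages can be sketched by averaging a convolutional feature map over channels and normalizing to [0, 1]. This is a simplified NumPy illustration of the general technique, not the site's exact method:

```python
import numpy as np

def activation_heatmap(feature_map: np.ndarray) -> np.ndarray:
    """Collapse a (channels, h, w) feature map into an (h, w) heatmap in [0, 1]."""
    heat = feature_map.mean(axis=0)   # average response across channels
    heat = np.maximum(heat, 0.0)      # keep positive evidence only
    denom = heat.max()
    return heat / denom if denom > 0 else heat

rng = np.random.default_rng(2)
fmap = rng.normal(size=(256, 14, 14))  # pretend late-layer activations for one image
heat = activation_heatmap(fmap)
```

In practice the heatmap is upsampled to the input resolution and overlaid on the leaf image.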
Concept pages¶
Per concept: feature visualizations, top 10 images with strongest activation, and notes on each concept’s role in classification.
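Selecting the top 10 most strongly activating images for a concept reduces to sorting per-image activation scores. A minimal sketch with made-up data:

```python
import numpy as np

def top_k_images(activations: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k images that activate a concept most strongly."""
    return np.argsort(activations)[::-1][:k]

rng = np.random.default_rng(3)
concept_activations = rng.random(500)  # one activation score per image
top = top_k_images(concept_activations, k=10)
```

The returned indices are then used to fetch and display the corresponding images on each concept page.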
Suggested flow¶
- Pan and hover the maps above.
- Open a family in the nav for specimen- and heatmap-level detail.
- Open a concept for the pattern-level view.
Broader implications¶
Using concept-based interpretability, we surface botanically meaningful cues by visually summarizing subtle morphological features that define families across fossil and extant specimens—suggesting new diagnostic characters.
The study’s fossil deployment—over 1,100 previously unidentified Florissant specimens, with experts flagging many predictions as intriguing or plausible for follow-up—is summarized and explored on Fossil Leaf Lens. Leaf Lens (here) is the companion site for concepts and families on cleared/x-rayed imagery.
We invite you to explore the results, interact with the visualizations, and engage with this work on concept learning.
Citation¶
If you use Leaf Lens in your research, please cite:
@article{rodriguez2025fossils,
title = {Advancing Paleobotany with AI-guided Expert Fossil Leaf Identification},
author = {Rodriguez, Ivan Felipe and Fel, Thomas and Gaonkar, Gaurav and
Vaishnav, Mohit and Meyer, Herbert and Wilf, Peter and Serre, Thomas},
year = {2025},
note = {arXiv preprint: coming soon}
}
This work also uses the following dataset:
@article{wilf2021leaves,
title = {An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning},
author = {Wilf, Peter and Wing, Scott L. and Meyer, Herbert W. and Rose, Jacob A. and Saha, Rohit and Serre, Thomas and Cúneo, N. Rubén and Donovan, Michael P. and Erwin, Diane M. and Gandolfo, Maria A. and Gonzalez-Akre, Erika and Herrera, Fabiany and Hu, Shusheng and Iglesias, Ari and Johnson, Kirk R. and Karim, Talia S. and Zou, Xiaoyu},
journal = {PhytoKeys},
volume = {187},
pages = {93--128},
year = {2021},
doi = {10.3897/phytokeys.187.72350}
}
Brown University
Kempner Institute, Harvard University
Pennsylvania State University
Florissant Fossil Beds, National Park Service