
Leaf Lens

Ivan Felipe Rodriguez1🌿, Thomas Fel1,2🌿, Gaurav Gaonkar1, Mohit Vaishnav1, Herbert Meyer3,
Peter Wilf4 & Thomas Serre1🍂

1 Center for Computational Brain Science, Brown University
2 Kempner Institute, Harvard University
3 Florissant Fossil Beds National Monument, NPS
4 Dept. of Geosciences, Pennsylvania State University

🌿 Joint first authors  |  🍂 Corresponding author


Overview

Leaf Lens is the companion platform to our study "Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation". This website provides an interactive exploration of how deep neural networks learn to classify fossil angiosperm leaves—one of paleobotany's most persistent challenges.

Our deep learning framework overcomes data scarcity by augmenting sparse fossil data with synthetic examples and aligning extant and fossil leaf domains through representational learning. We demonstrate this approach on the late Eocene Florissant flora of Colorado, achieving well over 90% top-5 accuracy for family-level classification across 142 dicot families—compared with a top-5 chance level of just 3.5%.
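The 3.5% baseline follows directly from the number of families: a uniform random ranking places the true family among the top 5 guesses with probability 5/142. A quick sanity check in plain Python (not from the paper's code):

```python
# Top-5 chance level for a uniform guess over 142 dicot families.
n_families = 142
k = 5
chance_top5 = k / n_families
print(f"top-5 chance: {chance_top5:.1%}")  # → top-5 chance: 3.5%
```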


Project goals

Our primary objective is to leverage Explainable AI techniques to understand the concepts that matter most for neural networks when classifying leaves. By revealing these concepts, we aim to provide:

  • Insights into the model's decision-making process, identifying the key features used for classification.
  • A deeper understanding of the relationships between biological taxonomy and computational representations.
  • Visual and interactive tools for exploring how concepts and families are structured within the learned representations.

Our system addresses a fundamental challenge: the extreme scarcity of taxonomically vetted fossil specimens. While modern leaf specimens are abundant, fossilization processes—compression, mineralization, fragmentation—create a challenging domain shift between living and fossil forms. By leveraging explainability techniques, we identify internal visual "concepts" that reveal diagnostic patterns difficult for human observers to discern.

Key highlights

  • Number of families: 142 dicot angiosperm families
  • Total dataset: Over 34,000 images (extant and fossil leaves) from our open-access leaf image dataset
  • Florissant fossils: 3,200 taxonomically vetted specimens spanning 23 families
  • Classification performance: Well over 90% top-5 accuracy (chance: 3.5%)
  • Discovered concepts: 2,000+ unique visual concepts extracted via sparse dictionary learning
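The concepts above are extracted with sparse dictionary learning over network activations. The paper's actual layer, component count, and hyperparameters are not specified here; the following is a minimal sketch using scikit-learn's `DictionaryLearning` with illustrative stand-in values:

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning

# Stand-in for deep-network activations: rows are image patches/tokens,
# columns are feature dimensions (real activations come from the trained model).
rng = np.random.default_rng(0)
activations = rng.standard_normal((200, 64))

# Each learned dictionary atom is a candidate visual "concept";
# n_components and alpha are illustrative, not the paper's settings.
dico = DictionaryLearning(n_components=16, alpha=1.0, max_iter=50, random_state=0)
codes = dico.fit_transform(activations)  # sparse concept coefficients per sample
atoms = dico.components_                 # (16, 64) concept directions

print(codes.shape, atoms.shape)
```

The sparse codes say how strongly each concept fires on each input, which is the quantity the concept pages visualize.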

Explore the website

Interactive UMAP visualizations

  • 2,000+ concepts: Explore how the network organizes learned concepts in a 2D UMAP projection. Each point represents a distinct concept, clustered based on similarity. Hover over clusters for details.
  • 142 families: See how the leaf families relate to one another in the feature space through an interactive UMAP plot. Gain insights into family-level similarities and separations.

Concepts visualization

Family visualization

Family-specific pages

For each of the 142 dicot families, a dedicated page includes:

  • Representative samples from the dataset.
  • Concept visualizations that highlight the features most critical for classifying leaves in this family.
  • Activation heatmaps showing how the neural network processes these leaves.
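Class-activation-style heatmaps of this kind can be sketched in a few lines of NumPy: weight the final convolutional feature maps by a family's classifier weights, then upsample to image resolution. All shapes and values below are toy stand-ins, not the paper's model:

```python
import numpy as np

# Toy stand-ins: a 7x7x512 feature map from a last conv layer and the
# classifier weights for one family (real values come from the trained model).
rng = np.random.default_rng(0)
feature_maps = rng.random((7, 7, 512))
class_weights = rng.random(512)

# CAM-style heatmap: weighted sum over channels, then normalize to [0, 1].
heatmap = feature_maps @ class_weights  # (7, 7)
heatmap = (heatmap - heatmap.min()) / (heatmap.max() - heatmap.min())

# Nearest-neighbour upsample to overlay on a 224x224 leaf image.
upsampled = np.kron(heatmap, np.ones((32, 32)))  # (224, 224)
print(upsampled.shape)
```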

Concept-specific pages

Each of the 2,000+ discovered concepts has its own page, detailing:

  • Feature visualizations representing the concept.
  • The top 10 leaf images that activate the concept most strongly.
  • Insights into the concept's role in classifying specific leaf families.
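Selecting the "top 10 most strongly activating" images for a concept is a top-k lookup over an images-by-concepts score matrix. A minimal NumPy sketch, with hypothetical sizes and the helper name `top_k_images` chosen for illustration:

```python
import numpy as np

# Rows: images; columns: concepts. Real values would be the sparse concept
# coefficients produced by the dictionary-learning step.
rng = np.random.default_rng(0)
concept_scores = rng.random((5000, 200))

def top_k_images(concept_id: int, k: int = 10) -> np.ndarray:
    """Indices of the k images that activate this concept most strongly."""
    scores = concept_scores[:, concept_id]
    top = np.argpartition(scores, -k)[-k:]     # unordered top-k (O(n))
    return top[np.argsort(scores[top])[::-1]]  # sorted, strongest first

ids = top_k_images(42)
print(ids.shape)
```

`argpartition` avoids sorting all scores, which matters when scanning thousands of concepts over the full dataset.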

How to navigate

  • Begin with the UMAP visualizations to explore the relationships between concepts and families.
  • Dive deeper into the family pages to learn about specific leaf families and the features the model uses to classify them.
  • Explore the concept pages for an in-depth look at the learned concepts and their biological or computational significance.

Unlocking paleobotanical "dark data"

Isolated leaves dominate the angiosperm fossil record yet remain notoriously difficult to identify accurately, representing paleobotany's largely untapped source of "dark data." Historical literature is riddled with botanically incorrect identifications due to the inherent complexity of leaf morphology and insufficient vetted reference samples. Most well-identified fossil leaves represent only a handful of morphologically distinctive families, leaving the vast majority of angiosperm diversity in the fossil record unrecognized.

By overcoming data scarcity through generative AI and representational learning, this work offers a pathway to unlocking the vast collections of unidentified specimens in museum drawers worldwide—providing essential data for interpreting evolutionary radiations, extinctions, biome evolution, plant-animal interactions, biogeography, and biotic responses to climate change.


Broader implications

This research advances one of paleobotany's central challenges—accurate identification of fossil angiosperm leaves—and demonstrates how state-of-the-art AI can be applied to scientific domains with limited training data. Using concept-based interpretability methods, our system surfaces botanically meaningful cues by visually summarizing subtle morphological features that define families across fossil and extant specimens, suggesting new diagnostic characters.

Beyond the Florissant Formation, this cross-domain strategy is readily generalizable to other fossil deposits, positioning this approach for broad use in understanding the evolution and ecological dynamics of ancient terrestrial ecosystems. We have already applied our system to over 1,700 previously unidentified Florissant fossils, with expert paleobotanists finding a high proportion of the predictions to be intriguing or plausible candidates for detailed follow-up studies.

We invite you to explore the findings, interact with the visualizations, and engage with this collaborative exploration into concept learning.


Citations

If you make use of Leaf Lens in your research, please cite:

Main paper:

Rodriguez, I.F., Fel, T., Gaonkar, G., Vaishnav, M., Meyer, H., Wilf, P., & Serre, T. (2025). Decoding Fossil Leaves with Artificial Intelligence: An application to the Florissant Formation.

@article{rodriguez2025fossils,
  title  = {Decoding Fossil Leaves with Artificial Intelligence: 
            An application to the Florissant Formation},
  author = {Rodriguez, Ivan Felipe and Fel, Thomas and Gaonkar, Gaurav and 
            Vaishnav, Mohit and Meyer, Herbert and Wilf, Peter and Serre, Thomas},
  year   = {2025}
}

Dataset:

Wilf, P., Wing, S.L., Meyer, H.W., Rose, J.A., Saha, R., Serre, T., Cúneo, N.R., Donovan, M.P., Erwin, D.M., Gandolfo, M.A., Gonzalez-Akre, E., Herrera, F., Hu, S., Iglesias, A., Johnson, K.R., Karim, T.S., & Zou, X. (2021). An image dataset of cleared, x-rayed, and fossil leaves vetted to plant family for human and machine learning. PhytoKeys, 187, 93–128. https://doi.org/10.3897/phytokeys.187.72350

@article{wilf2021leaves,
  title   = {An image dataset of cleared, x-rayed, and fossil leaves vetted 
             to plant family for human and machine learning},
  author  = {Wilf, Peter and Wing, Scott L. and Meyer, Herbert W. and 
             Rose, Jacob A. and Saha, Rohit and Serre, Thomas and 
             Cúneo, N. Rubén and Donovan, Michael P. and Erwin, Diane M. and 
             Gandolfo, Maria A. and Gonzalez-Akre, Erika and Herrera, Fabiany and 
             Hu, Shusheng and Iglesias, Ari and Johnson, Kirk R. and 
             Karim, Talia S. and Zou, Xiaoyu},
  journal = {PhytoKeys},
  volume  = {187},
  pages   = {93--128},
  year    = {2021},
  doi     = {10.3897/phytokeys.187.72350}
}

Funding and acknowledgments

National Science Foundation

This material is based upon work supported by the U.S. National Science Foundation under Award Nos. EAR-1925481 (T.S.) and EAR-1925755 (P.W.), and by the ANR-3IA Artificial and Natural Intelligence Toulouse Institute (ANR-19-PI3A-0004).

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Computing support was provided by the Center for Computation and Visualization (CCV) at Brown University (via NIH Office of the Director grant S10OD025181). We also acknowledge Google's Cloud TPU hardware resources via the TensorFlow Research Cloud (TFRC) program.