Fossil Leaf Lens

A machine learning tool to help paleobotanists identify leaf fossils

Ivan Felipe Rodriguez🌿 1, Thomas Fel🌿 2, Gaurav Gaonkar 1, Mohit Vaishnav 1,
Herbert Meyer 4, Peter Wilf 3 & Thomas Serre🍂1
🌿 Joint first authors  |  🍂 Corresponding author
1 Brown University, 2 Kempner Institute, Harvard University,
3 Pennsylvania State University, 4 Florissant Fossil Beds, National Park Service
Leaf Lens Navigation Guide

Welcome to Fossil Leaf Lens

We are excited to share the fruits of years of research and innovation aimed at solving one of paleobotany's most challenging puzzles: identifying fossil angiosperm leaves. These organs are often abundant yet notoriously difficult to classify, especially in the absence of organic attachments or cuticles, due to their complexity, variation, and the often limited quality and quantity of available images.

Through the power of AI and computer vision, we have developed a deep learning model that synthesizes photorealistic fossil images from extant cleared and x-rayed leaves, increasing the sample size of "fossil" image collections for training. As explained in our accompanying manuscript (coming soon), this approach allows machine identifications of fossil and extant leaves at the family level, the starting point for most investigations, with levels of accuracy sufficient to provide useful suggestions for experts.

Initially, to limit the immense variation in leaf preservation among fossil sites, we present the tool for leaf fossils from a single, extraordinarily well-studied and photo-documented site: Florissant Fossil Beds, late Eocene of Colorado. The images were gathered over many years by Dr. Herbert Meyer (retired, National Park Service) and assistants from museums around the world, as explained by Meyer et al. 2008 (GSA Special Papers 435) and Wilf et al. 2021 (PhytoKeys), who made a vetted subset of Florissant fossils available as part of a large image collection of living and fossil leaves.

The accompanying manuscript explores machine identifications of vetted Florissant fossils from the Wilf et al. 2021 dataset. On this website, we show the broader potential of the method by sharing the results of our model for hundreds of hard-to-identify fossil leaves from Florissant that were not included in the 2021 vetted subset, including both unidentified specimens and those attributed previously to botanical names that are now uncertain. The model's training images include the vetted Florissant images and all the cleared and x-rayed leaf images described in Wilf et al. 2021. We hope that this tool will stimulate new research into the world-famous Florissant flora, as we work to generalize the algorithms to apply to other fossil sites.

We are eager to hear from the expert community. Your feedback will help us gauge how many of these classifications are plausible and where further exploration is needed. We look forward to your input in advancing this exciting field!

Website Features

1. Predicted Fossil Identification

You can explore the predicted fossil identifications by clicking on the "Predicted Fossil Identification" link in the navigation bar. This will open a list of fossil specimens. Clicking on a specimen opens a detailed page with a predicted fossil information card. The card lists the dataset catalog number, primary catalog number, model predictions, similar specimens, and concepts.

Fossil Identification Card

The identification card gives details about the fossil specimen, including its repository number. Additional metadata for each specimen, including prior identifications, can be looked up from its filename (CU- or FLFO- prefix) in the metadata tables kindly provided by Dr. Meyer (see Wilf et al. 2021 for more information about these two image sets).
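Because the filename prefix determines which image set, and therefore which metadata table, a specimen belongs to, the lookup can be sketched in a few lines. This is only an illustration: the function name and the example filenames are hypothetical, and the real filenames may follow a different pattern after the prefix.

```python
from typing import Optional

# The two prefixes named in the text. Everything else here is an assumption.
KNOWN_PREFIXES = ("CU-", "FLFO-")


def image_set_for(filename: str) -> Optional[str]:
    """Return "CU" or "FLFO" if the filename starts with that prefix, else None."""
    for prefix in KNOWN_PREFIXES:
        if filename.startswith(prefix):
            return prefix.rstrip("-")
    return None


# Hypothetical filenames, for illustration only.
for name in ("CU-0001.jpg", "FLFO-0001.jpg", "unlabeled.jpg"):
    print(name, "->", image_set_for(name))
```

A filename that matches neither prefix returns None, which signals that neither metadata table applies.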

Similar Specimens

Below this you will find the images from our training dataset that are most similar to the provided specimen, with informative filenames as detailed by Wilf et al. 2021.

Concepts

Finally, you will find the concepts that the model used in the classification process. In this context, concepts are visual or structural patterns in parts of the image that the model finds useful for family identification across the dataset. These often, though not always obviously, correspond to diagnostic leaf architecture traits used in traditional taxonomy, such as leaf margins, venation, and symmetry. The concepts are a rich source of potential taxonomically informative characters (see Spagnuolo et al. 2022, Intl. J. Plant Sci.). You can also click on a concept to explore more details about it and the other families where it occurs.

2. Feedback Table

The table displays a list of unidentified fossils. Each row contains:

  • Image filename (clickable hyperlink; some images are closeups of others)
  • Fossil image of the given specimen
  • Top five predictions (clickable hyperlink to concept page for the indicated plant family)
  • Feedback options: Use the color-coded buttons to record one of the four responses described under The Feedback Procedure below. Please simply skip over poorly preserved or inapplicable specimens (see Disclaimers below for details).
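For readers curious about what a "top five" list means in practice, the sketch below ranks families by a per-family score and keeps the five highest. It is a generic illustration, not the site's actual code: the function name is hypothetical, and the scores are made-up numbers attached to real family names purely as an example.

```python
import heapq


def top_five(scores: dict, k: int = 5) -> list:
    """Return the k (family, score) pairs with the highest scores, best first."""
    return heapq.nlargest(k, scores.items(), key=lambda item: item[1])


# Made-up scores for illustration; not output from the actual model.
example_scores = {
    "Fagaceae": 0.31,
    "Ulmaceae": 0.22,
    "Sapindaceae": 0.17,
    "Rosaceae": 0.12,
    "Salicaceae": 0.09,
    "Anacardiaceae": 0.06,
    "Juglandaceae": 0.03,
}

for family, score in top_five(example_scores):
    print(f"{family}: {score:.2f}")
```

A ranking of this kind can only ever return families that have a score, which is why a family absent from the training dataset cannot appear among the predictions.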

The Feedback Procedure

You can respond to each fossil prediction as follows:

1. For each row, you can mark any of the following responses:

  • In your best judgement, one or more of the proposed families could actually be the family of the specimen.
  • No way! None of the predictions makes sense for this specimen.
  • You do not recognize the features of all of the top-five families offered by the system, and further study is needed.
  • The specimen does not belong in the dataset (e.g., non-dicot leaf, too degraded, or not a leaf fossil).

2. Response Tracking

  • Use the 📥 Download Responses button (bottom right) to save your choices as a JSON file.
  • Important: Download before closing the website to avoid losing responses.

3. Resuming Your Work

  • You can resume work using serial numbers during your next visit.

4. Sending your feedback: Your feedback on any portion of the dataset is greatly appreciated. Feel free to send the downloaded JSON file to ivan_felipe_rodriguez@brown.edu
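Before sending a file, it can be handy to check how many responses of each kind it contains. The sketch below assumes, purely for illustration, that the downloaded JSON maps each image filename to one response code. The actual layout of the file, the four code names used here, and the example filenames are all hypothetical and may differ from what the site produces.

```python
import json
from collections import Counter

# Hypothetical response codes standing in for the four feedback options.
VALID_CODES = {"plausible", "implausible", "unsure", "not_applicable"}


def summarize(json_text: str) -> Counter:
    """Count responses per code, assuming a {filename: code} JSON object."""
    responses = json.loads(json_text)
    unexpected = {name: code for name, code in responses.items()
                  if code not in VALID_CODES}
    if unexpected:
        raise ValueError(f"unrecognized response codes: {unexpected}")
    return Counter(responses.values())


# Hypothetical file contents, for illustration only.
example = json.dumps({
    "FLFO-0001.jpg": "plausible",
    "FLFO-0002.jpg": "implausible",
    "CU-0001.jpg": "plausible",
    "CU-0002.jpg": "not_applicable",
})

print(summarize(example))
```

If the real file uses a different structure, only the parsing step in `summarize` would need to change.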

Disclaimers

Please note: While our dataset is extensive, many fossil samples are badly preserved and may lack the detail needed for accurate classification. In addition, although the images were manually filtered several years ago to remove most that are inappropriate, there remain many images of monocots and non-angiosperms (which are severely undersampled in the training dataset), reproductive organs (likewise), and non-plant fossils (feathers, fish, and so on). We recommend simply skipping these poorly preserved or inapplicable specimens to ensure more reliable results.

Finally, please be aware that the model can only predict families that are in its training dataset, listed here.

We invite you to explore this innovative blend of paleobotany and artificial intelligence, and to join us in refining the art and science of fossil leaf identification!