Professor Brian Farrell receives David Rockefeller's beetle collection — about 150,000 specimens strong — from Eileen Rockefeller in 2017. By Courtesy of Brian Farrell

Decades of Digitizing: The Quest to Reconstruct Seven Million Insects

After more than two decades of effort, the MCZ has labels for a mere five percent of its collection.
By Julian K. Li and John Lin

A new bug is joining the shelves of the Museum of Comparative Zoology’s 7-million-strong insect collection, one made not of antennae and exoskeleton, but of wires and lenses: “Lightning Bug,” a state-of-the-art imaging rig meant to expedite the MCZ’s decades-long effort to digitize its massive specimen collection.

Digitization at the MCZ began in the early 1990s, when Edward O. Wilson, the MCZ’s curator of entomology at the time, started typing labels into spreadsheets. Methods advanced in 1995, when then newly minted curator Brian D. Farrell kickstarted the accumulation of image data by scanning slide photos of the insect specimens. The process sped up after 1999, when Nikon released the D1, its first in-house digital SLR camera. Since then, as technology has continued to improve, the MCZ’s digitization effort has gained momentum.

Born of a multidisciplinary collaboration between Harvard, Argonne National Laboratory, and a robotics team from Yale, Lightning Bug (aptly named after the efficiency of its insect counterpart) comprises a stage of cameras surrounding a chamber where the specimen sits. The stage’s robotic machinery lets the cameras automatically image the specimen at different angles and heights. From there, imaging software developed by Argonne uses artificial intelligence to stitch these images into a 3D reconstruction of the specimen.

The imaging process includes taking photographs of an insect’s top, sides, and underbelly, according to Crystal A. Maier, the MCZ’s associate curator who has taken the lead on the Lightning Bug collaboration. “By capturing images all around the insect and reconstructing a 3D model of it, you then have a complete picture of the size and shape of that insect right in front of you on the computer.”

The value of Lightning Bug, however, lies not only in capturing the insect itself but in preserving its label, which provides a description of when, where and how a species was collected.

“[The labels] may be all that we know about the species because it may have only been collected once,” says Maier. “[Digitizing labels] is a really good way to help us understand how a species evolved and where it might be in the world.”

A graduate student prepares beetle specimens for MCZ's collection as part of a Harvard summer school course at the UASD university in the Dominican Republic. By Courtesy of Brian Farrell

The usual process of transcribing biological data from the labels to a database is long and tedious, she says. Because the label is pinned directly beneath a specimen, museum staff typically have to remove the label from each sample before imaging the insect and typing information from its label into the database.

Lightning Bug offers a streamlined solution. The new system, which arrived last month, can take images of the label at different angles without the need to remove it. Combining images using AI software will produce an easily readable image of the entire label. By reducing the label imaging time from minutes to seconds, Lightning Bug will speed up the flow of specimen data by about five times, according to Farrell.

For researchers such as Waring “Buck” Trible, who leads a lab that studies the evolution of ant morphology in different social castes, the influx and accessibility of imaging and label data could foreshadow new discoveries in his field.

Trible compares his study of phenotypic evolution and ant caste to “the way that alchemists were studying chemistry.”

“Bringing in diverse people who have different viewpoints and making the data as accessible to as many people as possible is really important because we don't really know what we're doing,” he says. “It's something where creativity and outside perspectives could be the thing that actually pulls this thing out of the dark ages.”

But even as the time needed to digitize each specimen has shrunk, digitizing the entire insect collection remains a distant feat. After more than two decades of effort, the MCZ has labels for a mere five percent of this collection, of which an even smaller portion has associated imaging data. If the MCZ team were to image insects at its current pace of ten minutes per specimen, it would take over 300 years before the final bug makes its way into the database, almost the same time it has taken to amass the collection itself. The roughly fivefold speedup that Farrell expects from Lightning Bug would shorten that timeline substantially.
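
For readers who want to check the arithmetic, the short Python sketch below reproduces an estimate of this kind. The collection size, the ten-minute pace, and the fivefold speedup come from the figures quoted in this article; the number of imaging hours per year is not reported here, so the 3,800-hour value (roughly two full-time imaging stations) is purely an assumption chosen for illustration, as is the function name.

```python
def years_to_digitize(specimens: int,
                      minutes_per_specimen: float,
                      imaging_hours_per_year: float) -> float:
    """Years needed to image every specimen at a constant pace."""
    total_hours = specimens * minutes_per_specimen / 60
    return total_hours / imaging_hours_per_year


SPECIMENS = 7_000_000          # collection size quoted in the article
MINUTES_PER_SPECIMEN = 10      # current pace quoted in the article
SPEEDUP = 5                    # Farrell's approximate estimate
IMAGING_HOURS_PER_YEAR = 3_800 # ASSUMPTION: not stated in the article

current_years = years_to_digitize(
    SPECIMENS, MINUTES_PER_SPECIMEN, IMAGING_HOURS_PER_YEAR)
faster_years = years_to_digitize(
    SPECIMENS, MINUTES_PER_SPECIMEN / SPEEDUP, IMAGING_HOURS_PER_YEAR)

print(f"Current pace: about {current_years:.0f} years")
print(f"With a fivefold speedup: about {faster_years:.0f} years")
```

Under that assumed workload the current pace comes out to about 307 years, and a fivefold speedup brings it to about 61. Changing the hours-per-year figure moves both numbers proportionally, which is why the estimate is best read as an order of magnitude and not a forecast.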

“My dream would be to have the entire process automated from start to finish,” says Maier, who envisions Lightning Bug using a robotic arm to move specimens in and out of the imaging area while a computer automatically reads labels into the database.

OEB graduate student Sangil Kim collects insects in the Dominican Republic using a black light trap in summer 2018. The specimens will be studied and deposited in MCZ's collections. By Courtesy of Brian Farrell

Even with this accelerated workflow, completion is not necessarily the goal. “You can’t think in terms of finishing,” Farrell says. “You have to think in terms of information per hour spent.” The choice of which of the millions of specimens to image is far from random. Instead, the MCZ prioritizes projects like the personal beetle collection of David Rockefeller ’36, or Trible’s fire ants.

For Farrell, at the end of the day, digitization is all about accessibility. “Information only gains value by being shared,” he says. And with the addition of Lightning Bug to the MCZ lab team, the hope is that sharing can only become easier.
