Forget everything you know about how scientists keep track of animals in a crowd. Seriously, wipe the slate. Because for years, the state of the art in multi-animal tracking has relied on a requirement so absurd it sounds like a riddle: you can only figure out who's who if everyone shows up to the same party at the same time. I have been thinking about how strange that is, and it turns out a team of researchers just erased the requirement entirely.
The "Who's Who" Problem (It's Harder Than You Think)
Here's the setup. You have a tank full of zebrafish, or a dish of fruit flies, or a cage of mice doing mouse things. You film them from above. Now comes the question that has haunted behavioral neuroscience for years: which fish is which?
This isn't a casual question. If you want to understand collective behavior - how schools form, how social hierarchies emerge, how one bold individual changes the mood of an entire group - you need to track individuals, not just blobs. And your average zebrafish doesn't come wearing a name tag.
The original idtracker.ai, published in 2019, was a genuine breakthrough. It used deep learning to identify individual animals in groups of up to 100, no markers required. But it had a catch: the software needed video segments where every single animal was simultaneously visible and separated from its neighbors. Think of it like a classroom attendance system that only works if all 30 kids stand still at the same time. In practice, this meant some videos took days to process. A few took nearly two weeks. Some couldn't be tracked at all.
Teaching a Computer to See Like a Bouncer
The new idtracker.ai, just published in eLife by Jordi Torrents, Tiago Costa, and Gonzalo de Polavieja, throws out the attendance-sheet approach entirely. Instead of treating tracking as a classification problem (label every image with an animal's name), they reframed it as a representation learning problem. The difference is subtle but revolutionary, like switching from memorizing faces to understanding what makes a face distinctive.
The technique is called contrastive learning, and it works like the world's most meticulous bouncer. The software watches the video and identifies "fragments" - stretches where a single animal moves uninterrupted between crossings with other animals. Images from the same fragment are treated as positive pairs (same individual), while images from different animals at the same time are negative pairs (different individuals). A neural network (ResNet18, if you're keeping score) then learns to map every animal image into a representation space where each individual naturally clusters together.
No need for everyone to show up at once. No group photo required. The system just needs animals to occasionally cross paths - which, if you've ever watched fish in a tank, happens approximately every three seconds.
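If you prefer to see the idea in code, the pairing-and-loss scheme can be sketched with toy numbers. This is not the paper's implementation: random vectors stand in for the ResNet18 embeddings, and the loss is a generic InfoNCE-style contrastive loss of the kind this family of methods uses. The point is only that two images from the same fragment (a positive pair) should score a much lower loss than two unrelated images.

```python
import numpy as np

# Toy sketch of the fragment-based contrastive setup. In the real pipeline
# the embeddings come from a trained ResNet18; here random vectors stand in,
# and a "positive pair" is two images drawn from the same fragment.
rng = np.random.default_rng(0)

def normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

def contrastive_loss(z_a, z_b, temperature=0.1):
    """InfoNCE-style loss over a batch of pairs (z_a[i], z_b[i]).

    Row i's positive is the diagonal similarity; every other column in
    that row acts as a negative (images of other animals).
    """
    sim = (z_a @ z_b.T) / temperature                 # (batch, batch)
    logits = sim - sim.max(axis=1, keepdims=True)     # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))

anchors = normalize(rng.normal(size=(8, 16)))
# Same-fragment images: nearly identical embeddings plus appearance noise.
positives = normalize(anchors + 0.05 * rng.normal(size=(8, 16)))
# Unrelated images: embeddings with no relation to the anchors.
unrelated = normalize(rng.normal(size=(8, 16)))

loss_aligned = contrastive_loss(anchors, positives)
loss_random = contrastive_loss(anchors, unrelated)
print(loss_aligned < loss_random)  # → True: matched fragments score lower
```

Training on this signal pushes images of the same individual together and images of different individuals apart, which is exactly what lets each animal form its own cluster in representation space.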
700 Times Faster (No, That's Not a Typo)
The numbers here are the kind that make you reread them. The new system achieves a median identification accuracy of 99.92% across 33 test videos spanning zebrafish, fruit flies, and mice in groups of 5 to 100 animals. But the speed gains are where things get genuinely jaw-dropping: tracking runs up to 700 times faster than the original. Videos that once consumed a research assistant's entire week now finish while the coffee's still hot.
This matters more than it sounds. In behavioral neuroscience, the bottleneck hasn't been collecting data for a while now - cameras are cheap and storage is cheaper. The bottleneck has been processing it. Tools like SLEAP and DeepLabCut have transformed pose estimation, but maintaining consistent identities across a video, especially when animals overlap, has remained stubbornly difficult. A recent survey of deep learning approaches to multi-animal tracking highlights identity maintenance as one of the field's central unsolved challenges.
Why Your Zebrafish Data Just Got a Lot More Interesting
The practical upshot is this: experiments that were previously impractical become routine. Want to track 100 zebrafish for hours? Go ahead. Want to study how social dynamics shift across dozens of trials? The processing time is no longer the thing standing between you and your results. Researchers studying collective animal neuroscience - an emerging field trying to understand how brains coordinate behavior in groups - suddenly have a pipeline that can keep up with their ambitions.
And here's the quietly elegant part: the new system doesn't need any manual labeling. It's entirely self-supervised, pulling its training signal from the natural structure of the video itself. Animals swim, walk, and fly past each other, and every crossing becomes a free lesson in who's who. The video is the curriculum.
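To make the "video is the curriculum" idea concrete, here is a toy sketch of the downstream identity step: once fragments are embedded, fragments of the same animal sit in a tight cluster, so identities can be read off with something as plain as k-means. All names and numbers here are invented for illustration, and the paper's actual assignment procedure may differ.

```python
import numpy as np

# Toy sketch: 3 hidden "animals", 10 fragment embeddings each, faked as
# noisy copies of a per-animal center (standing in for learned embeddings).
rng = np.random.default_rng(1)
true_centers = rng.normal(size=(3, 8))
fragments = np.vstack(
    [c + 0.05 * rng.normal(size=(10, 8)) for c in true_centers]
)

def kmeans(x, k, init_idx, iters=20):
    """Minimal k-means; init_idx picks the starting centers deterministically."""
    centers = x[init_idx].copy()
    for _ in range(iters):
        dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centers = np.array([x[labels == j].mean(axis=0) for j in range(k)])
    return labels

# Seed with one fragment per animal so this toy demo converges deterministically.
labels = kmeans(fragments, k=3, init_idx=[0, 10, 20])
per_animal = [len(set(labels[i * 10:(i + 1) * 10])) for i in range(3)]
print(per_animal)  # → [1, 1, 1]: each animal's fragments share one identity
```

No human labels appear anywhere in that loop: the only supervision is the structure of the fragments themselves.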
(I find something almost poetic in that. The animals teach the algorithm to see them, just by being themselves.)
References
- Torrents, J., Costa, T., & de Polavieja, G. (2026). New idtracker.ai rethinks multi-animal tracking as a representation learning problem to increase accuracy and reduce tracking time. eLife, 12, e107602. DOI: 10.7554/eLife.107602 | PMID: 41983457
- Romero-Ferrero, F., Bergomi, M. G., Hinz, R. C., Heras, F. J. H., & de Polavieja, G. G. (2019). idtracker.ai: tracking all individuals in small or large collectives of unmarked animals. Nature Methods, 16, 179-182. DOI: 10.1038/s41592-018-0295-5 | PMID: 30643215
- Pereira, T. D., Tabris, N., Matsliah, A., et al. (2022). SLEAP: A deep learning system for multi-animal pose tracking. Nature Methods, 19, 486-495. DOI: 10.1038/s41592-022-01426-1
- Mathis, A., Mamidanna, P., Cury, K. M., et al. (2018). DeepLabCut: markerless pose estimation of user-defined body parts with deep learning. Nature Neuroscience, 21, 1281-1289. DOI: 10.1038/s41593-018-0209-y
- Chen, X., Wangsanata, I., & Bhatt, S. (2024). Deep learning in multiple animal tracking: A survey. Computers and Electronics in Agriculture, 224, 109161. DOI: 10.1016/j.compag.2024.109161
- Bhatt, S., et al. (2022). Toward collective animal neuroscience. Science, 377(6611), 1150-1151. DOI: 10.1126/science.abm3060