If you've ever taken a neuroscience course, you've probably heard of the Morris water maze. It's the classic memory test: put a mouse in a pool, hide a platform under the water, and see if the little guy can remember where the escape route is. If the mouse finds the platform faster over repeated trials, congratulations, its hippocampus works. Simple, elegant, been around since 1981. You'd think after four decades we'd have this nailed down.
You would be wrong.
A new paper in Neuroscience & Biobehavioral Reviews introduces DataMaze, an open-source repository designed to solve a problem that's been hiding in plain sight: everyone runs this test differently, and it's been quietly wrecking our ability to compare results across labs.
Welcome to the Tower of Babel, But Make It Rodent Swimming
Let me paint you a picture of how this works in practice. Lab A in Boston uses a 120 cm diameter pool. They run four training days with 60-second trials and measure escape latency (how long until the mouse finds the platform). Lab B in Tokyo uses a 150 cm pool. They run six training days with 90-second trials and measure path length (how far the mouse swims to get there). Lab C in Berlin has a different pool size altogether, trains for a different number of days, and invented their own metric that combines several measurements into a proprietary score.
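To make the mismatch concrete, here's a toy sketch of those three setups written down as structured records. The field names are ones I've invented for illustration, and Lab C's numbers are placeholders (the paper's example only says they differ); none of this is DataMaze's actual schema.

```python
# Purely illustrative: three "same test" water maze protocols as structured records.
# Field names are hypothetical; Lab C's values are placeholders, not from the article.

from dataclasses import dataclass

@dataclass
class WaterMazeProtocol:
    lab: str
    pool_diameter_cm: int   # size of the pool
    training_days: int      # number of acquisition days
    trial_length_s: int     # maximum time allowed per trial
    primary_metric: str     # what the lab reports as "learning"

protocols = [
    WaterMazeProtocol("Lab A (Boston)", 120, 4, 60, "escape latency (s)"),
    WaterMazeProtocol("Lab B (Tokyo)",  150, 6, 90, "path length (cm)"),
    WaterMazeProtocol("Lab C (Berlin)", 180, 5, 75, "composite score"),  # placeholder values
]

# Every field differs, so "impairment" in one lab's numbers isn't directly
# comparable to "impairment" in another's.
for p in protocols:
    print(p)
```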
All three labs publish papers saying they found learning impairments in their genetically modified mice. Great. But are those impairments actually the same? Were those mice really worse at learning, or did the different protocols just make them look that way?
Here's the honest answer: we often cannot tell.
This is the reproducibility crisis wearing a lab coat and pretending everything is fine.
The Goldilocks Problem Nobody Talks About
Every protocol decision affects results. Pool too big? Mice might tire out before finding the platform, and now you're measuring endurance, not memory. Trial length too short? You're cutting off slower learners before they can demonstrate learning. Training too brief? You're only capturing early acquisition, before group differences have a chance to show. Training too long? Performance hits a ceiling and you're measuring something else entirely.
And these variations aren't random. They accumulated over decades as labs made independent choices, often based on available equipment, traditions passed down from advisors, or "the way we've always done it." Nobody was coordinating. Nobody had to. Until someone tried to actually compare results across studies and discovered we'd been speaking different experimental dialects the whole time.
This matters beyond academic neatness. Drug companies have spent enormous amounts of money on compounds that "fixed" memory deficits in one lab's maze setup but did absolutely nothing in another's. Were the drugs bad? Was the first result a false positive? Was the second a false negative? Without standardized protocols, you're basically guessing.
DataMaze: Finally, Someone Built the Thing We Needed
The researchers behind DataMaze created an open-source repository for sharing both water maze methods AND data. This isn't just a place to dump your spreadsheets. It's designed to connect protocols to outcomes so researchers can actually answer questions like: "When I run the test this way, what should I expect to see?"
You want to know if your new lab's results are comparable to published literature? Check if your protocol matches. You're seeing weird effects and wondering if they're real? Compare to similar protocols in the database. You're designing a new experiment and want to maximize your chances of detecting a real effect? Look at what's worked before.
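Here's a rough sketch, in plain Python, of the kind of "how close is my protocol to the published ones?" comparison a shared repository makes possible. To be clear: this is not DataMaze's actual API, the scoring is deliberately crude, and the study entries are placeholders.

```python
# Hypothetical sketch (not DataMaze's API): rank shared protocol records by how
# closely they match your own, so you know which published results are comparable.

def protocol_distance(a: dict, b: dict) -> float:
    """Crude dissimilarity score: 0 means identical on the fields we check."""
    score = abs(a["pool_diameter_cm"] - b["pool_diameter_cm"]) / 30   # per 30 cm
    score += abs(a["training_days"] - b["training_days"])             # per day
    score += abs(a["trial_length_s"] - b["trial_length_s"]) / 30      # per 30 s
    score += 0 if a["primary_metric"] == b["primary_metric"] else 1   # metric mismatch
    return score

my_protocol = {"pool_diameter_cm": 120, "training_days": 5,
               "trial_length_s": 60, "primary_metric": "escape latency (s)"}

published = [
    {"study": "Study 1", "pool_diameter_cm": 120, "training_days": 4,
     "trial_length_s": 60, "primary_metric": "escape latency (s)"},
    {"study": "Study 2", "pool_diameter_cm": 150, "training_days": 6,
     "trial_length_s": 90, "primary_metric": "path length (cm)"},
]

# Closest protocol first: those are the results you can most fairly compare against.
for study in sorted(published, key=lambda s: protocol_distance(my_protocol, s)):
    print(study["study"], round(protocol_distance(my_protocol, study), 2))
```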
This is the kind of infrastructure that doesn't win Nobel Prizes but might actually make the science more reliable. Which, if you think about it, is kind of the whole point.
What We Might Actually Learn
By aggregating methods and data, DataMaze enables something researchers love but rarely get to do: meta-scientific analysis. This means asking questions about the science itself. How do procedural variations affect results? Which protocol elements actually matter for detecting real memory differences? Which are just tradition, passed down like family recipes that nobody questions?
Maybe there's an optimal pool size that maximizes signal and minimizes noise. Maybe certain training schedules are more sensitive to real impairments while others generate more false positives. We don't know yet. But with enough data from enough labs running enough variations, patterns should emerge.
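For a flavor of what that meta-scientific analysis could look like, here's a minimal sketch that pools study records and asks whether one protocol parameter (trial length, in this hypothetical) shifts the reported effect sizes. The records are randomly simulated stand-ins, not real DataMaze entries, and the output says nothing about actual water maze findings.

```python
# Minimal sketch of a meta-scientific question pooled data could answer:
# does trial length change the effect sizes labs report?
# The records below are randomly simulated, purely to show the analysis shape.

import random
import statistics

random.seed(0)

# Simulated per-study records: one protocol parameter plus a reported effect size.
records = [
    {"trial_length_s": random.choice([60, 90]),
     "effect_size": random.gauss(0.8, 0.3)}
    for _ in range(40)
]

# Group reported effect sizes by trial length and compare their mean and spread.
for length in (60, 90):
    effects = [r["effect_size"] for r in records if r["trial_length_s"] == length]
    print(f"{length}s trials: n={len(effects)}, "
          f"mean d={statistics.mean(effects):.2f}, "
          f"sd={statistics.stdev(effects):.2f}")
```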
This isn't just housekeeping. This is how fields grow up and stop wasting resources on noise masquerading as signal.
A Template for Fixing Other Messes
The Morris water maze isn't special in its inconsistency. The elevated plus maze (for anxiety). The forced swim test (for depression). The novel object recognition test (for memory). All of these suffer from the same "everyone does it their own way" problem.
DataMaze represents a model for how fields can self-organize to address variation. You don't need a governing body issuing mandates. You need an accessible platform where sharing methods is easy and comparing them is useful. Make the friction of standardization lower than the friction of chaos, and researchers will naturally converge.
Sometimes scientific progress isn't about brilliant insights or revolutionary techniques. Sometimes it's about admitting that we've all been making it harder to replicate each other's work and then actually doing something about it.
Standardization isn't glamorous. It doesn't make for exciting headlines. But it might be what separates robust findings from very expensive noise. And after four decades of everyone doing their own thing, that's a pretty good reason to pay attention.
Reference: Bhattacharyya S, et al. (2025). DataMaze: An open source methods and data repository for Morris water maze experiments. Neuroscience & Biobehavioral Reviews. doi: 10.1016/j.neubiorev.2025.105825 | PMID: 40825454