Right now, as your eyes skim these words, the letters are hitting your retina at a laughably tiny scale - a few millimeters of projected light, smaller than an ant. Yet you don't think the words are ant-sized. You know the screen is a screen, the text is readable, and the whole thing sits at arm's length. Your visual system just executed a trade you never authorized: it took the cheap, noisy signal arriving at your eye and converted it into a stable estimate of how big things actually are in the world. A new study in eLife clocked exactly when that trade settles.
The pricing problem your eyes can't avoid
Here's the core inefficiency the brain has to deal with. The size of an object on your retina is a terrible measure of its real-world size. A coffee mug up close and a car down the street can paint the exact same patch on your retina. Retinal size is like a stock price: it moves constantly, it's distorted by distance, and taking it at face value will bankrupt you.
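To put a number on that mug-versus-car claim, here is a minimal sketch using the standard visual-angle formula, angle = 2 * atan(size / (2 * distance)). The sizes and distances are illustrative assumptions, not values from the study, and the function name `visual_angle_deg` is made up for this example.

```python
import math

def visual_angle_deg(size_m, distance_m):
    """Visual angle in degrees subtended by an object viewed head-on."""
    return math.degrees(2 * math.atan(size_m / (2 * distance_m)))

# Illustrative numbers only (assumptions, not data from the paper):
mug = visual_angle_deg(size_m=0.10, distance_m=0.5)    # 10 cm mug at arm's length
car = visual_angle_deg(size_m=4.50, distance_m=22.5)   # 4.5 m car down the street

print(f"mug: {mug:.2f} deg, car: {car:.2f} deg")
```

Both objects come out at roughly 11.4 degrees, because the size-to-distance ratio is the same. The retinal "price" alone cannot tell a mug from a car.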
What you actually want is the object's underlying value - its real-world size. A mug is mug-sized whether it's in your hand or across the room. The brain treats this as a fundamental, organizing property of objects, which is why your occipitotemporal cortex sorts objects along a big-to-small axis the way a warehouse sorts inventory (Konkle & Oliva, 2012). The question nobody had cleanly settled: is the brain pricing in true real-world size, or just reacting to retinal size and depth cues? In natural scenes, those three variables tend to move together like correlated assets.
Disentangling correlated assets
Zitong Lu and Julie Golomb went after that confound with the kind of tooling a quant would appreciate. They used the THINGS EEG2 dataset - EEG recordings, with millisecond time resolution, of people viewing thousands of natural images (Gifford et al., 2022) - and ran representational similarity analysis, which compares how similar the brain's response patterns are for every pair of images, much like a correlation matrix for brain states. Then they used partial correlation to strip out the shared variance, isolating real-world size from retinal size and from real-world depth.
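The partial-correlation step can be sketched in a few lines. This is a toy version under stated assumptions, not the authors' pipeline: the "neural" matrix is simulated, the model matrices are built from absolute differences between made-up object properties, and every function name here is hypothetical. It uses rank-based (Spearman-style) partial correlation computed with the textbook recursive formula.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Average ranks (ties share the mean rank), so Pearson on ranks is Spearman."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def partial_corr(x, y, covariates):
    """Correlation of x and y after removing what the covariates explain."""
    if not covariates:
        return pearson(x, y)
    z, rest = covariates[0], covariates[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))

def partial_spearman(x, y, covariates):
    return partial_corr(ranks(x), ranks(y), [ranks(c) for c in covariates])

def model_rdm(values):
    """Dissimilarity for every pair of objects: absolute difference in the property."""
    return [abs(a - b) for a, b in combinations(values, 2)]

# Toy stimulus set; all numbers are made up for illustration.
rng = random.Random(0)
n_objects = 20
real_size = [rng.uniform(0.1, 5.0) for _ in range(n_objects)]        # metres
depth = [s * rng.uniform(2.0, 6.0) for s in real_size]               # big things tend to sit farther away
retinal = [2 * math.atan(s / (2 * d)) for s, d in zip(real_size, depth)]

size_rdm, depth_rdm, retinal_rdm = map(model_rdm, (real_size, depth, retinal))

# Stand-in for an EEG dissimilarity matrix: a noisy mix of all three properties.
neural_rdm = [0.3 * s + 0.5 * d + 5.0 * r + rng.gauss(0, 0.3)
              for s, d, r in zip(size_rdm, depth_rdm, retinal_rdm)]

raw_size = partial_spearman(neural_rdm, size_rdm, [])
unique_size = partial_spearman(neural_rdm, size_rdm, [depth_rdm, retinal_rdm])
print(f"size vs neural, raw: {raw_size:.2f}")
print(f"size vs neural, controlling for depth and retinal size: {unique_size:.2f}")
```

The raw number mixes in whatever real-world size shares with depth and retinal size. The partial number is the part of the signal that only real-world size can account for, which is the quantity the study is after.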
The payoff is a timeline, and timelines are where the money is. The brain processes these properties in a consistent sequence: real-world depth first, then retinal size, and finally real-world size. Think of it as a settlement order. The fast, cheap calculations - how far away is this, how big is the blob on my retina - clear early. The expensive, higher-order judgment, the actual real-world size that fuses what you see with what you know, posts last. Value-added processing costs latency, and the brain pays it only when the raw inputs have already been booked.
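One simple way to read a settlement order off time-resolved results is to find, for each property, the first time point where its similarity score stays above a threshold for several consecutive samples. The sketch below is a simplified stand-in for proper statistical testing, not the study's method, and the curves and onset times in it are placeholders invented for the example, not values from the paper.

```python
def onset_latency(times_ms, values, threshold, min_run=3):
    """First time at which values exceed threshold for min_run consecutive samples."""
    run_start, run_len = None, 0
    for i, v in enumerate(values):
        if v > threshold:
            if run_len == 0:
                run_start = i
            run_len += 1
            if run_len >= min_run:
                return times_ms[run_start]
        else:
            run_len = 0
    return None  # never crossed the threshold for long enough

times_ms = list(range(0, 400, 10))

def step_curve(onset_ms, height=0.1):
    """Placeholder time course: flat at zero, then a constant value after onset."""
    return [height if t >= onset_ms else 0.0 for t in times_ms]

# Made-up onsets, chosen only to illustrate the ordering described above.
curves = {
    "real-world depth": step_curve(60),
    "retinal size": step_curve(90),
    "real-world size": step_curve(120),
}

onsets = {name: onset_latency(times_ms, curve, threshold=0.05)
          for name, curve in curves.items()}
settlement_order = sorted(onsets, key=onsets.get)
print(settlement_order)
```

Requiring a run of consecutive samples is a cheap guard against a single noisy time point being booked as the onset.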
When the machines agreed with the humans
The researchers then put the same objects through artificial neural networks - a vision-only model (ResNet), a vision-plus-language model (CLIP), and a language-only model (Word2Vec), which works from words rather than pixels - to see whether silicon reached the same verdict. It did. Early layers of the networks were dominated by depth and retinal size; the late layers were all about real-world size, mirroring the brain's own "cheap first, expensive last" schedule.
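The layer-by-layer comparison follows the same recipe as the brain analysis: turn each layer's activations into a dissimilarity matrix, then ask which property it tracks. The sketch below uses two simulated "layers" rather than real ResNet or CLIP activations. The Gaussian tuning curves, the layer names, and all numbers are assumptions built to show the mechanics, so the outcome is baked in by design.

```python
import math
import random
from itertools import combinations

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Rank correlation; ties are ignored, which is fine for continuous toy values."""
    def rank(v):
        order = sorted(range(len(v)), key=v.__getitem__)
        r = [0] * len(v)
        for pos, idx in enumerate(order):
            r[idx] = pos
        return r
    return pearson(rank(x), rank(y))

def feature_rdm(values):
    """Model matrix: absolute difference in the property for every image pair."""
    return [abs(a - b) for a, b in combinations(values, 2)]

def pattern_rdm(patterns):
    """Layer matrix: 1 - Pearson r between activation vectors for every image pair."""
    return [1 - pearson(a, b) for a, b in combinations(patterns, 2)]

rng = random.Random(1)
n_images, n_units = 25, 30
retinal_size = [rng.random() for _ in range(n_images)]   # made-up, scaled 0..1
real_size = [rng.random() for _ in range(n_images)]      # made-up, scaled 0..1

def toy_layer(feature, width=0.2, noise=0.05):
    """Hypothetical layer: each unit is tuned to a preferred value of one feature."""
    centers = [k / (n_units - 1) for k in range(n_units)]
    return [[math.exp(-((f - c) ** 2) / (2 * width ** 2)) + rng.gauss(0, noise)
             for c in centers] for f in feature]

layers = {"early (toy)": toy_layer(retinal_size), "late (toy)": toy_layer(real_size)}
models = {"retinal size": feature_rdm(retinal_size),
          "real-world size": feature_rdm(real_size)}

scores = {}
for layer_name, activations in layers.items():
    layer_rdm = pattern_rdm(activations)
    for model_name, rdm in models.items():
        scores[(layer_name, model_name)] = spearman(layer_rdm, rdm)
        print(f"{layer_name:12s} vs {model_name:16s}: {scores[(layer_name, model_name)]:.2f}")
```

With real networks, the activations would come from the model itself and the interesting part is whether the pattern shows up without being built in. The comparison step stays the same.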
The kicker is that even a language-only model, which never saw a single pixel, carried real-world size information. That tells you size isn't purely a visual readout - it's a higher-level dimension stitched together from both what things look like and what we know about them. Real-world size, in market terms, is the blue-chip valuation: stable, slow to compute, and informed by more than the day's price action.
Why a settlement timeline matters
Knowing the order of operations is more useful than it sounds. If you want to build vision systems that fail the way humans fail - or better, succeed the way humans succeed - you need to know which computations are foundational and which are derivative. You don't hedge a position before you've priced the underlying. This work hands engineers a processing schedule and hands neuroscientists a cleaner map of object space, where real-world size sits near the top of the hierarchy rather than tangled up with the cheaper signals feeding it.
It also quietly explains why you've never once mistaken a distant building for a toy. Your brain isn't just seeing. It's continuously running the arbitrage between appearance and reality, and it's been beating the market your entire life.
References
- Lu, Z., & Golomb, J. (2025). Human EEG and artificial neural networks reveal disentangled representations and processing timelines of object real-world size and depth in natural images. eLife. https://doi.org/10.7554/eLife.98117 (PMID: 41424246)
- Gifford, A. T., Dwivedi, K., Roig, G., & Cichy, R. M. (2022). A large and rich EEG dataset for modeling human visual object recognition. NeuroImage, 264, 119754. https://doi.org/10.1016/j.neuroimage.2022.119754 (PMCID: PMC9771828)
- Konkle, T., & Oliva, A. (2012). A real-world size organization of object responses in occipitotemporal cortex. Neuron, 74(6), 1114-1124. https://doi.org/10.1016/j.neuron.2012.04.036 (PMCID: PMC3391318)
- Konkle, T., & Oliva, A. (2011). Canonical visual size for real-world objects. Journal of Experimental Psychology: Human Perception and Performance, 37(1), 23-37. https://doi.org/10.1037/a0020413