Here's something that should bother you: you can have a deeply meaningful conversation with an AI, work through a complex problem together, establish shared understanding and rapport. And then the next time you start a chat? Complete stranger. No memory that you ever spoke. It's like Groundhog Day, except you're the only one aware of the loop.
A review in Trends in Cognitive Sciences argues that this isn't just an inconvenience. It's a fundamental limitation that makes current LLMs way less useful than they could be. And fixing it might teach us something profound about how human memory actually works.
The Memory-Shaped Hole in Every LLM
Let's talk about what you have that ChatGPT doesn't: episodic memory. This is your ability to remember specific experiences in context. Not just bare facts ("Paris is the capital of France") but actual events ("remember that time we got lost in Paris, found a random bakery, and ate the best croissant of our lives?").
Episodic memory gives you a personal history. It lets you say "based on what happened last time I tried this, I should probably do something different." It makes you... you.
Current LLMs have something like semantic memory. They know facts. They know how language works. They've absorbed the patterns of billions of text documents. But they have no autobiography. No "last time we spoke." No "remember when you mentioned your dog was sick? How's she doing?"
This matters more than it might seem. When you talk to a human colleague, they bring their history with you into the conversation. They remember what you've struggled with, what excites you, what context you've already established. An LLM starts from scratch every single time. It's like perpetually meeting someone at a party who has extremely good general knowledge but no idea who you are.
The Current Fixes Are Band-Aids, Not Solutions
Researchers have noticed this problem, of course. The current fix is to bolt external memory onto the LLM: a database that stores conversation history and retrieves the most relevant snippets when needed, the pattern commonly known as retrieval-augmented generation (RAG). Some chat interfaces do this automatically, reminding the AI what you talked about before.
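To make that concrete, here's a minimal sketch of the filing-cabinet approach: embed each memory as a vector, then retrieve the nearest matches by similarity. The `embed` function is a random-vector stand-in for a real sentence-embedding model, and the class is illustrative, not any particular product's implementation.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Stand-in for a real sentence-embedding model (hypothetical)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

class RetrievalMemory:
    """A 'filing cabinet': store every turn, retrieve by similarity."""
    def __init__(self):
        self.texts, self.vectors = [], []

    def store(self, text: str) -> None:
        self.texts.append(text)
        self.vectors.append(embed(text))

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        q = embed(query)
        scores = [float(q @ v) for v in self.vectors]
        top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
        return [self.texts[i] for i in top]

memory = RetrievalMemory()
memory.store("User's dog Luna was sick last week.")
memory.store("User prefers concise answers.")
print(memory.retrieve("How is the user's pet doing?"))
```

Notice what this system never does: forget, distort, or generalize. It stores and replays, verbatim.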
But here's the thing: as the authors of this review point out, these approaches are "misaligned with human memory in various ways." Which is a polite way of saying they're missing the point entirely.
Human memory isn't a tape recorder. It doesn't faithfully log everything and play it back on demand. Human memory is weird. It's selective (you remember some things and not others, often illogically). It's reconstructive (you literally rebuild memories each time you recall them). It's biased by emotion (terrifying or wonderful moments get different treatment than boring ones). It's organized by meaning rather than timestamp.
An LLM that simply stores and retrieves everything is building a filing cabinet, not a mind. It will behave nothing like human memory because it fundamentally isn't human memory.
What Would Real Episodic Memory Look Like?
Imagine an AI that actually had human-like memory. It would remember your preferences without you having to state them every time. It would build a model of your relationship that evolves over conversations. It would notice when the current situation resembles something that happened before and draw on that experience.
But it would also do some stranger things. It would forget occasionally. It would sometimes remember things wrong, filling in plausible details that didn't actually happen. It would have that infuriating experience where it can almost remember something but can't quite access it.
These failures aren't bugs. They're features of how human memory works. A memory system that's too perfect actually loses some of the flexibility and generalization that make human memory so useful. Our imperfect recall is related to our ability to abstract, to see patterns, to avoid drowning in irrelevant details.
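What might it look like to build those quirks in deliberately? Here's a toy sketch, where every mechanism and parameter is an illustrative assumption rather than anything proposed in the review: episodes decay over time (more slowly when emotionally salient), weak memories surface only partially, and recall reconstructs details on the fly, sometimes substituting plausible fillers.

```python
import math
import random
from dataclasses import dataclass

@dataclass
class Episode:
    details: list[str]      # what actually happened
    salience: float         # emotional weight, 0..1
    strength: float = 1.0   # decays as time passes

GENERIC_FILLERS = ["it was raining", "someone laughed", "the street was crowded"]

def decay(ep: Episode, days: float, rate: float = 0.05) -> None:
    # Illustrative assumption: salient episodes decay more slowly.
    ep.strength *= math.exp(-rate * days * (1.0 - 0.7 * ep.salience))

def recall(ep: Episode) -> list[str] | None:
    if ep.strength < 0.05:
        return None   # forgotten outright
    if ep.strength < 0.30:
        return ["(tip of the tongue -- can't quite retrieve it)"]
    # Reconstruction: each detail is either retrieved faithfully or
    # replaced with a plausible filler -- a small-scale confabulation.
    return [d if random.random() < ep.strength else random.choice(GENERIC_FILLERS)
            for d in ep.details]

ep = Episode(details=["got lost in Paris", "found a random bakery",
                      "ate the best croissant of our lives"], salience=0.9)
decay(ep, days=30)   # a month passes
print(recall(ep))    # mostly faithful, with the occasional invented detail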
Here's Where It Gets Interesting for Neuroscience
The authors make a clever argument: if we could actually build LLMs with human-like episodic memory, they'd become powerful tools for testing theories about how memory supports cognition.
Think about it. We have theories about what memory is for, how it's organized, what its failures tell us about its structure. But testing these theories in humans is hard. You can't precisely control what people experience. You can't record exactly what's in their memory. You can't manipulate specific memory properties while holding everything else constant.
With an AI model, you can do all of that. You can build in specific memory properties, generate precise predictions about behavior, and then compare the AI to actual humans. Where do they match? Where do they diverge? The differences reveal something about how biological memory actually works.
Pair this with neuroimaging, and you could start mapping how neural systems implement these memory processes. The AI becomes a hypothesis generator. The human data becomes the test.
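In practice, that loop could be as simple as the following sketch: pick a candidate hypothesis about forgetting, fit its free parameter to human recall data, and inspect the residual mismatch. The human numbers below are invented purely for illustration.

```python
import math

def predicted_recall(t_days: float, decay_rate: float) -> float:
    """Candidate hypothesis: memories decay exponentially with time."""
    return math.exp(-decay_rate * t_days)

# Hypothetical human recall rates at increasing delays (made up for illustration).
human_data = [(1, 0.80), (7, 0.55), (30, 0.35), (90, 0.25)]

def fit_error(decay_rate: float) -> float:
    return sum((predicted_recall(t, decay_rate) - p) ** 2 for t, p in human_data)

# Grid-search the model's single free parameter against the human curve.
best = min((r / 1000 for r in range(1, 300)), key=fit_error)
print(f"best-fit rate: {best:.3f}, residual error: {fit_error(best):.4f}")
```

If no setting of the parameter fits well (pure exponential decay, for instance, tends to underestimate retention at long delays, where human forgetting looks closer to a power law), the mismatch itself is the scientific payoff: the hypothesis is wrong in a specific, measurable way.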
The Benchmark Problem
This leads to a practical question: how do we know if we've succeeded? If someone claims their AI has human-like episodic memory, how do we evaluate that?
The review argues we need tasks that specifically test for human memory characteristics, including the quirky ones. Does the model sometimes fail to retrieve memories that are definitely in there? Does it confabulate, filling in details with plausible but false information? Does it show that maddening "tip of the tongue" phenomenon where it sort of has access to something but can't quite produce it?
If the AI never forgets, never makes memory errors, never experiences partial retrieval, then whatever it has isn't human-like episodic memory. It's something else. Maybe something useful, but not a model of what's happening in your head.
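As a sketch of what such a benchmark might probe, assuming a hypothetical memory interface with `store` and `recall` methods:

```python
PARTIAL = "<tip-of-the-tongue>"

def humanlike_memory_checks(memory) -> dict[str, bool]:
    """Probe a candidate system for the quirks above. Assumes a hypothetical
    interface: memory.store(item), memory.recall(cue) -> item, PARTIAL, or None."""
    items = [f"episode-{i}" for i in range(500)]
    for item in items:
        memory.store(item)
    recalled = [(item, memory.recall(item)) for item in items]
    return {
        # Retrieval failure: some stored episodes should be irretrievable.
        "forgets_sometimes": any(r is None for _, r in recalled),
        # Confabulation: some recalls should differ from what was stored.
        "confabulates": any(r not in (None, PARTIAL) and r != item
                            for item, r in recalled),
        # Partial access: sometimes "almost there" rather than all-or-nothing.
        "partial_retrieval": any(r == PARTIAL for _, r in recalled),
    }
```

A perfect key-value store comes back False on all three. By the review's criterion, that makes it a useful database, but not a model of what's happening in your head.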
Why This Matters Beyond AI
You might be thinking this is mainly an AI problem for AI researchers to solve. But the implications go deeper.
If we succeed in building AI with genuinely human-like memory, we'll have learned something about memory itself. The constraints we had to build in, the trade-offs we had to accept, the failure modes we had to include, all of that reflects real properties of biological memory systems.
And practically, AI with real memory would be wildly more useful. Imagine a medical AI that remembers your complete health history and notices patterns across time. Imagine an educational AI that tracks your learning over months, remembering what you found difficult and what came easily. Imagine any assistant that actually knows you.
Right now, we're chatting with incredibly capable amnesiacs. They're impressive in the moment, but the relationship resets every time. Fixing that changes what AI can be.
The Bottom Line
Current AI models are stuck in an eternal present. Every conversation starts fresh, with no memory of who you are or what you've discussed before. This isn't just inconvenient; it limits what these systems can do and what they can teach us.
Building AI with human-like episodic memory is a hard problem, but it's the kind of hard problem where solving it teaches you something about both intelligence and memory itself. The failures of human memory aren't bugs to be engineered out. They're clues about how memory systems should work.
Your AI assistant of the future should remember you. It should also occasionally forget things, remember them wrong, and have trouble accessing information that's definitely in there somewhere. That's not a flaw. That's what having a memory actually means.
Reference: Baldassano C, et al. (2025). Towards large language models with human-like episodic memory. Trends in Cognitive Sciences. DOI: 10.1016/j.tics.2025.06.016. PMID: 40713240.