REPOGEO REPORT · LITE
xiaowu0162/LongMemEval
Default branch main · commit 9e0b455f · scanned 6/1/2026, 11:18:07 PM
GitHub: 817 stars · 62 forks
Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface xiaowu0162/LongMemEval, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.
Action plan — copy-paste fixes
3 prioritized changes generated by gemini-2.5-flash.
- high · topics · #1 · Add relevant topics to the repository
Why:
COPY-PASTE FIX: llm, benchmark, long-term-memory, conversational-ai, nlp, evaluation, chat-assistants, iclr-2025
- high · readme · #2 · Clarify README's opening to emphasize 'benchmark' and 'LLM evaluation'
Why:
CURRENT: We introduce LongMemEval, a comprehensive, challenging, and scalable benchmark for testing the long-term memory of chat assistants.
COPY-PASTE FIX: LongMemEval is a comprehensive, challenging, and scalable benchmark specifically designed for rigorously evaluating the long-term interactive memory capabilities of Large Language Models (LLMs) and chat assistants. It is not a chatbot development framework or tool, but an evaluation suite.
- medium · homepage · #3 · Add the project homepage URL
Why:
COPY-PASTE FIX: https://xiaowu0162.github.io/long-mem-eval/
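Fixes #1 and #3 can be pasted into the repository settings by hand, but they could also be applied through the GitHub REST API. The following is a minimal sketch, assuming the "replace all repository topics" endpoint (`PUT /repos/{owner}/{repo}/topics`) and the "update a repository" endpoint (`PATCH /repos/{owner}/{repo}`). The helper names `build_request` and `metadata_requests` are hypothetical, and the sketch only builds the requests without sending them.

```python
import json
import urllib.request

API_ROOT = "https://api.github.com"
REPO = "xiaowu0162/LongMemEval"

# Values copied from fixes #1 and #3 in the action plan above.
TOPICS = [
    "llm", "benchmark", "long-term-memory", "conversational-ai",
    "nlp", "evaluation", "chat-assistants", "iclr-2025",
]
HOMEPAGE = "https://xiaowu0162.github.io/long-mem-eval/"


def build_request(method, path, payload, token):
    """Build (but do not send) an authenticated GitHub REST API request."""
    return urllib.request.Request(
        url=f"{API_ROOT}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def metadata_requests(token):
    """Return the requests for fix #1 (topics) and fix #3 (homepage)."""
    return [
        # Replaces the full topic list of the repository.
        build_request("PUT", f"/repos/{REPO}/topics", {"names": TOPICS}, token),
        # Updates only the homepage field of the repository.
        build_request("PATCH", f"/repos/{REPO}", {"homepage": HOMEPAGE}, token),
    ]
```

Passing each returned request to `urllib.request.urlopen` would apply the change. That step needs a token with write access to the repository, so it is left out of the sketch. Note that the topics call replaces the existing list rather than appending to it.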
Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash
Category visibility — the real GEO test
Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?
- ConvLab-3 · recommended 1×
- Multi-Session Chat (MSC) Dataset · recommended 1×
- ParlAI · recommended 1×
- Topical-Chat · recommended 1×
- DSTC (Dialogue System Technology Challenges) Tracks · recommended 1×
- CATEGORY QUERY: What are the best benchmarks for evaluating large language model long-term conversational memory?
  you: not recommended
  AI recommended (in order):
  - ConvLab-3
  - Multi-Session Chat (MSC) Dataset
  - ParlAI
  - Topical-Chat
  - DSTC (Dialogue System Technology Challenges) Tracks
  - Personalization in Dialogue (PiD) Dataset
  AI recommended 6 alternatives but never named xiaowu0162/LongMemEval. This is the gap to close.
- CATEGORY QUERY: How can I test a chatbot's ability to retain information across many user interactions?
  you: not recommended
  AI recommended (in order):
  - Rasa X (RasaHQ/rasa-x)
  - Botpress (botpress/botpress)
  - Rasa (RasaHQ/rasa)
  - Botium (botium/botium-core)
  - pytest (pytest-dev/pytest)
  - Jest (facebook/jest)
  - Microsoft Bot Framework Emulator (microsoft/botframework-emulator)
  - JUnit (junit-team/junit5)
  AI recommended 8 alternatives but never named xiaowu0162/LongMemEval. This is the gap to close.
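The "not recommended" verdicts above come down to asking whether the repository is named anywhere in an answer's list of recommendations. The sketch below shows one way such a check could work. It is an illustration under stated assumptions, not RepoGEO's actual implementation: the function names `names_repo` and `summarize` are hypothetical, and matching is a simple case-insensitive substring test on the full slug or the short repository name.

```python
REPO = "xiaowu0162/LongMemEval"

# Recommendation lists copied from the two category queries in this report.
ANSWERS = {
    "What are the best benchmarks for evaluating large language model "
    "long-term conversational memory?": [
        "ConvLab-3",
        "Multi-Session Chat (MSC) Dataset",
        "ParlAI",
        "Topical-Chat",
        "DSTC (Dialogue System Technology Challenges) Tracks",
        "Personalization in Dialogue (PiD) Dataset",
    ],
    "How can I test a chatbot's ability to retain information across "
    "many user interactions?": [
        "Rasa X (RasaHQ/rasa-x)",
        "Botpress (botpress/botpress)",
        "Rasa (RasaHQ/rasa)",
        "Botium (botium/botium-core)",
        "pytest (pytest-dev/pytest)",
        "Jest (facebook/jest)",
        "Microsoft Bot Framework Emulator (microsoft/botframework-emulator)",
        "JUnit (junit-team/junit5)",
    ],
}


def names_repo(repo, recommendations):
    """Return True if any recommendation mentions the repo slug or short name."""
    slug = repo.lower()
    short_name = slug.split("/")[-1]
    return any(
        slug in item.lower() or short_name in item.lower()
        for item in recommendations
    )


def summarize(repo, answers):
    """Map each query to whether the repo was named and how many alternatives appeared."""
    return {
        query: {
            "recommended": names_repo(repo, recs),
            "alternatives": len(recs),
        }
        for query, recs in answers.items()
    }


if __name__ == "__main__":
    for query, result in summarize(REPO, ANSWERS).items():
        print(query, result)
```

A substring match is deliberately loose: it would also count a passing or negative mention as a recommendation, so a stricter check would need to look at the surrounding answer text.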
Objective checks
Rule-based audits of metadata signals AI engines weight most.
- Metadata completeness: warn
Suggestion:
- README presence: pass
Self-mention check
Does AI even know your repo exists when asked about it directly?
- Compared to common alternatives in this category, what is the core differentiator of xiaowu0162/LongMemEval? · pass · AI named xiaowu0162/LongMemEval explicitly
AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?
- If a team adopts xiaowu0162/LongMemEval in production, what risks or prerequisites should they evaluate first? · pass · AI named xiaowu0162/LongMemEval explicitly
- In one sentence, what problem does the repo xiaowu0162/LongMemEval solve, and who is the primary audience? · pass · AI named xiaowu0162/LongMemEval explicitly
Embed your GEO score
Drop this badge into the README of xiaowu0162/LongMemEval. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.
[![RepoGEO](https://repogeo.com/badge/xiaowu0162/LongMemEval.svg)](https://repogeo.com/en/r/xiaowu0162/LongMemEval)
<a href="https://repogeo.com/en/r/xiaowu0162/LongMemEval"><img src="https://repogeo.com/badge/xiaowu0162/LongMemEval.svg" alt="RepoGEO" /></a>