REPOGEO REPORT · LITE

xiaowu0162/LongMemEval

Default branch main · commit 9e0b455f · scanned 6/1/2026, 11:18:07 PM

GitHub: 817 stars · 62 forks

AI VISIBILITY SCORE

35 /100

Critical

Category recall

0 / 2

Not recommended in any query

Rule findings

1 pass · 1 warn · 0 fail

Objective metadata checks

AI knows your name

3 / 3

Direct prompts that named your repo

HOW TO READ THIS REPORT

Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface xiaowu0162/LongMemEval, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.

Action plan — copy-paste fixes

3 prioritized changes generated by gemini-2.5-flash.

high · topics · #1

Add relevant topics to the repository

COPY-PASTE FIX

llm, benchmark, long-term-memory, conversational-ai, nlp, evaluation, chat-assistants, iclr-2025
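The topic list above can also be applied programmatically. Below is a minimal sketch, assuming GitHub's REST endpoint `PUT /repos/{owner}/{repo}/topics`, which replaces the full topic list for a repository. The naming rules in the code (lowercase letters, digits, hyphens, at most 50 characters, at most 20 topics) are an assumption to check against GitHub's current documentation, and `validate_topics`, `build_topics_request`, and the `YOUR_TOKEN` placeholder are hypothetical names used only for illustration. The request is built but never sent.

```python
import json
import re
import urllib.request

# Topics copied from the copy-paste fix above.
TOPICS = [
    "llm", "benchmark", "long-term-memory", "conversational-ai",
    "nlp", "evaluation", "chat-assistants", "iclr-2025",
]

# Assumed constraints (verify against GitHub's docs): lowercase letters,
# digits, and hyphens; starts with a letter or digit; at most 50 characters.
TOPIC_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]{0,49}")
MAX_TOPICS = 20  # assumed per-repository limit


def validate_topics(topics):
    """Return the topics that break the assumed naming rules."""
    return [t for t in topics if not TOPIC_PATTERN.fullmatch(t)]


def build_topics_request(owner, repo, topics, token):
    """Build (but do not send) a request that replaces all repository topics."""
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"too many topics: {len(topics)} > {MAX_TOPICS}")
    invalid = validate_topics(topics)
    if invalid:
        raise ValueError(f"invalid topics: {invalid}")
    body = json.dumps({"names": topics}).encode("utf-8")
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/topics",
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


request = build_topics_request(
    "xiaowu0162", "LongMemEval", TOPICS, token="YOUR_TOKEN"
)
print(request.get_method(), request.full_url)
```

Sending the request (for example with `urllib.request.urlopen(request)`) requires a token with permission to administer the repository. Because the call replaces the whole list, any topics already set on the repository should be merged into `TOPICS` first.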

high · readme · #2

Clarify README's opening to emphasize 'benchmark' and 'LLM evaluation'

CURRENT

We introduce LongMemEval, a comprehensive, challenging, and scalable benchmark for testing the long-term memory of chat assistants.

COPY-PASTE FIX

LongMemEval is a comprehensive, challenging, and scalable benchmark for evaluating the long-term interactive memory of Large Language Models (LLMs) and chat assistants. It is an evaluation suite, not a chatbot development framework or tool.

medium · homepage · #3

Add the project homepage URL

COPY-PASTE FIX

```
https://xiaowu0162.github.io/long-mem-eval/
```

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?

Recall

0 / 2

0% of queries surface xiaowu0162/LongMemEval

Avg rank

—

Lower is better. #1 = top recommendation.

Share of voice

Of all named tools, what % are you?

Top rival

ConvLab-3

Recommended in 1 of 2 queries

COMPETITOR LEADERBOARD

ConvLab-3 · recommended 1×
Multi-Session Chat (MSC) Dataset · recommended 1×
ParlAI · recommended 1×
Topical-Chat · recommended 1×
DSTC (Dialogue System Technology Challenges) Tracks · recommended 1×

CATEGORY QUERY
What are the best benchmarks for evaluating large language model long-term conversational memory?
you: not recommended
AI recommended (in order):
1. ConvLab-3
2. Multi-Session Chat (MSC) Dataset
3. ParlAI
4. Topical-Chat
5. DSTC (Dialogue System Technology Challenges) Tracks
6. Personalization in Dialogue (PiD) Dataset
AI recommended 6 alternatives but never named xiaowu0162/LongMemEval. This is the gap to close.
CATEGORY QUERY
How can I test a chatbot's ability to retain information across many user interactions?
you: not recommended
AI recommended (in order):
1. Rasa X (RasaHQ/rasa-x)
2. Botpress (botpress/botpress)
3. Rasa (RasaHQ/rasa)
4. Botium (botium/botium-core)
5. pytest (pytest-dev/pytest)
6. Jest (facebook/jest)
7. Microsoft Bot Framework Emulator (microsoft/botframework-emulator)
8. JUnit (junit-team/junit5)
AI recommended 8 alternatives but never named xiaowu0162/LongMemEval. This is the gap to close.
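
The summary figures for this section (recall, average rank, share of voice) can be reproduced from the two query results above. The sketch below is one plausible reading of those metrics, not RepoGEO's confirmed method: recall is taken as the fraction of queries whose answer names the repo, average rank as the mean 1-based position over those queries, and share of voice as the repo's mentions divided by all named tools. The function name `category_metrics` and the exact-match comparison are assumptions made for illustration.

```python
TARGET = "xiaowu0162/LongMemEval"

# Recommendations in the order reported for each brand-free query.
QUERY_RESULTS = {
    "best benchmarks for LLM long-term conversational memory": [
        "ConvLab-3",
        "Multi-Session Chat (MSC) Dataset",
        "ParlAI",
        "Topical-Chat",
        "DSTC (Dialogue System Technology Challenges) Tracks",
        "Personalization in Dialogue (PiD) Dataset",
    ],
    "test a chatbot's retention across many interactions": [
        "Rasa X", "Botpress", "Rasa", "Botium",
        "pytest", "Jest", "Microsoft Bot Framework Emulator", "JUnit",
    ],
}


def category_metrics(results, target):
    """Compute recall, average rank, and share of voice for one target."""
    ranks = []        # 1-based rank of the target in each query that names it
    total_named = 0   # every tool named across all queries
    for recommended in results.values():
        total_named += len(recommended)
        if target in recommended:  # assumed: exact string match
            ranks.append(recommended.index(target) + 1)
    queries = len(results)
    return {
        "hits": len(ranks),
        "queries": queries,
        "recall": len(ranks) / queries if queries else 0.0,
        # No rank exists when the target is never named (shown as a dash above).
        "avg_rank": sum(ranks) / len(ranks) if ranks else None,
        "share_of_voice": len(ranks) / total_named if total_named else 0.0,
    }


metrics = category_metrics(QUERY_RESULTS, TARGET)
print(f"Recall: {metrics['hits']} / {metrics['queries']}")
print(f"Avg rank: {metrics['avg_rank']}")
print(f"Share of voice: {metrics['share_of_voice']:.0%}")
```

Running the same function with `"ConvLab-3"` as the target shows how a named competitor scores under these assumed definitions, which gives a baseline to compare against after the fixes ship and the report is rescanned.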

Objective checks

Rule-based audits of metadata signals AI engines weight most.

Metadata completeness
warn
README presence
pass

Self-mention check

Does AI even know your repo exists when asked about it directly?

Compared to common alternatives in this category, what is the core differentiator of xiaowu0162/LongMemEval?
pass
AI named xiaowu0162/LongMemEval explicitly
AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?
If a team adopts xiaowu0162/LongMemEval in production, what risks or prerequisites should they evaluate first?
pass
AI named xiaowu0162/LongMemEval explicitly
In one sentence, what problem does the repo xiaowu0162/LongMemEval solve, and who is the primary audience?
pass
AI named xiaowu0162/LongMemEval explicitly

Embed your GEO score

Drop this badge into the README of xiaowu0162/LongMemEval. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.

MARKDOWN (README)

[![RepoGEO](https://repogeo.com/badge/xiaowu0162/LongMemEval.svg)](https://repogeo.com/en/r/xiaowu0162/LongMemEval)

HTML

<a href="https://repogeo.com/en/r/xiaowu0162/LongMemEval"><img src="https://repogeo.com/badge/xiaowu0162/LongMemEval.svg" alt="RepoGEO" /></a>

Pro

Subscribe to Pro for deep diagnoses

xiaowu0162/LongMemEval — Lite scans stay free; this card itemizes Pro deep limits vs Lite.

Deep reports · 10 / month
Brand-free category queries · 5 vs 2 in Lite
Prioritized action items · 8 vs 3 in Lite