REPOGEO REPORT · LITE

voidful/awesome-chatgpt-dataset

Default branch main · commit eb217e3f · scanned 6/12/2026, 1:57:33 AM

GitHub: 762 stars · 65 forks

Scan history for this repo

2 ready scans.

AI VISIBILITY SCORE
22 / 100 · Critical

Category recall
0 / 2 · Not recommended in any query

Rule findings
1 pass · 1 warn · 0 fail · Objective metadata checks

AI knows your name
1 / 3 · Direct prompts that named your repo

HOW TO READ THIS REPORT

The action plan lists what to do next: copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface voidful/awesome-chatgpt-dataset, does the AI recommend you or your competitors? Objective checks verify the metadata signals AI engines weight first. The self-mention check detects whether AI knows the repo by name.

Action plan — copy-paste fixes

3 prioritized changes generated by gemini-2.5-flash.

OVERALL DIRECTION

high · readme · #1
Reposition the README H1 to clarify it's an "awesome list" of datasets
Why:
CURRENT
```
# awesome-chatgpt-dataset
```
COPY-PASTE FIX
```
# awesome-chatgpt-dataset: A Curated List of Datasets for Training Large Language Models
```

medium · topics · #2
Add more specific topics related to LLM training data and collections
Why:
CURRENT
awesome, chatgpt, dataset, gpt4, instructions
COPY-PASTE FIX
awesome, chatgpt, dataset, gpt4, instructions, llm-datasets, fine-tuning-data, instruction-tuning, conversational-ai-data

low · homepage · #3
Add the repository URL as the homepage
Why:
COPY-PASTE FIX
```
https://github.com/voidful/awesome-chatgpt-dataset
```
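The topics and homepage fixes above can also be applied through the GitHub REST API rather than the web UI. The following is a minimal sketch, not part of the report: it only builds the two requests (`PUT /repos/{owner}/{repo}/topics` and `PATCH /repos/{owner}/{repo}`) without sending them. The helper names, the token placeholder, and the topic validation rule (lowercase letters, digits, and hyphens, at most 50 characters, at most 20 topics) are assumptions for illustration and should be checked against the current GitHub documentation.

```python
import json
import re
import urllib.request

API = "https://api.github.com"
REPO = "voidful/awesome-chatgpt-dataset"

# Topic list copied from action item #2; homepage from action item #3.
TOPICS = [
    "awesome", "chatgpt", "dataset", "gpt4", "instructions",
    "llm-datasets", "fine-tuning-data", "instruction-tuning",
    "conversational-ai-data",
]
HOMEPAGE = "https://github.com/voidful/awesome-chatgpt-dataset"

# Assumed constraint: lowercase alphanumerics and hyphens, 1 to 50 characters.
TOPIC_RE = re.compile(r"^[a-z0-9][a-z0-9-]{0,49}$")
MAX_TOPICS = 20  # assumed per-repository limit


def validate_topics(topics):
    """Return the topics unchanged, or raise ValueError if any look invalid."""
    bad = [t for t in topics if not TOPIC_RE.match(t)]
    if bad:
        raise ValueError(f"invalid topic names: {bad}")
    if len(topics) > MAX_TOPICS:
        raise ValueError(f"too many topics: {len(topics)} > {MAX_TOPICS}")
    return topics


def build_request(method, path, payload, token):
    """Build (but do not send) a JSON request against the GitHub REST API."""
    return urllib.request.Request(
        url=f"{API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def build_fix_requests(token):
    """Return the (topics, homepage) requests for action items #2 and #3."""
    topics_req = build_request(
        "PUT", f"/repos/{REPO}/topics", {"names": validate_topics(TOPICS)}, token
    )
    homepage_req = build_request(
        "PATCH", f"/repos/{REPO}", {"homepage": HOMEPAGE}, token
    )
    return topics_req, homepage_req


if __name__ == "__main__":
    # "<YOUR_TOKEN>" is a placeholder; nothing is sent over the network here.
    for req in build_fix_requests(token="<YOUR_TOKEN>"):
        print(req.get_method(), req.full_url, req.data.decode("utf-8"))
```

Passing each request to `urllib.request.urlopen` would send it. Note that the topics endpoint replaces the whole topic list, which is why the sketch sends the existing five topics together with the four new ones.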

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?

Recall
0 / 2 · 0% of queries surface voidful/awesome-chatgpt-dataset

Avg rank

—

Lower is better. #1 = top recommendation.

Share of voice

Of all named tools, what % are you?

Top rival
Alpaca (Stanford Alpaca) · Recommended in 1 of 2 queries

COMPETITOR LEADERBOARD

Alpaca (Stanford Alpaca) · recommended 1×
ShareGPT (OpenAssistant Conversations Dataset) · recommended 1×
Dolly 2.0 (Databricks Dolly-v2-12b) · recommended 1×
FLAN (Fine-tuned LAnguage Net) · recommended 1×
P3 (Public Pool of Prompts) · recommended 1×

CATEGORY QUERY
Where can I find diverse instruction datasets to fine-tune a large language model?
you: not recommended
AI recommended (in order):
1. Alpaca (Stanford Alpaca)
2. ShareGPT (OpenAssistant Conversations Dataset)
3. Dolly 2.0 (Databricks Dolly-v2-12b)
4. FLAN (Fine-tuned LAnguage Net)
5. P3 (Public Pool of Prompts)
6. Super-NaturalInstructions
7. LIMA (Less Is More for Alignment)
AI recommended 7 alternatives but never named voidful/awesome-chatgpt-dataset. This is the gap to close.
CATEGORY QUERY
What resources are available for collecting high-quality conversational data for AI training?
you: not recommended
AI recommended (in order):
1. OpenAI API
2. GPT-4
3. GPT-3.5
4. Amazon Mechanical Turk
5. Appen
6. Figure Eight
7. Scale AI
8. Hugging Face Datasets
9. DailyDialog
10. Persona-Chat
11. MultiWOZ
12. Common Voice
13. Kaggle
14. Reddit
15. Twitter
AI recommended 15 alternatives but never named voidful/awesome-chatgpt-dataset. This is the gap to close.
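
The recall, average rank, and share-of-voice figures can be reproduced from the two ranked lists above. The sketch below is an illustration under stated assumptions, not RepoGEO's actual scoring code: it assumes recall is the number of queries that name the repo, average rank is the mean 1-based position over those queries, and share of voice is the repo's mentions divided by all named tools. The function name `summarize` is hypothetical.

```python
from collections import Counter

TARGET = "voidful/awesome-chatgpt-dataset"

# Ranked recommendations copied from the two category queries in this report.
RESULTS = {
    "Where can I find diverse instruction datasets to fine-tune a large language model?": [
        "Alpaca (Stanford Alpaca)",
        "ShareGPT (OpenAssistant Conversations Dataset)",
        "Dolly 2.0 (Databricks Dolly-v2-12b)",
        "FLAN (Fine-tuned LAnguage Net)",
        "P3 (Public Pool of Prompts)",
        "Super-NaturalInstructions",
        "LIMA (Less Is More for Alignment)",
    ],
    "What resources are available for collecting high-quality conversational data for AI training?": [
        "OpenAI API", "GPT-4", "GPT-3.5", "Amazon Mechanical Turk", "Appen",
        "Figure Eight", "Scale AI", "Hugging Face Datasets", "DailyDialog",
        "Persona-Chat", "MultiWOZ", "Common Voice", "Kaggle", "Reddit", "Twitter",
    ],
}


def summarize(results, target):
    """Compute recall, average rank, share of voice, and rival counts."""
    # 1-based rank of the target in every query where it appears.
    ranks = [names.index(target) + 1 for names in results.values() if target in names]
    total_named = sum(len(names) for names in results.values())
    # Number of queries in which each other tool was recommended.
    rivals = Counter(
        name for names in results.values() for name in set(names) if name != target
    )
    return {
        "recall": (len(ranks), len(results)),
        "avg_rank": sum(ranks) / len(ranks) if ranks else None,
        "share_of_voice": len(ranks) / total_named if total_named else 0.0,
        "rivals": rivals,
    }


if __name__ == "__main__":
    summary = summarize(RESULTS, TARGET)
    print("recall:", summary["recall"])          # (0, 2)
    print("avg rank:", summary["avg_rank"])      # None, the repo was never ranked
    print("share of voice:", summary["share_of_voice"])  # 0.0
```

Under these assumptions the repo scores 0 of 2 on recall with no rank, which matches the figures reported above, and every rival is counted once because no tool appears in both answers.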

Objective checks

Rule-based audits of metadata signals AI engines weight most.

Metadata completeness · warn
Suggestion:
README presence · pass

Self-mention check

Does AI even know your repo exists when asked about it directly?

Compared to common alternatives in this category, what is the core differentiator of voidful/awesome-chatgpt-dataset?
pass
AI did not name voidful/awesome-chatgpt-dataset — likely talking about a different project
AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?
If a team adopts voidful/awesome-chatgpt-dataset in production, what risks or prerequisites should they evaluate first?
pass
AI named voidful/awesome-chatgpt-dataset explicitly
In one sentence, what problem does the repo voidful/awesome-chatgpt-dataset solve, and who is the primary audience?
pass
AI did not name voidful/awesome-chatgpt-dataset — likely talking about a different project

Embed your GEO score

Drop this badge into the README of voidful/awesome-chatgpt-dataset. It auto-updates whenever the report is rescanned and links back to the latest report, which makes it easy public proof that you care about AI discoverability.

MARKDOWN (README)

[![RepoGEO](https://repogeo.com/badge/voidful/awesome-chatgpt-dataset.svg)](https://repogeo.com/en/r/voidful/awesome-chatgpt-dataset)

HTML

<a href="https://repogeo.com/en/r/voidful/awesome-chatgpt-dataset"><img src="https://repogeo.com/badge/voidful/awesome-chatgpt-dataset.svg" alt="RepoGEO" /></a>
