RepoGEO

REPOGEO REPORT · LITE

chaoswork/sft_datasets

Default branch master · commit 1dde965b · scanned 6/11/2026, 8:42:51 AM

GitHub: 581 stars · 41 forks

AI VISIBILITY SCORE
35 / 100 · Critical

Category recall: 0 / 2 (not recommended in any query)
Rule findings: 1 pass · 1 warn · 0 fail (objective metadata checks)
AI knows your name: 3 / 3 (direct prompts that named your repo)

HOW TO READ THIS REPORT

Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface chaoswork/sft_datasets, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.

Action plan — copy-paste fixes

3 prioritized changes, generated by gemini-2.5-flash.

OVERALL DIRECTION
  • High priority · readme #1
    Add a clear introductory sentence to the README

    Why:

    CURRENT
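    # Open-Source SFT Dataset Collection
    COPY-PASTE FIX
    # Open-Source SFT Dataset Collection
    
    This is a carefully curated, continuously updated index of open-source Chinese SFT (supervised fine-tuning) datasets, intended to provide high-quality instruction-following and multi-turn dialogue datasets for training and researching large language models (LLMs).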
  • High priority · license #2
    Add a LICENSE file to the repository

    Why:

    COPY-PASTE FIX
    Create a `LICENSE` file in the repository root. For example, for the MIT License the file content would be:
    
    MIT License
    
    Copyright (c) [YEAR] [COPYRIGHT HOLDER]
    
    Permission is hereby granted, free of charge, to any person obtaining a copy
    of this software and associated documentation files (the "Software"), to deal
    in the Software without restriction, including without limitation the rights
    to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
    copies of the Software, and to permit persons to whom the Software is
    furnished to do so, subject to the following conditions:
    
    The above copyright notice and this permission notice shall be included in all
    copies or substantial portions of the Software.
    
    THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
    IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
    FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
    AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
    LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
    OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
    SOFTWARE.
    
    Replace `[YEAR]` and `[COPYRIGHT HOLDER]` with appropriate values.
  • Medium priority · homepage #3
    Add a homepage URL to the repository settings

    Why:

    COPY-PASTE FIX
    Add a relevant URL (e.g., a project page, a blog post, or the GitHub repo URL itself if no external page exists) to the 'Homepage' field in the repository settings.
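The LICENSE and homepage fixes can also be scripted. Below is a minimal sketch using only the Python standard library. The helper names `write_mit_license` and `build_homepage_request` are hypothetical, the copyright holder, homepage URL, and token are placeholders you supply, and the homepage request assumes GitHub's REST "update a repository" endpoint (`PATCH /repos/{owner}/{repo}`), which accepts a `homepage` field. The request is only built here, not sent.

```python
from __future__ import annotations

import json
import urllib.request
from datetime import date
from pathlib import Path

# Standard MIT License text; {year} and {holder} stand in for [YEAR] and [COPYRIGHT HOLDER].
MIT_TEMPLATE = """\
MIT License

Copyright (c) {year} {holder}

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
"""


def write_mit_license(repo_root: str, holder: str, year: int | None = None) -> Path:
    """Write LICENSE into repo_root, filling in the year (default: current year) and holder."""
    path = Path(repo_root) / "LICENSE"
    text = MIT_TEMPLATE.format(year=year or date.today().year, holder=holder)
    path.write_text(text, encoding="utf-8")
    return path


def build_homepage_request(
    owner: str, repo: str, homepage: str, token: str
) -> urllib.request.Request:
    """Build (but do not send) a PATCH request that sets the repo's Homepage field."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}",
        data=json.dumps({"homepage": homepage}).encode("utf-8"),
        method="PATCH",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen(req)` requires a token with permission to edit the repository's settings. The GitHub CLI offers the same change as `gh repo edit chaoswork/sft_datasets --homepage <URL>`.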

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?

The same questions are asked of every model, so answers and rankings can be compared across backends.

Recall: 0 / 2 (0% of queries surface chaoswork/sft_datasets)
Avg rank: not ranked (lower is better; #1 = top recommendation)
Share of voice: 0% (of all named tools, the percentage that are you)
Top rival: Hugging Face Datasets, recommended in 1 of 2 queries
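The report does not spell out how recall, average rank, and share of voice are computed. The sketch below is one plausible reading of the labels, assuming each query yields a ranked list of named tools; `visibility_metrics` is a hypothetical helper, not RepoGEO's implementation.

```python
from __future__ import annotations

from statistics import mean


def visibility_metrics(answers: list[list[str]], repo: str) -> dict[str, object]:
    """Score one repo against ranked answers (one list of named tools per query).

    Assumed definitions, inferred from the report's labels:
      recall         = (queries naming the repo, total queries)
      avg_rank       = mean 1-based position where the repo appears (None if never)
      share_of_voice = repo mentions / all tool mentions across answers
    """
    ranks = [names.index(repo) + 1 for names in answers if repo in names]
    total_mentions = sum(len(names) for names in answers)
    repo_mentions = sum(names.count(repo) for names in answers)
    return {
        "recall": (len(ranks), len(answers)),
        "avg_rank": mean(ranks) if ranks else None,
        "share_of_voice": repo_mentions / total_mentions if total_mentions else 0.0,
    }


# Toy data: "my/repo" is ranked 2nd in one answer and 1st in the other.
toy = [["Belle", "my/repo", "COIG"], ["my/repo"]]
print(visibility_metrics(toy, "my/repo"))
# {'recall': (2, 2), 'avg_rank': 1.5, 'share_of_voice': 0.5}
```

With this report's data (17 and 5 names, none of them chaoswork/sft_datasets) the same function gives a recall of 0 of 2, no average rank, and a 0% share of voice, matching the figures above.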
COMPETITOR LEADERBOARD
  1. Hugging Face Datasets · recommended 1×
  2. CLUE (Chinese Language Understanding Evaluation) Benchmark · recommended 1×
  3. C-Eval · recommended 1×
  4. COIG (Chinese Open Instruction Generalist) · recommended 1×
  5. Belle · recommended 1×
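The leaderboard can be reproduced by counting how many queries recommend each name. This sketch assumes exact-name matching and breaks ties by first appearance; both are assumptions, since the report does not document its method. The `leaderboard` helper is hypothetical; the two lists are copied from the category queries in this report.

```python
from collections import Counter

# Ranked recommendations copied from the two category queries in this report.
QUERY_1 = [
    "Hugging Face Datasets",
    "CLUE (Chinese Language Understanding Evaluation) Benchmark",
    "C-Eval",
    "COIG (Chinese Open Instruction Generalist)",
    "Belle", "WudaoCorpora", "OpenDataLab", "ChatGLM", "Baichuan", "Qwen",
    "ACL", "EMNLP", "NAACL", "COLING", "NLPCC", "CCL", "Kaggle",
]
QUERY_2 = [
    "Belle Datasets", "Firefly Datasets", "COIG Datasets",
    "MOSS-003-SFT Dataset", "PCL-Instruction",
]


def leaderboard(answers, top_n=5):
    """Return (name, queries recommending it) pairs, most frequent first."""
    tally = Counter()
    for names in answers:
        # Count each name at most once per query, keeping first-seen order.
        tally.update(list(dict.fromkeys(names)))
    # most_common() orders equal counts by first appearance, so ties keep answer order.
    return tally.most_common(top_n)


for rank, (name, count) in enumerate(leaderboard([QUERY_1, QUERY_2]), start=1):
    print(f"{rank}. {name} · recommended {count}×")
```

Under exact matching, "Belle" and "Belle Datasets" count as different tools, which appears consistent with Belle being credited only once above; a name-normalization step would merge them and change the ranking.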
  • CATEGORY QUERY
    Where can I find diverse Chinese datasets for supervised fine-tuning large language models?
    you: not recommended
    AI recommended (in order):
    1. Hugging Face Datasets
    2. CLUE (Chinese Language Understanding Evaluation) Benchmark
    3. C-Eval
    4. COIG (Chinese Open Instruction Generalist)
    5. Belle
    6. WudaoCorpora
    7. OpenDataLab
    8. ChatGLM
    9. Baichuan
    10. Qwen
    11. ACL
    12. EMNLP
    13. NAACL
    14. COLING
    15. NLPCC
    16. CCL
    17. Kaggle

    AI recommended 17 alternatives but never named chaoswork/sft_datasets. This is the gap to close.

  • CATEGORY QUERY
    What open-source collections provide Chinese instruction-following datasets for LLM training?
    you: not recommended
    AI recommended (in order):
    1. Belle Datasets
    2. Firefly Datasets
    3. COIG Datasets
    4. MOSS-003-SFT Dataset
    5. PCL-Instruction

    AI recommended 5 alternatives but never named chaoswork/sft_datasets. This is the gap to close.

Objective checks

Rule-based audits of metadata signals AI engines weight most.

  • Metadata completeness
    warn

    Suggestion:

  • README presence
    pass
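The metadata-completeness rule is not itemized in this report. As a rough sketch, a similar audit can be run over the repository object returned by GitHub's REST API (`GET https://api.github.com/repos/chaoswork/sft_datasets`), whose JSON includes `description`, `homepage`, `topics`, and `license`. The field set, the pass/warn rule, and the `metadata_completeness` helper are all hypothetical, and the example input is illustrative only, chosen to mirror the two gaps the action plan names.

```python
from __future__ import annotations

# Hypothetical field set; RepoGEO's real rule list is not shown in this report.
CHECKED_FIELDS = ("description", "homepage", "topics", "license")


def metadata_completeness(repo: dict) -> tuple[str, list[str]]:
    """Return ("pass" or "warn", missing_fields) for a repository JSON object."""
    # Empty strings, empty lists, and null values all count as missing.
    missing = [field for field in CHECKED_FIELDS if not repo.get(field)]
    return ("pass" if not missing else "warn"), missing


# Illustrative input, not the repo's real metadata.
example = {"description": "placeholder", "homepage": "", "topics": ["sft"], "license": None}
print(metadata_completeness(example))
# ('warn', ['homepage', 'license'])
```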

Self-mention check

Does AI even know your repo exists when asked about it directly?

  • Compared to common alternatives in this category, what is the core differentiator of chaoswork/sft_datasets?
    pass
    AI named chaoswork/sft_datasets explicitly

    AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?

  • If a team adopts chaoswork/sft_datasets in production, what risks or prerequisites should they evaluate first?
    pass
    AI named chaoswork/sft_datasets explicitly

  • In one sentence, what problem does the repo chaoswork/sft_datasets solve, and who is the primary audience?
    pass
    AI named chaoswork/sft_datasets explicitly

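Each self-mention item records whether an answer named the repo explicitly. A minimal sketch of such a check, assuming a strict, case-insensitive match on the full `owner/repo` slug (the report does not state its actual rule); `names_repo` is a hypothetical helper.

```python
import re


def names_repo(answer: str, owner: str, repo: str) -> bool:
    """True if `answer` names owner/repo explicitly, ignoring case and spaces around '/'."""
    pattern = re.compile(
        rf"\b{re.escape(owner)}\s*/\s*{re.escape(repo)}\b", re.IGNORECASE
    )
    return pattern.search(answer) is not None


print(names_repo("chaoswork/sft_datasets curates Chinese SFT data.", "chaoswork", "sft_datasets"))
```

Requiring the owner avoids counting answers that mention an unrelated repo that happens to share the name `sft_datasets`, and the trailing `\b` rejects longer names such as `sft_datasets_v2`. A looser rule would raise the pass rate but also the false positives.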

Embed your GEO score

Drop this badge into the README of chaoswork/sft_datasets. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.

MARKDOWN (README)
[![RepoGEO](https://repogeo.com/badge/chaoswork/sft_datasets.svg)](https://repogeo.com/en/r/chaoswork/sft_datasets)
HTML
<a href="https://repogeo.com/en/r/chaoswork/sft_datasets"><img src="https://repogeo.com/badge/chaoswork/sft_datasets.svg" alt="RepoGEO" /></a>
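Both snippets follow one URL pattern: the badge image lives at `https://repogeo.com/badge/<owner>/<repo>.svg` and links to the report at `https://repogeo.com/en/r/<owner>/<repo>`. The sketch below generates them for any repository, assuming that pattern holds beyond this one example; the `lang` path segment (only `en` appears here) and the `badge_snippets` helper are assumptions.

```python
from __future__ import annotations


def badge_snippets(owner: str, repo: str, lang: str = "en") -> dict[str, str]:
    """Build the Markdown and HTML badge embeds for a repository's report."""
    slug = f"{owner}/{repo}"
    badge = f"https://repogeo.com/badge/{slug}.svg"
    report = f"https://repogeo.com/{lang}/r/{slug}"
    return {
        "markdown": f"[![RepoGEO]({badge})]({report})",
        "html": f'<a href="{report}"><img src="{badge}" alt="RepoGEO" /></a>',
    }


snippets = badge_snippets("chaoswork", "sft_datasets")
print(snippets["markdown"])
print(snippets["html"])
```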
Pro

Subscribe to Pro for deep diagnoses

Lite scans stay free; Pro raises the deep-diagnosis limits for chaoswork/sft_datasets:

  • Deep reports: 10 / month
  • Brand-free category queries: 5 (vs 2 in Lite)
  • Prioritized action items: 8 (vs 3 in Lite)