RepoGEO

REPOGEO REPORT · LITE

darkrishabh/agent-skills-eval

Default branch main · commit b60eebe3 · scanned 6/8/2026, 9:32:14 PM

GitHub: 574 stars · 30 forks

AI VISIBILITY SCORE

  • Score: 40 / 100 (Critical)
  • Category recall: 0 / 2 (not recommended in any query)
  • Rule findings: 2 pass · 0 warn · 0 fail (objective metadata checks)
  • AI knows your name: 3 / 3 (direct prompts that named your repo)

HOW TO READ THIS REPORT

Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface darkrishabh/agent-skills-eval, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.

Action plan — copy-paste fixes

3 prioritized changes generated by gemini-2.5-flash. Mark items done after you ship the fix.

OVERALL DIRECTION
  • high · readme · #1
    Strengthen README's opening statement to emphasize LLM agent evaluation and comparison

    Why:

    CURRENT
    **A test runner for Agent Skills.**
    
    Write a `SKILL.md`, drop in some evals, and find out — empirically — whether your skill actually makes the model better at the task.
    COPY-PASTE FIX
    **An empirical evaluation and comparison tool for LLM agent skills.**
    
    Run your agent skills against prompts, compare performance with and without skills, and prove — empirically — whether your skill actually makes the model better at the task.
  • medium · topics · #2
    Add broader LLM evaluation and testing topics

    Why:

    CURRENT
    agent-evals, agent-skills, agentskills, ai-agents, cli, jsonl, llm-evals, llm-evaluation, openai-compatible, typescript, yaml
    COPY-PASTE FIX
    agent-evals, agent-skills, agentskills, ai-agents, ai-evaluation, cli, jsonl, llm-evals, llm-evaluation, llm-testing, openai-compatible, performance-evaluation, typescript, yaml
  • low · about · #3
    Refine repository description for clearer emphasis on empirical comparison

    Why:

    CURRENT
    A test runner for agentskills.io-style AI agent skills
    COPY-PASTE FIX
    An empirical test runner for comparing AI agent skills (agentskills.io-style)
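
The topic and description fixes above can also be applied programmatically. The following is a minimal sketch, not part of RepoGEO or of the repository's own tooling: it computes which topics the copy-paste fix adds and builds, without sending, requests for GitHub's REST endpoints `PUT /repos/{owner}/{repo}/topics` and `PATCH /repos/{owner}/{repo}`. The helper names `added_topics` and `build_request` and the `<YOUR_TOKEN>` placeholder are hypothetical.

```python
import json
import urllib.request

REPO = "darkrishabh/agent-skills-eval"
API = "https://api.github.com"

# Topic lists copied from the CURRENT and COPY-PASTE FIX entries of item #2.
current_topics = [
    "agent-evals", "agent-skills", "agentskills", "ai-agents", "cli", "jsonl",
    "llm-evals", "llm-evaluation", "openai-compatible", "typescript", "yaml",
]
proposed_topics = [
    "agent-evals", "agent-skills", "agentskills", "ai-agents", "ai-evaluation",
    "cli", "jsonl", "llm-evals", "llm-evaluation", "llm-testing",
    "openai-compatible", "performance-evaluation", "typescript", "yaml",
]
# Description copied from the COPY-PASTE FIX entry of item #3.
proposed_description = (
    "An empirical test runner for comparing AI agent skills (agentskills.io-style)"
)


def added_topics(current, proposed):
    """Return the topics in `proposed` that are missing from `current`, in order."""
    existing = set(current)
    return [topic for topic in proposed if topic not in existing]


def build_request(method, path, payload, token):
    """Build (but do not send) an authenticated GitHub REST request."""
    return urllib.request.Request(
        url=f"{API}{path}",
        data=json.dumps(payload).encode("utf-8"),
        method=method,
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# The topics endpoint replaces the whole list, so send every proposed topic,
# not only the newly added ones.
topics_request = build_request(
    "PUT", f"/repos/{REPO}/topics", {"names": proposed_topics}, token="<YOUR_TOKEN>"
)
description_request = build_request(
    "PATCH", f"/repos/{REPO}", {"description": proposed_description}, token="<YOUR_TOKEN>"
)

print(added_topics(current_topics, proposed_topics))
# To apply a change, pass a request to urllib.request.urlopen(...) with a real token.
```

Nothing is sent until a request is passed to `urllib.request.urlopen`, so the sketch is safe to run as a dry check of the payloads before touching the repository settings.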

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?

  • Recall: 0 / 2 (0% of queries surface darkrishabh/agent-skills-eval)
  • Avg rank: no value reported (lower is better; #1 = top recommendation)
  • Share of voice: 0% (of all named tools, what % are you?)
  • Top rival: LlamaIndex (recommended in 2 of 2 queries)

COMPETITOR LEADERBOARD
  1. LlamaIndex · recommended 2×
  2. Hugging Face Transformers · recommended 1×
  3. OpenAI API · recommended 1×
  4. Neo4j · recommended 1×
  5. RDFox · recommended 1×
  • CATEGORY QUERY
    How to empirically test if an AI agent's domain knowledge improves task performance?
    you: not recommended
    AI recommended (in order):
    1. Hugging Face Transformers
    2. OpenAI API
    3. Neo4j
    4. RDFox
    5. Stardog
    6. LangChain
    7. LlamaIndex
    8. Pinecone
    9. Weaviate
    10. CLIPS
    11. Drools
    12. Gensim
    13. SpaCy
    14. Protégé

    AI recommended 14 alternatives but never named darkrishabh/agent-skills-eval. This is the gap to close.
  • CATEGORY QUERY
    Tool for comparing LLM agent performance with and without specific contextual instructions?
    you: not recommended
    AI recommended (in order):
    1. LangSmith
    2. LlamaIndex
    3. Phoenix
    4. W&B Prompts
    5. Humanloop
    6. DeepEval
    7. OpenAI Evals
    8. Ragas

    AI recommended 8 alternatives but never named darkrishabh/agent-skills-eval. This is the gap to close.
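
The summary figures in this section can be reproduced from the two answer lists above. The sketch below assumes that recall is the fraction of queries whose answer names the repository, that share of voice is the repository's mentions divided by all tool mentions, and that the leaderboard counts how many queries named each tool. RepoGEO does not state its exact formulas, so these definitions are assumptions that happen to match the reported numbers; `names_repo` is a hypothetical helper.

```python
from collections import Counter

REPO = "darkrishabh/agent-skills-eval"

# Tools named by google/gemini-2.5-flash for each brand-free query, in order.
answers = {
    "How to empirically test if an AI agent's domain knowledge improves task performance?": [
        "Hugging Face Transformers", "OpenAI API", "Neo4j", "RDFox", "Stardog",
        "LangChain", "LlamaIndex", "Pinecone", "Weaviate", "CLIPS", "Drools",
        "Gensim", "SpaCy", "Protégé",
    ],
    "Tool for comparing LLM agent performance with and without specific contextual instructions?": [
        "LangSmith", "LlamaIndex", "Phoenix", "W&B Prompts", "Humanloop",
        "DeepEval", "OpenAI Evals", "Ragas",
    ],
}


def names_repo(names, repo=REPO):
    """True if any listed name contains the repository name (case-insensitive)."""
    repo_name = repo.split("/")[-1].lower()
    return any(repo_name in name.lower() for name in names)


# Recall: how many queries surfaced the repository at all.
hits = sum(names_repo(names) for names in answers.values())
recall = hits / len(answers)

# Share of voice: the repository's mentions over all tool mentions.
mentions = Counter(name for names in answers.values() for name in names)
total_mentions = sum(mentions.values())
own_mentions = sum(count for name, count in mentions.items() if names_repo([name]))
share_of_voice = own_mentions / total_mentions if total_mentions else 0.0

# Leaderboard: most-recommended tools; ties keep first-seen order.
leaderboard = mentions.most_common()[:5]

print(f"Recall: {hits} / {len(answers)} ({recall:.0%})")
print(f"Share of voice: {share_of_voice:.0%} of {total_mentions} mentions")
for rank, (name, count) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name} · recommended {count}×")
```

Because each tool appears at most once per answer, counting mentions is the same as counting the queries that recommended it, which is why LlamaIndex leads with 2.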

Objective checks

Rule-based audits of metadata signals AI engines weight most.

  • Metadata completeness
    pass

  • README presence
    pass

Self-mention check

Does AI even know your repo exists when asked about it directly?

  • Compared to common alternatives in this category, what is the core differentiator of darkrishabh/agent-skills-eval?
    pass
    AI named darkrishabh/agent-skills-eval explicitly

    AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?

  • If a team adopts darkrishabh/agent-skills-eval in production, what risks or prerequisites should they evaluate first?
    pass
    AI named darkrishabh/agent-skills-eval explicitly

  • In one sentence, what problem does the repo darkrishabh/agent-skills-eval solve, and who is the primary audience?
    pass
    AI named darkrishabh/agent-skills-eval explicitly

Embed your GEO score

Drop this badge into the README of darkrishabh/agent-skills-eval. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.

MARKDOWN (README)
[![RepoGEO](https://repogeo.com/badge/darkrishabh/agent-skills-eval.svg)](https://repogeo.com/en/r/darkrishabh/agent-skills-eval)
HTML
<a href="https://repogeo.com/en/r/darkrishabh/agent-skills-eval"><img src="https://repogeo.com/badge/darkrishabh/agent-skills-eval.svg" alt="RepoGEO" /></a>