RepoGEO

REPOGEO REPORT · LITE

songlab-cal/tape

Default branch master · commit 6d345c2b · scanned 6/2/2026, 7:32:40 AM

GitHub: 738 stars · 135 forks

AI VISIBILITY SCORE: 67/100 (Needs work)
Category recall: 1 / 2 (avg rank #3.0 when recommended)
Rule findings: 2 pass · 0 warn · 0 fail (objective metadata checks)
AI knows your name: 3 / 3 (direct prompts that named your repo)
HOW TO READ THIS REPORT

Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface songlab-cal/tape, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.

Action plan — copy-paste fixes

3 prioritized changes generated by gemini-2.5-flash. Mark items done after you ship the fix.

OVERALL DIRECTION
  • high · readme #1
    Reposition the README's opening to highlight datasets for training PLMs

    Why:

    CURRENT
    Data, weights, and code for running the TAPE benchmark on a trained protein embedding. We provide a pretraining corpus, five supervised downstream tasks, pretrained language model weights, and benchmarking code. This code has been updated to use pytorch - as such previous pretrained model weights and code will not work. The previous tensorflow TAPE repository is still available at https://github.com/songlab-cal/tape-neurips2019. This repository is *not* an effort to maintain maximum compatibility and reproducability with the original paper, but is instead meant to facilitate ease of use and future development (both for us, and for the community). Although we provide much of the same functionality, we have not tested every aspect of training on all models/downstream tasks, and we have also made some deliberate changes. Therefore, if your goal is to reproduce the results from our paper, please use the original code. Our paper is available at https://arxiv.org/abs/1906.08230. Some documentation is incomplete. We will try to fill it in over time, but if there is something you would like an explanation for, please open an issue so we know where to focus our effort! **Update 09/26/2020:** We no longer recommend trying to train directly with TAPE's training code. It will likely still work for some time, but will not be updated for future pytorch versions. Internally, we have been working with different frameworks for training (specifi
    COPY-PASTE FIX
    Tasks Assessing Protein Embeddings (TAPE) provides a foundational benchmark, a pretraining corpus, and five supervised downstream tasks, serving as essential resources for *training* and evaluating protein language models. This repository offers the data, weights, and code necessary to run the TAPE benchmark, facilitating the development of new protein embeddings. While the provided datasets and tasks remain highly valuable for community use and development, please note that we no longer recommend trying to train directly with TAPE's specific training code, as it is not actively updated for future PyTorch versions. For reproducing results from the original paper, please use the original TensorFlow repository at https://github.com/songlab-cal/tape-neurips2019.
  • medium · readme #2
    Add a dedicated 'Key Resources' section to the README

    Why:

    COPY-PASTE FIX
    ## Key Resources
    
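    - **Pretraining Corpus:** A large corpus of Pfam protein domains for training protein language models.
    - **Five Supervised Downstream Tasks:** Secondary structure prediction, contact prediction, remote homology detection, fluorescence landscape prediction, and stability landscape prediction, covering structure, evolutionary understanding, and protein engineering.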
    - **Pretrained Language Model Weights:** Weights for various protein language models.
    - **Benchmarking Code:** Tools and scripts for running the TAPE benchmark.
  • low · topics #3
    Add 'protein-language-models' to the topics list

    Why:

    CURRENT
    benchmark, dataset, deep-learning, language-modeling, protein-sequences, protein-structure, pytorch, semi-supervised-learning
    COPY-PASTE FIX
    benchmark, dataset, deep-learning, language-modeling, protein-language-models, protein-sequences, protein-structure, pytorch, semi-supervised-learning
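The topics change can also be applied through GitHub's REST API ("Replace all repository topics", `PUT /repos/{owner}/{repo}/topics` with a `names` array). That call replaces the whole list, so the existing topics have to be sent along with the new one. A minimal sketch using only the standard library; `merged_topics` and `replace_topics_request` are hypothetical helpers, the lowercase-and-hyphens check only approximates GitHub's topic rules, and sending the request needs a token with write access to the repository.

```python
import json
import re
import urllib.request

# Topics listed under CURRENT in action item #3.
CURRENT_TOPICS = [
    "benchmark", "dataset", "deep-learning", "language-modeling", "protein-sequences",
    "protein-structure", "pytorch", "semi-supervised-learning",
]
# Approximation (assumption) of GitHub's topic format: lowercase letters, digits, hyphens.
TOPIC_PATTERN = re.compile(r"[a-z0-9][a-z0-9-]*")


def merged_topics(current, additions):
    """Sorted union of current and new topics after a basic format check (hypothetical helper)."""
    topics = sorted(set(current) | set(additions))
    invalid = [t for t in topics if not TOPIC_PATTERN.fullmatch(t)]
    if invalid:
        raise ValueError(f"invalid topic names: {invalid}")
    return topics


def replace_topics_request(owner, repo, topics, token):
    """Build, but do not send, a PUT request that replaces the repo's full topic list."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/topics",
        data=json.dumps({"names": topics}).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="PUT",
    )


topics = merged_topics(CURRENT_TOPICS, ["protein-language-models"])
request = replace_topics_request("songlab-cal", "tape", topics, token="<YOUR_TOKEN>")
print(", ".join(topics))
# urllib.request.urlopen(request)  # uncomment to send; requires a real token
```

Sorting the union reproduces the COPY-PASTE FIX list exactly. From a terminal, the GitHub CLI's `gh repo edit --add-topic protein-language-models` is a shorter alternative that adds one topic without resending the rest.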

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?

The same questions are asked of every model, so answers and rankings can be compared across backends.

Recall: 1 / 2 (50% of queries surface songlab-cal/tape)
Avg rank: #3.0 (lower is better; #1 = top recommendation)
Share of voice: 3% (share of all named tools that are songlab-cal/tape)
Top rival: CASP (recommended in 2 of 2 queries)
COMPETITOR LEADERBOARD
  1. CASP · recommended 2×
  2. ESM-1b · recommended 2×
  3. CAFA · recommended 1×
  4. DeepFRI · recommended 1×
  5. ESM-2 · recommended 1×
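The recall, average rank, share of voice, and leaderboard figures can be recomputed from the two ranked answers below with simple counting. A minimal sketch, assuming recall is the fraction of queries that name the repo, average rank is taken only over queries that name it, and share of voice is the repo's mentions divided by every name across both answers; the report does not state RepoGEO's exact method. `visibility_metrics` and `leaderboard` are hypothetical helpers, and matching is on the label "TAPE", the name the AI used in its answer.

```python
from collections import Counter

# Ranked recommendations copied from the two category queries in this report.
QUERIES = {
    "How to evaluate the performance of different protein sequence embedding models?": [
        "CAFA", "DeepFRI", "TAPE", "ESM-2", "ProtT5-XL-UniRef50", "Ankh", "ProtTrans",
        "MSA Transformer", "SeqVec", "UniRep", "CASP", "STRING database", "BioGRID",
        "SCOP/CATH databases", "ProtBERT", "UMAP", "t-SNE", "ConSurf", "ESM-1b",
        "Captum", "TF-Explain",
    ],
    "Where can I find datasets and tasks for training protein language models?": [
        "Hugging Face Datasets", "ProteinNet", "AlphaFold DB", "UniProtKB/Swiss-Prot",
        "UniRef", "PDB", "ESM-1b", "CASP",
    ],
}


def visibility_metrics(queries, target):
    """Recall, average rank, and share of voice for `target` (hypothetical helper)."""
    # 1-based rank of the target in every answer that names it.
    ranks = [answer.index(target) + 1 for answer in queries.values() if target in answer]
    total_names = sum(len(answer) for answer in queries.values())
    mentions = sum(answer.count(target) for answer in queries.values())
    return {
        "recall": len(ranks) / len(queries),
        "avg_rank": sum(ranks) / len(ranks) if ranks else None,
        "share_of_voice": mentions / total_names,
    }


def leaderboard(queries, target, top_n=5):
    """Rivals ranked by how many answers name them (hypothetical helper)."""
    # Each name appears at most once per answer, so a count equals the number of queries.
    # Counter.most_common keeps first-seen order for equal counts.
    counts = Counter(name for answer in queries.values() for name in answer if name != target)
    return counts.most_common(top_n)


metrics = visibility_metrics(QUERIES, "TAPE")
print(metrics["recall"], metrics["avg_rank"], round(100 * metrics["share_of_voice"]))
print(leaderboard(QUERIES, "TAPE"))
```

Under these assumptions the sketch gives recall 0.5, average rank 3.0, and a share of voice of 1 in 29 names (about 3%), and the first-seen tie-breaking happens to reproduce the leaderboard order above.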
  • CATEGORY QUERY
    How to evaluate the performance of different protein sequence embedding models?
    you: #3
    AI recommended (in order):
    1. CAFA
    2. DeepFRI
    3. TAPE ← you
    4. ESM-2
    5. ProtT5-XL-UniRef50
    6. Ankh
    7. ProtTrans
    8. MSA Transformer
    9. SeqVec
    10. UniRep
    11. CASP
    12. STRING database
    13. BioGRID
    14. SCOP/CATH databases
    15. ProtBERT
    16. UMAP
    17. t-SNE
    18. ConSurf
    19. ESM-1b
    20. Captum
    21. TF-Explain
  • CATEGORY QUERY
    Where can I find datasets and tasks for training protein language models?
    you: not recommended
    AI recommended (in order):
    1. Hugging Face Datasets
    2. ProteinNet
    3. AlphaFold DB
    4. UniProtKB/Swiss-Prot
    5. UniRef
    6. PDB
    7. ESM-1b
    8. CASP

    AI recommended 8 alternatives but never named songlab-cal/tape. This is the gap to close.


Objective checks

Rule-based audits of metadata signals AI engines weight most.

  • Metadata completeness
    pass

  • README presence
    pass

Self-mention check

Does AI even know your repo exists when asked about it directly?
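The pass criterion "AI named songlab-cal/tape explicitly" can be approximated with a case-insensitive match on the full `owner/repo` slug. A minimal sketch, assuming a plain text match is enough (the report does not say how RepoGEO decides); `names_repo` is a hypothetical helper. The boundary checks stop the older TensorFlow repository, songlab-cal/tape-neurips2019, from counting as a mention, while a GitHub URL to the repo still counts.

```python
import re


def names_repo(answer: str, slug: str = "songlab-cal/tape") -> bool:
    """Return True if `answer` mentions `slug` as a whole repository name (hypothetical helper)."""
    # Lookbehind/lookahead reject matches glued to a longer name,
    # e.g. "songlab-cal/tape-neurips2019"; a preceding "/" (as in a URL) is allowed.
    pattern = re.compile(r"(?<![\w-])" + re.escape(slug) + r"(?![\w-])", re.IGNORECASE)
    return pattern.search(answer) is not None


print(names_repo("songlab-cal/tape bundles a benchmark and pretrained weights."))
print(names_repo("See github.com/songlab-cal/tape-neurips2019 instead."))
```

A substring match is cheap but says nothing about accuracy, which is why each answer still needs a human read.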

  • Compared to common alternatives in this category, what is the core differentiator of songlab-cal/tape?
    pass
    AI named songlab-cal/tape explicitly

    AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?

  • If a team adopts songlab-cal/tape in production, what risks or prerequisites should they evaluate first?
    pass
    AI named songlab-cal/tape explicitly

  • In one sentence, what problem does the repo songlab-cal/tape solve, and who is the primary audience?
    pass
    AI named songlab-cal/tape explicitly

Embed your GEO score

Drop this badge into the README of songlab-cal/tape. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.

MARKDOWN (README)
[![RepoGEO](https://repogeo.com/badge/songlab-cal/tape.svg)](https://repogeo.com/en/r/songlab-cal/tape)
HTML
<a href="https://repogeo.com/en/r/songlab-cal/tape"><img src="https://repogeo.com/badge/songlab-cal/tape.svg" alt="RepoGEO" /></a>
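Both snippets follow a visible URL pattern: the report lives at `https://repogeo.com/en/r/<owner>/<repo>` and the image at `https://repogeo.com/badge/<owner>/<repo>.svg`. A minimal sketch that builds both snippets and adds the Markdown one under a README's title, assuming the pattern holds for other repositories (it is inferred from this single report); `badge_snippets` and `add_badge` are hypothetical helpers.

```python
BASE_URL = "https://repogeo.com"  # taken from the snippets above


def badge_snippets(owner: str, repo: str) -> tuple[str, str]:
    """Markdown and HTML badge snippets for owner/repo (hypothetical helper)."""
    slug = f"{owner}/{repo}"
    report_url = f"{BASE_URL}/en/r/{slug}"
    image_url = f"{BASE_URL}/badge/{slug}.svg"
    markdown = f"[![RepoGEO]({image_url})]({report_url})"
    html = f'<a href="{report_url}"><img src="{image_url}" alt="RepoGEO" /></a>'
    return markdown, html


def add_badge(readme_text: str, owner: str, repo: str) -> str:
    """Insert the Markdown badge under the README title, once (hypothetical helper)."""
    markdown, _ = badge_snippets(owner, repo)
    if markdown in readme_text:  # already present: leave the README unchanged
        return readme_text
    lines = readme_text.splitlines(keepends=True)
    if lines and lines[0].startswith("#"):
        # Keep the title, then the badge as its own paragraph, then the rest.
        title = lines[0].rstrip("\n")
        rest = "".join(lines[1:]).lstrip("\n")
        return f"{title}\n\n{markdown}\n\n{rest}"
    return f"{markdown}\n\n{readme_text}"


md, html = badge_snippets("songlab-cal", "tape")
print(md)
print(add_badge("# TAPE\n\nText\n", "songlab-cal", "tape"))
```

Running `add_badge` twice leaves the README unchanged, because the badge is only added when the exact Markdown snippet is absent.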
Pro

Subscribe to Pro for deep diagnoses

songlab-cal/tape: Lite scans stay free; this card compares Pro's deep-report limits with Lite.

  • Deep reports: 10 / month
  • Brand-free category queries: 5 (vs 2 in Lite)
  • Prioritized action items: 8 (vs 3 in Lite)