REPOGEO REPORT · LITE

songlab-cal/tape

Default branch master · commit 6d345c2b · scanned 6/2/2026, 7:32:40 AM

GitHub: 738 stars · 135 forks

Scan history for this repo

2 ready scans.

AI VISIBILITY SCORE

67 /100

Needs work

Category recall

1 / 2

Avg rank #3.0 when recommended

Rule findings

2 pass · 0 warn · 0 fail

Objective metadata checks

AI knows your name

3 / 3

Direct prompts that named your repo

HOW TO READ THIS REPORT

Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface songlab-cal/tape, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.

Action plan — copy-paste fixes

3 prioritized changes generated by gemini-2.5-flash.

OVERALL DIRECTION

high · readme · #1

Reposition the README's opening to highlight datasets for training PLMs

Why:

CURRENT

Data, weights, and code for running the TAPE benchmark on a trained protein embedding. We provide a pretraining corpus, five supervised downstream tasks, pretrained language model weights, and benchmarking code. This code has been updated to use pytorch - as such previous pretrained model weights and code will not work. The previous tensorflow TAPE repository is still available at https://github.com/songlab-cal/tape-neurips2019. This repository is *not* an effort to maintain maximum compatibility and reproducability with the original paper, but is instead meant to facilitate ease of use and future development (both for us, and for the community). Although we provide much of the same functionality, we have not tested every aspect of training on all models/downstream tasks, and we have also made some deliberate changes. Therefore, if your goal is to reproduce the results from our paper, please use the original code. Our paper is available at https://arxiv.org/abs/1906.08230. Some documentation is incomplete. We will try to fill it in over time, but if there is something you would like an explanation for, please open an issue so we know where to focus our effort! **Update 09/26/2020:** We no longer recommend trying to train directly with TAPE's training code. It will likely still work for some time, but will not be updated for future pytorch versions. Internally, we have been working with different frameworks for training (specifi

COPY-PASTE FIX

Tasks Assessing Protein Embeddings (TAPE) provides a foundational benchmark, a pretraining corpus, and five supervised downstream tasks, serving as essential resources for *training* and evaluating protein language models. This repository offers the data, weights, and code necessary to run the TAPE benchmark, facilitating the development of new protein embeddings. While the provided datasets and tasks remain highly valuable for community use and development, please note that we no longer recommend trying to train directly with TAPE's specific training code, as it is not actively updated for future PyTorch versions. For reproducing results from the original paper, please use the original TensorFlow repository at https://github.com/songlab-cal/tape-neurips2019.

medium · readme · #2

Add a dedicated 'Key Resources' section to the README

Why:

COPY-PASTE FIX

## Key Resources

- **Pretraining Corpus:** A large corpus for training protein language models.
- **Five Supervised Downstream Tasks:** A set of biologically relevant tasks for evaluating protein embeddings across different domains.
- **Pretrained Language Model Weights:** Weights for various protein language models.
- **Benchmarking Code:** Tools and scripts for running the TAPE benchmark.

low · topics · #3

Add 'protein-language-models' to the topics list

Why:

CURRENT

benchmark, dataset, deep-learning, language-modeling, protein-sequences, protein-structure, pytorch, semi-supervised-learning

COPY-PASTE FIX

benchmark, dataset, deep-learning, language-modeling, protein-language-models, protein-sequences, protein-structure, pytorch, semi-supervised-learning
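The topic change above can also be applied programmatically. The following is a minimal sketch, assuming GitHub's REST endpoint for replacing repository topics (`PUT /repos/{owner}/{repo}/topics`) and a personal access token with permission to edit the repository. The helper names `merged_topics` and `build_topics_request` and the `<your-token>` placeholder are illustrative and not part of this report. The request is only built, not sent.

```python
import json
import urllib.request

# Topics currently set on the repository, as listed in this report.
CURRENT_TOPICS = [
    "benchmark", "dataset", "deep-learning", "language-modeling",
    "protein-sequences", "protein-structure", "pytorch",
    "semi-supervised-learning",
]


def merged_topics(current, new_topic):
    """Return the current topics plus new_topic, de-duplicated and sorted."""
    return sorted(set(current) | {new_topic})


def build_topics_request(owner, repo, topics, token):
    """Build (but do not send) a PUT request that replaces the repo's topics."""
    body = json.dumps({"names": topics}).encode("utf-8")
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/topics",
        data=body,
        method="PUT",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # hypothetical placeholder token
        },
    )


topics = merged_topics(CURRENT_TOPICS, "protein-language-models")
request = build_topics_request("songlab-cal", "tape", topics, token="<your-token>")
print(", ".join(topics))
# To apply the change, a maintainer would call urllib.request.urlopen(request).
```

Because the endpoint replaces the whole topic list, the sketch sends the merged list rather than only the new topic.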

Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash

Category visibility — the real GEO test

Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?


Recall

1 / 2

50% of queries surface songlab-cal/tape

Avg rank

#3.0

Lower is better. #1 = top recommendation.

Share of voice

Of all named tools, what % are you?

Top rival

CASP

Recommended in 2 of 2 queries

COMPETITOR LEADERBOARD

CASP · recommended 2×
ESM-1b · recommended 2×
CAFA · recommended 1×
DeepFRI · recommended 1×
ESM-2 · recommended 1×

CATEGORY QUERY
How to evaluate the performance of different protein sequence embedding models?
you: #3
AI recommended (in order):
1. CAFA
2. DeepFRI
3. TAPE ← you
4. ESM-2
5. ProtT5-XL-UniRef50
6. Ankh
7. ProtTrans
8. MSA Transformer
9. SeqVec
10. UniRep
11. CASP
12. STRING database
13. BioGRID
14. SCOP/CATH databases
15. ProtBERT
16. UMAP
17. t-SNE
18. ConSurf
19. ESM-1b
20. Captum
21. TF-Explain
CATEGORY QUERY
Where can I find datasets and tasks for training protein language models?
you: not recommended
AI recommended (in order):
1. Hugging Face Datasets
2. ProteinNet
3. AlphaFold DB
4. UniProtKB/Swiss-Prot
5. UniRef
6. PDB
7. ESM-1b
8. CASP
AI recommended 8 alternatives but never named songlab-cal/tape. This is the gap to close.
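
The recall, average-rank, and leaderboard figures in this section can be recomputed from the two ranked lists above. The sketch below is one plausible reading of those metrics, not the scanner's actual implementation: the variable names are illustrative, and the share-of-voice formula (the repo's mentions divided by all tool mentions across queries) is an assumption, since the report does not show that value.

```python
from collections import Counter

REPO = "TAPE"

# Ranked recommendations per brand-free query, copied from the lists above.
query_results = {
    "How to evaluate the performance of different protein sequence embedding models?": [
        "CAFA", "DeepFRI", "TAPE", "ESM-2", "ProtT5-XL-UniRef50", "Ankh",
        "ProtTrans", "MSA Transformer", "SeqVec", "UniRep", "CASP",
        "STRING database", "BioGRID", "SCOP/CATH databases", "ProtBERT",
        "UMAP", "t-SNE", "ConSurf", "ESM-1b", "Captum", "TF-Explain",
    ],
    "Where can I find datasets and tasks for training protein language models?": [
        "Hugging Face Datasets", "ProteinNet", "AlphaFold DB",
        "UniProtKB/Swiss-Prot", "UniRef", "PDB", "ESM-1b", "CASP",
    ],
}


def rank_of(name, ranked):
    """Return the 1-based position of name in ranked, or None if absent."""
    return ranked.index(name) + 1 if name in ranked else None


ranks = [rank_of(REPO, ranked) for ranked in query_results.values()]
hits = [r for r in ranks if r is not None]

# Recall: fraction of queries in which the repo was recommended at all.
recall = len(hits) / len(ranks)

# Average rank is taken only over the queries where the repo appeared.
avg_rank = sum(hits) / len(hits) if hits else None

# Leaderboard: how often each other tool was named across all queries.
rivals = Counter(
    name for ranked in query_results.values() for name in ranked if name != REPO
)

# Assumed share-of-voice formula: repo mentions over all tool mentions.
total_mentions = sum(len(ranked) for ranked in query_results.values())
share_of_voice = len(hits) / total_mentions

print(f"recall = {recall:.0%}, avg rank = #{avg_rank:.1f}")  # recall = 50%, avg rank = #3.0
print(rivals.most_common(2))
```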

Objective checks

Rule-based audits of metadata signals AI engines weight most.

Metadata completeness: pass
README presence: pass

Self-mention check

Does AI even know your repo exists when asked about it directly?

Compared to common alternatives in this category, what is the core differentiator of songlab-cal/tape?
pass
AI named songlab-cal/tape explicitly
AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?
If a team adopts songlab-cal/tape in production, what risks or prerequisites should they evaluate first?
pass
AI named songlab-cal/tape explicitly
In one sentence, what problem does the repo songlab-cal/tape solve, and who is the primary audience?
pass
AI named songlab-cal/tape explicitly

Embed your GEO score

Drop this badge into the README of songlab-cal/tape. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.

MARKDOWN (README)

[![RepoGEO](https://repogeo.com/badge/songlab-cal/tape.svg)](https://repogeo.com/en/r/songlab-cal/tape)

HTML

<a href="https://repogeo.com/en/r/songlab-cal/tape"><img src="https://repogeo.com/badge/songlab-cal/tape.svg" alt="RepoGEO" /></a>
