REPOGEO REPORT · LITE
songlab-cal/tape
Default branch master · commit 6d345c2b · scanned 6/2/2026, 7:32:40 AM
GitHub: 738 stars · 135 forks
Action plan is what to do next — copy-pasteable changes prioritized by impact. Category visibility is the real GEO test: when a user asks an AI a brand-free question that should surface songlab-cal/tape, does the AI actually recommend you — or your competitors? Objective checks verify the metadata signals AI engines weight first. Self-mention check detects whether AI even knows you exist by name.
Action plan — copy-paste fixes
3 prioritized changes generated by gemini-2.5-flash.
- #1 · high · readme · Reposition the README's opening to highlight datasets for training protein language models (PLMs)
CURRENT: Data, weights, and code for running the TAPE benchmark on a trained protein embedding. We provide a pretraining corpus, five supervised downstream tasks, pretrained language model weights, and benchmarking code. This code has been updated to use pytorch - as such previous pretrained model weights and code will not work. The previous tensorflow TAPE repository is still available at https://github.com/songlab-cal/tape-neurips2019. This repository is *not* an effort to maintain maximum compatibility and reproducability with the original paper, but is instead meant to facilitate ease of use and future development (both for us, and for the community). Although we provide much of the same functionality, we have not tested every aspect of training on all models/downstream tasks, and we have also made some deliberate changes. Therefore, if your goal is to reproduce the results from our paper, please use the original code. Our paper is available at https://arxiv.org/abs/1906.08230. Some documentation is incomplete. We will try to fill it in over time, but if there is something you would like an explanation for, please open an issue so we know where to focus our effort! **Update 09/26/2020:** We no longer recommend trying to train directly with TAPE's training code. It will likely still work for some time, but will not be updated for future pytorch versions. Internally, we have been working with different frameworks for training (specifi
COPY-PASTE FIX: Tasks Assessing Protein Embeddings (TAPE) provides a foundational benchmark, a pretraining corpus, and five supervised downstream tasks, serving as essential resources for *training* and evaluating protein language models. This repository offers the data, weights, and code necessary to run the TAPE benchmark, facilitating the development of new protein embeddings. While the provided datasets and tasks remain highly valuable for community use and development, please note that we no longer recommend trying to train directly with TAPE's specific training code, as it is not actively updated for future PyTorch versions. For reproducing results from the original paper, please use the original TensorFlow repository at https://github.com/songlab-cal/tape-neurips2019.
- #2 · medium · readme · Add a dedicated 'Key Resources' section to the README
COPY-PASTE FIX:
    ## Key Resources
    - **Pretraining Corpus:** A large corpus for training protein language models.
    - **Five Supervised Downstream Tasks:** A set of biologically relevant tasks for evaluating protein embeddings across different domains.
    - **Pretrained Language Model Weights:** Weights for various protein language models.
    - **Benchmarking Code:** Tools and scripts for running the TAPE benchmark.
- #3 · low · topics · Add 'protein-language-models' to the topics list
CURRENT: benchmark, dataset, deep-learning, language-modeling, protein-sequences, protein-structure, pytorch, semi-supervised-learning
COPY-PASTE FIX: benchmark, dataset, deep-learning, language-modeling, protein-language-models, protein-sequences, protein-structure, pytorch, semi-supervised-learning
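The topics change in item #3 amounts to a set union followed by a sort. A minimal Python sketch, assuming the topics are kept in alphabetical order as in the CURRENT list; `add_topic` is a hypothetical helper for illustration, not part of RepoGEO or TAPE:

```python
def add_topic(current: list[str], new_topic: str) -> list[str]:
    """Return the topic list with new_topic added, deduplicated and sorted."""
    return sorted(set(current) | {new_topic})


# Topics copied from the CURRENT line of action item #3.
current_topics = [
    "benchmark",
    "dataset",
    "deep-learning",
    "language-modeling",
    "protein-sequences",
    "protein-structure",
    "pytorch",
    "semi-supervised-learning",
]

updated_topics = add_topic(current_topics, "protein-language-models")
print(", ".join(updated_topics))
```

Because the helper deduplicates, running it again on the updated list leaves the list unchanged, so it is safe to apply more than once.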
Category GEO backends resolved for this scan: google/gemini-2.5-flash, deepseek/deepseek-v4-flash
Category visibility — the real GEO test
Brand-free queries asked to google/gemini-2.5-flash. Did AI recommend you, or someone else?
- CASP · recommended 2×
- ESM-1b · recommended 2×
- CAFA · recommended 1×
- DeepFRI · recommended 1×
- ESM-2 · recommended 1×
- CATEGORY QUERY: How to evaluate the performance of different protein sequence embedding models? · you: #3 · AI recommended (in order):
  - CAFA
  - DeepFRI
  - TAPE ← you
  - ESM-2
  - ProtT5-XL-UniRef50
  - Ankh
  - ProtTrans
  - MSA Transformer
  - SeqVec
  - UniRep
  - CASP
  - STRING database
  - BioGRID
  - SCOP/CATH databases
  - ProtBERT
  - UMAP
  - t-SNE
  - ConSurf
  - ESM-1b
  - Captum
  - TF-Explain
- CATEGORY QUERY: Where can I find datasets and tasks for training protein language models? · you: not recommended · AI recommended (in order):
  - Hugging Face Datasets
  - ProteinNet
  - AlphaFold DB
  - UniProtKB/Swiss-Prot
  - UniRef
  - PDB
  - ESM-1b
  - CASP
AI recommended 8 alternatives but never named songlab-cal/tape. This is the gap to close.
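The rank and tally figures in this section can be reproduced from the two recommendation lists. A minimal Python sketch, assuming the lists are transcribed exactly as they appear in this report; `rank_of` and the variable names are hypothetical and not part of RepoGEO:

```python
from collections import Counter

# Recommendations for the embedding-evaluation query, in the reported order.
evaluation_query = [
    "CAFA", "DeepFRI", "TAPE", "ESM-2", "ProtT5-XL-UniRef50", "Ankh",
    "ProtTrans", "MSA Transformer", "SeqVec", "UniRep", "CASP",
    "STRING database", "BioGRID", "SCOP/CATH databases", "ProtBERT",
    "UMAP", "t-SNE", "ConSurf", "ESM-1b", "Captum", "TF-Explain",
]

# Recommendations for the datasets-and-tasks query, in the reported order.
datasets_query = [
    "Hugging Face Datasets", "ProteinNet", "AlphaFold DB",
    "UniProtKB/Swiss-Prot", "UniRef", "PDB", "ESM-1b", "CASP",
]


def rank_of(name, recommendations):
    """Return the 1-based position of name, or None if it was not recommended."""
    try:
        return recommendations.index(name) + 1
    except ValueError:
        return None


# Count how often each alternative (everything except TAPE) was recommended.
mentions = Counter(
    item for item in evaluation_query + datasets_query if item != "TAPE"
)

print(rank_of("TAPE", evaluation_query))  # 3
print(rank_of("TAPE", datasets_query))    # None
print(mentions["CASP"], mentions["ESM-1b"])  # 2 2
```

Only CASP and ESM-1b appear in both lists, which is why they are the only alternatives with a count of 2.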
Objective checks
Rule-based audits of metadata signals AI engines weight most.
- Metadata completeness: pass
- README presence: pass
Self-mention check
Does AI even know your repo exists when asked about it directly?
- Compared to common alternatives in this category, what is the core differentiator of songlab-cal/tape? · pass · AI named songlab-cal/tape explicitly
AI answers can be confidently wrong. Read for accuracy: does it match your actual tech stack, audience, and differentiator?
- If a team adopts songlab-cal/tape in production, what risks or prerequisites should they evaluate first? · pass · AI named songlab-cal/tape explicitly
- In one sentence, what problem does the repo songlab-cal/tape solve, and who is the primary audience? · pass · AI named songlab-cal/tape explicitly
Embed your GEO score
Drop this badge into the README of songlab-cal/tape. It auto-updates whenever the report is rescanned and links back to the latest report — easy public proof that you care about AI discoverability.
<a href="https://repogeo.com/en/r/songlab-cal/tape"><img src="https://repogeo.com/badge/songlab-cal/tape.svg" alt="RepoGEO" /></a>