by allenai
Open source
SPECTER is a language model designed to generate scalable vector representations (embeddings) of documents by leveraging the relationships between scientific publications in their citation graph. Unlike traditional models, it produces relevant embeddings without requiring task-specific fine-tuning, making it particularly effective for analyzing academic texts. Its primary use cases include recommending articles, classifying documents, and retrieving information from scientific corpora. What sets it apart is its use of citation context as a training signal, which captures richer semantic relationships between documents and thereby improves embedding quality over conventional methods.
SPECTER is a pre-trained language model for generating document-level embeddings. It is pre-trained on a powerful signal of document-level relatedness: the citation graph. Unlike existing pretrained language models, SPECTER can be applied directly to downstream applications without task-specific fine-tuning.
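A minimal sketch of embedding papers with SPECTER through the Hugging Face `transformers` library, assuming the `allenai/specter` checkpoint: the title and abstract are concatenated with the tokenizer's separator token, and the `[CLS]` token's final hidden state serves as the document embedding. The example papers below are illustrative placeholders.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the SPECTER tokenizer and encoder (downloads the checkpoint on first use).
tokenizer = AutoTokenizer.from_pretrained("allenai/specter")
model = AutoModel.from_pretrained("allenai/specter")

# Illustrative papers; any title/abstract pairs work.
papers = [
    {"title": "BERT", "abstract": "We introduce a new language representation model."},
    {"title": "Attention Is All You Need", "abstract": "We propose the Transformer architecture."},
]

# Concatenate title and abstract with the separator token.
title_abs = [p["title"] + tokenizer.sep_token + p["abstract"] for p in papers]
inputs = tokenizer(title_abs, padding=True, truncation=True,
                   max_length=512, return_tensors="pt")

with torch.no_grad():
    output = model(**inputs)

# The [CLS] token's hidden state is the document-level embedding.
embeddings = output.last_hidden_state[:, 0, :]
print(embeddings.shape)  # one 768-dimensional vector per paper
```

Cosine similarity between these vectors can then drive recommendation or retrieval over a corpus.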
If you're coming here because you want to embed papers, SPECTER has now been superseded by SPECTER2. Use that instead.
Paper: SPECTER: Document-level Representation Learning using Citation-informed Transformers
Original Repo: Github
Evaluation Benchmark: SciDocs
Authors: Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld