by dmis-lab
TinySapBERT is a lightweight language model for representing biomedical entities. It is trained with the SapBERT approach and is suited to biomedical text-mining tasks such as entity linking and named entity recognition (NER). Its small size makes it easier to deploy while retaining strong accuracy on biomedical data, so it fits applications that need fast, reliable analysis of scientific or clinical text. TinySapBERT is part of the KAZU ecosystem and offers an accessible option for researchers and companies working in healthcare.
This model repository presents TinySapBERT, a tiny-sized biomedical entity representation model (language model) trained using the official SapBERT code and instructions (Liu et al., NAACL 2021).
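Entity representations like these are typically used for entity linking by nearest-neighbour search. Each dictionary name and each mention is encoded to a vector (SapBERT uses the encoder's [CLS] output as the name representation), and the mention is linked to the most similar dictionary entry. The minimal sketch below assumes the vectors already exist. The toy three-dimensional vectors and the `CONCEPT_A`/`CONCEPT_B` IDs are hypothetical placeholders, not real TinySapBERT outputs, and `link` is an illustrative helper rather than part of any library.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def link(mention_vec: Sequence[float],
         dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Return (concept_id, score) for the dictionary entry closest to the mention."""
    scored = ((cid, cosine(mention_vec, vec)) for cid, vec in dictionary.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical toy vectors standing in for [CLS] embeddings of dictionary names.
dictionary = {
    "CONCEPT_A": [0.9, 0.1, 0.0],
    "CONCEPT_B": [0.0, 0.2, 0.95],
}
mention = [0.85, 0.15, 0.05]  # hypothetical embedding of a mention in text

concept_id, score = link(mention, dictionary)
print(concept_id, round(score, 3))
```

With real dictionaries containing millions of names, the linear scan in `link` would normally be replaced by a batched matrix product or an approximate nearest-neighbour index.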
We used our TinyPubMedBERT, a tiny-sized LM, as the initial checkpoint for training with the SapBERT scheme.
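The SapBERT scheme trains the encoder so that synonymous names of the same UMLS concept get similar embeddings and names of different concepts get dissimilar ones. It does this with a multi-similarity loss. The minimal, unbatched pure-Python sketch below computes that loss over cosine similarities. It omits the online hard-pair mining used in practice. The defaults for `alpha`, `beta`, and `margin` are illustrative assumptions, not the settings used to train TinySapBERT.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def multi_similarity_loss(embeddings: List[Sequence[float]],
                          labels: List[str],
                          alpha: float = 2.0,
                          beta: float = 50.0,
                          margin: float = 0.5) -> float:
    """Mean multi-similarity loss over all anchors, without hard-pair mining.

    Names with the same label (concept ID) are positives for each other;
    names with different labels are negatives.
    """
    n = len(embeddings)
    total = 0.0
    for i in range(n):
        pos_sum = 0.0
        neg_sum = 0.0
        for j in range(n):
            if i == j:
                continue
            s = cosine(embeddings[i], embeddings[j])
            if labels[i] == labels[j]:
                # Positive pair: this term grows as the synonyms' similarity drops.
                pos_sum += math.exp(-alpha * (s - margin))
            else:
                # Negative pair: this term grows as unrelated names become similar.
                neg_sum += math.exp(beta * (s - margin))
        total += math.log1p(pos_sum) / alpha + math.log1p(neg_sum) / beta
    return total / n


# Hypothetical 2-D embeddings: the first two vectors are close, the third is far.
embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
loss_good = multi_similarity_loss(embeddings, ["C1", "C1", "C2"])  # synonyms close
loss_bad = multi_similarity_loss(embeddings, ["C1", "C2", "C1"])   # synonyms far apart
print(loss_good, loss_bad)
```

The labelling that places synonyms close together yields the lower loss. Minimising this loss over batches of synonym names is what makes nearest-neighbour linking work at inference time.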
Note: TinyPubMedBERT is a distilled version of PubMedBERT (Gu et al., 2021), open-sourced along with the release of the KAZU (Korea University and AstraZeneca) framework.
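As background on "distilled": in knowledge distillation, a small student model is trained to reproduce the outputs of a larger teacher. This card does not specify the distillation objective used for TinyPubMedBERT. The sketch below shows only the generic soft-target loss from Hinton et al. (2015): a temperature-softened softmax followed by a KL divergence, scaled by T². All logits are made up for illustration.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float) -> List[float]:
    """Temperature-scaled softmax; larger temperatures give softer distributions."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: Sequence[float],
                      student_logits: Sequence[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T**2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


teacher = [4.0, 1.0, -2.0]        # made-up teacher logits
close_student = [3.8, 1.1, -1.9]  # student that roughly agrees with the teacher
far_student = [-2.0, 1.0, 4.0]    # student that disagrees
print(distillation_loss(teacher, close_student), distillation_loss(teacher, far_student))
```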
Richard Jackson (AstraZeneca) and WonJin Yoon (Korea University) are joint first authors.
Please cite the simplified version below, or find the full citation information here:
@inproceedings{YoonAndJackson2022BiomedicalNER,
title="Biomedical {NER} for the Enterprise with Distillated {BERN}2 and the Kazu Framework",
author="Yoon, Wonjin and Jackson, Richard and Ford, Elliot and Poroshin, Vladimir and Kang, Jaewoo",
booktitle="Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track",
month = dec,
year = "2022",
address = "Abu Dhabi, UAE",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2022.emnlp-industry.63",
pages = "619--626",
}
This model uses resources from the SapBERT paper. We thank the authors for making these resources publicly available!
Liu, Fangyu, et al. "Self-Alignment Pretraining for Biomedical Entity Representations."
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.
For help or issues using the code or model (NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.