This model repository presents "TinySapBERT", a tiny-sized biomedical entity representation model (language model) trained using the official SapBERT code and instructions (Liu et al., NAACL 2021). We used our TinyPubMedBERT, a tiny-sized LM, as the initial checkpoint for training with the SapBERT scheme.
Note: TinyPubMedBERT is a distilled version of PubMedBERT (Gu et al., 2021), open-sourced along with the release of the KAZU (Korea University and AstraZeneca) framework.

For details, please visit the KAZU framework or see our paper "Biomedical NER for the Enterprise with Distillated BERN2 and the Kazu Framework" (EMNLP 2022 Industry Track).
For a demo of the KAZU framework, please visit http://kazu.korea.ac.kr
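As a rough illustration of how SapBERT-style entity representations are typically used, the sketch below encodes entity names as [CLS] vectors and links a query to its nearest dictionary concept. It is a minimal sketch under stated assumptions, not the authors' code: the Hugging Face model ID `dmis-lab/TinySapBERT-from-TinyPubMedBERT-v1.0` is assumed, `embed_names` and `link` are hypothetical helper names, `max_length=25` is an assumed limit sized for short entity names, and cosine similarity is one choice of metric (Euclidean distance between [CLS] vectors is also common). `embed_names` needs `torch` and `transformers` installed and downloads the model; the linking helpers run on plain Python lists.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed Hugging Face model ID; verify it against the actual repository name.
MODEL_ID = "dmis-lab/TinySapBERT-from-TinyPubMedBERT-v1.0"


def embed_names(names: List[str], model_id: str = MODEL_ID) -> List[List[float]]:
    """Hypothetical helper: encode entity names as [CLS] vectors.

    Requires `torch` and `transformers`; they are imported lazily so the
    similarity helpers below can be used without them.
    """
    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    model.eval()
    # max_length=25 is an assumption: entity names are short strings.
    toks = tokenizer(names, padding=True, truncation=True,
                     max_length=25, return_tensors="pt")
    with torch.no_grad():
        # Position 0 of the last hidden layer is the [CLS] token,
        # used here as the entity representation.
        cls = model(**toks).last_hidden_state[:, 0, :]
    return cls.tolist()


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def link(query: Sequence[float],
         dictionary: Dict[str, Sequence[float]]) -> Tuple[str, float]:
    """Hypothetical helper: return the concept ID whose embedding is
    most similar to the query, together with its similarity score."""
    best = max(dictionary, key=lambda cid: cosine(query, dictionary[cid]))
    return best, cosine(query, dictionary[best])
```

For example, with a hypothetical two-concept dictionary `{"C001": [1.0, 0.0], "C002": [0.0, 1.0]}`, `link([0.9, 0.1], ...)` returns a tuple whose first element is `"C001"`. In practice the dictionary vectors would come from `embed_names` applied to the synonyms of each concept.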

Citation info

Richard Jackson (AstraZeneca) and WonJin Yoon (Korea University) are joint first authors.
Please cite the simplified version below, or find the full citation information here.


@inproceedings{YoonAndJackson2022BiomedicalNER,
  title="Biomedical {NER} for the Enterprise with Distillated {BERN}2 and the Kazu Framework",
  author="Yoon, Wonjin and Jackson, Richard and Ford, Elliot and Poroshin, Vladimir and Kang, Jaewoo",
  booktitle="Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing: Industry Track",
  month = dec,
  year = "2022",    
  address = "Abu Dhabi, UAE",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2022.emnlp-industry.63",
  pages = "619--626",
}

This model uses resources from the SapBERT paper. We thank the authors for making these resources publicly available!


Liu, Fangyu, et al. "Self-Alignment Pretraining for Biomedical Entity Representations." 
Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2021.

Contact Information

For help or issues using the code or model (NER module of KAZU) in this repository, please contact WonJin Yoon (wonjin.info (at) gmail.com) or submit a GitHub issue.


TinySapBERT from TinyPubMedBERT v1.0