by vinai
BARTpho-syllable is a state-of-the-art Vietnamese language model based on the BART architecture, designed for generative natural language processing tasks. It is particularly strong at text generation and information synthesis, such as automatic summarization, where it outperforms existing models like mBART. This large-scale pre-trained monolingual model stands out for its ability to accurately understand and produce Vietnamese text. Its use cases include summary generation, text reformulation, and automated content creation. BARTpho-syllable is thus a high-performance option for applications requiring a deep understanding of the Vietnamese language.
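As a minimal sketch of how the model might be used, assuming the checkpoints are published on the Hugging Face Hub under the IDs `vinai/bartpho-syllable` and `vinai/bartpho-word` (an assumption following the usual `vinai/...` naming; check the homepage for the exact names), they can be loaded with the `transformers` library:

```python
def bartpho_model_id(variant: str) -> str:
    """Map a BARTpho variant name to its assumed Hugging Face Hub model ID."""
    if variant not in ("syllable", "word"):
        raise ValueError(f"unknown BARTpho variant: {variant!r}")
    return f"vinai/bartpho-{variant}"

def load_bartpho(variant: str = "syllable"):
    """Download and return a BARTpho checkpoint and its tokenizer.

    Requires `pip install transformers sentencepiece`; imported lazily so the
    pure helper above works even without transformers installed.
    """
    from transformers import AutoModel, AutoTokenizer

    model_id = bartpho_model_id(variant)
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModel.from_pretrained(model_id)
    return model, tokenizer
```

The returned objects follow the standard `transformers` encoder-decoder interface, so the tokenizer's output can be passed directly to the model's forward call.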
Two BARTpho versions, BARTpho-syllable and BARTpho-word, are the first public large-scale monolingual sequence-to-sequence models pre-trained for Vietnamese. BARTpho uses the "large" architecture and pre-training scheme of the sequence-to-sequence denoising model BART, making it especially suitable for generative NLP tasks. Experiments on the downstream task of Vietnamese text summarization show that, in both automatic and human evaluations, BARTpho outperforms the strong baseline mBART and improves the state of the art.
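To make the denoising pre-training scheme concrete, here is a toy illustration of BART-style text infilling, where short spans of tokens are replaced by a single mask token and the model must reconstruct the original sentence. This is an illustrative sketch only: the `<mask>` token, span lengths, and masking ratio below are simplified stand-ins (real BART draws span lengths from a Poisson distribution).

```python
import random

MASK = "<mask>"  # placeholder symbol for this sketch, not the model's actual mask token

def text_infilling(tokens, rng, mask_ratio=0.3, max_span=3):
    """Toy BART-style text infilling: replace spans of tokens with a single
    <mask>, so a model trained to denoise must also infer how many tokens
    are missing from each masked span."""
    noised = []
    i = 0
    budget = int(mask_ratio * len(tokens))  # roughly mask_ratio of tokens get masked
    while i < len(tokens):
        if budget > 0 and rng.random() < mask_ratio:
            span = min(budget, rng.randint(1, max_span))
            noised.append(MASK)       # one mask token stands in for the whole span
            i += span
            budget -= span
        else:
            noised.append(tokens[i])  # keep this token unchanged
            i += 1
    return noised

sentence = "Chúng tôi là những nghiên cứu viên".split()
print(text_infilling(sentence, random.Random(0), mask_ratio=0.5))
```

The denoising objective then trains the sequence-to-sequence model to map the noised token sequence back to the original sentence.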
The general architecture and experimental results of BARTpho can be found in our paper:
@article{bartpho,
    title   = {{BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese}},
    author  = {Nguyen Luong Tran and Duong Minh Le and Dat Quoc Nguyen},
    journal = {arXiv preprint},
    volume  = {arXiv:2109.09701},
    year    = {2021}
}
Please CITE our paper when BARTpho is used to help produce published results or incorporated into other software.
For further information or requests, please go to BARTpho's homepage!