by Vamsi
Open source · 96k downloads · 41 likes
The T5 Paraphrase Paws model rephrases English sentences while preserving their original meaning. It is built on the T5 architecture and was trained on the Google PAWS dataset, a collection of sentence pairs created for paraphrase identification. It generates natural, grammatical variations of an input sentence, which makes it useful for enriching content or avoiding repetition, with applications in natural language processing, data quality enhancement, and writing assistance. Because it was trained on closely related sentence pairs, it tends to keep the meaning intact while varying the formulation.
T5 model for generating paraphrases of English sentences. Trained on the Google PAWS dataset.
## Requires sentencepiece: pip install sentencepiece
PyTorch and TF models are available.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Use the GPU when one is available, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

tokenizer = AutoTokenizer.from_pretrained("Vamsi/T5_Paraphrase_Paws")
model = AutoModelForSeq2SeqLM.from_pretrained("Vamsi/T5_Paraphrase_Paws").to(device)

sentence = "This is something which i cannot understand at all"

# The model expects the "paraphrase: " task prefix; the tokenizer appends </s> itself.
text = "paraphrase: " + sentence

encoding = tokenizer(text, return_tensors="pt").to(device)

# Sample five candidates with top-k / top-p (nucleus) sampling.
# early_stopping is omitted because it only affects beam search.
outputs = model.generate(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    max_length=256,
    do_sample=True,
    top_k=120,
    top_p=0.95,
    num_return_sequences=5,
)

for output in outputs:
    line = tokenizer.decode(output, skip_special_tokens=True, clean_up_tokenization_spaces=True)
    print(line)
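Because the candidates are sampled, some of the five returned sequences may be identical to each other or may simply echo the input sentence. The following is a minimal post-processing sketch, not part of the model card or of the transformers library: the helper name `unique_paraphrases` and the example candidate strings are hypothetical, chosen only to illustrate the idea.

```python
def unique_paraphrases(source, candidates):
    """Return candidates in order, dropping echoes of the source and duplicates."""
    def normalize(text):
        # Compare case-insensitively, ignoring extra whitespace and trailing punctuation.
        return " ".join(text.lower().split()).rstrip(".!?")

    seen = {normalize(source)}
    kept = []
    for candidate in candidates:
        key = normalize(candidate)
        if key and key not in seen:
            seen.add(key)
            kept.append(candidate.strip())
    return kept


# Hypothetical decoded outputs, standing in for the lines printed by the loop above.
candidates = [
    "This is something I cannot understand at all.",
    "this is something which i cannot understand at all",
    "This is something I cannot understand at all.",
    "I cannot understand this at all.",
]
result = unique_paraphrases("This is something which i cannot understand at all", candidates)
for line in result:
    print(line)
```

The comparison is deliberately loose (case, whitespace, and trailing punctuation are ignored), so a candidate that differs from the input only in capitalization is treated as an echo. A stricter or looser rule may suit other use cases.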
For more on training your own T5 model or using this one, see Paraphrase Generation.