by google
*Reformer Crime and Punishment* is a language model trained on Dostoevsky's *Crime and Punishment* and specialized in generating literary text. It relies on an architecture optimized for processing long texts while keeping the output narratively and stylistically coherent. The model is geared toward prose in the author's style, with particular attention to psychological nuance and introspective dialogue. It is well suited to writing stories, dialogues, or stylistic studies that imitate Dostoevsky's world. What sets it apart is its ability to generate long, structured texts while staying faithful to the spirit of the original work.
Crime and Punishment is a novel written by Fyodor Dostoevsky and was translated into English.
Crime and Punishment training data was taken from gs://trax-ml/reformer/crime-and-punishment-2554.txt and contains
roughly 0.5M tokens.
The ReformerLM model was trained in flax using the Colab notebook provided by the authors: https://colab.research.google.com/github/google/trax/blob/master/trax/models/reformer/text_generation.ipynb. The weights were then converted to Hugging Face's PyTorch ReformerLM model, ReformerModelWithLMHead.
The model is a language model that operates on small sub-word units. Text can be generated as follows:
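from transformers import ReformerModelWithLMHead, ReformerTokenizer

model = ReformerModelWithLMHead.from_pretrained("google/reformer-crime-and-punishment")
tok = ReformerTokenizer.from_pretrained("google/reformer-crime-and-punishment")
# Encode the prompt, sample up to 100 tokens, then decode the first returned sequence
input_ids = tok.encode("A few months later", return_tensors="pt")
output_ids = model.generate(input_ids, do_sample=True, temperature=0.7, max_length=100)
tok.decode(output_ids[0])
# gives, for example (sampling is random, so results vary): 'A few months later on was more than anything in the flat.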
# “I have already.” “That’s not my notion that he had forgotten him.
# What does that matter? And why do you mean? It’s only another fellow,” he said as he went out, as though he want'
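The generation call above samples with `do_sample=True` and `temperature=0.7`. As a rough illustration of what a temperature below 1 does, here is a minimal pure-Python sketch of temperature-scaled sampling. It is not the library's internal code: the function names and the three-token `toy_logits` are hypothetical, chosen only to show how dividing the scores by the temperature sharpens the distribution before a token is drawn.

```python
import math
import random


def softmax_with_temperature(logits, temperature=0.7):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(logits, temperature=0.7, rng=random):
    """Draw one token index from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Hypothetical scores for a three-token vocabulary (not taken from the model)
toy_logits = [2.0, 1.0, 0.1]
sharp = softmax_with_temperature(toy_logits, temperature=0.7)
flat = softmax_with_temperature(toy_logits, temperature=1.0)
next_token = sample_token(toy_logits, temperature=0.7, rng=random.Random(0))
```

With a temperature of 0.7 the highest-scoring token receives more probability mass than it would at 1.0, which tends to make sampled text more conservative while still varying from run to run.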