

gpt2 mini

by erwanf

Open source · 251k downloads · 4 likes

0.9 (4 reviews) · Chat · API & Local
About

GPT-2 Mini is a lightweight version of the GPT-2 model, with only 39 million parameters. Pretrained on a subset of OpenWebText, it retains the ability to generate complex, coherent text while remaining accessible for quick experiments. Its small size makes it particularly well suited to research and teaching, allowing efficient tests even on limited hardware. It shares the same tokenizer as the original GPT-2, guaranteeing compatibility with existing tools. Its main strength is the balance it strikes between performance and simplicity, offering a practical way to explore the capabilities of large language models without requiring costly infrastructure.

Documentation

GPT-2 Mini

A smaller GPT-2 model with (only) 39M parameters. It was pretrained on a subset of OpenWebText, the open-source version of the pretraining dataset used by OpenAI for the original GPT-2 models.
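The 39M figure can be sanity-checked from the dimensions listed under Training Details below. A minimal arithmetic sketch, assuming the standard GPT-2 layout with tied input/output embeddings (so the LM head adds no extra weights):

```python
# Parameter count for a GPT-2-style decoder, using the card's dimensions.
vocab_size = 50_257
context_length = 512
n_layer = 4
hidden = 512
n_inner = 2048

token_emb = vocab_size * hidden           # wte (tied with the LM head)
pos_emb = context_length * hidden         # wpe

attn = hidden * 3 * hidden + 3 * hidden   # fused QKV projection (weights + bias)
attn += hidden * hidden + hidden          # attention output projection
mlp = hidden * n_inner + n_inner          # MLP up-projection
mlp += n_inner * hidden + hidden          # MLP down-projection
layer_norms = 2 * 2 * hidden              # two LayerNorms per block (scale + bias)
per_block = attn + mlp + layer_norms

final_ln = 2 * hidden                     # final LayerNorm

total = token_emb + pos_emb + n_layer * per_block + final_ln
print(f"{total:,}")  # 38,604,288, i.e. roughly 39M
```

Most of the budget sits in the token embedding (about 25.7M of the 38.6M parameters), which is typical at this scale.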

Uses

The purpose of this model is mainly research and education. Its small size allows fast experiments in resource-limited settings, while still being able to generate complex and coherent text.

Getting Started

Use the code below to get started with the model:

Python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model
model = AutoModelForCausalLM.from_pretrained("erwanf/gpt2-mini")
model.eval()

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("erwanf/gpt2-mini")

# Generate text
prompt = "Hello, I'm a language model,"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(input_ids, do_sample=True, max_length=50, num_return_sequences=5)
output_text = tokenizer.batch_decode(output, skip_special_tokens=True)
print(output_text)

Output:

["Hello, I'm a language model, I can't be more efficient in words.\n\nYou can use this as a point to find out the next bit in your system, and learn more about me.\n\nI think a lot of the",
 "Hello, I'm a language model, my teacher is a good teacher - a good school teacher – and one thing you have to remember:\n\nIt's not perfect. A school is not perfect; it isn't perfect at all!\n\n",
 'Hello, I\'m a language model, but if I can do something for you then go for it (for a word). Here is my blog, the language:\n\nI\'ve not used "normal" in English words, but I\'ve always',
 'Hello, I\'m a language model, I\'m talking to you the very first time I used a dictionary and it can be much better than one word in my dictionary. What would an "abnormal" English dictionary have to do with a dictionary and',
 'Hello, I\'m a language model, the most powerful representation of words and phrases in the language I\'m using."\n\nThe new rules change that makes it much harder for people to understand a language that does not have a native grammar (even with']
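The five completions differ because `do_sample=True` makes `generate` draw each next token from the model's probability distribution instead of always taking the argmax. A minimal pure-Python sketch of that sampling step, over hypothetical logit values (not actual model outputs); the temperature and top-k knobs shown here are standard options of sampling decoders, not settings from the snippet above:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, rng=random):
    """Sample a token index from raw logits, as sampling decoders do."""
    scaled = [l / temperature for l in logits]
    if top_k is not None:
        # Keep only the top_k highest logits; mask the rest out.
        cutoff = sorted(scaled, reverse=True)[top_k - 1]
        scaled = [s if s >= cutoff else float("-inf") for s in scaled]
    # Numerically stable softmax.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Inverse-CDF sampling.
    r = rng.random()
    acc = 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

# At a very low temperature the distribution collapses onto the argmax.
idx = sample_next_token([1.0, 5.0, 2.0], temperature=0.01)
```

At temperature 1.0 with no top-k, this reduces to plain ancestral sampling from the softmax of the logits.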

Training Details

The architecture follows the GPT-2 model, with smaller dimensions and fewer layers, and uses the same tokenizer as GPT-2. We used the first 2M rows of the OpenWebText dataset, of which 1k rows are held out for the test and validation sets.

Hyperparameters

Hyperparameter               Value
Model Parameters
  Vocabulary Size            50,257
  Context Length             512
  Number of Layers           4
  Hidden Size                512
  Number of Attention Heads  8
  Intermediate Size          2048
  Activation Function        GELU
  Dropout                    No
Training Parameters
  Learning Rate              5e-4
  Batch Size                 256
  Optimizer                  AdamW
  Beta1                      0.9
  Beta2                      0.98
  Weight Decay               0.1
  Training Steps             100,000
  Warmup Steps               4,000
  Learning Rate Scheduler    Cosine
  Training Dataset Size      1M samples
  Validation Dataset Size    1k samples
  Float Type                 bf16
Specifications

Category: Chat
Access: API & Local
License: Open Source
Pricing: Open Source
Rating: 0.9
