by JackFram
This model, inspired by LLaMA but reduced to just 68 million parameters, was trained on Wikipedia and on portions of the C4-en and C4-realnewslike datasets. It was designed primarily as a small speculative model for the SpecInfer research framework and is intended for exploratory use and feasibility testing. It has not yet undergone thorough evaluation; it can generate text on its own, but its quality remains unverified. Its main advantage is its lightweight design, which enables quick, resource-efficient experimentation. It is particularly well suited to researchers and developers who want to test new architectures or approaches without committing to larger models.
This is a LLaMA-like model with only 68M parameters trained on Wikipedia and part of the C4-en and C4-realnewslike datasets.
No evaluation has been conducted yet, so use it with care.
The model was developed mainly as the base small speculative model (SSM) in the SpecInfer paper.
To cite the model, please use:
@misc{miao2023specinfer,
      title={SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification},
      author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Zeyu Wang and Rae Ying Yee Wong and Zhuoming Chen and Daiyaan Arfeen and Reyna Abhyankar and Zhihao Jia},
      year={2023},
      eprint={2305.09781},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}