by JackFram
This model, inspired by LLaMA but reduced to just 68 million parameters, was trained on Wikipedia and on portions of the C4-en and C4-realnewslike datasets. It was designed primarily as a small speculative model for the SpecInfer research framework and is intended for exploratory use and feasibility testing. It has not yet undergone thorough evaluation; it can generate text on its own, but its quality remains unverified. Its main advantage is its lightweight design, which enables quick, resource-efficient experimentation. It is particularly well suited to researchers and developers who want to test new architectures or approaches without committing to larger models.
This is a LLaMA-like model with only 68M parameters trained on Wikipedia and part of the C4-en and C4-realnewslike datasets.
No evaluation has been conducted yet, so use it with care.
The model was developed mainly as the base small speculative model (SSM) in the SpecInfer paper.
To cite the model, please use:
@misc{miao2023specinfer,
      title={SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification},
      author={Xupeng Miao and Gabriele Oliaro and Zhihao Zhang and Xinhao Cheng and Zeyu Wang and Rae Ying Yee Wong and Zhuoming Chen and Daiyaan Arfeen and Reyna Abhyankar and Zhihao Jia},
      year={2023},
      eprint={2305.09781},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}