Llama 2 Inference Project

This repository contains a from-scratch implementation of Meta's Llama 2 language model for inference, built using only PyTorch and standard Python libraries.

Key model features:

  • Rotary Positional Embeddings (RoPE)
  • RMS Normalization (see the sketch after this list)
  • KV cache
  • Grouped Query Attention (GQA)
  • SwiGLU activation
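
As a quick illustration of one of these components, below is a minimal RMSNorm sketch in PyTorch. The class name and interface are illustrative, not necessarily those used in this codebase:

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    # Normalizes by the root mean square of the features (no mean subtraction),
    # then applies a learned per-feature scale, as in the Llama papers.
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```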

A range of inference methods is also under development, including greedy decoding, beam search, temperature scaling, random sampling, top-k, and top-p, giving users the flexibility to choose the approach that best fits their natural language processing (NLP) task. Currently, top-p (nucleus) sampling has been integrated into the codebase.
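
For reference, here is a minimal top-p sampling sketch in PyTorch. It is a generic illustration of the technique, assuming a 1-D `logits` tensor over the vocabulary; the function name and defaults are not taken from this codebase:

```python
import torch

def sample_top_p(logits: torch.Tensor, p: float = 0.9, temperature: float = 1.0) -> torch.Tensor:
    # Convert logits to probabilities, optionally reshaped by temperature.
    probs = torch.softmax(logits / temperature, dim=-1)
    sorted_probs, sorted_idx = torch.sort(probs, descending=True, dim=-1)
    cumulative = torch.cumsum(sorted_probs, dim=-1)
    # Zero out tokens once the cumulative mass *before* them already exceeds p,
    # so we keep the smallest set of tokens whose total mass is at least p.
    sorted_probs[cumulative - sorted_probs > p] = 0.0
    sorted_probs /= sorted_probs.sum(dim=-1, keepdim=True)
    # Sample within the truncated distribution, then map back to vocab indices.
    next_in_sorted = torch.multinomial(sorted_probs, num_samples=1)
    return torch.gather(sorted_idx, dim=-1, index=next_in_sorted)
```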

Installation

pip install -r requirements.txt

Model Weights

Request access to the weights through Meta's Llama 2 model repository, then fetch them with the download.sh script. Please refer to the official repository for the most up-to-date download instructions.

How to Run Inference

  1. Clone the repository:
git clone https://github.com/gkpodder/Llama2-implementation.git
  2. Download the model weights:
./download.sh
  3. Run inference:
python inference.py

Benchmarking

In my benchmarks, enabling the KV cache and GQA sped up inference by roughly 50% compared to running without them.
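
For context on how such a number might be measured, here is a hedged timing sketch; `generate` is a hypothetical stand-in for this repo's inference entry point, not its actual API:

```python
import time
import torch

def time_generation(generate, prompt: str, runs: int = 5) -> float:
    # Hypothetical harness: `generate` stands in for the actual inference call.
    generate(prompt)  # warm-up so one-time setup costs don't skew the timing
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # make sure queued GPU work has finished
    start = time.perf_counter()
    for _ in range(runs):
        generate(prompt)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / runs  # average seconds per run
```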

Contact

For questions or concerns, please open a GitHub issue.
