This repository contains sample notebooks that demonstrate how to evaluate an LLM-augmented system, along with tools and methods for running those evaluations locally.
- Ensure you've enabled Claude Sonnet and Claude Haiku in the Amazon Bedrock console
- Ensure you have adequate permissions to call Bedrock from the Python SDK (Boto3)

These notebooks were tested with Python 3.12; if you're running locally, make sure you're using 3.12. Also ensure that the AWS CLI is set up with the credentials you want on the default profile. These credentials need access to Amazon Bedrock models.
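Before starting the notebooks, you can sanity-check both pieces at once. A minimal sketch, assuming the default profile and an example Claude Haiku model ID (swap in any model you've actually enabled):

```python
# Sanity check: confirm the default profile resolves and can call Bedrock.
import boto3

session = boto3.Session()  # picks up the default AWS profile
print(session.client("sts").get_caller_identity()["Arn"])  # who am I?

bedrock = session.client("bedrock-runtime")
response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # example ID -- use a model you've enabled
    messages=[{"role": "user", "content": [{"text": "ping"}]}],
)
print(response["output"]["message"]["content"][0]["text"])
```

If both prints succeed, your credentials and Bedrock model access are in order.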
```
LLM-System-Validation/
├── data/               # RAG context and validation datasets
├── example-notebooks/  # Notebooks for evaluating various components
├── script/             # Various scripts for setting up the environment
└── .github/            # Example GitHub Actions
```
`data/`
: Contains the datasets used for Retrieval-Augmented Generation (RAG) context and validation.

`example-notebooks/`
: Jupyter notebooks demonstrating the evaluation of:
  - Embeddings and Chunking Strategy
  - Reranking with large chunk sizes
  - LLM-As-A-Judge Prompt Engineering (see the sketch after this list)
  - RAG Prompt Engineering
  - E2E RAG Testing
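To give a flavor of the LLM-as-a-judge pattern covered in the notebooks, here is a minimal sketch. The prompt, scoring scale, and model ID are illustrative assumptions, not the notebooks' actual code:

```python
# Illustrative LLM-as-a-judge call; the notebooks use their own prompts and rubrics.
import boto3

def judge_answer(question: str, answer: str) -> str:
    """Ask a Bedrock model to grade an answer's factual accuracy on a 1-5 scale."""
    prompt = (
        "Rate the following answer to the question for factual accuracy "
        "on a 1-5 scale. Reply with only the number.\n\n"
        f"Question: {question}\nAnswer: {answer}"
    )
    client = boto3.client("bedrock-runtime")
    response = client.converse(
        modelId="anthropic.claude-3-sonnet-20240229-v1:0",  # example model ID
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    return response["output"]["message"]["content"][0]["text"]
```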
- Clone the repository:

  ```bash
  git clone git@github.com:aws-samples/genai-system-evaluation.git
  cd genai-system-evaluation
  ```

- Set up a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the OpenSearch docs for RAG context (a quick peek at the result is sketched after this list):

  ```bash
  cd data && mkdir opensearch-docs && cd opensearch-docs
  git clone https://github.com/opensearch-project/documentation-website.git
  ```

- Go to the example notebooks and start Jupyter:

  ```bash
  cd ../../example-notebooks
  jupyter notebook
  ```

- Start at notebook 1 and work your way through them!
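For a quick peek at what the docs clone gives you before opening the notebooks, here is a minimal sketch; the path and fixed chunk size are assumptions, and the notebooks implement their own chunking strategies:

```python
# Peek at the cloned OpenSearch docs; the notebooks do their own chunking.
from pathlib import Path

# Path follows the clone step above; adjust if you cloned elsewhere.
DOCS_DIR = Path("data/opensearch-docs/documentation-website")

def naive_chunks(text: str, size: int = 1000) -> list[str]:
    """Split text into fixed-size character chunks -- the simplest baseline strategy."""
    return [text[i : i + size] for i in range(0, len(text), size)]

md_files = list(DOCS_DIR.rglob("*.md"))
print(f"{len(md_files)} markdown files found")
if md_files:
    sample = md_files[0].read_text(encoding="utf-8", errors="ignore")
    print(f"first file splits into {len(naive_chunks(sample))} chunks of ~1000 chars")
```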
- Explore the example notebooks in the `example-notebooks/` directory to understand different evaluation techniques.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.