A starter application that shows data collection, embedding creation and querying, and text completion with OpenAI.

initialcapacity/ai-starter

AI Starter

A starter application that shows a data collector architecture for retrieval augmented generation.

Technology stack

This codebase is written in Go and runs on Google's Cloud Run and Cloud Run Jobs. It uses Go's built-in server and templates, along with Azure's OpenAI Client. It stores data in PostgreSQL and uses pgvector to write and query embeddings. A GitHub Action runs tests, builds the apps, runs migrations, then deploys to Google Cloud.

Architecture

The AI Starter consists of three applications communicating with one Postgres database.

  1. The data collector is a background process that collects data from one or more sources.
  2. The data analyzer is another background process that processes collected data.
  3. The web application collects a query from the user and displays a result to the user.
flowchart LR
    embeddings([OpenAI embeddings])
    user((User))
    app["Web App (Cloud Run)"]
    db[("PostgreSQL + pgvector")]
    llm([OpenAI completion])
    
    user -- query --> app
    app -- create embedding --> embeddings
    app -- search embeddings --> db
    app -- retrieve documents --> db
    app -- fetch text completion --> llm

    classDef node font-weight:bold,color:white,stroke:black,stroke-width:2px;
    classDef app fill:#3185FC;
    classDef db fill:#B744B8;
    classDef external fill:#FA9F42;
    classDef user fill:#ED6A5A;

    class app,collector,analyzer app;
    class db db;
    class docs,embeddings,llm external;
    class user user;
flowchart LR
    embeddings([OpenAI embeddings])
    docs(["RSS feeds"])
    db[("PostgreSQL + pgvector")]
    collector["Data Collector (Cloud Run Job)"]
    analyzer["Data Analyzer (Cloud Run Job)"]
    
    collector -- fetch documents --> docs
    collector -- save documents --> db
    analyzer -- retrieve documents --> db
    analyzer -- create embeddings --> embeddings
    analyzer -- "save embeddings (with reference)" --> db

    classDef node font-weight:bold,color:white,stroke:black,stroke-width:2px;
    classDef app fill:#3185FC;
    classDef db fill:#B744B8;
    class app,collector,analyzer app;
    classDef external fill:#FA9F42;
    classDef user fill:#ED6A5A;

    class db db;
    class docs,embeddings external;
    class user user;

Collection and Analysis

The data collector fetches documents from RSS feed sources and stores the document text in the database. It also splits documents into chunks of fewer than 6000 tokens so that embedding and text completion calls stay below their token limits. The data analyzer sends the document chunks to the OpenAI Embeddings API and uses pgvector to store the resulting embeddings in PostgreSQL.
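The chunking step can be pictured with a minimal sketch. This is not the repository's implementation: the function name `splitIntoChunks` is hypothetical, and a token is approximated here by a whitespace-separated word, whereas a real tokenizer would count tokens differently.

```go
package main

import (
	"fmt"
	"strings"
)

// splitIntoChunks splits text into chunks of at most maxTokens "tokens".
// Assumption: a token is approximated by a whitespace-separated word.
// A real tokenizer (as used for OpenAI models) would give different counts.
func splitIntoChunks(text string, maxTokens int) []string {
	if maxTokens <= 0 {
		return nil
	}
	words := strings.Fields(text)
	var chunks []string
	for start := 0; start < len(words); start += maxTokens {
		end := start + maxTokens
		if end > len(words) {
			end = len(words)
		}
		// Each chunk is re-joined with single spaces; original whitespace is not preserved.
		chunks = append(chunks, strings.Join(words[start:end], " "))
	}
	return chunks
}

func main() {
	// A tiny limit is used here only to make the splitting visible.
	for i, chunk := range splitIntoChunks("one two three four five", 2) {
		fmt.Printf("chunk %d: %s\n", i, chunk)
	}
}
```

Each chunk would then be sent to the embeddings API separately and stored alongside a reference to the document it came from.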

Web Application

The web application collects the user's query and creates an embedding of it with the OpenAI Embeddings API. It then searches PostgreSQL for similar embeddings (using pgvector) and provides the corresponding chunk of text as context for a query to the OpenAI Chat Completion API.

Local development

  1. Install Go, PostgreSQL 15, and pgvector.

  2. Create and migrate the local databases.

    psql postgres < ./databases/create_databases.sql
    DATABASE_URL="user=starter password=starter database=starter_development host=localhost" go run ./cmd/migrate
    DATABASE_URL="user=starter password=starter database=starter_test host=localhost" go run ./cmd/migrate
  3. Copy the example environment file and fill in the necessary values.

    cp .env.example .env 
    source .env
  4. Run the collector and the analyzer to populate the database, then run the app and navigate to localhost:8778.

    go run ./cmd/collector
    go run ./cmd/analyzer
    go run ./cmd/app

Integration tests

The integration test script runs the collector and analyzer, then tests the app against the production OpenAI API.

source .env
go test ./cmd/integrationtest -count=1 -tags=integration

Evaluation

Run an evaluation against a populated database of articles and embeddings.

source .env
go run ./cmd/evaluator

View the results in a CSV file (scores.csv) or a Markdown file (scores.md).
