Stream of Consciousness

Mark Eschbach's random writings on various topics.

Retrieval-Augmented Generation

Categories: ai

Tags: retrieval-augmented-generation notes

Retrieval-Augmented Generation (RAG)

Reviewing what others are saying about RAG and how to use it. Mainly looking to improve how I externalize my knowledge to others.

What is Retrieval-Augmented Generation (RAG)?

The LLM is instructed to retrieve content from a dedicated corpus of documents containing primary-source data. Grounding the response in the retrieved data increases confidence in the answer.
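
A minimal sketch of the retrieve-augment-generate shape. The keyword-overlap retriever and stubbed LLM call are toy stand-ins (assumptions for illustration), not a real implementation:

    from typing import List

    # Toy corpus and stand-in functions; a real system would use a vector
    # store for retrieval and an actual LLM API for generation.
    CORPUS = [
        "RAG grounds answers in retrieved primary-source documents.",
        "Vector stores index documents as embeddings for semantic search.",
        "Prompts combine instructions, search results, and the question.",
    ]

    def retrieve(question: str, top_k: int = 2) -> List[str]:
        # Naive keyword overlap stands in for real semantic search.
        words = set(question.lower().split())
        ranked = sorted(CORPUS, key=lambda d: -len(words & set(d.lower().split())))
        return ranked[:top_k]

    def generate(prompt: str) -> str:
        # Stub for the LLM call; a real system sends the prompt to a model.
        return f"[answer grounded in a {len(prompt)}-char prompt]"

    def rag_answer(question: str) -> str:
        context = "\n".join(retrieve(question))
        prompt = f"Answer only from this context:\n{context}\n\nQuestion: {question}"
        return generate(prompt)

    print(rag_answer("How does semantic search work?"))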

RAG vs Fine-Tuning vs Prompt Engineering: Optimizing AI Models

Retrieval Augmented Generation

Query -> Model -> Corpus

Converts the queries and documents into vector embeddings.

  • Searches for semantically similar documents (a toy sketch of this search follows the list below).
  • Allows for updated information and domain-specific knowledge.
  • Adds additional latency to the model's response.
    • Vector embeddings can be expensive to compute.
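
A toy sketch of the embedding-and-search step, using a bag-of-words vector and cosine similarity in place of a real embedding model (an assumption for illustration):

    import math
    from collections import Counter

    def embed(text: str) -> Counter:
        # Toy bag-of-words "embedding"; real systems call an embedding model here.
        return Counter(text.lower().split())

    def cosine(a: Counter, b: Counter) -> float:
        # Cosine similarity between two sparse term-count vectors.
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
        return dot / norm if norm else 0.0

    documents = [
        "Chroma stores documents as vectors for semantic search.",
        "Fine-tuning changes model weights with supervised training.",
        "Prompt engineering needs no extra infrastructure.",
    ]
    doc_vectors = [embed(d) for d in documents]   # off-line: embed the corpus once

    query = embed("how are documents stored as vectors?")  # on-line: embed each query
    best = max(range(len(documents)), key=lambda i: cosine(query, doc_vectors[i]))
    print(documents[best])

Embedding the corpus happens once off-line; the per-query latency comes from embedding the query and scanning for the nearest vectors.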

Fine-Tuning

Takes an existing model and gives it a focused data set to develop expertise on. This changes the model weights, generally via backpropagation with supervised learning on domain-specific request/response pairs (a toy sketch of that loop follows the list below). Inference is very fast since the expertise is baked into the model weights.

Downsides:

  • Requires thousands of training data points.
  • Computational cost of training a new model.
  • Catastrophic forgetting: the model loses skills unrelated to the specialized training.
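
A toy sketch of the supervised backpropagation loop on domain-specific input/output pairs; the tiny linear model and random tensors stand in for a real pretrained LLM and data set (both are assumptions):

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)         # stand-in for a pretrained model
    requests = torch.randn(8, 4)    # encoded domain-specific requests
    responses = torch.randn(8, 2)   # encoded domain-specific responses

    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.MSELoss()

    for step in range(100):
        optimizer.zero_grad()
        loss = loss_fn(model(requests), responses)  # compare output to labels
        loss.backward()                             # backpropagate the error
        optimizer.step()                            # shift weights toward the domain

    print(f"final loss: {loss.item():.4f}")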

Prompt Engineering

Benefits:

  • No infrastructure required.
  • Immediate results.
  • Prompting is an art (a small template sketch follows the drawbacks list).

Drawbacks:

  • Limited to the model's existing knowledge.
  • Cannot add any new knowledge to the model.
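
A small sketch of the idea: all of the engineering lives in the prompt string itself, so no extra infrastructure is needed. The few-shot support-assistant template below is hypothetical:

    # The template text is where all the work happens; swap in any LLM client
    # to send the final string.
    TEMPLATE = """You are a support assistant. Answer in one sentence.

    Q: How do I reset my password?
    A: Use the "Forgot password" link on the sign-in page.

    Q: {question}
    A:"""

    prompt = TEMPLATE.format(question="How do I change my email address?")
    print(prompt)  # send this string to the model of your choice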

Summary

All three are generally used together.

LangChain RAG: Optimizing AI Models for Accurate Responses

Great chart to pull from at about the 2-minute mark.

Walks through how RAG works, specifically the two flows: on-line queries and off-line loading. They note the search space is specific to the target model.

Off-line loading

They use document loaders to pull multiple web pages into a LangChain Chroma database as a set of vectors. This allows queries to find semantically relevant documents.
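
A minimal sketch of this off-line flow, assuming the split LangChain packages (langchain_community, langchain_openai, langchain_text_splitters), an OPENAI_API_KEY in the environment, and placeholder URLs:

    from langchain_community.document_loaders import WebBaseLoader
    from langchain_community.vectorstores import Chroma
    from langchain_openai import OpenAIEmbeddings
    from langchain_text_splitters import RecursiveCharacterTextSplitter

    # Off-line: load pages, split into chunks, embed, and persist in Chroma.
    pages = WebBaseLoader(["https://example.com/post-1", "https://example.com/post-2"]).load()
    chunks = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100).split_documents(pages)
    vectorstore = Chroma.from_documents(chunks, OpenAIEmbeddings(), persist_directory="./rag-db")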

On-line queries

The prompt sent to the LLM is a combination of instructions + search results + question; the model then makes its inference based on this assembled context.
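
Continuing the off-line sketch above, the on-line flow might look like this; the gpt-4o-mini model name and the ./rag-db path are assumptions carried over from the previous sketch:

    from langchain_community.vectorstores import Chroma
    from langchain_openai import ChatOpenAI, OpenAIEmbeddings

    # Re-open the Chroma store persisted during off-line loading.
    vectorstore = Chroma(persist_directory="./rag-db", embedding_function=OpenAIEmbeddings())
    retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

    question = "What does the corpus say about vector stores?"
    context = "\n\n".join(d.page_content for d in retriever.invoke(question))

    prompt = (
        "Answer using only the context below.\n\n"  # instructions
        f"Context:\n{context}\n\n"                  # search results
        f"Question: {question}"                     # the user's question
    )
    answer = ChatOpenAI(model="gpt-4o-mini").invoke(prompt)
    print(answer.content)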