04 Jun, 2024 - About 3 minutes
LlamaIndex
Intro
LlamaIndex is a framework for building context-augmented LLM applications.
Context
Context Augmentation refers to any use case that applies LLMs on top of your private or domain-specific data. Some popular use cases include the following:
- Question-Answering Chatbots (commonly referred to as RAG systems, which stands for “Retrieval-Augmented Generation”)
- Document Understanding and Extraction
- Autonomous Agents that can perform research and take actions
LlamaIndex provides the tools to take any of the above use cases from prototype to production. These tools let you both ingest and process your data and implement complex query workflows that combine data access with LLM prompting.
Why Context Augmentation
LLMs offer a natural language interface between humans and data. Widely available models come pre-trained on huge amounts of publicly available data. However, they are not trained on your data, which may be private or specific to the problem you’re trying to solve. It’s behind APIs, in SQL databases, or trapped in PDFs and slide decks.
Getting Started
Download Data
This example uses the text of Paul Graham’s essay, “What I Worked On”.
Setup
Ollama is a tool to help you get set up with LLMs locally.
Check out my previous article on how to set this up.
NOTE: You will need a machine with at least 32GB of RAM.
PIP Packages
Install the required pip packages
pip3 install llama-index
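Depending on your LlamaIndex version, the HuggingFace embedding and Ollama LLM integrations used below may ship as separate packages (this is an assumption based on how LlamaIndex splits its integrations; skip it if the imports in starter.py already resolve):

pip3 install llama-index-llms-ollama llama-index-embeddings-huggingface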
Then create a data dir and add a file paul_graham_essay.txt with the content of Paul Graham's essay.
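From the shell, that step is just the following (how you obtain the essay text is up to you, for example copy-paste from the original post; the file name must match what the script loads):

mkdir -p data
# save the essay text as data/paul_graham_essay.txt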
Create the following starter.py file with this content:
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
You should have a structure similar to this one:

├── starter.py
└── data
    └── paul_graham_essay.txt
We use the BAAI/bge-base-en-v1.5 model through the HuggingFaceEmbedding class, and the Ollama LLM wrapper to load the Llama3 model. When you run the starter.py script, it builds an index over the documents in the data folder.
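Putting the description above together, a complete starter.py would look roughly like this sketch, modeled on the LlamaIndex local starter example (the llama3 model tag and the request timeout are assumptions; adjust them to whatever model you pulled with Ollama):

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader, Settings
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.llms.ollama import Ollama

# load the essay from the data folder
documents = SimpleDirectoryReader("data").load_data()

# local embedding model and local Llama3 served by Ollama
Settings.embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-base-en-v1.5")
Settings.llm = Ollama(model="llama3", request_timeout=360.0)  # assumed model tag and timeout

# build the vector index over the loaded documents
index = VectorStoreIndex.from_documents(documents)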
Query the Data
Let’s add the following lines to our script.
query_engine = index.as_query_engine()
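The question itself goes through the query engine; a minimal sketch, reusing the sample question from the LlamaIndex starter example:

response = query_engine.query("What did the author do growing up?")
print(response)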
This creates an engine for Q&A over your index and asks a simple question. You should get back a response similar to the following:
According to the given context, before college, the author worked on writing and programming.
Next Steps
The execution of this code on my local toaster without a GPU takes roughly 5m41.258s. One way we might try to reduce this is by incorporating LLMLingua into the chain. I will probably write a separate article about it, but in a nutshell it is a token compressor that reduces the size of your prompt without losing context.
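As a rough idea of what that involves, here is a minimal standalone sketch using the PromptCompressor class from the llmlingua package, not yet wired into the LlamaIndex query path; the sample context, the question, and the target_token budget are assumptions:

from llmlingua import PromptCompressor

# stand-in for the long retrieved context you would normally send to the LLM
long_context = "Before college the two main things I worked on were writing and programming..."

compressor = PromptCompressor()  # downloads a compression model on first use
result = compressor.compress_prompt(
    long_context,
    question="What did the author do growing up?",
    target_token=300,  # assumed token budget for the compressed prompt
)
print(result["compressed_prompt"])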
Conclusion
In this article I went through the process of testing LlamaIndex, a framework for building context-augmented LLM applications, using a data file containing Paul Graham's essay and leveraging the BAAI/bge-base-en-v1.5 embedding model together with Llama3 (via Ollama) to answer a question over this new context.
References
- https://github.com/run-llama/llama_index
- https://docs.llamaindex.ai/en/latest
- https://www.youtube.com/watch?v=wjZofJX0v4M&list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3pi
- https://docs.llamaindex.ai/en/latest/getting_started/concepts
- https://docs.llamaindex.ai/en/latest/getting_started/starter_example_local
- https://llmlingua.com