Configuration
Colette provides two main interfaces for interaction:
JSON Interface: Accessible via Python for direct integration.
HTTP Interface: Usable with curl or any programming language that supports HTTP requests.
Colette operates as an application that is configured through a JSON file, allowing users to define its behavior before sending queries.
Pre-configured RAG systems
Colette comes with several pre-configured RAG systems, each tailored for specific use cases. These systems are designed to simplify the setup process and provide a quick start for users.
They can be used via the Colette CLI:
Indexing
To run the indexing phase:
colette_cli index [OPTIONS]
Options:
* --app-dir PATH Specify the application directory [default: None] [required]
* --data-dir PATH Specify the data directory [default: None] [required]
--models-dir PATH Specify the models directory [default: None]
--config-file PATH Specify the config file [default: None]
--index-file PATH [default: None]
--help Show this message and exit.
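For example, using the placeholder paths from the configuration shown further down this page (the config file path is likewise a placeholder):

colette_cli index --app-dir /path/to/rag --data-dir /path/to/data --config-file /path/to/config.json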
Querying
To start a chat with the application:
colette_cli chat [OPTIONS]
Options:
* --app-dir PATH Specify the application directory [default: None] [required]
* --msg TEXT Specify the user message [default: None] [required]
--models-dir PATH Specify the models directory [default: None]
--help Show this message and exit.
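For example, reusing the question from the API examples at the end of this page:

colette_cli chat --app-dir /path/to/rag --msg "Quels sont les principes de gestion des débris spatiaux ?"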
Configuring a RAG-based System
Colette supports Retrieval-Augmented Generation (RAG), enabling the ingestion and retrieval of documents to enhance answer generation. Below is an overview of the RAG pipeline used by Colette:
Overview of the RAG Process
The diagram above illustrates the five key steps involved in processing a query with the RAG pipeline:
Encoding
Documents (e.g., PDFs, docx, …) are converted into embeddings using an embedding model.
When a user submits a query, it is also encoded into an embedding.
Indexing & Similarity Search
Encoded document embeddings are stored in a vector database (e.g., ChromaDB).
When a query embedding is received, the system searches for similar documents in the vector database.
Retrieving Relevant Chunks
The most relevant document chunks are retrieved from the database.
Generating a Prompt
The retrieved chunks are formatted into a structured prompt that provides context to the language model.
Generating the Response
The prompt is passed to the LLM, which generates a final response based on the retrieved context.
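As a toy illustration of these five steps, the snippet below strings them together. The bag-of-words "embedding", in-memory index, and prompt-only output are stand-ins for illustration, not Colette's actual components:

def embed(text):
    # stand-in embedding: a bag of lowercase tokens instead of a neural model
    return set(text.lower().split())

documents = [
    "La limitation des débris spatiaux commence dès la conception des missions.",
    "Les satellites en fin de vie sont désorbités ou placés sur une orbite cimetière.",
]
index = [(embed(doc), doc) for doc in documents]  # encoding + indexing

def answer(question, template_prompt, k=1):
    query = embed(question)  # encode the query
    ranked = sorted(index, key=lambda item: len(query & item[0]), reverse=True)
    chunks = [doc for _, doc in ranked[:k]]  # retrieve the most relevant chunks
    # build the prompt; a real system would now send it to the LLM
    return template_prompt.format(context=" ".join(chunks), question=question)

print(answer("Comment gérer les débris spatiaux ?", "Contexte: {context} Question: {question}"))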
Configuration Structure
The configuration file consists of three main sections:
app section: Defines the application repository and logging level.
parameters section: Contains sub-sections related to data input, preprocessing, retrieval settings, and prompt templates.
llm section: Specifies the LLM model and inference settings.
Below is a detailed breakdown of each section.
1. Application Settings (app)
{
  "app": {
    "repository": "/path/to/rag",
    "verbose": "info"
  }
}
This section controls Colette’s internal configurations, including:
repository: The directory where Colette will store its internal files, configurations, and data.
verbose: The logging level (e.g., info, debug).
2. Parameters (parameters)
The parameters section defines how data is processed, indexed, and retrieved before being passed to the LLM for answer generation.
2.1 Input Settings (input)
"input": {
"preprocessing": {
"files": ["all"],
"lib": "unstructured",
"save_output": false,
"filters": ["\/~[^\/]*$"]
},
"rag": {
"indexdb_lib": "chromadb",
"embedding_lib": "huggingface",
"embedding_model": "intfloat/multilingual-e5-small",
"gpu_id": 0,
"search": true,
"reindex": false
},
"template": {
"template_prompt": "Tu es un assistant expert dans le management des systèmes spatiaux et orbitaux. Réponds en francais en utilisant les informations du contexte qui suit. Contexte: {context}. Question: {question}. Réponse: ",
"template_prompt_variables": ["context", "question"]
},
"data": ["/path/to/data/"]
}
Preprocessing (preprocessing)
This section controls how Colette filters and processes files before they are indexed in the RAG system:
files: Determines which files to include (["all"] means all files are considered).
lib: Specifies the library used for preprocessing (unstructured for document parsing).
save_output: Boolean flag (false here) indicating whether to store preprocessed files.
filters: Regex patterns to exclude files based on their names; see the sketch below.
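The pattern "\/~[^\/]*$" from the example configuration decodes to the regex /~[^/]*$, which excludes files whose basename starts with ~ (e.g. Office lock files such as ~$report.docx). Colette applies the filters internally; the short sketch below only demonstrates the pattern's effect:

import re

filters = [r"/~[^/]*$"]  # same pattern as in the config, without JSON escaping

paths = [
    "/path/to/data/report.docx",
    "/path/to/data/~$report.docx",  # temporary lock file, should be excluded
]

kept = [p for p in paths if not any(re.search(f, p) for f in filters)]
print(kept)  # ['/path/to/data/report.docx']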
RAG Configuration (rag)
Controls settings for retrieval and vector database indexing:
indexdb_lib: The vector database used (chromadb).
embedding_lib: The library used to compute text embeddings (huggingface).
embedding_model: The embedding model (intfloat/multilingual-e5-small); see the sketch after this list.
gpu_id: Specifies which GPU to use (0 means the first available GPU).
search: If true, retrieval is enabled.
reindex: If false, existing indexes are used instead of re-processing everything.
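Colette computes these embeddings through its huggingface backend. As a standalone illustration of what the configured model produces, the sketch below calls it directly with the sentence-transformers library; that library choice is an assumption for this sketch, not necessarily Colette's code path:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("intfloat/multilingual-e5-small")
# E5 models expect a "query: " or "passage: " prefix on their input text.
embedding = model.encode("query: Quels sont les principes de gestion des débris spatiaux ?")
print(embedding.shape)  # e.g. (384,) for multilingual-e5-small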
Prompt Template (template)
Defines the system prompt used when querying the LLM:
template_prompt: The structured prompt instructing the assistant to respond as an expert in space systems management.
template_prompt_variables: The placeholders used in the prompt (context, question); see the example below.
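At query time the placeholders are substituted with the retrieved context and the user question before the prompt is sent to the LLM. The minimal sketch below uses plain Python string formatting to show the substitution; it is an illustration, not Colette's internal mechanism:

template_prompt = (
    "Tu es un assistant expert dans le management des systèmes spatiaux et orbitaux. "
    "Réponds en francais en utilisant les informations du contexte qui suit. "
    "Contexte: {context}. Question: {question}. Réponse: "
)

prompt = template_prompt.format(
    context="<chunks retrieved from the vector database>",
    question="Quels sont les principes de gestion des débris spatiaux ?",
)
print(prompt)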
Data Sources (data)
Defines where Colette retrieves documents for processing:
data: The directory containing documents (/path/to/data/).
3. LLM Configuration (llm)
"llm": {
"source": "llama3.1:latest",
"inference": {
"lib": "ollama"
}
}
This section defines the language model used for answering queries:
source: The LLM model version (llama3.1:latest).
inference.lib: The inference library (ollama).
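Putting the three sections together, a complete configuration file could look like the outline below, with the input fragment filled in exactly as shown above. The top-level layout follows the section list in Configuration Structure (app, parameters, llm side by side); verify the exact nesting against your Colette version.

{
  "app": {
    "repository": "/path/to/rag",
    "verbose": "info"
  },
  "parameters": {
    "input": { ... }
  },
  "llm": {
    "source": "llama3.1:latest",
    "inference": {
      "lib": "ollama"
    }
  }
}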
How Colette Processes RAG Queries
Preprocessing: Colette filters and processes input documents based on the defined rules.
Indexing: The documents are embedded using the huggingface embedding model (intfloat/multilingual-e5-small) and stored in chromadb.
Retrieval: When a user submits a query, Colette searches the document embeddings to retrieve relevant context.
Prompt Construction: The retrieved context is inserted into the template_prompt.
LLM Query: The constructed prompt is sent to llama3.1:latest using ollama for inference.
Response Generation: The model generates an answer using both the retrieved documents and its own knowledge.
Example API Usage
Querying via JSON Interface
import requests

url = "http://localhost:1873/predict"
data = {
    "question": "Quels sont les principes de gestion des débris spatiaux ?"
}

response = requests.post(url, json=data)
print(response.json())
Querying via cURL
curl -X POST http://localhost:1873/predict -H "Content-Type: application/json" -d '{
"question": "Quels sont les principes de gestion des débris spatiaux ?"
}'