Building a Private RAG with Proton Lumo
Retrieval-Augmented Generation (RAG) has quickly moved from research labs into practical use cases — powering smarter knowledge bases, improving compliance reporting, and enabling teams to answer complex questions over private data.
With Proton’s Lumo API, launched in early 2025 as part of the Proton Plus product suite, it’s now possible to build a private RAG pipeline that aligns with strong privacy guarantees.
This post explains what Lumo is, why RAG matters, and how you can wire up a private RAG workflow using Lumo’s API.
What is Proton Lumo?
Proton introduced Lumo in 2025 as its end-to-end encrypted AI assistant. It extends the Proton ecosystem — Mail, Drive, VPN, and Calendar — with private, model-driven capabilities.
Unlike most commercial AI APIs, Proton designed Lumo around zero-access encryption; Proton’s servers never store or log your raw prompts or data. For security and compliance leaders, that design choice is significant — it reduces exposure in ways most “AI-as-a-service” platforms do not.
Lumo runs on a routing layer that automatically selects the most suitable underlying model (for example, OpenHands 32B for code-heavy queries or Mistral Small 3 for general text). For enterprises in the EU, it’s also notable that Lumo operates entirely within Proton’s European data centers, simplifying GDPR data-residency concerns.
What is RAG?
At its core, Retrieval-Augmented Generation is a simple pattern:
- Retrieve: Query a knowledge base or vector store to find the most relevant chunks of information.
- Augment: Insert those chunks into the prompt sent to the language model.
- Generate: Let the model craft an answer, grounded in the retrieved data.
This approach improves factual accuracy, reduces hallucination risk, and makes AI usable for organization-specific content — policies, contracts, audit evidence, or customer FAQs.
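To make the three steps concrete, here is a toy sketch in Python. It is only an illustration: the word-overlap retriever stands in for a real vector search, and `llm` is a hypothetical callable standing in for whatever model endpoint you use.

```python
from typing import Callable, List


def retrieve(query: str, corpus: List[str], k: int = 2) -> List[str]:
    """Toy retriever: rank chunks by how many words they share with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda chunk: len(query_words & set(chunk.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def augment(query: str, chunks: List[str]) -> str:
    """Put the retrieved chunks in front of the question."""
    context = "\n".join(chunks)
    return f"Context:\n{context}\n---\nQuestion: {query}"


def generate(prompt: str, llm: Callable[[str], str]) -> str:
    """Hand the augmented prompt to the model (any callable taking a string)."""
    return llm(prompt)


def answer(query: str, corpus: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Run retrieve, augment, and generate in order."""
    return generate(augment(query, retrieve(query, corpus, k)), llm)
```

In a real pipeline only `retrieve` and `llm` change; the shape of the flow stays the same.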
Why Build a Private RAG?
For security and compliance leaders, a private RAG offers several benefits:
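- Control of Data: Your full document corpus stays in your own encrypted storage and vector database; only the text you embed and the chunks retrieved for a given question are sent to the model.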
- Auditability: Retrieved chunks and citations can be logged for evidence or compliance.
- Customization: You decide how documents are chunked, embedded, and retrieved.
- Privacy by Design: Using Proton’s Lumo API keeps prompts and context end-to-end encrypted.
- Regulatory Alignment: EU-hosted infrastructure and transparent data handling support compliance with GDPR and similar obligations.
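As an illustration of the auditability point, the sketch below builds one log line per answered question. The record layout, the field names, and the `audit_record` helper are assumptions made for this example, not part of any Lumo feature. It stores hashes rather than raw text so the audit log does not become a second copy of sensitive material.

```python
import hashlib
import json
from datetime import datetime, timezone


def _sha256(text: str) -> str:
    """Hex digest used to fingerprint text without storing it."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def audit_record(query: str, chunks: list[dict], answer: str) -> str:
    """Return one JSON line recording which chunks backed an answer.

    Each chunk is assumed to be a dict with "id" and "text" keys.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "query_sha256": _sha256(query),
        "sources": [
            {"id": chunk["id"], "sha256": _sha256(chunk["text"])}
            for chunk in chunks
        ],
        "answer_sha256": _sha256(answer),
    }
    return json.dumps(record, sort_keys=True)
```

Appending each line to a write-once log gives reviewers a trail from every answer back to the exact chunk versions that produced it.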
How to Build a Private RAG with Lumo
Here’s a step-by-step overview using Lumo’s API.
1. Set up your vector store
Choose a store such as Pinecone, Qdrant, Weaviate, or PostgreSQL + pgvector. This will hold the embeddings of your documents.
2. Embed your documents
Split each document into chunks, then send each chunk to the embeddings endpoint:

    curl -X POST https://api.lumo.proton.me/v1/embeddings \
      -H "Authorization: Bearer YOUR_LUMO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "lumo-embed",
        "input": "Your document text…"
      }'
The returned vector can be upserted into your chosen store.
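A rough Python sketch of this step follows. The chunk size and overlap are arbitrary starting values, and `chunk_text` and `build_embedding_request` are hypothetical helper names. The URL, model name, and request fields are copied from the curl example above; the response format is not shown in this post, so the sketch stops at building the request.

```python
import json
import urllib.request

EMBEDDINGS_URL = "https://api.lumo.proton.me/v1/embeddings"  # as given in this post


def chunk_text(text: str, size: int = 800, overlap: int = 100) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap`."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


def build_embedding_request(chunk: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one chunk."""
    body = json.dumps({"model": "lumo-embed", "input": chunk}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the request to `urllib.request.urlopen` would send it. Overlapping windows are a simple default; splitting on headings or paragraphs often retrieves better for structured documents such as policies and contracts.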
3. Retrieve relevant chunks
When a user asks a question, perform a similarity search in your vector store and select the top-k results (usually 3–5 snippets).
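Most vector stores run this search for you, but it helps to see what it computes. The sketch below does a brute-force cosine-similarity search over an in-memory dictionary; the `store` layout (chunk ID mapped to vector) and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float],
    store: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[str, float]]:
    """Return the k (chunk_id, score) pairs most similar to the query vector."""
    scored = [
        (chunk_id, cosine_similarity(query_vec, vec))
        for chunk_id, vec in store.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

A dedicated store replaces this linear scan with an approximate index, which matters once the corpus grows beyond a few thousand chunks.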
4. Assemble the augmented prompt
Place the retrieved chunks ahead of the question in a simple template:

    Context:
    {chunk_1}
    {chunk_2}
    {chunk_3}
    ---
    Question: {user_query}
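One way to fill this template in code is sketched below. It also trims the context to a token budget, since the context window is limited. The four-characters-per-token estimate and the 6,000-token default are rough assumptions, not Lumo figures; a real tokenizer would be more accurate.

```python
def estimate_tokens(text: str) -> int:
    """Crude estimate (assumption): roughly four characters per token."""
    return max(1, len(text) // 4)


def assemble_prompt(
    question: str,
    chunks: list[str],
    max_context_tokens: int = 6000,
) -> str:
    """Fill the template with as many chunks as fit within the budget.

    `chunks` is assumed to be ordered best match first.
    """
    selected = []
    used = 0
    for chunk in chunks:
        cost = estimate_tokens(chunk)
        if used + cost > max_context_tokens:
            break  # stop at the first chunk that would exceed the budget
        selected.append(chunk)
        used += cost
    context = "\n".join(selected)
    return f"Context:\n{context}\n---\nQuestion: {question}"
```

Because chunks are added best-first, anything dropped for lack of space is the least relevant material.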
5. Call the Lumo chat endpoint
Send the assembled prompt as the user message:

    curl -X POST https://api.lumo.proton.me/v1/chat/completions \
      -H "Authorization: Bearer YOUR_LUMO_API_KEY" \
      -H "Content-Type: application/json" \
      -d '{
        "model": "lumo-plus",
        "messages": [
          {"role": "system", "content": "You are a helpful assistant that answers using the provided context."},
          {"role": "user", "content": "[assembled prompt here]"}
        ],
        "max_tokens": 1024,
        "temperature": 0.2,
        "response_format": "citation"
      }'
The system message and low temperature steer the model toward your data rather than guaranteeing it, and the citation response format asks for citations alongside the answer.
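The same call can be sketched in Python. The request fields mirror the curl example above. The response format is not shown in this post, so `extract_answer` assumes an OpenAI-style body (`choices[0].message.content`); treat that shape as a guess to verify against a real response.

```python
import json
import urllib.request

CHAT_URL = "https://api.lumo.proton.me/v1/chat/completions"  # as given in this post
SYSTEM_PROMPT = "You are a helpful assistant that answers using the provided context."


def build_chat_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the chat request for an assembled prompt."""
    payload = {
        "model": "lumo-plus",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": prompt},
        ],
        "max_tokens": 1024,
        "temperature": 0.2,
        "response_format": "citation",
    }
    return urllib.request.Request(
        CHAT_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_answer(response_body: bytes) -> str:
    """ASSUMPTION: OpenAI-style response shape; this post does not show one."""
    data = json.loads(response_body)
    return data["choices"][0]["message"]["content"]
```

Keeping request construction separate from sending makes the payload easy to log for audit purposes and to unit-test without network access.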
6. Post-process as needed
Depending on your use case, you might:
- Store Q&A pairs in your knowledge base.
- Enforce tone or style (for example, executive summary vs. technical detail).
- Chain calls to summarize long documents within token limits.
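The last item, chaining calls for long documents, is commonly done by summarizing pieces and then summarizing the summaries. Here is a minimal sketch, assuming a hypothetical `summarize` callable that wraps your chat call and a character limit standing in for the real token limit.

```python
from typing import Callable


def summarize_long(
    text: str,
    summarize: Callable[[str], str],
    max_chars: int = 12000,
) -> str:
    """Summarize text of any length by recursively summarizing pieces.

    `summarize` is any callable that returns a shorter version of its input.
    """
    if len(text) <= max_chars:
        return summarize(text)
    # Map: summarize each fixed-size piece independently.
    pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    partials = [summarize(piece) for piece in pieces]
    combined = "\n".join(partials)
    if len(combined) >= len(text):
        # Guard against a summarizer that does not shrink its input.
        raise ValueError("summaries are not shorter than the input")
    # Reduce: summarize the joined partial summaries, recursing if needed.
    return summarize_long(combined, summarize, max_chars)
```

Each level of recursion loses some detail, so for audit evidence it is usually safer to retrieve specific chunks than to rely on a summary of a summary.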
Optional: Lumo’s Built-In Web Search
For prototyping or lightweight projects, you can skip the vector store and let Lumo use its built-in web_search tool. This acts as a managed RAG service:
- Lumo queries Brave Search.
- It embeds the top results.
- It generates an answer with citations.
Handy for rapid tests, though less controllable than a private store.
Things to Keep in Mind
| Consideration | Why it Matters |
|---|---|
| Rate limits | 60 requests / min on Lumo Plus; higher with Visionary plans. |
| Token limits | 8K-token context per request; chain calls for larger docs. |
| Costs | Subscription + token usage billing. |
| Compliance | Encrypted, EU-hosted, GDPR-aligned. |
| Model routing | Automatic — Lumo selects the best model for each query. |
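To stay under a per-minute request limit like the one in the table, a client-side throttle is usually enough. The sliding-window limiter below is one possible sketch; the `RateLimiter` class is a hypothetical helper, and the default of 60 calls per 60 seconds simply mirrors the figure quoted above.

```python
import time
from collections import deque


class RateLimiter:
    """Allow at most `max_calls` within any rolling window of `period` seconds."""

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock  # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of recent calls, oldest first

    def wait(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Sleep until the oldest recorded call leaves the window.
            self.sleep(self.period - (now - self.calls[0]))
```

Calling `limiter.wait()` immediately before each API request keeps a batch embedding job within the limit without any server-side coordination.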
Closing Thoughts
RAG isn’t just a technical pattern — it’s a governance tool. It gives leaders confidence that AI systems are grounded in authoritative data and remain auditable. Proton’s Lumo makes this accessible while preserving privacy and regulatory alignment.
If you’re already using Proton services, Lumo is a natural way to extend that ecosystem into secure, AI-driven knowledge workflows. And if you’re exploring AI more generally, it’s a strong reminder that privacy-first design is possible — and worth insisting on.