Building a Private RAG with Proton Lumo

Retrieval-Augmented Generation (RAG) has quickly moved from research labs into practical use cases — powering smarter knowledge bases, improving compliance reporting, and enabling teams to answer complex questions over private data.

With Proton’s Lumo API, launched in 2025, it’s now possible to build a private RAG pipeline backed by strong privacy guarantees.

This post explains what Lumo is, why RAG matters, and how you can wire up a private RAG workflow using Lumo’s API.


What is Proton Lumo?

Proton introduced Lumo in 2025 as its end-to-end encrypted AI assistant. It extends the Proton ecosystem — Mail, Drive, VPN, and Calendar — with private, model-driven capabilities.

Unlike most commercial AI APIs, Lumo is designed around zero-access encryption: Proton states that it keeps no logs of your prompts and that saved conversations are encrypted so that Proton itself cannot read them. For security and compliance leaders, that design choice is significant because it reduces exposure in ways most “AI-as-a-service” platforms do not.

Lumo runs on a routing layer that automatically selects the most suitable underlying model (for example, OpenHands 32B for code-heavy queries or Mistral Small 3 for general text). For enterprises in the EU, it’s also notable that Lumo operates entirely within Proton’s European data centers, simplifying GDPR data-residency concerns.


What is RAG?

At its core, Retrieval-Augmented Generation is a simple pattern:

  1. Retrieve: Query a knowledge base or vector store to find the most relevant chunks of information.
  2. Augment: Insert those chunks into the prompt sent to the language model.
  3. Generate: Let the model craft an answer, grounded in the retrieved data.

This approach improves factual accuracy, reduces hallucination risk, and makes AI usable for organization-specific content — policies, contracts, audit evidence, or customer FAQs.
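
The three steps can be sketched as one small function. This is a minimal illustration, not a production design: `answer_with_rag`, `toy_retrieve`, and `toy_generate` are hypothetical names, the retriever is a naive word-overlap match, and the generator is a stub standing in for a real model call.

```python
from typing import Callable, List


def answer_with_rag(
    question: str,
    retrieve: Callable[[str], List[str]],
    generate: Callable[[str], str],
) -> str:
    """Run one retrieve -> augment -> generate pass."""
    chunks = retrieve(question)                                   # 1. Retrieve
    context = "\n".join(chunks)
    prompt = f"Context:\n{context}\n---\nQuestion: {question}"    # 2. Augment
    return generate(prompt)                                       # 3. Generate


# Stand-in components so the sketch runs without any external service.
def toy_retrieve(question: str) -> List[str]:
    docs = [
        "Passwords rotate every 90 days.",
        "Laptops must use disk encryption.",
    ]
    words = set(question.lower().split())
    # Keep any document that shares at least one word with the question.
    return [d for d in docs if words & set(d.lower().rstrip(".").split())]


def toy_generate(prompt: str) -> str:
    # A real pipeline would send the prompt to a language model here.
    return f"(model reply to {len(prompt)} prompt characters)"


if __name__ == "__main__":
    print(answer_with_rag("How often do passwords rotate?", toy_retrieve, toy_generate))
```

Passing the retriever and generator in as functions keeps the pattern visible and lets each piece be swapped for a real vector search or model call later.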


Why Build a Private RAG?

For security and compliance leaders, a private RAG offers several benefits:

  • Control of Data: Your document corpus stays in storage and a vector database that you control; only the text you embed and the chunks retrieved for a query are sent to the API.
  • Auditability: Retrieved chunks and citations can be logged for evidence or compliance.
  • Customization: You decide how documents are chunked, embedded, and retrieved.
  • Privacy by Design: Sending prompts and context through Proton’s Lumo API keeps them under Lumo’s zero-access encryption design.
  • Regulatory Alignment: EU-hosted infrastructure and transparent handling help you meet GDPR and similar obligations.
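
As one illustration of the customization point, here is a rough sketch of fixed-size chunking with overlap. The function name `chunk_text` and the default sizes are assumptions chosen for the example; splitting on sentences or headings is often a better fit for real documents.

```python
from typing import List


def chunk_text(text: str, size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of `size` characters that share `overlap` characters."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", size=4, overlap=2):
        print(piece)
```

The overlap means a sentence that straddles a boundary still appears whole in at least one chunk, at the cost of storing some text twice.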

How to Build a Private RAG with Lumo

Here’s a step-by-step overview using Lumo’s API.

1. Set up your vector store

Choose a store such as Pinecone, Qdrant, Weaviate, or PostgreSQL + pgvector. This will hold the embeddings of your documents.

2. Embed your documents

curl -X POST https://api.lumo.proton.me/v1/embeddings \
  -H "Authorization: Bearer YOUR_LUMO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "lumo-embed",
        "input": "Your document text…"
      }'

The returned vector can be upserted into your chosen store.
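
The same call can be made from Python with only the standard library. This sketch reuses the endpoint and model name from the curl example; the function names are hypothetical, and the response layout parsed in `embed` is an assumption (an OpenAI-style `data[0].embedding` field) that should be checked against the actual response.

```python
import json
import urllib.request
from typing import List

# Endpoint and model name as given in the curl example above.
EMBEDDINGS_URL = "https://api.lumo.proton.me/v1/embeddings"


def build_embedding_request(text: str, api_key: str) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"model": "lumo-embed", "input": text}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDINGS_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def embed(text: str, api_key: str) -> List[float]:
    """Send the request and return the embedding vector."""
    request = build_embedding_request(text, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    # ASSUMPTION: the response looks like {"data": [{"embedding": [...]}]}.
    return payload["data"][0]["embedding"]
```

Separating request construction from sending makes it easy to inspect or log exactly what leaves your environment before any network call happens.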

3. Retrieve relevant chunks

When a user asks a question, perform a similarity search in your vector store and select the top-k results (usually 3–5 snippets).
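
For prototyping, the similarity search can be sketched in pure Python before committing to a real vector database. `InMemoryStore` is a hypothetical stand-in, not the API of any of the stores named above, and it assumes cosine similarity is the right metric for your embeddings.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Toy vector store: a dict of chunk id -> (vector, text)."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[List[float], str]] = {}

    def upsert(self, chunk_id: str, vector: List[float], text: str) -> None:
        self._rows[chunk_id] = (vector, text)

    def top_k(self, query_vector: List[float], k: int = 3) -> List[Tuple[float, str, str]]:
        """Return up to k (score, chunk_id, text) rows, best match first."""
        scored = [
            (cosine(query_vector, vector), chunk_id, text)
            for chunk_id, (vector, text) in self._rows.items()
        ]
        scored.sort(key=lambda row: row[0], reverse=True)
        return scored[:k]


if __name__ == "__main__":
    store = InMemoryStore()
    store.upsert("a", [1.0, 0.0], "retention policy")
    store.upsert("b", [0.0, 1.0], "travel policy")
    store.upsert("c", [1.0, 1.0], "general handbook")
    for score, chunk_id, text in store.top_k([1.0, 0.0], k=2):
        print(f"{score:.3f} {chunk_id} {text}")
```

A linear scan like this is fine for a few thousand chunks; a dedicated store becomes worthwhile once you need approximate nearest-neighbor indexes or metadata filters.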

4. Assemble the augmented prompt

Context:
{chunk_1}
{chunk_2}
{chunk_3}
---
Question: {user_query}
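
The template can be filled programmatically. The sketch below follows the layout above and adds one assumption of its own: a character budget (`max_context_chars`, a hypothetical parameter with an arbitrary default) so that lower-ranked chunks are dropped when the context would grow too large. A real token counter would be more precise than counting characters.

```python
from typing import List


def assemble_prompt(
    chunks: List[str],
    user_query: str,
    max_context_chars: int = 24000,
) -> str:
    """Fill the Context/Question template, keeping chunks until the budget is used up."""
    kept: List[str] = []
    used = 0
    for chunk in chunks:  # chunks are assumed to be sorted best match first
        chunk = chunk.strip()
        if used + len(chunk) > max_context_chars:
            break
        kept.append(chunk)
        used += len(chunk)
    context = "\n".join(kept)
    return f"Context:\n{context}\n---\nQuestion: {user_query}"


if __name__ == "__main__":
    print(assemble_prompt(["Passwords rotate every 90 days."], "How often do passwords rotate?"))
```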

5. Call the Lumo chat endpoint

curl -X POST https://api.lumo.proton.me/v1/chat/completions \
  -H "Authorization: Bearer YOUR_LUMO_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "lumo-plus",
        "messages": [
          {"role":"system","content":"You are a helpful assistant that answers using the provided context."},
          {"role":"user","content":"[assembled prompt here]"}
        ],
        "max_tokens": 1024,
        "temperature": 0.2,
        "response_format": "citation"
      }'

Together, the system message and the low temperature steer the response toward your retrieved context, and the citation response format asks for citations alongside the answer.
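
One practical detail: an assembled prompt usually contains quotes and newlines, which are awkward to paste into a shell string by hand. A small sketch of building the same request body in Python, where `json.dumps` handles the escaping, is shown here. The field values mirror the curl example; `build_chat_payload` is a hypothetical helper name.

```python
import json
from typing import Any, Dict

SYSTEM_PROMPT = "You are a helpful assistant that answers using the provided context."


def build_chat_payload(assembled_prompt: str) -> Dict[str, Any]:
    """Return the request body used in the curl example, with the prompt filled in."""
    return {
        "model": "lumo-plus",
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": assembled_prompt},
        ],
        "max_tokens": 1024,
        "temperature": 0.2,
        "response_format": "citation",
    }


if __name__ == "__main__":
    prompt = 'Context:\nThe policy says "rotate keys yearly".\n---\nQuestion: How often?'
    # json.dumps escapes the quotes and newlines so the body is valid JSON.
    print(json.dumps(build_chat_payload(prompt), indent=2))
```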

6. Post-process as needed

Depending on your use case, you might:

  • Store Q&A pairs in your knowledge base.
  • Enforce tone or style (for example, executive summary vs. technical detail).
  • Chain calls to summarize long documents within token limits.
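
The last bullet, chaining calls for long documents, can be sketched as a split-summarize-combine loop. This is one common approach rather than anything specific to Lumo: `summarize_long` is a hypothetical helper, `summarize` stands in for whatever function sends text to the model, and the character limit is a rough proxy for a token limit.

```python
from typing import Callable


def summarize_long(text: str, summarize: Callable[[str], str], max_chars: int = 8000) -> str:
    """Summarize text of any length by summarizing pieces, then summarizing the summaries."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return summarize(text)
    pieces = [text[i:i + max_chars] for i in range(0, len(text), max_chars)]
    combined = "\n".join(summarize(piece) for piece in pieces)
    if len(combined) >= len(text):
        # Summaries did not shrink the text; truncate rather than recurse forever.
        return summarize(combined[:max_chars])
    return summarize_long(combined, summarize, max_chars)


if __name__ == "__main__":
    # Stand-in summarizer: keeps the first 40 characters of its input.
    print(summarize_long("lorem ipsum " * 3000, lambda s: s[:40]))
```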

For prototyping or lightweight projects, you can skip the vector store and let Lumo use its built-in web_search tool. This acts as a managed RAG service:

  1. Lumo queries Brave Search.
  2. It embeds the top results.
  3. It generates an answer with citations.

Handy for rapid tests, though less controllable than a private store.


Things to Keep in Mind

Consideration    Why it Matters
Rate limits      60 requests/min on Lumo Plus; higher with Visionary plans.
Token limits     8K-token context per request; chain calls for larger docs.
Costs            Subscription + token usage billing.
Compliance       Encrypted, EU-hosted, GDPR-aligned.
Model routing    Automatic: Lumo selects the best model for each query.
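
To stay under a per-minute request limit like the one listed above, a client-side limiter is one option. The sketch below is a simple sliding-window approach; the class name and defaults are assumptions for illustration, and the `clock` parameter exists only so the logic can be exercised without real waiting.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls within any `period`-second window."""

    def __init__(
        self,
        max_calls: int = 60,
        period: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_calls = max_calls
        self.period = period
        self._clock = clock
        self._stamps: Deque[float] = deque()

    def wait_time(self) -> float:
        """Seconds until the next call is allowed (0.0 if it is allowed now)."""
        now = self._clock()
        # Forget calls that have fallen out of the window.
        while self._stamps and now - self._stamps[0] >= self.period:
            self._stamps.popleft()
        if len(self._stamps) < self.max_calls:
            return 0.0
        return self.period - (now - self._stamps[0])

    def acquire(self) -> None:
        """Block until a call is allowed, then record it."""
        while (delay := self.wait_time()) > 0:
            time.sleep(delay)
        self._stamps.append(self._clock())
```

Calling `limiter.acquire()` before each API request keeps a batch embedding job within the limit without hand-tuned sleeps.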

Closing Thoughts

RAG isn’t just a technical pattern — it’s a governance tool. It gives leaders confidence that AI systems are grounded in authoritative data and remain auditable. Proton’s Lumo makes this accessible while preserving privacy and regulatory alignment.

If you’re already using Proton services, Lumo is a natural way to extend that ecosystem into secure, AI-driven knowledge workflows. And if you’re exploring AI more generally, it’s a strong reminder that privacy-first design is possible — and worth insisting on.