A Practical Guide for Professional Translators
This quick guide explains how to create a fully local translation assistant that runs on your own computer, respects your glossaries and reference materials, and does not send any data to external servers.
The setup is suitable for translators working with confidential, regulated, or proprietary content.
This assistant allows you to:
Translate texts using a local large language model (LLM)
Reference your own glossaries, previous translations, and style guides
Enforce consistent terminology and style via system prompts
Work entirely offline (except for initial downloads)
It is designed as a translation assistant, not a fully automated CAT tool.
The system consists of four components:
Ollama: runs language models locally
Open WebUI: browser interface (similar to ChatGPT)
Embedding model (nomic-embed-text): indexes your documents
Adaptive Memory add-on: stores your preferences and habits
You need:
Windows, macOS, or Linux
A terminal (Command Prompt / PowerShell / Terminal)
At least 16 GB RAM recommended
Optional GPU for larger models (strongly recommended for professional use)
No programming skills are required.
Ollama runs the AI models locally.
Download Ollama from https://ollama.com
Follow the installer for your operating system.
For professional translation quality, avoid very small models.
Recommended options:
General-purpose, high quality: ollama pull gpt-oss:20b
Faster, lighter alternative: ollama pull llama3.1:8b
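Once a model has downloaded, it is worth verifying the installation from the terminal before moving on. These are standard Ollama CLI commands; the sample prompt is just an illustration:

ollama list
ollama run llama3.1:8b "Translate 'good morning' into French."

If your models are listed and the second command returns a translation, Ollama is working correctly.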
Open WebUI provides a browser-based interface to interact with your local models.
Download Docker Desktop:
https://www.docker.com/products/docker-desktop/
Then follow the Open WebUI Docker quick start:
https://docs.openwebui.com/getting-started/quick-start/
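At the time of writing, the quick start reduces to a single command when Ollama runs on the same machine; treat this as a sketch and check the linked documentation for the current version:

docker run -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main

The --add-host flag lets the container reach the Ollama server on your host machine, and the -v volume preserves your settings and documents across container restarts.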
After starting the container, open http://localhost:3000 in your browser.
To enable document-aware translation, install the embedding model: ollama pull nomic-embed-text
This model:
Does not appear in chat
Is used internally to index and retrieve document fragments
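You can confirm that the embedding model responds by calling Ollama's embeddings endpoint directly; the prompt string here is arbitrary:

curl http://localhost:11434/api/embeddings -d '{"model": "nomic-embed-text", "prompt": "delivery terms"}'

A JSON response containing an array of numbers means the model is installed and ready for indexing.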
Open Admin Panel
Go to Settings → Documents
Configure Embedding:
Embedding Model Engine: Ollama
Embedding Model: nomic-embed-text
API key: leave empty
Under Retrieval, enable Full Context Mode
Save the settings.
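Note that when Open WebUI runs in Docker, it typically reaches Ollama at http://host.docker.internal:11434 rather than localhost. To confirm that Ollama is up and your models (including nomic-embed-text) are visible, you can list them from the host:

curl http://localhost:11434/api/tags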
Documents are split into chunks before indexing.
Recommended values for translation work:
Chunk size: 256–384 tokens
Overlap: 15–20%
This improves terminological consistency and reduces context loss.
⚠️ Changing these values later requires re-uploading documents.
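As a concrete example: with 320-token chunks and 20% overlap, consecutive chunks share roughly 64 tokens, so a term that falls near a chunk boundary still appears intact in at least one chunk.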
Go to Workspace → Knowledge
Click Create Knowledge
Give it a name, for example: Client Glossaries and Style Guides
Upload:
Glossaries
Previous translations
Style guides
Client instructions
When you upload documents to Knowledge, Open WebUI always performs background indexing. This process is automatic and consists of several internal steps:
The uploaded file is parsed (for example, PDF or DOCX is converted into plain text)
The extracted text is split into smaller chunks (chunking)
An embedding model (such as nomic-embed-text) is applied to each chunk
The resulting vectors are stored in an internal vector database
When you submit a query, semantic retrieval is performed to find the most relevant chunks, which are then injected into the model’s context.
This indexing step is essential for Knowledge to work. Without it, document-based retrieval would not be possible.
Depending on the Open WebUI version, there may be no explicit “indexing complete” indicator in the interface. Because indexing happens in the background, allow some time after uploading documents and watch for any error messages during processing.
Go to Workspace → Models
Click Create Model
Select your base model (e.g. the gpt-oss:20b or llama3.1:8b you pulled earlier)
Attach your Knowledge base
Optionally enable Adaptive Memory
Save the model.
This ensures that relevant document fragments are automatically included whenever you translate.
Adaptive Memory stores your working preferences (not documents).
Adaptive Memory download: https://openwebui.com/posts/9b50f29d-92c2-4028-b94e-78cead0d8c88
In Open WebUI:
Go to Workspace → Functions
Import or enable Adaptive Memory V3
Activate it for your custom model
Examples of what Adaptive Memory can store:
Preferred target language
Tone requirements
Formatting habits
Define strict translation behaviour in the model settings.
Example system prompt:
You are a professional translator. Translate texts, preserving terminology and phrasing from the provided documents. Prefer terms used in my reference materials. Do not paraphrase unless explicitly requested.
Do not invent terminology. Output only the translation, without commentary.
This prompt applies automatically to every chat using the custom model.
Start a new chat
Select your custom translation model
Provide clear instructions, for example:
Translate the following text into Italian. Use my documents for terminology and style consistency.
The assistant will:
Retrieve relevant document fragments
Inject them into the context
Generate a translation aligned with your materials.
Create separate Knowledge bases per client or domain.
Keep system prompts strict and unambiguous.
Always review output professionally.
Treat this as an assistant, not an autonomous translator.
You now have:
A fully local translation assistant
Document-aware translation support
No data leakage
A setup that any professional translator can maintain independently.
This guide is inspired by the freeCodeCamp guide on running local LLMs for document interaction, adapted here for the needs of professional translators.