# AI Models
SafeRag SE supports three AI providers. Choose what works best for your Mac.
## llama.cpp (Built-In) — Recommended
llama.cpp is built directly into SafeRag SE. No external tools or installations are needed — just download a model and start chatting.
- Download GGUF models from within the app
- Runs locally on Apple Silicon using Metal GPU acceleration
- Full control over which models you use
- No internet required after downloading a model
### Model Recommendations
Choose a model based on your Mac's available RAM:
| Model | Size | RAM Needed | Best For |
|---|---|---|---|
| Llama 3.2 3B | ~2 GB | 8 GB | Quick responses, everyday tasks |
| Mistral 7B | ~4 GB | 8 GB | Balanced quality and speed |
| Llama 3.1 8B | ~5 GB | 8 GB | Strong general-purpose assistant |
| Qwen 2.5 14B | ~9 GB | 16 GB | Higher quality reasoning |
| Qwen 2.5 32B | ~20 GB | 32 GB+ | Advanced tasks, long context |
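The "Size" column above follows a simple rule of thumb: a quantized GGUF file is roughly parameter count × bits per weight ÷ 8, and you want some headroom on top of that while the model is running. A minimal sketch of that arithmetic (the 4.5 bits/weight and 1.2× overhead figures are assumptions for a typical 4-bit quantization, not measured values):

```python
def estimate_ram_gb(params_billion: float, bits_per_weight: float = 4.5,
                    overhead: float = 1.2) -> float:
    """Rough RAM estimate for a quantized GGUF model.

    File size is about params * bits / 8; `overhead` adds headroom
    for the KV cache and runtime buffers (assumed factor).
    """
    file_size_gb = params_billion * bits_per_weight / 8
    return round(file_size_gb * overhead, 1)

# A 3B model works out to about 2 GB, and an 8B model to about
# 5 GB, which lines up with the sizes in the table above.
print(estimate_ram_gb(3))
print(estimate_ram_gb(8))
```

Treat the result as a floor, not a guarantee: longer context windows grow the KV cache, so leave extra room if you plan to work with large documents.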
## Apple Foundation Models (macOS 26+)
On macOS 26 (Tahoe) and later, SafeRag SE can use Apple's built-in Foundation Models for AI inference. This provides a seamless experience with no downloads required.
- No model download needed — uses Apple's on-device models
- Optimized for Apple hardware with fast response times
- Select Apple FM as your provider in Settings
- Automatic updates with macOS system updates
## Ollama (Optional)
If you already have Ollama installed on your Mac, SafeRag SE can connect to it as an AI provider. This is entirely optional — SafeRag SE works without Ollama.
- Enable Ollama in Settings > AI Provider
- Access Ollama's full model library
- SafeRag SE connects to Ollama's local API on your Mac (http://localhost:11434 by default)
- Useful if you already manage models through Ollama
Ollama is not bundled with SafeRag SE. You need to install it separately from ollama.com if you want to use it.
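For context, Ollama's local API is an ordinary HTTP interface, and a chat request to its `/api/chat` endpoint is just a small JSON body. A minimal sketch of what such a request looks like, assuming Ollama's documented REST API (the model name is only an example, and this builds the payload without sending it — SafeRag SE's internal client may differ):

```python
import json

# Ollama's default local address; configurable via OLLAMA_HOST.
OLLAMA_URL = "http://localhost:11434"

def build_chat_request(model: str, prompt: str) -> dict:
    """Build the JSON body for Ollama's /api/chat endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # request one complete response, not a token stream
    }

body = build_chat_request("llama3.1:8b", "Summarize this document.")
print(json.dumps(body, indent=2))
# This body would be sent as: POST {OLLAMA_URL}/api/chat
```

Because the API is plain HTTP on localhost, any model you have pulled with `ollama pull` becomes reachable to clients like SafeRag SE without extra configuration.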
## Managing Models
You can manage your downloaded models from the Settings panel:
- View models — See all downloaded models with their size and format
- Download models — Browse and download additional GGUF models
- Delete models — Remove models you no longer need to free up disk space
- Set default — Choose which model to use for new chat sessions
Navigate to Settings > Models to access model management.
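Conceptually, "View models" is just a scan of the model folder for `.gguf` files and their sizes. A sketch of that idea (the directory path is hypothetical — SafeRag SE's actual storage location is app-specific and may differ):

```python
from pathlib import Path

def list_gguf_models(model_dir: str) -> list[tuple[str, float]]:
    """Return (filename, size in GB) for each GGUF file in model_dir."""
    root = Path(model_dir).expanduser()
    if not root.is_dir():
        return []
    models = []
    for path in sorted(root.glob("*.gguf")):
        size_gb = path.stat().st_size / 1e9
        models.append((path.name, round(size_gb, 2)))
    return models

# Hypothetical location -- check the app's Settings for the real one.
for name, size in list_gguf_models("~/Library/Application Support/SafeRagSE/models"):
    print(f"{name}: {size} GB")
```

Deleting a model from Settings removes the file itself, so the disk space is reclaimed immediately.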
## Choosing the Right Model
The best model for you depends on your Mac's configuration and what you need the AI for. Consider these factors:
### Available RAM
- 8 GB Mac — Use 3B–8B parameter models. Larger models may cause slowdowns or fail to load.
- 16 GB Mac — Comfortable with 8B–13B models. Good balance of quality and performance.
- 32 GB+ Mac — Can run 32B+ models for the best quality responses.
### Speed vs. Quality
- Smaller models respond faster but may produce less nuanced answers
- Larger models give higher quality responses but take longer to generate
- For quick questions, a 3B–8B model is usually sufficient
- For complex reasoning or writing tasks, a larger model is worth the wait
### Task Type
- Everyday chat — Any model works well; smaller models are faster
- Document Q&A (RAG) — 8B+ models give better context-aware answers
- Writing and analysis — 13B+ models produce higher quality output
- Code assistance — Look for code-focused models in the download list