AI Models

SafeRag SE supports three AI providers. Choose what works best for your Mac.

llama.cpp (Built-In) — Recommended

llama.cpp is built directly into SafeRag SE. No external tools or installations are needed — just download a model and start chatting.

  • Download GGUF models from within the app
  • Runs locally on Apple Silicon using Metal GPU acceleration
  • Full control over which models you use
  • No internet required after downloading a model

Model Recommendations

Choose a model based on your Mac's available RAM:

Model           Size     RAM Needed   Best For
Llama 3.2 3B    ~2 GB    8 GB         Quick responses, everyday tasks
Mistral 7B      ~4 GB    8 GB         Balanced quality and speed
Llama 3.1 8B    ~5 GB    8 GB         Strong general-purpose assistant
Llama 3.1 13B   ~8 GB    16 GB        Higher quality reasoning
Qwen 2.5 32B    ~20 GB   32 GB+       Advanced tasks, long context

Start Small
Start with a smaller model (3B–8B) to test performance on your Mac. You can always download a larger model later if you need better quality responses.
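As a rough rule of thumb, a 4-bit quantized GGUF model needs about half a byte of RAM per parameter, plus headroom for the context cache and runtime. A minimal back-of-the-envelope sketch (the per-parameter and overhead figures below are illustrative assumptions, not values used by SafeRag SE):

```python
def estimate_ram_gb(params_billions, bytes_per_param=0.55, overhead_gb=1.5):
    """Rough RAM estimate for a 4-bit quantized GGUF model.

    bytes_per_param ~0.55 approximates common 4-bit quantization;
    overhead_gb covers the context cache and runtime buffers.
    Both figures are illustrative assumptions.
    """
    return params_billions * bytes_per_param + overhead_gb

for size in (3, 7, 8, 13, 32):
    print(f"{size}B model: ~{estimate_ram_gb(size):.1f} GB RAM")
```

If the estimate approaches your Mac's total RAM, pick a smaller model — macOS and other apps need memory too.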

Apple Foundation Models (macOS 26+)

On macOS 26 (Tahoe) and later, SafeRag SE can use Apple's built-in Foundation Models for AI inference. This provides a seamless experience with no downloads required.

  • No model download needed — uses Apple's on-device models
  • Optimized for Apple hardware with fast response times
  • Select Apple FM as your provider in Settings
  • Automatic updates with macOS system updates

macOS 26 Required
Apple Foundation Models require macOS 26 (Tahoe) or later. On earlier macOS versions, use llama.cpp as your AI provider.

Ollama (Optional)

If you already have Ollama installed on your Mac, SafeRag SE can connect to it as an AI provider. This is entirely optional — SafeRag SE works without Ollama.

  • Enable Ollama in Settings > AI Provider
  • Access Ollama's full model library
  • SafeRag SE connects to Ollama's local API on your Mac
  • Useful if you already manage models through Ollama

Ollama is not bundled with SafeRag SE. You need to install it separately from ollama.com if you want to use it.
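SafeRag SE handles the Ollama connection for you, but it can help to see what that connection looks like. By default Ollama serves a local HTTP API on port 11434; a minimal sketch of querying it directly with the standard library (the model name is just an example — use whatever you have pulled in Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_request(model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_ollama(model, prompt):
    """Send the prompt and return the response text (requires Ollama running)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.load(resp)["response"]

# ask_ollama("llama3.1", "Hello") — works only if Ollama is installed and running.
```

This is the same local endpoint SafeRag SE uses when you enable Ollama as a provider; no traffic leaves your Mac.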

Managing Models

You can manage your downloaded models from the Settings panel:

  • View models — See all downloaded models with their size and format
  • Download models — Browse and download additional GGUF models
  • Delete models — Remove models you no longer need to free up disk space
  • Set default — Choose which model to use for new chat sessions

Navigate to Settings > Models to access model management.

Choosing the Right Model

The best model for you depends on your Mac's configuration and what you need the AI for. Consider these factors:

Available RAM

  • 8 GB Mac — Use 3B–8B parameter models. Larger models may cause slowdowns or fail to load.
  • 16 GB Mac — Comfortable with 8B–13B models. Good balance of quality and performance.
  • 32 GB+ Mac — Can run 32B+ models for the best quality responses.

Speed vs. Quality

  • Smaller models respond faster but may produce less nuanced answers
  • Larger models give higher quality responses but take longer to generate
  • For quick questions, a 3B–8B model is usually sufficient
  • For complex reasoning or writing tasks, a larger model is worth the wait

Task Type

  • Everyday chat — Any model works well; smaller models respond faster
  • Document Q&A (RAG) — 8B+ models give better context-aware answers
  • Writing and analysis — 13B+ models produce higher quality output
  • Code assistance — Look for code-focused models in the download list