AI Models

SafeRag SE supports three AI providers. Choose what works best for your Mac.

llama.cpp (Built-In) — Recommended

llama.cpp is built directly into SafeRag SE. No external tools or installations are needed — just download a model and start chatting.

  • Download GGUF models from within the app
  • Runs locally on Apple Silicon using Metal GPU acceleration
  • Full control over which models you use
  • No internet required after downloading a model

Model Recommendations

Choose a model based on your Mac's available RAM:

Model           Size     RAM Needed   Best For
Llama 3.2 3B    ~2 GB    8 GB         Quick responses, everyday tasks
Mistral 7B      ~4 GB    8 GB         Balanced quality and speed
Llama 3.1 8B    ~5 GB    8 GB         Strong general-purpose assistant
Llama 3.1 13B   ~8 GB    16 GB        Higher quality reasoning
Qwen 2.5 32B    ~20 GB   32 GB+       Advanced tasks, long context

Start Small
Start with a smaller model (3B–8B) to test performance on your Mac. You can always download a larger model later if you need better quality responses.
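As a rough rule of thumb, a 4-bit quantized GGUF model needs about half a byte of RAM per parameter, plus headroom for the context cache and runtime. A minimal back-of-the-envelope sketch (the per-parameter and overhead figures below are illustrative assumptions, not values used by SafeRag SE):

```python
def estimate_ram_gb(params_billions, bytes_per_param=0.55, overhead_gb=1.5):
    """Rough RAM estimate for a 4-bit quantized GGUF model.

    bytes_per_param ~0.55 approximates common 4-bit quantization;
    overhead_gb covers the context cache and runtime buffers.
    Both figures are illustrative assumptions.
    """
    return params_billions * bytes_per_param + overhead_gb

for size in (3, 7, 8, 13, 32):
    print(f"{size}B model: ~{estimate_ram_gb(size):.1f} GB RAM")
```

If the estimate approaches your Mac's total RAM, pick a smaller model — macOS and other apps need memory too.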

Apple Foundation Models (macOS 26+)

On macOS 26 (Tahoe) and later, SafeRag SE can use Apple's built-in Foundation Models for AI inference. This provides a seamless experience with no downloads required.

  • No model download needed — uses Apple's on-device models
  • Optimized for Apple hardware with fast response times
  • Select Apple FM as your provider in Settings
  • Automatic updates with macOS system updates

macOS 26 Required
Apple Foundation Models require macOS 26 (Tahoe) or later. On earlier macOS versions, use llama.cpp as your AI provider.

Ollama (Optional)

If you already have Ollama installed on your Mac, SafeRag SE can connect to it as an AI provider. This is entirely optional — SafeRag SE works without Ollama.

  • Enable Ollama in Settings > AI Provider
  • Access Ollama's full model library
  • SafeRag SE connects to Ollama's local API on your Mac
  • Useful if you already manage models through Ollama

Ollama is not bundled with SafeRag SE. You need to install it separately from ollama.com if you want to use it.
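SafeRag SE handles the Ollama connection for you, but it can help to see what that connection looks like. By default Ollama serves a local HTTP API on port 11434; a minimal sketch of querying it directly with the standard library (the model name is just an example — use whatever you have pulled in Ollama):

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # Ollama's default local port

def build_generate_request(model, prompt):
    """Build a POST request for Ollama's /api/generate endpoint."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/generate",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def ask_ollama(model, prompt):
    """Send the prompt and return the response text (requires Ollama running)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.load(resp)["response"]

# ask_ollama("llama3.1", "Hello") — works only if Ollama is installed and running.
```

This is the same local endpoint SafeRag SE uses when you enable Ollama as a provider; no traffic leaves your Mac.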

Managing Models

You can manage your downloaded models from the Settings panel:

  • View models — See all downloaded models with their size and format
  • Download models — Browse and download additional GGUF models
  • Delete models — Remove models you no longer need to free up disk space
  • Set default — Choose which model to use for new chat sessions

Navigate to Settings > Models to access model management.

Choosing the Right Model

The best model for you depends on your Mac's configuration and what you need the AI for. Consider these factors:

Available RAM

  • 8 GB Mac — Use 3B–8B parameter models. Larger models may cause slowdowns or fail to load.
  • 16 GB Mac — Comfortable with 8B–13B models. Good balance of quality and performance.
  • 32 GB+ Mac — Can run 32B+ models for the best quality responses.

Speed vs. Quality

  • Smaller models respond faster but may produce less nuanced answers
  • Larger models give higher quality responses but take longer to generate
  • For quick questions, a 3B–8B model is usually sufficient
  • For complex reasoning or writing tasks, a larger model is worth the wait

Task Type

  • Everyday chat — Any model works well; smaller models respond faster
  • Document Q&A (RAG) — 8B+ models give better context-aware answers
  • Writing and analysis — 13B+ models produce higher quality output
  • Code assistance — Look for code-focused models in the download list