AI Model Guide

This guide explains how to add, select, and delete AI models through the Linux AI NPU Assistant GUI — including browsing for ONNX files and dragging models in from your file manager.

Overview

The application supports three kinds of AI model sources:

| Source | How to add | Best for |
| --- | --- | --- |
| Ollama (local server) | Pull via `ollama pull <name>` or the Models tab | Most LLMs (Llama, Mistral, Gemma…) |
| OpenAI-compatible server | Set in the Backend tab (LM Studio, Jan, etc.) | Any GGUF model served locally |
| ONNX file | Browse or drag-and-drop in the Models tab | AMD NPU inference |

Opening the Settings / Model Manager

1. Press the Copilot key (or Ctrl+Alt+Space) to open the assistant.
2. Click the ⚙ Settings button in the top-right corner.
3. Select the Models tab.

Selecting a model from your Ollama server

1. Make sure Ollama is running. If it is not, start it with `ollama serve`.
2. Open Settings → Models tab.
3. Click 🔄 Refresh to fetch the list of installed models.
4. Click a model name in the list to select it.
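
The Refresh step can be approximated outside the GUI. The sketch below assumes a default Ollama install listening on `http://localhost:11434` and uses Ollama's `GET /api/tags` endpoint, which lists locally installed models. Whether the assistant itself calls this endpoint is an assumption, and the helper names are hypothetical.

```python
import json
from urllib.request import urlopen

OLLAMA_URL = "http://localhost:11434"  # assumed default Ollama address


def parse_model_names(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response, sorted alphabetically."""
    return sorted(m["name"] for m in payload.get("models", []))


def list_installed_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama server which models are installed.

    Raises urllib.error.URLError if the server is not reachable.
    """
    with urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(json.load(resp))
```

Calling `list_installed_models()` while `ollama serve` is running should return roughly the same names the Models tab shows after a refresh.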

The NPU compatibility badge next to each model tells you whether it will run well on the NPU:

| Badge | Meaning |
| --- | --- |
| ✅ NPU OK | Small or quantized model; should run efficiently on the NPU |
| ⚠ NPU Warning | May run slowly or fall back to the CPU |
| ⛔ Not recommended | Too large, or in a format incompatible with the NPU |

Click ✔ Use this model to activate it. The change is saved automatically to ~/.config/linux-ai-npu-assistant/settings.json.
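
To inspect what the GUI saved, the settings file can be read directly. This is a minimal sketch that assumes only the path given above; the key layout inside the file is not documented here, so the example just loads and returns the raw dictionary. `load_settings` is a hypothetical helper, not part of the application.

```python
import json
from pathlib import Path

# Path taken from the guide; everything else here is illustrative.
SETTINGS_PATH = Path.home() / ".config" / "linux-ai-npu-assistant" / "settings.json"


def load_settings(path: Path = SETTINGS_PATH) -> dict:
    """Return the parsed settings, or an empty dict if the file is absent."""
    if not path.exists():
        return {}
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)
```

Printing `load_settings()` after clicking ✔ Use this model is a quick way to confirm that the change was written.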

Browsing for an ONNX file

If you have downloaded a model in ONNX format for direct NPU inference:

1. Open Settings → Models tab.
2. Click 📂 Browse ONNX…
3. A file-picker dialog opens, filtered to `*.onnx` files.
4. Navigate to your model file and click Open.
5. The path is added to the model list and selected automatically.
6. Click ✔ Use this model to activate it.

Where to get ONNX models

- Hugging Face: search for models tagged `onnx`
- ONNX Model Zoo

Export your own with Optimum:

    # Quote the extras so shells such as zsh do not expand the brackets
    pip install "optimum[exporters]"
    optimum-cli export onnx --model meta-llama/Llama-3.2-1B onnx-llama3/
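
After an export, the output directory usually holds configuration and tokenizer files alongside the model itself. The sketch below locates the `.onnx` file(s) to point the Browse dialog at. It assumes the `onnx-llama3/` output directory from the command above; `find_onnx_files` is a hypothetical helper.

```python
from pathlib import Path


def find_onnx_files(export_dir: str) -> list[Path]:
    """Return every .onnx file under export_dir (searched recursively), sorted by path."""
    return sorted(Path(export_dir).rglob("*.onnx"))
```

For example, `find_onnx_files("onnx-llama3")` lists the candidate files produced by the export.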

Drag-and-drop

You can drag a model file directly from your file manager into the Models tab:

1. Open Settings → Models tab.
2. Open your file manager (Nautilus, Dolphin, Thunar, etc.) and navigate to your model.
3. Drag the `.onnx` or `.gguf` file and drop it onto the model list area in the Settings window.
4. The file path appears in the list and is selected automatically.
5. Click ✔ Use this model to activate it.

Supported drag-and-drop formats

- `.onnx`: direct NPU inference via ONNX Runtime
- `.gguf`: served via Ollama or llama.cpp (the path is registered as the model name)
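
The extension check behind the two supported formats can be sketched as follows. File managers typically hand over dropped files as `file://` URIs, so the example decodes those first; that detail, the function names, and the returned labels are all assumptions for illustration rather than the application's actual code.

```python
from pathlib import Path
from typing import Optional
from urllib.parse import unquote, urlparse

# Hypothetical labels for the two formats listed above.
SUPPORTED = {".onnx": "onnx-runtime", ".gguf": "ollama-or-llama.cpp"}


def dropped_path(uri_or_path: str) -> Path:
    """Convert a file:// URI (or a plain path) into a filesystem path."""
    parsed = urlparse(uri_or_path)
    if parsed.scheme == "file":
        return Path(unquote(parsed.path))  # decode %20 and similar escapes
    return Path(uri_or_path)


def classify_drop(uri_or_path: str) -> Optional[str]:
    """Return the backend label for a dropped file, or None if unsupported."""
    return SUPPORTED.get(dropped_path(uri_or_path).suffix.lower())
```

Anything that is neither `.onnx` nor `.gguf` yields `None`, which a GUI could use to reject the drop.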

Deleting a model

Ollama models

1. Select the model in the list.
2. Click 🗑 Delete.
3. A confirmation dialog appears. Click Yes to remove the model.

This runs `ollama rm <model-name>` under the hood. The model files are permanently removed from your Ollama store (`~/.ollama/models/`).
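
The same removal can be scripted. This is a minimal sketch that shells out to the `ollama rm` command named above; it assumes the `ollama` binary is on `PATH`, and the helper names are hypothetical.

```python
import subprocess


def build_rm_command(model_name: str) -> list[str]:
    """Build the argument list for `ollama rm`, rejecting empty names."""
    name = model_name.strip()
    if not name:
        raise ValueError("model name must not be empty")
    return ["ollama", "rm", name]


def remove_model(model_name: str) -> bool:
    """Run `ollama rm`; return True when the CLI exits with status 0.

    Raises FileNotFoundError if the ollama binary is not installed.
    """
    result = subprocess.run(build_rm_command(model_name), capture_output=True, text=True)
    return result.returncode == 0
```

Passing the command as a list rather than a shell string avoids quoting problems with unusual model names.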

ONNX files

1. Select the ONNX file entry in the list.
2. Click 🗑 Delete.
3. Choose one of the following:
   - Remove from list only: the file stays on disk but is removed from the settings.
   - Delete file from disk: permanently deletes the `.onnx` file.

Warning

Deleting a file from disk is irreversible. Make sure you have a backup before choosing this option.
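
The difference between the two options can be sketched in a few lines. The function below is hypothetical and only models the behavior described above: it always drops the entry from the list, and touches the disk only when explicitly asked to.

```python
from pathlib import Path


def remove_onnx_entry(model_list: list[str], path: str, delete_file: bool = False) -> list[str]:
    """Return a new list without `path`; optionally delete the file as well."""
    remaining = [p for p in model_list if p != path]
    if delete_file:
        # Irreversible: the file is unlinked, not moved to a trash folder.
        Path(path).unlink(missing_ok=True)
    return remaining
```

Defaulting `delete_file` to `False` keeps the destructive path opt-in, which mirrors the warning above.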

NPU compatibility details

The NPU compatibility check evaluates each model against these rules:

| Rule | Result |
| --- | --- |
| File ends in `.onnx` | ✅ OK: designed for ONNX Runtime / NPU |
| Model name contains `70b`, `65b`, `34b`, etc. | ⛔ Too large for NPU memory |
| Vision model (llava, bakllava, moondream…) | ⚠ Needs custom ONNX export |
| Full-precision (`f16`, `f32`, `bf16`) | ⚠ Slow on NPU; use a quantized variant |
| Quantized small model (`3b-q4_K_M`, etc.) | ✅ OK |
| Embedding model (nomic-embed, mxbai-embed…) | ⚠ Not for conversational use |
| Model > 13 GB | ⚠ May exceed NPU memory |

The size-warning threshold can be adjusted in `settings.json` under `model_selector.size_warning_gb`.
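
The rules in the table can be read as a chain of name and size checks. The sketch below is one possible interpretation, not the application's implementation: the order of the checks, the 13 GB default, the returned labels, and the "OK when nothing matches" fallback are assumptions. The substring lists contain only the examples named in the table, so the elided "etc." cases are not covered.

```python
from typing import Optional

LARGE_TAGS = ("70b", "65b", "34b")                 # only the sizes named in the table
VISION_NAMES = ("llava", "bakllava", "moondream")
EMBED_NAMES = ("nomic-embed", "mxbai-embed")
FULL_PRECISION = ("f16", "f32", "bf16")


def size_threshold(settings: dict, default: float = 13.0) -> float:
    """Read model_selector.size_warning_gb, falling back to an assumed default."""
    return float(settings.get("model_selector", {}).get("size_warning_gb", default))


def check_npu_compat(
    name: str, size_gb: Optional[float] = None, size_warning_gb: float = 13.0
) -> tuple[str, str]:
    """Return (badge, reason) for a model name and optional size in GB."""
    n = name.lower()
    if n.endswith(".onnx"):
        return "ok", "ONNX file, designed for ONNX Runtime / NPU"
    if any(tag in n for tag in LARGE_TAGS):
        return "not_recommended", "too large for NPU memory"
    if any(v in n for v in VISION_NAMES):
        return "warning", "vision model, needs custom ONNX export"
    if any(e in n for e in EMBED_NAMES):
        return "warning", "embedding model, not for conversational use"
    if any(p in n for p in FULL_PRECISION):
        return "warning", "full precision, use a quantized variant"
    if size_gb is not None and size_gb > size_warning_gb:
        return "warning", "may exceed NPU memory"
    return "ok", "no warning rule matched (assumed OK)"
```

A caller would typically pass `size_warning_gb=size_threshold(settings)` so that a value edited in `settings.json` takes effect.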

Recommended models for NPU

These models are known to work well on AMD Ryzen AI NPUs:

| Model | Size | Quantization | Notes |
| --- | --- | --- | --- |
| `llama3.2:3b-instruct-q4_K_M` | ~2 GB | Q4_K_M | Fast, great for Q&A |
| `phi3:mini-instruct-q4_K_M` | ~2.3 GB | Q4_K_M | Microsoft, code-capable |
| `mistral:7b-instruct-q4_K_M` | ~4.1 GB | Q4_K_M | Good all-rounder |
| `gemma2:2b-instruct-q4_K_M` | ~1.6 GB | Q4_K_M | Tiny, very fast |
| `qwen2.5:3b-instruct-q4_K_M` | ~2 GB | Q4_K_M | Multilingual |

Pull any of these with:

    ollama pull llama3.2:3b-instruct-q4_K_M
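
To choose among the recommended models by download size, the table can be turned into a small lookup. This sketch copies the approximate sizes from the table above and only builds the `ollama pull` command strings without running them; `pull_commands` is a hypothetical helper.

```python
# Approximate sizes in GB, copied from the table of recommended models.
RECOMMENDED = {
    "llama3.2:3b-instruct-q4_K_M": 2.0,
    "phi3:mini-instruct-q4_K_M": 2.3,
    "mistral:7b-instruct-q4_K_M": 4.1,
    "gemma2:2b-instruct-q4_K_M": 1.6,
    "qwen2.5:3b-instruct-q4_K_M": 2.0,
}


def pull_commands(max_size_gb: float) -> list[str]:
    """Return `ollama pull` commands for models at or under the size budget, smallest first."""
    fits = sorted((size, name) for name, size in RECOMMENDED.items() if size <= max_size_gb)
    return [f"ollama pull {name}" for _, name in fits]
```

For instance, `pull_commands(2.0)` yields commands for the Gemma, Llama, and Qwen entries and skips the two larger models.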