NPU Manager¶
npu_manager¶
AMD Ryzen AI / ONNX Runtime NPU manager.
This module probes for AMD NPU availability via ONNX Runtime's VitisAI Execution Provider and exposes a thin inference wrapper. When the NPU is unavailable it falls back gracefully to CPU inference.
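A minimal sketch of how such a probe-and-fall-back might look (the function name `pick_providers` is illustrative, not part of the module):

```python
def pick_providers():
    """Prefer the AMD VitisAI EP, falling back to CPU when unavailable."""
    preferred = ["VitisAIExecutionProvider", "CPUExecutionProvider"]
    try:
        import onnxruntime as ort
        available = set(ort.get_available_providers())
    except ImportError:
        # onnxruntime missing entirely: CPU-only fallback
        available = {"CPUExecutionProvider"}
    return [p for p in preferred if p in available]
```

Because ONNX Runtime always registers the CPU provider, the returned list is never empty, which is what makes the graceful fallback possible.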
NPUSession¶
Wraps an onnxruntime.InferenceSession configured for AMD NPU.
Supports the context-manager protocol so resources are released as soon as the with block exits:

```python
with NPUSession(model_path, providers) as session:
    outputs = session.run(feeds)
# session memory freed here
```
Parameters¶
model_path: Path to a pre-compiled ONNX model.
providers: Ordered list of ONNX Runtime Execution Providers to try.
vitisai_config: Optional path to the VitisAI EP JSON configuration file.
Source code in src/npu_manager.py
close¶
Close the session and release its resources.
run¶
Run inference.
Parameters¶
feeds: Dict mapping input names to numpy arrays.
Returns¶
list
Raw ONNX Runtime output tensors.
Note¶
The session may be None after close() is called. Always use this object inside a with block or check is_open.
Source code in src/npu_manager.py
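The lifecycle described above (run while open, close to release, context-manager support) can be sketched as follows. This is an illustrative wrapper assuming the shape documented here, not the module's actual implementation; the inner session stands in for an onnxruntime.InferenceSession:

```python
class SessionWrapper:
    """Illustrative sketch of the NPUSession lifecycle described above."""

    def __init__(self, inner):
        self._session = inner          # e.g. an onnxruntime.InferenceSession

    @property
    def is_open(self):
        return self._session is not None

    def run(self, feeds):
        if not self.is_open:
            raise RuntimeError("session already closed")
        return self._session.run(None, feeds)

    def close(self):
        self._session = None           # drop the reference so memory is freed

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False                   # never swallow exceptions
```

Returning False from __exit__ ensures exceptions raised inside the with block still propagate after cleanup.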
NPUManager¶
High-level manager for AMD NPU availability and session lifecycle.
Source code in src/npu_manager.py
is_npu_available¶
Return True if the VitisAI Execution Provider is usable.
Source code in src/npu_manager.py
get_device_info¶
Return human-readable information about the detected AI accelerator.
Source code in src/npu_manager.py
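A hedged sketch of how such a description might be derived from the registered Execution Providers (the helper name and label strings are illustrative, not the module's actual output):

```python
def describe_accelerator(available_providers):
    """Hypothetical helper: map ORT providers to a human-readable label."""
    if "VitisAIExecutionProvider" in available_providers:
        return "AMD Ryzen AI NPU (VitisAI Execution Provider)"
    return "CPU (ONNX Runtime default execution provider)"
```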
load_model¶
Load the configured ONNX model onto the NPU (or CPU fallback).
When npu.model_path is "auto" (the default), the bundled
Phi-3-mini-4k-instruct ONNX model is used. If it is not yet installed
and npu.auto_install_default_model is True, it is downloaded
automatically from Hugging Face on first call.
Parameters¶
progress_callback: Optional callable receiving download-progress strings.
Returns¶
NPUSession | None
Loaded session, or None if no model is available.
Source code in src/npu_manager.py
run_inference¶
Load the model, run inference, and immediately unload if configured.
When resources.unload_model_after_inference is True (default)
the ONNX session is destroyed after the call so NPU/GPU memory is
released straight away.
Parameters¶
feeds: Dict mapping input names to numpy arrays.
progress_callback: Optional callable for download-progress messages on first run.
Source code in src/npu_manager.py
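The load → run → unload pattern can be sketched as below; `load_model` here is a hypothetical callable standing in for the manager's loader, and the try/finally guarantees the session is released even if inference raises:

```python
def run_and_release(load_model, feeds, unload_after=True):
    """Sketch of run_inference's load -> run -> unload pattern."""
    session = load_model()
    if session is None:
        return None                    # no model available
    try:
        return session.run(feeds)
    finally:
        if unload_after:
            session.close()            # free NPU/GPU memory straight away
```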
get_session¶
Return the cached session, loading it if necessary.
Parameters¶
progress_callback: Optional callable for download-progress messages on first run.
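The cache-or-load behaviour reads as follows in sketch form; `factory` is a hypothetical loader standing in for the manager's model-loading path, invoked only on the first call (which is also when a download could occur):

```python
class CachedLoader:
    """Sketch of get_session's cache-or-load behaviour."""

    def __init__(self, factory):
        self._factory = factory
        self._session = None

    def get_session(self, progress_callback=None):
        if self._session is None:      # first call: load (may download model)
            self._session = self._factory(progress_callback)
        return self._session
```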