
AI Assistant

ai_assistant

AI assistant backend — vision-capable LLM interaction.

Supported backends

  • ollama – Local Ollama server (recommended; supports llava and other vision models out of the box). Uses the /api/chat endpoint so conversation history is passed natively.
  • openai – Local OpenAI-compatible REST API (LM Studio, llama.cpp, etc.). External cloud endpoints are blocked by default.
  • npu – AMD Ryzen AI ONNX model running on the NPU / iGPU.

Privacy & security

By default (network.allow_external: false) every backend URL is validated before each request. Only localhost, 127.x.x.x, ::1, and RFC-1918 private-network addresses are accepted. Any attempt to reach an external endpoint raises :class:ExternalNetworkBlockedError at request time, so a bad config file cannot bypass the check; external access requires explicitly opting in.
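The localhost/private-range check can be sketched with the standard library. `is_local_endpoint` is a hypothetical helper for illustration, not this module's actual validator:

```python
import ipaddress
from urllib.parse import urlparse

def is_local_endpoint(url: str) -> bool:
    """Accept only loopback and RFC-1918 private hosts (illustrative sketch)."""
    host = urlparse(url).hostname or ""
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        # Not a literal IP address; allow only the well-known loopback name.
        return host == "localhost"
    return addr.is_loopback or addr.is_private
```

A real implementation would raise ExternalNetworkBlockedError rather than return False when network.allow_external is false.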

Backend resource efficiency

  • requests is imported lazily; no persistent Session is kept between calls (Connection: close is sent with every request so the socket is released immediately after the response).
  • Responses are streamed token-by-token and yielded to the caller so the UI can update incrementally without buffering the full reply in RAM.
  • Screenshot / image bytes are passed in and can be deleted by the caller as soon as :func:~AIAssistant.ask returns — they are not retained here.
  • NPU sessions are unloaded right after inference (see :mod:npu_manager).

AIAssistant

AIAssistant(config, npu_manager=None, registry=None, os_info=None)

Facade for talking to a vision-capable LLM backend.

Parameters

config: The application :class:~src.config.Config object.
npu_manager: An optional :class:~src.npu_manager.NPUManager. Only used when backend == "npu".
registry: An optional ToolRegistry of callable tools (may be None).
os_info: An optional OSInfo describing the host operating system (may be None).

Source code in src/ai_assistant.py
def __init__(self, config, npu_manager=None, registry=None, os_info=None) -> None:  # noqa: ANN001
    self._config = config
    self._npu_manager = npu_manager
    self._registry = registry  # ToolRegistry | None
    self._os_info = os_info  # OSInfo | None
    # Rate limiter — reads from config.security.rate_limit_per_minute (0 = disabled)
    security_cfg: dict = (
        config.get("security", {}) if hasattr(config, "get") else {}
    )
    rpm = int(security_cfg.get("rate_limit_per_minute", 0))
    self._rate_limiter = RateLimiter(calls_per_minute=rpm)
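The RateLimiter used above could be a sliding-window limiter along these lines; this body is a sketch under assumed semantics, not the project's actual class:

```python
import time
from collections import deque

class RateLimitExceededError(RuntimeError):
    """Raised when the per-minute call budget is exhausted."""

class RateLimiter:
    """Sliding-window rate limiter; calls_per_minute=0 disables limiting."""

    def __init__(self, calls_per_minute: int = 0) -> None:
        self._limit = calls_per_minute
        self._calls: deque = deque()  # monotonic timestamps of recent calls

    def check(self) -> None:
        if self._limit <= 0:
            return  # limiting disabled
        now = time.monotonic()
        # Drop timestamps that have aged out of the 60-second window.
        while self._calls and now - self._calls[0] > 60.0:
            self._calls.popleft()
        if len(self._calls) >= self._limit:
            raise RateLimitExceededError("rate limit exceeded")
        self._calls.append(now)
```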

ask

ask(prompt, *, history=None, screenshot_jpeg=None, attachment_image_jpegs=None, attachment_texts=None, max_context_messages=40)

Send a prompt (with optional images/text/history) and stream the reply.

This is a generator: iterate over it to receive response tokens as they arrive from the model. The caller should delete screenshot_jpeg and any attachment bytes once this function returns to free memory.

Parameters

prompt: The user's natural-language question or instruction.
history: :class:~src.conversation.ConversationHistory whose past messages are passed to the model for multi-turn context.
screenshot_jpeg: JPEG bytes of the current screen (optional).
attachment_image_jpegs: List of JPEG bytes for user-uploaded images (optional).
attachment_texts: List of text file contents to include in the context (optional).
max_context_messages: How many of the most recent past messages to include in the request. None includes all of them.

Yields

str: Incremental response tokens as they arrive.

Source code in src/ai_assistant.py
def ask(
    self,
    prompt: str,
    *,
    history: "ConversationHistory | None" = None,
    screenshot_jpeg: bytes | None = None,
    attachment_image_jpegs: list[bytes] | None = None,
    attachment_texts: list[str] | None = None,
    max_context_messages: int | None = 40,
) -> Generator[str, None, None]:
    """Send a prompt (with optional images/text/history) and stream the reply.

    This is a **generator**: iterate over it to receive response tokens as
    they arrive from the model.  The caller should delete
    ``screenshot_jpeg`` and any attachment bytes once this function returns
    to free memory.

    Parameters
    ----------
    prompt:
        The user's natural-language question or instruction.
    history:
        :class:`~src.conversation.ConversationHistory` whose past messages
        are passed to the model for multi-turn context.
    screenshot_jpeg:
        JPEG bytes of the current screen (optional).
    attachment_image_jpegs:
        List of JPEG bytes for user-uploaded images (optional).
    attachment_texts:
        List of text file contents to include in the context (optional).
    max_context_messages:
        How many of the most recent past messages to include in the
        request.  ``None`` includes all of them.

    Yields
    ------
    str
        Incremental response tokens as they arrive.
    """
    # Rate-limit check: raises RateLimitExceededError if over the limit.
    self._rate_limiter.check()

    backend = self._config.backend
    if backend == "ollama":
        yield from self._ask_ollama(
            prompt,
            history,
            screenshot_jpeg,
            attachment_image_jpegs,
            attachment_texts,
            max_context_messages,
        )
    elif backend == "openai":
        yield from self._ask_openai(
            prompt,
            history,
            screenshot_jpeg,
            attachment_image_jpegs,
            attachment_texts,
            max_context_messages,
        )
    elif backend == "npu":
        yield from self._ask_npu(prompt, screenshot_jpeg)
    else:
        raise ValueError(f"Unknown backend: {backend!r}")
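Consuming the generator looks like the following; `fake_ask` stands in for a real `AIAssistant.ask` call so the snippet is self-contained:

```python
def consume(stream) -> str:
    """Collect streamed tokens into the full reply."""
    parts = []
    for token in stream:
        parts.append(token)  # a UI would render each token incrementally
    return "".join(parts)

def fake_ask():
    # Stand-in for assistant.ask(prompt, screenshot_jpeg=...).
    yield from ("Hello", ", ", "world")

reply = consume(fake_ask())  # → "Hello, world"
```

Once the real generator is exhausted, the caller should delete the screenshot and attachment bytes, as noted above, since they are not retained by the assistant.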