
Multimodal

LLM inference over text, images, and other media within a single conversation context.

Overview

completion() supports multimodal prompts: you can attach media files to your input messages, and a single request can carry multiple attachments (e.g., to compare two images).

Compared to text-only completion inference, the key differences are:

  • You must load a multimodal-capable LLM and its matching projectionModelSrc via loadModel().
  • Your history messages can include attachments: [{ path: "/path/to/image.jpg" }] (the file must exist on disk).
  • Aside from attachments, you still call completion({ modelId, history, stream }) the same way and consume the same streaming output.
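To make the message shape concrete, here is a sketch of a history entry carrying attachments. The field names (`role`, `content`, `attachments`, `path`) follow the full example later on this page; the file paths are placeholders:

```javascript
// A user message that carries two attachments alongside its text prompt.
// Paths are placeholders; the files must exist on disk when completion() runs.
const message = {
  role: "user",
  content: "Compare these two images",
  attachments: [
    { path: "/path/to/first.jpg" },
    { path: "/path/to/second.jpg" },
  ],
};

// completion() takes an array of such messages as `history`.
const history = [message];
```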

Functions

Use the following sequence of function calls:

  1. loadModel()
  2. completion()
  3. unloadModel()
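The sequence above can be sketched as a single async flow. In this sketch the three SDK functions are passed in as parameters, so the outline shows only the shape of the lifecycle rather than depending on the real package; the option names mirror the full example below:

```javascript
// Sketch of the loadModel → completion → unloadModel lifecycle.
// The three functions are injected so this outline stands on its own.
async function runCompletion({ loadModel, completion, unloadModel }, history) {
  const modelId = await loadModel({ modelSrc: "model.gguf", modelType: "llm" });
  try {
    const result = completion({ modelId, history, stream: true });
    let text = "";
    for await (const token of result.tokenStream) text += token;
    return text;
  } finally {
    // Unload even if completion throws, so the model doesn't stay resident.
    await unloadModel({ modelId, clearStorage: false });
  }
}
```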

For how to use each function, see SDK — API reference.

Models

You need to load two models:

  • a llama.cpp-compatible, multimodal-capable LLM (*.gguf); and
  • a matching projection model (mmproj-*.gguf).

Recommended pairs:

  • SmolVLM2 + mmproj-*
  • Qwen2.5-Omni + mmproj-* (or Qwen3-VL + mmproj-*)
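As a rough sanity check, a script could verify that a filename pair follows the conventions above before loading anything. The helper below is purely illustrative and not part of the SDK; it only checks the *.gguf and mmproj-*.gguf naming patterns:

```javascript
// Illustrative check (not an SDK function): does a filename pair look like
// an LLM (*.gguf) plus its projection model (mmproj-*.gguf)?
function isLikelyModelPair(llmFile, projFile) {
  const isGguf = (f) => f.toLowerCase().endsWith(".gguf");
  const base = (f) => f.split("/").pop();
  return (
    isGguf(llmFile) &&
    isGguf(projFile) &&
    base(projFile).startsWith("mmproj-") &&
    !base(llmFile).startsWith("mmproj-")
  );
}
```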

For models available as constants, see SDK — Models.

Example

The following script shows an example of multimodal completion with one image and, if a second image path is supplied, a comparison of two images:

multimodal.js
import {
    completion,
    loadModel,
    unloadModel,
    SMOLVLM2_500M_MULTIMODAL_Q8_0,
    MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
} from "@qvac/sdk";
if (process.argv.length < 3) {
    console.error("Usage: node multimodal.js <image-path> [second-image-path]");
    process.exit(1);
}
try {
    const imageFilePath = process.argv[2];
    // Load the main model with projection in a single step
    const modelId = await loadModel({
        modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
        modelType: "llm",
        projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
        modelConfig: {
            ctx_size: 1024,
        },
        onProgress: (progress) => {
            console.log(`Loading: ${progress.percentage.toFixed(1)}%`);
        },
    });
    // Prompt with a single image attachment
    const history = [
        {
            role: "user",
            content: "What's in this image?",
            attachments: [{ path: imageFilePath }],
        },
    ];
    const result = completion({ modelId, history, stream: true });
    for await (const token of result.tokenStream) {
        process.stdout.write(token);
    }
    const stats = await result.stats;
    console.log("\n📊 Performance Stats:", stats);
    console.log("--------------------------------");
    // Prompt with multiple image attachments
    if (process.argv.length < 4) {
        console.log("Only one image provided; exiting.");
        process.exit(0);
    }
    const imageFilePath2 = process.argv[3];
    const history2 = [
        {
            role: "user",
            content: "Compare the two newspaper articles",
            attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
        },
    ];
    const result2 = completion({ modelId, history: history2, stream: true });
    for await (const token of result2.tokenStream) {
        process.stdout.write(token);
    }
    const stats2 = await result2.stats;
    console.log("\n📊 Performance Stats:", stats2);
    console.log("--------------------------------");
    await unloadModel({ modelId, clearStorage: false });
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}
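Because attachments reference files that must already exist on disk, it can help to validate the paths up front and fail fast before loading a model. This is a generic Node.js check, not an SDK feature:

```javascript
import { existsSync } from "node:fs";

// Return the subset of attachment paths that do not exist on disk,
// so the caller can report them before spending time loading a model.
function missingAttachments(attachments) {
  return attachments
    .filter(({ path }) => !existsSync(path))
    .map(({ path }) => path);
}

const missing = missingAttachments([{ path: "/definitely/not/here.jpg" }]);
if (missing.length > 0) {
  console.error(`Missing attachment files: ${missing.join(", ")}`);
}
```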

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
