Multimodal
LLM inference over text, images, and other media within a single conversation context.
Overview
completion() supports multimodal prompts: you can attach media files to the messages you pass in. A single request can include multiple attachments (e.g., to compare two images).
Compared to text-only completion inference, the key differences are:
- You must load a multimodal-capable LLM together with its matching `projectionModelSrc` via `loadModel()`.
- Your `history` messages can include `attachments: [{ path: "/path/to/image.jpg" }]` (each file must exist on disk).
- Aside from attachments, you still call `completion({ modelId, history, stream })` the same way and consume the same streaming output.
Functions
Use the following sequence of function calls: `loadModel()` → `completion()` → `unloadModel()`.
For how to use each function, see SDK — API reference.
Models
You should load two models:
- a llama.cpp-compatible, multimodal-capable LLM (model file format: `*.gguf`); and
- a matching projection model (`mmproj-*.gguf`).
Recommended pairs:
- SmolVLM2 + mmproj-*
- Qwen2.5-Omni + mmproj-* (or Qwen3-VL + mmproj-*)
For models available as constants, see SDK — Models.
Example
The following script shows an example of multimodal completion with one image (and, optionally, a second):
```ts
import {
  completion,
  loadModel,
  unloadModel,
  SMOLVLM2_500M_MULTIMODAL_Q8_0,
  MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
} from "@qvac/sdk";

if (process.argv.length < 3) {
  console.error(
    "Specify an image file path as the first argument and, optionally, a second image file path as the second argument",
  );
  process.exit(1);
}

try {
  const imageFilePath = process.argv[2];

  // Load the main model and its projection model in a single step
  const modelId = await loadModel({
    modelSrc: SMOLVLM2_500M_MULTIMODAL_Q8_0,
    modelType: "llm",
    projectionModelSrc: MMPROJ_SMOLVLM2_500M_MULTIMODAL_Q8_0,
    modelConfig: {
      ctx_size: 1024,
    },
    onProgress: (progress) => {
      console.log(`Loading: ${progress.percentage.toFixed(1)}%`);
    },
  });

  // Using a single attachment
  const history = [
    {
      role: "user",
      content: "What's in this image?",
      attachments: [{ path: imageFilePath }],
    },
  ];
  const result = completion({ modelId, history, stream: true });
  for await (const token of result.tokenStream) {
    process.stdout.write(token);
  }
  const stats = await result.stats;
  console.log("\n📊 Performance Stats:", stats);
  console.log("--------------------------------");

  // Using multiple attachments
  if (process.argv.length < 4) {
    console.log("Only one image provided, terminating");
    process.exit(0);
  }
  const imageFilePath2 = process.argv[3];
  const history2 = [
    {
      role: "user",
      content: "Compare the two newspaper articles",
      attachments: [{ path: imageFilePath }, { path: imageFilePath2 }],
    },
  ];
  const result2 = completion({ modelId, history: history2, stream: true });
  for await (const token of result2.tokenStream) {
    process.stdout.write(token);
  }
  const stats2 = await result2.stats;
  console.log("\n📊 Performance Stats:", stats2);
  console.log("--------------------------------");

  await unloadModel({ modelId, clearStorage: false });
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
```

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.