Logging

The SDK provides built-in log propagation that streams model logs from the worker process to your client application in real time. This gives you visibility into what's happening inside your models during loading, inference, and other operations.

Flow

  1. Pass a logger when loading your model.
  2. When logging is enabled, you'll see real-time logs from the underlying model libraries (a minimal consumption sketch follows the sample output below):
[DEBUG] llamacpp:llm: Loading model weights...
[INFO] llamacpp:llm: Model loaded successfully, vocab_size=32000
[DEBUG] llamacpp:llm: Starting inference...
[DEBUG] llamacpp:llm: Inference completed, tokens=12
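
Those lines can also be consumed programmatically instead of being printed. Below is a minimal sketch that reuses the loadModel and loggingStream calls, the model constant, and the log entry fields from the full example in the Usage section; verbosity settings and error handling are omitted for brevity.

import { loadModel, loggingStream, LLAMA_3_2_1B_INST_Q4_0 } from "@qvac/sdk";

// Load a model, then attach to its log stream by modelId
const modelId = await loadModel({
  modelSrc: LLAMA_3_2_1B_INST_Q4_0,
  modelType: "llm",
});

// Each entry carries a timestamp, level, namespace, and message
for await (const log of loggingStream({ modelId })) {
  console.log(`[${log.level.toUpperCase()}] ${log.namespace}: ${log.message}`);
}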

Features

  • Streaming API (loggingStream) — Consume real-time logs from your models programmatically. Console output is disabled by default, so you control formatting, storage (file/database), analytics, and so on.

  • Logger API (getLogger) — Create loggers for your application code with custom transports. Console output is enabled by default; set enableConsole: false to use only custom transports (see the sketch after this list).

Logging works for all model types (LLM, Whisper, NMT, Embeddings) and provides valuable insight into model performance and behavior.
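
The Usage section below demonstrates the Streaming API; the Logger API is not shown there, so here is a minimal sketch. Only getLogger and the enableConsole option come from the description above; the namespace argument, the transports array, the transport callback shape, and the logger.info call are assumptions made for illustration, so check the API reference for the exact signatures.

import { getLogger } from "@qvac/sdk";
import { appendFileSync } from "node:fs";

// NOTE: the options shape below is assumed; only getLogger and enableConsole are documented above
const logger = getLogger("my-app", {
  enableConsole: false, // rely solely on the custom transport below
  transports: [
    (entry: { level: string; message: string }) => {
      // Persist each entry to a file (a database or analytics sink would work the same way)
      appendFileSync("app.log", `[${entry.level}] ${entry.message}\n`);
    },
  ],
});

logger.info("Application started"); // assumed logger method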

Usage

qvac-sdk/examples/logging-streaming.ts
import {
  loadModel,
  completion,
  unloadModel,
  loggingStream,
  LLAMA_3_2_1B_INST_Q4_0,
  GTE_LARGE_FP16,
  VERBOSITY,
  embed,
} from "@qvac/sdk";

try {
  console.log("🚀 Starting addon log streaming demo...\n");

  // Load model
  const llmModelId = await loadModel({
    modelSrc: LLAMA_3_2_1B_INST_Q4_0,
    modelType: "llm",
    modelConfig: {
      ctx_size: 2048,
      temp: 0.7,
      verbosity: VERBOSITY.ERROR, // Only log errors; remaining logs are captured by loggingStream
    },
  });

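  // Load an embeddings model alongside the LLM to show per-model log streams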
  const embedModelId = await loadModel({
    modelSrc: GTE_LARGE_FP16,
    modelType: "embeddings",
  });

  console.log("Starting log stream in background...\n");
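  // Consume the LLM log stream in the background so it doesn't block the main flow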
  (async () => {
    for await (const log of loggingStream({ modelId: llmModelId })) {
      const timestamp = new Date(log.timestamp).toISOString();
      console.log(
        `[LLM] [${timestamp}] [${log.level.toUpperCase()}] ${log.namespace}: ${log.message}`,
      );
    }
  })().catch(() => {
    // Stream terminated; this is normal when the model unloads
  });

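  // Do the same for the embeddings model's log stream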
  (async () => {
    for await (const log of loggingStream({ modelId: embedModelId })) {
      const timestamp = new Date(log.timestamp).toISOString();
      console.log(
        `[EMBED] [${timestamp}] [${log.level.toUpperCase()}] ${log.namespace}: ${log.message}`,
      );
    }
  })().catch(() => {
    // Stream terminated; this is normal when the model unloads
  });
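
  // Chat history for the streaming completion request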
  const messages = [
    { role: "user", content: "Count from 1 to 5 and explain each number." },
  ];

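  // Start a streaming completion; the model's logs show up on the stream tagged [LLM] above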
  const result = completion({
    modelId: llmModelId,
    history: messages,
    stream: true,
  });
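  // Embed the same prompt; its logs show up on the stream tagged [EMBED] above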
  const embedding = await embed({
    modelId: embedModelId,
    text: messages[0]?.content ?? "Hello, world!",
  });

  console.log("📝 Response:\n");
  for await (const token of result.tokenStream) {
    process.stdout.write(token);
  }

  console.log("Embedding (first 20 elements)", embedding.slice(0, 20));
  console.log("Embeddings length", embedding.length);

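  // Unload both models; this also ends their log streams (handled by the catch blocks above)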
  await unloadModel({ modelId: llmModelId, clearStorage: false });
  await unloadModel({ modelId: embedModelId, clearStorage: false });
} catch (error) {
  console.error("❌ Error:", error);
  process.exit(1);
}
