
loadModel()

Loads a machine learning model from a local path, remote URL, or Hyperdrive key.

function loadModel(options): Promise<string>;

Description

This function supports multiple model types: LLM (Large Language Model), Whisper (speech recognition), embeddings, NMT (neural machine translation), and TTS (text-to-speech). It can load models from local file paths, remote HTTP/HTTPS URLs, and Hyperdrive URLs (pear://).

When onProgress is provided, the function uses streaming to provide real-time download progress. Otherwise, it uses a simple request-response pattern for faster execution.
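
For example, the same model can be loaded either way (the path below is a placeholder):

// Without onProgress: resolves once the model is available (request-response)
const quickId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm"
});

// With onProgress: the download is streamed and progress callbacks fire
const streamedId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  onProgress: ({ percentage }) => console.log(`${percentage}%`)
});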

Accepted forms

Load new model

function loadModel(options): Promise<string>;

Pass an options object that includes modelSrc (see "options (new model)" below).

Hot-reload existing model config

function loadModel(options): Promise<string>;

Pass an options object that includes the modelId of an already-loaded model (see "options (hot-reload)" below).

Parameters

  • options (object): Model loading configuration

options (new model)

  • modelSrc (string): The location from which the model weights are fetched (local path, remote URL, or Hyperdrive URL)
  • modelType ("llm" | "whisper" | "embeddings" | "nmt" | "tts"): The type of model
  • modelConfig (object): Model-specific configuration options
  • projectionModelSrc (string): (LLM only) Projection model source for multimodal models
  • vadModelSrc (string): (Whisper only) VAD model source for voice activity detection
  • configSrc (string): (TTS only) Config file source
  • eSpeakDataPath (string): (TTS only) Path to eSpeak data
  • delegate (object): Delegation configuration for P2P inference
  • onProgress (function): Callback for download progress updates
  • logger (Logger): Logger instance for model operation logs

options (hot-reload)

  • modelId (string): The ID of an existing loaded model
  • modelType ("llm" | "whisper" | "embeddings" | "nmt" | "tts"): The type of model (must match the loaded model's type)
  • modelConfig (object): New configuration to apply
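
A minimal hot-reload sketch, assuming an LLM was loaded earlier and its returned model ID was kept (paths and values are placeholders):

// Load once and keep the returned model ID
const existingId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});

// Later: apply a new configuration to the already-loaded model
await loadModel({
  modelId: existingId,
  modelType: "llm", // must match the type of the loaded model
  modelConfig: { ctx_size: 4096, temp: 0.7 }
});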

modelConfig (varies by model type)

Configuration options depend on the modelType. Common options include:

LLM models:

  • ctx_size (number): Context window size
  • device ("cpu" | "gpu"): Device to use
  • temp (number): Temperature
  • top_p (number): Top-p sampling
  • top_k (number): Top-k sampling
  • seed (number): Random seed
  • n_predict (number): Max tokens to predict

Whisper models:

  • language (string): Language code (e.g., "en")
  • translate (boolean): Whether to translate to English
  • strategy ("greedy" | "beam_search"): Sampling strategy
  • temperature (number): Temperature
  • vad_params (object): VAD parameters

Embeddings models:

  • device ("cpu" | "gpu"): Device to use
  • ctx_size (number): Context size
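
For example (the path below is a placeholder), an embeddings model pinned to the GPU:

const embeddingsId = await loadModel({
  modelSrc: "/path/to/embedding-model.gguf",
  modelType: "embeddings",
  modelConfig: { device: "gpu", ctx_size: 512 }
});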

NMT models:

  • from (string): Source language
  • to (string): Target language
  • temperature (number): Temperature
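
For example (the path is a placeholder, and the language-code format shown is an assumption):

const nmtId = await loadModel({
  modelSrc: "/path/to/nmt-model.bin",
  modelType: "nmt",
  modelConfig: { from: "en", to: "es" }
});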

TTS models:

  • language (string): Language code
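
For example (all paths and file extensions below are placeholders), a TTS model loaded together with its config file and eSpeak data:

const ttsId = await loadModel({
  modelSrc: "/path/to/tts-model.onnx",
  modelType: "tts",
  configSrc: "/path/to/tts-config.json",
  eSpeakDataPath: "/path/to/espeak-ng-data",
  modelConfig: { language: "en" }
});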

delegate

  • topic (string): P2P topic for delegation
  • providerPublicKey (string): Provider's public key
  • timeout (number): Timeout in milliseconds
  • fallbackToLocal (boolean): Whether to fall back to local inference if delegation fails
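
A sketch of delegated (P2P) inference, assuming a known topic and provider public key; every value below is a placeholder:

const delegatedId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  delegate: {
    topic: "my-inference-topic",
    providerPublicKey: "<provider-public-key>",
    timeout: 30000,
    fallbackToLocal: true // fall back to local inference if delegation fails
  }
});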

onProgress callback

(progress: ModelProgressUpdate) => void

ModelProgressUpdate object:

  • percentage (number): Download percentage (0-100)
  • downloaded (number): Bytes downloaded
  • total (number): Total bytes
  • downloadKey (string): Key for canceling the download
  • shardInfo (object): Shard information (present if the model is sharded)
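
The callback fields can be combined into a richer progress report, as in this sketch (the URL is a placeholder):

const id = await loadModel({
  modelSrc: "https://example.com/model.gguf",
  modelType: "llm",
  onProgress: (progress) => {
    const mb = (bytes) => (bytes / (1024 * 1024)).toFixed(1);
    console.log(
      `${progress.percentage}% (${mb(progress.downloaded)} / ${mb(progress.total)} MB)`
    );
    if (progress.shardInfo) {
      console.log("Shard info:", progress.shardInfo);
    }
  }
});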

Returns

Promise<string> — Promise that resolves to the model ID (either the provided modelSrc or a generated ID)

Throws

  • When model loading fails, with details in the error message
  • When streaming ends unexpectedly (only when using onProgress)
  • When receiving an invalid response type from the server
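
Since failures surface as a rejected promise, wrap the call in try/catch (the path is a placeholder):

try {
  const id = await loadModel({
    modelSrc: "/path/to/model.gguf",
    modelType: "llm"
  });
} catch (err) {
  // The error message carries the details of why loading failed
  console.error("Model loading failed:", err.message);
}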

Example

// Local file path - absolute path
const localModelId = await loadModel({
  modelSrc: "/home/user/models/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});

// Local file path - relative path
const relativeModelId = await loadModel({
  modelSrc: "./models/whisper-base.gguf",
  modelType: "whisper"
});

// Hyperdrive URL with key and path
const hyperdriveId = await loadModel({
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});

// Remote HTTP/HTTPS URL with progress tracking
const remoteId = await loadModel({
  modelSrc: "https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGUF/resolve/main/llama-2-7b-chat.Q4_K_M.gguf",
  modelType: "llm",
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.percentage}%`);
  }
});

// Multimodal model with projection
const multimodalId = await loadModel({
  modelSrc: "https://huggingface.co/.../main-model.gguf",
  modelType: "llm",
  projectionModelSrc: "https://huggingface.co/.../projection-model.gguf",
  modelConfig: { ctx_size: 512 },
  onProgress: (progress) => {
    console.log(`Loading: ${progress.percentage}%`);
  }
});

// Whisper with VAD model
const whisperId = await loadModel({
  modelSrc: "https://huggingface.co/.../whisper-model.gguf",
  modelType: "whisper",
  vadModelSrc: "https://huggingface.co/.../vad-model.bin",
  modelConfig: {
    language: "en",
    strategy: "greedy",
    vad_params: {
      threshold: 0.35
    }
  }
});

// Load with automatic logging
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");

const modelId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  logger
});
