
# loadModel()

Loads a machine learning model from a local path, remote URL, or Hyperdrive key.

```ts
// Load new model
function loadModel(options: LoadModelOptions): Promise<string>;

// Hot-reload config on an already-loaded model
function loadModel(options: ReloadConfigOptions): Promise<string>;
```

Supports multiple model types: LLM, Whisper (speech recognition), embeddings, NMT (translation), TTS, and OCR. Handles local file paths, HTTP/HTTPS URLs, Hyperdrive URLs (`pear://`), and registry URLs.

When `onProgress` is provided, download progress is streamed in real time; otherwise a simple request-response pattern is used.

## Parameters

| Name | Type | Required? | Description |
| --- | --- | --- | --- |
| `options` | `LoadModelOptions \| ReloadConfigOptions` | Yes | Configuration for loading or hot-reloading a model |

### LoadModelOptions

Common fields present in all variants:

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `modelSrc` | `string` | Yes | — | Model source: a local path, HTTP(S) URL, Hyperdrive `pear://` URL, or registry URL |
| `modelType` | `string` | Yes | — | The type of model; see model type variants |
| `modelConfig` | `object` | No | `{}` | Model-specific configuration (varies by `modelType`). LLM models accept a `verbosity` field; use the exported `VERBOSITY` constant (`VERBOSITY.ERROR`, `VERBOSITY.WARN`, `VERBOSITY.INFO`, `VERBOSITY.DEBUG`). |
| `seed` | `boolean` | No | `false` | Whether to seed the model on Hyperdrive after download |
| `delegate` | `Delegate` | No | — | Delegation configuration for remote inference |
| `onProgress` | `(progress: ModelProgressUpdate) => void` | No | — | Callback for real-time download progress |
| `logger` | `Logger` | No | — | Logger instance; model operation logs are forwarded to this logger |

### Delegate

Optional delegation configuration for remote (P2P) inference:

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `topic` | `string` | Yes | — | P2P topic for delegation |
| `providerPublicKey` | `string` | Yes | — | Provider's public key |
| `timeout` | `number` | No | — | Timeout in milliseconds (minimum 100) |
| `fallbackToLocal` | `boolean` | No | `false` | Whether to fall back to local inference if delegation fails |
| `forceNewConnection` | `boolean` | No | `false` | Force a new connection to the provider |
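
The fields above compose into the `delegate` option of a `loadModel` call. A minimal sketch, using only fields from the tables above; the topic, provider key, and model source values are placeholders:

```ts
// Sketch: delegated (P2P) inference with local fallback.
// topic and providerPublicKey values are placeholders.
const delegatedLoad = {
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  delegate: {
    topic: "my-inference-topic",          // placeholder P2P topic
    providerPublicKey: "<provider-key>",  // placeholder provider public key
    timeout: 5_000,                       // milliseconds; minimum is 100
    fallbackToLocal: true,                // run locally if delegation fails
  },
};

// const modelId = await loadModel(delegatedLoad);
```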

### ReloadConfigOptions

Hot-reload configuration on an already-loaded model without reloading the model weights. Currently supported for Whisper models only.

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `modelId` | `string` | Yes | The ID of an existing loaded model (16-character hex) |
| `modelType` | `string` | Yes | The type of model (must match the loaded model) |
| `modelConfig` | `object` | Yes | New configuration to apply |
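
For example, switching transcription language on a loaded Whisper model might look like this sketch (the `modelId` value is a placeholder for the ID returned by an earlier `loadModel` call):

```ts
// Sketch: hot-reload new config onto an already-loaded Whisper model
// without reloading the weights. The modelId is a placeholder.
const reload = {
  modelId: "a1b2c3d4e5f60718",  // 16-char hex ID from a prior loadModel()
  modelType: "whisper",          // must match the loaded model
  modelConfig: { language: "de", translate: true },
};

// const sameId = await loadModel(reload);
```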

## Model type variants

Additional fields depend on `modelType`:

### "llm"

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `projectionModelSrc` | `string` | No | — | Projection model source for multimodal models |
| `toolFormat` | `"json" \| "xml"` | No | `"json"` | Tool call format |

### "whisper"

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `vadModelSrc` | `string` | No | VAD model source for voice activity detection |

### "embeddings"

| Field | Type | Required? | Default | Description |
| --- | --- | --- | --- | --- |
| `modelConfig.gpuLayers` | `number` | No | `99` | Number of layers offloaded to GPU |
| `modelConfig.device` | `string` | No | `"gpu"` | Device to use |
| `modelConfig.batchSize` | `number` | No | `1024` | Embedding batch size |
| `modelConfig.ctxSize` | `number` | No | — | Context size |
| `modelConfig.flashAttention` | `"on" \| "off"` | No | — | Flash attention toggle |
| `modelConfig.rawConfig` | `string` | No | — | Raw CLI override (advanced) |
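
As a sketch, a CPU-only embeddings load overriding the defaults above (the model URL is a placeholder):

```ts
// Sketch: load an embeddings model on CPU with a smaller batch size.
const embeddingsLoad = {
  modelSrc: "https://example.com/models/embedding-model.gguf", // placeholder URL
  modelType: "embeddings",
  modelConfig: {
    device: "cpu",   // override the "gpu" default
    gpuLayers: 0,    // keep all layers on CPU
    batchSize: 512,  // halve the 1024 default
  },
};

// const modelId = await loadModel(embeddingsLoad);
```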

### "nmt"

`modelConfig` is required and is a discriminated union on `engine`:

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `modelConfig.engine` | `"Opus" \| "Bergamot" \| "IndicTrans"` | Yes | Translation engine (determines available languages) |
| `modelConfig.from` | `string` | Yes | Source language code |
| `modelConfig.to` | `string` | Yes | Target language code |
| `srcVocabSrc` | `string` | No | Source vocabulary file |
| `dstVocabSrc` | `string` | No | Destination vocabulary file |

See the NMT `modelConfig` reference below for generation parameters.
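
A sketch of a Bergamot English-to-German load. The URLs are placeholders, and attaching the vocabulary files via `srcVocabSrc`/`dstVocabSrc` is shown as an assumption for the Bergamot engine:

```ts
// Sketch: load a Bergamot en -> de translation model.
const nmtLoad = {
  modelSrc: "https://example.com/models/bergamot-en-de.bin",  // placeholder URL
  modelType: "nmt",
  modelConfig: {
    engine: "Bergamot",  // discriminator; determines available languages
    from: "en",
    to: "de",
    beamsize: 4,         // default from the modelConfig reference below
  },
  srcVocabSrc: "https://example.com/models/vocab.en.spm",  // placeholder
  dstVocabSrc: "https://example.com/models/vocab.de.spm",  // placeholder
};

// const modelId = await loadModel(nmtLoad);
```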

### "tts"

`modelConfig` is required and is a discriminated union on `ttsEngine`:

**Chatterbox engine** (`ttsEngine: "chatterbox"`):

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `ttsEngine` | `"chatterbox"` | Yes | Engine discriminator |
| `language` | `"en" \| "es" \| "de" \| "it"` | Yes | Output language |
| `ttsTokenizerSrc` | `string` | Yes | Tokenizer model source |
| `ttsSpeechEncoderSrc` | `string` | Yes | Speech encoder model source |
| `ttsEmbedTokensSrc` | `string` | Yes | Embed tokens model source |
| `ttsConditionalDecoderSrc` | `string` | Yes | Conditional decoder model source |
| `ttsLanguageModelSrc` | `string` | Yes | Language model source |
| `referenceAudioSrc` | `string` | No | Reference WAV file for voice cloning |

**Supertonic engine** (`ttsEngine: "supertonic"`):

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `ttsEngine` | `"supertonic"` | Yes | Engine discriminator |
| `language` | `"en" \| "es" \| "de" \| "it"` | Yes | Output language |
| `ttsTokenizerSrc` | `string` | Yes | Tokenizer model source |
| `ttsTextEncoderSrc` | `string` | Yes | Text encoder model source |
| `ttsLatentDenoiserSrc` | `string` | Yes | Latent denoiser model source |
| `ttsVoiceDecoderSrc` | `string` | Yes | Voice decoder model source |
| `ttsVoiceSrc` | `string` | Yes | Voice `.bin` file source |
| `ttsSpeed` | `number` | No | Speech speed multiplier |
| `ttsNumInferenceSteps` | `number` | No | Number of inference steps |

### "ocr"

| Field | Type | Required? | Description |
| --- | --- | --- | --- |
| `detectorModelSrc` | `string` | Yes | Detector model source for OCR |

### Custom plugin

Any `modelType` string that is not a built-in type. `modelConfig` accepts `Record<string, unknown>`.
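
A sketch of a custom-plugin load; the plugin name, model URL, and config keys are hypothetical:

```ts
// Sketch: any non-built-in modelType is routed to a custom plugin,
// and modelConfig is then an arbitrary Record<string, unknown>.
const pluginLoad = {
  modelSrc: "https://example.com/models/custom.bin",  // placeholder URL
  modelType: "my-custom-plugin",                      // hypothetical plugin name
  modelConfig: { threshold: 0.5 } as Record<string, unknown>,  // hypothetical keys
};

// const modelId = await loadModel(pluginLoad);
```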

## modelConfig reference

### LLM modelConfig

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `ctx_size` | `number` | `1024` | Context window size |
| `device` | `string` | `"gpu"` | Device to use |
| `gpu_layers` | `number` | `99` | Number of layers offloaded to GPU |
| `system_prompt` | `string` | `"You are a helpful assistant."` | System prompt |
| `temp` | `number` | — | Temperature (0–2) |
| `top_p` | `number` | — | Top-p sampling (0–1) |
| `top_k` | `number` | — | Top-k sampling (0–128) |
| `seed` | `number` | — | Random seed |
| `predict` | `number` | — | Max tokens to predict; `-1` = until stop token, `-2` = until context filled |
| `lora` | `string` | — | LoRA adapter path |
| `no_mmap` | `boolean` | — | Disable memory-mapped I/O |
| `verbosity` | `0 \| 1 \| 2 \| 3` | — | Engine verbosity; use the exported `VERBOSITY` constant |
| `presence_penalty` | `number` | — | Presence penalty |
| `frequency_penalty` | `number` | — | Frequency penalty |
| `repeat_penalty` | `number` | — | Repeat penalty |
| `stop_sequences` | `string[]` | — | Custom stop sequences |
| `n_discarded` | `number` | — | Number of discarded tokens |
| `tools` | `boolean` | — | Enable tool calling support |

### Whisper modelConfig

Common fields:

| Field | Type | Description |
| --- | --- | --- |
| `language` | `string` | Language code (e.g., `"en"`) |
| `translate` | `boolean` | Whether to translate to English |
| `strategy` | `"greedy" \| "beam_search"` | Sampling strategy |
| `temperature` | `number` | Temperature |
| `initial_prompt` | `string` | Initial prompt for the decoder |
| `detect_language` | `boolean` | Auto-detect language |
| `vad_params` | `object` | VAD parameters: `{ threshold?, min_speech_duration_ms?, min_silence_duration_ms?, max_speech_duration_s?, speech_pad_ms?, samples_overlap? }` |
| `audio_format` | `"f32le" \| "s16le"` | Audio format |
| `contextParams` | `object` | Context parameters: `{ use_gpu?, flash_attn?, gpu_device? }` |

Additional fields: n_threads, n_max_text_ctx, offset_ms, duration_ms, audio_ctx, no_context, no_timestamps, single_segment, print_special, print_progress, print_realtime, print_timestamps, token_timestamps, thold_pt, thold_ptsum, max_len, split_on_word, max_tokens, debug_mode, tdrz_enable, suppress_regex, suppress_blank, suppress_nst, length_penalty, temperature_inc, entropy_thold, logprob_thold, greedy_best_of, beam_search_beam_size. All optional. See whisperConfigSchema in the source for details.

### NMT modelConfig

Discriminated union on `engine`. Common generation parameters (all optional):

| Field | Type | Default | Description |
| --- | --- | --- | --- |
| `mode` | `"full"` | `"full"` | Translation mode |
| `beamsize` | `number` | `4` | Beam size |
| `lengthpenalty` | `number` | `1.0` | Length penalty |
| `maxlength` | `number` | `512` | Max output length |
| `repetitionpenalty` | `number` | `1.0` | Repetition penalty |
| `norepeatngramsize` | `number` | `0` | No-repeat n-gram size |
| `temperature` | `number` | `0.3` | Temperature |
| `topk` | `number` | `0` | Top-k sampling |
| `topp` | `number` | `1.0` | Top-p sampling |

Engine-specific:

- **Opus**: `from`/`to` accept `"en" | "de" | "es" | "it" | "ru" | "ja"`
- **Bergamot**: `from`/`to` accept 24 languages (`en`, `ar`, `bg`, `ca`, `cs`, `de`, `es`, `et`, `fi`, `fr`, `hu`, `is`, `it`, `ja`, `ko`, `lt`, `lv`, `nl`, `pl`, `pt`, `ru`, `sk`, `sl`, `uk`, `zh`). Additional fields: `srcVocabPath`, `dstVocabPath`, `normalize`
- **IndicTrans**: `from`/`to` accept 26 Indic language codes (e.g., `"eng_Latn"`, `"hin_Deva"`)

### OCR modelConfig

| Field | Type | Description |
| --- | --- | --- |
| `langList` | `string[]` | Languages to detect |
| `useGPU` | `boolean` | Use GPU acceleration |
| `timeout` | `number` | Timeout in milliseconds |
| `magRatio` | `number` | Magnification ratio for detection |
| `defaultRotationAngles` | `number[]` | Rotation angles to try |
| `contrastRetry` | `boolean` | Retry with contrast adjustment |
| `lowConfidenceThreshold` | `number` | Threshold for low-confidence filtering |
| `recognizerBatchSize` | `number` | Batch size for the recognizer |
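
Combining the `"ocr"` variant field with this config, a load might be sketched as follows (the URLs and config values are illustrative):

```ts
// Sketch: load an OCR model with a separate detector and
// English + German recognition.
const ocrLoad = {
  modelSrc: "https://example.com/models/ocr-recognizer.onnx",       // placeholder
  modelType: "ocr",
  detectorModelSrc: "https://example.com/models/ocr-detector.onnx", // placeholder
  modelConfig: {
    langList: ["en", "de"],
    useGPU: false,
    defaultRotationAngles: [0, 90, 180, 270],  // illustrative angles
  },
};

// const modelId = await loadModel(ocrLoad);
```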

## ModelProgressUpdate

| Field | Type | Description |
| --- | --- | --- |
| `type` | `"modelProgress"` | Event type |
| `downloaded` | `number` | Bytes downloaded so far |
| `total` | `number` | Total bytes expected |
| `percentage` | `number` | Download percentage |
| `downloadKey` | `string` | Unique download key (use with `cancel()`) |
| `shardInfo` | `object` | Shard progress (optional, for sharded models) |
| `shardInfo.currentShard` | `number` | Current shard index |
| `shardInfo.totalShards` | `number` | Total number of shards |
| `shardInfo.shardName` | `string` | Current shard file name |
| `shardInfo.overallDownloaded` | `number` | Total bytes downloaded across all shards |
| `shardInfo.overallTotal` | `number` | Total bytes across all shards |
| `shardInfo.overallPercentage` | `number` | Overall percentage across all shards |
| `onnxInfo` | `object` | ONNX multi-file progress (optional, for ONNX models) |
| `onnxInfo.currentFile` | `string` | Current file being downloaded |
| `onnxInfo.fileIndex` | `number` | Current file index |
| `onnxInfo.totalFiles` | `number` | Total number of files |
| `onnxInfo.overallDownloaded` | `number` | Total bytes downloaded across all files |
| `onnxInfo.overallTotal` | `number` | Total bytes across all files |
| `onnxInfo.overallPercentage` | `number` | Overall percentage across all files |
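
A progress callback usually needs to distinguish plain, sharded, and ONNX multi-file downloads. A minimal formatter over the fields above, as a sketch (only the fields used here are typed):

```ts
// Minimal sketch of a progress formatter; the interface mirrors just
// the ModelProgressUpdate fields this function reads.
interface ProgressLike {
  percentage: number;
  shardInfo?: { currentShard: number; totalShards: number; overallPercentage: number };
  onnxInfo?: { fileIndex: number; totalFiles: number; overallPercentage: number };
}

function formatProgress(p: ProgressLike): string {
  if (p.shardInfo) {
    // Sharded model: report per-shard position plus overall progress.
    const s = p.shardInfo;
    return `shard ${s.currentShard}/${s.totalShards}: ${s.overallPercentage}% overall`;
  }
  if (p.onnxInfo) {
    // ONNX model: report per-file position plus overall progress.
    const o = p.onnxInfo;
    return `file ${o.fileIndex}/${o.totalFiles}: ${o.overallPercentage}% overall`;
  }
  return `${p.percentage}%`;
}

// Usage: pass it through the onProgress option, e.g.
// await loadModel({ modelSrc, modelType: "llm", onProgress: (p) => console.log(formatProgress(p)) });
```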

## Returns

`Promise<string>`: resolves to the model ID, used to reference the model in subsequent API calls.

## Throws

| Error | When |
| --- | --- |
| `MODEL_LOAD_FAILED` | Model loading fails |
| `STREAM_ENDED_WITHOUT_RESPONSE` | Streaming ends without a final response (when using `onProgress`) |
| `INVALID_RESPONSE_TYPE` | Response type does not match the expected `"loadModel"` |
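
One way to surface these to users is to map the documented identifiers to messages. How the SDK attaches the identifier to a thrown error (code vs. message) is not specified here, so this sketch works on the identifier string alone:

```ts
// Sketch: map the documented loadModel error identifiers to
// user-facing messages. The message wording is illustrative.
const LOAD_ERRORS: Record<string, string> = {
  MODEL_LOAD_FAILED: "The model could not be loaded.",
  STREAM_ENDED_WITHOUT_RESPONSE: "Download stream ended before a final response.",
  INVALID_RESPONSE_TYPE: 'Unexpected response type (expected "loadModel").',
};

function describeLoadError(code: string): string {
  return LOAD_ERRORS[code] ?? `Unknown loadModel error: ${code}`;
}
```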

## Example

```ts
// Local file path
const modelId = await loadModel({
  modelSrc: "/home/user/models/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});
```

```ts
// Remote URL with progress tracking
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../model.gguf",
  modelType: "llm",
  onProgress: (progress) => {
    console.log(`Downloaded: ${progress.percentage}%`);
  }
});
```

```ts
// Hyperdrive URL
const modelId = await loadModel({
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  modelConfig: { ctx_size: 2048 }
});
```

```ts
// Multimodal model with projection
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../main-model.gguf",
  modelType: "llm",
  projectionModelSrc: "https://huggingface.co/.../projection-model.gguf",
  modelConfig: { ctx_size: 512 }
});
```

```ts
// Whisper with VAD model
const modelId = await loadModel({
  modelSrc: "https://huggingface.co/.../whisper-model.gguf",
  modelType: "whisper",
  vadModelSrc: "https://huggingface.co/.../vad-model.bin",
  modelConfig: {
    mode: "caption",
    output_format: "plaintext",
    min_seconds: 2,
    max_seconds: 6
  }
});
```

```ts
// With logger forwarding
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");

const modelId = await loadModel({
  modelSrc: "/path/to/model.gguf",
  modelType: "llm",
  logger
});
```
