loadModel()
Loads a machine learning model from a local path, remote URL, or Hyperdrive key.
// Load a new model
function loadModel(options: LoadModelOptions): Promise<string>;
// Hot-reload config on an already-loaded model
function loadModel(options: ReloadConfigOptions): Promise<string>;
Supports multiple model types: LLM, Whisper (speech recognition), embeddings, NMT (translation), TTS, and OCR. Handles local file paths, HTTP/HTTPS URLs, Hyperdrive URLs (pear://), and registry URLs.
When onProgress is provided, streaming is used for real-time download progress. Otherwise, a simple request-response pattern is used.
Parameters
| Name | Type | Required? | Description |
|---|---|---|---|
| options | LoadModelOptions \| ReloadConfigOptions | ✓ | Configuration for loading or hot-reloading a model |
LoadModelOptions
Common fields present in all variants:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| modelSrc | string | ✓ | — | Model source — local path, HTTP(S) URL, Hyperdrive pear:// URL, or registry URL |
| modelType | string | ✓ | — | The type of model — see model type variants |
| modelConfig | object | ✗ | {} | Model-specific configuration (varies by modelType). LLM models accept a verbosity field — use the exported VERBOSITY constant (VERBOSITY.ERROR, VERBOSITY.WARN, VERBOSITY.INFO, VERBOSITY.DEBUG). |
| seed | boolean | ✗ | false | Whether to seed the model on Hyperdrive after download |
| delegate | Delegate | ✗ | — | Delegation configuration for remote inference |
| onProgress | (progress: ModelProgressUpdate) => void | ✗ | — | Callback for real-time download progress |
| logger | Logger | ✗ | — | Logger instance — model operation logs are forwarded to this logger |
Delegate
Optional delegation configuration for remote (P2P) inference:
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| topic | string | ✓ | — | P2P topic for delegation |
| providerPublicKey | string | ✓ | — | Provider's public key |
| timeout | number | ✗ | — | Timeout in milliseconds (min 100) |
| fallbackToLocal | boolean | ✗ | false | Whether to fallback to local inference if delegation fails |
| forceNewConnection | boolean | ✗ | false | Force a new connection to the provider |
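Putting these fields together, a delegated load might look like the sketch below. The topic and provider key are placeholders, and the fallback flag keeps the call usable when the provider is unreachable:

```javascript
// Delegation options for remote (P2P) inference — topic, key, and source are placeholders
const delegateOptions = {
  modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
  modelType: "llm",
  delegate: {
    topic: "example-inference-topic",        // P2P topic (placeholder)
    providerPublicKey: "<provider-public-key>",
    timeout: 5000,                           // milliseconds; must be at least 100
    fallbackToLocal: true                    // run locally if delegation fails
  }
};
// const modelId = await loadModel(delegateOptions);
```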
ReloadConfigOptions
Hot-reload configuration on an already-loaded model without reloading the model weights. Currently supported for Whisper models only.
| Field | Type | Required? | Description |
|---|---|---|---|
| modelId | string | ✓ | The ID of an existing loaded model (16-char hex) |
| modelType | string | ✓ | The type of model (must match the loaded model) |
| modelConfig | object | ✓ | New configuration to apply |
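For example, updating transcription settings on a loaded Whisper model could be sketched as follows. The modelId value is a placeholder, and the config fields are taken from the Whisper modelConfig reference below:

```javascript
// Hot-reload a Whisper model's config without reloading the weights
// (modelId is a placeholder 16-char hex ID returned by an earlier loadModel call)
const reloadOptions = {
  modelId: "a1b2c3d4e5f60718",
  modelType: "whisper",          // must match the loaded model
  modelConfig: {
    language: "en",              // switch the transcription language
    translate: false
  }
};
// await loadModel(reloadOptions);
```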
Model type variants
Additional fields depend on modelType:
"llm"
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| projectionModelSrc | string | ✗ | — | Projection model source for multimodal models |
| toolFormat | "json" \| "xml" | ✗ | "json" | Tool call format |
"whisper"
| Field | Type | Required? | Description |
|---|---|---|---|
| vadModelSrc | string | ✗ | VAD model source for voice activity detection |
"embeddings"
| Field | Type | Required? | Default | Description |
|---|---|---|---|---|
| modelConfig.gpuLayers | number | ✗ | 99 | Number of layers offloaded to GPU |
| modelConfig.device | string | ✗ | "gpu" | Device to use |
| modelConfig.batchSize | number | ✗ | 1024 | Embedding batch size |
| modelConfig.ctxSize | number | ✗ | — | Context size |
| modelConfig.flashAttention | "on" \| "off" | ✗ | — | Flash attention toggle |
| modelConfig.rawConfig | string | ✗ | — | Raw CLI override (advanced) |
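A CPU-only embeddings load under these options might look like the sketch below (the model URL is a placeholder):

```javascript
// Embeddings model forced onto CPU with a smaller batch size
const embeddingsOptions = {
  modelSrc: "https://example.com/models/embeddings.gguf", // placeholder URL
  modelType: "embeddings",
  modelConfig: {
    device: "cpu",     // override the "gpu" default
    gpuLayers: 0,      // no layers offloaded when running on CPU
    batchSize: 512     // halve the default 1024 batch size
  }
};
// const modelId = await loadModel(embeddingsOptions);
```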
"nmt"
modelConfig is required and is a discriminated union on engine:
| Field | Type | Required? | Description |
|---|---|---|---|
| modelConfig.engine | "Opus" \| "Bergamot" \| "IndicTrans" | ✓ | Translation engine (determines available languages) |
| modelConfig.from | string | ✓ | Source language code |
| modelConfig.to | string | ✓ | Target language code |
| srcVocabSrc | string | ✗ | Source vocabulary file |
| dstVocabSrc | string | ✗ | Destination vocabulary file |
See modelConfig details below for generation parameters.
"tts"
modelConfig is required and is a discriminated union on ttsEngine:
Chatterbox engine (ttsEngine: "chatterbox"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "chatterbox" | ✓ | Engine discriminator |
| language | "en" \| "es" \| "de" \| "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ✓ | Tokenizer model source |
| ttsSpeechEncoderSrc | string | ✓ | Speech encoder model source |
| ttsEmbedTokensSrc | string | ✓ | Embed tokens model source |
| ttsConditionalDecoderSrc | string | ✓ | Conditional decoder model source |
| ttsLanguageModelSrc | string | ✓ | Language model source |
| referenceAudioSrc | string | ✓ | Reference WAV file for voice cloning |
Supertonic engine (ttsEngine: "supertonic"):
| Field | Type | Required? | Description |
|---|---|---|---|
| ttsEngine | "supertonic" | ✓ | Engine discriminator |
| language | "en" \| "es" \| "de" \| "it" | ✓ | Output language |
| ttsTokenizerSrc | string | ✓ | Tokenizer model source |
| ttsTextEncoderSrc | string | ✓ | Text encoder model source |
| ttsLatentDenoiserSrc | string | ✓ | Latent denoiser model source |
| ttsVoiceDecoderSrc | string | ✓ | Voice decoder model source |
| ttsVoiceSrc | string | ✓ | Voice .bin file source |
| ttsSpeed | number | ✗ | Speech speed multiplier |
| ttsNumInferenceSteps | number | ✗ | Number of inference steps |
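A Chatterbox load assembles all the required sources plus a reference recording for voice cloning; every URL below is a placeholder:

```javascript
// Chatterbox TTS load — all source URLs are placeholders
const ttsOptions = {
  modelSrc: "https://example.com/tts/model.bin",
  modelType: "tts",
  modelConfig: {
    ttsEngine: "chatterbox",       // discriminator: selects the chatterbox field set
    language: "en",
    ttsTokenizerSrc: "https://example.com/tts/tokenizer.bin",
    ttsSpeechEncoderSrc: "https://example.com/tts/speech-encoder.bin",
    ttsEmbedTokensSrc: "https://example.com/tts/embed-tokens.bin",
    ttsConditionalDecoderSrc: "https://example.com/tts/conditional-decoder.bin",
    ttsLanguageModelSrc: "https://example.com/tts/language-model.bin",
    referenceAudioSrc: "https://example.com/tts/reference.wav" // voice to clone
  }
};
// const modelId = await loadModel(ttsOptions);
```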
"ocr"
| Field | Type | Required? | Description |
|---|---|---|---|
| detectorModelSrc | string | ✗ | Detector model source for OCR |
Custom plugin
Any modelType string that is not a built-in type. modelConfig accepts Record<string, unknown>.
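A custom plugin load is just a loadModel call with a non-built-in modelType; the type name and config fields below are hypothetical:

```javascript
// Hypothetical custom plugin — "my-plugin" is not a built-in modelType
const pluginOptions = {
  modelSrc: "/path/to/plugin-model.bin", // placeholder path
  modelType: "my-plugin",
  modelConfig: { anyField: "anyValue" }  // Record<string, unknown>
};
// const modelId = await loadModel(pluginOptions);
```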
modelConfig reference
LLM modelConfig
| Field | Type | Default | Description |
|---|---|---|---|
| ctx_size | number | 1024 | Context window size |
| device | string | "gpu" | Device to use |
| gpu_layers | number | 99 | Number of layers offloaded to GPU |
| system_prompt | string | "You are a helpful assistant." | System prompt |
| temp | number | — | Temperature (0–2) |
| top_p | number | — | Top-p sampling (0–1) |
| top_k | number | — | Top-k sampling (0–128) |
| seed | number | — | Random seed |
| predict | number | — | Max tokens to predict. -1 = until stop token, -2 = until context filled |
| lora | string | — | LoRA adapter path |
| no_mmap | boolean | — | Disable memory-mapped I/O |
| verbosity | 0 \| 1 \| 2 \| 3 | — | Engine verbosity — use exported VERBOSITY constant |
| presence_penalty | number | — | Presence penalty |
| frequency_penalty | number | — | Frequency penalty |
| repeat_penalty | number | — | Repeat penalty |
| stop_sequences | string[] | — | Custom stop sequences |
| n_discarded | number | — | Number of discarded tokens |
| tools | boolean | — | Enable tool calling support |
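A fuller LLM configuration using these fields might look like the sketch below. All sampling values are illustrative, and the numeric mapping of the VERBOSITY levels is assumed from the documented order (prefer the constant exported by the SDK):

```javascript
// Stand-in for the SDK's exported VERBOSITY constant — the 0–3 mapping is an assumption
const VERBOSITY = { ERROR: 0, WARN: 1, INFO: 2, DEBUG: 3 };

const llmOptions = {
  modelSrc: "/path/to/model.gguf",   // placeholder path
  modelType: "llm",
  modelConfig: {
    ctx_size: 4096,
    gpu_layers: 99,
    system_prompt: "You are a concise assistant.",
    temp: 0.7,                 // 0–2
    top_p: 0.9,                // 0–1
    top_k: 40,                 // 0–128
    predict: -1,               // generate until a stop token
    stop_sequences: ["</s>"],
    verbosity: VERBOSITY.WARN,
    tools: true                // enable tool calling
  }
};
// const modelId = await loadModel(llmOptions);
```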
Whisper modelConfig
Common fields:
| Field | Type | Description |
|---|---|---|
| language | string | Language code (e.g., "en") |
| translate | boolean | Whether to translate to English |
| strategy | "greedy" \| "beam_search" | Sampling strategy |
| temperature | number | Temperature |
| initial_prompt | string | Initial prompt for the decoder |
| detect_language | boolean | Auto-detect language |
| vad_params | object | VAD parameters — { threshold?, min_speech_duration_ms?, min_silence_duration_ms?, max_speech_duration_s?, speech_pad_ms?, samples_overlap? } |
| audio_format | "f32le" \| "s16le" | Audio format |
| contextParams | object | Context parameters — { use_gpu?, flash_attn?, gpu_device? } |
Additional fields: n_threads, n_max_text_ctx, offset_ms, duration_ms, audio_ctx, no_context, no_timestamps, single_segment, print_special, print_progress, print_realtime, print_timestamps, token_timestamps, thold_pt, thold_ptsum, max_len, split_on_word, max_tokens, debug_mode, tdrz_enable, suppress_regex, suppress_blank, suppress_nst, length_penalty, temperature_inc, entropy_thold, logprob_thold, greedy_best_of, beam_search_beam_size. All optional. See whisperConfigSchema in the source for details.
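For instance, a beam-search transcription setup with VAD tuning could be sketched as follows (all values are illustrative):

```javascript
// Whisper transcription config built from the common fields above
const whisperConfig = {
  language: "en",
  translate: false,
  strategy: "beam_search",
  beam_search_beam_size: 5,
  initial_prompt: "Technical vocabulary follows.",
  vad_params: {
    threshold: 0.5,               // speech probability threshold
    min_silence_duration_ms: 300  // silence gap that ends a segment
  },
  contextParams: { use_gpu: true, flash_attn: true }
};
// await loadModel({ modelSrc: "/path/to/whisper.bin", modelType: "whisper", modelConfig: whisperConfig });
```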
NMT modelConfig
Discriminated union on engine. Common generation parameters (all optional):
| Field | Type | Default | Description |
|---|---|---|---|
| mode | "full" | "full" | Translation mode |
| beamsize | number | 4 | Beam size |
| lengthpenalty | number | 1.0 | Length penalty |
| maxlength | number | 512 | Max output length |
| repetitionpenalty | number | 1.0 | Repetition penalty |
| norepeatngramsize | number | 0 | No-repeat n-gram size |
| temperature | number | 0.3 | Temperature |
| topk | number | 0 | Top-k sampling |
| topp | number | 1.0 | Top-p sampling |
Engine-specific:
- Opus: from/to accept "en" | "de" | "es" | "it" | "ru" | "ja"
- Bergamot: from/to accept 24 languages (en, ar, bg, ca, cs, de, es, et, fi, fr, hu, is, it, ja, ko, lt, lv, nl, pl, pt, ru, sk, sl, uk, zh). Additional fields: srcVocabPath, dstVocabPath, normalize
- IndicTrans: from/to accept 26 Indic language codes (e.g., "eng_Latn", "hin_Deva")
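Combining the engine discriminator with the generation parameters, an Opus English-to-German load might be sketched as follows (the model path and parameter values are illustrative):

```javascript
// NMT load with the Opus engine — generation parameters are illustrative
const nmtOptions = {
  modelSrc: "/path/to/opus-en-de.bin", // placeholder path
  modelType: "nmt",
  modelConfig: {
    engine: "Opus",    // determines which from/to language codes are valid
    from: "en",
    to: "de",
    beamsize: 4,
    maxlength: 512,
    temperature: 0.3
  }
};
// const modelId = await loadModel(nmtOptions);
```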
OCR modelConfig
| Field | Type | Description |
|---|---|---|
| langList | string[] | Languages to detect |
| useGPU | boolean | Use GPU acceleration |
| timeout | number | Timeout in milliseconds |
| magRatio | number | Magnification ratio for detection |
| defaultRotationAngles | number[] | Rotation angles to try |
| contrastRetry | boolean | Retry with contrast adjustment |
| lowConfidenceThreshold | number | Threshold for low-confidence filtering |
| recognizerBatchSize | number | Batch size for recognizer |
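An OCR load using these fields might be sketched as follows (the path and values are illustrative):

```javascript
// OCR load — model path and tuning values are illustrative
const ocrOptions = {
  modelSrc: "/path/to/ocr-model.bin", // placeholder path
  modelType: "ocr",
  modelConfig: {
    langList: ["en"],
    useGPU: true,
    defaultRotationAngles: [0, 90, 180, 270], // try all four orientations
    contrastRetry: true                       // retry with contrast adjustment
  }
};
// const modelId = await loadModel(ocrOptions);
```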
ModelProgressUpdate
| Field | Type | Description |
|---|---|---|
| type | "modelProgress" | Event type |
| downloaded | number | Bytes downloaded so far |
| total | number | Total bytes expected |
| percentage | number | Download percentage |
| downloadKey | string | Unique download key (use with cancel()) |
| shardInfo | object | Shard progress (optional, for sharded models) |
| shardInfo.currentShard | number | Current shard index |
| shardInfo.totalShards | number | Total number of shards |
| shardInfo.shardName | string | Current shard file name |
| shardInfo.overallDownloaded | number | Total bytes downloaded across all shards |
| shardInfo.overallTotal | number | Total bytes across all shards |
| shardInfo.overallPercentage | number | Overall percentage across all shards |
| onnxInfo | object | ONNX multi-file progress (optional, for ONNX models) |
| onnxInfo.currentFile | string | Current file being downloaded |
| onnxInfo.fileIndex | number | Current file index |
| onnxInfo.totalFiles | number | Total number of files |
| onnxInfo.overallDownloaded | number | Total bytes downloaded across all files |
| onnxInfo.overallTotal | number | Total bytes across all files |
| onnxInfo.overallPercentage | number | Overall percentage across all files |
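An onProgress callback can branch on the optional shardInfo and onnxInfo fields; a minimal formatter sketch:

```javascript
// Summarize a ModelProgressUpdate, preferring shard- or file-level detail when present
function formatProgress(p) {
  if (p.shardInfo) {
    const s = p.shardInfo;
    return `shard ${s.currentShard}/${s.totalShards} (${s.shardName}): ${s.overallPercentage}% overall`;
  }
  if (p.onnxInfo) {
    const o = p.onnxInfo;
    return `file ${o.fileIndex}/${o.totalFiles} (${o.currentFile}): ${o.overallPercentage}% overall`;
  }
  return `${p.percentage}% (${p.downloaded}/${p.total} bytes)`;
}

// Pass it as the callback: onProgress: (p) => console.log(formatProgress(p))
console.log(formatProgress({
  type: "modelProgress",
  downloaded: 512,
  total: 1024,
  percentage: 50,
  downloadKey: "dl-123" // placeholder key
}));
// → "50% (512/1024 bytes)"
```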
Returns
Promise<string> — Resolves to the model ID (used to reference the model in subsequent API calls).
Throws
| Error | When |
|---|---|
| MODEL_LOAD_FAILED | Model loading fails |
| STREAM_ENDED_WITHOUT_RESPONSE | Streaming ends without a final response (when using onProgress) |
| INVALID_RESPONSE_TYPE | Response type does not match expected "loadModel" |
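A dropped stream can be transient, so callers may want a retry wrapper. The sketch below takes the load function as a parameter so it can be exercised with a stub; the `code` property used to carry the error name is an assumption:

```javascript
// Retry a loadModel-style call when the stream dies before a final response.
// Assumes errors expose the table's names on a `code` property (an assumption).
async function loadWithRetry(load, options, attempts = 3) {
  let lastErr;
  for (let i = 0; i < attempts; i++) {
    try {
      return await load(options);
    } catch (err) {
      lastErr = err;
      if (err.code !== "STREAM_ENDED_WITHOUT_RESPONSE") throw err; // not transient
    }
  }
  throw lastErr;
}

// Demo with a stub that fails once, then returns a model ID
let calls = 0;
const stubLoad = async () => {
  if (++calls === 1) {
    throw Object.assign(new Error("stream ended"), { code: "STREAM_ENDED_WITHOUT_RESPONSE" });
  }
  return "a1b2c3d4e5f60718"; // placeholder 16-char hex model ID
};
loadWithRetry(stubLoad, {}).then((id) => console.log(id)); // logs the model ID
```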
Example
// Local file path
const modelId = await loadModel({
modelSrc: "/home/user/models/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Remote URL with progress tracking
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../model.gguf",
modelType: "llm",
onProgress: (progress) => {
console.log(`Downloaded: ${progress.percentage}%`);
}
});
// Hyperdrive URL
const modelId = await loadModel({
modelSrc: "pear://<hyperdrive-key>/llama-7b.gguf",
modelType: "llm",
modelConfig: { ctx_size: 2048 }
});
// Multimodal model with projection
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../main-model.gguf",
modelType: "llm",
projectionModelSrc: "https://huggingface.co/.../projection-model.gguf",
modelConfig: { ctx_size: 512 }
});
// Whisper with VAD model
const modelId = await loadModel({
modelSrc: "https://huggingface.co/.../whisper-model.gguf",
modelType: "whisper",
vadModelSrc: "https://huggingface.co/.../vad-model.bin",
modelConfig: {
mode: "caption",
output_format: "plaintext",
min_seconds: 2,
max_seconds: 6
}
});
// With logger forwarding
import { getLogger } from "@qvac/sdk";
const logger = getLogger("my-app");
const modelId = await loadModel({
modelSrc: "/path/to/model.gguf",
modelType: "llm",
logger
});