SDK

Overview

Install the npm package @qvac/sdk in your project, then load models and use them to run AI inference locally, or delegate inference to peers through the built-in P2P capability.

Description

The JS SDK is cross-platform, type-safe, and pluggable, exposing all QVAC capabilities through a unified interface.

Key features

Unified interface: multiple AI tasks through a single npm package installed in your project.
Cross-platform: portable code across Linux, macOS, and Windows (Node.js / Bare runtime); Android and iOS (Expo).
Pluggable: build lean apps by including only what you need, and extend the SDK with custom plugins.
Type-safe: typed JS API.

Quickstart

Run your first example using the JS SDK

At the end of the quickstart, you’ll find instructions for running all examples in this documentation.

Installation

Install and run on Node.js, Bare, or Expo

Supported environments and how to install the SDK for each one.

Functionalities

AI tasks

Completion: LLM inference for text generation and chat, via llama.cpp.
Text embeddings: vector embedding generation for semantic search, clustering, and retrieval, via llama.cpp.
Translation: text-to-text neural machine translation (NMT), via Marian and Bergamot.
Transcription: automatic speech recognition (ASR) for speech-to-text, via whisper.cpp.
Text-to-speech: speech synthesis (TTS), via ONNX Runtime.
OCR: optical character recognition for extracting text from images, via ONNX Runtime.
Multimodal: LLM inference over text, images, and other media within a single conversation context.
RAG: out-of-the-box retrieval-augmented generation workflow.
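Semantic search and RAG both rest on comparing embedding vectors. The sketch below shows one common way to do that, cosine similarity, so the role of the text embeddings task is concrete. It assumes embeddings are plain number arrays of equal length; the SDK's actual return type and its built-in RAG ranking method are not described here, and the helper names (cosineSimilarity, rankBySimilarity) are hypothetical.

```typescript
// Hypothetical helper: assumes embeddings are number arrays of equal length.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Treat a zero vector as unrelated to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: rank documents by similarity to a query embedding, highest first.
function rankBySimilarity(
  query: number[],
  docs: { id: string; embedding: number[] }[],
): { id: string; score: number }[] {
  return docs
    .map((d) => ({ id: d.id, score: cosineSimilarity(query, d.embedding) }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors standing in for real embeddings.
const ranked = rankBySimilarity([1, 0, 0], [
  { id: "a", embedding: [0, 1, 0] },
  { id: "b", embedding: [0.9, 0.1, 0] },
]);
console.log(ranked.map((r) => r.id).join(", ")); // b, a
```

In a retrieval workflow, the top-ranked documents would then be passed as context to a completion call.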

P2P capabilities

Delegated inference: delegate inference to peers via the Holepunch stack, enabling resource sharing.
Fetch models: download AI models from peers via the distributed model registry.
Blind relays: connect peers across NATs/firewalls by routing traffic through relay nodes.

Utilities

Logging: visibility into what's happening during loading, inference, and other operations.
Download lifecycle: pause and resume model downloads.
Sharded models: download a model that is sharded into multiple parts.

Flow

Before you can use a model, you must load it into memory from some location. The flow for performing AI inference is:

Call loadModel() to initialize the SDK and load a model. To keep several models loaded at the same time, call loadModel() once for each model.
Perform AI tasks by calling the appropriate functions from the SDK API (e.g., completion()).
When you are done with a model, call unloadModel() to free the resources it uses.
Finally, close the SDK instance by calling close().
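The load, use, unload, close sequence maps naturally onto nested try/finally blocks, so resources are released even if inference throws. This is a minimal sketch only: the InferenceSdk interface, its parameter and return types (for example, loadModel() resolving to a model id string), and the runOnce helper are hypothetical stand-ins that reuse the function names from the flow above. The real @qvac/sdk signatures may differ; check the API reference.

```typescript
// Hypothetical interface reusing the function names from the flow above.
// Parameter and return types are assumptions, not the real @qvac/sdk API.
interface InferenceSdk {
  loadModel(source: string): Promise<string>; // assumed to resolve to a model id
  completion(modelId: string, prompt: string): Promise<string>;
  unloadModel(modelId: string): Promise<void>;
  close(): Promise<void>;
}

// Hypothetical helper: run one completion and always clean up afterwards.
async function runOnce(sdk: InferenceSdk, source: string, prompt: string): Promise<string> {
  try {
    const modelId = await sdk.loadModel(source);
    try {
      return await sdk.completion(modelId, prompt);
    } finally {
      await sdk.unloadModel(modelId); // release the model's resources
    }
  } finally {
    await sdk.close(); // shut down the SDK instance last
  }
}

// A mock implementation that records the call order, for illustration only.
const calls: string[] = [];
const mockSdk: InferenceSdk = {
  async loadModel(source) { calls.push(`load:${source}`); return "model-1"; },
  async completion(id, prompt) { calls.push(`completion:${id}`); return `echo: ${prompt}`; },
  async unloadModel(id) { calls.push(`unload:${id}`); },
  async close() { calls.push("close"); },
};

runOnce(mockSdk, "hypothetical-model.gguf", "Hello").then((text) => {
  console.log(text); // echo: Hello
  console.log(calls.join(" -> ")); // load:hypothetical-model.gguf -> completion:model-1 -> unload:model-1 -> close
});
```

Keeping unloadModel() in the inner finally and close() in the outer one mirrors the order in the flow: models are released before the SDK instance is closed.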

Each AI task works with its own model families; among the supported ones, you choose which model to use and where to obtain it. loadModel() handles downloading and caching models (a model may consist of one or more files) and loading them from disk into memory, ready for use.

loadModel() supports loading models from three different locations:

Local filesystem, by providing a path.
HTTP server, by providing an HTTP URL.
Our distributed model registry.

The SDK package does not ship with built-in models, but its API exposes constants representing preconfigured models (e.g., LLAMA_3_2_1B_INST_Q4_0). Each constant maps to a model already published to our model registry. When calling loadModel(), you can pass one of these constants instead of a location, making model retrieval transparent.
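To illustrate how a single entry point can accept all three kinds of location, the sketch below classifies an input as a local path, an HTTP URL, or a registry model. It is not how @qvac/sdk detects sources internally, which is not described here: the RegistryModel shape, the registryKey field, EXAMPLE_REGISTRY_MODEL, and resolveSource are all hypothetical, and real constants such as LLAMA_3_2_1B_INST_Q4_0 may have a different structure.

```typescript
// Hypothetical stand-in for a preconfigured model constant.
type RegistryModel = { registryKey: string };

// The three location kinds that loadModel() accepts, modeled as a tagged union.
type ModelSource =
  | { kind: "http"; url: string }
  | { kind: "local"; path: string }
  | { kind: "registry"; key: string };

// Hypothetical resolver: constants go to the registry, http(s) URLs to an
// HTTP download, and any other string is treated as a filesystem path.
function resolveSource(input: string | RegistryModel): ModelSource {
  if (typeof input !== "string") {
    return { kind: "registry", key: input.registryKey };
  }
  if (/^https?:\/\//i.test(input)) {
    return { kind: "http", url: input };
  }
  return { kind: "local", path: input };
}

// Hypothetical constant; the key value is made up for illustration.
const EXAMPLE_REGISTRY_MODEL: RegistryModel = { registryKey: "example-model" };

console.log(resolveSource("/home/me/models/model.gguf").kind); // local
console.log(resolveSource("https://example.com/model.gguf").kind); // http
console.log(resolveSource(EXAMPLE_REGISTRY_MODEL).kind); // registry
```

A tagged union keeps the three cases explicit, so a loader can switch on kind and the compiler checks that every case is handled.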

Model registry index

See the index of models available in our distributed model registry.

For more on querying the model registry, see modelRegistryList(), modelRegistrySearch(), and modelRegistryGetModel().

For more on loading models, see loadModel() in the @qvac/sdk API reference.
