How it works

Understand what happens under the hood when you use QVAC SDK in your application.

Overview

The SDK supports multiple JS runtimes, but its underlying components run only on Bare. When the SDK runs in a runtime other than Bare, it spawns a Bare worker where all AI operations will take place. The worker is started lazily on the first RPC call and can be explicitly shut down with close().

Phase 1: initialization

The first time you call loadModel() (or any function other than close()), the SDK performs a complete initialization sequence. It initializes a runtime-specific RPC client and sends configuration to the worker via the internal __init_config message. The worker process is spawned once and reused for subsequent calls until you explicitly close it. In Bare runtime, no separate worker process is spawned; requests are handled in-process.

Phase 2: model loading

There is only a single RPC client and Bare worker per application, not per model — i.e., singleton pattern. The model is downloaded and loaded into memory, and registered with a unique ID. From that point on, it will be available for AI inference until you unload it — call unloadModel() to free its memory.

Phase 3: inference

You can call loadModel() multiple times to make multiple models ready for use simultaneously. Additionally, you can perform AI inference multiple times with all of them. When you no longer need a model, call unloadModel() to free up resources.

Phase 4: shutdown

close() explicitly shuts down the worker and releases the RPC connection. In Node/Expo, this terminates the worker process; in Bare, the call is a no-op since there is no separate worker process. After close(), the next SDK call will reinitialize the RPC client and spawn a fresh worker.

Tip

unloadModel() will automatically close the RPC connection when there are no active models or providers, but close() is the explicit way to shut down the SDK instance.