Text embeddings

Vector embedding generation for semantic search, clustering, and retrieval.

Overview

Text embeddings use llama.cpp as the inference engine. Load any supported model with modelType: "embeddings". Then pass the input to embed() in the text field, whose value is either a single string or an array of strings.

embed() returns a single embedding vector (number[]) for single text input, or an array of embedding vectors (number[][]) for batch input.
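
Because the return shape depends on the input shape, downstream code that accepts both often normalizes the result first. The following is a minimal sketch of that idea, assuming embeddings arrive as plain arrays as described above. The helper toEmbeddingList is hypothetical and not part of the SDK, and the vectors are stand-in values rather than real embed() output.

```javascript
// Hypothetical helper, not part of @qvac/sdk.
// Accepts either shape described above and always returns number[][].
function toEmbeddingList(result) {
  if (!Array.isArray(result)) {
    throw new TypeError("Expected an embedding (number[]) or a batch (number[][])");
  }
  // An empty array carries no shape information; treat it as an empty batch.
  if (result.length === 0) return [];
  // A batch is an array whose elements are themselves arrays.
  return Array.isArray(result[0]) ? result : [result];
}

// Stand-in values; in a real script these would come from embed().
const single = toEmbeddingList([0.1, 0.2, 0.3]);
const batch = toEmbeddingList([[0.1, 0.2], [0.3, 0.4]]);
console.log(single.length, batch.length); // prints: 1 2
```

After normalization, the same loop can process the output of a single-text call and of a batch call.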

Functions

Use the following sequence of function calls:

  1. loadModel()
  2. embed()
  3. unloadModel()

For how to use each function, see SDK — API reference.
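
The sequence above can be wrapped so that unloadModel() still runs when the work in the middle throws. This is only a sketch: withModel is a hypothetical helper, not an SDK function, and the SDK functions are replaced by stubs so the snippet runs on its own. The argument shape of unloadModel() follows the example later on this page.

```javascript
// Hypothetical wrapper, not part of @qvac/sdk. The SDK functions are passed in
// as arguments so the sketch stays self-contained.
async function withModel({ loadModel, unloadModel }, loadOptions, work) {
  const modelId = await loadModel(loadOptions);
  try {
    return await work(modelId);
  } finally {
    // Runs on success and on failure, so the model is always released.
    await unloadModel({ modelId, clearStorage: false });
  }
}

// Stub functions stand in for the SDK here, purely for illustration.
const calls = [];
const stubSdk = {
  loadModel: async () => { calls.push("loadModel"); return "model-1"; },
  unloadModel: async () => { calls.push("unloadModel"); },
};

withModel(stubSdk, { modelType: "embeddings" }, async (modelId) => {
  calls.push("embed"); // a real script would call embed({ modelId, text }) here
  return [0.1, 0.2];
}).then(() => console.log(calls.join(" -> ")));
// prints: loadModel -> embed -> unloadModel
```

With the real SDK, the first argument would be the imported loadModel and unloadModel functions, and the callback would call embed() with the modelId it receives.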

Models

You can load any llama.cpp-compatible embeddings model. Model file format: *.gguf.

  • If the model is sharded across multiple files (a multi-file bundle), see Sharded models.
  • For models available as constants, see SDK — Models.

Example

The following script embeds a single string and a batch of three strings, then compares the batch embeddings using cosine similarity:

text-embeddings.js
import { embed, GTE_LARGE_FP16, loadModel, unloadModel } from "@qvac/sdk";
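function cosineSimilarity(vecA, vecB) {
    // Cosine similarity: dot(A, B) / (|A| * |B|).
    // Dividing by the vector lengths keeps the result correct even if the
    // embeddings are not unit-length.
    let dotProduct = 0;
    let normA = 0;
    let normB = 0;
    for (let i = 0; i < vecA.length; i++) {
        dotProduct += vecA[i] * vecB[i];
        normA += vecA[i] * vecA[i];
        normB += vecB[i] * vecB[i];
    }
    return dotProduct / (Math.sqrt(normA) * Math.sqrt(normB));
}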
try {
    const modelId = await loadModel({
        modelSrc: GTE_LARGE_FP16,
        modelType: "embeddings",
        onProgress: (progress) => {
            console.log(progress);
        },
        modelConfig: {
            gpuLayers: 99,
            device: "gpu",
        },
    });
    console.log("\n📝 Example 1: Single Text Embedding");
    console.log("=".repeat(50));
    const singleEmbedding = await embed({ modelId, text: "Hello, world!" });
    console.log("Input: 'Hello, world!'");
    console.log("Embedding dimensions:", singleEmbedding.length);
    console.log("First 10 values:", singleEmbedding.slice(0, 10));
    console.log("\n📝 Example 2: Batch Text Embeddings");
    console.log("=".repeat(50));
    const texts = [
        "The quick brown fox jumps over the lazy dog",
        "A fast auburn fox leaps over a sleepy canine",
        "Python is a programming language",
    ];
    const batchEmbeddings = await embed({ modelId, text: texts });
    console.log("Input: Array of", texts.length, "texts");
    console.log("Output: Array of", batchEmbeddings.length, "embeddings");
    const [emb1, emb2, emb3] = batchEmbeddings;
    if (!emb1 || !emb2 || !emb3) {
        throw new Error("Expected 3 embeddings");
    }
    console.log("Each embedding dimensions:", emb1.length);
    console.log("\n🔍 Similarity Analysis");
    console.log("=".repeat(50));
    const similarity1 = cosineSimilarity(emb1, emb2);
    const similarity2 = cosineSimilarity(emb1, emb3);
    console.log("Similarity between texts 1 and 2 (similar meaning):", similarity1.toFixed(4));
    console.log("Similarity between texts 1 and 3 (different topics):", similarity2.toFixed(4));
    console.log("\n💡 Higher values indicate more similar meanings");
    await unloadModel({ modelId, clearStorage: false });
}
catch (error) {
    console.error("❌ Error:", error);
    process.exit(1);
}
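
For the semantic search use case mentioned at the top of this page, the same similarity measure can rank documents against a query. The sketch below is illustrative only: rankBySimilarity is a hypothetical helper, and the three-dimensional vectors are toy values standing in for real embed() output.

```javascript
// Cosine similarity: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Hypothetical helper: returns documents sorted from most to least similar.
function rankBySimilarity(queryEmbedding, documents) {
  return documents
    .map((doc) => ({
      text: doc.text,
      score: cosineSimilarity(queryEmbedding, doc.embedding),
    }))
    .sort((x, y) => y.score - x.score);
}

// Toy vectors; real embeddings come from embed() and are much longer.
const ranked = rankBySimilarity([1, 0, 0], [
  { text: "unrelated", embedding: [0, 1, 0] },
  { text: "close", embedding: [0.9, 0.1, 0] },
  { text: "identical direction", embedding: [2, 0, 0] },
]);
console.log(ranked.map((r) => r.text));
// prints: [ 'identical direction', 'close', 'unrelated' ]
```

In a real script, the query and the documents would each be passed through embed() with the same model, since vectors from different models are not comparable.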

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
