
OCR

Optical character recognition (OCR) for extracting text from images.

Overview

OCR uses ONNX Runtime as its inference engine. It runs a two-stage pipeline and requires a compatible model for each stage:

  • Text detection: locate text regions in an image
  • Text recognition: decode characters in detected regions

Load supported models using modelType: "ocr", then provide an image as either a file path (string) or an in-memory buffer. Each OCR block contains the extracted text and may include bbox (bounding-box coordinates) and confidence (recognition score).
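Because bbox and confidence are optional, downstream code should handle their absence. For example, a small post-processing helper might filter blocks by recognition score (a sketch over plain data; the filterBlocks name and the 0.5 threshold are illustrative, not part of the SDK):

```javascript
// Keep only blocks whose recognition confidence meets a threshold.
// Blocks without a confidence score are kept, since the field is optional.
function filterBlocks(blocks, minConfidence = 0.5) {
    return blocks.filter(
        (block) => block.confidence === undefined || block.confidence >= minConfidence
    );
}

// Example input shaped like the OCR blocks described above.
const blocks = [
    { text: "INVOICE", bbox: [10, 10, 120, 40], confidence: 0.98 },
    { text: "smudged", bbox: [10, 60, 90, 80], confidence: 0.31 },
    { text: "no score" },
];

const kept = filterBlocks(blocks, 0.5);
console.log(kept.map((b) => b.text)); // [ 'INVOICE', 'no score' ]
```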

Functions

Use the following sequence of function calls:

  1. loadModel()
  2. ocr()
  3. unloadModel()

For how to use each function, see SDK — API reference.
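Since unloadModel() should run even when recognition fails, the sequence above can be wrapped in try/finally. The sketch below takes the three functions as parameters so it can be shown with stubs; with @qvac/sdk you would pass its loadModel, ocr, and unloadModel directly. The withOcrModel helper is hypothetical, not an SDK export:

```javascript
// Run the load → ocr → unload sequence, guaranteeing cleanup.
async function withOcrModel({ loadModel, ocr, unloadModel }, loadArgs, image) {
    const modelId = await loadModel(loadArgs);
    try {
        const { blocks } = ocr({ modelId, image });
        return await blocks;
    } finally {
        await unloadModel({ modelId });
    }
}

// Usage with stub functions, for illustration only.
const calls = [];
const stubs = {
    loadModel: async () => {
        calls.push("load");
        return "model-1";
    },
    ocr: () => {
        calls.push("ocr");
        return { blocks: Promise.resolve([{ text: "hello" }]) };
    },
    unloadModel: async () => {
        calls.push("unload");
    },
};

withOcrModel(stubs, {}, "image.bmp").then((result) => {
    // Logs the recognized blocks and the call order: load, ocr, unload.
    console.log(result, calls);
});
```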

Models

You can load any ONNX Runtime-compatible OCR pipeline. Required files: detector_craft.onnx and recognizer_<lang>.onnx.
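Since the recognizer file name follows the recognizer_<lang>.onnx pattern, the expected files for a pipeline can be derived from a language code (a sketch; the pipelineFiles helper is hypothetical, and the detector name is taken from the convention above):

```javascript
// Derive the ONNX file names a two-stage OCR pipeline expects,
// following the detector_craft.onnx + recognizer_<lang>.onnx convention.
function pipelineFiles(lang) {
    return ["detector_craft.onnx", `recognizer_${lang}.onnx`];
}

console.log(pipelineFiles("latin")); // [ 'detector_craft.onnx', 'recognizer_latin.onnx' ]
```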

For models available as constants, see SDK — Models.

Example

The following script loads an OCR model, runs recognition on an image, and prints each detected text block:

ocr.js
import {
    close,
    loadModel,
    ocr,
    OCR_LATIN_RECOGNIZER_1,
    unloadModel,
} from "@qvac/sdk";
import path from "path";
import { fileURLToPath } from "url";
const __dirname = path.dirname(fileURLToPath(import.meta.url));
const imagePath = process.argv[2] || path.join(__dirname, "image/basic_test.bmp");
try {
    console.log("🚀 Loading OCR model...");
    const modelId = await loadModel({
        modelSrc: OCR_LATIN_RECOGNIZER_1,
        modelType: "ocr",
        modelConfig: {
            langList: ["en"],
            useGPU: true,
            timeout: 30000,
            magRatio: 1.5,
            defaultRotationAngles: [90, 180, 270],
            contrastRetry: false,
            lowConfidenceThreshold: 0.5,
            recognizerBatchSize: 1,
        },
    });
    console.log(`✅ Model loaded successfully! Model ID: ${modelId}`);
    console.log(`\n🔍 Running OCR on: ${imagePath}`);
    const { blocks } = ocr({
        modelId,
        image: imagePath,
        options: {
            paragraph: false,
        },
    });
    const result = await blocks;
    console.log("\n📝 OCR Results:");
    console.log("================================");
    for (const block of result) {
        console.log(`\n📄 Text: ${block.text}`);
        if (block.bbox) {
            console.log(`   📍 BBox: [${block.bbox.join(", ")}]`);
        }
        if (block.confidence !== undefined) {
            console.log(`   ✓ Confidence: ${block.confidence}`);
        }
    }
    console.log("\n================================");
    console.log("\n🔄 Unloading model...");
    await unloadModel({ modelId, clearStorage: false });
    console.log("✅ Model unloaded successfully.");
    process.exit(0);
}
catch (error) {
    console.error("❌ Error during OCR processing:", error);
    await close();
    process.exit(1);
}

Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.
