
@qvac/ocr-onnx

Optical character recognition (OCR) for extracting text from images.

Overview

A Bare module that adds OCR support to QVAC, using ONNX Runtime as the inference engine. It runs a two-stage pipeline and requires a compatible model for each stage:

  • Text detection: locate text regions in an image
  • Text recognition: decode characters in detected regions

Models

You can load any ONNX Runtime-compatible OCR pipeline. Two model files are required: a detector (detector_craft.onnx) and a recognizer for the target script (recognizer_<lang>.onnx), both in the *.onnx format.

Requirement

Bare ≥ v1.24

Installation

npm i @qvac/ocr-onnx

Quickstart

If you don't have the Bare runtime, install it globally:

npm i -g bare

Create a new project:

mkdir qvac-ocr-quickstart
cd qvac-ocr-quickstart
npm init -y

Install dependencies:

npm i @qvac/ocr-onnx bare-path

Place the OCR model files (detector_craft.onnx and recognizer_latin.onnx) into models/ocr/. These are available from our model registry.

Create index.js:

'use strict'

const path = require('bare-path')
const { ONNXOcr } = require('@qvac/ocr-onnx')

const imagePath = path.resolve('./my-image.jpg')

async function main () {
  const detectorPath = './models/ocr/detector_craft.onnx'
  const recognizerPath = './models/ocr/recognizer_latin.onnx'

  const model = new ONNXOcr({
    params: {
      langList: ['en'],
      pathDetector: detectorPath,
      pathRecognizer: recognizerPath,
      useGPU: false
    },
    opts: { stats: true }
  })

  try {
    console.log('Loading OCR model...')
    await model.load()
    console.log('Model loaded.')

    console.log(`Running OCR on: ${imagePath}`)
    const response = await model.run({
      path: imagePath
    })

    console.log('Waiting for OCR results...')
    await response
      .onUpdate(data => {
        console.log('--- OCR Update ---')
        console.log('Output: ' + JSON.stringify(data.map(o => o[1])))
        console.log('--- data ---')
        console.log(JSON.stringify(data, null, 2))
        console.log('------------------')
      })
      .await()

    console.log('OCR finished!')
    if (response.stats) {
      console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
    }
  } catch (err) {
    console.error('Error during OCR processing:', err)
  } finally {
    console.log('Unloading model...')
    await model.unload()
    console.log('Model unloaded.')
  }
}

main().catch(console.error)

Run index.js:

bare index.js

Usage

The library provides a straightforward workflow for image-based text recognition:

1. Configure Parameters

Define the arguments for the OCR instance, including paths to the ONNX models and the list of languages to recognize.

const args = {
  params: {
    // Required parameters
    langList: ['en'],                              // Language codes (ISO 639-1)
    pathDetector: './models/ocr/detector_craft.onnx',
    pathRecognizer: './models/ocr/recognizer_latin.onnx',
    // Or use prefix: pathRecognizerPrefix: './models/ocr/recognizer_',

    // Optional parameters
    useGPU: true,                    // Enable GPU acceleration (default: true)
    timeout: 120,                    // Inference timeout in seconds (default: 120)

    // Performance tuning (optional)
    magRatio: 1.5,                   // Detection magnification ratio (default: 1.5)
    defaultRotationAngles: [90, 270], // Rotation angles to try (default: [90, 270])
    contrastRetry: false,            // Retry low-confidence with contrast adjustment (default: false)
    lowConfidenceThreshold: 0.4,     // Threshold for contrast retry (default: 0.4)
    recognizerBatchSize: 32          // Batch size for recognizer inference (default: 32)
  },
  opts: {
    stats: true                      // Enable performance statistics logging
  }
}

Required Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| langList | string[] | List of language codes (ISO 639-1). The first supported language determines the recognizer model. See Supported Languages. |
| pathDetector | string | Path to the detector ONNX model file. |
| pathRecognizer | string | Path to the recognizer ONNX model file. Required if pathRecognizerPrefix is not provided. |
| pathRecognizerPrefix | string | Prefix path for the recognizer model. The library appends the language suffix automatically (e.g., recognizer_latin.onnx). Required if pathRecognizer is not provided. |

Optional Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| useGPU | boolean | true | Enable GPU/NPU/TPU acceleration. Falls back to CPU if unavailable. |
| timeout | number | 120 | Maximum inference time in seconds. Increase for complex images or slower devices. |
| magRatio | number | 1.5 | Detection magnification ratio (1.0-2.0). Higher values improve detection of small text but increase processing time. |
| defaultRotationAngles | number[] | [90, 270] | Rotation angles to try for text detection. Use [] to disable rotation variants. |
| contrastRetry | boolean | false | Re-process low-confidence regions with adjusted contrast. Improves accuracy but increases memory usage. |
| lowConfidenceThreshold | number | 0.4 | Confidence threshold (0-1) below which contrast retry is triggered (when contrastRetry is enabled). |
| recognizerBatchSize | number | 32 | Number of text regions processed per batch. Lower values reduce memory usage on mobile devices. |
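For memory-constrained devices, the tuning parameters above can be combined. The values below are illustrative examples, not measured recommendations:

```javascript
// Illustrative low-memory configuration (example values, not benchmarks)
const lowMemoryArgs = {
  params: {
    langList: ['en'],
    pathDetector: './models/ocr/detector_craft.onnx',
    pathRecognizerPrefix: './models/ocr/recognizer_',
    useGPU: false,              // avoid GPU memory pressure
    magRatio: 1.0,              // no magnification: faster, less memory
    defaultRotationAngles: [],  // skip rotation variants
    recognizerBatchSize: 8      // smaller batches lower peak memory
  },
  opts: { stats: true }
}
```

Lowering recognizerBatchSize trades throughput for peak memory, while disabling rotation variants skips extra detection passes entirely.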

2. Create Model Instance

Import the library and create a new instance with the configured arguments.

const { ONNXOcr } = require('@qvac/ocr-onnx')

const model = new ONNXOcr(args)

3. Load Model

Asynchronously load the ONNX models specified in the parameters.

try {
  await model.load()
  console.log('OCR model loaded successfully.')
} catch (error) {
  console.error('Failed to load OCR model:', error)
}

4. Run OCR

Pass the path to the input image file to the run method. Supported formats: BMP, JPEG, and PNG.

const imagePath = 'path/to/your/image.jpg'

try {
  const response = await model.run({
     path: imagePath,
     options: {
       paragraph: true,           // Group results into paragraphs (default: false)
       rotationAngles: [90, 270], // Override default rotation angles for this run
       boxMarginMultiplier: 1.0   // Adjust bounding box margins
     }
  })
  // ... process the response (see step 5)
} catch (error) {
  console.error('OCR failed:', error)
}

Runtime Options

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| paragraph | boolean | false | Group detected text regions into paragraphs based on proximity. |
| rotationAngles | number[] | Uses defaultRotationAngles | Override the default rotation angles for this specific run. |
| boxMarginMultiplier | number | 1.0 | Multiplier for bounding box margins around detected text. |
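The paragraph grouping itself is internal to the library, but as a rough illustration of proximity-based grouping, text blocks in the output format can be bucketed by the vertical position of their bounding boxes. This sketch is not the library's actual algorithm:

```javascript
// Illustrative sketch: group OCR blocks into lines by vertical proximity.
// Each block follows the output format: [box, text, confidence].
function groupIntoLines (blocks, tolerance = 10) {
  // Sort by the top edge (y of the first corner) of each bounding box
  const sorted = [...blocks].sort((a, b) => a[0][0][1] - b[0][0][1])
  const lines = []
  for (const block of sorted) {
    const top = block[0][0][1]
    const last = lines[lines.length - 1]
    // Start a new line when this block's top edge is far from the current line
    if (!last || Math.abs(top - last.top) > tolerance) {
      lines.push({ top, texts: [block[1]] })
    } else {
      last.texts.push(block[1])
    }
  }
  return lines.map(l => l.texts.join(' '))
}

const blocks = [
  [[[10, 10], [80, 10], [80, 28], [10, 28]], 'Hello', 0.95],
  [[[90, 12], [150, 12], [150, 30], [90, 30]], 'world', 0.93],
  [[[10, 60], [120, 60], [120, 78], [10, 78]], 'Next line', 0.9]
]
console.log(groupIntoLines(blocks)) // → [ 'Hello world', 'Next line' ]
```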

5. Process Output

The run method returns a QvacResponse object. Use its methods to handle the OCR results as they become available.

// Option 1: Using onUpdate callback
await response
  .onUpdate(data => {
    // data contains OCR results for a chunk or the final result
    console.log('OCR Update:', JSON.stringify(data))
  })
  .await() // Wait for the entire process to complete

// Option 2: Using async iterator (if supported by QvacResponse in the future)
// for await (const data of response.iterate()) {
//   console.log('OCR Chunk:', JSON.stringify(data))
// }

// Access performance stats if enabled
if (response.stats) {
  console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
}

See Output Format for the structure of the results.

6. Release Resources

Unload the model and free up resources when done.

try {
  await model.unload()
  console.log('OCR model unloaded.')
} catch (error) {
  console.error('Failed to unload model:', error)
}

Output Format

The output is typically received via the onUpdate callback of the QvacResponse object. It's a JSON array where each element represents a detected text block.

Each text block contains:

  1. Bounding Box: An array of four [x, y] coordinate pairs defining the corners of the box around the detected text. Coordinates are clockwise, starting from the top-left relative to the text orientation.
  2. Detected Text: The recognized text string.
  3. Confidence Score: A numerical value indicating the model's confidence in the recognition (range may vary, often 0-1).
[ // Array of detected text blocks
  [ // First text block
    [ // Bounding Box
      [x1, y1], // Top-left corner
      [x2, y2], // Top-right corner
      [x3, y3], // Bottom-right corner
      [x4, y4]  // Bottom-left corner
    ],
    "Detected Text String", // Recognized text
    0.95 // Confidence score
  ],
  [ // Second text block
    [ /* Bounding Box */ ],
    "Another piece of text",
    0.88
  ]
  // ... more text blocks
]

Example:

[[
  [
    [10, 10],
    [150, 12],
    [149, 30],
    [9, 28]
  ],
  "Example Text",
  0.85
]]

Box coordinates are always listed in clockwise order, starting from the top-left corner relative to the text orientation. The rotation of the extracted text can therefore be inferred from the box.

(Note: The exact structure and timing of updates might depend on internal buffering and the paragraph option.)
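Because the first two corners trace the top edge of the text, the rotation can be estimated with a little trigonometry. This is a sketch based on the output format above, not a library API:

```javascript
// Estimate text rotation (degrees) from a bounding box in the output format.
// The first two corners are the top edge relative to the text orientation.
function textAngle (box) {
  const [[x1, y1], [x2, y2]] = box
  // atan2 of the top edge's slope; note y grows downward in image space
  return Math.atan2(y2 - y1, x2 - x1) * 180 / Math.PI
}

console.log(textAngle([[10, 10], [150, 10], [150, 30], [10, 30]])) // → 0 (horizontal)
console.log(textAngle([[30, 10], [30, 150], [10, 150], [10, 10]])) // → 90 (rotated)
```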

Supported Languages

Language support is determined by the recognizer model used. Each recognizer model supports a specific set of languages. The library automatically selects the appropriate model based on the langList parameter.

| Recognizer Model | Languages |
| --- | --- |
| recognizer_latin.onnx | af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi |
| recognizer_arabic.onnx | ar, fa, ug, ur |
| recognizer_cyrillic.onnx | ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk |
| recognizer_devanagari.onnx | hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc |
| recognizer_bengali.onnx | bn, as, mni |
| recognizer_thai.onnx | th |
| recognizer_zh_sim.onnx | ch_sim |
| recognizer_zh_tra.onnx | ch_tra |
| recognizer_japanese.onnx | ja |
| recognizer_korean.onnx | ko |
| recognizer_tamil.onnx | ta |
| recognizer_telugu.onnx | te |
| recognizer_kannada.onnx | kn |

See supportedLanguages.js for the complete language definitions.
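To illustrate how pathRecognizerPrefix resolution might work, the model file can be derived from the first supported language in langList. This sketch mirrors a few rows of the table above; it is not the library's internal code, and the group table here is deliberately abbreviated:

```javascript
// Illustrative sketch: pick a recognizer model file from langList.
// Abbreviated language groups taken from the table above.
const LANGUAGE_GROUPS = {
  latin: ['en', 'de', 'es', 'fr', 'it', 'pt', 'nl', 'pl', 'tr', 'vi'],
  arabic: ['ar', 'fa', 'ug', 'ur'],
  cyrillic: ['ru', 'be', 'bg', 'uk', 'mn'],
  japanese: ['ja'],
  korean: ['ko']
}

function resolveRecognizer (prefix, langList) {
  // The first language that belongs to a known group selects the model
  for (const lang of langList) {
    for (const [group, langs] of Object.entries(LANGUAGE_GROUPS)) {
      if (langs.includes(lang)) return `${prefix}${group}.onnx`
    }
  }
  throw new Error(`No recognizer model for languages: ${langList.join(', ')}`)
}

console.log(resolveRecognizer('./models/ocr/recognizer_', ['en']))
// → ./models/ocr/recognizer_latin.onnx
```

Note that all languages in a single run must share one recognizer model; mixing scripts (e.g., ['en', 'ja']) resolves to whichever group matches first.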

More resources

Package at npm
