# @qvac/ocr-onnx
Optical character recognition (OCR) for extracting text from images.
## Overview

A Bare module that adds OCR support to QVAC, using ONNX Runtime as the inference engine. It runs a two-stage pipeline and requires a compatible model for each stage:

- Text detection: locate text regions in an image
- Text recognition: decode the characters in each detected region
## Models

You can load any ONNX Runtime-compatible OCR pipeline. Required files: `detector_craft.onnx` and a recognizer model such as `recognizer_<lang>.onnx` (file format: `*.onnx`).
## Requirements

Bare v1.24
## Installation

```sh
npm i @qvac/ocr-onnx
```

## Quickstart
If you don't have the Bare runtime, install it:

```sh
npm i -g bare
```

Create a new project:

```sh
mkdir qvac-ocr-quickstart
cd qvac-ocr-quickstart
npm init -y
```

Install dependencies:

```sh
npm i @qvac/ocr-onnx bare-path
```

Place the OCR model files (`detector_craft.onnx` and `recognizer_latin.onnx`) into `models/ocr/`. These are available from our model registry.
Create `index.js`:

```js
'use strict'

const path = require('bare-path')
const { ONNXOcr } = require('@qvac/ocr-onnx')

const imagePath = path.resolve('./my-image.jpg')

async function main () {
  const detectorPath = './models/ocr/detector_craft.onnx'
  const recognizerPath = './models/ocr/recognizer_latin.onnx'

  const model = new ONNXOcr({
    params: {
      langList: ['en'],
      pathDetector: detectorPath,
      pathRecognizer: recognizerPath,
      useGPU: false
    },
    opts: { stats: true }
  })

  try {
    console.log('Loading OCR model...')
    await model.load()
    console.log('Model loaded.')

    console.log(`Running OCR on: ${imagePath}`)
    const response = await model.run({
      path: imagePath
    })

    console.log('Waiting for OCR results...')
    await response
      .onUpdate(data => {
        console.log('--- OCR Update ---')
        console.log('Output: ' + JSON.stringify(data.map(o => o[1])))
        console.log('--- data ---')
        console.log(JSON.stringify(data, null, 2))
        console.log('------------------')
      })
      .await()

    console.log('OCR finished!')
    if (response.stats) {
      console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
    }
  } catch (err) {
    console.error('Error during OCR processing:', err)
  } finally {
    console.log('Unloading model...')
    await model.unload()
    console.log('Model unloaded.')
  }
}

main().catch(console.error)
```

Run `index.js`:

```sh
bare index.js
```

## Usage
The library provides a straightforward workflow for image-based text recognition:
### 1. Configure Parameters

Define the arguments for the OCR instance, including paths to the ONNX models and the list of languages to recognize.

```js
const args = {
  params: {
    // Required parameters
    langList: ['en'], // Language codes (ISO 639-1)
    pathDetector: './models/ocr/detector_craft.onnx',
    pathRecognizer: './models/ocr/recognizer_latin.onnx',
    // Or use prefix: pathRecognizerPrefix: './models/ocr/recognizer_',

    // Optional parameters
    useGPU: true, // Enable GPU acceleration (default: true)
    timeout: 120, // Inference timeout in seconds (default: 120)

    // Performance tuning (optional)
    magRatio: 1.5, // Detection magnification ratio (default: 1.5)
    defaultRotationAngles: [90, 270], // Rotation angles to try (default: [90, 270])
    contrastRetry: false, // Retry low-confidence regions with contrast adjustment (default: false)
    lowConfidenceThreshold: 0.4, // Threshold for contrast retry (default: 0.4)
    recognizerBatchSize: 32 // Batch size for recognizer inference (default: 32)
  },
  opts: {
    stats: true // Enable performance statistics logging
  }
}
```

#### Required Parameters
| Parameter | Type | Description |
|---|---|---|
| `langList` | `string[]` | List of language codes (ISO 639-1). The first supported language determines the recognizer model. See Supported Languages. |
| `pathDetector` | `string` | Path to the detector ONNX model file. |
| `pathRecognizer` | `string` | Path to the recognizer ONNX model file. Required if `pathRecognizerPrefix` is not provided. |
| `pathRecognizerPrefix` | `string` | Prefix path for the recognizer model. The library appends the language suffix automatically (e.g., `recognizer_latin.onnx`). Required if `pathRecognizer` is not provided. |
#### Optional Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| `useGPU` | `boolean` | `true` | Enable GPU/NPU/TPU acceleration. Falls back to CPU if unavailable. |
| `timeout` | `number` | `120` | Maximum inference time in seconds. Increase for complex images or slower devices. |
| `magRatio` | `number` | `1.5` | Detection magnification ratio (1.0-2.0). Higher values improve detection of small text but increase processing time. |
| `defaultRotationAngles` | `number[]` | `[90, 270]` | Rotation angles to try for text detection. Use `[]` to disable rotation variants. |
| `contrastRetry` | `boolean` | `false` | Re-process low-confidence regions with adjusted contrast. Improves accuracy but increases memory usage. |
| `lowConfidenceThreshold` | `number` | `0.4` | Confidence threshold (0-1) below which contrast retry is triggered (when `contrastRetry` is enabled). |
| `recognizerBatchSize` | `number` | `32` | Number of text regions processed per batch. Lower values reduce memory usage on mobile devices. |
### 2. Create Model Instance

Import the library and create a new instance with the configured arguments.

```js
const { ONNXOcr } = require('@qvac/ocr-onnx')

const model = new ONNXOcr(args)
```

### 3. Load Model
Asynchronously load the ONNX models specified in the parameters.

```js
try {
  await model.load()
  console.log('OCR model loaded successfully.')
} catch (error) {
  console.error('Failed to load OCR model:', error)
}
```

### 4. Run OCR
Pass the path to the input image file to the `run` method. Supported formats: BMP, JPEG, and PNG.

```js
const imagePath = 'path/to/your/image.jpg'

try {
  const response = await model.run({
    path: imagePath,
    options: {
      paragraph: true, // Group results into paragraphs (default: false)
      rotationAngles: [90, 270], // Override default rotation angles for this run
      boxMarginMultiplier: 1.0 // Adjust bounding box margins
    }
  })
  // ... process the response (see step 5)
} catch (error) {
  console.error('OCR failed:', error)
}
```

#### Runtime Options
| Option | Type | Default | Description |
|---|---|---|---|
| `paragraph` | `boolean` | `false` | Group detected text regions into paragraphs based on proximity. |
| `rotationAngles` | `number[]` | Uses `defaultRotationAngles` | Override the default rotation angles for this specific run. |
| `boxMarginMultiplier` | `number` | `1.0` | Multiplier for bounding box margins around detected text. |
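To give an intuition for what proximity-based paragraph grouping means, here is a simplified sketch. It is **not** the library's actual algorithm; it merely groups text blocks whose bounding boxes are vertically close, assuming the block shape described under Output Format.

```javascript
// Simplified illustration of proximity-based paragraph grouping.
// Each block is [boundingBox, text, confidence]; a bounding box is four [x, y] corners.
// NOTE: hypothetical sketch, not the library's internal implementation.
function groupIntoParagraphs (blocks, maxGap = 10) {
  const top = box => Math.min(...box.map(([, y]) => y))
  const bottom = box => Math.max(...box.map(([, y]) => y))

  // Read blocks top-to-bottom, merging a block into the current
  // paragraph when the vertical gap to the previous box is small.
  const sorted = [...blocks].sort((a, b) => top(a[0]) - top(b[0]))
  const paragraphs = []

  for (const block of sorted) {
    const last = paragraphs[paragraphs.length - 1]
    if (last && top(block[0]) - bottom(last.box) <= maxGap) {
      last.text += ' ' + block[1]
      last.box = block[0] // track the lowest box merged so far
    } else {
      paragraphs.push({ box: block[0], text: block[1] })
    }
  }
  return paragraphs.map(p => p.text)
}
```

A real implementation would also consider horizontal alignment, but the vertical-gap heuristic captures the core idea behind the `paragraph` option.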
### 5. Process Output

The `run` method returns a `QvacResponse` object. Use its methods to handle the OCR results as they become available.

```js
// Option 1: Using the onUpdate callback
await response
  .onUpdate(data => {
    // data contains OCR results for a chunk or the final result
    console.log('OCR Update:', JSON.stringify(data))
  })
  .await() // Wait for the entire process to complete

// Option 2: Using an async iterator (if supported by QvacResponse in the future)
// for await (const data of response.iterate()) {
//   console.log('OCR Chunk:', JSON.stringify(data))
// }

// Access performance stats if enabled
if (response.stats) {
  console.log(`Inference stats: ${JSON.stringify(response.stats)}`)
}
```

See Output Format for the structure of the results.
### 6. Release Resources

Unload the model and free its resources when done.

```js
try {
  await model.unload()
  console.log('OCR model unloaded.')
} catch (error) {
  console.error('Failed to unload model:', error)
}
```

## Output Format
The output is typically received via the `onUpdate` callback of the `QvacResponse` object. It is a JSON array where each element represents a detected text block.

Each text block contains:

- **Bounding box**: an array of four `[x, y]` coordinate pairs defining the corners of the box around the detected text. Coordinates run clockwise, starting from the top-left relative to the text orientation.
- **Detected text**: the recognized text string.
- **Confidence score**: a numerical value indicating the model's confidence in the recognition (range may vary, often 0-1).
```js
[ // Array of detected text blocks
  [ // First text block
    [ // Bounding box
      [x1, y1], // Top-left corner
      [x2, y2], // Top-right corner
      [x3, y3], // Bottom-right corner
      [x4, y4]  // Bottom-left corner
    ],
    "Detected Text String", // Recognized text
    0.95 // Confidence score
  ],
  [ // Second text block
    [ /* Bounding box */ ],
    "Another piece of text",
    0.88
  ]
  // ... more text blocks
]
```

Example:
```json
[
  [
    [
      [10, 10],
      [150, 12],
      [149, 30],
      [9, 28]
    ],
    "Example Text",
    0.85
  ]
]
```

Box coordinates are always given in clockwise order, starting from the top-left point relative to the extracted text, so the rotation of the text can be inferred from the bounding box alone.
(Note: the exact structure and timing of updates may depend on internal buffering and the `paragraph` option.)
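Because the first two corners always trace the top edge of the text, the text's rotation can be estimated from them. A minimal sketch, assuming the output shape shown above (the helper names are illustrative, not part of the library API):

```javascript
// Estimate the rotation of a detected text block from its bounding box.
// The first two corners are the top-left and top-right relative to the text,
// so the angle of that edge is the angle of the text baseline.
function textAngleDegrees (box) {
  const [[x1, y1], [x2, y2]] = box
  return Math.atan2(y2 - y1, x2 - x1) * 180 / Math.PI
}

// Pull just the recognized strings out of an OCR result array.
function extractText (blocks) {
  return blocks.map(([, text]) => text)
}
```

For the example above, `textAngleDegrees([[10, 10], [150, 12], ...])` yields roughly 0.8 degrees, i.e. nearly horizontal text.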
## Supported Languages

Language support is determined by the recognizer model in use. Each recognizer model supports a specific set of languages, and the library automatically selects the appropriate model based on the `langList` parameter.
| Recognizer Model | Languages |
|---|---|
| `recognizer_latin.onnx` | af, az, bs, cs, cy, da, de, en, es, et, fr, ga, hr, hu, id, is, it, ku, la, lt, lv, mi, ms, mt, nl, no, oc, pi, pl, pt, ro, rs_latin, sk, sl, sq, sv, sw, tl, tr, uz, vi |
| `recognizer_arabic.onnx` | ar, fa, ug, ur |
| `recognizer_cyrillic.onnx` | ru, rs_cyrillic, be, bg, uk, mn, abq, ady, kbd, ava, dar, inh, che, lbe, lez, tab, tjk |
| `recognizer_devanagari.onnx` | hi, mr, ne, bh, mai, ang, bho, mah, sck, new, gom, sa, bgc |
| `recognizer_bengali.onnx` | bn, as, mni |
| `recognizer_thai.onnx` | th |
| `recognizer_zh_sim.onnx` | ch_sim |
| `recognizer_zh_tra.onnx` | ch_tra |
| `recognizer_japanese.onnx` | ja |
| `recognizer_korean.onnx` | ko |
| `recognizer_tamil.onnx` | ta |
| `recognizer_telugu.onnx` | te |
| `recognizer_kannada.onnx` | kn |
See `supportedLanguages.js` for the complete language definitions.
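To illustrate the `pathRecognizerPrefix` behavior described above, here is a hypothetical helper (not the library's actual code) that maps the first supported language in `langList` to a recognizer file. The language-to-suffix map below includes only a subset of the table for brevity:

```javascript
// Hypothetical sketch of prefix-based recognizer resolution.
// Maps a language code to its recognizer model suffix (subset of the table above).
const LANG_TO_SUFFIX = {
  en: 'latin', de: 'latin', fr: 'latin', es: 'latin',
  ar: 'arabic', fa: 'arabic',
  ru: 'cyrillic', uk: 'cyrillic',
  hi: 'devanagari',
  th: 'thai', ja: 'japanese', ko: 'korean'
}

function resolveRecognizerPath (prefix, langList) {
  // The first language with a known recognizer determines the model.
  const lang = langList.find(l => LANG_TO_SUFFIX[l])
  if (!lang) throw new Error(`No recognizer model for languages: ${langList}`)
  return `${prefix}${LANG_TO_SUFFIX[lang]}.onnx`
}
```

For example, `resolveRecognizerPath('./models/ocr/recognizer_', ['en'])` would produce `./models/ocr/recognizer_latin.onnx`, matching the quickstart's model file.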