@qvac/translation-nmtcpp
Text-to-text neural machine translation (NMT).
Overview
Bare module that adds support for translation in QVAC using either nmt.cpp or Bergamot as the inference engine.
Models
You should load a model compatible with your chosen inference engine:
- nmt.cpp (default): OPUS-MT or IndicTrans2 models converted to GGML. Model file format: *.bin.
- Bergamot: Bergamot model bundle. Required files: model*.bin + vocab*.spm.
Requirement
Bare v1.24
Installation
npm i @qvac/translation-nmtcpp
Quickstart
If you don't have Bare runtime, install it:
npm i -g bare
Create a new project:
mkdir qvac-translation-quickstart
cd qvac-translation-quickstart
npm init -y
Install dependencies (the example also requires bare-fs, bare-path and bare-process):
npm i @qvac/translation-nmtcpp @qvac/dl-hyperdrive bare-fs bare-path bare-process
Create example.js:
'use strict'
/**
* Quickstart Example
*
* This example demonstrates both translation backends:
* 1. GGML backend - Downloads model via HyperdriveDL (English to Italian)
* 2. Bergamot backend - Uses local model files (requires BERGAMOT_MODEL_PATH)
*
* Usage:
* bare example.js
* BERGAMOT_MODEL_PATH=/path/to/bergamot/model bare example.js
*
* Enable verbose C++ logging:
* VERBOSE=1 bare example.js
*/
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const fs = require('bare-fs')
const path = require('bare-path')
const process = require('bare-process')
// ============================================================
// LOGGING CONFIGURATION
// Set VERBOSE=1 environment variable to enable C++ debug logs
// ============================================================
const VERBOSE = process.env.VERBOSE === '1' || process.env.VERBOSE === 'true'
const logger = VERBOSE
? {
info: (msg) => console.log('[C++ INFO]', msg),
warn: (msg) => console.warn('[C++ WARN]', msg),
error: (msg) => console.error('[C++ ERROR]', msg),
debug: (msg) => console.log('[C++ DEBUG]', msg)
}
: null // null = suppress all C++ logs
const text = 'Machine translation has revolutionized how we communicate across language barriers in the modern digital world.'
async function testGGML () {
console.log('\n=== Testing GGML Backend ===\n')
// Create `DataLoader`
const hdDL = new HyperdriveDL({
// The hyperdrive key for en-it translation model weights and config
key: 'hd://9ef58f31c20d5556722e0b58a5d262fd89801daf2e6cb28e3f21ac6e9228088f'
})
// Create the `args` object
const args = {
loader: hdDL,
params: { mode: 'full', dstLang: 'it', srcLang: 'en' },
diskPath: './models',
modelName: 'model.bin',
logger // Pass the logger
}
// Create Model Instance
const model = new TranslationNmtcpp(args, { })
// Load model
await model.load()
try {
// Run the Model
const response = await model.run(text)
await response
.onUpdate(data => {
console.log(data)
})
.await()
console.log('GGML translation finished!')
} finally {
// Unload the model
await model.unload()
// Close the DataLoader
await hdDL.close()
}
}
async function testBergamot () {
console.log('\n=== Testing Bergamot Backend ===\n')
// Use local model path for Bergamot - env var or relative path
const bergamotPath = process.env.BERGAMOT_MODEL_PATH || './model/bergamot/enit'
console.log('Model path:', bergamotPath)
// Check if model directory exists
if (!fs.existsSync(bergamotPath)) {
console.log('Bergamot model directory not found, skipping test')
console.log('Set BERGAMOT_MODEL_PATH env var or place model in ./model/bergamot/enit')
return
}
console.log('Loading model...')
// Create a local file loader for Bergamot models that are already on disk
const localLoader = {
ready: async () => { /* Models already on disk */ },
close: async () => { /* No resources to close */ },
download: async (filename) => {
// Read file from local disk
const filePath = path.join(bergamotPath, filename)
return fs.readFileSync(filePath)
},
getFileSize: async (filename) => {
const filePath = path.join(bergamotPath, filename)
const stats = fs.statSync(filePath)
return stats.size
}
}
// Create the `args` object for Bergamot
const args = {
loader: localLoader,
params: { mode: 'full', dstLang: 'it', srcLang: 'en' },
diskPath: bergamotPath,
modelName: 'model.enit.intgemm.alphas.bin',
logger // Pass the logger
}
// Config with explicit vocab paths for Bergamot
const config = {
srcVocabName: 'vocab.enit.spm',
dstVocabName: 'vocab.enit.spm',
modelType: TranslationNmtcpp.ModelTypes.Bergamot
}
// Create Model Instance
const model = new TranslationNmtcpp(args, config)
// Load model
await model.load()
console.log('Model loaded successfully!')
try {
console.log('Running translation...')
console.log('Input text:', text)
// Run the Model
const response = await model.run(text)
await response
.onUpdate(data => {
console.log('Translation output:', data)
})
.await()
console.log('Bergamot translation finished!')
} finally {
console.log('Unloading model...')
await model.unload()
// Close the local loader
await localLoader.close()
console.log('Done!')
}
}
async function main () {
try {
// Test GGML backend
await testGGML()
// Test Bergamot backend
await testBergamot()
console.log('\n=== All Tests Completed Successfully! ===\n')
} catch (error) {
console.error('Test failed:', error)
throw error
}
}
main()
Run example.js:
bare example.js
Usage
Translating text follows the same workflow regardless of the model you choose:
1. Create DataLoader
In QVAC, the DataLoader class provides an interface for fetching model weights and other resources needed to run AI models. You need a DataLoader instance to instantiate a ModelClass. The following code creates a HyperdriveDL:
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperdriveDL({
key: 'hd://528eb43b34c57b0fb7116e532cd596a9661b001870bdabf696243e8d079a74ca' // (Required) Hyperdrive key with 'hd://' prefix (raw hex also works)
// store: corestore // (Optional) A Corestore instance for persistent storage. See Glossary for details.
})
Make sure you provide the correct key when using a HyperdriveDL. A DataLoader that serves the weights and settings for en-it translation cannot be used for de-en translation. The key must match both the model (package) you installed and the translation direction you need.
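The key comment above says the key can be given with the 'hd://' prefix or as raw hex. If you read keys from configuration, a small check can catch copy-paste mistakes before the loader is created. This is a hypothetical helper, not part of @qvac/dl-hyperdrive. It assumes a key is 64 hex characters, which is true of every key shown in this README.

```javascript
'use strict'

// Hypothetical helper, not part of @qvac/dl-hyperdrive.
// Assumption: a Hyperdrive key is 64 hex characters, optionally prefixed with 'hd://'.
function normalizeHyperdriveKey (key) {
  if (typeof key !== 'string') {
    throw new TypeError('Hyperdrive key must be a string')
  }
  const hex = key.startsWith('hd://') ? key.slice('hd://'.length) : key
  if (!/^[0-9a-f]{64}$/i.test(hex)) {
    throw new Error(`Invalid Hyperdrive key: ${key}`)
  }
  // Return the prefixed, lowercase form used in the examples in this README
  return 'hd://' + hex.toLowerCase()
}
```

This only checks the key's format. It cannot tell whether the key points to the language pair you need.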
2. Create the args object
The args object contains the DataLoader we created in the previous step and other translation parameters that control how the translation model operates, including which languages to translate between and what performance metrics to collect.
The structure varies slightly depending on which backend you're using.
OPUS/Marian
For European language translations using OPUS models from Hyperdrive:
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperdriveDL({
key: 'hd://528eb43b34c57b0fb7116e532cd596a9661b001870bdabf696243e8d079a74ca' // en-it model (MARIAN_OPUS_EN_IT)
})
const args = {
loader: hdDL,
params: {
mode: 'full', // Model loading mode (full is recommended)
srcLang: 'en', // Source language (ISO 639-1 code)
dstLang: 'it' // Target language (ISO 639-1 code)
},
diskPath: './models/opus-en-it', // Unique directory per model
modelName: 'model.bin' // Always 'model.bin' for OPUS models
}
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-1) | 'en', 'de', 'it', 'es', 'fr' |
| dstLang | Target language (ISO 639-1) | 'en', 'de', 'it', 'es', 'fr' |
| modelName | Always 'model.bin' | 'model.bin' |
IndicTrans2
For Indic language translations (English ↔ Hindi, Bengali, Tamil, etc.):
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperdriveDL({
key: 'hd://8c0f50e7c75527213a090d2f1dcd9dbdb8262e5549c8cbbb74cb7cb12b156892' // en-hi 200M model (MARIAN_EN_HI_INDIC_200M_Q0F32)
})
const args = {
loader: hdDL,
params: {
mode: 'full',
srcLang: 'eng_Latn', // Source language (ISO 639-3 language code + ISO 15924 script code)
dstLang: 'hin_Deva' // Target language (ISO 639-3 language code + ISO 15924 script code)
},
diskPath: './models/indic-en-hi-200M', // Unique directory per model
modelName: 'ggml-indictrans2-en-indic-dist-200M.bin' // Must match exact filename in Hyperdrive
}
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-3 language + ISO 15924 script) | 'eng_Latn', 'hin_Deva', 'ben_Beng' |
| dstLang | Target language (ISO 639-3 language + ISO 15924 script) | 'eng_Latn', 'hin_Deva', 'tam_Taml' |
| modelName | Specific filename per model | 'ggml-indictrans2-en-indic-dist-200M.bin' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.IndicTrans | - |
IndicTrans2 model naming pattern:
- ggml-indictrans2-{direction}-{size}.bin for q0f32 quantization
- ggml-indictrans2-{direction}-{size}-q0f16.bin for q0f16 quantization
- ggml-indictrans2-{direction}-{size}-q4_0.bin for q4_0 quantization
Where direction is en-indic, indic-en, or indic-indic, and size is dist-200M, dist-320M, or 1B.
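The naming pattern above can be turned into a small filename builder, which helps avoid typos in modelName. This is a hypothetical helper, not part of @qvac/translation-nmtcpp. It only applies the documented pattern and does not check whether a given Hyperdrive actually contains the file. Not every direction/size combination is listed as available in the Supported Languages section.

```javascript
'use strict'

// Hypothetical helper (not a library API) that builds an IndicTrans2 GGML
// filename from the documented naming pattern.
const INDIC_DIRECTIONS = ['en-indic', 'indic-en', 'indic-indic']
const INDIC_SIZES = ['dist-200M', 'dist-320M', '1B']
// q0f32 files have no suffix; the other quantizations append '-<quant>'
const INDIC_QUANT_SUFFIX = { q0f32: '', q0f16: '-q0f16', q4_0: '-q4_0' }

function indicTransModelName (direction, size, quant = 'q0f32') {
  if (!INDIC_DIRECTIONS.includes(direction)) {
    throw new Error(`Unknown direction: ${direction}`)
  }
  if (!INDIC_SIZES.includes(size)) {
    throw new Error(`Unknown size: ${size}`)
  }
  if (!Object.prototype.hasOwnProperty.call(INDIC_QUANT_SUFFIX, quant)) {
    throw new Error(`Unknown quantization: ${quant}`)
  }
  return `ggml-indictrans2-${direction}-${size}${INDIC_QUANT_SUFFIX[quant]}.bin`
}
```

For example, indicTransModelName('en-indic', 'dist-200M') returns 'ggml-indictrans2-en-indic-dist-200M.bin', the modelName used in the IndicTrans2 args example.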
Bergamot
Bergamot models (Firefox Translations) are available via Hyperdrive or as local files.
Option 1: Using Hyperdrive (Recommended)
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const hdDL = new HyperdriveDL({
key: 'hd://a8811fb494e4aee45ca06a011703a25df5275e5dfa59d6217f2d430c677f9fa6' // en-it Bergamot (BERGAMOT_ENIT)
})
const args = {
loader: hdDL,
params: {
mode: 'full',
srcLang: 'en', // Source language (ISO 639-1 code)
dstLang: 'it' // Target language (ISO 639-1 code)
},
diskPath: './models/bergamot-en-it', // Unique directory per model
modelName: 'model.enit.intgemm.alphas.bin' // Model file from Hyperdrive
}
Option 2: Using Local Files
const fs = require('bare-fs')
const path = require('bare-path')
// Path to your locally downloaded Bergamot model directory
const bergamotPath = './models/bergamot-en-it'
const localLoader = {
ready: async () => {},
close: async () => {},
download: async (filename) => {
return fs.readFileSync(path.join(bergamotPath, filename))
},
getFileSize: async (filename) => {
const stats = fs.statSync(path.join(bergamotPath, filename))
return stats.size
}
}
const args = {
loader: localLoader,
params: {
mode: 'full',
srcLang: 'en',
dstLang: 'it'
},
diskPath: bergamotPath,
modelName: 'model.enit.intgemm.alphas.bin'
}
Bergamot Model Files by Language Pair:
| Language Pair | Hyperdrive Key | Model File | Vocab File(s) |
|---|---|---|---|
| en→it | a8811fb494e4aee4... | model.enit.intgemm.alphas.bin | vocab.enit.spm |
| it→en | 3b4be93d19dd9e9e... | model.iten.intgemm.alphas.bin | vocab.iten.spm |
| en→es | bf46f9b51d04f561... | model.enes.intgemm.alphas.bin | vocab.enes.spm |
| es→en | c3e983c8db3f64fa... | model.esen.intgemm.alphas.bin | vocab.esen.spm |
| en→fr | 0a4f388c0449b777... | model.enfr.intgemm.alphas.bin | vocab.enfr.spm |
| fr→en | 7a9b38b0c4637b2e... | model.fren.intgemm.alphas.bin | (see registry) |
| en→de | (see Bergamot section in registry) | model.ende.intgemm.alphas.bin | vocab.ende.spm |
| en→ru | 404279d9716f3191... | model.enru.intgemm.alphas.bin | vocab.enru.spm |
| ru→en | dad7f99c8d8c1723... | model.ruen.intgemm.alphas.bin | vocab.ruen.spm |
| en→zh | 15d484200acea8b1... | model.enzh.intgemm.alphas.bin | srcvocab.enzh.spm, trgvocab.enzh.spm |
| zh→en | 17eb4c3fcd23ac3c... | model.zhen.intgemm.alphas.bin | vocab.zhen.spm |
| en→ja | ac0b883d176ea3b1... | model.enja.intgemm.alphas.bin | srcvocab.enja.spm, trgvocab.enja.spm |
| ja→en | 85012ed3c3ff5c2b... | model.jaen.intgemm.alphas.bin | vocab.jaen.spm |
Key Parameters:
| Parameter | Description | Example |
|---|---|---|
| srcLang | Source language (ISO 639-1) | 'en', 'es', 'de' |
| dstLang | Target language (ISO 639-1) | 'it', 'fr', 'de' |
| modelName | Model weights file | 'model.enit.intgemm.alphas.bin' |
| srcVocabName | Required in config: Source vocab file | 'vocab.enit.spm' or 'srcvocab.enja.spm' |
| dstVocabName | Required in config: Target vocab file | 'vocab.enit.spm' or 'trgvocab.enja.spm' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.Bergamot | - |
Bergamot model file naming convention:
- model.{srctgt}.intgemm.alphas.bin: model weights (e.g., model.enit.intgemm.alphas.bin)
- vocab.{srctgt}.spm: shared vocabulary for most language pairs
- srcvocab.{srctgt}.spm + trgvocab.{srctgt}.spm: separate source and target vocabularies for some CJK pairs (en→zh and en→ja in the table above)
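The naming convention can be applied programmatically to fill modelName (in args) and the vocab names (in config). This is a hypothetical helper, not a library API. One part is inferred from the table rather than stated outright: separate vocabularies are used only when the target is zh or ja, since zh→en and ja→en use a shared vocab. Some pairs, such as fr→en, are marked "see registry", so check the registry for those.

```javascript
'use strict'

// Hypothetical helper deriving Bergamot file names from a language pair.
// Assumption (inferred from the table): separate src/trg vocabs only when the
// target language is zh or ja; every other pair uses one shared vocab file.
const SEPARATE_VOCAB_TARGETS = new Set(['zh', 'ja'])

function bergamotFiles (srcLang, dstLang) {
  const pair = `${srcLang}${dstLang}` // e.g. 'en' + 'it' -> 'enit'
  const modelName = `model.${pair}.intgemm.alphas.bin`
  if (SEPARATE_VOCAB_TARGETS.has(dstLang)) {
    return {
      modelName,
      srcVocabName: `srcvocab.${pair}.spm`,
      dstVocabName: `trgvocab.${pair}.spm`
    }
  }
  // Shared vocabulary: the same file serves as source and target vocab
  const vocab = `vocab.${pair}.spm`
  return { modelName, srcVocabName: vocab, dstVocabName: vocab }
}
```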
`diskPath` Configuration
Use a unique directory per model to avoid file conflicts when using multiple models:
- ./models/opus-en-it for OPUS English→Italian
- ./models/indic-en-hi-200M for IndicTrans English→Hindi
- ./models/bergamot-en-it for Bergamot English→Italian
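To keep one directory per model consistently, you can derive diskPath from the backend and language pair. This is a minimal, hypothetical helper that reproduces the directory names listed above. The short backend labels ('opus', 'indic', 'bergamot') are a naming choice, not something the library requires.

```javascript
'use strict'

// Hypothetical helper: ./models/<backend>-<src>-<dst>[-<variant>]
// `variant` distinguishes models for the same pair, e.g. IndicTrans2 sizes.
function modelDiskPath (backend, srcLang, dstLang, variant) {
  const parts = [backend, srcLang, dstLang]
  if (variant) parts.push(variant)
  return './models/' + parts.join('-')
}
```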
The list of supported languages for the srcLang and dstLang parameters differs by model type.
3. Create the config object
The config object contains two types of parameters:
- Model-specific parameters (required for some backends)
- Generation/decoding parameters (optional, controls output quality)
Model-Specific Parameters
| Parameter | OPUS/Marian | IndicTrans2 | Bergamot |
|---|---|---|---|
| modelType | Not needed (default) | Required | Required |
| srcVocabName | Not needed | Not needed | Required |
| dstVocabName | Not needed | Not needed | Required |
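The table above can be encoded as a quick check that runs before constructing a model. This is a hypothetical sketch, not a library API. The backend names are the string values listed under Available Model Types. In real code, prefer the TranslationNmtcpp.ModelTypes constants.

```javascript
'use strict'

// Hypothetical check derived from the model-specific parameters table.
const REQUIRED_CONFIG_KEYS = {
  Opus: [], // default backend, nothing extra required
  IndicTrans: ['modelType'],
  Bergamot: ['modelType', 'srcVocabName', 'dstVocabName']
}

// Returns the required config keys that are missing or empty for a backend.
function missingConfigKeys (backend, config = {}) {
  if (!Object.prototype.hasOwnProperty.call(REQUIRED_CONFIG_KEYS, backend)) {
    throw new Error(`Unknown backend: ${backend}`)
  }
  return REQUIRED_CONFIG_KEYS[backend].filter((key) => !config[key])
}
```

For example, missingConfigKeys('Bergamot', { modelType: 'Bergamot', srcVocabName: 'vocab.enit.spm' }) returns ['dstVocabName'].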
Generation/Decoding Parameters (OPUS/IndicTrans Only)
These parameters control how the model generates output. Note: Full parameter support is only available for OPUS/Marian and IndicTrans2 models. Bergamot has limited parameter support.
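The comments in the generationParams example below give an accepted range for each parameter, and these ranges can be checked before the model is created. This pre-flight check is hypothetical and not part of @qvac/translation-nmtcpp. It assumes the native engine enforces the same bounds as those comments. It also treats every parameter except beamsize as unsupported for Bergamot, following the Bergamot config example in Step 4. modelType is compared as a plain string ('Bergamot').

```javascript
'use strict'

// Hypothetical pre-flight check; ranges copied from the generationParams comments.
const PARAM_RANGES = {
  beamsize: { min: 1, integer: true },
  lengthpenalty: { min: 0 },
  maxlength: { min: 1, integer: true }, // "> 0"
  repetitionpenalty: { min: 0, max: 2 },
  norepeatngramsize: { min: 0, max: 10, integer: true },
  temperature: { min: 0, max: 2 },
  topk: { min: 0, integer: true }, // upper bound is the model's vocab size
  topp: { minExclusive: 0, max: 1 }
}
// Model-specific keys that may share the same config object
const MODEL_KEYS = new Set(['modelType', 'srcVocabName', 'dstVocabName'])

// Returns a list of problems; an empty list means no issue was detected.
function checkGenerationParams (config) {
  const modelType = config.modelType || 'Opus'
  const problems = []
  for (const [name, value] of Object.entries(config)) {
    if (MODEL_KEYS.has(name)) continue
    if (!Object.prototype.hasOwnProperty.call(PARAM_RANGES, name)) {
      problems.push(`${name}: unknown generation parameter`)
      continue
    }
    if (modelType === 'Bergamot' && name !== 'beamsize') {
      problems.push(`${name}: not supported by Bergamot (only beamsize is)`)
      continue
    }
    if (typeof value !== 'number' || Number.isNaN(value)) {
      problems.push(`${name}: must be a number`)
      continue
    }
    const range = PARAM_RANGES[name]
    if (range.integer && !Number.isInteger(value)) problems.push(`${name}: must be an integer`)
    if (range.min !== undefined && value < range.min) problems.push(`${name}: must be >= ${range.min}`)
    if (range.minExclusive !== undefined && value <= range.minExclusive) problems.push(`${name}: must be > ${range.minExclusive}`)
    if (range.max !== undefined && value > range.max) problems.push(`${name}: must be <= ${range.max}`)
  }
  return problems
}
```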
// Generation parameters for OPUS/Marian and IndicTrans2
const generationParams = {
beamsize: 4, // Beam search width (>=1). 1 disables beam search
lengthpenalty: 0.6, // Length normalization strength (>=0)
maxlength: 128, // Maximum generated tokens (>0)
repetitionpenalty: 1.2, // Penalize previously generated tokens (0..2)
norepeatngramsize: 2, // Disallow repeating n-grams of this size (0..10)
temperature: 0.8, // Sampling temperature [0..2]
topk: 40, // Keep top-K logits [0..vocab_size]
topp: 0.9 // Nucleus sampling threshold (0 < p <= 1)
}
4. Create Model Instance
Import TranslationNmtcpp and create an instance by combining args (from Step 2) with config parameters (from Step 3):
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
OPUS/Marian (Default)
// OPUS - combine generation parameters (modelType defaults to Opus)
const config = {
...generationParams, // Spread generation params from Step 3
beamsize: 4, // Or override specific values
maxlength: 128
}
const model = new TranslationNmtcpp(args, config)
IndicTrans2
// IndicTrans - must specify modelType + generation parameters
const config = {
modelType: TranslationNmtcpp.ModelTypes.IndicTrans,
...generationParams, // Spread generation params from Step 3
maxlength: 256 // Override for longer outputs
}
const model = new TranslationNmtcpp(args, config)
Bergamot
// Bergamot - must specify modelType, vocab files (limited generation params support)
const config = {
modelType: TranslationNmtcpp.ModelTypes.Bergamot,
srcVocabName: 'vocab.enit.spm', // Required: source vocabulary file
dstVocabName: 'vocab.enit.spm', // Required: target vocabulary file
beamsize: 4 // Only beamsize supported for Bergamot
}
const model = new TranslationNmtcpp(args, config)
Available Model Types:
TranslationNmtcpp.ModelTypes = {
Opus: 'Opus', // Default - Marian OPUS models
IndicTrans: 'IndicTrans', // Indic language models
Bergamot: 'Bergamot' // Firefox Translations models
}
5. Load Model
try {
// Basic usage
await model.load()
} catch (error) {
console.error('Failed to load model:', error)
}
6. Run the Model
We can perform inference on the input text using the run() method. This method returns a QVACResponse object.
try {
// Execute translation on input text
const response = await model.run('Hello world! Welcome to the internet of peers!')
// Process streamed output using callback
await response
.onUpdate(outputChunk => {
// Handle each new piece of translated text
console.log(outputChunk)
})
.await() // Wait for translation to complete
// Access performance statistics (if enabled with opts.stats)
if (response.stats) {
console.log('Translation completed in:', response.stats.totalTime, 'ms')
}
} catch (error) {
console.error('Translation failed:', error)
}
7. Batch Translation (Bergamot Only)
For translating multiple texts efficiently, use the runBatch() method instead of calling run() multiple times.
runBatch() is only available with the Bergamot backend. OPUS/Marian and IndicTrans2 models should use sequential run() calls.
// Array of texts to translate (English)
const textsToTranslate = [
'Hello world!',
'How are you today?',
'Machine translation has revolutionized communication.'
]
try {
// Batch translation - returns array of translated strings
const translations = await model.runBatch(textsToTranslate)
// Output each translation
translations.forEach((translatedText, index) => {
console.log(`Original: ${textsToTranslate[index]}`)
console.log(`Translated: ${translatedText}\n`)
})
} catch (error) {
console.error('Batch translation failed:', error)
}
runBatch() vs run():
| Method | Input | Output | Backend Support |
|---|---|---|---|
| run(text) | Single string | QVACResponse with streaming | All (OPUS, IndicTrans, Bergamot) |
| runBatch(texts) | Array of strings | Array of strings | Bergamot only |
runBatch() is typically faster than repeated run() calls when translating multiple texts, because it processes them in a single batch operation.
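Code that must work with every backend can choose between the two methods at runtime. The sketch below is a hypothetical helper, not a library API. For Bergamot it calls runBatch(). For OPUS and IndicTrans2 it makes sequential run() calls and collects each streamed result. It assumes model is an already loaded TranslationNmtcpp instance and that modelType is the string value from its config ('Opus' when omitted). It also assumes, without confirmation from this README, that the chunks passed to onUpdate() are text fragments that concatenate into the full translation.

```javascript
'use strict'

// Hypothetical helper: translate several texts with the method the backend supports.
async function translateAll (model, texts, modelType = 'Opus') {
  if (modelType === 'Bergamot') {
    // Single batched call; resolves to an array of translated strings
    return model.runBatch(texts)
  }
  const results = []
  for (const text of texts) {
    // OPUS and IndicTrans2: one run() per text, processed sequentially
    let output = ''
    const response = await model.run(text)
    await response
      .onUpdate((chunk) => { output += chunk }) // assumption: chunks are text fragments
      .await()
    results.push(output)
  }
  return results
}
```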
8. Unload the Model
// Always unload the model when finished to free memory
try {
await model.unload()
} catch (error) {
console.error('Failed to unload model:', error)
}
Supported Languages
Marian/OPUS Models (Hyperdrive)
The following language pairs are available via Hyperdrive.
Core European Languages (with cross-language support):
| Language | Code | Supported Pairs | Hyperdrive |
|---|---|---|---|
| English | en | ↔ de, es, it, fr, pt, ru, ar, ja, zh | Yes |
| German | de | ↔ en, es, it, fr | Yes |
| Spanish | es | ↔ en, de, it, fr | Yes |
| Italian | it | ↔ en, de, es | Yes |
| French | fr | ↔ en, de, es | Yes |
Other Languages (English ↔ X):
| Language | Code | Hyperdrive |
|---|---|---|
| Portuguese | pt | Yes |
| Russian | ru | Yes |
| Arabic | ar | Yes |
| Japanese | ja | Yes |
| Chinese | zh | Yes |
Legend: ↔ = bidirectional support available in Hyperdrive.
The OPUS project supports many more language pairs. Only the pairs listed above are currently available via Hyperdrive. Additional models may be added in future updates.
IndicTrans2 Models (Hyperdrive)
IndicTrans2 supports translation between English and 22 Indic languages, as well as between pairs of Indic languages. The following directions are available via Hyperdrive:
| Direction | Hyperdrive Keys | Sizes |
|---|---|---|
| English → Indic | Yes | 200M, 1B |
| Indic → English | Yes | 200M, 1B |
| Indic → Indic | Yes | 320M, 1B |
Supported Indic Languages:
| Language (code) | Language (code) | Language (code) |
|---|---|---|
| Assamese (asm_Beng) | Kashmiri (Arabic) (kas_Arab) | Punjabi (pan_Guru) |
| Bengali (ben_Beng) | Kashmiri (Devanagari) (kas_Deva) | Sanskrit (san_Deva) |
| Bodo (brx_Deva) | Maithili (mai_Deva) | Santali (sat_Olck) |
| Dogri (doi_Deva) | Malayalam (mal_Mlym) | Sindhi (Arabic) (snd_Arab) |
| English (eng_Latn) | Marathi (mar_Deva) | Sindhi (Devanagari) (snd_Deva) |
| Konkani (gom_Deva) | Manipuri (Bengali) (mni_Beng) | Tamil (tam_Taml) |
| Gujarati (guj_Gujr) | Manipuri (Meitei) (mni_Mtei) | Telugu (tel_Telu) |
| Hindi (hin_Deva) | Nepali (npi_Deva) | Urdu (urd_Arab) |
| Kannada (kan_Knda) | Odia (ory_Orya) | |
Bergamot Models (Firefox Translations)
Language pairs available via Hyperdrive:
| Language | Code | en→X | X→en |
|---|---|---|---|
| Arabic | ar | Yes | Yes |
| Czech | cs | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| Italian | it | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Russian | ru | Yes | Yes |
| Chinese | zh | Yes | Yes |
The Bergamot backend supports all language pairs available in Firefox Translations. See the Firefox Translations models repository for the complete and up-to-date list of supported language pairs. Download Firefox Translations models locally only if your language pair is not available via Hyperdrive.
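To decide whether a pair can be fetched from Hyperdrive or needs locally downloaded Firefox Translations models, you can encode the OPUS and Bergamot tables above as a lookup. This is a hypothetical snapshot of those tables, not a library API. The registry may contain more pairs, such as the en→de Bergamot entry that is marked "see registry". IndicTrans2 is not covered.

```javascript
'use strict'

// Hypothetical lookup copied from the Supported Languages tables (Hyperdrive only).
const OPUS_HD_PAIRS = new Map([
  ['en', ['de', 'es', 'it', 'fr', 'pt', 'ru', 'ar', 'ja', 'zh']],
  ['de', ['en', 'es', 'it', 'fr']],
  ['es', ['en', 'de', 'it', 'fr']],
  ['it', ['en', 'de', 'es']],
  ['fr', ['en', 'de', 'es']],
  ['pt', ['en']], ['ru', ['en']], ['ar', ['en']], ['ja', ['en']], ['zh', ['en']]
])
// Every Bergamot pair in the table is English <-> X
const BERGAMOT_HD_LANGS = ['ar', 'cs', 'es', 'fr', 'it', 'ja', 'pt', 'ru', 'zh']

// Returns the backends whose Hyperdrive table lists srcLang -> dstLang.
function hyperdriveBackendsFor (srcLang, dstLang) {
  const backends = []
  if ((OPUS_HD_PAIRS.get(srcLang) || []).includes(dstLang)) backends.push('Opus')
  const bergamot =
    (srcLang === 'en' && BERGAMOT_HD_LANGS.includes(dstLang)) ||
    (dstLang === 'en' && BERGAMOT_HD_LANGS.includes(srcLang))
  if (bergamot) backends.push('Bergamot')
  return backends
}
```

An empty result suggests using local model files (Option 2 in the Bergamot section).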
ModelClasses and Packages
ModelClass
The main class exported by this library is TranslationNmtcpp, which supports multiple translation backends:
const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
// Available model types
TranslationNmtcpp.ModelTypes = {
IndicTrans: 'IndicTrans', // For Indic language translations
Opus: 'Opus', // For Marian OPUS models
Bergamot: 'Bergamot' // For Bergamot/Firefox translations
}
Available Packages
Main Package
| Package | Description | Backends | Languages |
|---|---|---|---|
| @qvac/translation-nmtcpp | Main translation package | OPUS, Bergamot, IndicTrans | See Supported Languages |
The main package supports all three backends and all their respective languages. See Supported Languages for the complete list.
Logging
The library supports configurable logging for both JavaScript and C++ (native) components. By default, C++ logs are suppressed for cleaner output.
Enabling C++ Logs
To enable verbose C++ logging, pass a logger object in the args parameter:
// Enable C++ logging
const logger = {
info: (msg) => console.log('[C++ INFO]', msg),
warn: (msg) => console.warn('[C++ WARN]', msg),
error: (msg) => console.error('[C++ ERROR]', msg),
debug: (msg) => console.log('[C++ DEBUG]', msg)
}
const args = {
loader: hdDL,
params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
diskPath: './models/opus-en-it',
modelName: 'model.bin',
logger // Pass logger to enable C++ logs
}
Disabling C++ Logs
To suppress all C++ logs, either omit the logger parameter or set it to null:
const args = {
loader: hdDL,
params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
diskPath: './models/opus-en-it',
modelName: 'model.bin'
// No logger = suppress C++ logs
}
Using Environment Variables (Recommended for Examples)
All examples support the VERBOSE environment variable:
# Run with C++ logging disabled (default)
bare examples/example.hd.js
# Run with C++ logging enabled
VERBOSE=1 bare examples/example.hd.js
Log Levels
The C++ backend supports these log levels (mapped from native priority):
| Priority | Level | Description |
|---|---|---|
| 0 | error | Critical errors |
| 1 | warn | Warnings |
| 2 | info | Informational messages |
| 3 | debug | Debug/trace messages |
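Between "all logs" and "no logs", you can pass a logger that forwards only the more severe levels. The sketch below is a hypothetical helper that builds a logger with the same four methods as the example in Enabling C++ Logs. Its threshold follows the priority table (0 = error up to 3 = debug). It assumes the native side calls the logger method whose name matches each message's level.

```javascript
'use strict'

// Hypothetical helper: a logger for args.logger that drops messages above maxLevel.
const LOG_LEVELS = ['error', 'warn', 'info', 'debug'] // index = native priority
// Console method per level, matching the logger example above
const SINK_METHOD = { error: 'error', warn: 'warn', info: 'log', debug: 'log' }

function createFilteredLogger (maxLevel = 'warn', sink = console) {
  const maxPriority = LOG_LEVELS.indexOf(maxLevel)
  if (maxPriority === -1) throw new Error(`Unknown log level: ${maxLevel}`)
  const logger = {}
  LOG_LEVELS.forEach((level, priority) => {
    logger[level] = priority <= maxPriority
      ? (msg) => sink[SINK_METHOD[level]](`[C++ ${level.toUpperCase()}]`, msg)
      : () => {} // silently drop messages above the threshold
  })
  return logger
}
```

With this helper, the quickstart's VERBOSE switch could become logger: VERBOSE ? createFilteredLogger('debug') : createFilteredLogger('warn'), so errors and warnings stay visible by default.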