
@qvac/translation-nmtcpp

Text-to-text neural machine translation (NMT).

Overview

Bare module that adds support for translation in QVAC using either nmt.cpp or Bergamot as the inference engine.

Models

You should load a model compatible with your chosen inference engine:

  • nmt.cpp (default): OPUS-MT or IndicTrans2, converted to GGML. Model file format: *.bin.
  • Bergamot: Bergamot model bundle. Required files: model *.bin plus vocabulary *.spm file(s), either one shared vocab.*.spm or a srcvocab.*.spm/trgvocab.*.spm pair.

Requirement

Bare ≥ v1.24

Installation

npm i @qvac/translation-nmtcpp

Quickstart

If you don't have the Bare runtime installed, install it globally:

npm i -g bare

Create a new project:

mkdir qvac-translation-quickstart
cd qvac-translation-quickstart
npm init -y

Install dependencies:

npm i @qvac/translation-nmtcpp @qvac/dl-hyperdrive bare-fs bare-path bare-process

Create example.js:

'use strict'

/**
 * Quickstart Example
 *
 * This example demonstrates both translation backends:
 * 1. GGML backend - Downloads model via HyperdriveDL (English to Italian)
 * 2. Bergamot backend - Uses local model files (requires BERGAMOT_MODEL_PATH)
 *
 * Usage:
 *   bare example.js
 *   BERGAMOT_MODEL_PATH=/path/to/bergamot/model bare example.js
 *
 * Enable verbose C++ logging:
 *   VERBOSE=1 bare example.js
 */

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')
const HyperdriveDL = require('@qvac/dl-hyperdrive')
const fs = require('bare-fs')
const path = require('bare-path')
const process = require('bare-process')

// ============================================================
// LOGGING CONFIGURATION
// Set VERBOSE=1 environment variable to enable C++ debug logs
// ============================================================
const VERBOSE = process.env.VERBOSE === '1' || process.env.VERBOSE === 'true'

const logger = VERBOSE
  ? {
      info: (msg) => console.log('[C++ INFO]', msg),
      warn: (msg) => console.warn('[C++ WARN]', msg),
      error: (msg) => console.error('[C++ ERROR]', msg),
      debug: (msg) => console.log('[C++ DEBUG]', msg)
    }
  : null // null = suppress all C++ logs

const text = 'Machine translation has revolutionized how we communicate across language barriers in the modern digital world.'

async function testGGML () {
  console.log('\n=== Testing GGML Backend ===\n')

  // Create `DataLoader`
  const hdDL = new HyperdriveDL({
    // The hyperdrive key for en-it translation model weights and config
    key: 'hd://9ef58f31c20d5556722e0b58a5d262fd89801daf2e6cb28e3f21ac6e9228088f'
  })

  // Create the `args` object
  const args = {
    loader: hdDL,
    params: { mode: 'full', dstLang: 'it', srcLang: 'en' },
    diskPath: './models',
    modelName: 'model.bin',
    logger // Pass the logger
  }

  // Create Model Instance
  const model = new TranslationNmtcpp(args, { })

  // Load model
  await model.load()

  try {
    // Run the Model
    const response = await model.run(text)

    await response
      .onUpdate(data => {
        console.log(data)
      })
      .await()

    console.log('GGML translation finished!')
  } finally {
    // Unload the model
    await model.unload()

    // Close the DataLoader
    await hdDL.close()
  }
}

async function testBergamot () {
  console.log('\n=== Testing Bergamot Backend ===\n')

  // Use local model path for Bergamot - env var or relative path
  const bergamotPath = process.env.BERGAMOT_MODEL_PATH || './model/bergamot/enit'

  console.log('Model path:', bergamotPath)

  // Check if model directory exists
  if (!fs.existsSync(bergamotPath)) {
    console.log('Bergamot model directory not found, skipping test')
    console.log('Set BERGAMOT_MODEL_PATH env var or place model in ./model/bergamot/enit')
    return
  }

  console.log('Loading model...')

  // Create a local file loader for Bergamot models that are already on disk
  const localLoader = {
    ready: async () => { /* Models already on disk */ },
    close: async () => { /* No resources to close */ },
    download: async (filename) => {
      // Read file from local disk
      const filePath = path.join(bergamotPath, filename)
      return fs.readFileSync(filePath)
    },
    getFileSize: async (filename) => {
      const filePath = path.join(bergamotPath, filename)
      const stats = fs.statSync(filePath)
      return stats.size
    }
  }

  // Create the `args` object for Bergamot
  const args = {
    loader: localLoader,
    params: { mode: 'full', dstLang: 'it', srcLang: 'en' },
    diskPath: bergamotPath,
    modelName: 'model.enit.intgemm.alphas.bin',
    logger // Pass the logger
  }

  // Config with explicit vocab paths for Bergamot
  const config = {
    srcVocabName: 'vocab.enit.spm',
    dstVocabName: 'vocab.enit.spm',
    modelType: TranslationNmtcpp.ModelTypes.Bergamot
  }

  // Create Model Instance
  const model = new TranslationNmtcpp(args, config)

  // Load model
  await model.load()
  console.log('Model loaded successfully!')

  try {
    console.log('Running translation...')
    console.log('Input text:', text)

    // Run the Model
    const response = await model.run(text)

    await response
      .onUpdate(data => {
        console.log('Translation output:', data)
      })
      .await()

    console.log('Bergamot translation finished!')
  } finally {
    console.log('Unloading model...')
    await model.unload()

    // Close the local loader
    await localLoader.close()
    console.log('Done!')
  }
}

async function main () {
  try {
    // Test GGML backend
    await testGGML()

    // Test Bergamot backend
    await testBergamot()

    console.log('\n=== All Tests Completed Successfully! ===\n')
  } catch (error) {
    console.error('Test failed:', error)
    throw error
  }
}

main()

Run example.js:

bare example.js

Usage

Translating text follows the same workflow regardless of the backend (OPUS/Marian, IndicTrans2, or Bergamot):

1. Create DataLoader

In QVAC, the DataLoader class provides an interface for fetching model weights and other resources needed to run AI models. A DataLoader instance is required to instantiate a model class. Create a HyperdriveDL as follows:

const HyperdriveDL = require('@qvac/dl-hyperdrive')

const hdDL = new HyperdriveDL({
  key: 'hd://528eb43b34c57b0fb7116e532cd596a9661b001870bdabf696243e8d079a74ca' // (Required) Hyperdrive key with 'hd://' prefix (raw hex also works)
  // store: corestore // (Optional) A Corestore instance for persistent storage. See Glossary for details.
})

Make sure you provide the correct key to HyperdriveDL. Each key serves the weights and settings of one specific model, so a DataLoader for an en-it model cannot be used for de-en translation. The key must match both the model you intend to load and the srcLang/dstLang pair you request.
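A simple way to avoid key mix-ups is to keep every key you use in one lookup table keyed by backend and direction, and fail fast on combinations you have not registered. The sketch below is hypothetical: `HD_KEYS`, `normalizeHyperdriveKey`, and `pickHyperdriveKey` are not part of `@qvac/dl-hyperdrive`, and the table only holds the three full keys quoted in this README. It also assumes keys are 64 hex characters, as every key shown here is.

```javascript
// Hypothetical helpers: not part of @qvac/dl-hyperdrive or @qvac/translation-nmtcpp.
// The keys below are copied from the examples in this README.
const HD_KEYS = {
  'Opus:en-it': 'hd://528eb43b34c57b0fb7116e532cd596a9661b001870bdabf696243e8d079a74ca',
  'IndicTrans:eng_Latn-hin_Deva': 'hd://8c0f50e7c75527213a090d2f1dcd9dbdb8262e5549c8cbbb74cb7cb12b156892',
  'Bergamot:en-it': 'hd://a8811fb494e4aee45ca06a011703a25df5275e5dfa59d6217f2d430c677f9fa6'
}

// Accept 'hd://<hex>' or raw hex and return the 'hd://' form.
// Assumes a 64-character hex key, like every key shown in this README.
function normalizeHyperdriveKey (key) {
  const hex = key.startsWith('hd://') ? key.slice('hd://'.length) : key
  if (!/^[0-9a-f]{64}$/i.test(hex)) {
    throw new Error(`Expected a 64-character hex key, got: ${key}`)
  }
  return 'hd://' + hex.toLowerCase()
}

// Look up the key for one backend + direction; unknown combinations throw
// instead of silently loading the wrong model.
function pickHyperdriveKey (modelType, srcLang, dstLang) {
  const id = `${modelType}:${srcLang}-${dstLang}`
  if (!Object.prototype.hasOwnProperty.call(HD_KEYS, id)) {
    throw new Error(`No Hyperdrive key registered for ${id}`)
  }
  return normalizeHyperdriveKey(HD_KEYS[id])
}
```

Usage: `new HyperdriveDL({ key: pickHyperdriveKey('Opus', 'en', 'it') })`. Add an entry to `HD_KEYS` for each model you actually use.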

2. Create the args object

The args object contains the DataLoader we created in the previous step and other translation parameters that control how the translation model operates, including which languages to translate between and what performance metrics to collect.

The structure varies slightly depending on which backend you're using.

OPUS/Marian

For European language translations using OPUS models from Hyperdrive:

const HyperdriveDL = require('@qvac/dl-hyperdrive')

const hdDL = new HyperdriveDL({
  key: 'hd://528eb43b34c57b0fb7116e532cd596a9661b001870bdabf696243e8d079a74ca' // en-it model (MARIAN_OPUS_EN_IT)
})

const args = {
  loader: hdDL,
  params: {
    mode: 'full',      // Model loading mode (full is recommended)
    srcLang: 'en',     // Source language (ISO 639-1 code)
    dstLang: 'it'      // Target language (ISO 639-1 code)
  },
  diskPath: './models/opus-en-it',  // Unique directory per model
  modelName: 'model.bin'            // Always 'model.bin' for OPUS models
}

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| srcLang | Source language (ISO 639-1) | 'en', 'de', 'it', 'es', 'fr' |
| dstLang | Target language (ISO 639-1) | 'en', 'de', 'it', 'es', 'fr' |
| modelName | Always 'model.bin' | 'model.bin' |
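The OPUS conventions above (two-letter codes, `mode: 'full'`, a per-pair `diskPath`, and the fixed `model.bin` filename) can be captured in a small builder. `buildOpusArgs` is a hypothetical helper, not a library function; it only assembles the `args` object you would otherwise write by hand, and its ISO 639-1 check is a simple two-lowercase-letter test.

```javascript
// Hypothetical helper applying this README's OPUS conventions:
// mode 'full', one diskPath per language pair, and the fixed 'model.bin'.
function buildOpusArgs (loader, srcLang, dstLang, logger = null) {
  const iso6391 = /^[a-z]{2}$/ // two lowercase letters, e.g. 'en'
  if (!iso6391.test(srcLang) || !iso6391.test(dstLang)) {
    throw new Error(`OPUS expects ISO 639-1 codes, got ${srcLang}/${dstLang}`)
  }
  const args = {
    loader,
    params: { mode: 'full', srcLang, dstLang },
    diskPath: `./models/opus-${srcLang}-${dstLang}`,
    modelName: 'model.bin'
  }
  if (logger) args.logger = logger // leave it out to keep C++ logs suppressed
  return args
}
```

For example, `buildOpusArgs(hdDL, 'en', 'it')` produces the same `args` object as the OPUS example above.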

IndicTrans2

For Indic language translations (English ↔ Hindi, Bengali, Tamil, etc.):

const HyperdriveDL = require('@qvac/dl-hyperdrive')

const hdDL = new HyperdriveDL({
  key: 'hd://8c0f50e7c75527213a090d2f1dcd9dbdb8262e5549c8cbbb74cb7cb12b156892' // en-hi 200M model (MARIAN_EN_HI_INDIC_200M_Q0F32)
})

const args = {
  loader: hdDL,
  params: {
    mode: 'full',
    srcLang: 'eng_Latn',   // Source language (ISO 639-3 language + ISO 15924 script)
    dstLang: 'hin_Deva'    // Target language (ISO 639-3 language + ISO 15924 script)
  },
  diskPath: './models/indic-en-hi-200M',              // Unique directory per model
  modelName: 'ggml-indictrans2-en-indic-dist-200M.bin' // Must match exact filename in Hyperdrive
}

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| srcLang | Source language (ISO 639-3 language + ISO 15924 script) | 'eng_Latn', 'hin_Deva', 'ben_Beng' |
| dstLang | Target language (ISO 639-3 language + ISO 15924 script) | 'eng_Latn', 'hin_Deva', 'tam_Taml' |
| modelName | Specific filename per model | 'ggml-indictrans2-en-indic-dist-200M.bin' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.IndicTrans | - |

IndicTrans2 model naming pattern:

  • ggml-indictrans2-{direction}-{size}.bin for q0f32 quantization
  • ggml-indictrans2-{direction}-{size}-q0f16.bin for q0f16 quantization
  • ggml-indictrans2-{direction}-{size}-q4_0.bin for q4_0 quantization

Here, direction is one of en-indic, indic-en, or indic-indic, and size is one of dist-200M, dist-320M, or 1B.
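The naming pattern is regular enough to generate filenames instead of typing them. The hypothetical `indicTrans2ModelName` below only checks that each part is one of the documented values; not every combination is published on Hyperdrive (see the IndicTrans2 sizes table under Supported Languages), so treat the result as a candidate name to compare with the actual Hyperdrive contents.

```javascript
// Hypothetical helper: builds an IndicTrans2 GGML filename from the
// documented pattern. Not part of @qvac/translation-nmtcpp.
const DIRECTIONS = ['en-indic', 'indic-en', 'indic-indic']
const SIZES = ['dist-200M', 'dist-320M', '1B']
// q0f32 files have no suffix; the other quantizations append one.
const QUANT_SUFFIX = { q0f32: '', q0f16: '-q0f16', q4_0: '-q4_0' }

function indicTrans2ModelName (direction, size, quant = 'q0f32') {
  if (!DIRECTIONS.includes(direction)) throw new Error(`Unknown direction: ${direction}`)
  if (!SIZES.includes(size)) throw new Error(`Unknown size: ${size}`)
  if (!Object.prototype.hasOwnProperty.call(QUANT_SUFFIX, quant)) {
    throw new Error(`Unknown quantization: ${quant}`)
  }
  return `ggml-indictrans2-${direction}-${size}${QUANT_SUFFIX[quant]}.bin`
}
```

The default call `indicTrans2ModelName('en-indic', 'dist-200M')` yields the filename used in the IndicTrans2 `args` example.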

Bergamot

Bergamot models (Firefox Translations) are available via Hyperdrive or as local files.

Option 1: Using Hyperdrive (Recommended)

const HyperdriveDL = require('@qvac/dl-hyperdrive')

const hdDL = new HyperdriveDL({
  key: 'hd://a8811fb494e4aee45ca06a011703a25df5275e5dfa59d6217f2d430c677f9fa6' // en-it Bergamot (BERGAMOT_ENIT)
})

const args = {
  loader: hdDL,
  params: {
    mode: 'full',
    srcLang: 'en',    // Source language (ISO 639-1 code)
    dstLang: 'it'     // Target language (ISO 639-1 code)
  },
  diskPath: './models/bergamot-en-it',           // Unique directory per model
  modelName: 'model.enit.intgemm.alphas.bin'     // Model file from Hyperdrive
}

Option 2: Using Local Files

const fs = require('bare-fs')
const path = require('bare-path')

// Path to your locally downloaded Bergamot model directory
const bergamotPath = './models/bergamot-en-it'

const localLoader = {
  ready: async () => {},
  close: async () => {},
  download: async (filename) => {
    return fs.readFileSync(path.join(bergamotPath, filename))
  },
  getFileSize: async (filename) => {
    const stats = fs.statSync(path.join(bergamotPath, filename))
    return stats.size
  }
}

const args = {
  loader: localLoader,
  params: {
    mode: 'full',
    srcLang: 'en',
    dstLang: 'it'
  },
  diskPath: bergamotPath,
  modelName: 'model.enit.intgemm.alphas.bin'
}
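The local loader above is just an object with four async methods: `ready`, `close`, `download`, and `getFileSize`. Assuming those four methods are all the library calls (as the examples suggest), a loader with the same shape can serve files from memory, which may be handy in tests that should not touch the disk. `createMemoryLoader` is a hypothetical sketch.

```javascript
// Hypothetical in-memory loader with the same four methods as the
// local loader above. `files` maps filename -> Uint8Array
// (Buffers, e.g. from fs.readFileSync, are Uint8Arrays too).
function createMemoryLoader (files) {
  let closed = false
  const lookup = (filename) => {
    if (closed) throw new Error('Loader is closed')
    if (!Object.prototype.hasOwnProperty.call(files, filename)) {
      throw new Error(`File not found: ${filename}`)
    }
    return files[filename]
  }
  return {
    ready: async () => {}, // nothing to prepare
    close: async () => { closed = true },
    download: async (filename) => lookup(filename), // rejects if missing
    getFileSize: async (filename) => lookup(filename).byteLength
  }
}
```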

Bergamot Model Files by Language Pair:

| Language Pair | Hyperdrive Key | Model File | Vocab File(s) |
| --- | --- | --- | --- |
| en→it | a8811fb494e4aee4... | model.enit.intgemm.alphas.bin | vocab.enit.spm |
| it→en | 3b4be93d19dd9e9e... | model.iten.intgemm.alphas.bin | vocab.iten.spm |
| en→es | bf46f9b51d04f561... | model.enes.intgemm.alphas.bin | vocab.enes.spm |
| es→en | c3e983c8db3f64fa... | model.esen.intgemm.alphas.bin | vocab.esen.spm |
| en→fr | 0a4f388c0449b777... | model.enfr.intgemm.alphas.bin | vocab.enfr.spm |
| fr→en | 7a9b38b0c4637b2e... | model.fren.intgemm.alphas.bin | (see registry) |
| en→de | (see Bergamot section in registry) | model.ende.intgemm.alphas.bin | vocab.ende.spm |
| en→ru | 404279d9716f3191... | model.enru.intgemm.alphas.bin | vocab.enru.spm |
| ru→en | dad7f99c8d8c1723... | model.ruen.intgemm.alphas.bin | vocab.ruen.spm |
| en→zh | 15d484200acea8b1... | model.enzh.intgemm.alphas.bin | srcvocab.enzh.spm, trgvocab.enzh.spm |
| zh→en | 17eb4c3fcd23ac3c... | model.zhen.intgemm.alphas.bin | vocab.zhen.spm |
| en→ja | ac0b883d176ea3b1... | model.enja.intgemm.alphas.bin | srcvocab.enja.spm, trgvocab.enja.spm |
| ja→en | 85012ed3c3ff5c2b... | model.jaen.intgemm.alphas.bin | vocab.jaen.spm |

Key Parameters:

| Parameter | Description | Example |
| --- | --- | --- |
| srcLang | Source language (ISO 639-1) | 'en', 'es', 'de' |
| dstLang | Target language (ISO 639-1) | 'it', 'fr', 'de' |
| modelName | Model weights file | 'model.enit.intgemm.alphas.bin' |
| srcVocabName | Required in config: source vocab file | 'vocab.enit.spm' or 'srcvocab.enja.spm' |
| dstVocabName | Required in config: target vocab file | 'vocab.enit.spm' or 'trgvocab.enja.spm' |
| modelType | Required in config: TranslationNmtcpp.ModelTypes.Bergamot | - |

Bergamot model file naming convention:

  • model.{srctgt}.intgemm.alphas.bin - Model weights (e.g., model.enit.intgemm.alphas.bin)
  • vocab.{srctgt}.spm - Shared vocabulary for most language pairs
  • srcvocab.{srctgt}.spm + trgvocab.{srctgt}.spm - Separate source and target vocabs, used by the English→CJK pairs in the table above (en→zh, en→ja)
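The convention can be turned into a helper that returns the three names a Bergamot setup needs. This is a hypothetical sketch: it assumes every pair follows the pattern above and that only English→Chinese and English→Japanese use separate vocabularies, as in the table. Pairs marked "(see registry)" should be checked against the registry instead.

```javascript
// Hypothetical helper deriving Bergamot file names from the naming
// convention. Separate vocabs are used when the target is zh or ja,
// matching the en→zh and en→ja rows of the table.
const SEPARATE_VOCAB_TARGETS = new Set(['zh', 'ja'])

function bergamotFileNames (srcLang, dstLang) {
  const pair = `${srcLang}${dstLang}` // e.g. 'enit'
  const modelName = `model.${pair}.intgemm.alphas.bin`
  if (SEPARATE_VOCAB_TARGETS.has(dstLang)) {
    return {
      modelName,
      srcVocabName: `srcvocab.${pair}.spm`,
      dstVocabName: `trgvocab.${pair}.spm`
    }
  }
  const shared = `vocab.${pair}.spm` // one vocab for both sides
  return { modelName, srcVocabName: shared, dstVocabName: shared }
}
```

`modelName` goes into `args`; `srcVocabName` and `dstVocabName` go into `config` together with `modelType: TranslationNmtcpp.ModelTypes.Bergamot`.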

`diskPath` Configuration

Use a unique directory per model to avoid file conflicts when using multiple models:

  • ./models/opus-en-it for OPUS English→Italian
  • ./models/indic-en-hi-200M for IndicTrans English→Hindi
  • ./models/bergamot-en-it for Bergamot English→Italian
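If you configure several models in one process, a quick check can catch two `args` objects that would put different models into the same directory. `findDiskPathConflicts` is hypothetical; it treats `modelName` plus the language pair as a model's identity, which is enough to flag, for example, two OPUS models that both use `model.bin` under `./models`.

```javascript
// Hypothetical check: report diskPath values shared by args objects that
// describe different models (different modelName or language pair).
function findDiskPathConflicts (argsList) {
  const byPath = new Map() // diskPath -> Set of model identities
  for (const args of argsList) {
    const id = `${args.modelName}:${args.params.srcLang}-${args.params.dstLang}`
    const ids = byPath.get(args.diskPath) || new Set()
    ids.add(id)
    byPath.set(args.diskPath, ids)
  }
  const conflicts = []
  for (const [diskPath, ids] of byPath) {
    if (ids.size > 1) conflicts.push({ diskPath, models: [...ids] })
  }
  return conflicts
}
```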

The set of languages supported for srcLang and dstLang differs by model type; see Supported Languages.

3. Create the config object

The config object contains two types of parameters:

  1. Model-specific parameters (required for some backends)
  2. Generation/decoding parameters (optional, controls output quality)

Model-Specific Parameters

| Parameter | OPUS/Marian | IndicTrans2 | Bergamot |
| --- | --- | --- | --- |
| modelType | Not needed (default) | Required | Required |
| srcVocabName | Not needed | Not needed | Required |
| dstVocabName | Not needed | Not needed | Required |
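The table can be enforced before constructing the model, so a missing setting is reported before anything is downloaded or loaded. `checkModelConfig` is a hypothetical helper; it compares against the string values of `TranslationNmtcpp.ModelTypes` and assumes that language codes shaped like `hin_Deva` always mean IndicTrans2.

```javascript
// Hypothetical pre-flight check mirroring the table above.
// The strings match the values of TranslationNmtcpp.ModelTypes.
const MODEL_TYPES = ['Opus', 'IndicTrans', 'Bergamot']
// IndicTrans2 codes pair a language with a script, e.g. 'hin_Deva'.
const INDIC_CODE = /^[a-z]{3}_[A-Z][a-z]{3}$/

function checkModelConfig (args, config) {
  const modelType = config.modelType || 'Opus' // Opus is the default
  const errors = []
  if (!MODEL_TYPES.includes(modelType)) {
    errors.push(`Unknown modelType: ${modelType}`)
  }
  // IndicTrans2 needs modelType set explicitly; otherwise Opus is assumed.
  const { srcLang, dstLang } = args.params
  if ((INDIC_CODE.test(srcLang) || INDIC_CODE.test(dstLang)) && modelType !== 'IndicTrans') {
    errors.push('IndicTrans2 language codes require modelType IndicTrans')
  }
  if (modelType === 'Bergamot') {
    for (const key of ['srcVocabName', 'dstVocabName']) {
      if (!config[key]) errors.push(`${key} is required for Bergamot`)
    }
  }
  return errors // empty array means no problems found
}
```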

Generation/Decoding Parameters (OPUS/IndicTrans Only)

These parameters control how the model generates output. Note: Full parameter support is only available for OPUS/Marian and IndicTrans2 models. Bergamot has limited parameter support.

// Generation parameters for OPUS/Marian and IndicTrans2
const generationParams = {
  beamsize: 4,            // Beam search width (>=1). 1 disables beam search
  lengthpenalty: 0.6,     // Length normalization strength (>=0)
  maxlength: 128,         // Maximum generated tokens (>0)
  repetitionpenalty: 1.2, // Penalize previously generated tokens (0..2)
  norepeatngramsize: 2,   // Disallow repeating n-grams of this size (0..10)
  temperature: 0.8,       // Sampling temperature [0..2]
  topk: 40,               // Keep top-K logits [0..vocab_size]
  topp: 0.9               // Nucleus sampling threshold (0 < p <= 1)
}
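The ranges in the comments above can be checked in JavaScript before the values reach the native code. The sketch below is hypothetical and copies only those documented ranges; it assumes `beamsize`, `maxlength`, `norepeatngramsize`, and `topk` are integers, skips the `vocab_size` upper bound for `topk` (it depends on the model), and ignores keys it does not know, such as `modelType`. The native side may validate differently.

```javascript
// Hypothetical client-side check using the ranges from the comments above.
const RULES = {
  beamsize: (v) => Number.isInteger(v) && v >= 1,
  lengthpenalty: (v) => v >= 0,
  maxlength: (v) => Number.isInteger(v) && v > 0,
  repetitionpenalty: (v) => v >= 0 && v <= 2,
  norepeatngramsize: (v) => Number.isInteger(v) && v >= 0 && v <= 10,
  temperature: (v) => v >= 0 && v <= 2,
  topk: (v) => Number.isInteger(v) && v >= 0, // vocab_size bound not checked
  topp: (v) => v > 0 && v <= 1
}

function validateGenerationParams (params) {
  const errors = []
  for (const [name, value] of Object.entries(params)) {
    // Skip keys that are not generation parameters (e.g. modelType).
    if (!Object.prototype.hasOwnProperty.call(RULES, name)) continue
    if (typeof value !== 'number' || !RULES[name](value)) {
      errors.push(`${name}=${value} is out of range`)
    }
  }
  return errors
}
```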

4. Create Model Instance

Import TranslationNmtcpp and create an instance by combining args (from Step 2) with config parameters (from Step 3):

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')

OPUS/Marian (Default)

// OPUS - combine generation parameters (modelType defaults to Opus)
const config = {
  ...generationParams,  // Spread generation params from Step 3
  beamsize: 4,          // Or override specific values
  maxlength: 128
}

const model = new TranslationNmtcpp(args, config)

IndicTrans2

// IndicTrans - must specify modelType + generation parameters
const config = {
  modelType: TranslationNmtcpp.ModelTypes.IndicTrans,
  ...generationParams,  // Spread generation params from Step 3
  maxlength: 256        // Override for longer outputs
}

const model = new TranslationNmtcpp(args, config)

Bergamot

// Bergamot - must specify modelType, vocab files (limited generation params support)
const config = {
  modelType: TranslationNmtcpp.ModelTypes.Bergamot,
  srcVocabName: 'vocab.enit.spm',    // Required: source vocabulary file
  dstVocabName: 'vocab.enit.spm',    // Required: target vocabulary file
  beamsize: 4                        // Only beamsize supported for Bergamot
}

const model = new TranslationNmtcpp(args, config)
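If you share one `generationParams` object across backends, you can strip it down for Bergamot, which per the comment above supports only `beamsize`. `toBergamotConfig` is a hypothetical helper; it writes the literal `'Bergamot'` (the value of `TranslationNmtcpp.ModelTypes.Bergamot` as listed in this README) only so the sketch runs without the package. In real code, prefer the constant.

```javascript
// Hypothetical helper: builds a Bergamot config from shared generation
// params, keeping only beamsize and adding the required vocab names.
function toBergamotConfig (generationParams, srcVocabName, dstVocabName = srcVocabName) {
  const config = {
    modelType: 'Bergamot', // value of TranslationNmtcpp.ModelTypes.Bergamot
    srcVocabName,
    dstVocabName // defaults to the shared vocab
  }
  if (generationParams.beamsize !== undefined) config.beamsize = generationParams.beamsize
  return config
}
```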

Available Model Types:

// Values exposed by TranslationNmtcpp.ModelTypes (reference only; read them, don't reassign):
// {
//   Opus: 'Opus',             // Default - Marian OPUS models
//   IndicTrans: 'IndicTrans', // Indic language models
//   Bergamot: 'Bergamot'      // Firefox Translations models
// }

5. Load Model

try {
  // Basic usage
  await model.load()
} catch (error) {
  console.error('Failed to load model:', error)
}

6. Run the Model

Call run() to translate the input text. It resolves to a QVACResponse object: onUpdate() receives each new piece of translated text as it streams in, and await() resolves when the translation is complete.

try {
  // Execute translation on input text
  const response = await model.run('Hello world! Welcome to the internet of peers!')

  // Process streamed output using callback
  await response
    .onUpdate(outputChunk => {
      // Handle each new piece of translated text
      console.log(outputChunk)
    })
    .await() // Wait for translation to complete

  // Access performance statistics (if enabled with opts.stats)
  if (response.stats) {
    console.log('Translation completed in:', response.stats.totalTime, 'ms')
  }
} catch (error) {
  console.error('Translation failed:', error)
}
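When you need the whole translation as one string rather than streamed pieces, the chunks can be collected and joined after `await()` resolves. The sketch assumes each `onUpdate` chunk is a new piece of text, as the comments above describe; if your version emits objects or cumulative text, adjust the accumulation. `translateToString` is hypothetical.

```javascript
// Hypothetical helper: runs one translation and returns the streamed
// chunks joined into a single string.
async function translateToString (model, text) {
  const response = await model.run(text)
  const chunks = []
  await response
    .onUpdate((chunk) => { chunks.push(String(chunk)) }) // collect each piece
    .await() // wait until translation completes
  return chunks.join('')
}
```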

7. Batch Translation (Bergamot Only)

For translating multiple texts efficiently, use the runBatch() method instead of calling run() multiple times.

runBatch() is only available with the Bergamot backend. OPUS/Marian and IndicTrans2 models should use sequential run() calls.

// Array of texts to translate (English)
const textsToTranslate = [
  'Hello world!',
  'How are you today?',
  'Machine translation has revolutionized communication.'
]

try {
  // Batch translation - returns array of translated strings
  const translations = await model.runBatch(textsToTranslate)

  // Output each translation
  translations.forEach((translatedText, index) => {
    console.log(`Original: ${textsToTranslate[index]}`)
    console.log(`Translated: ${translatedText}\n`)
  })
} catch (error) {
  console.error('Batch translation failed:', error)
}

runBatch() vs run():

| Method | Input | Output | Backend Support |
| --- | --- | --- | --- |
| run(text) | Single string | QVACResponse with streaming | All (OPUS, IndicTrans, Bergamot) |
| runBatch(texts) | Array of strings | Array of strings | Bergamot only |

runBatch() is generally faster than calling run() once per text, because it processes all texts in a single batch operation.
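Because `runBatch()` is Bergamot-only, code that handles several backends needs a fallback. The hypothetical `translateMany` below calls `runBatch()` for Bergamot and otherwise runs `run()` sequentially, joining each response's streamed chunks (assumed to be text pieces). The `modelType` argument is the same string you pass in `config`.

```javascript
// Hypothetical helper: runBatch() for Bergamot, sequential run() otherwise.
async function translateMany (model, modelType, texts) {
  if (modelType === 'Bergamot') {
    return model.runBatch(texts) // resolves to an array of strings
  }
  const results = []
  for (const text of texts) {
    // Sequential: one run() at a time on the same model instance.
    const response = await model.run(text)
    const chunks = []
    await response.onUpdate((chunk) => { chunks.push(String(chunk)) }).await()
    results.push(chunks.join(''))
  }
  return results
}
```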

8. Unload the Model

// Always unload the model when finished to free memory
try {
  await model.unload()
} catch (error) {
  console.error('Failed to unload model:', error)
}
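The Quickstart wraps each translation in try/finally so the model is unloaded even when translation fails. That pattern can be factored into a helper; `withModel` is a hypothetical sketch that relies only on `load()` and `unload()` as shown in steps 5 and 8.

```javascript
// Hypothetical helper: load a model, run `fn`, and always unload,
// mirroring the try/finally pattern in the Quickstart.
async function withModel (model, fn) {
  await model.load()
  try {
    return await fn(model)
  } finally {
    await model.unload() // runs even if fn throws
  }
}
```

For example, with a Bergamot model: `const translations = await withModel(model, (m) => m.runBatch(texts))`.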

Supported Languages

Marian/OPUS Models (Hyperdrive)

The following language pairs are available via Hyperdrive.

Core European Languages (with cross-language support):

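| Language | Code | Supported Pairs | Hyperdrive |
| --- | --- | --- | --- |
| English | en | ↔ de, es, it, fr, pt, ru, ar, ja, zh | Yes |
| German | de | ↔ en, es, it, fr | Yes |
| Spanish | es | ↔ en, de, it, fr | Yes |
| Italian | it | ↔ en, de, es | Yes |
| French | fr | ↔ en, de, es | Yes |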

Other Languages (English ↔ X):

| Language | Code | Hyperdrive |
| --- | --- | --- |
| Portuguese | pt | Yes |
| Russian | ru | Yes |
| Arabic | ar | Yes |
| Japanese | ja | Yes |
| Chinese | zh | Yes |

Legend: ↔ = bidirectional support available in Hyperdrive.

The OPUS project supports many more language pairs. Only the pairs listed above are currently available via Hyperdrive. Additional models may be added in future updates.
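The two tables above can be encoded as a lookup so an unsupported OPUS pair is caught before any download starts. The hypothetical `OPUS_PAIRS` map below contains exactly the pairs listed above and treats each as bidirectional, as the ↔ in the table indicates; it says nothing about the wider OPUS catalog.

```javascript
// Hypothetical lookup built from the Hyperdrive tables above.
const OPUS_PAIRS = {
  en: ['de', 'es', 'it', 'fr', 'pt', 'ru', 'ar', 'ja', 'zh'],
  de: ['en', 'es', 'it', 'fr'],
  es: ['en', 'de', 'it', 'fr'],
  it: ['en', 'de', 'es'],
  fr: ['en', 'de', 'es']
}

// Each listed pair is bidirectional, so check both orientations.
function isOpusPairAvailable (srcLang, dstLang) {
  const has = (a, b) =>
    Object.prototype.hasOwnProperty.call(OPUS_PAIRS, a) && OPUS_PAIRS[a].includes(b)
  return has(srcLang, dstLang) || has(dstLang, srcLang)
}
```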

IndicTrans2 Models (Hyperdrive)

IndicTrans2 supports translation between English and 22 Indic languages. The following directions are available via Hyperdrive:

| Direction | Hyperdrive Keys | Sizes |
| --- | --- | --- |
| English → Indic | Yes | 200M, 1B |
| Indic → English | Yes | 200M, 1B |
| Indic → Indic | Yes | 320M, 1B |
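The direction follows from which side is English, and the sizes table then limits the choices. The hypothetical `indicTransDirection` below infers `en-indic`, `indic-en`, or `indic-indic` from the language codes and rejects sizes not listed for that direction; it assumes the table's 200M and 320M correspond to the `dist-200M` and `dist-320M` names in the model naming pattern.

```javascript
// Hypothetical helper: infer the IndicTrans2 direction and check the size
// against the Hyperdrive table above.
const SIZES_BY_DIRECTION = {
  'en-indic': ['dist-200M', '1B'],
  'indic-en': ['dist-200M', '1B'],
  'indic-indic': ['dist-320M', '1B']
}
// Language (ISO 639-3) + script (ISO 15924), e.g. 'hin_Deva'.
const LANG_CODE = /^[a-z]{3}_[A-Z][a-z]{3}$/

function indicTransDirection (srcLang, dstLang, size) {
  for (const code of [srcLang, dstLang]) {
    if (!LANG_CODE.test(code)) throw new Error(`Expected a code like 'hin_Deva', got '${code}'`)
  }
  let direction
  if (srcLang === 'eng_Latn' && dstLang === 'eng_Latn') {
    throw new Error('Source and target are both English')
  } else if (srcLang === 'eng_Latn') {
    direction = 'en-indic'
  } else if (dstLang === 'eng_Latn') {
    direction = 'indic-en'
  } else {
    direction = 'indic-indic'
  }
  if (!SIZES_BY_DIRECTION[direction].includes(size)) {
    throw new Error(`${direction} is not listed with size ${size} on Hyperdrive`)
  }
  return direction
}
```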

Supported Indic Languages:

| Language (code) | Language (code) | Language (code) |
| --- | --- | --- |
| Assamese (asm_Beng) | Kashmiri (Arabic) (kas_Arab) | Punjabi (pan_Guru) |
| Bengali (ben_Beng) | Kashmiri (Devanagari) (kas_Deva) | Sanskrit (san_Deva) |
| Bodo (brx_Deva) | Maithili (mai_Deva) | Santali (sat_Olck) |
| Dogri (doi_Deva) | Malayalam (mal_Mlym) | Sindhi (Arabic) (snd_Arab) |
| English (eng_Latn) | Marathi (mar_Deva) | Sindhi (Devanagari) (snd_Deva) |
| Konkani (gom_Deva) | Manipuri (Bengali) (mni_Beng) | Tamil (tam_Taml) |
| Gujarati (guj_Gujr) | Manipuri (Meitei) (mni_Mtei) | Telugu (tel_Telu) |
| Hindi (hin_Deva) | Nepali (npi_Deva) | Urdu (urd_Arab) |
| Kannada (kan_Knda) | Odia (ory_Orya) | |

Bergamot Models (Firefox Translations)

Language pairs available via Hyperdrive:

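| Language | Code | en→X | X→en |
| --- | --- | --- | --- |
| Arabic | ar | Yes | Yes |
| Czech | cs | Yes | Yes |
| Spanish | es | Yes | Yes |
| French | fr | Yes | Yes |
| Italian | it | Yes | Yes |
| Japanese | ja | Yes | Yes |
| Portuguese | pt | Yes | Yes |
| Russian | ru | Yes | Yes |
| Chinese | zh | Yes | Yes |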

The Bergamot backend supports all language pairs available in Firefox Translations. See the Firefox Translations models repository for the complete and up-to-date list of supported language pairs. Download Firefox Translations models locally only if your language pair is not available via Hyperdrive.

ModelClasses and Packages

ModelClass

The main class exported by this library is TranslationNmtcpp, which supports multiple translation backends:

const TranslationNmtcpp = require('@qvac/translation-nmtcpp')

// Available model types (values of TranslationNmtcpp.ModelTypes; reference only, don't reassign):
// {
//   IndicTrans: 'IndicTrans',  // For Indic language translations
//   Opus: 'Opus',              // For Marian OPUS models
//   Bergamot: 'Bergamot'       // For Bergamot/Firefox translations
// }

Available Packages

Main Package

| Package | Description | Backends | Languages |
| --- | --- | --- | --- |
| @qvac/translation-nmtcpp | Main translation package | OPUS, Bergamot, IndicTrans | See Supported Languages |

The main package supports all three backends and all their respective languages. See Supported Languages for the complete list.

Logging

The library supports configurable logging for both JavaScript and C++ (native) components. By default, C++ logs are suppressed for cleaner output.

Enabling C++ Logs

To enable verbose C++ logging, pass a logger object in the args parameter:

// Enable C++ logging
const logger = {
  info: (msg) => console.log('[C++ INFO]', msg),
  warn: (msg) => console.warn('[C++ WARN]', msg),
  error: (msg) => console.error('[C++ ERROR]', msg),
  debug: (msg) => console.log('[C++ DEBUG]', msg)
}

const args = {
  loader: hdDL,
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  diskPath: './models/opus-en-it',
  modelName: 'model.bin',
  logger  // Pass logger to enable C++ logs
}

Disabling C++ Logs

To suppress all C++ logs, either omit the logger parameter or set it to null:

const args = {
  loader: hdDL,
  params: { mode: 'full', srcLang: 'en', dstLang: 'it' },
  diskPath: './models/opus-en-it',
  modelName: 'model.bin'
  // No logger = suppress C++ logs
}

All examples support the VERBOSE environment variable:

# Run with C++ logging disabled (default)
bare examples/example.hd.js

# Run with C++ logging enabled
VERBOSE=1 bare examples/example.hd.js

Log Levels

The C++ backend supports these log levels (mapped from native priority):

| Priority | Level | Description |
| --- | --- | --- |
| 0 | error | Critical errors |
| 1 | warn | Warnings |
| 2 | info | Informational messages |
| 3 | debug | Debug/trace messages |
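Because the logger is a plain object with `error`, `warn`, `info`, and `debug` functions, you can filter native logs by the priorities above in JavaScript. `createFilteredLogger` is a hypothetical sketch that assumes the native side calls these methods by level name, as the logger examples suggest.

```javascript
// Hypothetical logger factory: keeps messages whose priority is at or
// below maxLevel, using the table above (0 = error ... 3 = debug).
const PRIORITY = { error: 0, warn: 1, info: 2, debug: 3 }
// Console method per level, matching the logger objects shown earlier.
const SINK_METHOD = { error: 'error', warn: 'warn', info: 'log', debug: 'log' }

function createFilteredLogger (maxLevel = 'warn', sink = console) {
  if (!Object.prototype.hasOwnProperty.call(PRIORITY, maxLevel)) {
    throw new Error(`Unknown log level: ${maxLevel}`)
  }
  const logger = {}
  for (const [level, priority] of Object.entries(PRIORITY)) {
    const tag = `[C++ ${level.toUpperCase()}]`
    logger[level] = priority <= PRIORITY[maxLevel]
      ? (msg) => sink[SINK_METHOD[level]](tag, msg)
      : () => {} // drop messages more verbose than maxLevel
  }
  return logger
}
```

Pass it as `logger: createFilteredLogger('warn')` in `args` to see only errors and warnings.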

More resources

Package at npm
