Transcription
Automatic speech recognition (ASR) for speech-to-text — i.e., generate text transcriptions from audio input.
Overview
Transcription uses whisper.cpp as inference engine. Load any supported model using modelType: "whisper". Then, provide audio input as audioChunk, either as a file path (string) or an in-memory audio buffer.
Optionally (recommended), also pass an ancillary voice activity detection (VAD) model (e.g., Silero) to improve speech segmentation.
transcribe() returns the full transcription as a single string. If you need partial results as they become available, use transcribeStream() to receive text chunks in real-time.
Functions
Use the following sequence of function calls:
For how to use each function, see SDK — API reference.
Models
You should load two models:
- a
whisper.cpp-compatible model for transcription. Model file format:*.bin; and - a VAD model (e.g., Silero) converted to GGML. Model file format:
*.bin(optional, recommended).
For models available as constants, see SDK — Models.
Example
The following script shows an example of transcription:
import { loadModel, unloadModel, transcribe, WHISPER_TINY } from "@qvac/sdk";
try {
console.log("🎤 Starting Whisper transcription with prompt example...");
// Load the Whisper model
console.log("📥 Loading Whisper model...");
const modelId = await loadModel({
modelSrc: WHISPER_TINY,
modelType: "whisper",
modelConfig: {
audio_format: "f32le",
// Sampling strategy
strategy: "greedy",
n_threads: 4,
// Transcription options
language: "en",
translate: false,
no_timestamps: false,
single_segment: false,
print_timestamps: true,
token_timestamps: true,
// Quality settings
temperature: 0.0,
suppress_blank: true,
suppress_nst: true,
// Advanced tuning
entropy_thold: 2.4,
logprob_thold: -1.0,
// VAD configuration
vad_params: {
threshold: 0.35,
min_speech_duration_ms: 200,
min_silence_duration_ms: 150,
max_speech_duration_s: 30.0,
speech_pad_ms: 600,
samples_overlap: 0.3,
},
// Context parameters for GPU
contextParams: {
use_gpu: true,
flash_attn: true,
gpu_device: 0,
},
},
onProgress: (progress) => {
console.log(progress);
},
});
console.log(`✅ Whisper model loaded with ID: ${modelId}`);
// Perform transcription
console.log("🎧 Transcribing audio...");
const text = await transcribe({
modelId,
audioChunk: "examples/audio/sample-16khz.wav",
prompt: "This is a test recording with clear speech and proper punctuation.",
});
console.log("📝 Transcription result:");
console.log(text);
// Unload the model when done
console.log("🧹 Unloading Whisper model...");
await unloadModel({ modelId });
console.log("✅ Whisper model unloaded successfully");
process.exit(0);
}
catch (error) {
console.error("❌ Error:", error);
process.exit(1);
}Tip: all examples throughout this documentation are self-contained and runnable. For instructions on how to run them, see SDK quickstart.