Transcription concerns and needs | Characteristics of Whisper | Characteristics of Vink and additional needs |
---|---|---|
Resources, infrastructure, and costs | ||
Transcription services are expensive. | Whisper is offered by OpenAI free of charge. | Vink is a free transcription tool that uses Whisper’s open-source algorithm. |
Transcription software often requires high computing power to operate. | Whisper offers multiple model sizes that require 1–10 GB of RAM, so it can run on average computers, depending on the model size. | Vink retains this feature from Whisper, allowing users to select a model size that suits their needs and their computer’s characteristics. |
Safety and privacy | ||
Uploading data for transcription or outsourcing transcripts to a third party raises confidentiality and data protection issues. | Whisper runs locally, which eliminates the need to share or upload data. | Vink is designed to operate locally without uploading data. |
Quality of transcription | ||
Transcription software is often unavailable for non-Western or less dominant languages. | Whisper uses the same speech models for all languages, which technically makes it usable for everyone, yet differences in performance persist. Audio files with mixed languages can be transcribed. | Accuracy of transcription varies across languages (Table 1). |
Conventional transcription software often requires training on a user’s voice or on exemplary audio data. | The Whisper algorithm has already been trained on a large amount of data and is ready for use. | The ‘ready to use’ feature limits the possibilities to adapt the algorithm to individual requirements. |
Conventional transcription software often struggles with accents, mixed use of languages and background noise. | Whisper provides improved robustness to accents, background noise and technical language. | The improved speech recognition comes at the expense of non-verbal expressions (e.g., laughter), which are excluded from the final transcript. |
Identifying speakers (e.g., interviewer, respondent, multiple participants) is an essential but sometimes challenging feature of transcription. | Whisper does not offer speaker recognition. | Vink currently does not include speaker recognition. Depending on the transcription approach, the user may need to add speaker labels manually. |
Other open-source transcription software (Silero, Vosk) only outputs raw lower-case text. Punctuation models can be applied later in the process, but these are not available for all languages. | Whisper generates transcripts with punctuation and capitalization already integrated, regardless of the language. | |
Ease of use | ||
Transcription software should be accessible to researchers without knowledge of software programming. | Whisper requires a programming language (e.g., Python, R), an interpreter and the installation of specific packages within the programming software to operate. | Vink is a downloadable standalone application that includes the necessary packages and tokenizers, reducing the installation requirements and steps. |
| | Whisper does not have a user interface, which limits its use to people with knowledge of programming (e.g., Python). | Our transcription tool includes an intuitive user interface. |
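
To make the rows on model size, local operation and programming requirements more concrete, the following is a minimal sketch of how Whisper might be run locally from Python with a model size matched to the available memory. It is not Vink's implementation. The helper names `choose_model_size` and `transcribe_locally` are hypothetical, and the per-model memory figures are approximate values taken from the Whisper README, used here only as rough guidance. The sketch assumes the `openai-whisper` package and ffmpeg are installed.

```python
# Approximate memory needs per Whisper model size, in GB.
# Assumption: rough figures from the Whisper README, not guarantees.
MODEL_MEMORY_GB = {"tiny": 1, "base": 1, "small": 2, "medium": 5, "large": 10}


def choose_model_size(available_gb: float) -> str:
    """Return the largest model whose approximate memory need fits.

    Hypothetical helper for illustration; it is not part of Whisper or Vink.
    """
    fitting = [name for name, need in MODEL_MEMORY_GB.items() if need <= available_gb]
    if not fitting:
        raise ValueError("Not enough memory for even the smallest model")
    # The dict is ordered from smallest to largest, so the last fit is the largest.
    return fitting[-1]


def transcribe_locally(audio_path: str, available_gb: float) -> str:
    """Transcribe an audio file on the local machine without uploading it.

    Hypothetical wrapper; the import is deferred so that the helper above
    can be used even when openai-whisper is not installed.
    """
    import whisper  # provided by the openai-whisper package

    model = whisper.load_model(choose_model_size(available_gb))
    result = model.transcribe(audio_path)
    return result["text"]
```

A call such as `transcribe_locally("interview.mp3", available_gb=8)` (the file name is a placeholder) would load the `medium` model under these assumed figures. The number of steps involved, including installing Python, the package and ffmpeg, illustrates the barrier that a standalone application with a user interface is meant to remove.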