From voice to ink (Vink): development and assessment of an automated, free-of-charge transcription tool

Tolle, Hannah; Castro, Maria del Mar; Wachinger, Jonas; Putri, Agrin Zauyani; Kempf, Dominic; Denkinger, Claudia M.; McMahon, Shannon A.

doi:10.1186/s13104-024-06749-0

BMC Research Notes

Table 1 Word error rate and time-needed-to-correct of Vink-generated transcripts

Language	Audio length (mm:ss)	Number of speakers	Sex	Background noise¹	Time-needed-to-correct (minutes or mm:ss)	Total words	WER	Substitutions	Insertions	Deletions
American English	06:50	2	F, M	Medium	17	854	6.6%	7	50	0
Arabic (Classical Arabic)	03:06	1	F	Low	27.5	363	15.2%	7	20	28
Bahasa Indonesia	05:12	2	F, F	Medium	10	465	7.95%	10	22	5
Burmese	05:05	3	M, M, F	High	Transcript is nonsensical
Chinese	05:01	1	F	Low	12	950	0.95%	8	1	0
Filipino	05:00	2	F, GNB²	Medium	19	1343	7.80%	56	5	45
French	04:09	2	F, M	Medium	19:57	611	24%	15	12	122
German	05:00	2	F, F	Low	9:40	676	4.28%	9	2	18
Malagasy	04:41	2	F, M	Medium	62	351	41%	134	12	5
Portuguese Brazilian	02:19	2	F, M	Medium	4	209	1.4%	2	1	0
Spanish Colombian	06:31	2	F, F	Low	36:46	1111	14.5%	34	21	107
Tamil	04:32	1	M	Low	72	221	79.8%	45	103	54
Turkish	03:19	1	F	Low	8	232	4.3%	3	1	6
Yoruba	05:56	2	F, M	Medium	20	528	46%	164	36	45

¹Background noise levels were classified ‘low’ in case of close to no background noise, ‘medium’ in case of occasional or faint background noises and ‘high’ if background noises notably impaired understandability of speakers.
²GNB: gender non-binary
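
The WER column can be related to the error counts with the standard word error rate definition, WER = (S + I + D) / N, where S, I and D are substitutions, insertions and deletions and N is the number of words in the reference transcript. The sketch below is illustrative only: it assumes that the "Total words" column is the reference word count N, and the function name `word_error_rate` is hypothetical, not part of Vink.

```python
def word_error_rate(substitutions: int, insertions: int,
                    deletions: int, total_words: int) -> float:
    """Return WER as a fraction: (S + I + D) / N.

    Assumes total_words is the reference word count (an assumption
    about how the table's "Total words" column is defined).
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    return (substitutions + insertions + deletions) / total_words


# Counts copied from the American English and Chinese rows of Table 1:
# (substitutions, insertions, deletions, total words)
rows = {
    "American English": (7, 50, 0, 854),
    "Chinese": (8, 1, 0, 950),
}

for language, (s, i, d, n) in rows.items():
    # Format the fraction as a percentage with two decimals.
    print(f"{language}: {word_error_rate(s, i, d, n):.2%}")
```

Under that assumption, the two rows come out at 6.67% and 0.95%, in line with the 6.6% and 0.95% reported in the table.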

ISSN: 1756-0500
