Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?


  • Héctor Delgado, Universitat Autònoma de Barcelona
  • Anna Matamala, Universitat Autònoma de Barcelona
  • Javier Serrano, Universitat Autònoma de Barcelona




This article presents an overview of the technological components used in the audio description process and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. The article focuses on a process in which both speaker diarization and speech recognition are used to obtain a semi-automatic transcription of the audio description track. The technical process is presented and experimental results are summarized.

Author Biographies

Héctor Delgado, Universitat Autònoma de Barcelona

BS in Computer Science Engineering from Universidad de Sevilla, Spain, and MS in Multimedia Technologies from Universitat Autònoma de Barcelona, Spain. PhD candidate at the Department of Telecommunications and Systems Engineering at Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain.

Anna Matamala, Universitat Autònoma de Barcelona

BA in Translation and Interpreting from Universitat Autònoma de Barcelona, and PhD in Applied Linguistics from Universitat Pompeu Fabra (Barcelona). Tenured senior lecturer at Universitat Autònoma de Barcelona (Spain).

Javier Serrano, Universitat Autònoma de Barcelona

BA in Computer Science (Universitat Autònoma de Barcelona) and PhD in Automatic Control (Computer Science Program, UAB). Associate Professor at Universitat Autònoma de Barcelona.


How to Cite

Delgado, H., Matamala, A., & Serrano, J. (2015). Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities? Cadernos de Tradução, 35(2), 308–324. https://doi.org/10.5007/2175-7968.2015v35n2p308