Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?
This article presents an overview of the technological components used in audio description and suggests a new scenario in which speech recognition, machine translation, and text-to-speech, with the corresponding human revision, could be used to increase audio description provision. It focuses on a process that combines speaker diarization and speech recognition to obtain a semi-automatic transcription of the audio description track. The technical process is presented and the experimental results are summarized.
Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution 4.0 International License (CC BY), which permits sharing of the work with acknowledgement of authorship and of initial publication in this journal.
Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the version of the work published in this journal (e.g., posting it in an institutional repository or publishing it as a book chapter), with acknowledgement of authorship and of initial publication in this journal.