Speaker diarization and speech recognition in the semi-automatization of audio description: An exploratory study on future possibilities?

Héctor Delgado; Anna Matamala; Javier Serrano

doi:10.5007/2175-7968.2015v35n2p308

Authors

Héctor Delgado Universitat Autònoma de Barcelona
Anna Matamala Universitat Autònoma de Barcelona
Javier Serrano Universitat Autònoma de Barcelona


Abstract

This article presents an overview of the technological components used in the audio description process and proposes a new scenario in which speech recognition, machine translation and speech synthesis, each followed by the corresponding human revision, would be applied to increase the number of available audio descriptions. The article describes a process in which speaker diarization and speech recognition are used to obtain a semi-automatic transcription of the audio description. It presents the technical process in detail, together with a summary of the experimental results.
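The abstract describes the diarization-plus-recognition process only at a high level. As a hedged illustration, and not the authors' implementation, the following sketch assumes that a diarization system has already produced speaker-labelled segments and that the cluster label of the audio describer is known (for instance, chosen by a human reviewer after listening). It keeps only the describer's segments, merges fragments split by short pauses, and sends each span to a speech recognizer to produce a timestamped draft for human revision. `Segment`, `fmt`, `merge_close`, `draft_ad_transcript`, the `max_gap` threshold and the `recognize` callback are all hypothetical names; `recognize` stands in for whatever ASR engine is actually used.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Segment:
    """One diarization output segment: a time span and its speaker cluster."""
    start: float  # seconds
    end: float    # seconds
    speaker: str  # anonymous cluster label, e.g. "spk0" (hypothetical naming)


def fmt(t: float) -> str:
    """Format seconds as mm:ss.ss for a readable script timestamp."""
    minutes, seconds = divmod(t, 60)
    return f"{int(minutes):02d}:{seconds:05.2f}"


def merge_close(segments: List[Segment], max_gap: float) -> List[Segment]:
    """Merge consecutive segments separated by at most max_gap seconds."""
    merged: List[Segment] = []
    for seg in sorted(segments, key=lambda s: s.start):
        if merged and seg.start - merged[-1].end <= max_gap:
            prev = merged[-1]
            merged[-1] = Segment(prev.start, max(prev.end, seg.end), seg.speaker)
        else:
            merged.append(Segment(seg.start, seg.end, seg.speaker))
    return merged


def draft_ad_transcript(
    segments: List[Segment],
    narrator: str,
    recognize: Callable[[float, float], str],
    max_gap: float = 0.5,  # assumed pause threshold, not taken from the article
) -> List[str]:
    """Build a draft, timestamped AD transcript intended for human revision.

    1. keep only segments diarized as the audio describer (`narrator`);
    2. merge describer fragments split by short pauses;
    3. run speech recognition on each resulting span.
    """
    narrator_segs = [s for s in segments if s.speaker == narrator]
    lines = []
    for seg in merge_close(narrator_segs, max_gap):
        text = recognize(seg.start, seg.end)
        lines.append(f"[{fmt(seg.start)}-{fmt(seg.end)}] {text}")
    return lines


if __name__ == "__main__":
    # Toy diarization output: "spk1" is assumed to be the audio describer,
    # "spk0" a character speaking film dialogue.
    diarized = [
        Segment(0.0, 2.0, "spk1"),
        Segment(2.3, 4.0, "spk1"),
        Segment(4.5, 7.0, "spk0"),
        Segment(10.0, 12.5, "spk1"),
    ]
    # Placeholder recognizer; a real system would decode the audio span.
    fake_asr = lambda start, end: f"<text {start:.1f}-{end:.1f}>"
    for line in draft_ad_transcript(diarized, narrator="spk1", recognize=fake_asr):
        print(line)
```

With the toy input, the first two describer segments are merged into one span (0.0 to 4.0 s) because the pause between them is shorter than `max_gap`, the character's line is discarded, and the final describer segment is kept separately. Keeping the recognizer as a plain callback makes it easy to swap in a real ASR engine without changing the segmentation logic.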

Author biographies

Héctor Delgado, Universitat Autònoma de Barcelona

BS in Computer Science Engineering from Universidad de Sevilla, Spain, and MS in Multimedia Technologies from Universitat Autònoma de Barcelona, Spain. PhD candidate at the Department of Telecommunications and Systems Engineering, Universitat Autònoma de Barcelona, Cerdanyola del Vallès, Barcelona, Spain. E-mail: hecdelflo@gmail.com

Anna Matamala, Universitat Autònoma de Barcelona

BA in Translation and Interpreting from Universitat Autònoma de Barcelona, and PhD in Applied Linguistics from Universitat Pompeu Fabra (Barcelona). Tenured senior lecturer at Universitat Autònoma de Barcelona (Spain). E-mail: anna.matamala@uab.cat

Javier Serrano, Universitat Autònoma de Barcelona

BA in Computer Science (Universitat Autònoma de Barcelona) and PhD in Automatic Control (Computer Science Program, UAB). Associate Professor at Universitat Autònoma de Barcelona. E-mail: javier.serrano@uab.cat

ISSN: 2175-7968