Latency remains one of the most significant factors (1) in the audience's perception of quality in live-originated TV captions for the Deaf and Hard of Hearing. Even once all prepared script material has been shared between the programme production team and the captioners, pre-recorded video content remains a significant challenge, particularly `packages' transmitted as part of a news broadcast. These video clips are usually delivered just before, or even during, their intended programme, leaving little opportunity for thorough preparation. This paper presents an automated solution based on recent advances in Automatic Speech Recognition research, the benefits of context-tuned models, and the practical application of Machine Learning to large corpora of data, namely many hours of accurately captioned broadcast news programmes. We explore the challenges of facilitating collaboration between academic partners, broadcasters and technology suppliers; the technical approaches used to create the recognition and punctuation models; the testing and refinement required to transform raw automated transcription into broadcast captions; and methodologies for introducing the technology into a live production environment.