Generating Audio-Visual Slideshows from Text Articles Using Word Concreteness
Mackenzie Leake✧
Hijung Valentina Shin✻
Joy O. Kim✻
Maneesh Agrawala✧
✧ Stanford University, ✻ Adobe Research
Abstract: We present a system that automatically transforms text articles into audio-visual slideshows by leveraging the notion of
word concreteness, which measures how strongly a word or
phrase is related to some perceptible concept. In a formative
study, we find that people not only prefer such audio-visual
slideshows but also find the content easier to understand
than text articles or text articles augmented with images.
We use word concreteness to select search terms and find
images relevant to the text. Then, based on the distribution
of concrete words and the grammatical structure of an article,
we time-align selected images with audio narration obtained
through text-to-speech to produce audio-visual slideshows.
In a user evaluation we find that our concreteness-based algorithm
selects images that are highly relevant to the text.
The quality of our slideshows is comparable to slideshows
produced manually using standard video editing tools, and
people strongly prefer our slideshows to those generated using
a simple keyword-search-based approach.
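As an illustration of the search-term selection step described above, the following is a minimal sketch, not the system's actual implementation. It assumes a hypothetical lexicon `CONCRETENESS` that maps words to ratings on a 1 (abstract) to 5 (concrete) scale; the ratings, the threshold of 4.0, and the function name `select_search_terms` are all placeholders chosen for this example.

```python
import re

# Hypothetical concreteness ratings (1 = abstract, 5 = concrete).
# Illustrative placeholder values, not taken from any published lexicon.
CONCRETENESS = {
    "dog": 4.9, "ball": 4.9, "beach": 4.8,
    "played": 3.0, "idea": 1.6, "freedom": 1.5,
}

def select_search_terms(sentence, threshold=4.0):
    """Return the words whose rating meets the threshold, in text order.

    Words missing from the lexicon are treated as non-concrete.
    """
    words = re.findall(r"[a-z']+", sentence.lower())
    return [w for w in words if CONCRETENESS.get(w, 0.0) >= threshold]

terms = select_search_terms("The dog played with a ball on the beach.")
print(terms)  # ['dog', 'ball', 'beach']
```

The selected terms could then be passed to an image search, while sentences with no word above the threshold would yield no query of their own.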
Example:
Fig. 1: We present a method for automatically generating audio-visual slideshows from a text article by identifying representative concrete phrases
from the text and searching for visuals that match these words. Left: Original text article. Right: Visuals selected by our system based on the concrete
words (red) in each sentence. The text is automatically converted to voiceover speech, and each visual is timed to appear when the first concrete
word of its sentence is spoken.
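The timing rule in the caption can be sketched as follows. This is a hedged illustration only: it assumes the text-to-speech step supplies a start time in seconds for every word, and the function name `image_onsets`, the `is_concrete` predicate, and the sample data are hypothetical.

```python
def image_onsets(sentences, is_concrete):
    """Return one image onset time per sentence.

    sentences: list of sentences, each a list of (word, start_seconds) pairs.
    is_concrete: predicate taking a lowercased word (assumed to come from a
                 concreteness lexicon, which is not shown here).
    The onset is the start time of the first concrete word in the sentence,
    or None when the sentence contains no concrete word.
    """
    onsets = []
    for timed_words in sentences:
        onset = next(
            (start for word, start in timed_words if is_concrete(word.lower())),
            None,
        )
        onsets.append(onset)
    return onsets

# Made-up word timings for two narrated sentences.
narration = [
    [("The", 0.0), ("storm", 0.3), ("hit", 0.8), ("the", 1.0), ("harbor", 1.2)],
    [("Nobody", 2.0), ("expected", 2.5), ("it", 3.1)],
]
concrete_words = {"storm", "harbor"}
print(image_onsets(narration, concrete_words.__contains__))  # [0.3, None]
```

How a sentence without any concrete word should be handled (for example, by holding the previous image) is left open here, since the text above does not specify it.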