Ph.D. Thesis @ UNC
August 01, 2008
Clique: Perceptually Based, Task Oriented Auditory Display for GUI Applications
Screen reading is the prevalent approach for presenting graphical desktop applications in audio. The primary function of a screen reader is to describe the widgets the user encounters when interacting with a graphical user interface (GUI). This straightforward method lets people with visual impairments hear exactly what is on the screen, but it introduces significant usability problems in a multitasking environment. Screen reader users must infer the state of ongoing tasks spanning multiple graphical windows from a single, serial stream of speech describing one widget after another.
In this dissertation, I explore a new approach to enabling auditory display of GUI programs. With this method, the display describes concurrent application tasks using a small set of simultaneous speech and sound streams. The user listens to and interacts solely with this display, never with the underlying graphical interfaces. Scripts support this level of adaptation by mapping GUI widgets to task definitions. Evaluation of this approach shows improvements in user efficiency, satisfaction, and understanding with relatively little development effort.
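To make the widget-to-task mapping concrete, the sketch below shows one hypothetical way a script could declare a task over GUI widgets. The class and field names here are illustrative assumptions for this page, not the actual Clique scripting API.

```python
# Hypothetical sketch of a script mapping GUI widgets to a task definition.
# Names and structure are illustrative only, not the real Clique script format.

from dataclasses import dataclass, field
from typing import List

@dataclass
class WidgetRef:
    """Locates a widget in the underlying GUI (e.g., by window and control name)."""
    window: str
    control: str

@dataclass
class Task:
    """A user-level task defined in terms of one or more GUI widgets."""
    name: str
    widgets: List[WidgetRef] = field(default_factory=list)

# A "compose email" task defined over widgets in a mail client's compose window.
compose_email = Task(
    name="Compose a new message",
    widgets=[
        WidgetRef(window="New Message", control="To"),
        WidgetRef(window="New Message", control="Subject"),
        WidgetRef(window="New Message", control="Message body"),
        WidgetRef(window="New Message", control="Send"),
    ],
)

# An auditory display could then present the task as a sequence of steps,
# independent of how the widgets happen to be laid out on screen.
for ref in compose_email.widgets:
    print(f"{compose_email.name}: {ref.control}")
```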
To develop this method, I studied the literature on existing auditory displays, user work behavior, and theories of human auditory perception and processing. I then conducted a user study to observe the problems encountered and techniques employed by users interacting with an ideal auditory display: another human being. Based on my findings, I designed and implemented a prototype auditory display, called Clique, along with scripts adapting seven GUI applications. I concluded my work by conducting a variety of evaluations of Clique. The results of these studies show the following benefits of Clique over the state of the art for users with visual impairments (1-5) and mobile sighted users (6):
- Faster, accurate access to speech utterances through concurrent speech streams.
- Better awareness of peripheral information via concurrent speech and sound streams.
- Increased information bandwidth through concurrent streams.
- More efficient information seeking enabled by ubiquitous tools for browsing and searching.
- Greater accuracy in describing unfamiliar applications learned using a consistent, task-based user interface.
- Faster completion of email tasks in a standard GUI after exposure to those tasks in audio.
Documents
- Parente, Peter. Clique: Perceptually Based, Task Oriented Auditory Display for GUI Applications. Ph.D. Thesis. University of North Carolina at Chapel Hill. July 2008.
- The proposal document introducing Clique as my dissertation topic, approved in December 2004.
Example Movie
The following video gives a sample of the Clique user experience. In the video, a user works to complete a task assigned by email using multiple programs. The speech and sounds heard are all generated by Clique, and all changes in the visual GUIs are performed by Clique as it carries out the user commands. The captions in the video explain what the user is currently doing.
An accessible alternative to the embedded Flash player is also available: download the movie to play it in QuickTime.
Example Sounds
The following sounds are examples of various concepts described in the dissertation document.
| Description | Reference | Audio |
| --- | --- | --- |
| Concatenative speech synthesis | Chapter 2, Section 2.1.1, Page 16 | OGG, MP3 |
| Formant speech synthesis | Chapter 2, Section 2.1.1, Page 17 | OGG, MP3 |
| Auditory icons | Chapter 2, Section 2.1.2, Page 18 | OGG, MP3 |
| Familial earcons | Chapter 2, Section 2.1.3, Page 20 | OGG, MP3 |
| Ambient sound | Chapter 2, Section 2.1.4, Page 22 | OGG, MP3 |
| Audio mixing | Chapter 2, Section 2.1.5, Page 23 | OGG, MP3 |
| HRTF spatialized sound | Chapter 2, Section 2.1.6, Page 24 | External link |
| Screen reading a Web page | Chapter 2, Sections 2.5.2 and 2.5.3, Pages 60-67 | OGG, MP3, Screenshot |
| Ideal display interaction | Chapter 3, Section 3.2.1, Page 91, List item #3 | OGG, MP3 |
| Temporal stream integration | Chapter 4, Section 4.1.2, Pages 133-134 | External link |
| Spectral stream integration | Chapter 4, Section 4.1.2, Pages 133-134 | External link |
| Content assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Summary assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Related assistant in isolation | Chapter 5, Section 5.1.1, Pages 160-165 | OGG, MP3 |
| Environmental sound theme in isolation | Chapter 5, Section 5.1.2, Pages 165-168 | OGG, MP3 |
| Program menu | Chapter 5, Section 5.1.3, Pages 168-169 | OGG, MP3 |
| Task menu | Chapter 5, Section 5.1.3, Pages 168-169 | OGG, MP3 |
| Assistant response to task navigation | Chapter 5, Section 5.1.4, Pages 170-171 | OGG, MP3 |
| Assistant response to content browsing | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |
| Assistant response to content searching | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |
| Assistant response to content editing | Chapter 5, Section 5.1.5, Page 173 | OGG, MP3 |
Source Code
The Clique source code is BSD licensed. I provide it as a reference implementation of a task-based, multichannel auditory display in the hope that developers will revise and extend its core concepts.
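As a minimal sketch of the multichannel idea (independent of the actual Clique source tree), two mono streams can be given different gains and summed into a single output buffer, keeping a foreground speech stream louder than a peripheral sound. The code below is illustrative only and uses synthetic tones as stand-ins for speech and ambient streams.

```python
# Minimal sketch of mixing a foreground stream with a quieter peripheral stream.
# Purely illustrative; not taken from the Clique source code.

import numpy as np

SAMPLE_RATE = 44100  # samples per second

def tone(freq_hz: float, seconds: float) -> np.ndarray:
    """Generate a mono sine tone as a stand-in for a speech or sound stream."""
    t = np.linspace(0, seconds, int(SAMPLE_RATE * seconds), endpoint=False)
    return np.sin(2 * np.pi * freq_hz * t)

def mix(foreground: np.ndarray, peripheral: np.ndarray,
        fg_gain: float = 0.8, bg_gain: float = 0.3) -> np.ndarray:
    """Sum two streams with different gains, padding the shorter one with silence."""
    n = max(len(foreground), len(peripheral))
    out = np.zeros(n)
    out[:len(foreground)] += fg_gain * foreground
    out[:len(peripheral)] += bg_gain * peripheral
    # Normalize only if the summed signal would clip.
    peak = np.max(np.abs(out))
    return out / peak if peak > 1.0 else out

mixed = mix(tone(440, 2.0), tone(220, 3.0))
print(f"Mixed buffer: {len(mixed)} samples, peak {np.max(np.abs(mixed)):.2f}")
```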
The source includes the following sounds licensed under various Creative Commons licenses:
- By man: soldati-marcia.aif
- By jnr hacksaw: Zap.flac
- By csengeri: Cricket2.wav
Acknowledgement
This material is based upon work supported under a National Science Foundation Graduate Research Fellowship. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.