TY - CONF
T1 - Audio head pose estimation using the direct to reverberant speech ratio
AU - Barnard, Mark
AU - Wang, Wenwu
AU - Kittler, Josef
N1 - Published in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway, NJ: Institute of Electrical and Electronics Engineers, pp. 8056-8060. ISSN (print) 1520-6149; ISSN (online) 2379-190X; ISBN 9781479903566.
PY - 2013/5
Y1 - 2013/5
AB - Head pose is an important cue in many applications, such as speech recognition and face recognition. Most approaches to head pose estimation to date have used visual information to model and recognise a subject's head in different configurations. These approaches have a number of limitations, such as the inability to cope with occlusions, changes in the appearance of the head, and low-resolution images. We present here a novel method for determining coarse head pose orientation purely from audio information, exploiting the direct-to-reverberant speech energy ratio (DRR) within a highly reverberant meeting room environment. Our hypothesis is that a speaker facing towards a microphone will have a higher DRR and a speaker facing away from the microphone will have a lower DRR. This hypothesis is confirmed by experiments conducted on the publicly available AV16.3 database. © 2013 IEEE.
KW - Computer science and informatics
DO - 10.1109/ICASSP.2013.6639234
M3 - Paper
T2 - 38th International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Y2 - 26 May 2013 through 31 May 2013
ER -