Abstract
The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as the SMC-PHD filter) has proven to be a promising algorithm for multi-speaker tracking. However, its computational cost is heavy, because surviving, spawned and born particles need to be distributed in each frame to model the states of the speakers and to jointly estimate the variable number of speakers together with their states. In particular, most of the computational cost is caused by the born particles, as they need to be propagated over the entire image in every frame to detect the presence of new speakers in the view of the visual tracker. In this paper, we propose to use audio data to improve the visual SMC-PHD (V-SMC-PHD) filter, using the direction of arrival (DOA) angles of the audio sources to determine when to propagate the born particles and to re-allocate the surviving and spawned particles.
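As a rough illustration of DOA-gated birth, the Python sketch below draws born particles only around image columns indicated by DOA angles that no existing track already explains, instead of spreading them over the whole image in every frame. It is a sketch under stated assumptions, not the authors' implementation: the linear azimuth-to-column mapping (`doa_to_column`), the camera field of view `fov_deg`, and the parameters `match_px` and `spread_px` are all hypothetical, and the re-allocation of surviving and spawned particles is not shown.

```python
import numpy as np


def doa_to_column(doa_deg, image_width, fov_deg=90.0):
    """Hypothetical linear map from azimuth (degrees) to a pixel column.

    Assumes 0 degrees lies on the camera optical axis and the camera's
    horizontal field of view is fov_deg; angles outside it are clipped.
    """
    frac = (doa_deg + fov_deg / 2.0) / fov_deg
    return np.clip(frac, 0.0, 1.0) * (image_width - 1)


def sample_born_particles(doas_deg, track_columns, image_size, n_per_doa=50,
                          match_px=40.0, spread_px=15.0, rng=None):
    """Draw born particles only near DOAs not explained by existing tracks.

    image_size is (width, height); returns an (N, 2) array of (x, y) positions,
    which is empty when no unexplained DOA is present in this frame.
    """
    rng = np.random.default_rng() if rng is None else rng
    width, height = image_size
    born = []
    for doa in doas_deg:
        col = doa_to_column(doa, width)
        # A DOA close to an existing track is treated as a known speaker,
        # so no born particles are spent on it (hypothetical matching rule).
        if any(abs(col - c) < match_px for c in track_columns):
            continue
        # Spread particles horizontally around the DOA column.
        xs = rng.normal(col, spread_px, n_per_doa).clip(0, width - 1)
        # A single azimuth gives no vertical information, so y is uniform.
        ys = rng.uniform(0, height - 1, n_per_doa)
        born.append(np.column_stack([xs, ys]))
    if not born:
        return np.empty((0, 2))
    return np.vstack(born)
```

When every DOA is matched to an existing track, no born particles are drawn in that frame, which is where the saving over image-wide birth would come from in this sketch.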
The tracking accuracy of the resulting audio-visual SMC-PHD (AV-SMC-PHD) algorithm is further improved by using a modified mean-shift algorithm that iteratively climbs density gradients to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named AVMS-SMC-PHD and sparse-AVMS-SMC-PHD respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD on the AV16.3, AMI and CLEAR datasets.
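The idea of climbing density gradients over a particle set, with subsampling to bound the cost, can be illustrated with a standard weighted mean-shift in Python. This is a minimal sketch, assuming a Gaussian kernel with a fixed `bandwidth` and using uniform random subsampling (`n_sparse`) as a stand-in for sparse sampling; the paper's modified mean-shift and its sparse sampling scheme are not reproduced here, and all function and parameter names are hypothetical.

```python
import numpy as np


def mean_shift_mode(particles, weights, start, bandwidth=10.0,
                    max_iter=50, tol=1e-3, n_sparse=None, rng=None):
    """Climb the weighted kernel density of a particle set from `start`.

    If n_sparse is given, only a random subset of n_sparse particles is used
    (a simple stand-in for sparse sampling, assumed for illustration).
    """
    particles = np.asarray(particles, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if n_sparse is not None and n_sparse < len(particles):
        rng = np.random.default_rng() if rng is None else rng
        idx = rng.choice(len(particles), size=n_sparse, replace=False)
        particles, weights = particles[idx], weights[idx]

    x = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        # Gaussian kernel weight of each particle relative to the current point,
        # scaled by the particle's own weight.
        d2 = np.sum((particles - x) ** 2, axis=1)
        k = weights * np.exp(-0.5 * d2 / bandwidth ** 2)
        if k.sum() == 0.0:
            break  # no particle within reach of the kernel
        # Mean-shift step: move to the kernel-weighted mean of the particles.
        x_new = (k[:, None] * particles).sum(axis=0) / k.sum()
        if np.linalg.norm(x_new - x) < tol:
            return x_new
        x = x_new
    return x
```

Starting from a rough estimate, the iteration settles on the nearest local peak of the weighted particle density rather than the global weighted mean, and each iteration costs time linear in the number of particles used, which is why subsampling reduces the extra cost.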
| Field | Value |
|---|---|
| Original language | English |
| Pages (from-to) | 2417-2431 |
| Journal | IEEE Transactions on Multimedia |
| Volume | 18 |
| Issue number | 12 |
| Early online date | 10 Aug 2016 |
| Publication status | Published, 31 Dec 2016 |
Bibliographical note
This work was supported by the Engineering and Physical Sciences Research Council [grant numbers: EP/K014307/1 and EP/L000539/1].
Keywords
- Computer science and informatics