Mean-shift and sparse sampling based SMC-PHD filtering for audio informed visual speaker tracking

  • Volkan Kilic
  • Mark Barnard
  • Wenwu Wang
  • Adrian Hilton
  • Josef Kittler

    Research output: Contribution to journal › Article › peer-review

    Abstract

    The probability hypothesis density (PHD) filter based on sequential Monte Carlo (SMC) approximation (also known as the SMC-PHD filter) has proven to be a promising algorithm for multi-speaker tracking. However, it has a heavy computational cost, as surviving, spawned and born particles need to be distributed in each frame to model the state of the speakers and to jointly estimate the variable number of speakers and their states. In particular, the computational cost is mostly caused by the born particles, as they need to be propagated over the entire image in every frame to detect the presence of a new speaker in the view of the visual tracker. In this paper, we propose to use audio data to improve the visual SMC-PHD (V-SMC-PHD) filter by using the direction of arrival (DOA) angles of the audio sources to determine when to propagate the born particles and to re-allocate the surviving and spawned particles. The tracking accuracy of the resulting AV-SMC-PHD algorithm is further improved by using a modified mean-shift algorithm to search and climb density gradients iteratively to find the peak of the probability distribution, and the extra computational complexity introduced by mean-shift is controlled with a sparse sampling technique. These improved algorithms, named AVMS-SMC-PHD and sparse-AVMS-SMC-PHD respectively, are compared systematically with AV-SMC-PHD and V-SMC-PHD on the AV16.3, AMI and CLEAR datasets.
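    The abstract describes mean-shift as iteratively climbing density gradients to find the peak of a probability distribution. The sketch below illustrates that general idea on a weighted particle set. It is a generic textbook mean-shift with a Gaussian kernel, not the modified mean-shift or the sparse sampling scheme of the paper; the function name `mean_shift_mode` and the parameters `bandwidth`, `max_iter` and `tol` are assumptions made for illustration.

```python
import numpy as np


def mean_shift_mode(start, particles, weights, bandwidth=1.0, max_iter=50, tol=1e-6):
    """Move `start` uphill on a weighted Gaussian kernel density estimate.

    Generic mean-shift sketch (hypothetical helper, not the paper's variant).
    particles: (N, D) array of particle positions.
    weights:   (N,) array of non-negative particle weights.
    """
    x = np.asarray(start, dtype=float)
    particles = np.asarray(particles, dtype=float)
    weights = np.asarray(weights, dtype=float)

    for _ in range(max_iter):
        # Gaussian kernel response of every particle relative to the current estimate.
        sq_dist = np.sum((particles - x) ** 2, axis=1)
        kernel = weights * np.exp(-0.5 * sq_dist / bandwidth ** 2)
        total = kernel.sum()
        if total == 0.0:
            # No particle contributes; there is no gradient to follow.
            break
        # The kernel-weighted mean is the mean-shift update.
        new_x = kernel @ particles / total
        shift = np.linalg.norm(new_x - x)
        x = new_x
        if shift < tol:
            break
    return x
```

    Each iteration replaces the current estimate with the kernel-weighted mean of the particles, which moves it toward a local density peak. Evaluating the kernel over only a subset of particles would lower the per-iteration cost, at the price of a coarser density estimate.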
    Original language: English
    Pages (from-to): 2417-2431
    Journal: IEEE Transactions on Multimedia
    Volume: 18
    Issue number: 12
    Early online date: 10 Aug 2016
    Publication status: Published - 31 Dec 2016

    Bibliographical note

    Note: This work was supported by the Engineering and Physical Sciences Research Council [grant numbers: EP/K014307/1 and EP/L000539/1].

    Keywords

    • Computer science and informatics
