A Brief Analysis of SLAVC method for Sound Source Localization

Xavier Juanola; Gloria Haro

doi:10.5201/ipol.2024.525

published: 2024-05-29
reference: Xavier Juanola and Gloria Haro, A Brief Analysis of SLAVC method for Sound Source Localization, Image Processing On Line, 14 (2024), pp. 159–172. https://doi.org/10.5201/ipol.2024.525

Communicated by Quentin Bammey
Demo edited by Aknine Billel and Xavier Juanola

Abstract

In 2022, Mo and Morgado introduced SLAVC, a novel self-supervised learning approach for Visual Sound Source Localization [Mo, S. and Morgado, P., A Closer Look at Weakly-Supervised Audio-Visual Source Localization, Advances in Neural Information Processing Systems, 2022]. The method is based on multiple-instance contrastive learning. In addition to improving on the results of previous methods, it solves two critical problems that those methods faced: 1) excessive overfitting despite training on extensive datasets, and 2) a tendency to hallucinate sound sources even when the video contains no visual evidence to support them. In this paper, we briefly present the method, offer an online executable version that allows users to test it on their own image-audio pairs, and propose some improvements that could benefit the model as future work.

This is an MLBriefs article; the source code has not been reviewed!
The original source code is available here (last checked 2024/05/26).

Download

full text manuscript: PDF (3.1MB)
source code: ZIP
