SIGGRAPH Asia 2017

Fully Perceptual-Based 3D Spatial Sound Individualization
with an Adaptive Variational AutoEncoder


Kazuhiko Yamamoto, Takeo Igarashi

ABSTRACT


To realize 3D spatial sound rendering with a two-channel headphone, one needs head-related transfer functions (HRTFs) tailored to a specific user. However, measuring HRTFs requires a tedious and expensive procedure. To address this, we propose a fully perceptual-based HRTF fitting method for individual users using machine learning techniques. The user only needs to answer pairwise comparisons of test signals presented by the system during calibration. This reduces the effort required for the user to obtain individualized HRTFs. Technically, we present a novel adaptive variational AutoEncoder with a convolutional neural network. During training, this AutoEncoder analyzes publicly available HRTF datasets and identifies the factors that depend on the individuality of users in a nonlinear space. During calibration, the AutoEncoder generates high-quality HRTFs fitted to a specific user by blending these factors. We validate the feasibility of our method through several quantitative experiments and a user study.
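The paper's exact network is not reproduced here; as a rough illustration of the idea, the sketch below shows a convolutional variational AutoEncoder over HRTF magnitude spectra in Python (PyTorch), plus a latent-blending step that stands in for the perceptual calibration. All names (HRTFVAE, blend_hrtf), layer sizes, the latent dimension, and the loss weighting are illustrative assumptions, not the architecture described in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HRTFVAE(nn.Module):
    """Minimal sketch of a convolutional VAE over HRTF magnitude spectra.
    Layer sizes and the latent dimension are illustrative assumptions."""

    def __init__(self, n_freq_bins=128, latent_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=5, stride=2, padding=2),   # conv along frequency
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        flat = 64 * (n_freq_bins // 4)
        self.fc_mu = nn.Linear(flat, latent_dim)       # mean of q(z|x)
        self.fc_logvar = nn.Linear(flat, latent_dim)   # log-variance of q(z|x)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, flat),
            nn.ReLU(),
            nn.Unflatten(1, (64, n_freq_bins // 4)),
            nn.ConvTranspose1d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, 1, kernel_size=4, stride=2, padding=1),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(x, x_hat, mu, logvar):
    # Reconstruction error plus KL divergence to the unit Gaussian prior.
    rec = F.mse_loss(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

def blend_hrtf(model, subject_latents, w):
    """Generate a candidate HRTF by blending the latent codes of training
    subjects with weights w; in the paper the blend is driven by the user's
    pairwise-comparison answers (that update rule is omitted here)."""
    z = (w.unsqueeze(1) * subject_latents).sum(dim=0, keepdim=True)
    return model.decoder(z)

# Usage example on random data standing in for measured HRTFs.
model = HRTFVAE()
x = torch.randn(8, 1, 128)            # batch of 8 magnitude responses
x_hat, mu, logvar = model(x)
loss = vae_loss(x, x_hat, mu, logvar)
```

The blending step is the key to individualization: rather than asking the user for measurements, the system searches over mixing weights in the learned latent space, so each candidate decoded by blend_hrtf is a plausible HRTF by construction.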


VIDEO



DOWNLOADS


  • Paper
  • Video

PUBLICATION


Kazuhiko Yamamoto and Takeo Igarashi. 2017. Fully Perceptual-Based 3D Spatial Sound Individualization with an Adaptive Variational AutoEncoder. ACM Transactions on Graphics 36, 6, pp. 212:1--212:13 (2017). (Proceedings of SIGGRAPH Asia 2017)