Models for spatial audio can provide accurate information about the relation between the sound source and the surrounding environment, including the listener and his/her body, which acts as an additional filter. This information cannot be substituted by any other modality (e.g., visual or tactile). Nevertheless, today's spatial representation of audio tends to be simplistic, with poor interaction capabilities: current multimedia systems focus mostly on graphics processing and integrate simple stereo or multi-channel surround sound.
On a much different level lie binaural rendering approaches (i.e., those based on headphone reproduction). Most current binaural sound rendering techniques rely on Head-Related Transfer Functions (HRTFs), i.e., filters that capture the acoustic effects of the human head and ears and allow simulation of the audio signal at the entrance of the ear canal as a function of the sound source's spatial position. HRTF filters are usually estimated from recordings on mannequins with average anthropometric measures. However, individual anthropometric features play a key role in HRTF shaping: several studies have shown that listening to non-individualized binaural sounds results in localization errors. On the other hand, individual HRTF measurement is both time- and resource-consuming.
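As an illustration of the core rendering operation described above (not part of the proposal itself), the sketch below convolves a mono signal with a left/right pair of Head-Related Impulse Responses (HRIRs, the time-domain counterpart of HRTFs). The HRIR values here are toy stand-ins for measured data.

```python
import numpy as np

def binaural_render(mono, hrir_left, hrir_right):
    """Convolve a mono signal with a left/right HRIR pair to place it
    at the spatial position the pair was measured (or modeled) for."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    return np.stack([left, right])

# Toy, equal-length HRIRs standing in for measured responses:
# the nearer ear receives the sound earlier and louder,
# the farther ear delayed and attenuated (head shadow).
hrir_l = np.array([1.0, 0.5, 0.0, 0.0])
hrir_r = np.array([0.0, 0.6, 0.3, 0.0])

signal = np.array([1.0, -1.0, 0.5])
out = binaural_render(signal, hrir_l, hrir_r)  # shape: (2, len(signal)+len(hrir)-1)
```

In a real system the HRIR pair would be selected per source direction from a measured or synthesized set, which is exactly where individualization matters.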
The analysis and synthesis of 3D audio scenes through headphones also requires the collection of data such as Headphone Transfer Functions (HpTFs), which guide the equalization of different types and models of headphones. As a matter of fact, the transfer function between headphone and eardrum varies heavily from person to person and even with small displacements of the headphone itself. Thus, an inaccurate compensation is likely to introduce spectral colorations that affect both source elevation perception and sound externalization.
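One common way to build such a compensation filter (a standard technique, not necessarily the one the proposal adopts) is regularized inversion of the measured HpTF magnitude: deep notches, which shift with every re-seating of the headphone, are not fully inverted, so the equalizer avoids the sharp peaks that would otherwise cause audible colorations. A minimal magnitude-domain sketch, with an arbitrary regularization constant:

```python
import numpy as np

def regularized_inverse(hptf_mag, beta=0.05):
    """Tikhonov-style regularized magnitude inverse of an HpTF.
    For large |H| this approaches 1/|H|; near notches (|H| -> 0)
    the gain stays bounded by 1/(2*sqrt(beta)) instead of blowing up."""
    return hptf_mag / (hptf_mag**2 + beta)

# Toy HpTF magnitude with a deep notch at one frequency bin:
mag = np.array([1.0, 1.0, 0.05, 1.0, 0.9])
inv = regularized_inverse(mag)
# inv[2] boosts the notch, but far less than the naive 1/0.05 = 20x.
```

The value of `beta` trades compensation accuracy against robustness to headphone displacement; in practice it would be tuned (possibly per frequency band) on measured HpTF variability.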
PADVA’s research program will focus on a family of approaches that overcome the current limitations of headphone-based 3D audio systems, aiming at building personal auditory displays through structural binaural audio models. The idea behind this kind of modeling is that if one isolates the contributions of the listener’s head, pinnae, shoulders, and torso to the incoming signal’s spectrum as separate subcomponents (either recorded or numerically simulated and stored in a database), then one can reconstruct the global transformation from a properly selected combination of all the considered effects.
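The structural idea can be sketched very simply: if each anatomical contribution is represented as a linear filter, the composite response is obtained by cascading (convolving) them. The impulse responses below are toy placeholders, not actual model components.

```python
import numpy as np

def cascade(*components):
    """Combine structural subcomponent impulse responses (head shadow,
    pinna reflections, shoulder/torso echo, ...) into one composite
    response by convolving them in sequence."""
    out = np.array([1.0])  # identity filter (unit impulse)
    for h in components:
        out = np.convolve(out, h)
    return out

# Toy impulse responses for each anatomical contribution:
head = np.array([0.8, 0.2])                 # mild low-pass head shadow
pinna = np.array([1.0, 0.0, -0.5])          # delayed inverted reflection -> spectral notch
shoulder = np.array([1.0, 0.0, 0.0, 0.1])   # faint later torso/shoulder echo

hrir_estimate = cascade(head, pinna, shoulder)
```

The practical appeal is that each component can come from a different source (a parametric formula, a simulation, or a recording), and only the components that depend on the individual need to be personalized.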
In PADVA, a novel framework for HRTF synthesis and HpTF modeling and customization, known as Mixed Structural Modeling (MSM), will be developed; it combines the structural modeling paradigm with other HRTF selection techniques. Customization will be based on individual anthropometric data, used either to fit a mathematical formulation or to select a simulated/recorded component within a set of available responses. It will thus be possible to adapt spatial rendering algorithms to a specific subject just by knowing some of his or her anthropometric quantities (head radius, pinna shape, shoulder width, and so on).
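The selection side of such a framework can be illustrated with a weighted nearest-neighbor match over anthropometric features. The database, the feature set, and the weights below are all hypothetical placeholders for whatever data and criteria the actual framework would use.

```python
import numpy as np

def select_subject(user_feats, db_feats, weights):
    """Return the index of the database subject whose weighted
    anthropometric features are closest (Euclidean) to the user's.
    That subject's recorded/simulated component would then be used."""
    diff = (db_feats - user_feats) * weights
    dists = np.sqrt((diff**2).sum(axis=1))
    return int(np.argmin(dists))

# Hypothetical database: rows = subjects,
# columns = [head radius (cm), pinna height (cm), shoulder width (cm)].
db = np.array([[8.7, 6.2, 44.0],
               [9.1, 6.8, 47.5],
               [8.5, 5.9, 42.0]])
user = np.array([9.0, 6.7, 47.0])
w = np.array([1.0, 2.0, 0.5])  # illustrative: pinna weighted highest

best = select_subject(user, db, w)  # index of best-matching subject
```

In an MSM setting the match need not be global: each structural subcomponent (e.g., the pinna response) could be selected independently from a different best-matching subject.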
The main objective of PADVA’s research program will thus be the definition and experimental validation of one or more completely customizable models for binaural sound presentation, which are still missing in the literature on spatial sound. The starting point of the research program will be a previous work highlighting how, in median-plane HRTFs, the frequency of the spectral notches is related to the shape of the subject’s pinna. This opens the door to a very attractive approach to HRTF selection: extrapolating the most relevant parameters that characterize the HRTF directly from a 2-D (or 3-D) representation of the user’s pinna.
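To make the pinna-to-notch relation concrete, a simple reflection model (a standard idealization, assumed here for illustration) relates each notch frequency to the distance between the ear canal and a reflecting pinna contour:

```python
def notch_frequency(distance_m, c=343.0):
    """Estimate the center frequency of a pinna-related spectral notch.
    Assumes a single reflection off a pinna contour with phase
    inversion: the reflected path is longer by 2*d, and cancellation
    first occurs when 2*d equals one wavelength, i.e. f = c / (2*d)."""
    return c / (2.0 * distance_m)

# A contour point ~1.7 cm from the ear canal entrance (illustrative):
f0 = notch_frequency(0.017)  # first notch around 10 kHz for this distance
```

Under this model, tracing contour distances on a pinna image directly yields candidate notch frequencies, which is what makes parameter extraction from a 2-D or 3-D pinna representation plausible.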
Effective personal auditory displays represent an innovative breakthrough for a plethora of applications, and the structural approach also allows for effective scalability depending on the available computational resources or bandwidth. Scenes with multiple audio-visual objects can be managed by exploiting the parallelism of increasingly ubiquitous GPUs (Graphics Processing Units). Some examples are: multi-channel downmix over headphones, personal cinema, spatial audio rendering in mobile devices, computer-game engines, and individual binaural audio standards for movie and music production.