Immersive sound, 3D sound, Surround sound… Let’s talk Psychoacoustics
Lately, all these concepts have been gaining popularity in the sound field, and many manufacturers have stepped up their innovations so that audiences can experience ever more realistic immersive sound. But what’s the theory behind it?
Immersive sound, 3D sound, and Surround sound all refer to the same basic idea: manipulating sound so that speakers and headphones recreate real-life sound, making it closer to a 360° experience. These manipulations rely on how our brain interprets what reaches our ears; it is all related to the way we hear.
The way our brain processes the information received by our ears defines the way we hear. In addition to the physical shape of our body and the mechanical characteristics of sound waves, it is the brain that combines this information and creates the perception of sound. The field that studies this is called Psychoacoustics.
Starting with the human body, several studies have described very accurately the physical structure of the ear and the role each part plays in hearing:
The outer ear, which collects sound waves and guides them to the eardrum, making it vibrate
The middle ear, where three small bones (malleus, incus, and stapes) transfer these vibrations from the eardrum to the inner ear
The inner ear, where these vibrations reach the cochlea, inside which thousands of hair cells respond and produce signals that are sent to the brain
Once the brain receives these signals, it processes them according to their properties:
- Pitch characteristics
- Distance-related properties – Interaural level difference
- Time-related properties – Interaural time difference
Here is where psychoacoustics plays a role: it explains, through different phenomena, how we perceive sound depending on the properties of the sound and on the way our brain is wired. There are many concepts and explanations of how our brain behaves when listening to sound:
According to pitch characteristics:
Physically, each region of the cochlea responds best to a particular range of frequencies. The outer “hair cells” act as an amplifier of mechano-electrical transduction: their electromotility gives the ear its selectivity and sensitivity to the frequencies we hear, region by region.
One common condition related to this is tinnitus, often estimated to affect up to about 20% of the population, where despite the absence of an external sound the person hears phantom noises. Common causes include age-related hearing loss, an ear injury, or a problem with the circulatory system.
Ghost (missing) fundamental:
This phenomenon happens when a signal contains the harmonics of a tone but not the fundamental itself. The brain identifies the pattern of the harmonics and tricks us into hearing the fundamental frequency that is not present.
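As a rough illustration of why the brain can infer the absent fundamental, the sketch below builds a tone from harmonics only and estimates its repetition rate by autocorrelation. The sample rate, the 200 Hz example, and the function names are assumptions made for this sketch; autocorrelation is used here only as a simple stand-in for pattern detection, not as a model of the auditory system.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def harmonic_complex(f0, harmonics, duration=0.05):
    """Sum equal-amplitude sine waves at the given multiples of f0."""
    n = int(SAMPLE_RATE * duration)
    return [
        sum(math.sin(2 * math.pi * f0 * h * t / SAMPLE_RATE) for h in harmonics)
        for t in range(n)
    ]


def repetition_rate_hz(signal, min_hz=50, max_hz=1000):
    """Estimate how often the waveform repeats from its strongest autocorrelation lag."""
    best_lag, best_score = None, float("-inf")
    for lag in range(SAMPLE_RATE // max_hz, SAMPLE_RATE // min_hz + 1):
        score = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return SAMPLE_RATE / best_lag


# Harmonics 2 to 5 of 200 Hz: components at 400, 600, 800 and 1000 Hz, none at 200 Hz.
tone = harmonic_complex(200, harmonics=[2, 3, 4, 5])
print(repetition_rate_hz(tone))  # 200.0
```

Even though no 200 Hz component is present, the combined waveform still repeats 200 times per second, which is the pattern the brain is thought to pick up.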
Robinson-Dadson curves:
The ear does not respond equally to all frequencies, and its response depends on level. These equal-loudness curves show that the higher the Sound Pressure Level (SPL), the flatter the response of our ears, i.e., we tend to hear the frequency range more evenly at a higher SPL. At lower SPL, low frequencies (and, to a lesser extent, very high ones) are less present in our hearing than the midrange.
Masking:
For each tonal frequency there is an associated masking threshold with a critical bandwidth: any signal reproduced inside this bandwidth and below that threshold will not be heard. The critical bandwidth is roughly 100 Hz for low-frequency tones (below about 500 Hz) and widens to several hundred hertz as frequency rises, following an approximately logarithmic scale. This means our hearing resolves frequency more finely in the lower range. Audio compression formats such as MP3 use this concept to remove components that are masked, particularly in the higher frequency range.
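To get a feel for how the critical bandwidth grows with frequency, the sketch below uses one published approximation (the Zwicker and Terhardt formula). That formula is not taken from this article, and the helper names and the simple "inside the band" test are assumptions for illustration; real codecs use far more detailed masking models.

```python
def critical_bandwidth_hz(center_hz):
    """Zwicker-Terhardt approximation of the critical bandwidth, in Hz."""
    return 25.0 + 75.0 * (1.0 + 1.4 * (center_hz / 1000.0) ** 2) ** 0.69


def within_same_critical_band(masker_hz, probe_hz):
    """True when the probe tone lies inside the critical band centered on the masker."""
    half_band = critical_bandwidth_hz(masker_hz) / 2.0
    return abs(probe_hz - masker_hz) <= half_band


for f in (100, 500, 1000, 4000):
    print(f, "Hz ->", round(critical_bandwidth_hz(f)), "Hz wide")

print(within_same_critical_band(1000, 1050))  # probe close to the masker
print(within_same_critical_band(1000, 1200))  # probe outside the band
```

A quiet probe tone that falls inside the masker's band is a candidate for being masked; one that falls outside is much more likely to remain audible.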
Inaudible frequencies:
There have been findings showing that even though frequencies above 26 kHz cannot be heard, they can be detected as changes in measured brain activity, causing different responses in individuals such as pleasure, tranquility, and dynamic appreciation (Oohashi, 1991).
According to Interaural time and level difference:
Source Location:
The way our brain deciphers the position of a sound depends largely on high frequencies. Because the wavelength at high frequencies is comparable to or smaller than the dimensions of our head, the head shadows the far ear, and the resulting level difference between the ears tells the brain where the source is relative to our body. Lower frequencies, whose wavelengths bend around the head, also help locate the source through the phase (time) differences between the signals reaching each ear.
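The time difference between the ears can be estimated with a simple geometric model. The sketch below uses Woodworth's spherical-head approximation, which is not mentioned in the article; the head radius and speed of sound are assumed typical values, and the function names are made up for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees Celsius
HEAD_RADIUS = 0.0875    # m, an assumed average head radius


def woodworth_itd(azimuth_deg):
    """Interaural time difference in seconds for a distant source.

    0 degrees is straight ahead, 90 degrees is fully to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))


def wavelength_m(frequency_hz):
    """Wavelength in meters of a tone travelling through air."""
    return SPEED_OF_SOUND / frequency_hz


for angle in (0, 30, 60, 90):
    print(angle, "deg ->", round(woodworth_itd(angle) * 1e6), "microseconds")

# Compare wavelengths with the assumed head diameter (2 * HEAD_RADIUS = 0.175 m).
for f in (200, 2000, 8000):
    print(f, "Hz ->", round(wavelength_m(f), 3), "m")
```

With these assumptions the largest time difference is well under a millisecond, and the wavelength shrinks to roughly the size of the head at around 2 kHz, which is about where level differences start to take over from phase differences.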
Haas effect:
When two sound sources at the same distance from the listener reproduce the same signal at the same SPL (stereo), they produce a ghost image in the middle of the two sources. If one of the sources is delayed or attenuated (for example by 5 ms or by 18 dB), the ghost image moves away from the center, toward the source that is earlier or louder. This is why time differences and level differences between audio signals are key to most professional audio applications.
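In practice, shifting the ghost image comes down to delaying or attenuating one channel. Here is a minimal sketch of that operation on a list of samples; the function name, the choice of the right channel, and the 48 kHz sample rate are assumptions, while the 5 ms and 18 dB figures are the ones quoted above.

```python
def pan_by_delay_and_gain(mono, sample_rate, delay_ms=0.0, gain_db=0.0):
    """Return (left, right) channels; the right one is delayed and scaled.

    Both channels are zero-padded to the same length.
    """
    delay_samples = round(sample_rate * delay_ms / 1000.0)
    gain = 10.0 ** (gain_db / 20.0)  # convert decibels to a linear factor
    left = list(mono) + [0.0] * delay_samples
    right = [0.0] * delay_samples + [gain * s for s in mono]
    return left, right


click = [1.0, 0.5]

# Time difference only: the right channel arrives 5 ms late.
left, right = pan_by_delay_and_gain(click, 48000, delay_ms=5.0)
print(len(left), right.index(1.0))

# Level difference only: the right channel is 18 dB quieter.
left, right = pan_by_delay_and_gain(click, 48000, gain_db=-18.0)
print(round(right[0], 3))
```

Either change should pull the perceived image toward the left channel, which is now the earlier or the louder of the two.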
HRTF and HPTF:
The Head-Related Transfer Function (HRTF) and the Headphone Transfer Function (HPTF) are mathematical functions that describe how sound is altered on its way to our eardrums. The HRTF captures how our head, torso, and ear shapes affect the way we hear: how the distance between our ears and the dimensions, curves, shapes, bone reflections, and bone resonances of our body affect each frequency of a sound wave, defining sound perception for each specific individual. The HPTF plays the same role for the path between a headphone and the ear. Anthropometric measurements are used to define personalized HRTFs and HPTFs.
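In the time domain, an HRTF becomes a pair of head-related impulse responses (HRIRs), one per ear, and applying it to a sound is a convolution. The sketch below shows that step with toy impulse responses that are invented for illustration and are not measured data; the function names are also assumptions.

```python
def convolve(signal, impulse_response):
    """Direct-form convolution of two sample lists."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal with a left-ear and a right-ear impulse response."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (hypothetical): for a source on the listener's left,
# the right ear receives the sound two samples later and quieter.
TOY_HRIR_LEFT = [1.0, 0.3, 0.1]
TOY_HRIR_RIGHT = [0.0, 0.0, 0.5, 0.2]

left, right = render_binaural([1.0], TOY_HRIR_LEFT, TOY_HRIR_RIGHT)
print(left)
print(right)
```

A personalized HRTF would simply swap in impulse responses measured or modeled for a specific listener.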
As mentioned before, all these concepts, grounded in psychoacoustic theory, are being applied in professional sound applications:
- When using headphones, the pitch and spatiality of sound are affected; this is compensated using the HPTF and equalization.
- The dummy head (binaural microphone) records sound with microphones placed in the ears of a model head, so the HRTF is captured acoustically; the same effect can be simulated digitally with an algorithm based on the HRTF
- Ambisonic microphones record 3D sound and its localization cues (ILD, ITD) as 4 encoded signals, which are decoded for reproduction on a speaker array; the sound field is reconstructed using spherical harmonics theory
- Immersive sound uses software to emulate 3D sound objects using time difference, level difference, and HRTF, combined with speaker arrays
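To make the four Ambisonic signals more concrete, here is a small sketch of first-order encoding: a single sample is placed at a given direction and spread over the four channels usually called W, X, Y and Z. It assumes the traditional B-format convention (W scaled by 1/sqrt(2), azimuth measured counterclockwise from the front); other conventions use different scaling, and the function name is made up for this example.

```python
import math


def encode_first_order(sample, azimuth_deg, elevation_deg=0.0):
    """Encode one sample into first-order B-format channels (W, X, Y, Z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    w = sample / math.sqrt(2.0)               # omnidirectional component
    x = sample * math.cos(az) * math.cos(el)  # front-back component
    y = sample * math.sin(az) * math.cos(el)  # left-right component
    z = sample * math.sin(el)                 # up-down component
    return w, x, y, z


print(encode_first_order(1.0, azimuth_deg=0))    # source straight ahead
print(encode_first_order(1.0, azimuth_deg=90))   # source to the left
print(encode_first_order(1.0, azimuth_deg=0, elevation_deg=90))  # source overhead
```

The direction-dependent gains are the first-order spherical harmonics evaluated at the source direction; a decoder combines the four channels again to feed each speaker in the array.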
References
Oohashi, T. (1991). High-frequency sound above the audible range affects brain electric activity and sound perception. AES 91st Convention, New York.
Sunder, K., Tan, E., Gan, W. (2014). Effect of headphone equalization on auditory distance perception. AES 137th Convention, Los Angeles.
Novatech & Adelaide Symphony Orchestra Present Harry Potter and the Prisoner of Azkaban in L-ISA. https://www.youtube.com/watch?v=3VMfMA1i-hY
What is AMBEO Immersive Audio by Sennheiser? https://www.youtube.com/watch?v=uIpVM4-3tV4
Binaural Audio Recording. https://www.youtube.com/watch?v=vGt9DjCnnt0
ASMR 3D Tingles | Zoom H3 VR Mic Test (No Talking). https://www.youtube.com/watch?v=Hrf87AdR3Eg
Lee, G. W., Kim, H. K. (2018). Personalized HRTF modeling based on deep neural network using anthropometric measurements and images of the ear.