Keeping it Real – Section 2

This is Section 2 of Becky Pell’s three-section article on using psychoacoustics in IEM mixing and the technology that takes it to the next level.

Acoustic Reflex Threshold

Have you ever noticed how you and the band can take a break from rehearsing, come back half an hour later, and when you put your ears back in everything feels louder? And then how, after a few moments, it settles down and feels normal again? It’s because of a reflex action of the stapedius muscle in the middle ear. When this little muscle contracts, it pulls the stapes or ‘stirrup bone’ slightly away from the oval window of the cochlea, against which it normally vibrates to transmit pressure waves to be converted into nerve impulses. This action, which is a response to sounds of between 70 and 100 dB SPL, effectively creates a compression effect resulting in a 20 dB reduction in what you hear. However, the muscle can’t stay fully contracted for long periods, so after a few seconds the tension drops to around 50% of the maximum. Whilst the initial reaction, at 150 milliseconds, is not fast enough to fully protect the ear against very loud and sudden transient sounds, it helps to reduce hearing fatigue over longer periods.

Interestingly, this reflex also occurs when a person vocalises, which helps to explain why a singer’s in-ear mix of the band might sound loud enough in isolation, but when they start singing they find they need more instrumentation. This happens in conjunction with the fact that they are hearing themselves not only via the mix but also through the bone conduction of their skull. It’s well worth trying to sing along to an IEM mix that you’ve prepared for a singer to experience what this feels like for them, because it’s a very different sensation from simply shouting down the mic to EQ it.

The acoustic reflex threshold also means that transients appear quieter than sustained sounds of the same level, and it’s the thinking behind a compression trick that is often used in studios and film production. When you compress the decay of a short sound such as a drum hit, it fools the brain into thinking the drum hit as a whole is significantly louder and punchier than it is, although the peak level – the transient – has not changed. Personally, I’d advocate caution if you’re going to try this in a monitor mix – the drummer needs to hear what their drums ACTUALLY sound like, and getting things such as drum tuning and mic placement correct at source is vital – but it’s an interesting thing to be aware of.

All in the timing

Our ability to perceive sounds as separate events is not only dependent on there being sufficient difference between them in frequency, but also on timing. This phenomenon is known as the ‘precedence effect’ or the ‘Haas effect.’

These effects describe how, when two identical sounds are presented in quick succession, they are heard as a single sound. This perception occurs when the delay between the two sounds is between 1 and 5 ms for single click sounds, but up to 40 ms for more complex sounds such as piano music. When the lag is longer, the second sound is heard as an echo. A single reflection arriving within 5 to 30 ms can be up to 10 dB louder than the direct sound without being perceived as a distinct event. In 1951, Helmut Haas examined how the perception of speech is affected in the presence of a single reflection. He discovered that a reflection arriving later than 1 ms after the direct sound increases the perceived level and spaciousness (more precisely, the perceived width of the sound source), without being heard as a separate sound. This holds true up to around 20 ms, at which point the sounds become distinguishable.

This can be an interesting experiment to try with a vocal mic and your IEMs. If you split the vocal mic down two channels, and delay one input somewhere between 1 and 20 ms, see what you notice. Then try panning one input hard left and the other hard right, and see how the vocal sounds thicker and creates a sense of width and space. Play with the delay time, and you’ll see that if it’s too short the signal starts to phase; too long and you lose the illusion. This game does make the signal susceptible to comb-filtering if you sum the inputs back to mono, especially at shorter delay times, so be aware of that.
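To put some numbers on that comb-filtering caveat: when a signal is summed with an equal-level copy of itself delayed by a time τ, cancellations fall at odd multiples of 1/(2τ). The following is a minimal Python sketch of that arithmetic, assuming an ideal equal-level mono sum; the function names and the 48 kHz sample rate are illustrative assumptions, not settings from any particular console.

```python
def comb_notches_hz(delay_ms, max_hz=20000.0):
    """Notch frequencies (Hz) when a signal is summed in mono with an
    equal-level copy of itself delayed by delay_ms milliseconds."""
    notches = []
    k = 0
    while True:
        # Cancellation occurs where the delay equals an odd number of half-periods.
        f = (2 * k + 1) * 1000.0 / (2.0 * delay_ms)
        if f > max_hz:
            break
        notches.append(f)
        k += 1
    return notches


def delay_in_samples(delay_ms, sample_rate=48000):
    """Convert a delay in milliseconds to whole samples (sample rate is an assumption)."""
    return round(delay_ms * sample_rate / 1000.0)


short = comb_notches_hz(1.0)    # first notch at 500 Hz, then every 1 kHz
long_ = comb_notches_hz(20.0)   # first notch at 25 Hz, then every 50 Hz
```

With a 1 ms delay the notches are few and widely spaced, right through the vocal range, which is why short delays sound obviously 'phasey' in mono; at 20 ms there are hundreds of closely packed notches.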

Once again, I would advocate extreme caution if you intend to use this in a monitor mix, as ‘tricking’ a singer in this way can backfire! However, it’s a useful principle to be aware of if you have the opportunity to get creative with other sounds, and I use it a lot when adding pre-delay to a reverb – try it for yourself. No pre-delay creates a feeling of immediacy to the effect, but just 5-10 ms creates a slight sense of space. If you’re after a little more breathiness and drama – ‘vampires swirling’ as I once heard it described – try increasing the pre-delay up to 20 ms and feel how it changes.

The Haas effect is also something to be very aware of for IEM mixing when it comes to digital latency. Every time we take a signal out of the console and send it somewhere else in the digital domain, a small time delay known as latency is introduced. Different processing devices introduce different amounts of latency, and obviously the less, the better. The more devices we add, the more the latency stacks up. Whilst a few milliseconds of latency may be totally imperceptible for, say, a guitarist, it’s a different matter when it comes to vocals. A singer will often be able to perceive something as being not quite right, without being able to put their finger on it, because when we vocalise and have that signal returned to our ears, the discrepancy between what we hear at the moment of making the sound, and the moment of it returning, becomes heightened in our awareness. This is something to be vigilant about when putting any digital outboard, such as plug-ins, in a singer’s signal path.
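Because latency simply adds up along the signal path, it can help to keep a running tally. Here is a minimal Python sketch of that bookkeeping; the stage names and millisecond figures are made-up placeholders rather than measurements of any real device, and the 48 kHz sample rate is an assumption.

```python
def samples_to_ms(samples, sample_rate=48000):
    """Convert a latency quoted in samples to milliseconds."""
    return samples * 1000.0 / sample_rate


def total_latency_ms(stage_latencies_ms):
    """Sum the latency of every stage between the mic and the singer's ears."""
    return sum(stage_latencies_ms.values())


# Hypothetical vocal chain: every figure here is a placeholder, not a spec.
vocal_chain = {
    "console in/out conversion": 1.0,
    "external plug-in processing": 1.5,
    "IEM transmitter": 0.5,
}

total = total_latency_ms(vocal_chain)  # 3.0 ms for these placeholder values
```

Swapping in the published or measured figures for your own rig shows quickly how much an extra insert costs on a vocal channel.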

Location Services

The Haas effect also affects where we perceive a sound to be coming from – the supposed location of the source is determined by the sound which arrives first, even though the sounds may be from two different physical locations. This holds true until the second sound is around 15 dB louder than the first, at which point the perception of direction changes.

Sound localisation is a very complex mechanism performed by the human brain. It’s not only dependent on the directional cues received by the ears, but it is also intertwined with the other senses, especially vision and proprioception. Our ability to determine a sound’s location and distance is called binaural hearing, and in addition to all the psychoacoustic effects discussed so far, it is also heavily influenced by the physical shape of our heads, ears, and even torsos. The outer ear or ‘pinna’ functions as a directional sound collector which funnels sound waves into the ear canal. The head and the topography of our face and torso influence how sounds from any position other than a 0° angle are heard, as they create an acoustic ‘shadow.’ Our brains process the differences between the information that our two ears collect, and interpret the results to determine where a sound is coming from, how far away it is, and whether it’s still or moving. At lower frequencies, below about 2 kHz, this is mostly determined by the inter-aural time difference; that is, the discrepancy in time between when the sound reaches each ear. Above 2 kHz, the information gathered comes from the inter-aural level difference; that is, the discrepancy in volume between the sound that each ear hears. This clever evolutionary adaptation is due to the relative lengths of sound waves at different frequencies. For frequencies below 800 Hz, the dimensions of the head are smaller than half the wavelength of the sound waves, so the brain can determine phase delays between the ears unambiguously.

However, for frequencies above 1600 Hz the dimensions of the head are greater than the length of the sound waves, so a determination of direction based on phase alone is not possible at higher frequencies; instead, we rely on the level difference between the two ears. The model that describes these two binaural cues is known as duplex theory, and they play an important role in sound localisation in the horizontal plane.
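The 800 Hz and 1600 Hz figures fall straight out of the wavelength arithmetic. As a rough Python sketch, assuming a speed of sound of 343 m/s and an ear-to-ear spacing of about 21.5 cm (both values are assumptions for illustration, as are the function names and regime labels):

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 °C
EAR_SPACING_M = 0.215       # assumed: typical ear-to-ear distance


def wavelength_m(freq_hz):
    """Wavelength in metres of a sound at freq_hz."""
    return SPEED_OF_SOUND_M_S / freq_hz


def localisation_regime(freq_hz):
    """Rough label for which inter-aural cue the text says applies at freq_hz."""
    lam = wavelength_m(freq_hz)
    if lam / 2.0 > EAR_SPACING_M:
        return "phase (time) difference"
    if lam < EAR_SPACING_M:
        return "level difference"
    return "transition zone"


# Frequencies at which half a wavelength, then a full wavelength, equals the ear spacing.
phase_limit_hz = SPEED_OF_SOUND_M_S / (2.0 * EAR_SPACING_M)  # just under 800 Hz
level_limit_hz = SPEED_OF_SOUND_M_S / EAR_SPACING_M          # just under 1600 Hz
```

With these assumed values the two crossover points land very close to the 800 Hz and 1600 Hz quoted above; a different head size would shift them slightly.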

(As the frequency drops below 80 Hz it becomes difficult or impossible to use either time difference or level difference to determine a sound’s lateral source because the phase difference between the ears becomes too small for a directional evaluation, hence the experience of sub-bass frequencies being omnidirectional.)

Whilst this phenomenon makes it easy to sense which side a sound is coming from, it’s harder to determine direction in the up/down and front/back planes, due to our ears being placed at the same horizontal level as each other. Some types of owl have their ears placed at different heights, to allow for greater efficiency in finding prey when hunting at night, but humans have no such facility. This can result in ‘cones of confusion’, where we are unsure as to the elevation of a sound source because all sounds that lie in the mid-sagittal plane have similar inter-aural differences; however, once again the shapes of our bodies help us out. Imagine a sound source is right in front of you. There is a certain detour the torso reflection takes and hence a certain difference of this torso reflection in relation to the direct sound arriving at both ears. This yields a slight comb filter pattern which will change if you elevate this source. The same is true if this source is now moved behind you; the torso reflection changes and our brains process the information discrepancies to help us locate the source.

Next time: In the third and final section of this series on using psychoacoustics to enhance your monitor mixing, we’ll discover a ground-breaking new technology that takes IEMs to a whole new dimension.

Keeping It Real

Using psychoacoustics in IEM mixing and the technology that takes it to the next level

SECTION 1

All monitor engineers know that there are many soft skills required in our job – building a trusting relationship with bands and artists is vital for them to feel supported so they can forget about monitoring and concentrate on their job of giving a great performance. But what do you know about how the brain and ears work together to create the auditory response, and how can you make use of it in your mixes?

Hearing is not simply a mechanical phenomenon of sound waves travelling into the ear canal and being converted into electrical impulses by the nerve cells of the inner ear; it’s also a perceptual experience. The ears and brain join forces to translate pressure waves into an informative event that tells us where a sound is coming from, how close it is, whether it’s stationary or moving, how much attention to give to it and whether to be alarmed or relaxed in response. Whilst additional elements of cognitive psychology are also at play – an individual’s personal expectations, prejudices and predispositions, which we cannot compensate for – monitor engineers can certainly make use of psychoacoustics to enhance our mixing chops. Over the space of my next three posts, we’ll look at the different phenomena which are relevant to what we do, and how to make use of them for better monitor mixes.

What A Feeling

Music is unusual in that it activates all areas of the brain. Our motor responses are stimulated when we hear a compelling rhythm and we feel the urge to tap our feet or dance; the emotional reactions of the limbic system are triggered by a melody and we feel our mood shift to one of joy or melancholy; and we’re instantly transported back in time upon hearing the opening bars of a familiar song as the memory centres are activated. Studies have shown that memories can be unlocked in severely brain-damaged people and dementia patients by playing them music they have loved throughout their lives.

The auditory cortex of the brain releases the reward chemical dopamine in response to music – the same potentially addictive chemical which is also released in response to sex, Facebook ‘likes’, chocolate and even cocaine…. making music one of the healthier ways of getting your high. DJs and producers use this release to great effect when creating a build-up to a chorus or the drop in a dance track; in a phenomenon called the anticipatory listening phase, our brains actually get hyped up waiting for that dopamine release when the music ‘resolves’, and it’s manipulating this pattern of tension and release which creates that Friday night feeling in your head.

Missing Fundamentals

Our brains are good at anticipating what’s coming next and filling in the gaps, and a phenomenon known as ‘missing fundamentals’ demonstrates a trick which our brains play on our audio perception. Sounds that are not a pure tone (i.e. a single-frequency sine wave) have harmonics. These harmonics are linear in nature: that is, a sound with a root note of 100 Hz will have harmonics at 200, 300, 400, 500 Hz and so on. However, our ears don’t actually need to receive all of these frequencies in order to correctly perceive the pitch of the note. If you play those harmonic frequencies, and then remove the root frequency (in this case 100 Hz), your brain will fill in the gaps and you’ll still perceive the note in its entirety – you’ll still hear 100 Hz even though it’s no longer there. You experience this every time you speak on the phone with a man – the root note of the average male voice is 150 Hz, but most phones cannot reproduce below 300 Hz. No matter – your brain fills in the gaps and tells you that you’re hearing exactly what you’d expect to hear. So whilst the tiny drivers of an in-ear mould may not physically be able to reproduce the very low fundamental notes of some bass guitars or kick drums, you’ll still hear them as long as the harmonics are in place.
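One simple way to see why the root is still implied: the harmonics that remain are all whole-number multiples of the same spacing, and that spacing is the missing fundamental. The Python sketch below uses a greatest common divisor as a stand-in for this; it is a deliberate simplification that assumes exact whole-number frequencies, and it is not a model of how the auditory system actually extracts pitch. The function name is hypothetical.

```python
from functools import reduce
from math import gcd


def implied_fundamental_hz(harmonics_hz):
    """Largest frequency of which every supplied harmonic is a whole multiple.

    Assumes exact integer frequencies in Hz; a simplification for illustration.
    """
    return reduce(gcd, harmonics_hz)


# Root of 100 Hz removed, harmonics left in place.
no_root = implied_fundamental_hz([200, 300, 400, 500])     # 100

# A 150 Hz voice heard through a phone that passes nothing below 300 Hz.
phone_voice = implied_fundamental_hz([300, 450, 600, 750])  # 150
```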

A biased system

Human hearing is not linear – our ear canals and brains have evolved to give greater bias to the frequencies where speech intelligibility occurs. This is represented in the famous Fletcher-Munson equal-loudness curves, and it’s where the concept of A-weighting for measuring noise levels originated. According to these curves, we perceive a 62.5 Hz tone to be equal in loudness to a 1 kHz tone when the 1 kHz tone is actually 30 dB quieter.
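For a feel of how strongly low frequencies are discounted, the standard A-weighting curve can be evaluated directly. The sketch below uses the commonly published A-weighting formula (the one standardised in IEC 61672); the function name is my own. Bear in mind that A-weighting is based on a single, fairly quiet equal-loudness contour, so its figures are indicative only and will not match the equal-loudness curves exactly at other listening levels.

```python
import math


def a_weighting_db(freq_hz):
    """A-weighting gain in dB at freq_hz, using the standard analytic formula."""
    f2 = freq_hz ** 2
    numerator = (12194.0 ** 2) * f2 ** 2
    denominator = (
        (f2 + 20.6 ** 2)
        * math.sqrt((f2 + 107.7 ** 2) * (f2 + 737.9 ** 2))
        * (f2 + 12194.0 ** 2)
    )
    # The +2.00 dB offset normalises the curve to 0 dB at 1 kHz.
    return 20.0 * math.log10(numerator / denominator) + 2.00


low = a_weighting_db(62.5)    # roughly -26 dB
ref = a_weighting_db(1000.0)  # approximately 0 dB
```

The roughly 26 dB of attenuation at 62.5 Hz is in the same ballpark as the 30 dB difference quoted above.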

Similarly, the volume threshold at which we first perceive a sound varies according to frequency. The area of the lowest absolute threshold of hearing is between 1 and 5 kHz; that is, we can detect a whisper of human speech at far lower levels than we detect a frequency outside that window. However, if another sound of a similar frequency is also audible at the same time, we may experience the phenomenon known as auditory masking.

This can be illustrated by the experience of talking with a friend on a train station platform, and then having a train speed by. Because the noise of the train encompasses the same frequencies occupied by speech, suddenly we can no longer clearly hear what our friend is saying, and they have to either shout to be heard or wait for the train to pass: the train noise is masking the signal of the speech. The degree to which the masking effect is experienced is dependent on the individual – some people would still be able to make out what their friend was saying if they only slightly raised their voice, whilst others would need them to shout loudly in order to carry on the conversation.

Masking also occurs in a subtler way. When two sounds of different frequencies are played at the same time, as long as they are sufficiently far apart in frequency two separate sounds can be heard. However, if the two sounds are close in frequency they are said to occupy the same critical bandwidth, and the louder of the two sounds will render the quieter one inaudible. For example, if we were to play a 1kHz tone so that we could easily hear it, and then add a second tone of 1.1kHz at a few dB louder, the 1k tone would seem to disappear. When we mute the second tone, we confirm that the original tone is still there and was there all along; it was simply masked. If we then re-add the 1.1k tone so the original tone vanishes again, and slowly sweep the 1.1k tone up the frequency spectrum, we will hear the 1k tone gradually ‘re-appear’: the further away the second tone gets from the original one, the better we will hear them as distinct sounds.
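To get a rough idea of how close 'close in frequency' is, you can estimate the width of the ear's filter at a given frequency. The Python sketch below swaps in the Glasberg and Moore equivalent rectangular bandwidth (ERB) approximation as a stand-in for critical bandwidth; other published estimates give somewhat different widths, the function names are hypothetical, and real masking also depends on level and tends to spread further upwards in frequency than a simple band test suggests.

```python
def erb_hz(center_hz):
    """Equivalent rectangular bandwidth (Hz) at center_hz,
    using the Glasberg and Moore approximation."""
    return 24.7 * (4.37 * center_hz / 1000.0 + 1.0)


def likely_same_band(f1_hz, f2_hz):
    """Rough test: are two tones closer together than one ERB at their midpoint?"""
    center = (f1_hz + f2_hz) / 2.0
    return abs(f1_hz - f2_hz) < erb_hz(center)


width_at_1k = erb_hz(1000.0)                # about 133 Hz
close_pair = likely_same_band(1000, 1100)   # True: 100 Hz apart, inside one band
far_pair = likely_same_band(1000, 2000)     # False: well separated
```

On this estimate the 1 kHz and 1.1 kHz tones in the example sit inside the same band, which is consistent with the quieter one being masked until the second tone is swept away.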

This ability to hear frequencies distinctly is known as frequency resolution, which is a type of filtering that takes place in the basilar membrane of the cochlea. When two sounds are very close in frequency, we cannot distinguish between them and they are heard as a single signal. Someone with hearing loss due to cochlear damage will typically struggle to differentiate between consonants in speech.

This is an important phenomenon to be aware of when mixing. The frequency range to which our hearing is most attuned, 500 Hz to 5 kHz, is where many of our musical inputs such as guitars, keyboards, strings, brass and vocals reside; and when we over-populate this prime audio real estate, things can start to get messy. This is where judicious EQ’ing becomes very useful in cleaning up a mix – for example, although a kick drum mic will pick up frequencies in that mid-range region, that’s not where the information for that instrument is. The ‘boom’ and ‘thwack’ which characterise a good kick sound are lower and higher than that envelope, so by creating a deep EQ scoop in that mid-region, we can clear out some much-needed real estate and un-muddy the mix. Incidentally, because of the non-linear frequency response of our hearing, this also tricks the brain into thinking the sound is louder and more powerful than it is. The reverse is also true; rolling off the highs and lows of a signal creates a sense of front-to-back depth and distance.

It’s also worth considering whether all external track inputs are necessary for a monitor mix – frequently pads and effects occupy this territory, and whilst they may add to the overall picture on a large PA, are they helping or hindering when it comes to creating a musical yet informative IEM mix?

Next time: In the second part of this psychoacoustics series we’ll examine the Acoustic Reflex Threshold, the Haas effect, and how our brains and ears work together to determine where a sound is coming from; and we’ll explore what it all means for IEM mixes.


 
