
Depression, Anxiety, and Hope from a Roadie In The Age of Covid

Dear Everyone, you are not alone.


**TRIGGER WARNING: This blog contains personal content surrounding issues of mental health including depression and anxiety, and the Covid-19 pandemic. Reader discretion is advised.**

The alarm on my phone went off at 6:30 a.m.

I rolled out of my bunk, carefully trying to make as little noise as possible as I gathered my backpack, clothes, and tool bag before exiting the bus.

The morning air felt cool against my face as I looked around me trying to orient myself in the direction of the loading dock to the arena. Were we in New York? Ohio? Pennsylvania? In the morning before coffee, those details were difficult to remember.

Past the elephant door, the arena sprawled out before me, empty and suspensefully silent. I looked up with a mixed sense of awe and critical analysis as I noted the three tiers of the arena, the red seats forming distinct geometrical shapes between each section. As I made my way out to the middle of the deceptively large room, I looked toward the ground in hopes of finding that tell-tale button marking the middle of the room, if I was lucky.

As I set up my tripod, I heard the footsteps of the rigging team as they began stretching out their yellow measuring tapes across the cement floor. The clapping of their feet echoed in the room and soon the sound of their voices calling out distances joined the chorus in the reverb tails.

I turned on my laser and pulled out my notepad, the pen tucked in my hair as I aimed for the first measurement.

Then I woke up.

Up above me, all I could see was the white air-tile of the basement ceiling while the mini-fridge hummed in the corner of the room.

For a few seconds, or maybe it was a full minute, I had absolutely no idea where I was.

I wanted to scream.

I lay in bed for what could have been 15 minutes or an hour, telling myself I had to get out of bed. I couldn’t just lie here. I had to do something. Get up. Get UP.

Eventually, I made my way upstairs and put on a pot of water for coffee. When I opened my phone and opened Facebook, I saw a status update from a friend about a friend of a friend who had passed away. My heart sank. I remembered doing a load-in with that person. Years ago, at a corporate event in another city, in another lifetime. They didn’t post details on what had happened to them. Frankly, it wasn’t anyone’s business, but the family and those closest to them. Yet my heart felt heavy.

Six months ago, or maybe more (time had ceased to have any tangible meaning at this point), I had been sitting in a restaurant in Northern California when the artists told the whole tour that we were all going home. Tomorrow. Like a series of ill-fated dominoes, events were canceling one by one across the country and across the world. Before I knew it, I was back in my storage unit at my best friend’s house, trying to shove aside the boxes I had packed up four or five months earlier to make room for an inflatable mattress so I had somewhere to sleep. I hadn’t really expected to be “home” yet, so I hadn’t come up with a plan as to what I was going to do.

Maybe I’ll go camping for the next month or so. Try to get some time to think. I loved nature and being out in the trees always made me feel better about everything, so maybe that was the thing to do. Every day I looked at the local newspaper’s report of the number of Covid-19 cases in California. It started out in the double digits. The next day it was in the triple digits. Then it grew again. And again. Every day the numbers grew bigger and notices of business closing and areas being restricted filled the pages and notifications across the Internet.

Fast-forward and the next thing I knew, I was packing all my possessions into a U-Haul trailer and driving across the country to be with my sister in Illinois. She had my baby niece a little over a year ago, so I figured the best use of my time would be to spend time with my family while I could.

I was somewhere driving across Kansas when the reality of what was happening hit me. As someone who loved making lists and planning out everything from their packing lists to their hopes and dreams in life, I—for once—literally had no idea what I was doing. This seemed like the best idea I could think of at the time.

Fast-forward and I was sitting on the phone in the basement of my sister’s house in the room she had graciously fabricated for me out of sectioned-off tapestries. I looked at the timestamp on my phone for how long I had been on hold with the Unemployment Office. Two hours and thirty minutes. It took twenty calls in a row to try and get through to someone at the California Employment Development Department. At the three-hour mark, the line disconnected. I just looked down at my phone.

I remember one Christmas when I was with my dad’s side of the family at dinner, I tried to explain what I do to them.

“So you are a DJ, then?” my aunt asked enthusiastically, believing that she had finally gotten it right.

“No,” I said.

“Do you play with the band?” my uncle asked.

“No, I’m the person who tries to make sure everyone in the audience can hear the band,” I tried to laugh.

Everyone laughed that sort of half-laugh when you try to pretend you get the joke, but you don’t actually get it.

Across my social media feeds, friends, colleagues, acquaintances, and everyone in between were all sharing updates about how they had to get “real jobs”, how they couldn’t get through to unemployment or their state had completely failed to get them any unemployment at all, how they were angry, desperate, and how they needed to feed their families. Leaders in the industry stepped up to speak out to the government on behalf of the live events industry, pleading for financial relief for businesses, venues, individuals, and more, and my feeds flooded with initiatives and campaigns raising awareness of the industry’s plight.

Yet when I talked to people who were not in the industry, they seemed to have no idea that the live events sector had been affected at all. Worse yet, I realized more and more that so few people had any idea of what people in the live events industry actually do. Organizations struggled to get news channels to do exposés on the subject, and perhaps it was because there were so many people across every sector of every industry that were struggling. In one conversation with a friend, I had explained that there were nearly 100 people on a tour that I had worked on between the production, tech crew, artist’s tech crew, everyone. They couldn’t believe so many people were working behind the scenes at one concert.

Yet the more I talked about my job and the more time that passed, the more I felt like I was talking about a dream. This fear grew inside me that there was no end in sight to all this and the stories started to repeat themselves and it started to feel like these were stories of what had been, not what was. It was becoming increasingly difficult to concentrate when talking to people about “regular” things in our daily lives because it was not work. Talking about the weather was not talking about rigging plots or truckloads, so my brain just refused to focus on it. Yet I couldn’t stop thinking about the industry: watching webinars, learning new things because I just wanted so desperately to go back to my career that I fabricated schedules and deadlines around other obligations to feel like work was still there.

Then the thought that underpinned all this rose up like a monster from the sea:

Who am I without my job?

I read an article Dave Grohl wrote [1] about performing and playing music on-stage for people, how there was nothing like that feeling in the whole world. I think he hit on something that, in effect, is really indescribable to anyone who has not worked in the live events world. There was a feeling unlike any other of standing in a room with tens of thousands of people screaming at deafening levels. There was a feeling unlike any other of standing alone in a room listening to a PA and crafting it to sound the way you wanted it to. There was a feeling unlike any other of hearing motors running in the morning while pulling a snake across an arena floor. There was a feeling unlike any other of complete, utter exhaustion riding a bus in the morning to the next load-in after doing 4, 5, 6, however many gigs in a row. I tried to explain these feelings to my friends and family who listened with compassion, but I couldn’t help but feel that sometimes they were just pretending to get the joke.

Days, weeks, months floated by and the more time passed, the more I felt like I was floating in a dream. This was a bad dream that I would wake up from. It had to be. Then when I came to reality and realized that this was not a dream, that this was where I was in my life now, it felt like my brain and the entire fabric of my being was splitting in two. I was well aware of how fortunate I was that my sister had taken me in. Every morning I tried to name five things I was grateful for to keep my spirits up, and my sister was always one of them.

The painful irony was that I had stopped going to therapy in January 2020 because I felt I had gotten to an OK point in my life where I was good for now. I had gotten where I needed to for the time being and I could shelve all the other stuff for now until I had time to address them. Then suddenly I had all the time in the world and while shut down in quarantine, all those things in my brain I told myself I would deal with later…Well, now I had no other choice than to deal with them, and really this all intersected with the question at hand of who was I without my job.

And I don’t think I was alone.

The thing people don’t tell you about working in the industry is the social toll it takes on your life and soul. The things you give up and the parts of yourself you give up to make it a full-time gig. Yet there is this mentality of toughing it through because there are 3,000 other people waiting in line to take your spot and if you falter for even just one step, you could be gone and replaced just as easily. Organizations focusing on mental health in the industry started to arise from the pandemic because, in fact, it wasn’t just me. There are many people who struggle to find that balance of life and work let alone when there is a global health crisis at hand. All this should make one feel less alone, and to some extent it does. The truth is that the journey towards finding yourself is, as you would imagine, something each person has to do for themself. And my reality was that despite all the sacrifices needed for this job, all I wanted to do was run back to it as fast as I could.

Without my work, it felt like there was a huge hole in my entire being. That sense of being in a dream pervaded my every waking moment, and I dreamt of work to the point where I had to take sleeping aids just so I would stop thinking about it in my sleep too. I found myself at this strange place in my life where I reunited with hobbies that I had previously cast aside for touring life, trying to appreciate what happiness they could offer. More webinars and industry discussions popped up about “pivoting” into new industries or fields, and in some of these, you could physically see the pain on the interviewees’ faces as they tried to discuss how they had made their way in another field.

One day I was playing with my baby niece and I told her we had to stop playing to go do something, but we would come back to playing later. She just looked at me in utter bewilderment and said, “No! No! No!” Then I remembered that small children have no concept of “now” versus “later”. Everything literally is in the “now” for them. It struck me as something very profound that my niece lived completely in the moment. Everything was a move from one activity to the next, always moving forward. So with much effort and pushback against every fiber of my future-thinking self, I just stopped trying to think of anything further than the next day ahead of me. Just move one foot in front of the other and be grateful every day that I am here in what’s happening at this moment.

Now with the vaccination programs here in the United States and the rumblings of movement trickling across the grapevine, it feels like for the first time in more than a year that there is hope on the horizon. There is a part of me that is so desperate for it to be true and part of me that is suspiciously wary of it being true. Like seeing the carrot on the ground, but being very aware of the fact there is a string attached to it that can easily pull the carrot away from you once more.

There is a hard road ahead and a trepidatious one, at that. Yet after months and months of complete uncertainty, there is something to be said about having hope that things will return to a new type of “normal”. Because “normal” would imply that we would return to how things were before 2020. I believe that there is good change and reflection that came in the pause of the pandemic that we should not revert back from: a collective reflection on who we are, whether we wanted to address it to ourselves or not.

What will happen from this point moving forward is anyone’s gamble, but I always like to think that growth doesn’t come from being comfortable. So with one foot in front of the other, we move forward into this next phase of time. And like another phrase that seems to come up over and over again, “Well, we will cross that bridge when we come to it.”



What Is a FIR Filter?

The use of FIR filters (or finite impulse response filters) has grown in popularity in the live sound world as digital signal processing (DSP) for loudspeakers becomes more and more sophisticated. While not a new technology in itself, these filters provide a powerful tool in system optimization due to their linear phase properties. But what exactly do we mean by “finite impulse response” and how do these filters work? In order to understand digital signal processing better we are going to need to take a step back into our understanding of mathematics and levels of abstraction.

A (Very) Brief Intro To DSP

One of the reasons I find mathematics so awesome is because we are able to take values in the real or imaginary world and represent them either symbolically or as a variable in order to analyze them. We can use the number “2” to represent two physical oranges or apples. Similarly, we can take it up another level of abstraction by saying we have “x” amount of oranges or apples to represent a variable amount of said item. Let’s say we wanted to describe an increasing amount of apples where for every new index of apples, we add the sum of the previous number of apples. We can write this as an arithmetic series for all positive integer number “n” of apples as:
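The series itself did not survive formatting here; reconstructed from the description that follows (the current index value n plus the sum of all the values before it), it reads:

```latex
S(n) = n + \sum_{k=1}^{n-1} k = \sum_{k=1}^{n} k, \qquad n = 1, 2, 3, \ldots
```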

Where for each index of apples starting at 1, 2, 3, 4…etc onto infinity we have the current index value n plus the sum of all the values before it. Ok, you might be asking yourself why we are talking about apples when we are supposed to be talking about FIR filters. Well, the reason is that digital signal processing can be represented using this series notation and it makes it a lot easier than writing out the value for every single input into a filter. If we were to sample a sine wave like the one below, we could express the total number of samples over the period from t1 to t2 as the sum of all the samples over that given period.

In fact, as Lyons points out in Understanding Digital Signal Processing (2011), we can express the discrete-time sequence for a sine wave at frequency f (in Hertz) as x(n) = sin(2π·f·n·t_s), where n is the integer sample index and t_s is the time between samples (in seconds). This equation allows us to translate each value of the sine wave, for example, voltage in an electric signal, at a discrete moment in time into a value that can be plotted in digital form.

What our brain wants to do is draw lines in between these values to create a continuous waveform so it looks like the original continuous sine wave that we sampled. In fact, this is not possible because each of these samples is a discrete value and thus must be treated separately, as compared to an analog, continuous signal. Now, what if the waveform that we sampled wasn’t a perfect sine wave, but instead had peaks and transient values? FIR filters have the ability to “smooth out” these stray values while preserving linear phase.
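As a quick sketch of this sampling idea (the function name and values here are illustrative, not from any particular library), we can generate the discrete values of a sine wave in a few lines of Python:

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples):
    """Discrete-time sine: x(n) = sin(2*pi*f*n*ts), with ts = 1/sample_rate."""
    ts = 1.0 / sample_rate_hz
    return [math.sin(2 * math.pi * freq_hz * n * ts) for n in range(num_samples)]

# A 1 kHz tone sampled at 48 kHz: 48 discrete values per cycle,
# not a continuous curve, no matter how tempting it is to connect the dots.
x = sample_sine(1000, 48000, 48)
```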

How It Works

The finite impulse response filter gets its name because its response to an impulse is finite in duration: feed a single impulse into the filter, and the output settles back to exactly zero after a finite number of samples. In Understanding Digital Signal Processing, Lyons uses a great analogy of how FIR filters average out summations, like averaging the number of cars crossing over a bridge [2]. If you counted the number of cars going over a bridge every minute and then took an average over the last five minutes of the total number of cars, this averaging has the effect of smoothing out the outlying higher or lower number of vehicles to create a steadier average over time. FIR filters function similarly by taking each input sample, multiplying it by the filter’s coefficients, and summing the results at the filter’s output. Lyons points out how this can be described as a series which illustrates the convolution equation for a general “M-tap FIR filter” [3]:
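The equation itself did not survive formatting here; reconstructed from the description that follows, the convolution for an M-tap FIR filter is:

```latex
y(n) = \sum_{k=0}^{M-1} h(k)\, x(n-k)
```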

While this may look scary at first, remember from the discussion at the beginning of this blog that mathematical symbols package concepts into something more succinct for us to analyze. What this series says is that for every sample value x whose index is n-k, k being an integer from 0 to M-1, we multiply its value by the coefficient h(k) and sum the results across all M taps of the filter. So here’s where things start to get interesting: the filter coefficients h(k) are the FIR filter’s impulse response. Without going too far down the rabbit hole in discussing convolution and different types of FIR windows for filter design, let’s jump into the phase properties of these filters, then focus on their applications.
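To make the car-counting analogy concrete, here is a minimal sketch of an M-tap FIR filter in Python; the function name and the sample traffic counts are made up for illustration:

```python
def fir_filter(x, h):
    """Apply an M-tap FIR filter: y(n) = sum over k of h(k) * x(n-k).

    Samples before the start of x are treated as zero.
    """
    M = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(M):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

# Lyons' bridge analogy: cars counted each minute, averaged over 5 minutes.
cars_per_minute = [10, 22, 24, 42, 23, 25, 23, 21]
h = [1 / 5] * 5  # 5-tap averaging filter: every coefficient is 1/5
smoothed = fir_filter(cars_per_minute, h)
```

Notice how the outlier (42 cars) barely moves the averaged output: that smoothing is exactly the filtering action described above.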

The major advantage of the FIR filter over other filters, such as the IIR (or infinite impulse response) filter, lies in its symmetrical coefficients: the delay introduced into the signal is the same at every frequency, so no relative phase shift appears at the output of the system. As Lyons points out, this relates to the group delay of the system:

When the group delay is constant, as it is over the passband of all FIR filters having symmetrical coefficients, all frequency components of the filter input signal are delayed by an equal amount of time […] before they reach the filter’s output. This means that no phase distortion is induced in the filter’s desired output signal […] [4]

It is well known that phase shift, especially when it differs across frequency ranges, can cause detrimental constructive and/or destructive interference between two signals. Having a filter at your disposal that allows gain and attenuation without introducing phase shift has significant advantages, especially when used as a way of optimizing frequency response between zones of loudspeaker cabinets in line arrays. So now that we have talked about what a FIR filter is and its benefits, let’s discuss a case for the application of FIR filters.
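We can check the linear-phase property numerically. The sketch below (the coefficient values are illustrative, not from any real crossover) evaluates the frequency response of a symmetric 3-tap filter and confirms that its phase is a straight line, -ω·(M-1)/2, i.e., a constant group delay of one sample:

```python
import cmath

def freq_response(h, w):
    """H(w) = sum over k of h(k) * e^(-j*w*k): the DTFT of the coefficients."""
    return sum(hk * cmath.exp(-1j * w * k) for k, hk in enumerate(h))

h = [0.25, 0.5, 0.25]           # symmetric coefficients -> linear phase
group_delay = (len(h) - 1) / 2  # constant: 1 sample for a 3-tap filter

# Phase (radians) at several frequencies; each should equal -w * group_delay.
phases = {w: cmath.phase(freq_response(h, w)) for w in (0.1, 0.5, 1.0)}
```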

Applications of FIR filters

Before sophisticated DSP and processors were so readily available, a common tactic of handling multiway sound systems, particularly line arrays, with problematic high-frequencies was to go up to the amplifier of the offending zone of boxes and physically turn down the amplifier running the HF drivers. I’m not going to argue against doing what you have to do to save people’s ears in dire situations, but the problem with this method is that when you change the gain of the amplifier for the HF in a multiway loudspeaker, you effectively change the crossover point as well. One of our goals in optimizing a sound system is to maintain the isophasic response of the array throughout all the elements and zones of the system. By using FIR filters to adjust the frequency response of a system, we can make adjustments and “smooth out” the summation effects of the interelement angles between loudspeaker cabinets without introducing phase shift in-between zones of our line array.

Remember the example Lyons gave comparing the averaging effects of FIR filters to averaging the number of cars crossing a bridge? Now instead of cars, imagine we are trying to “average” out the outlier values for a given frequency band in the high-frequency range of different zones in our line array. These variances are due to the summation effects dependent on the interelement angles between cabinets. Figure A depicts a 16-box large-format line array, optimized only by the interelement angles between boxes, modeled in L-Acoustics’ loudspeaker prediction software Soundvision.

Figure A

Each blue line represents a measurement of the frequency response along the coverage area of the array. Notice the high amount of variance in frequency response particularly above 8kHz between the boxes across the target audience area for each loudspeaker. Now when we use FIR filtering available in the amplifier controllers and implemented via Network Manager to smooth out these variances like in the car analogy, we get a smoother response closer to the target curve above 8kHz as seen in Figure B.

Figure B

In this example, FIR filtering allows us to essentially apply EQ to individual zones of boxes within the array without introducing a relative phase shift that would break the isophasic response of the entire array.

Unfortunately, there is still no such thing as a free lunch. What you win in phase coherence, you pay for in propagation time. That is why, sadly, FIR filters aren’t very practical for lower frequency ranges in live sound: the long filters required at those wavelengths introduce too much delay for real-time applications.


By taking discrete samples of a signal in time and representing them with series expressions, we are able to define filters in digital signal processing as manipulations of a function. Finite impulse response filters with symmetric coefficients are able to smooth out variances in the input signal due to the averaging nature of the filter’s summation. The added advantage is that this happens without introducing phase distortion, which makes the FIR filter a handy tool for optimizing zones of loudspeaker cabinets within a line array. Today, most professional loudspeaker manufacturers employ FIR filters to some degree in processing their point source, constant curvature, and variable curvature arrays. Whether the use of these filters creates a smoother-sounding frequency response is up to the user to decide.


[1] (pg. 2) Lyons, R.G. (2011). Understanding Digital Signal Processing. 3rd ed. Prentice-Hall: Pearson Education.

[2] (pg. 170) Lyons, R.G. (2011). Understanding Digital Signal Processing. 3rd ed. Prentice-Hall: Pearson Education.

[3] (pg. 176) Lyons, R.G. (2011). Understanding Digital Signal Processing. 3rd ed. Prentice-Hall: Pearson Education.

[4] (pg. 211) Lyons, R.G. (2011). Understanding Digital Signal Processing. 3rd ed. Prentice-Hall: Pearson Education.


John, M. (n.d.). Audio FIR Filtering: A Guide to Fundamental FIR Filter Concepts & Applications in Loudspeakers. Eclipse Audio. https://eclipseaudio.com/fir-filter-guide/

Lyons, R.G. (2011). Understanding Digital Signal Processing. 3rd ed. Prentice-Hall: Pearson Education.

One Size Does Not Fit All in Acoustics

Have you ever stood outside when it has been snowing and noticed that it feels “quieter” than normal? Have you ever heard your sibling or housemate play music or talk in the room next to you and heard only the lower-frequency content on the other side of the wall? People are better at perceptually understanding acoustics than we give ourselves credit for. In fact, our hearing and our ability to perceive where a sound is coming from are important to our survival because we need to be able to tell if danger is approaching. Without necessarily thinking about it, we gather a great deal of information about the world around us from localization cues: our brain quickly analyzes the time offsets between direct and reflected sounds arriving at our ears and weighs them against our visual cues.

Enter the entire world of psychoacoustics

Whenever I walk into a music venue during a morning walk-through, I try to bring my attention to the space around me: What am I hearing? How am I hearing it? How does that compare to the visual data I’m gathering about my surroundings? This clandestine, subjective information gathering is important to reality check the data collected during the formal, objective measurement processes of systems tunings. People spend entire lifetimes researching the field of acoustics, so instead of trying to give a “crash course” in acoustics, we are going to talk about some concepts to get you interested in the behavior that you have already been spending your whole life learning from an experiential perspective without realizing it. I hope that by the end of reading this you will realize that the interactions of signals in the audible human hearing range are complex because the perspective changes depending on the relationships of frequency, wavelength, and phase between the signals.

The Magnitudes of Wavelength

Before we head down this rabbit hole, I want to point out that one of the biggest “Eureka!” moments in my audio education came when I truly understood what Jean-Baptiste Fourier discovered in 1807 [1] regarding the nature of complex waveforms: a complex waveform can be “broken down” into many component waves that, when recombined, recreate the original complex waveform. For example, this means that a complex waveform, say the sound of a human singing, can be broken down into the many composite sine waves that add together to create the original waveform of the singer. I like to conceptualize the behavior of sound under the philosophical framework of Fourier’s discoveries. Instead of being overwhelmed by the complexities as you go further down the rabbit hole, I like to think that the more I learn, the more the complex waveform gets broken into its component sine waves.

Conceptualizing sound field behavior is frequency-dependent


One of the most fundamental quandaries in analyzing the behavior of sound propagation is that the wavelengths we work with in the audible frequency range vary by orders of magnitude. We generally understand the audible frequency range of human hearing to be 20 cycles per second (20 Hertz) to 20,000 cycles per second (20 kilohertz), which varies with age and other factors such as hearing damage. Now recall the basic formula for determining wavelength at a given frequency:

Wavelength (in feet or meters) = speed of sound (in feet per second or meters per second) / frequency (in Hertz) **You must use consistent units for wavelength and the speed of sound, i.e. meters and meters per second.**

So let’s look at some numbers here given specific parameters for the speed of sound, since we know that the speed of sound varies with factors such as altitude, temperature, and humidity. The speed of sound at “average sea level” (roughly 1 atmosphere, or 101.3 kilopascals [1]), at 68 degrees Fahrenheit (20 degrees Celsius), and at 0% humidity is approximately 343 meters per second, or approximately 1,125 feet per second [2]. There is a great calculator online at sengpielaudio.com if you don’t want to calculate this manually [3]. If we use the formula above to calculate the wavelengths for 20 Hz and 20 kHz with this value for the speed of sound, we get (in Imperial units, because I live in the United States):

Wavelength of 20 Hz= 1,125 ft/s / 20 Hz = 56.25 feet

Wavelength of 20 kHz or 20,000 Hertz = 1,125 ft/s / 20,000 Hz = 0.0563 feet or 0.675 inches
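These calculations are easy to script; here is a small helper (the function name is made up, it simply mirrors the formula above) in Python:

```python
def wavelength(freq_hz, speed_of_sound=1125.0):
    """Wavelength = speed of sound / frequency.

    Returns the wavelength in the same distance unit used for the speed of
    sound (feet here, assuming ~1,125 ft/s at sea level and 68 degrees F).
    """
    return speed_of_sound / freq_hz

low = wavelength(20)       # 56.25 ft, roughly the size of a building
high = wavelength(20000)   # 0.05625 ft (0.675 in), roughly the size of a penny
```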

This means that we are dealing with wavelengths that range from roughly the size of a penny to the size of a building. We see this in a different way as we move up in octaves along the audible range from 20 Hz to 20 kHz: each successive octave band spans twice as many Hertz as the one below it.

32-63 Hz

63-125 Hz

125-250 Hz

250-500 Hz

500-1000 Hz

1000-2000 Hz

2000-4000 Hz

4000-8000 Hz

8000-16000 Hz

Look familiar??
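The doubling pattern behind that list can be sketched directly. Note an assumption here: the published band limits (63 Hz, 125 Hz, and so on) are rounded preferred numbers, so the sketch below starts from 31.25 Hz for clean arithmetic:

```python
# Each octave's upper edge is double its lower edge.
lower = 31.25
bands = []
for _ in range(9):
    bands.append((lower, lower * 2))
    lower *= 2

# The width in Hertz of each octave doubles as we move up the spectrum.
widths = [hi - lo for lo, hi in bands]
```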

Unfortunately, what this ends up meaning for us sound engineers is that there is no “catch-all” way of modeling the behavior of sound that can be applied to the entire audible frequency spectrum. The objects and surfaces obstructing or interacting with sound may or may not create issues depending on their size in relation to the wavelength of the frequency under scrutiny.

For example, take the practice of placing a measurement mic on top of a flat board, say a board laid across the seats in a theater, to gather what is known as a “ground plane” measurement. This is a tactic I use primarily in highly reflective rooms to measure a loudspeaker system while minimizing the degradation from the room’s reflections, usually because I don’t have control over changing the acoustics of the room itself (e.g., when using an in-house, pre-installed PA in a venue). The caveat to this method is that the board has to span at least one wavelength at the lowest frequency of interest. So if you have a 4 ft x 4 ft board for your ground plane, the measurements are really only reliable from roughly 280 Hz and up (solve for frequency: 1,125 ft/s / 4 ft ≈ 281 Hz, given the assumption for the speed of sound discussed earlier). Below that frequency, the wavelengths of the signal under test are larger than the board, so the benefits of the ground plane no longer apply. The other option, which extends the usable range of the ground plane measurement, is to place the mic directly on the floor (as in an arena) so that the floor becomes an extension of the boundary itself.
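The board-size limit is just the wavelength formula rearranged; a quick sketch (the function name is hypothetical):

```python
def lowest_usable_freq_hz(board_size_ft, speed_of_sound=1125.0):
    """Lowest frequency whose full wavelength still fits on the board."""
    return speed_of_sound / board_size_ft

# A 4 ft ground-plane board is only trustworthy from about 281 Hz upward.
f_limit = lowest_usable_freq_hz(4.0)
```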

Free Field vs. Reverberant Field:

When we start talking about the behavior of sound, it’s very important to distinguish what type of sound field behavior we are observing, modeling, and/or analyzing. If that isn’t confusing enough, depending on the scenario, the sound field behavior will change depending on what frequency range is under scrutiny. Most loudspeaker prediction software works by using calculations based on measurements of the loudspeaker in the free field. To conceptualize how sound operates in the free field, imagine a single point-source loudspeaker floating high above the ground, outside, with no obstructions in sight. Based on the directivity index of the loudspeaker, the sound intensity will propagate outward from the origin according to the inverse square law. We must remember that the directivity index is frequency-dependent, which means that we must look at this behavior as frequency-dependent too. As a refresher, this spherical radiation of sound from the point source results in a 6 dB loss per doubling of distance: as seen in Figure A, the spherical surface over which the sound power spreads at radius r grows by a factor of r^2, so the intensity falls off in proportion to 1/r^2.

Figure A. A point source in the free field exhibits spherical behavior according to the inverse square law, where sound level drops 6 dB per doubling of distance
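The inverse square law is easy to sanity-check in code. This sketch (an illustrative helper, not from any library) computes the level change between two distances from a point source in the free field as 20·log10 of the distance ratio, which gives the familiar 6 dB per doubling:

```python
import math

def level_change_db(r1, r2):
    """SPL change moving from distance r1 to r2 from a point source (free field)."""
    return -20.0 * math.log10(r2 / r1)

drop_one_doubling = level_change_db(1.0, 2.0)   # about -6.02 dB
drop_two_doublings = level_change_db(1.0, 4.0)  # about -12.04 dB
```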


The inverse square law applies to point-source behavior in the free field, yet things grow more complex when we start talking about line sources and Fresnel zones. The relationship between point source and line source behavior changes whether we are observing the source in the near field or far field since a directional source becomes a point source if observed in the far-field. Line source behavior is a subject that can have an entire blog or book on its own, so for the sake of brevity, I will redirect you to the Audio Engineering Society white papers on the subject such as the 2003 white paper on “Wavefront Sculpture Technology” by Christian Heil, Marcel Urban, and Paul Bauman [4].

Free field behavior, by definition, does not take into account the acoustical properties of the venue the speakers sit in; true free field conditions exist pretty much only outdoors in an open area. The free field does, however, make speaker interactions easier to predict, especially when we have known direct (on-axis) and off-axis measurements comprising the loudspeaker's polar data. Since loudspeaker manufacturers have this high-resolution polar data for their speakers, they can predict how elements will interact with one another in the free field. The only problem is that anyone who has ever been inside a venue with a PA system knows we aren't just listening to the direct field of the loudspeakers, even when we have great audience coverage. We also listen to the energy returned from the room in the reverberant field.

As mentioned in the introduction to this blog, our hearing allows us to gather information about the environment we are in. Sound radiates in all directions, but it has directivity relative to the frequency range being considered and the dispersion pattern of the source. Now if we take that imaginary point-source loudspeaker from our earlier example and listen to it in a small room, we will hear not only the direct sound traveling from the loudspeaker to our ears, but also reflections from the loudspeaker bouncing off the walls and arriving back at our ears delayed by some offset in time. Direct sound often correlates with something we see visually, like hearing the on-axis, direct signal from a loudspeaker. Since reflections result from sound bouncing off other surfaces before arriving at our ears, what they don't contribute to the direct field they add to the reverberant field, which helps us perceive spatial information about the room we are in.


Signals arriving on an unobstructed path to our ears we perceive as direct arrivals, whereas signals bouncing off a surface and arriving with some offset in time are reflections


Our ears are like little microphones that send aural information to our brain. Ears vary from person to person in size, shape, and the distance between them, giving everyone unique time and level offsets based on the geometry between their ears; these create our individual head-related transfer functions (HRTF). Our brain combines the data from the direct and reflected signals to discern where a sound is coming from. The time offset between a reflected signal and the direct arrival determines whether our brain will perceive the signals as coming from one source or two distinct sources. This is known as the precedence effect, or Haas effect. Sound System Engineering by Don Davis, Eugene Patronis, Jr., & Pat Brown (2013) notes that our brain integrates early reflections arriving within "35-50 ms" of the direct arrival as a single source. Once again, we must remember that this is an approximate value, since the actual timing will be frequency-dependent. Late reflections that arrive beyond roughly 50 ms do not get integrated with the direct arrival and are instead perceived as separate sources [5]. When two signals have a large enough time offset between them, we start to perceive the two separate sources as echoes. Specular reflections can be particularly obnoxious because they arrive at our ears at such a level or angle of incidence that they can interfere with our perception of localized sources.
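To put some numbers on those time offsets, here is a little sketch (the path lengths are invented for illustration): a reflection's delay is just the path-length difference divided by the speed of sound.

```python
SPEED_OF_SOUND = 343.0  # m/s in air at roughly 20 degrees C

def reflection_offset_ms(direct_path_m, reflected_path_m, c=SPEED_OF_SOUND):
    """Arrival-time offset (ms) of a reflection relative to the direct sound."""
    return (reflected_path_m - direct_path_m) / c * 1000.0

# Listener 10 m from the source; a side-wall reflection travels 16 m total:
offset = reflection_offset_ms(10.0, 16.0)
print(f"{offset:.1f} ms")  # ~17.5 ms: inside the ~35-50 ms integration window
```

At this offset the reflection would be integrated with the direct arrival as a single source; stretch the reflected path far enough past the 50 ms region and it starts to read as an echo.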

Specular reflections act like reflections off a mirror bouncing back at the listener


Diffuse reflections, on the other hand, tend to lack localization and add more to the perception of "spaciousness" of the room, yet depending on frequency and level they can still degrade intelligibility. Whether the presence of certain reflections will degrade or add to the original source is highly dependent on their relationship to the dimensions of the room.


Various acoustic diffusers and absorbers used to spread out reflections [6]

In the Master Handbook of Acoustics, F. Alton Everest and Ken C. Pohlmann (2015) illustrate how "the behavior of sound is greatly affected by the wavelength of the sound in comparison to the size of objects encountered" [7]. Because wavelength varies with frequency, how we model sound behavior varies in relation to the room dimensions. In smaller rooms, there is a low-frequency range in which the room's dimensions are shorter than the wavelength, so the room cannot contribute boosts due to resonance effects [7]. Everest & Pohlmann note that when the wavelength becomes comparable to the room dimensions, we enter modal behavior. The top of this range marks the "cutoff frequency," above which we can begin to describe the interactions using "wave acoustics," and as we progress into the higher frequencies of the audible range we can model these short-wavelength interactions using ray behavior. The equations for estimating these ranges from the room's length, width, and height can be found in the Master Handbook of Acoustics. It's important to note that while we haven't explicitly discussed phase, its importance is implied, since it is a necessary component of understanding the relationship between signals; after all, the phase relationship between two copies of the same signal determines whether their interaction results in constructive or destructive interference. What Everest & Pohlmann are getting at is that how we model and predict sound field behavior changes based on wavelength, frequency, and room dimensions. It's not as easy as applying one set of rules to the entire audible spectrum.
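To get a feel for where modal behavior sits, here is a sketch using the standard rectangular-room mode equation (the room dimensions are invented for illustration; see the Master Handbook of Acoustics for the full range-estimation equations):

```python
import math

C = 343.0  # speed of sound in air, m/s

def mode_freq(p, q, r, L, W, H, c=C):
    """Frequency (Hz) of the (p, q, r) mode of an L x W x H rectangular room."""
    return (c / 2) * math.sqrt((p / L) ** 2 + (q / W) ** 2 + (r / H) ** 2)

# A hypothetical 6 m x 4 m x 3 m room:
print(f"(1,0,0) axial mode: {mode_freq(1, 0, 0, 6, 4, 3):.1f} Hz")  # ~28.6 Hz
print(f"(0,1,0) axial mode: {mode_freq(0, 1, 0, 6, 4, 3):.1f} Hz")  # ~42.9 Hz
print(f"(0,0,1) axial mode: {mode_freq(0, 0, 1, 6, 4, 3):.1f} Hz")  # ~57.2 Hz
# Below the lowest mode the room is too small to resonate; well above
# the modal region, ray (geometric) modeling takes over.
```

The takeaway matches Everest & Pohlmann's point: the same room demands different models at different wavelengths.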

Just the Beginning

So we haven't even begun to talk about the effects of surface properties such as absorption coefficients and RT60 times, and yet we already see the increasing complexity of the interactions between signals, given that we are dealing with wavelengths that differ by orders of magnitude. To simplify predictions, most loudspeaker prediction software uses measurements gathered in the free field. Acoustic simulation software such as EASE does allow the user to factor in the properties of surfaces, but often we don't know the information needed to account for things like the absorption coefficients of a material unless someone gets paid to go take those measurements, or the acoustician involved with the design has documented the decisions made during the architecture of the venue. Yet despite the simplifications needed to make prediction easier, we still carry one of the best tools for acoustical analysis with us every day: our ears. Our ability to perceive information about the space around us from the interaural level and time differences of arriving signals allows us to analyze the effects of room acoustics from experience alone. When looking at the complexity involved in acoustic analysis, it's important to remember the pros and cons of our subjective and objective tools. Do the computer's predictions make sense based on what I hear happening in the room around me? Measurement analysis tools allow us to objectively identify problems, and their origins, that aren't necessarily perceptible to our ears. Yet remembering to reality-check with our ears is important, because otherwise it's easy to get lost down the rabbit hole of increasing complexity as we get deeper into our engineering of audio. At the end of the day, our goal is to make the show sound "good," whatever that means to you.


[1] https://www.aps.org/publications/apsnews/201003/physicshistory.cfm

[2] (pg. 345) Giancoli, D.C. (2009). Physics for Scientists & Engineers with Modern Physics. Pearson Prentice Hall.

[3] http://www.sengpielaudio.com/calculator-airpressure.htm

[4] https://www.aes.org/e-lib/browse.cfm?elib=12200

[5] (pg. 454) Davis, D., Patronis, Jr., E. & Brown, P. Sound System Engineering. (2013). 4th ed. Focal Press.

[6] “recording studio 2” by JDB Sound Photography is licensed with CC BY-NC-SA 2.0. To view a copy of this license, visit https://creativecommons.org/licenses/by-nc-sa/2.0/

[7] (pg. 235) Everest, F.A. & Pohlmann, K. (2015). Master Handbook of Acoustics. 6th ed. McGraw-Hill Education.


American Physical Society. (2010, March). This Month in Physics History: March 21, 1768: Birth of Jean-Baptiste Joseph Fourier. APS News. https://www.aps.org/publications/apsnews/201003/physicshistory.cfm

Davis, D., Patronis, Jr., E. & Brown, P. Sound System Engineering. (2013). 4th ed. Focal Press.

Everest, F.A. & Pohlmann, K. (2015). Master Handbook of Acoustics. 6th ed. McGraw-Hill Education.

Giancoli, D.C. (2009). Physics for Scientists & Engineers with Modern Physics. Pearson Prentice Hall.

JDB Photography. (n.d.). [recording studio 2] [Photograph]. Creative Commons. https://live.staticflickr.com/7352/9725447152_8f79df5789_b.jpg

Sengpielaudio. (n.d.). Calculation: Speed of sound in humid air (Relative humidity). Sengpielaudio. http://www.sengpielaudio.com/calculator-airpressure.htm

Urban, M., Heil, C., & Bauman, P. (2003). Wavefront Sculpture Technology. [White paper]. Journal of the Audio Engineering Society, 51(10), 912-932.


The Beauty Lies In The Fractals

A Story by Arica Rust

When I walk down the street, I sometimes stop to look at plants or trees that I pass by. A tree above me in the autumn daylight lowers its branches to allow closer inspection of the maple leaves hanging from its limbs. At the end of the bough, its limbs divide into another set of branches nearly identical in number to the ones stemming from the original limb. Then yet again the branches divide into twigs each festooned with maple leaves fading from red to green as the older, larger leaves begin to darken to red with the coming cold weather. The new green leaves look like copies of the larger red ones: children of themselves, like when I stand in front of the bathroom mirror with another mirror at my back and see many reiterations of myself stretching out to infinity towards the horizon. Inside each leaf, I see a memory.

In 1807 when Jean-Baptiste Fourier published his memoir On the Propagation of Heat in Solid Bodies [1], he described what would become known as the Fourier series wherein one can recreate a complex waveform by adding together its component waves.

The other night, I was lying in my hotel room with my headphones on, listening to one of my favorite tracks. In the silence of the mostly empty hotel, I closed my eyes and let my mind’s focus move from each instrument. I pulled forward the electric guitar, then the bass guitar, then the tom rolls, then the lead vocal, one-by-one, to the forefront of my mind like picking the petals off a flower. Then when finished, I lay each petal back into the mix to reconstruct the song in its wholeness like the semblance of the flower.

For a very long time, this listening process has been the closest I come to meditation. It brings me a sense of calm to hear a song this way, much like looking at a painting in a museum then stepping forward to look at each individual brushstroke. I hear this way in my everyday life if I shift my focus.

I am walking down a new street in a town I have never been to before that reminds me of everywhere and yet nowhere. I hear the reflections of cars whirring about, bouncing off the glass buildings. Then I shift my attention to the shuffle of my feet against the rough concrete, then shift again to hear the two people I pass by as they talk over coffee, and shift and shift and shift until the people talking sound like they are singing, the reflections off the glass buildings sound like striking bells, and my feet sound like a drunk drum beat. The world around me becomes an urban orchestra twisting and reconstructing itself in its own enveloping rhythms. Inside each sound, I hear a memory.

I reached above my head, brushing the sweaty hair poking out of my rock climbing helmet off my face. I had forgotten to pick the cable with a spanset to the cable bridge before we started going up to trim, and now I had to fix it. Standing on top of the motor distro, I reached out to choke the cable with the spanset.

“What does it say on your arm?”

I turned my head around to see my friend, and also my boss, standing below me with his laptop in hand, staring up at the Dune tattoo scrawled across my left forearm.

“What?!” I said. I was so fixated on trying to wrangle the cables in a hurry that the words went straight through my brain.

“Your arm. What does it say on your arm.”

I smiled, “ ‘Fear is the mind-killer.’ It is a quote from the book Dune by Frank Herbert.”

Instead of responding, he pulled up the t-shirt sleeve on his same arm to reveal a series of words written in Latin on his upper arm.

“We have the same tattoo,” his words grinned.

Maybe it was only for a split second, but in that second, I thought of the leaves on the trees spiraling off the branches identical to the ones that came before it, and inside each leaf was written one of the letters from the tattoos on our arms. Inside them, I read a memory.

Seven days into this show, the A1 and I had become friends talking about professors that we had in common from San Francisco State, but in different time periods. Some teachers and mentors last through generations like that. He always offered to buy me coffee during his morning excursions after our beginning-of-day checks were complete and walk-in started rolling. Come to think of it, he even had the same classic white-haired, “sound guy” ponytail that our professor had.

And the branches diverged yet again.

I had thought about something ahead of him in anticipation of something I knew he would think but had not yet thought and then when he thought it, he laughed in surprise and gratitude.

“You know, you are gonna make a great husband one day.”

My heart smiled, and in each word I heard a memory.

We had just finished dumping the truck and pushing all the cases into the dark theater. I finished helping with what I could on deck so now it was time to make my way towards FOH to see what we were working with today.

The FOH engineer was already there beginning to pull things out of the utility case to place them on top of his console in what was becoming our daily base configuration of the setup. An old man sat in a chair next to the house console, we had met earlier during introductions, and he told me he was the house tech.

After getting ourselves situated and ready to begin our verification steps, I began our daily procedure of moving systematically through the system du jour to check where we were at.

“We just had the [insert Manufacturer’s Name] guy come in to check the tuning a few months ago,” the house tech said.

“Oh, it’s all good, this is just part of our procedure every day,” I said cheerfully.

I moved the measurement microphone at the transition point between one side of the main hang and one side of the in-fills. There seemed to be a time difference present.

“Hey, do you mind if I see the tablet for a sec? It looks like there is a slight time offset between the mains and in-fills,” I said.

“I can’t give you access to the tablet. It has to be run by a house technician. Also, that seems impossible. This was just tuned.”

I just stared at him.

I went back up to the stage to grab something, or so I told myself.

“Are you OK?” the stage tech asked.

“Yeah, I’m fine. I’m just having a hard time getting this guy to help me.”

“Dude, he came up to me earlier when we were loading in and started asking me all these questions and I was like, ‘Man, you got to talk to her, she is our crew chief’ and he said, ‘Oh, that little girl over there? She is your crew chief?’” he told me.

I didn’t understand. I looked at him while he spoke and the words fell apart into their individual components trying to form themselves into a complete thought. Crew chief. She. Little girl. Man. None of these words made sense. They were not talking about me. The words fell out of his mouth and clanged onto the floor like a rigging shackle falling out of someone’s pocket.

Inside each word, I saw a memory. Leaves branching off of a trunk further and further and suddenly the jukebox in my brain flipped on and I started hearing The Beatles in my head:

“I am he as you are he as you are me and we are all together….” 

And the focus shifted, the CD skipped, the record flipped to a new song:

“I am just a copy of a copy of a copy…” 

And the focus shifted again, spiraling out like leaves slowly fading to red on the branch of that tree and I could hear each word dripping off them like the sound of water droplets falling into a bigger pond. Then suddenly without warning, the orchestra surged with energy, gathering up into a great crescendo. I was walking backward and falling upwards and reading texts from a book forwards:

“I will face my fear. 

I will permit it to pass over me and through me.

And when it has gone past I will turn the inner eye to see its path. 

Where the fear has gone there will be nothing. 

Only I will remain.”

And I’m inside my own memory.

Standing in front of the mirror in the bathroom of my childhood home where the door to the bathroom held a full-length mirror and swung inwards. When I stood in front of the mirror I saw myself reiterated out into infinity: a complex form split into its component parts.

Who is this that stood before me?

It seems that I keep being told who I am, but only I get to decide who I am…


When I open my eyes, I’m standing under the tree. The sunlight gently warms the outside of my face. My face. The wind begins to pick up, rustling through the leaves, and I pick their decisive sound out amidst the complexity of the orchestra.

Then they begin to fall.

One by one the tree sheds its leaves.

Returning to the dirt to be decomposed, eaten, and returned as food to feed itself to grow for the next spring.

“Fear is the mind-killer.”

A Note From The Author:

Once upon a time, before I focused on audio (and sometimes while I did), I was a writer. I published a collection of poetry in 2016 but haven't written much since. It seems that in this time of uncertainty, we need art more than ever. I usually write technical blogs focused on education in the audio world, but art and science exist to both love and hate one another: a historically bittersweet romance. Yet the beauty of this world lies in its complexity, in each individual. Much like the Fourier transform, a complex world is the sum of its many individual parts.


[1] https://www.aps.org/publications/apsnews/201003/physicshistory.cfm

Quotes from Books and Music:

Dune by Frank Herbert (https://dunenovels.com/)

“I Am The Walrus” by The Beatles (https://www.beatlesbible.com/songs/i-am-the-walrus/)

“Copy of A” by Nine Inch Nails (https://www.nin.wiki/Copy_Of_A)

(Not So) Basic Networking For Live Sound Engineers

Part Three: Networking Protocols

(or A History of IEEE Standards)

Read Part One Here

Read Part Two Here

Evaluating Applications

One thing I have learned from my do-it-yourself research in computer science, and have applied to understanding the world in general, is the concept of building on "levels of abstraction." (Once again, here I am quoting Carrie Anne Philbin from the "Crash Course: Computer Science" YouTube series) [1]. From the laptop this blog was written on to performing a show in an arena, none of these things would be possible were it not for the multitude of smaller parts working together to create a system. Whether it is an arena concert divided into different departments to execute the gig or a data network broken up into the different steps of the OSI Model, we can take a complicated system and break it down into its composite parts to understand how it works as a whole. The efficiency and innovation of this compartmentalization in technology lies in the fact that one person can work on just one section of the OSI Model (like the Transport Layer) without really needing to know anything about what's happening on the other layers.


This is why I have spent all this time in the last two blogs of "Basic Networking For Live Sound Engineers" breaking the daunting concept of networking into smaller composites, from defining what a network is to designing topologies including VLANs and trunks. At this point, we have spent a lot of time talking about how everything from Cat6 cable to switches physically and conceptually works together. Now it's time to really dive deep into the languages, or protocols, that these devices use to transmit audio. This is a fundamental piece in deciding on a network design, because one protocol may be more appropriate for a particular design than another. As we discuss how these protocols handle different aspects of a data packet differently, I want you to think about why one might be more beneficial in one situation versus another. After all, there are so many factors that go into the design of a system, from working within pre-existing infrastructures to building networks from scratch, that we must take these variables into account in our network design decisions. A joke often makes the rounds in the world of live entertainment: you can have cheap, efficient, or quality. Pick two.

What Is In A Packet, Really?

As a quick refresher from Part 2, data gets encapsulated in a process that forms a header and body for each packet, so the basic overall structure of a packet or frame is a header plus a body. How you define each section, and whether it is actually called a "packet" or a "frame," depends on which layer of the OSI Model you are referring to.

Basic structure of a data packet…or do I mean frame? It depends!!


Now this back and forth of terminology seemed really confusing to me until I read a thread on StackExchange that pointed out that the combination of the header and data is called a frame at Layer 2 and a packet at Layer 3 [2]. The change in terminology corresponds to the different additions made during the encapsulation process at different layers of the OSI Model.

In an article by Alison Quine on "How Encapsulation Works Within the TCP/IP Model," the encapsulation process involves adding headers onto a body of data at each step, starting from the top of the OSI Model at the Application Layer and moving down to the Physical Layer, then stripping each of those headers off as the data moves back up the OSI Model in reverse [3]. That means that at each layer of the OSI Model, another header gets added on to help the data get to the right place. Audinate's Dante Level 3 training on "IP Encapsulation" describes this process in a network stack. At the Application Layer, we start with a piece of data. At the Transport Layer, the source port, destination port, and transport protocol attach to the data, or payload. At the Network Layer, the destination and source IP addresses are added on top of what already exists from the Transport Layer. Then at the Data Link Layer, the destination and source MAC addresses attach on top of everything else in the frame by referencing an ARP table [4]. ARP, or Address Resolution Protocol, uses message requests to build tables in devices (like a switch, for example) that match IP addresses to MAC addresses, and vice versa.
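As a toy sketch of that layering (the bracketed field names are placeholders, not a real wire format), each layer simply wraps the payload handed down from the layer above with its own header:

```python
def encapsulate(payload: bytes) -> bytes:
    """Illustrative only: each OSI layer prepends its own header."""
    # Transport Layer (e.g. UDP): source/destination ports go on first.
    segment = b"[src_port|dst_port]" + payload
    # Network Layer: source/destination IP addresses wrap the segment.
    packet = b"[src_ip|dst_ip]" + segment
    # Data Link Layer: source/destination MAC addresses (looked up via ARP)
    # wrap the packet, producing the frame that hits the wire.
    frame = b"[src_mac|dst_mac]" + packet
    return frame

frame = encapsulate(b"hello")
print(frame)
# The receiving side strips these headers off in the reverse order,
# handing the bare payload back up to the application.
```

This is also why the frame/packet terminology flips depending on the layer: peel off the MAC header and what remains is the Layer 3 packet.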

So I want to pause for a second before we move onward to really drive the point home that the OSI Model is a conceptual tool used for educational purposes to talk about different aspects of networking. For example, you can use the OSI Model to understand network protocols or understand different types of switches. The point is we are using it here to understand the signal flow in the encapsulation process of data, just as you would look at a chart of signal flow for a mixer.

Check 1, Check 2…

There is an old adage that time equals money, but the reality of working in live sound is that time is of the essence. Lost audio packets that create jitter or audibly delayed sound (our brains are very good at detecting time differences) are not acceptable. So it goes without saying that data has to arrive as close to synchronously as possible. In my previous blog on clocks, I talked about the importance of different digital audio devices starting their sampling at the same rate, based on a leader clock (also referred to as a master clock), in order to preserve the original waveform. An accurate clock is important in preserving the word length, or bits, of the data. Let's look at this example:





In this example, we have two 16-bit words representing two copies of the same sample of data traveling between two devices that are in sync thanks to a shared clock. Now, what happens if the clock is off by just one bit?

If the sample is off by even just one bit, the whole word gets shifted and produces an entirely different value altogether! This manifests as digital artifacts, jitter, or no signal at all. Move up a "level of abstraction" to the data packet at the Network Layer of the OSI Model, and you can understand why it is important for packets to arrive on time in a network: otherwise bits of data get lost or packets collide. But as I've mentioned before, UDP and TCP/IP handle data accuracy and timing differences differently.
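Here is that one-bit slip as a quick sketch (the sample value is arbitrary):

```python
sample = 0b1010_0000_0000_0001        # a 16-bit word: 40961
# If the receiver's clock slips by one bit, every bit lands one
# position off; masking to 16 bits models the word boundary:
slipped = (sample << 1) & 0xFFFF      # same bits, shifted: 16386
print(sample, slipped)                # two wildly different sample values
```

One bit of slippage and 40961 becomes 16386: an entirely different sample, which at audio rates sounds like artifacts or dropouts rather than music.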


Recall from Part 2 that TCP/IP checks for a "handshake" between the receiver and sender to validate the data transmission at the cost of time, while UDP decreases transmission time by skipping this back-and-forth validation. In an article from LearnCisco on "Understanding the TCP/IP Transport Layer," TCP is described as a "connection-oriented protocol" that requires adding more processes into the header to verify the "handshake" between sender and receiver [5]. UDP, on the other hand, acts as a "connectionless protocol":

[…] there will be some error checking in the form of checksums that go along with the packet to verify integrity of those packets. There is also a pseudo-header or small header that includes source and destination ports. And so, if the service is not running on a specific machine, then UDP will return an error message saying that the service is not available. [5]

So instead of verifying that the data made it to the destination, UDP checks that the packet's integrity is intact and that a path is available for it to take. If there is no available path, the packet simply won't get sent. Because of this lack of "error checking" in UDP, it is imperative that packets arrive at their correct destination on time. So how does a network actually keep time? In reference to what?
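For the curious, the "checksums" mentioned in that quote are 16-bit ones'-complement sums over the packet; here is a minimal sketch of the idea (it ignores the real UDP pseudo-header layout):

```python
def internet_checksum(data: bytes) -> int:
    """16-bit ones'-complement checksum in the style used by UDP/IP."""
    if len(data) % 2:
        data += b"\x00"                            # pad odd-length data
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # sum 16-bit big-endian words
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back in
    return ~total & 0xFFFF                         # ones' complement of the sum

payload = b"\x12\x34\x56\x78"
cksum = internet_checksum(payload)
print(hex(cksum))
# A receiver sums the payload words plus the checksum; an intact
# packet folds to all ones (0xFFFF), so corruption is detectable.
```

Note this only detects corruption in a packet that arrived; unlike TCP, nothing in UDP notices a packet that never shows up.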

Time, Media Clocking, and PTP

Let's get philosophical for a moment and talk about the abstraction of time. I have a calendar on my phone where I schedule events and reminders based on a day divided into hours and minutes. This division into hours and minutes is arguably pointless unless it is referenced to some standard of time, which in this case is the clock on my phone. I assume the clock inside my phone is accurate relative to a greater reference of time for wherever I am located. The standard for civil time is UTC, or "Coordinated Universal Time," a compromise between the TAI standard, based on atomic clocks, and UT1, based on the average solar day, with leap seconds making up the difference [6]. For me to have a Zoom call with someone in another time zone, we need a reference to the same moment wherever we are: it doesn't matter that I say our Zoom call is at 12 pm Pacific Standard Time while they think of it as 3 pm Eastern Standard Time, as long as our clocks share the same ultimate point of reference, which for us civilians is UTC. In this same sense, digital devices need a media clock referenced to a common master (but we are going to update this term to leader) to make sure data gets transmitted without the bit-slippage we discussed earlier.


In a white paper titled "Media Clock Synchronization Based On PTP" from the Audio Engineering Society 44th International Conference in San Diego, Hans Weibel and Stefan Heinzmann note that, "In a networked media system it is desirable to use the network itself for ensuring synchronization, rather than requiring a separate clock distribution system that uses its own wiring" [7]. This is where PTP, or Precision Time Protocol, comes in. The IEEE (Institute of Electrical and Electronics Engineers) standardized this protocol as IEEE 1588 in 2002 and expanded it further in 2008 [7]. The 2002 standard created PTPv1, which works over UDP at microsecond-level accuracy by sending sync messages between leader and follower clocks. As described in the Weibel and Heinzmann paper, at the Application Layer follower nodes compare their local clocks to the sync messages sent by the leader and adjust their clocks to match, while also taking into account the absolute time offset in the delay between the leader and follower [7]. Say we have two devices, A and B:


Device A (our leader for all intents and purposes) sends a Sync message to Device B saying, "This is what time it is: 11:00 A.M."

Device A then adds, "To be exact, I sent that last message at precisely 11:00 A.M." This is the Follow_Up message.

Device B thinks, "Hmm, my clock read 12:00 P.M. when that arrived," and asks, "What time do you receive this?" This is the Delay_Request message.

Device A replies, "I received it at 11:01 A.M." This is the Delay_Response message.

Device B says, "Ok, now I know my offset from you and the path delay. I'll adjust."

Analogy of clocking communication in PTPv1 as described in IEEE 1588-2002

This back and forth allows the followers to adjust their clocks to whatever clock is elected the leader according to the best master clock algorithm (which should be renamed the best leader clock algorithm), with the ultimate reference being the grandmaster (grandleader) clock [8]. Fun fact: in the Weibel and Heinzmann paper, they point out that "the epoch of the PTP time scale is midnight on 1 January TAI. A sampling point coinciding with this point in absolute time is said to have zero phase" [9].
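In actual PTP terms, that conversation produces four timestamps, from which the follower computes its clock offset and the path delay (assuming, as PTP does, that the path is symmetric); the timestamp values below are invented for illustration:

```python
def ptp_offset_and_delay(t1, t2, t3, t4):
    """t1: leader sends Sync; t2: follower receives it;
    t3: follower sends Delay_Request; t4: leader receives it.
    Assumes the network delay is the same in both directions."""
    offset = ((t2 - t1) - (t4 - t3)) / 2  # how far the follower's clock is ahead
    delay = ((t2 - t1) + (t4 - t3)) / 2   # one-way path delay
    return offset, delay

# Follower's clock runs 75 time-units fast; one-way path delay is 5 units:
offset, delay = ptp_offset_and_delay(t1=1000, t2=1080, t3=1100, t4=1030)
print(offset, delay)  # 75.0 5.0: subtract the offset and B matches A
```

The elegance here is that neither device ever needs to know the "true" time: the exchange alone recovers both unknowns.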

So in 2008, the standard was updated to PTPv2, which of course is not backwards compatible with PTPv1 [10]. The update changed how clock quality is determined, went from all PTP messages being multicast in v1 to having a unicast option in v2, improved clocking accuracy from microseconds to nanoseconds, and introduced transparent clocks. The 1588-2002 standard had introduced the concepts of the ordinary clock, a device or clock node with one port, and the boundary clock, which has two or more ports [11]. Switches and routers can act as boundary clocks, while end-point devices, including audio equipment, are examples of ordinary clocks. A Luminex article titled "PTPv2 Timing protocol in AV Networks" describes how "[a] Transparent Clock will calculate how long packets have spent inside of itself and add a correction for that to the packets as they leave. In that sense, the [boundary clock] becomes 'transparent' in time, as if it is not contributing to delay in the network" [12]. PTPv2 also improves upon the Sync message system by adding an announce message scheme for electing the grandmaster/grandleader clock. The Luminex article illustrates this by describing how a PTPv2 device starts up in a state "listening" for announce messages, which include information about the quality of the clock, for a set amount of time called the Announce Timeout Interval. If no messages arrive, that device becomes the leader. Yet if it receives an announce message indicating another clock has superior quality, it reverts to a follower and makes the other device the leader [13]. These differences in the handling of clocking between IEEE 1588-2002 and 1588-2008 are key to understanding the underlying difference when talking about Dante versus AVB.
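A rough sketch of that election idea (the quality fields and their ordering here are simplified stand-ins for the real best master clock algorithm comparison, and the device names are invented):

```python
# Each announce message carries the sender's clock quality. Comparing the
# fields in priority order is just lexicographic tuple comparison, where
# lower values win: (priority1, clock_class, accuracy, priority2, clock_id).
candidates = [
    (128, 248, 0x31, 128, "switch-A"),
    (128, 6,   0x21, 128, "gps-clock"),   # e.g. a GPS-disciplined clock
    (200, 248, 0x31, 128, "console"),
]

leader = candidates[0]
for announce in candidates[1:]:
    if announce < leader:       # better (lower) quality tuple wins the election
        leader = announce

print(leader[-1])  # "gps-clock" becomes grandleader; the rest follow
```

Every device runs the same comparison on the announce messages it hears, so the network converges on one grandleader without any central coordinator.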

Dante, AVB, AES67, RAVENNA, and Milan

Much like the format wars between Blu-ray, HD DVD, and other contending audiovisual formats, you can bet there has been a struggle over the years to create a manufacturer-independent standard for audio-over-IP and networking protocols in the audio world. The two major players that have come out on top in terms of widespread use in the audio industry are AVB and Dante. AES67 and RAVENNA are popular as well, with RAVENNA dominating the world of broadcast.

Dante, created by the company Audinate, began in 2003 under the key principle that still makes the protocol appealing today: the ability to use pre-existing IT infrastructure to distribute audio over a network [14]. Its other major appeal is built-in redundancy, which makes it particularly attractive to the world of live production. In a Dante network you can set up a primary and a secondary network, the secondary being an identical “copy” of the primary, so that if the primary network fails, the system switches over seamlessly to the secondary. Dante works at the Network layer (Layer 3) of the OSI Model, resting on top of the IP addressing schemes already in place in a standard IT networking system. It’s easy to see, financially, why a major corporate office would want to use this protocol: it saves them from overhauling the entire infrastructure of an office building to put in new switches, upgrade topologies, and so on.

An example of a basic Dante Network with redundant primary (blue) and secondary (red) networks

The adaptable nature of Dante comes from existing as a Layer 3 protocol, which allows one to use most Gigabit switches and sometimes even 100Mbps switches to distribute a Dante network (but only if it’s solely a 100Mbps network) [15]. That being said, there are some caveats. It is strongly recommended (and in 100Mbps networks, mandatory) to use specific Quality of Service (QoS) settings when configuring managed switches (switches whose ports and other features are configurable, usually via a software GUI) for Dante. This includes flagging specific DSCP values that are important to Dante traffic as high priority, including our friend PTP. Other network traffic can exist alongside Dante traffic as long as the subnets are configured correctly (for more on what I mean by subnets, see Part 1 of this blog series). I personally prefer configuring separate VLANs for dedicated network control traffic and for Dante to keep the waters clear between the two: that way I know control traffic will never contend with Dante traffic. To be fair, Dante was made for shared networks, so as long as your subnets and QoS are configured correctly, it should be fine. The catch is that because Dante uses PTPv1, even with proper QoS settings the clock precision can get choked if there are issues with bandwidth. The Luminex article mentioned earlier discusses this: “Clock precision can still be affected by the volume of traffic and how much contention there is for priority. Thus; PTP clock messages can get stuck and delayed in the backbone; in the switches between your devices” [16].
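As a concrete illustration of flagging DSCP values, here is a minimal sketch of the priority mapping Audinate’s QoS guidance describes for managed switches. The human-readable labels are my own; the actual queue setup depends on your switch, so treat this as a mental model rather than a configuration recipe.

```python
# DSCP values Audinate recommends prioritizing on managed switches that
# carry Dante: PTP clocking gets the highest queue, audio the next one,
# and everything else falls through to best effort.
DANTE_QOS = {
    56: "high (CS7) - PTP clock sync",
    46: "medium (EF) - Dante audio",
    8:  "low (CS1) - reserved",
    0:  "best effort - other traffic",
}

def queue_for(dscp):
    """Map a packet's DSCP field to a forwarding priority label."""
    return DANTE_QOS.get(dscp, "best effort - other traffic")

queue_for(56)  # PTP lands in the highest-priority queue
```

The point of the mapping is exactly the Luminex warning quoted above: if PTP packets sit in the same queue as everything else, clock messages can get stuck behind bulk traffic and precision suffers.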

Since Dante uses PTPv1, it will find the best device on the network to be the Master (Leader) Clock using PTP as the clocking system for the entire network, and if that device drops out, it will elect a new Master (Leader) Clock based on the parameters we discussed for PTPv1. This can also be configured manually if necessary. Per the 1588-2008 standard, PTPv2 was not backwards compatible with PTPv1, but ANOTHER revision of the standard in 2019 (IEEE 1588-2019) added backwards compatibility [17]. AES67, RAVENNA, and AVB use PTPv2 (although AVB uses its own profile of IEEE 1588-2008, which we will talk about later). In an article on “Dante And AES67 Clocking In Depth,” Shure points out that PTPv1 and PTPv2 can “coexist on the same network,” but “[i]f there is a higher-precision PTPv2 clock on a network, then one Dante device will synchronize to the higher-precision PTPv2 clock and act as a Boundary Clock for PTPv1 devices” [18]. So what we see happening is that end devices that support PTPv2 introduce backwards compatibility with PTPv1, but because these Layer 3 networks rely on standard network infrastructure, it’s not easy to find switches capable of handling both PTPv1 and PTPv2. On top of that, there is the juggling act of keeping track of which devices are using which clocking system, and you can imagine that as this scales upward, it becomes a bigger and bigger headache to manage.

AES67 and RAVENNA use PTPv2 as well, and try to address some of these issues without reinventing the wheel. Both also operate as Layer 3 protocols on top of standard IP networks, but they were created by different organizations. The Audio Engineering Society published the standard outlining AES67 in 2013, with revisions thereafter [19]. The goal of AES67 is to create a set of standards that allows for interoperability between devices, a concept we will see come up again when we talk about AVB in more depth, though AES67 applies it differently: it uses preexisting standards from the IEEE and IETF (Internet Engineering Task Force) to build a higher-performing audio networking protocol. What’s interesting is that because AES67 shares many of the same underlying standards as RAVENNA, RAVENNA supports a profile of AES67 as a result [20]. RAVENNA is an audio-over-IP protocol particularly popular in the broadcast world. Its place as the broadcast standard comes from its flexibility: it can transport a multitude of different data formats and sampling rates for both audio and video, with low latency and support for WAN connections [21]. So as technology improves, new protocols keep being made to accommodate the new advances, but one starts to wonder: why don’t the standards themselves just get revised, instead of making products chase an ever-changing industry? AES67 partly addresses this by using the latest IEEE and IETF standards, but maybe the solution runs deeper than that. That is exactly the thinking behind the creation of AVB.

AVB stands for Audio Video Bridging, and it differs from Dante on a fundamental level because it is a Data Link (Layer 2) protocol, whereas Dante is a Network (Layer 3) protocol. Since these standards operate at Layer 2, a switch must be designed for AVB in order to be compatible with the standards at that fundamental level. This brings in an OSI Model conceptualization of switches designed for a Layer 2 implementation versus a Layer 3 implementation. In fact, the concept behind AVB stemmed from the need to “standardize” audio-over-IP so that devices from different manufacturers could talk to each other. Dante, being owned by a company, requires specific licensing for devices to be “Dante-enabled.” The IEEE wanted AVB’s standards to ensure compatibility across all devices on the network regardless of manufacturer. AVB-compatible switches, however, have been notoriously more expensive, often by orders of magnitude, than a common, run-of-the-mill TCP/IP switch, so cost has often been seen as a roadblock to AVB deployments: replacing an infrastructure of more common (read: cheaper) Layer 3 switches with Layer 2 AVB-compatible (read: more expensive) switches adds up.

When talking about most networking protocols, especially AVB, the discussion dives into layers and layers of standards and revisions. AVB in and of itself refers to the IEEE 802.1 set of standards along with others outlined in IEEE 1722 and IEEE 1733 [22]. I know all this talk of IEEE standards gets really confusing, so it is helpful to remember that there is a hierarchy to it all. In an AES White Paper by Axel Holzinger and Andreas Hildebrand with a very long title, “Realtime Linear Audio Distribution Over Networks A Comparison of Layer 2 And 3 Solutions Using The Example Of Ethernet AVB And RAVENNA,” they lay out the four AVB protocols in 802.1 [23]:

  • IEEE 802.1AS – timing and synchronization (gPTP)
  • IEEE 802.1Qat – Stream Reservation Protocol (SRP)
  • IEEE 802.1Qav – forwarding and queuing for time-sensitive streams
  • IEEE 802.1BA – Audio Video Bridging systems

Since AVB lives at Layer 2, it’s important to stop and go over some new terminology for devices in an AVB domain. Instead of talking about a network, senders, receivers, and switches, we replace those terms, respectively, with domain, talkers, listeners, and bridges [24].

An example of a basic AVB network

IEEE 802.1AS is basically an AVB-specific profile of the IEEE 1588 standards for PTPv2. One edition of this standard, IEEE 802.1AS-2011, introduces gPTP (or “generalized PTP”). When used in conjunction with IEEE 1722-2011, gPTP introduces a presentation time for media data, which indicates “when the rendered media data shall be presented to the viewer or listener” [25]. What I have learned from all this research is that the IEEE loves nesting new standards within other standards like a convoluted Russian doll. The Stream Reservation Protocol (SRP, also known as IEEE 802.1Qat) is the key feature that sets AVB apart from other network protocols, because it allows endpoints in the network to check routes and reserve bandwidth: SRP “checks end-to-end bandwidth availability before an A/V stream starts” [26]. This ensures that data won’t be sent until stream bandwidth is available, and it lets the endpoints decide the best route to take through the domain. In a Dante deployment, daisy-chaining additional switches increases overall network latency with each hop added, which can force a reevaluation of the entire network topology. Dante latency is set per device and depends on the size of the network, but with AVB, thanks to SRP and its QoS improvements, the bandwidth reservation gets announced through the network and latency is kept low even in large deployments.
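To make the idea of end-to-end reservation concrete, here is a toy admission check in the spirit of SRP. This is not the actual MSRP message exchange; it just models the rule that a stream is admitted only if every hop along the route can commit the requested bandwidth (the 75% cap mirrors AVB’s limit on reservable traffic per link):

```python
def reserve_stream(route_links, bandwidth_mbps, max_fraction=0.75):
    """Toy model of SRP-style admission control: a stream is admitted only
    if every bridge hop on the route can commit the requested bandwidth.

    route_links maps link name -> (capacity_mbps, already_reserved_mbps).
    """
    # First pass: check every hop before touching anything.
    for name, (capacity, reserved) in route_links.items():
        if reserved + bandwidth_mbps > capacity * max_fraction:
            return False, f"refused: not enough bandwidth on {name}"
    # All hops can commit: lock in the reservation before any audio flows.
    for name, (capacity, reserved) in route_links.items():
        route_links[name] = (capacity, reserved + bandwidth_mbps)
    return True, "reserved end-to-end"

links = {"talker->bridge": (1000, 0), "bridge->listener": (1000, 700)}
ok, msg = reserve_stream(links, 100)  # 700 + 100 exceeds 75% of the second hop
```

Because the check happens before any media is sent, a refused stream simply never starts, rather than degrading everything already on the wire, which is the behavior that keeps latency predictable in large AVB deployments.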

The robustness and fast communication of AVB networks, along with their ability (as the name implies) to carry audio, video, and data on the same network, have made them more common. The problem with all these network protocols follows the logic of Moore’s Law. If you couldn’t tell from all the revisions of IEEE standards I have been listing, these technologies improve and get revised very quickly. Because technology advances at a blinding pace, it’s no wonder that gear manufacturers haven’t been able to “settle” on a common standard the way they settled on, say, the XLR cable. This is where the newest addition to the onslaught of protocols comes in: Milan.

The standards of AVB kept developing, just like the revisions of IEEE 1588, and have led to the latest development in AVB technology: Milan. Developed in collaboration with some of the biggest names in the business, Milan is a subset of standards within the overarching protocol of AVB. Among other features, Milan includes a primary/secondary redundancy scheme like Dante’s, which was not available in previous AVB networks. The key here is that Milan is an open specification, meaning manufacturers can develop their own implementation of Milan specific to their gear as long as it follows the outlined standards [27]. This is pretty huge if you consider how many different networking protocols are used across different pieces of gear in the audio industry. Avnu Alliance, the organization of collaborating manufacturers that developed Milan, has put together a series of specifications under the idea that any product released with a “Milan-ready” certification, or a badge of that nature, will be able to talk to any other over a Milan network [28].


A Note On OSC And The Future

Before we conclude our journey through the world of networking, I want to take a minute for OSC. Open Sound Control, or OSC, is an open-source communications protocol originally designed for electronic musical instruments that has expanded to streamline communications for everything from controlling synthesizers, to connecting movement trackers with software programs, to controlling virtual reality [29]. It is not an audio transport protocol; like MIDI, it is used for device communication, except that unlike MIDI it is IP-based. I think this is a great place to end, because OSC is a great example of the power of open-source technology. The versatility of OSC and its open-source nature have allowed programs small and large to implement the protocol, and it is a testament to how workflows improve when everyone (i.e., open source) has the ability to contribute changes that make things better. We’ve spent this entire blog talking about the many different standards implemented over the years to improve upon previous technology. Yet progress gridlocks, mostly because by the time a standard actually gets enacted, the technology has already surpassed it.
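To show just how lightweight OSC is, here is a minimal sketch that builds an OSC 1.0 message by hand, following the published encoding rules (null-padded strings, a comma-prefixed type tag, big-endian arguments). The address pattern, IP, and port below are hypothetical; in practice you would usually reach for a library rather than packing bytes yourself.

```python
import struct

def osc_pad(b):
    """OSC strings are null-terminated and padded to 4-byte boundaries."""
    b += b"\x00"
    while len(b) % 4:
        b += b"\x00"
    return b

def osc_message(address, *floats):
    """Build a minimal OSC 1.0 message with float32 arguments only --
    a sketch of the wire format, not a full implementation."""
    msg = osc_pad(address.encode())
    msg += osc_pad(("," + "f" * len(floats)).encode())  # type tag string
    for f in floats:
        msg += struct.pack(">f", f)  # arguments are big-endian
    return msg

# Send a fader value to a (hypothetical) OSC receiver over UDP:
packet = osc_message("/ch/01/fader", 0.75)
# import socket
# socket.socket(socket.AF_INET, socket.SOCK_DGRAM).sendto(packet, ("10.0.0.20", 9000))
```

The whole message is a couple dozen bytes over plain UDP, which is a big part of why so many programs, from tiny art projects to large consoles, have been able to adopt it.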


So maybe it’s time for something different.

Maybe the open nature of Milan and OSC is the way of the future. If everyone can put their heads together to develop specifications that are fluid and open to change, rather than restricted by the rigidity of bureaucracy, maybe hardware will finally be able to keep up with the pace of the minds of the people using it.


[1] https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo


[3] https://www.itprc.com/how-encapsulation-works-within-the-tcpip-model/

[4] https://youtu.be/9glJEQ1lNy0

[5] https://www.learncisco.net/courses/icnd-1/building-a-network/tcpip-transport-layer.html

[6] https://www.iol.unh.edu/sites/default/files/knowledgebase/1588/ptp_overview.pdf

[7] https://www.aes.org/e-lib/browse.cfm?elib=16146 (pages 1-2)

[8] https://www.nist.gov/system/files/documents/el/isd/ieee/tutorial-basic.pdf

[9] https://www.aes.org/e-lib/browse.cfm?elib=16146 (page 5)

[10] https://en.wikipedia.org/wiki/Precision_Time_Protocol



[13] ibid.







[20] ibid.


[22] Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

[23] https://www.aes.org/e-lib/browse.cfm?elib=16147

[24] ibid.

[25] https://www.aes.org/e-lib/browse.cfm?elib=16146 (page 6)

[26] Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

[27] https://avnu.org/wp-content/uploads/2014/05/Milan-Whitepaper_FINAL-1.pdf (page 7)


[29] http://opensoundcontrol.org/osc-application-areas



Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 5: IP Encapsulation [Video]. YouTube.


Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 8: ARP [Video]. YouTube. https://www.youtube.com/watch?v=x4l8Q4JwtXQ

Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 23: Advanced Clocking [Video]. YouTube.


Audinate. (2019, December). The Relationship Between Dante, AES67, and SMPTE ST 2110 [White paper]. Uploaded to Scribd. Retrieved from


Audinate. (n.d.). History. https://www.audinate.com/company/about/history

Audinate. (n.d.). Networks and Switches.


Avnu Alliance. (n.d.). Avnu Alliance Test Plans and Specifications.


Bakker, R., Cooper, A. & Kitagawa, A. (2014). An introduction to networked audio [White paper]. Yamaha Commercial Audio. Retrieved from


Cambium Networks Community [Mark Thomas]. (2016, February 19). IEEE 1588: What’s the difference between a Boundary Clock and Transparent Clock? [Online forum post]. https://community.cambiumnetworks.com/t5/PTP-FAQ/IEEE-1588-What-s-the-difference-between-a-Boundary-Clock-and/td-p/50392

Cisco. (n.d.) Layer 3 vs Layer 2 Switching.


Crash Course. (2020, March 19). Computer Science [Video Playlist]. YouTube. https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo

Eidson, J. (2005, October 10). IEEE 1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems [PDF of slides]. Agilent Technologies. Retrieved from


Garner, G. (2010, May 28). IEEE 802.1AS and IEEE 1588 [Lecture slides]. Presented at Joint ITU-T/IEEE Workshop on The Future of Ethernet Transport, Geneva 28 May 2010. Retrieved from https://www.itu.int/dms_pub/itu-t/oth/06/38/T06380000040002PDFE.pdf

Holzinger, A. & Hildebrand, A. (2011, November). Realtime Linear Audio Distribution Over Networks A Comparison Of Layer 2 And Layer 3 Solutions Using The Example Of Ethernet AVB And RAVENNA [White paper]. Presented at the AES 44th International Conference, San Diego, CA, 2011 November 18-20. Retrieved from https://www.aes.org/e-lib/browse.cfm?elib=16147

Johns, Ian. (2017, July). Ethernet Audio. Sound On Sound. Retrieved from https://www.soundonsound.com/techniques/ethernet-audio

Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

Laird, Jeff. (2012, July). PTP Background and Overview. University of New Hampshire InterOperability Laboratory. Retrieved from


LearnCisco. (n.d.). Understanding The TCP/IP Transport Layer.


LearnLinux. (n.d.). ARP and the ARP table.


Luminex. (2017, June 6). PTPv2 Timing protocol in AV networks. https://www.luminex.be/improve-your-timekeeping-with-ptpv2/

Milan Avnu. (2019, November). Milan: A Networked AV System Architecture [PDF of slides].

Mullins, M. (2001, July 2). Exploring the anatomy of a data packet. TechRepublic. https://www.techrepublic.com/article/exploring-the-anatomy-of-a-data-packet/

Network Engineering [radiantshaw]. (2016, September 18). What’s the difference between Frame, Packet, and Payload? [Online forum post]. Stack Exchange.


Opensoundcontrol.org. (n.d.). OSC Application Areas. Retrieved August 10, 2020 from http://opensoundcontrol.org/osc-application-areas

Perales, V. & Kaltheuner, H. (2018, June 1). Milan Whitepaper [White Paper]. Avnu Alliance. https://avnu.org/wp-content/uploads/2014/05/Milan-Whitepaper_FINAL-1.pdf

Precision Time Protocol. (n.d.). In Wikipedia. Retrieved August 10, 2020, from https://en.wikipedia.org/wiki/Precision_Time_Protocol

Presonus. (n.d.). Can Dante enabled devices exist with other AVB devices on my network? https://support.presonus.com/hc/en-us/articles/210048823-Can-Dante-enabled-devices-exist-with-other-AVB-devices-on-my-network-

Quine, A. (2008, January 27). How Encapsulation Works Within the TCP/IP Model. IT Professional’s Resource Center.


Quine, A. (2008, January 27). How The Transport Layer Works. IT Professional’s Resource Center. https://www.itprc.com/how-transport-layer-works/

RAVENNA. (n.d.). AES67 and RAVENNA In A Nutshell [White Paper]. RAVENNA. https://www.ravenna-network.com/app/download/13999773923/AES67%20and%20RAVENNA%20in%20a%20nutshell.pdf?t=1559740374

RAVENNA. (n.d.). What is RAVENNA?


Rose, B., Haighton, T. & Liu, D. (n.d.). Open Sound Control. Retrieved August 10, 2020 from https://staas.home.xs4all.nl/t/swtr/documents/wt2015_osc.pdf

Shure. (2020, March 20). Dante And AES67 Clocking In Depth. Retrieved August 10, 2020 from https://service.shure.com/s/article/dante-and-aes-clocking-in-depth?language=en_US

Weibel, H. & Heinzmann, S. (2011, November). Media Clock Synchronization Based On PTP [White Paper]. Presented at the AES 44th International Conference, San Diego, CA, 2011 November 18-20. Retrieved from https://www.aes.org/e-lib/browse.cfm?elib=16146

Basic Networking For Live Sound Engineers

Part Two: Designing A Network*

Read Part One Here

This blog is dedicated to Sidney Wilson. You make electronics so cool.

The Road To Data

In my last blog, “Basic Networking For Live Sound Engineers: Part 1 Defining A Network,” we delved deep into what creating a network entails, from understanding IP addresses and subnet masks on a binary level to connecting a laptop to a network to talk to a piece of gear. Now that we have laid a foundational knowledge and vocabulary of networking, we can move on to how we put it all together to construct a network for practical applications in the world of live sound. The last blog covered the basic structures of point-to-point transmission and ended by incorporating switches and routers to add another level of complexity to our signal flow. In this blog, we are going to put on our network system designer hats as well as our engineering hats and think about what we are trying to accomplish with a network, in order to determine how we should build it, how we should divide it, and what level of redundancy we wish to build into our design.

From The Abstract

It is about time we introduced the OSI Model into our discussion of networking, because in this blog, and especially in the next one, it is going to keep coming up to help us grasp networking signal flow on a conceptual level. The OSI Model, or “Open Systems Interconnection Model” [1], is a conceptual model educators use to break networking down into a hierarchy of 7 “levels of abstraction,” to borrow a term from Carrie Anne Philbin’s “Crash Course Computer Science” tutorials on YouTube [2]. (Sidebar: if you want to know more about how computers work, watch her video series because it’s amazing.)

The 7 Layers of the OSI Model


Let’s briefly break this down, starting from the Physical layer and moving upward.

  • Physical – the literal cable you use to plug one device into another, plus the binary bits or electrical signals that comprise the data being moved around.
  • Data Link – the Lifewire article by Bradley Mitchell explains how this layer gets further subdivided into “Logical Link Control” and “Media Access Control” sublayers; it is the “gatekeeper” that verifies data before it gets packaged [1].
  • Network – where data generally gets packaged, and where the management involved in IP addressing falls.
  • Transport – if the packages in the Network layer were cars, this is where all the highways lie. Network protocols tend to fall here, but we will see in the next blog that it depends.
  • Session – I like to think of this like a session in your favorite digital audio workstation: we start putting these different highways and lower layers together, like taking a bunch of audio tracks from different recordings and putting them together in one workspace.
  • Presentation – the methods that dictate how data is going to be conveyed to the end user at the Application layer.
  • Application – the highest “level of abstraction,” and what the end user engages with; by that I mean it is the most familiar way that we log in to a network.

From now on, as we go through different aspects of our network design, we will refer back to the OSI Model to situate these concepts in the greater picture. Why? Because this is how we will think about the different steps of conceptualization that our network design needs to address (at least on some level) in order to work. The important thing to remember is that even though we have all this granularity available to visualize our network, manufacturers have put A LOT of money and research into making some of these layers simple to implement, so that you (hopefully) don’t have to worry about them too much.

Down To The Wire

Now that our brains are primed with these levels of abstraction, let’s talk about what cabling we can use for our network. In most networking applications, there are two major categories of cabling you are likely to encounter: copper and fiber. In the copper world, we often hear the terms “Ethernet,” “RJ45,” “Cat5,” “Cat5e,” and “Cat6” thrown around interchangeably as common types of network cabling, often as misnomers for what they ACTUALLY refer to.

The term “Ethernet” doesn’t actually refer to a type of cable at all; it refers to a protocol, 802.3, as defined by the Institute of Electrical and Electronics Engineers (the IEEE, remember them from last time?) [3]. As mentioned in this Linksys article, Ethernet refers to “the most common type of Local Area Network (LAN) used today” [3]. (See how it’s all coming back around?) The most common types of cabling used for Ethernet are the Cat5, Cat5e, and Cat6 specifications, the number referring to the generation of the cable [4]. The biggest difference between these three specifications is the bandwidth and speed each can handle, which comes down to how the twisted pairs are wound inside the cable. The twisted pairs in Cat6 cabling are more tightly wound, which allows it to support higher bandwidths at higher transmission frequencies. This is also why how you coil these cables matters so much: they lose efficiency if the twisted pairs become “unwound,” which is a major drawback for the longevity of the cable and part of why it was originally intended for fixed installation. There are also stranded versus solid-core versions of each cable; the advantage of solid core is that it can transmit longer distances, but it is also more susceptible to breakage.

Cat5, Cat5e, and Cat6 cable all contain four twisted pairs of conductors (hence the 8-pin connector) and come in UTP (Unshielded Twisted Pair) and STP (Shielded Twisted Pair) versions. The idea is that a shielded twisted pair is less susceptible to outside interference, but shielding definitely ups the price point on the cable and MAY not be necessary depending on the application. For example, manufacturers often recommend shielded Cat5e or Cat6 cable for snakes for certain audio consoles to limit interference, but would that really be necessary for a basic network installation in a home? Below is a table listing the major differences between Cat5, Cat5e, and Cat6 [5].

Cat5
  • Transfers data up to 100Mbps
  • Supports bandwidth up to 100MHz (conductors look less twisted)
  • Antiquated

Cat5e
  • Transfers data up to 1Gbps
  • Supports bandwidth up to 100MHz
  • Most common
  • Reduced near-end crosstalk

Cat6
  • Transfers data up to 10Gbps
  • Longitudinal separator inside between twisted pairs
  • Supports bandwidth up to 250MHz (conductors will look more twisted)
  • Reduced near-end crosstalk

If you look at the jacket of a copper networking cable, you will probably see a marking listing one of these specifications. The 8-pin connector on the end of the cable is referred to as an RJ45 connector, or “registered jack” [6], and is the most common networking plug.

The end of a Cat6 patch cable with RJ45 connector. Notice the 8 conductors lined up with the 8 pins at the end.

Another major drawback of copper cabling, besides the danger of the twisted pairs becoming “unwound” over time, is the length restriction: all three types are only rated to go a maximum of 100 meters, or roughly 330 feet, before needing a repeater or something else to boost the signal again. This is where fiber wins by a long shot.

The other major transport medium for data transmission involves converting the ones and zeros into light using a transceiver on each end and sending it over fiber optic cabling. Fiber cabling is composed of a single strand (or multiple strands) of glass or plastic roughly the diameter of a human hair [7]. The biggest advantage of fiber is its ability to go very long distances (depending on whether it is singlemode or multimode fiber) with very little loss, very quickly; at the speed of light, in fact. The difference between singlemode and multimode fiber comes down to the thickness of the fiber core itself and how the light (which IS the data) bounces around as it travels through the cable. In multimode fiber, the core is larger, and because it is larger, the light bounces around the inside of the fiber more often. As the Fiber Optic Association points out, the light travels “the core in many rays, called modes” [7]. These “refractions” inside the core cause some loss of the light over distance, which makes multimode relatively less efficient at traveling longer distances.

Singlemode vs Multimode fiber (including Grated-index and Step-index)

Singlemode fiber, on the other hand, has a significantly smaller core, which forces the light to travel in “only one ray (mode)” [7], allowing the signal to travel very long distances; we’re talking kilometers. This is the type of fiber that might be used by your television company to send signals between cities. The catch with singlemode fiber is that besides being expensive, it is also more delicate. It’s important to make the distinction here that the terms “singlemode” and “multimode” refer to the diameter/construction of the fiber core itself, NOT the number of strands in the fiber cable. There are military or “tactical grade” fiber cables with multiple strands of fiber in them, like TAC-6 or TAC-12, where the name refers to the number of strands in the cable (6 and 12, respectively), and a TAC-6 or TAC-12 cable can come in either singlemode or multimode flavors. In the majority of live sound applications, you will be dealing with multimode fiber, but before we move on, I want to make an important distinction about different types of fiber connectors.

The most common fiber connectors for live sound applications include LC and SC (in single or duplex versions), and HMA or expanded beam connectors. SC connectors are a snap-in connection with a 2.5mm ferrule, while LC is half the size with a 1.25mm ferrule [8]. These connectors are commonly seen in networking racks, or running from panels to stage racks as small yellow jumpers. They are cheap and, thus, delicate; they can easily break if mishandled. The Neutrik opticalCON DUO cable [9] is based on LC-Duplex connectors, but its rugged build makes the connections more durable for the trials of live sound. Yet there is an important distinction here, because these types of connectors care a lot more about alignment than an expanded beam connection does.

From left to right: L-Com SC-SC singlemode fiber cable [10], Belkin multimode fiber optic cable LC/LC duplex MMF [11], Neutrik opticalCON Duo [9], & QPC QMicro Expanded Beam Fiber optic connector [12] (I do not own the rights to these photos, for educational purposes only)

Once upon a time, in a world where we still did gigs on a regular basis, Sidney Wilson (the operations manager at Hi-Tech Audio in Hayward, California) sat down with me at the end of a day to explain how fiber optics work. I was at Sound On Stage at the time, and our shop was just a stone’s throw from the Hi-Tech shop, so I went over after hours one day to ask him to teach me about fiber because, at the time, I knew nothing about it. He talked to me about the difference between the opticalCON-type fiber connectors and the HMA or expanded beam fiber connections. It comes down to the end of the fiber strand. On the SC and LC type connections, the end of the fiber is cut so that when you mate the connection, the alignment must be dead on in order to pass the light through. An HMA or expanded beam connection, on the other hand, has a ball-shaped lens on the connector that magnifies the light coming from the thin strand [12]. This makes the connection more “forgiving” in terms of alignment, since there is a greater surface area for contact. Consequently, it also makes the connector more tolerant of the daily abuse of mating connections in the touring audio world, especially with a rugged, military-grade shell. The trade-off is that there is SOME amount of loss due to the magnification of the lens.

A simplified illustration comparing the mating of these two types of fiber ends. My attempt at recreating the napkin drawing Sidney originally drew to explain this to me.

So, as always, it comes down to application and, admittedly, the price tag. Leaving a box’s worth of Cat5e in a trench after a long corporate gig costs orders of magnitude less than abandoning a single run of fiber after an event. Either way, whether we go with copper Cat5e cable or multimode HMA fiber, these transport mediums belong to the Physical layer of the OSI model, and deciding what to use for a given application is part of the basic decision-making in a network design.

“Papa, can you hear me?” → Message Transmission and Time

In the previous blog, I introduced the difference between unicast and multicast in the TCP/IP protocol. We are now going to dig deeper and talk about how data gets transmitted, specifically in relation to time. First, let’s talk about the process called encapsulation. At the most basic level, a data packet is composed of a header and a body, and pieces get added and/or stripped at different steps of the encapsulation process. In an article by Oracle, “the packet is the basic unit of information transferred across a network, consisting, at a minimum, of a header with the sending and receiving hosts’ addresses, and a body with the data to be transferred” [13]. You can visualize the data encapsulation process of the TCP/IP protocol stack as a consolidated version of the OSI model.
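To make encapsulation concrete, here is a toy sketch in Python. The header layout (two 4-byte addresses and a length field) is invented purely for illustration, not any real protocol’s format, but the add-a-header-on-the-way-down, strip-it-on-the-way-up motion is the same idea each layer performs.

```python
import struct

# A toy "packet": a fixed header (source, destination, payload length)
# followed by the body. The field layout is invented for illustration;
# real headers (IP, UDP, etc.) define their own fields and widths.
HEADER = struct.Struct("!4s4sH")  # src (4 bytes), dst (4 bytes), length (2 bytes)

def encapsulate(src: bytes, dst: bytes, body: bytes) -> bytes:
    """Prepend a header to the body, as each layer does on the way down."""
    return HEADER.pack(src, dst, len(body)) + body

def decapsulate(packet: bytes):
    """Strip the header back off, as the receiving stack does on the way up."""
    src, dst, length = HEADER.unpack(packet[:HEADER.size])
    return src, dst, packet[HEADER.size:HEADER.size + length]

pkt = encapsulate(b"\xc0\xa8\x01\x0a", b"\xc0\xa8\x01\x14", b"hello, audio")
src, dst, body = decapsulate(pkt)
```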

The TCP/IP Model looks like an abbreviated version of the OSI Model


At the Transport layer, depending on whether the packet uses the UDP or TCP protocol, the way data is passed along changes with respect to accuracy and error checking. TCP, or Transmission Control Protocol [14], requires the two endpoints of a transmission to acknowledge each other before passing data. In contrast, UDP, or User Datagram Protocol [15], does not perform this “handshake” when delivering packets and is widely used by audio-over-IP and higher-level protocols such as Dante. But why wouldn’t we want TCP’s error checking since, after all, we need our data to be accurate? The problem is that checking for errors takes time. Live, real-time audio applications require low-latency signal paths: a singer belting into a mic on a video screen while the audience hears the audio significantly later generally doesn’t fly. If packets get lost or arrive at inconsistent times, this creates jitter in the data stream. So instead of choosing a protocol that goes back and “checks” that all the data arrived, with UDP we choose the path of least time resistance, with the caveat that we had better make sure it gets there. This is why QoS settings for UDP data transmission are very important.

If we set up a device, let’s say a managed switch, that will be dealing with UDP data transmission, we need to dive into the device’s administrative settings and configure (or at least verify) that priority in the data transmission will be given to our time-sensitive data. QoS, or Quality of Service, refers to the management of bandwidth to prioritize certain data traffic over other traffic. One example is DSCP, or Differentiated Services Code Point, which tags the packet header at the Network layer (in the OSI model) to prioritize that data along the transmission path [16]. If the network encounters a situation in which there is not enough bandwidth to pass all the data, the data without the priority tag gets queued until there is sufficient bandwidth, or it gets dropped first in favor of the higher-priority data [16]. For example, if you set up a classic Cisco SG300-10 managed switch for Dante, part of the setup process is logging in to the administrator settings and setting specific DSCP flags to prioritize the data used for Dante over all other general network traffic. Once we start delving into advanced settings such as QoS, we really have to keep in mind the overall picture of the network’s function. What is this data network going to be used for? Will we have other traffic, like Internet traffic, traveling alongside our audio signal? The capabilities of advanced networking allow us to accommodate all kinds of needs, as long as we build and implement the network design properly.
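A small sketch of the arithmetic behind those DSCP flags: DSCP is a 6-bit value that occupies the upper bits of the IP header’s TOS byte, so a DSCP class maps to a TOS value by shifting left two bits. The specific values in the comments (46/EF for audio, 56/CS7 for clocking) are the ones commonly cited for Dante setups; confirm against your own gear’s documentation before relying on them.

```python
# DSCP lives in the top six bits of the IP TOS byte, so converting a DSCP
# class to a TOS value is a left shift by two. Example classes often cited
# for Dante: 56 (CS7) for PTP clocking, 46 (EF) for audio.
def dscp_to_tos(dscp: int) -> int:
    if not 0 <= dscp <= 63:
        raise ValueError("DSCP is a 6-bit value (0-63)")
    return dscp << 2

# On a host (rather than a switch), you could then request this marking with:
#   sock.setsockopt(socket.IPPROTO_IP, socket.IP_TOS, dscp_to_tos(46))
```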

Virtual Network Division (Boss-level)

One approach to taking a variety of network information and funneling it through to its various destinations is to utilize VLANs and trunks. VLAN stands for “Virtual Local Area Network” and is basically what the name describes: a way of creating a separate network that exists inside a greater network without having to separate anything physically. This is done at the Data Link layer by assigning certain ports on a managed switch to carry only certain broadcast domains. Here’s an example: say you have a network with two 10-port managed switches (one at either end), and you want Ports 1-4 to carry a VLAN (or multiple VLANs!) dedicated to the control network running your favorite amplifier-control software, and Ports 5-8 to carry a VLAN (or multiple VLANs!) with all the audio-over-IP data. For the intentions of your network, you do not want these data streams to cross. By setting the switches up this way, you can use Ports 1-4 to plug in your laptop on one end and talk to the amplifiers on Ports 1-4 of the switch at the other end. Other devices, say an audio console, can plug in anywhere on Ports 5-8 to pick up the dedicated network that the stage rack is plugged into on Ports 5-8 of the switch at the other end. This is a great way of managing a large network to make sure different devices don’t cross paths, but great care must be taken to implement the correct settings and plug devices into the right ports in order to avoid a broadcast storm.

So how do all these separate VLANs get carried between the switches? It would rather defeat the purpose of VLANs to run separate cables between the switches for each group of ports. This is where trunking saves the day. Trunking is the process of dedicating specific ports as “transport vehicles” that carry the traffic of all the VLANs. Think of a trunk as the data version of a multicore snake, carrying all the different, separated VLANs like the separate copper conductors of an analog snake. These are the connections you want to make between the managed switches. Be warned that, generally, all network data travels through these ports, so if you plug something into a trunk port that only wants to see traffic from a single VLAN, it probably won’t be too happy about it. Here is where, as network designers, we can start harnessing the real power of our network. Some managed switches have SFP ports that allow for fiber connections using a special transceiver that converts data to light (and vice versa). Going back to our previous example, if Ports 9 and 10 are SFP ports and we set them up as trunks, we can run fiber for our cable path between switches and carry all our VLANs via that fiber connection. If you then consider multicore fiber cables such as the TAC-6 or TAC-12 mentioned earlier, where each fiber contains a trunk that carries multiple VLANs, it’s easy to see how the capabilities of our network quickly scale by orders of magnitude with these advanced setups. Now that we have seen conceptually how to divide our network topology using VLANs and trunking, let’s take a step outward and see how we can divide it on a physical level.
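The port rules above can be modeled in a few lines of Python. This is a toy model, with port numbers and VLAN IDs invented for illustration: access ports pass only their own VLAN’s traffic, while trunk ports carry everything.

```python
# A toy forwarding rule for the two-VLAN switch described above.
ACCESS = {1: 10, 2: 10, 3: 10, 4: 10,   # VLAN 10: amp control network
          5: 20, 6: 20, 7: 20, 8: 20}   # VLAN 20: audio-over-IP
TRUNKS = {9, 10}                         # SFP trunk ports carry all VLANs

def may_forward(in_port: int, out_port: int) -> bool:
    """Traffic entering an access port belongs to that port's VLAN and may
    only leave via the same VLAN's access ports or a trunk."""
    vlan = ACCESS.get(in_port)
    if out_port in TRUNKS:
        return True                       # trunks carry every VLAN
    return ACCESS.get(out_port) == vlan   # access ports must match VLANs
```

Plugging the laptop into Port 1 and an amp into Port 3 works (`may_forward(1, 3)` is true), but the control VLAN can never leak out of an audio port (`may_forward(1, 5)` is false).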

Physical Network Division And Topologies

If you imagine a stage plot for a typical band and try to draw cable paths for all the snakes and sub-snakes for each performer’s world, how you connect the stage boxes to one another and/or to the main snakehead will affect what happens if one of the cables fails. The same concept applies when thinking about networks and how host devices, or nodes, connect to one another. In most live sound applications, there are four basic network topologies that you will encounter on a regular basis: daisy-chain, ring, star, and hybrid.

In a daisy-chain topology, we loop nodes from one device to the next in series. This is the simplest network to set up, as it basically just involves connecting one device to another and then another and so on. Remember that the majority of network protocols implement a two-way road, so devices send and receive data back and forth on one cable. The problem with daisy-chaining your devices is that if one device goes down, it can take out your whole network, depending on where it sits in the signal path. It also adds more and more overall network latency as you go from one device to the next, since each node counts as another hop in the network. In the example below, Console A is connected to Switch A, then to Rack A, and on to Rack B. If Rack A fails, or a cable between Rack A and B fails, then Rack B gets taken down too because it is “downstream” of Rack A.


An example of a daisy-chain topology


If Rack A and Rack B each had a separate connection to Switch A, then if one failed, the other would still have a connection to the console.

In a star topology, one node acts as a hub that the other nodes branch off of. This carries less risk of one node failing and taking down the whole network. It has the disadvantage of using more cabling, but unless the node acting as the hub goes down, it is far more resilient to individual host failures than the daisy-chain topology. In this example, we have connected a main switch in the rack to a series of networkable mic receivers. Instead of running a network cable to one receiver and flowing through to daisy-chain them together, we have run a separate cable from a discrete port on the switch to each receiver. Now if one receiver dies, regardless of where it is, we still have a network connection to the rest.


An example of a star topology


This also has the added advantage that the only network hop is from the hub device to the end node (or in this case, receiver). By using a combination of star and daisy-chain topology we have even more options.

A hybrid topology combines several of these methods within the same network. This is often necessary when you are incorporating devices with limited network ports, making cable runs more efficient, or lowering latency on big network deployments. Let’s say you are at a corporate event with a console at FOH, a stage rack in video world, two stage racks in monitor world for the band inputs, and a rack in A2 world for the wireless microphone receiver inputs. One possible hybrid solution is to daisy-chain the two stage racks in monitor world from one to the other, then connect them to a switch that talks to both consoles in a star. That “master switch” then talks to a switch in A2 world that has one port used by the daisy-chained wireless receivers and another port to the stage rack in video world, because it is so close by.

An example of a hybrid topology in a network deployment

Now the “failure point” of this system is that if the switch in monitor world that acts as the hub for everything goes down, the whole network pretty much goes down with it. One possible solution would be to run a separate network connection from FOH to the switch in A2 world, since the monitor engineer may only be there for the band portion of the event. It all comes down to designing the network with the fewest possible failure points. As the joke goes in the world of audio: you can have it cheap, fast, or good; pick two.

Another network topology worth mentioning here is the ring. A ring network consists of devices that are each connected to two neighboring devices. In the world of live sound, console manufacturers often use this as a way for the console to always have one connection to a stage rack even if one of the two snake runs fails. In this example, the FOH and monitor consoles share one stage rack in a ring. Each device, or node, has an “A” network connection and a “B” network connection. To create the ring, cables make each connection as seen below: FOH port B to Stage Rack port A, Stage Rack port B to Monitor port A, and lastly back around from Monitor port B to FOH port A.

An example of a ring topology

Even if, say, the connection from FOH B to Stage Rack A somehow failed, FOH is still connected to Stage Rack B by way of the monitor desk, so the connection remains.

Daisy-chain, star, hybrid, and ring are very common network topologies in the world of live sound, but there are others, such as mesh networks, that can be useful too, especially in wireless applications. When you are designing your network, it’s important to think about how to make the system efficient given your situation’s requirements and available resources, how to avoid accumulating latency, and what level of redundancy you need the network to provide.
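One way to reason about these failure modes is a simple reachability check. The Python sketch below (device names invented for illustration) models cable runs as edges in a graph and asks which nodes survive a cut: in the daisy chain, losing the first cable strands everything downstream, while the ring still reaches the stage rack the long way round.

```python
from collections import deque

def reachable(links, start):
    """Breadth-first search over a list of (a, b) cable runs; returns the set
    of nodes that can still be reached from `start`."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in graph.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

daisy = [("FOH", "RackA"), ("RackA", "RackB")]
ring = [("FOH", "Rack"), ("Rack", "MON"), ("MON", "FOH")]

# Cut FOH-RackA in the daisy chain: both racks are gone.
daisy_cut = reachable([("RackA", "RackB")], "FOH")
# Cut FOH-Rack in the ring: the rack survives via the monitor desk.
ring_cut = reachable([("Rack", "MON"), ("MON", "FOH")], "FOH")
```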

Redundancy In The World Of Live Environments

Sidney Wilson also once pointed out to me that the level of redundancy we choose to abide by in the world of live sound differs from the expectations of redundancy in enterprise-level network applications. Let’s talk about the concepts of primary and secondary networks. As you might guess, the primary network is the main path of data transmission, while the secondary network is your back-up in case something happens to the primary. This can range from devices capable of maintaining two internally separated networks to two entirely separate rigs, consoles and all, in case the primary goes down. In an enterprise-level network installation, they might run separate cables down completely separate paths of the building so the network stays up if one cable fails. Yet in the world of live sound, and especially in touring applications, how often do we run two separate cable paths for the audio snake to FOH, one for the primary run and one for the secondary? If it is important enough, you might be able to run the snakes on two separate paths. But if you are at a music festival where there is one snake path for everyone because of cable jackets and safety precautions, the chances of being able to do that are pretty close to nil. So, like everything in the live entertainment industry, it is a game of compromise.

What’s really cool is that you can apply this concept of redundancy at almost every level of the OSI model, and technology keeps improving to give us more failsafes in our network designs. On one end of the spectrum, you can have physically separate cable runs and/or systems for a primary and secondary network, where if one fails, someone literally unplugs the main data stream and plugs into the secondary network. There are also protocols that implement redundancy with “automatic” switchovers: if the primary network fails, the data switches almost instantaneously to the secondary network. This includes Dante and AVB networks with Milan.

If you’ve made it this far, congratulations! Thank you for sticking with me through these first two blogs, from explanations of binary to this extensive discussion of network cable. If you’ve read the last blog and this one, my hope is that you can combine the knowledge from the two and start conceptualizing how all these pieces work together in the world of live sound. Now that we have established this basis on which to talk about networking, the next blog will advance into networking protocols such as AVB and Dante, where we can better compare and contrast the applications and usages of both. See you next time!

*I thought this name covered this concept a lot better than “Dividing A Network” as mentioned at the end of my last blog


[1] https://www.lifewire.com/layers-of-the-osi-model-illustrated-818017

[2] https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo

[3] https://www.linksys.com/us/r/resource-center/basics/whats-ethernet/


[5] http://ciscorouterswitch.over-blog.com/article-cat5-vs-cat5e-vs-cat6-125134063.html

[6] https://techterms.com/definition/rj45

[7] https://www.thefoa.org/tech/ref/basic/fiber.html

[8] https://www.thefoa.org/tech/connID.htm


[10] https://www.l-com.com/fiber-optic-9-125-singlemode-fiber-cable-sc-sc-30m

[11] https://www.belkin.com/us/p/P-F2F202LL/

[12] https://www.qpcfiber.com/product/qmicro/

[13] https://docs.oracle.com/cd/E19455-01/806-0916/ipov-32/index.html

[14] https://www.pcmag.com/encyclopedia/term/tcp

[15] https://www.pcmag.com/encyclopedia/term/udp

[16] https://www.networkcomputing.com/networking/basics-qos



Audinate. (n.d.). Dante Certification Program. https://www.audinate.com/learning/training-certification/dante-certification-program

Audio Technica U.S., Inc. (2014, November 5). Networking Fundamentals for Dante. https://www.audio-technica.com/cms/resource_library/files/89301711029b9788/networking_fundamentals_for_dante.pdf

Belkin International, Inc. (n.d.). Belkin Fiber Optic Cable; Multimode LC/LC Duplex MMF, 62.5/125. Retrieved June 21, 2020 from https://www.belkin.com/us/p/P-F2F202LL/

Cai, Cloris. (2016, December 29). What Is The Difference Between Cat5, Cat5e, and Cat6 Cable?. Medium. https://medium.com/@cloris326192312/what-is-the-difference-between-cat5-cat5e-and-cat6-cable-530e4e0ab12b

Chapman, B.D. & Zwicky, E.D. (1995, November). Building Internet Firewalls. O’Reilly & Associates. http://web.deu.edu.tr/doc/oreily/networking/firewall/ch06_03.htm

Cisco & Cisco Router, Network Switch. (2014, December 3). CAT5 vs. CAT5e vs. CAT6. Overblog. http://ciscorouterswitch.over-blog.com/article-cat5-vs-cat5e-vs-cat6-125134063.html

Crash Course. (2020, March 19). Computer Science [Video Playlist]. YouTube. https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo

Froehlich, Andrew. (2016, August 15). The Basics of QoS. Network Computing. https://www.networkcomputing.com/networking/basics-qos

Geeks for Geeks. (n.d.). Types of Network Topology. Retrieved June 21, 2020 from https://www.geeksforgeeks.org/types-of-network-topology/

Infinite Electronics International, Inc. (n.d.) 9/25, Singlemode Fiber Cable, SC / SC, 3.0m. L-com. Retrieved June 21, 2020 from https://www.l-com.com/fiber-optic-9-125-singlemode-fiber-cable-sc-sc-30m

Linksys. (n.d.). What is Ethernet?. Retrieved June 21, 2020 from https://www.linksys.com/us/r/resource-center/basics/whats-ethernet/

Mitchell, Bradley. (2020, April 29). The Layers of the OSI Model Illustrated. Lifewire. https://www.lifewire.com/layers-of-the-osi-model-illustrated-818017

Neutrik. (n.d.). OpticalCON DUO Cable. Retrieved June 21, 2020 from https://www.neutrik.com/en/neutrik/products/opticalcon-fiber-optic-connection-system/opticalcon-advanced/opticalcon-duo/opticalcon-duo-cable

Oracle Corporation. (2010). Data Encapsulation and the TCP/IP Protocol Stack. In System Administration Guide, Volume 3. Retrieved June 21, 2020 from https://docs.oracle.com/cd/E19455-01/806-0916/ipov-32/index.html

PCMag. (n.d.). TCP. In PCMag Encyclopedia. Retrieved June 21, 2020 from https://www.pcmag.com/encyclopedia/term/tcp

PCMag. (n.d.). UDP. In PCMag Encyclopedia. Retrieved June 21, 2020 from https://www.pcmag.com/encyclopedia/term/udp

QPC. (n.d.). QMicro. Retrieved June 21, 2020 from, https://www.qpcfiber.com/product/qmicro/

TechDifferences. (2017, August 18). Difference Between Frame and Packet. https://techdifferences.com/difference-between-frame-and-packet.html

Tech Terms. (2011, July 1). RJ45. https://techterms.com/definition/rj45

The Fiber Optic Association, Inc. (2019). Guide To Fiber Optics & Premises Cabling. Retrieved June 21, 2020 from https://www.thefoa.org/tech/connID.htm

The Fiber Optic Association, Inc. (2018). Reference Guide. Retrieved June 21, 2020 from https://www.thefoa.org/tech/ref/basic/fiber.html


Basic Networking For Live Sound Engineers 

Part One: Defining A Network

The World of Audio Over IP

There is a certain sense of security that comes from physically plugging a cable made of copper from one device to another. On some level my engineer brain finds comfort believing that, “As long as I patch this end to that end correctly and the integrity of the cable itself has not been compromised, the signal will get from Point A to Point B.”  I believe one of the most daunting aspects of understanding networked audio, and audio-over-IP in general, stems from the feeling of self-induced, psychological uncertainty in one’s ability to “physically” route one thing to another. I mean, after all these years consoles still have faders, buttons, and knobs because people enjoy the tactile feedback of performing a move related to their task in audio.

The psychological hurdle to overcome is that a network is much like a copper multicore snake, sending multiple signals all over the place. The beauty and power of it is that it has so much more adaptability than our old copper friend. We can send large quantities of high-quality signal around the world: a task that would be financially and physically impractical for a single project using physical wires. In this first blog, part 1 of a 3-part series, I will give an overview of what a network is and how we can create and connect to one.

What Is A Network?

A network can refer to any group of things that interconnect to transfer data: think of a “social network” in which a group of individuals exchange ideas in person or over the Internet. Cisco Systems (one of the biggest juggernauts of the industrial networking world) defines a network as “two or more connected computers that can share resources such as data, a printer, an Internet connection, applications, or a combination of these resources” (Cisco, 2006 [1]). We commonly see networks created using wired systems, Wi-Fi, or a combination of the two. Wired systems build a network using physical Ethernet connections (Cat5e/Cat6 cabling) or fiber, while Wi-Fi uses radio frequencies to carry signals from device to device. “Wi-Fi” is a marketing term for technology that the Institute of Electrical and Electronics Engineers (IEEE) defines in the 802.11 family of standards, and we could dedicate an entire blog to that topic alone [2].


Unicast vs. Multicast

In a given network using the TCP/IP protocol, which stands for “Transmission Control Protocol/Internet Protocol,” devices exchange packets of data by requesting and responding to messages sent to one another. In a unicast message, one device talks directly to another as a point-to-point transmission. In a multicast message, one device broadcasts a message to multiple devices at once. To understand how devices exchange messages with one another, we must first understand how IP and MAC addresses work.

I like to think of a data network like a department on a tour: there are the audio, lighting, video, and other departments, and each department has its own participants who communicate with each other within their own department. Let’s look at the analogy of a network compared to the audio department. Each individual (the monitor engineer, PA techs, systems engineer, FOH engineer, etc.) acts as a discrete host, performing tasks like a computer or amplifier talking on a data network. Every device has a unique MAC address, which stands for “Media Access Control” address and, like the name of each person on a crew (except 48 bits long and written in hexadecimal [3]), is unique to the hardware of a device on a network. An IP address is a 32-bit number written as four octets (when translated into binary) and is specific to devices within the same network [4]. An IP address differs from a MAC address the way a nickname differs from a given name. There may be several folks nicknamed “Jay” on a crew, maybe Jennifer in Audio and John in Lighting, but as long as “Jay” is talking to people locally in the same department, the other hosts will know which “Jay” is being referred to.

These two networks (or tour departments) are not local to the same network

MAC addresses are specific to hardware, but IP addresses can be “reused” as long as there are no conflicts with another device of the same address within the same local network. A group of devices in the same IP range is called a LAN or Local Area Network. LANs can vary from basic to complex networks and are seen everywhere from the Wi-Fi network in our homes to a network of in-ear monitor transmitters and wireless microphone receivers connected to a laptop. So how do these devices talk to each other within a LAN?

IP Addresses and Subnet Masks within a LAN:

Let’s create a simple LAN of a laptop and a network-capable wireless microphone receiver and dive deep into understanding what composes an IP address. The computer has an IP address that is associated with it via its MAC address and the same goes for the receiver. In Figure A the two devices are directly connected from the network adapter of one to the other with an Ethernet Cat 6 cable.

Figure A

The laptop and the receiver each have their own IP address, as shown in Figure A. Each of the four numbers separated by a period actually translates to an octet (8 bits) of binary. This is important because both devices are on the same subnet, 192.168.1.XXX. A subnet is a way of dividing a network by having devices only look at other devices that are within their same network, as defined by their subnet mask. There are 254 host addresses available under the subnet mask According to a Microsoft article, “Understanding TCP/IP addressing and subnetting basics,” XXX.XXX.XXX.0 is used to specify a network “without specifying a host” and XXX.XXX.XXX.255 is used to “broadcast a message to every host on the network” [5]. So, in this network example, neither the computer nor the receiver can use the IP address or, because those addresses are reserved for the network and for broadcast. But how does the computer know to look for the receiver in the 192.168.1.XXX IP address range rather than some other range? This has to do with the subnet mask of each device.

Let me give you a little history about these numbers: believe it or not, there is an organization whose main gig is to assign IP addresses on the public Internet. The Internet Assigned Numbers Authority (IANA) manages the IP addresses that connect you and your Internet Service Provider (ISP) to the World Wide Web. To prevent conflicts with the IP addresses that connect to the Internet, the IANA enforces a set of standards created by the IETF (Internet Engineering Task Force). One set of standards, referred to as RFC 1918 [6], reserves specific IP ranges for private networks, like our example 192.168.1.XXX. That means anyone can use them within their own LAN, as long as it does not connect to the Internet. (To understand more about how our computers connect to the Internet, we would have to talk about DNS and gateways, which is beyond the scope of this blog.) The key for our laptop and receiver to determine whether another device is local to their LAN lies in the subnet mask. Both devices in Figure A have a subnet mask of Each set of numbers, like the IP address, corresponds to an octet of binary. The difference is that instead of indicating a specific address, it indicates which positions are available for host addresses in that range. Trust me: once you understand what a subnet mask ACTUALLY refers to in binary, you will better understand how it defines the available IP addresses in the subnet.

A subnet mask is composed of 4 octets in binary. If we filled every bit in each octet except for the last and translated it to its true binary form, we would get a subnet mask that looks like this: can also be written as 11111111.11111111.11111111.00000000

Binary is base two and reflects an “on” or “off” value, which means each bit position in the octet, whether it holds a zero or a one, mathematically represents 2^n (2 to the nth power), with n counting from 0 at the rightmost position up to 7 at the leftmost (the 8th position).

The octet XXXXXXXX (each X either 1 or 0) can also be written as:

X×2^7 + X×2^6 + X×2^5 + X×2^4 + X×2^3 + X×2^2 + X×2^1 + X×2^0

Binary math is done by “filling in” the positions of the bits in the octet with “true” values and then calculating from there. In other words, a binary octet of 11000000 can be interpreted as:

1×2^7 + 1×2^6 + 0×2^5 + 0×2^4 + 0×2^3 + 0×2^2 + 0×2^1 + 0×2^0 = 128 + 64 = 192

OK, OK, roll with me here. So if we do the binary math with every value in the octet being “true” or 1, then:

11111111 = 128 + 64 + 32 + 16 + 8 + 4 + 2 + 1 = 255

So if we refer back to the first subnet mask example, we can discern based on the binary math that:

11111111.11111111.11111111.00000000 =

When a value is “true” or 1 in a bit of an octet, that position has been “filled” and no other values can be placed there. Think of each octet like a highway: each highway has 8 lanes that together fit up to 254 cars/hosts (remember, it is base-2 math, and the values 0 and 255 are already accounted for). A value of 1 means the lane has been filled by 2^n cars/hosts, where n is the lane’s position on the highway and the lanes count starting at 0 (because it is a computer). To add another car once a lane is full, the count must move to the next lane to the left, that is, the next bit position. For example, counting up from 00000011 to 00000111, each 1 acts like cars filling a lane, and once a lane is full, the count spills over into the next lane to the left.
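If you want to sanity-check the lane arithmetic, Python can do the octet conversions directly (the highway analogy is mine above; the conversions below are plain base-2 math):

```python
# Converting octets between binary strings and decimal values.
assert int("11000000", 2) == 128 + 64 == 192   # the 11000000 example above
assert int("11111111", 2) == 255               # all eight "lanes" filled
assert format(192, "08b") == "11000000"        # and back again

# "Spilling over" into the next lane: adding 1 to a full low lane
# carries into the next bit position to the left.
assert int("00000011", 2) + 1 == int("00000100", 2)  # 3 + 1 = 4
```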


Each position of a bit is like a lane on a highway (top); when the value of the lowest bit is “filled” or true (remember, this is an analogy; really a bit is simply on or off), the ascending value “spills” over to the next bit (bottom)

So why do we care about this? Well, if a device has a subnet mask of, or 11111111.11111111.11111111.00000000, all the binary values of the first 3 octets must match those of another device for the two to be considered “local” to the same network. The only values, or lanes, “available” for hosts are in the last octet (hence the zeroes). Going back to Figure A, our computer and wireless receiver both have a subnet mask of, which indicates that the first 3 octets of the IP address MUST be the same on both devices for them to talk to each other, AND that there are only 254 available host IP addresses on the network (–254). Indeed, both the laptop and receiver are local because they are both on the 192.168.1.XXX subnet, and the subnet mask only “allows” them to talk to devices within that local network.

In this example, we gave the devices static IP addresses as opposed to addresses assigned via DHCP. With a static IP address, the user or network administrator defines the IP address for the device, whereas a device set to DHCP, or Dynamic Host Configuration Protocol, asks the network for a currently available address, which is assigned to the device on a lease basis [7]. In the world of audio, the type of network addressing you choose may vary from application to application, but static IP addressing is commonly preferred because the operator can specify the exact range the devices will operate in rather than leaving it up to the network to decide. Returning to our earlier analogy of the audio department on a tour, each host needs a way to communicate with the others and also with other departments. What if the PA tech needs to talk to someone in the outside network of the lighting department? This is where routers and switches come into play.

A switch and a router often get referred to interchangeably when in fact they perform two different functions. A switch is a device that allows data packets to be sent between devices on the same network. Switches keep tables of the MAC addresses on their local network, which they reference when forwarding data packets between devices. A router works by identifying the IP addresses of different devices and “directing traffic,” acting as a way to connect devices across separate networks. Routers do this by building a “routing table” of IP addresses; when a device makes a request to talk to another device, the router references its table to find the corresponding network and forwards the message there [8]. Routers are kind of like department crew chiefs: you can hand them a message to be delivered to another department.
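As a rough sketch of the MAC-table behavior described above (ports and addresses invented for illustration), a learning switch can be modeled in a few lines of Python: it records which port each source MAC arrived on, forwards frames for known destinations out of that one port, and floods everything else.

```python
# A toy learning switch. Real switches do this in hardware with timeouts
# and per-VLAN tables; this only shows the learn-then-forward idea.
class Switch:
    def __init__(self, num_ports: int):
        self.ports = range(1, num_ports + 1)
        self.mac_table = {}                      # MAC address -> port number

    def handle(self, in_port: int, src_mac: str, dst_mac: str):
        self.mac_table[src_mac] = in_port        # learn the sender's port
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]     # unicast to the known port
        return [p for p in self.ports if p != in_port]  # flood unknowns

sw = Switch(4)
sw.handle(1, "aa:aa", "bb:bb")   # bb:bb unknown: flood ports 2, 3, 4
sw.handle(2, "bb:bb", "aa:aa")   # aa:aa known: forward to port 1 only
```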


Routers can connect separate networks to allow them to talk to one another

Routers often get confused with their close relative the access point, and though you can use a router to function similarly to an access point, an access point cannot be a router. Routers and access points come up often in wireless applications as a way to remotely get into a network. The difference is that access points allow you to get into a specific local network or expand the current network. Unlike a router, access points do not have the capability to send messages to another network outside the LAN.

So now let’s say we want to add another device to our network in Figure A and we don’t need to cross into another network. For example, we want to add an in-ear monitor transmitter. One method we can use is to add a switch to connect all the devices.

Network from Figure A with an IEM transmitter added, all talking via a switch

The switch connects the three devices, all on the same local network of 192.168.1.XXX. You can tell that they are all local to this network because they share the subnet mask 255.255.255.0; each device is only looking to “talk” to hosts on 192.168.1.XXX, since only the values in the last octet are available for host IP addresses. Voilà! We have created our first LAN!
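The same ipaddress module can double-check that all three devices land on one local network. The device addresses here are again assumptions for illustration:

```python
import ipaddress

# Assumed addresses for the three hosts behind the switch.
devices = {
    "laptop": "192.168.1.10",
    "wireless_receiver": "192.168.1.20",
    "iem_transmitter": "192.168.1.30",
}
mask = "255.255.255.0"

# Each device derives "its" network from its own address and mask.
networks = {
    name: ipaddress.ip_network(f"{addr}/{mask}", strict=False)
    for name, addr in devices.items()
}

# One unique network means every device considers the others local,
# so the switch can forward traffic between any pair of them.
assert len(set(networks.values())) == 1
print(networks["laptop"])  # 192.168.1.0/24
```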

It may seem daunting at first, but understanding the binary behind the numbering in IP addresses and subnet masks is the key to understanding how devices know which other hosts are considered to be on their local network, or LAN. With the help of switches and access points, we can expand this local network, and with the addition of routers, we can include other networks. Using these expanding devices allows us to divide our network further into different topologies. In the next blog, this concept will be expanded further in Basic Networking For Live Sound Part 2: Dividing A Network. Stay tuned!

If you want to learn more about networking, there are some GREAT resources available to you online! Check out training from companies such as:




And more!



[2] https://www.cisco.com/c/en_ca/products/wireless/what-is-wifi.html

[3] https://www.audio-technica.com/cms/resource_library/files/89301711029b9788/networking_fundamentals_for_dante.pdf

[4] Ibid.

[5] https://support.microsoft.com/en-ca/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics

[6] https://tools.ietf.org/html/rfc1918

[7] https://eu.dlink.com/uk/en/support/faq/firewall/what-is-dhcp-and-what-does-it-do

[8] https://www.cisco.com/c/en/us/solutions/small-business/resource-center/networking/how-does-a-router-work.html#~what-does-a-router-do


Audinate. (n.d.). Dante Certification Program. https://www.audinate.com/learning/training-certification/dante-certification-program

Audio Technica U.S., Inc. (2014, November 5). Networking Fundamentals for Dante. https://www.audio-technica.com/cms/resource_library/files/89301711029b9788/networking_fundamentals_for_dante.pdf

Cisco. (n.d.) How Does a Router Work? https://www.cisco.com/c/en/us/solutions/small-business/resource-center/networking/how-does-a-router-work.html

Cisco. (2006). Networking Fundamentals. In SMB University: Selling Cisco SMB Foundation Solutions. Retrieved from https://www.cisco.com/c/dam/global/fi_fi/assets/docs/SMB_University_120307_Networking_Fundamentals.pdf

Cisco. (n.d.) What Is Wi-Fi? https://www.cisco.com/c/en_ca/products/wireless/what-is-wifi.html

D-Link. (2012-2018). What is DHCP and what does it do? https://eu.dlink.com/uk/en/support/faq/firewall/what-is-dhcp-and-what-does-it-do

Encyclopedia Britannica. (n.d.). TCP/IP Internet Protocols. In Encyclopedia Britannica. Retrieved April 26, 2020, from https://www.britannica.com/technology/domain-name

Generate Random MAC Addresses. (2020). Browserling. https://www.browserling.com/tools/random-mac

Internet Assigned Numbers Authority. (2020, April 21). In Wikipedia. https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority

Internet Engineering Task Force. (1996). Address Allocation for Private Internets (RFC 1918). Retrieved from https://tools.ietf.org/html/rfc1918

Microsoft Support. (2019, December 19). Understanding TCP/IP addressing and subnetting basics. https://support.microsoft.com/en-ca/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics

Thomas, Jajish. (n.d.). What are Routing and Switching | Difference between Routing and Switching. OmniSecu.com. https://www.omnisecu.com/cisco-certified-network-associate-ccna/what-are-routing-and-switching.php

Word Clocks, Clock Masters, SRC, and Digital Clocks

And Why They Matter To You

Three digital audio consoles walk into a festival/bar and put in their drink orders. The bartender/front-end processor says, “You can order whatever you want, but I’m going to determine when you drink it.” In the modern audio world, we are able to keep our signal chain in the digital realm from the microphone to the loudspeaker for longer, without hopping back and forth through analog-to-digital and digital-to-analog converters. In looking at our digital signal flow, there are some important concepts to keep in mind when designing a system. In order to keep digital artifacts from rearing their ugly heads amongst our delivery of crispy, pristine audio, we must consider our application of sample rate conversions and clock sources.

Let’s back up a bit to define some basic terminology: What is a sample rate? What is the Nyquist frequency? What is bit depth? If we take a period of one second of a waveform and chop it up into digital samples, the number of “chops” per second is our sample rate. For example, the common sample rates of 44.1 kHz, 48 kHz, and 192 kHz refer to 44,100 samples per second; 48,000 samples per second; and 192,000 samples per second.
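As a quick arithmetic sketch (purely illustrative), those rates translate directly into sample counts and sample periods:

```python
# One second of audio at each common rate yields exactly fs samples.
for fs in (44_100, 48_000, 192_000):
    print(f"{fs:>7} Hz sample rate -> {fs:,} samples per second of audio")

# The sample period is the time between consecutive "chops".
fs = 48_000
period_us = 1 / fs * 1e6
print(f"at {fs:,} Hz, one sample is taken every {period_us:.2f} microseconds")
```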

A waveform signal “chopped” into 16 samples

Why do these specific numbers matter you may ask? This brings us to the concept of the Nyquist theorem and Nyquist frequency:

“The Nyquist Theorem states that in order to adequately reproduce a signal it should be periodically sampled at a rate that is 2X the highest frequency you wish to record.”

(Ruzin, 2009)*.

*Sampling theory is not just for audio, it applies to imaging too! See references

So if the human ear can hear 20 Hz to 20 kHz, then in theory, in order to reproduce the frequency spectrum of the human ear, the minimum sample rate must be 40,000 samples per second. Soooo why don’t we have sample rates of 40 kHz? Well, the short answer is that it doesn’t sound very good. The long answer is that it doesn’t sound good because the frequency response of the sampled waveform is affected by frequencies above the Nyquist frequency due to aliasing. According to “Introduction to Computer Music: Volume One,” by Professor Jeffrey Hass of Indiana University, partials or overtones above the Nyquist frequency are “mirrored the same distance below the Nyquist frequency as the originals were above it, at the original amplitudes” (2017-2018). This means that frequencies above the Nyquist limit can fold back down and affect the frequency response of our audible bandwidth, given high enough amplitude! So without going down the rabbit hole of recording music history, CDs, and DVDs, you can see that part of the reasoning behind these higher sample rates is to provide better spectral bandwidth for what we humans can perceive. Two more important terms for us to discuss when talking about digital audio are bit depth and word length.
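Hass’s mirroring rule is easy to sanity-check numerically. This sketch (an illustration, not from the original text) folds an input frequency back below Nyquist:

```python
def alias_frequency(f_in: float, fs: float) -> float:
    """Frequency at which f_in is heard after sampling at rate fs.

    Valid for inputs up to fs: partials between Nyquist (fs / 2) and fs
    mirror below Nyquist by the same distance they sat above it.
    """
    nyquist = fs / 2
    if f_in <= nyquist:
        return f_in  # within the sampled bandwidth, reproduced faithfully
    return nyquist - (f_in - nyquist)

fs = 40_000  # the hypothetical 40 kHz rate discussed above
print(alias_frequency(25_000, fs))  # 15000.0 - folded into the audible band
print(alias_frequency(19_000, fs))  # 19000.0 - unchanged, below Nyquist
```

A 25 kHz partial, inaudible on its own, lands at a very audible 15 kHz, which is why converters place an anti-aliasing filter ahead of the sampler.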

Not only is the integrity of our digital waveform affected by the number of samples per second, it is affected by bit depth as well. Think of bit depth as the “size” of the “chops” of our waveform, where the higher the bit depth, the finer the discretization of each sample. Imagine you are painting a landscape with dots of paint: if you used large dots, the image would be chunkier and perhaps harder to interpret. As you paint with smaller and smaller dots placed closer together, the dots start approaching the characteristics of continuous lines, and the level of articulation within the painting significantly increases.


Landscape portrayed with dots of smaller sample size


Landscape portrayed with dots of larger sample size


When you have higher bit depths, the waveform is “chopped” into finer pieces, creating increased articulation of the signal. Each chop is described by a “word” in the digital realm, which the sampling device translates into a computer value. The word length, or bit depth, tells the device how discrete and fine to make the dots in the painting. So who is telling these audio devices when to start taking samples and at what rates to do so? Here is where the device’s internal clock comes in.
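The relationship between word length and articulation is just powers of two; a tiny sketch makes the jump from 16-bit to 24-bit words concrete:

```python
# Each extra bit doubles the number of discrete amplitude values a
# sample word can describe - the "dot size" in the painting analogy.
for bits in (16, 24):
    levels = 2 ** bits
    print(f"{bits}-bit words: {levels:,} possible amplitude values")
# 16-bit words: 65,536 possible amplitude values
# 24-bit words: 16,777,216 possible amplitude values
```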

Every computer device, from your laptop to your USB audio interface, has some sort of clock in it, whether it’s in the processor’s logic board or in a separate chip. This clock acts kind of like a police officer in the middle of an intersection, directing the traffic of bits based on time in your computer. You can imagine how mission-critical this is, especially for an audio device, because our entire existence in the audio world lives as a function of the time domain. If an analog signal from a microphone is going to be converted into a digital signal at a given sample rate, the clock inside the device with the analog-to-digital and digital-to-analog converters needs to “keep time” for that sampling rate so that all the electronic signals traveling through the device don’t turn into a mush of cars slamming into each other at random intervals in an intersection. Chances are, if you have spent enough time with digital audio, you have run into a situation where a sample rate discrepancy or clock-slip error reared its ugly head, and the only solution was to get the clocks of the devices in sync or change the sample rates to be consistent throughout the signal chain.

One of the solutions for keeping all these devices in line is to use an external word clock. Many consoles, recording interfaces, and other digital audio devices allow the use of an external clocking device to act as the “master” for everything downstream of it. Some engineers claim sonic benefits from using an external clock for increased fidelity in the system, the idea being that all the converters in the downstream devices connected to the external clock begin their samples at the same time. Yet regardless of whether you use an external clock or not, the MOST important thing to know is WHO/WHAT is acting as the clock master.

Let’s go back to our opening joke of this blog about the different consoles walking into a festival, umm I mean bar. Let’s say you have a PA being driven via the AES/EBU standard and a drive rack at FOH with a processor that is acting as a matrix for all the guest consoles/devices into the system. If one guest console comes in running at 96 kHz, another at 48 kHz, another at 192 kHz, and the system is being driven via AES at 96 kHz, for the sake of this discussion, who is determining where the samples of the electronic signals being shipped around start and end? Aren’t there going to be bits “lost,” since one console is operating at one sample rate and another at a totally different one? I think now is the time to bring up the topic of SRC, or “Sample Rate Conversion.”

My favorite expression in the industry is, “There is no such thing as a free lunch,” because life really is a game of balancing compromises for the good and the bad. Some party in the above scenario is going to have to yield to the traffic of a master clock source, or cars are going to start slamming into each other in the form of digital artifacts, i.e., “pops” and “clicks.” Fortunately for us, manufacturers have mostly thought of this. Somewhere in a given digital device’s signal chain, they put a sample rate converter to match the other device chained to it so that this traffic jam doesn’t happen. Whether this sample rate conversion happens at the input or the output, and synchronously or asynchronously with respect to the other device, is manufacturer specific.
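To make sample rate conversion less abstract, here is a deliberately naive sketch using linear interpolation. Real converters use properly designed polyphase and anti-aliasing filters, so treat this purely as an illustration of re-timing samples from one rate to another:

```python
def resample_linear(samples: list[float], fs_in: int, fs_out: int) -> list[float]:
    """Crude SRC: re-time samples from fs_in to fs_out by linear interpolation."""
    ratio = fs_in / fs_out
    n_out = int(len(samples) * fs_out / fs_in)
    out = []
    for n in range(n_out):
        pos = n * ratio  # fractional read position in the input stream
        i = int(pos)
        frac = pos - i
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1 - frac) + nxt * frac)
    return out

# Eight samples of a waveform captured at 96 kHz, converted to 48 kHz:
# half as many samples must describe the same stretch of time.
x = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
print(resample_linear(x, 96_000, 48_000))  # [0.0, 1.0, 0.0, -1.0]
```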

What YOU need to understand as the human deploying these devices is what device is going to be the police officer directing the traffic. Sure, there is a likelihood that if you leave these devices to sort their sample rate conversions out for themselves there may not be any clock slip errors and everyone can pat themselves on the back that they made it through this hellish intersection safe and sound. After all, these manufacturers have put a lot of R&D into making sure their devices work flawlessly in these scenarios…right? Well, as a system designer, we have to look at what we have control over in our system to try and eliminate the factors that could create errors based on the lowest common denominator.

Let’s consider several scenarios of how we can use our trusty common sense and our newfound understanding of clocks to determine an appropriate selection of a clock master source for our system. Going back to our bartending-festival scenario, if all these consoles operating at different sample rates are being fed into one system for the PA, it makes sense for the front-end processor taking in all these consoles to operate its clock internally and independently. If the sample rate conversion happens internally in the front-end processor, independently of the input, then it doesn’t really care what sample rate comes into it, because everything gets converted to match the 96 kHz sample rate at its output to AES.


Front-end DSP clocking internally with SRC

In another scenario, let’s say we have a control package where the FOH and monitor desks are operating on a fiber loop, and the engineers are also operating playback devices that gather time domain-related data from that fiber loop. The FOH console feeds a processor in a drive rack via AES, which in turn feeds a PA system. In this scenario, it makes the most sense for the fiber loop to be the clock source and for the front-end processor to gather clock and SRC data from the AES input of the console upstream of it, because, if you think about it as a flow chart, all the source data comes back to the fiber loop. In a way, you could think of choosing where the clock master comes from as delegating the police officer that has the most influence on the audio path under discussion.


Fiber loop as chosen origin of clock source for the system


As digital audio expands further into the world of networked audio, the concept of a clock master becomes increasingly important to understanding signal processing when you dive into the realms of protocols such as AVB or Dante. Our electronic signal turns into data packets on a network stream, and the network itself starts determining where the best clock source is coming from; it can even switch between clock masters if one were to fail. (For more information, check out www.audinate.com for info on Dante and www.avnu.org for info on AVB.) As technology progresses and computers become increasingly capable of large amounts of digital signal processing, it will be interesting to see how fidelity correlates with higher sample rates and bit-perfect converters, and how we can continue to seek perfection in the representation of a beautiful analog waveform in the digital realm.

The views in this blog are for educational purposes only and the opinion of the author alone and not to be interpreted as an endorsement or reflect the views of the aforementioned sources. 


Hass, Jeffrey. 2017-2018. Chapter Five: Digital Audio. Introduction to Computer Music: Volume One. Indiana University. https://cecm.indiana.edu/etext/digital_audio/chapter5_nyquist.shtml


Ruzin, Steven. 2009, April 9. Capturing Images. UC Berkeley. http://microscopy.berkeley.edu/courses/dib/sections/02Images/sampling.html


**Omnigraffle stencils by Jorge Rosas: https://www.graffletopia.com/stencils/435


There Really Is No Such Thing As A Free Lunch

Using The Scientific Method in Assessment of System Optimization

A couple of years ago, I took a class for the first time from Jamie Anderson at Rational Acoustics, where he said something that has stuck with me ever since, to the effect of: our job as system engineers is to make it sound the same everywhere, and it is the job of the mix engineer to make it sound “good” or “bad.”

The reality in the world of live sound is that there are many variables stacked up against us. A scenic element in the way of speaker coverage, a client who does not want to see a speaker in the first place, a speaker that has done one too many gigs and decides that today is the day for one driver to die during load-in, or any of a myriad of other things can stand in the way of the ultimate goal: a verified, calibrated sound system.

The Challenges Of Reality


Before beginning the discussion of system optimization, we must draw a line and make all intentions clear: what is our role at this gig? Are you just performing the tasks of the systems engineer? Are you the systems engineer and the FOH mix engineer? Are you the tour manager as well, working directly with the artist’s manager? Why does this matter, you may ask? The fact of the matter is that when it comes down to making final evaluations on the system, there are going to be executive decisions that need to be made, especially in moments of triage. Having clearly defined one’s role at the gig will help in making those decisions when the clock is ticking away.

So in this context, we are going to discuss the decisions of system optimization from the point of view of the systems engineer. We have decided that the most important task of our gig is to make sure that everyone in the audience is having the same show as the person mixing at front-of-house. I’ve always thought of this as a comparison to a painter and a blank canvas: it is the mix engineer’s job to paint the picture for the audience to hear, and it is our job as system engineers to make sure the painting sounds the same every day by providing the same blank canvas.

The scientific method teaches the concept of control with independent and dependent variables. We have an objective that we wish to achieve, and we assess our variables in each scenario to come up with a hypothesis of what we believe will happen. Then we execute a procedure, controlling the variables we can, and analyze the results given the tools at hand to draw conclusions and determine whether we have achieved our objective. Recall that an independent variable is the factor you are free to manipulate in an experiment, a dependent variable is the result you observe, and a controlled variable is one that is held constant. In the production world, these terms can have a variety of implications. It is an unfortunate, commonly held belief that system optimization starts at the EQ stage, when really there are so many steps before that. If there is a column in front of a hang of speakers, no EQ in the world is going to make them sound like they are not shadowed behind a column.

Now everybody take a deep breath in and say, “EQ is not the solution to a mechanical problem.” And breathe out…

Let’s start with preproduction. It is time to assess our first round of variables. What are the limitations of the venue? Trim height? Rigging limitations? What are the limitations proposed by the client? Maybe there is another element to the show that necessitates the PA being placed in a certain position over another; maybe the client doesn’t want to see speakers at all. We must ask, with our technical brains and our career prospects in mind, what can we change and what can we not change? Note that it will not always be the same in every circumstance. In one scenario, we may be able to convince the client to let us put the PA anywhere we want, making its position an independent variable we are free to manipulate. In another situation, for the sake of our gig, we must accept that the PA will not move, or that the low steel of the roof is a bleak 35 feet in the air, and thus we face a variable that is fixed outside our control.

The many steps of system optimization that lie before EQ


After assessing these first sets of variables, we can now move into the next phase and look at our system design. Again, say it with me, “EQ is not the solution to a mechanical problem.” We must assess our variables again in this next phase of the optimization process. Perhaps we have been given the technical rider of the venue and, due to budgetary restraints, we cannot change the PA: a fixed variable. Perhaps we are carrying our own PA and thus have control over the design, within limitations from the venue: an independent variable forms, but with caveats. Let’s look deeper into this particular scenario and ask ourselves: as engineers building our design, what do we have control over now?

The first step lies in what speaker we choose for the job. Given the ultimate design-control scenario, where we get the luxury of picking and choosing the loudspeakers we use in our design, different directivity designs will lend themselves better to one scenario versus another. A point source has just as much validity as a line array depending on the situation. For a small audience of 150 people with a jazz band, a point source speaker over a sub may be more valid than showing up with a 12-box line array that necessitates a rigging call to fly it from the ceiling. But even in this scenario, there are caveats in our delicate weighing of variables. Where are those 150 people going to be? Are we in a ballroom or a theater? The evaluation of our choices on what box to use for a design is as varied as deciding what type of canvas we wish to use for the mix engineer’s painting.

So let’s create a scenario: we are doing an arena show, and the design has been established with a set number of boxes for daily deployment, agreed upon by the production team. The design is pretty much cut-and-paste in terms of rigging points, but we have varying limitations on trim height due to the high and low steel of each venue. What variables do we now have control over? We still have a decent amount of control over trim height, up to the (literal) limit of the motor, but we also have control over the vertical directivity of our (let’s make the design decision, for the purpose of discussion) line array. There is a hidden assumption here that is often under-represented when talking about system designs.

A friend and colleague of mine, Sully (Chris) Sullivan, once pointed out to me that the hidden design assumption we often make as system engineers, but don’t necessarily acknowledge, is that the loudspeaker manufacturer has actually achieved the horizontal coverage dictated by the technical specifications. This made me reconsider the things I take for granted in a given system. In our design, we choose to use Manufacturer X’s 120-degree line source element. They have established in their technical specs that there is a measurable point at 60 degrees off-axis (120-degree total coverage) where the polar response drops 6 dB. We can take our measurement microphone and check that the response is what we think it is, but if it isn’t, what really are our options? Perhaps we have a manufacturer defect or a blown driver somewhere, but unless we change the physical parameters of the loudspeaker, this is a variable we place in the trust of the manufacturer. So what do we have control over? He pointed out to me that our decision choices lie in the manipulation of the vertical.

Entire books and papers can and have been written about how we can control the vertical coverage of our loudspeaker arrays, but certain factors remain consistent throughout. Inter-element angles, or splay angles, let us control the summation of elements within an array. Site angle and trim height let us control the geometric relationship of the source to the audience and thus affect the spread of SPL over distance. Azimuth gives us geometric control of the directivity pattern of the entire array along a horizontal dispersion pattern. Note that this is distinct from the horizontal pattern control of the frequency response radiating from the enclosure, which we have handed over to the manufacturer. Fortunately, the loudspeaker prediction software available from manufacturers has given the modern system engineer an unprecedented ability to assess these parameters before a single speaker goes up into the air.
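As a rough feel for why trim height and site angle change the spread of SPL over distance, here is a sketch of the idealized free-field inverse square law for a point source. The 100 dB reference level is an arbitrary assumption, and a coupled line array rolls off more slowly in its near field, so this is only a first-order intuition:

```python
import math

def spl_at(distance_m: float, spl_ref_db: float = 100.0, ref_m: float = 1.0) -> float:
    """Free-field SPL of an ideal point source: -6 dB per doubling of distance."""
    return spl_ref_db - 20 * math.log10(distance_m / ref_m)

# Doubling the throw distance costs about 6 dB each time, so where the
# array sits relative to the first and last rows matters enormously.
for d in (1, 2, 4, 8, 16):
    print(f"{d:>2} m: {spl_at(d):.1f} dB SPL")
```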

At this point, we have made a lot of decisions on the design of our system and weighed the variables along every step of the way to draw out our procedure for the system deployment. It is now time to analyze our results and verify that what we thought was going to happen did or did not happen. Here we introduce our tools to verify our procedure in a two-step process of mechanical, then acoustical, verification. First, we use tools such as protractors and laser inclinometers to collect data and assess whether we have achieved our mechanical design goal. For example, our model says we need a site angle of 2 degrees to achieve a given result, so we verify with the laser inclinometer that we got there. Once we have confirmed that we made our design’s mechanical goals, we must analyze the acoustical results.

Laser inclinometers are just one example of a tool we can use to verify the mechanical actualization of a design


It is only at this stage that we finally introduce measurement software to analyze the response of our system. After examining our role at the gig, the criteria involved in pre-production, choosing design elements appropriate for the task, and verifying their deployment, only now can we move into the realm of analysis software to see if all those goals were met. We can utilize dual-channel measurement software to take transfer functions at different stages of the input and output of our system to verify that our design goals have been met, and, more importantly, to see if they have not been met and why. This is where our ability to critically interpret the data comes into play. By evaluating impulse response data, dual-channel FFT (Fast Fourier Transform) functions, and the coherence of our gathered data, we can make an assessment of how well our design has been realized in the acoustical and electronic realms.

What’s interesting to me is that the discussion of system optimization often starts here. In fact, as we have seen, the process begins as early as the pre-production stage, when talking with different departments and the client, and even when asking ourselves what our role is at the gig. The final analysis of any design comes down to the tool that we always carry with us: our ears. Our ears are the final arbiters after our evaluation of acoustical and mechanical variables, and they are used at every step of our design path, along with our trusty “common sense.” In the end, our careful assessment of variables leads us to utilize the power of the scientific method to make educated decisions and work towards our end goal: the blank canvas, ready to be painted.

Big thanks to the following for letting me reference them in this article: Jamie Anderson at Rational Acoustics, Sully (Chris) Sullivan, and Alignarray (www.alignarray.com)