Empowering the Next Generation of Women in Audio

(Not So) Basic Networking For Live Sound Engineers

Part Three: Networking Protocols

(or A History of IEEE Standards)

Evaluating Applications

One thing I have learned from my do-it-yourself research in computer science, and have since applied to understanding the world in general, is the concept of building on “levels of abstraction” (once again quoting Carrie Anne Philbin from the “Crash Course: Computer Science” YouTube series) [1]. From the laptop this blog was written on to a show in an arena, none of these things would be possible without a multitude of smaller parts working together as a system. Whether it is an arena concert divided into departments to execute the gig or a data network broken up into the layers of the OSI Model, we can take a complicated system and break it down into its component parts to understand how it works as a whole. The efficiency and innovation of this compartmentalization in technology lie in the fact that one person can work on just one section of the OSI Model (like the Transport Layer) without needing to know much about what’s happening on the other layers.

This is why I have spent the last two blogs of “Basic Networking For Live Sound Engineers” breaking the daunting concept of networking into smaller pieces, from defining what a network is to designing topologies with VLANs and trunks. At this point, we have spent a lot of time talking about how everything from Cat6 cable to switches works together, both physically and conceptually. Now it’s time to dive deep into the languages, or protocols, that these devices use to transmit audio. This is a fundamental piece of choosing a network design, because one protocol may suit a particular design better than another. As we discuss how these protocols handle different aspects of a data packet, I want you to think about why one might be more beneficial in one situation versus another. After all, so many factors go into the design of a system, from working within pre-existing infrastructure to building networks from scratch, that we must take these variables into account in our design decisions. As the old joke in live entertainment goes: you can have it cheap, efficient, or high quality. Pick two.

What Is In A Packet, Really?

As a quick refresher from Part 2, data gets encapsulated in a process that gives each packet a header and a body. How you define each section, and whether the whole thing is called a “packet” or a “frame,” depends on which layer of the OSI Model you are referring to.

Basic structure of a data packet…or do I mean frame? It depends!!

This back and forth in terminology seemed really confusing until I read a thread on Network Engineering Stack Exchange pointing out that the combination of header and data is called a frame at Layer 2 and a packet at Layer 3 [2]. The change in terminology corresponds to what gets added during encapsulation at each layer of the OSI Model.

In an article by Alison Quine on “How Encapsulation Works Within the TCP/IP Model,” the encapsulation process involves adding a header onto a body of data at each step, starting at the top of the OSI Model with the Application Layer and moving down to the Physical Layer, and then stripping those headers off in reverse order as the data moves back up the model at the receiving end [3]. In other words, at each layer of the model, another header gets added to help the data get to the right place. Audinate’s Dante Level 3 training module on “IP Encapsulation” walks through this process in a network stack. At the Application Layer, we start with a piece of data, or payload. At the Transport Layer, a transport header (TCP or UDP) carrying the source and destination ports is attached to the payload. At the Network Layer, the source and destination IP addresses are added on top of what the Transport Layer built. Then at the Data Link Layer, the source and destination MAC addresses are attached on top of everything else in the frame, with the destination MAC looked up in an ARP table [4]. ARP, or Address Resolution Protocol, uses request and reply messages to build tables in devices (a computer, a router, or a Dante device, for example) that match IP addresses to MAC addresses.
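To make the layering concrete, here is a minimal Python sketch of encapsulating and decapsulating one small UDP datagram. It is an illustration under stated assumptions, not how any particular device does it: it assumes IPv4 over Ethernet II, leaves the UDP checksum at zero (which IPv4 permits), omits the Ethernet checksum and minimum-size padding that network hardware adds, and uses a hypothetical ARP table, addresses, and port numbers. Real devices build these headers in their operating system’s network stack.

```python
import ipaddress
import struct

# Hypothetical addresses for illustration (locally administered MACs, private IPs).
MY_MAC = bytes.fromhex("020000000010")
MY_IP = "192.168.1.10"
# Hypothetical ARP table (IP -> MAC), the kind of mapping ARP replies populate.
ARP_TABLE = {"192.168.1.20": bytes.fromhex("020000000020")}


def ipv4_header_checksum(header: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words (RFC 1071)."""
    total = sum(struct.unpack(f"!{len(header) // 2}H", header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def encapsulate(payload: bytes, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    # Transport Layer: 8-byte UDP header (ports, length, checksum 0 = "not computed").
    segment = struct.pack("!HHHH", src_port, dst_port, 8 + len(payload), 0) + payload

    # Network Layer: 20-byte IPv4 header; protocol number 17 means "UDP inside".
    header = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, 20 + len(segment),  # version 4 + 5-word header, DSCP/ECN, total length
        0, 0,                        # identification, flags/fragment offset
        64, 17, 0,                   # TTL, protocol (UDP), checksum placeholder
        ipaddress.IPv4Address(MY_IP).packed,
        ipaddress.IPv4Address(dst_ip).packed,
    )
    header = header[:10] + struct.pack("!H", ipv4_header_checksum(header)) + header[12:]
    packet = header + segment

    # Data Link Layer: destination MAC from the ARP table, source MAC, EtherType 0x0800 (IPv4).
    return ARP_TABLE[dst_ip] + MY_MAC + struct.pack("!H", 0x0800) + packet


def decapsulate(frame: bytes):
    """Strip the headers in reverse order, moving back up the stack."""
    ethertype = struct.unpack("!H", frame[12:14])[0]
    packet = frame[14:]
    header_len = (packet[0] & 0x0F) * 4
    src_ip = str(ipaddress.IPv4Address(packet[12:16]))
    segment = packet[header_len:]
    src_port, dst_port, length, _checksum = struct.unpack("!HHHH", segment[:8])
    return ethertype, src_ip, src_port, dst_port, segment[8:length]


frame = encapsulate(b"hello", "192.168.1.20", src_port=50000, dst_port=4321)
print(len(frame))  # 14 (Ethernet) + 20 (IPv4) + 8 (UDP) + 5 (payload) = 47
print(decapsulate(frame))
```

Each function only touches its own layer’s header, which is the compartmentalization idea from the start of this blog in miniature.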

I want to pause for a second before we move on to drive home the point that the OSI Model is a conceptual tool used to talk about different aspects of networking. For example, you can use it to understand network protocols or different types of switches. Here we are using it to follow the signal flow of data through the encapsulation process, just as you would read a signal flow chart for a mixer.

Check 1, Check 2…

There is the old adage that time equals money, but in live sound, time is of the essence in a much more literal way. Lost audio packets that cause dropouts or jitter, or audio that arrives audibly late (our brains are very good at detecting time differences), are not acceptable. So it goes without saying that data has to arrive as close to synchronously as possible. In my previous blog on clocks, I talked about the importance of different digital audio devices sampling at the same rate and at the same moments, based on a leader clock (also referred to as a master clock), in order to preserve the original waveform. An accurate clock is also what keeps each word, the string of bits that makes up one sample, intact as it moves between devices. Let’s look at this example:

1010001111001110

1010001111001110

In this example, we have two 16-bit words representing two copies of the same sample traveling between two devices that are in sync because they share the same clock. Now, what happens if the clock is off by just one bit?
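To answer that question concretely, here is a small Python sketch. It treats the two words as one serial bit stream and reads a word with the receiver’s framing one bit late. Interpreting each word as a signed 16-bit sample (two’s complement, as linear PCM audio commonly is) is an assumption for the illustration.

```python
WORD = "1010001111001110"
STREAM = WORD + WORD  # the same 16-bit sample sent twice, back to back


def read_word(stream: str, offset: int, bits: int = 16) -> str:
    """Read one word starting `offset` bits into the serial stream."""
    return stream[offset:offset + bits]


def to_signed(word: str) -> int:
    """Interpret a bit string as a two's complement signed integer."""
    value = int(word, 2)
    if word[0] == "1":
        value -= 1 << len(word)
    return value


in_sync = read_word(STREAM, 0)
slipped = read_word(STREAM, 1)  # the receiver's framing is one bit late

print(in_sync, to_signed(in_sync))  # 1010001111001110 -23602
print(slipped, to_signed(slipped))  # 0100011110011101 18333
```

A one-bit slip turns a large negative sample into a large positive one, which is the kind of jump you hear as a click, a burst of noise, or garbage.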

If the receiver is off by even just one bit, the whole word shifts and produces an entirely different value! This manifests itself as digital artifacts, jitter, or no signal at all. Move up a “level of abstraction” to the data packet at the Network Layer of the OSI Model and you can see why packets need to arrive on time and intact: a late or lost packet means missing samples. But as I’ve mentioned before, UDP and TCP handle data accuracy and timing differently.

Recall from Part 2 that TCP uses a “handshake” between the sender and receiver to validate data transmission at the cost of time, while UDP reduces transmission time by skipping this back-and-forth validation. According to a LearnCisco article on “Understanding the TCP/IP Transport Layer,” TCP is a “connection-oriented protocol” that needs extra information in its header to manage the handshake and acknowledgments between sender and receiver [5]. On the other hand, UDP acts as a “connectionless protocol”:

[…] there will be some error checking in the form of checksums that go along with the packet to verify integrity of those packets. There is also a pseudo-header or small header that includes source and destination ports. And so, if the service is not running on a specific machine, then UDP will return an error message saying that the service is not available. [5]
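The checksum in that quote is the standard Internet checksum (RFC 768 and RFC 1071). Here is a minimal Python sketch, assuming IPv4: the sender sums a pseudo-header (source and destination IP addresses, protocol number, and UDP length, borrowed from the IP layer) together with the UDP header and data, and the receiver repeats the sum over what arrived. The addresses, ports, and payload are hypothetical.

```python
import ipaddress
import struct


def internet_checksum(data: bytes) -> int:
    """RFC 1071 checksum: ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # odd-length data is padded with a zero byte for the sum only
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def pseudo_header(src_ip: str, dst_ip: str, udp_length: int) -> bytes:
    # Source IP, destination IP, a zero byte, protocol 17 (UDP), and the UDP length.
    return (ipaddress.IPv4Address(src_ip).packed + ipaddress.IPv4Address(dst_ip).packed
            + struct.pack("!BBH", 0, 17, udp_length))


def build_udp(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> bytes:
    length = 8 + len(payload)
    unchecked = struct.pack("!HHHH", src_port, dst_port, length, 0) + payload
    checksum = internet_checksum(pseudo_header(src_ip, dst_ip, length) + unchecked)
    checksum = checksum or 0xFFFF  # a computed 0 is sent as 0xFFFF; 0 means "no checksum"
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def udp_is_intact(src_ip, dst_ip, datagram: bytes) -> bool:
    # Re-running the sum over pseudo-header + header + data gives 0 if nothing changed.
    return internet_checksum(pseudo_header(src_ip, dst_ip, len(datagram)) + datagram) == 0


sent = build_udp("192.168.1.10", "192.168.1.20", 50000, 4321, b"audio")
garbled = sent[:-1] + bytes([sent[-1] ^ 0x01])  # one bit flipped in transit
print(udp_is_intact("192.168.1.10", "192.168.1.20", sent))     # True
print(udp_is_intact("192.168.1.10", "192.168.1.20", garbled))  # False
```

Notice what the checksum cannot do: if the datagram never arrives, nobody is told and nothing is resent. That is the trade UDP makes for speed.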

So instead of verifying that the data made it to the destination, UDP only lets the receiver check that a packet arrived intact (via the checksum), and the receiving machine can report an error if nothing is listening on the destination port. There are no acknowledgments and no retransmissions: if a packet gets lost along the way, it is simply gone. Because UDP does not guarantee delivery, it is imperative that the network gets packets to their correct destination on time. So how does a network actually keep time? In reference to what?

Time, Media Clocking, and PTP

Let’s get philosophical for a moment and talk about the abstraction of time. I have a calendar on my phone where I schedule events and reminders based on a day divided into hours and minutes. This division into hours and minutes is arguably pointless unless it is referenced to some standard of time, which in this case is the clock on my phone. I assume that the clock inside my phone is accurate in relation to a greater reference of time wherever I am located. The standard for civil time is UTC, or “Coordinated Universal Time,” a compromise between TAI (International Atomic Time), which is based on atomic clocks, and UT1, which is based on the Earth’s rotation (the mean solar day): UTC ticks at the atomic rate but inserts leap seconds to stay close to UT1 [6]. If I have a Zoom call with someone in another time zone, it doesn’t matter that I say the call is at 12 P.M. Pacific Standard Time and they say it is at 3 P.M. Eastern Standard Time, as long as our clocks share the same ultimate point of reference, which for us civilians is UTC. In this same sense, digital devices need a media clock referenced to a common master (a term we are going to update to leader) in order to make sure data gets transmitted without the bit slippage we discussed earlier.

In a paper titled “Media Clock Synchronization Based On PTP” from the Audio Engineering Society 44th International Conference in San Diego, Hans Weibel and Stefan Heinzmann note that, “In a networked media system it is desirable to use the network itself for ensuring synchronization, rather than requiring a separate clock distribution system that uses its own wiring” [7]. This is where PTP, or Precision Time Protocol, comes in. The IEEE (Institute of Electrical and Electronics Engineers) standardized this protocol as IEEE 1588 in 2002 and revised it in 2008 [7]. The 2002 standard, now known as PTPv1, runs over UDP and achieves microsecond-level accuracy by exchanging timing messages between leader and follower clocks. As described in the Weibel and Heinzmann paper, follower nodes (with PTP running at the Application Layer) compare their local clocks to the Sync messages sent by the leader and adjust their clocks to match, while also accounting for the network delay between the leader and the follower [7]. Say we have two devices, A and B:

Device A (our leader for all intents and purposes) sends a Sync message to Device B saying, “This is what time it is.” By Device B’s own clock, the message arrives at 12:02 P.M.

Device A then sends a Follow_Up message: “To be precise, I sent that Sync at 11:00 A.M.”

Device B sends a Delay_Request message at 12:05 P.M. by its own clock: “What time did you receive this?”

Device A replies with a Delay_Response message: “At 11:05 A.M.”

Device B does the math: “So messages take about a minute to travel each way, and my clock is 61 minutes fast. Ok, I’ll adjust.”

Analogy of clocking communication in PTPv1 as described in IEEE 1588-2002
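Behind Device B’s “Ok, I’ll adjust” is a bit of arithmetic on four timestamps, conventionally called t1 through t4. Here is a minimal Python sketch of the standard delay request-response calculation. It assumes the network delay is the same in both directions and uses analogy-scale timestamps in minutes; real PTP timestamps are seconds plus nanoseconds.

```python
from dataclasses import dataclass


@dataclass
class DelayRequestResponse:
    t1: float  # leader clock: Sync sent (the precise value is carried in Follow_Up)
    t2: float  # follower clock: Sync received
    t3: float  # follower clock: Delay_Req sent
    t4: float  # leader clock: Delay_Req received (carried back in Delay_Resp)

    def mean_path_delay(self) -> float:
        # Average one-way delay, assuming the path is symmetric.
        return ((self.t2 - self.t1) + (self.t4 - self.t3)) / 2

    def offset_from_leader(self) -> float:
        # How far the follower's clock is ahead (+) or behind (-) the leader's.
        return ((self.t2 - self.t1) - (self.t4 - self.t3)) / 2

    def corrected(self, follower_time: float) -> float:
        # What the leader's clock reads when the follower's clock reads follower_time.
        return follower_time - self.offset_from_leader()


def minutes(hour: int, minute: int) -> int:
    return hour * 60 + minute


exchange = DelayRequestResponse(t1=minutes(11, 0), t2=minutes(12, 2),
                                t3=minutes(12, 5), t4=minutes(11, 5))
print(exchange.mean_path_delay())     # 1.0  (one minute each way)
print(exchange.offset_from_leader())  # 61.0 (follower is 61 minutes fast)
```

Anything that makes the two directions unequal, such as a packet waiting in a busy switch’s queue on the way in but not on the way out, shows up directly as offset error, which is part of why the transparent clocks introduced in PTPv2 matter.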

This back and forth allows the followers to adjust their clocks to whichever clock is considered the leader according to the best master clock algorithm (which should be renamed the best leader clock algorithm), with the ultimate reference being the grandmaster clock (or grandleader clock) [8]. Fun fact: in the Weibel and Heinzmann paper, they point out that “the epoch of the PTP time scale is midnight on 1 January [1970] TAI. A sampling point coinciding with this point in absolute time is said to have zero phase” [9].
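To make the “zero phase” idea concrete, here is a Python sketch that treats a PTP timestamp as TAI seconds plus nanoseconds since the PTP epoch, converts it to UTC, and counts how many sample periods at 48 kHz have elapsed since the epoch. The fixed 37-second TAI-UTC offset and the 48 kHz rate are assumptions for the example; a real device takes the offset from the grandmaster and derives its media clock in hardware.

```python
from datetime import datetime, timezone
from fractions import Fraction

# TAI - UTC has been 37 s since 1 January 2017. A PTP grandmaster announces this value
# (currentUtcOffset); hard-coding it here is an assumption for illustration only.
TAI_MINUS_UTC_S = 37


def ptp_to_utc(ptp_seconds: int) -> datetime:
    """PTP time counts TAI seconds since midnight 1 January 1970 TAI; convert to UTC."""
    return datetime.fromtimestamp(ptp_seconds - TAI_MINUS_UTC_S, tz=timezone.utc)


def media_clock(ptp_seconds: int, ptp_nanoseconds: int, sample_rate: int = 48_000):
    """Return (whole sample periods since the PTP epoch, phase within the current period).

    Phase 0 means a sampling point falls exactly on this instant, as it does at the epoch.
    """
    elapsed = ptp_seconds + Fraction(ptp_nanoseconds, 1_000_000_000)
    periods = elapsed * sample_rate
    whole = periods.numerator // periods.denominator
    return whole, periods - whole


print(ptp_to_utc(1_600_000_037))  # 2020-09-13 12:26:40+00:00
print(media_clock(0, 0))          # zero phase at the epoch
print(media_clock(0, 10_000))     # 10 microseconds in: 0.48 of a sample period at 48 kHz
```

Because every device counts from the same epoch, two devices locked to the same grandmaster agree not just on the sample rate but on where each sample period begins.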

In 2008, the standard was updated to PTPv2, which is not backwards compatible with PTPv1 [10]. The update changed how clock quality is determined, added the option of unicast messages (in v1, all PTP messages are multicast), improved clocking accuracy from microseconds to nanoseconds, and introduced transparent clocks. The 1588-2002 standard had already defined ordinary clocks, which are devices or clock nodes with a single port, and boundary clocks, which have two or more ports [11]. PTP-aware switches and routers can act as boundary clocks, while end-point devices, including audio equipment, are typically ordinary clocks. A Luminex article titled “PTPv2 Timing protocol in AV Networks” describes how “[a] Transparent Clock will calculate how long packets have spent inside of itself and add a correction for that to the packets as they leave. In that sense, the [switch] becomes ‘transparent’ in time, as if it is not contributing to delay in the network” [12].

PTPv2 also improves on the Sync message system by adding Announce messages for electing the grandmaster/grandleader clock. The Luminex article illustrates this by describing how a PTPv2 device starts up “listening” for Announce messages, which carry information about the quality of the sender’s clock, for a set period called the Announce Timeout Interval. If no messages arrive in that time, the device becomes the leader. If it receives an Announce message indicating that another clock has superior quality, it becomes a follower and treats the other device as the leader [13]. These differences in how IEEE 1588-2002 and 1588-2008 handle clocking will be key to understanding the underlying difference between Dante and AVB.
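The election by Announce messages boils down to comparing the clock descriptions each device advertises. This Python sketch follows the comparison order of the IEEE 1588-2008 best master clock algorithm (priority1, then clock class, clock accuracy, variance, priority2, and finally the clock identity as a tie-breaker, with the lower value winning at each step) but leaves out details such as steps removed and port identities. The device names, identities, and clock values are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Announce:
    priority1: int       # 0-255, set by the user; lower is preferred
    clock_class: int     # e.g. 6 = locked to a primary reference such as GPS, 248 = default
    clock_accuracy: int  # enumerated accuracy; lower is more accurate
    variance: int        # offsetScaledLogVariance; lower is more stable
    priority2: int       # 0-255, user-set tie-breaker
    identity: str        # clockIdentity as 16 hex digits; final tie-breaker

    def rank(self) -> tuple:
        return (self.priority1, self.clock_class, self.clock_accuracy,
                self.variance, self.priority2, self.identity)


def elect_leader(me: Announce, heard: list) -> Announce:
    # If no Announce arrived before the timeout, `heard` is empty and this device leads.
    return min([me, *heard], key=Announce.rank)


console = Announce(128, 248, 0xFE, 0xFFFF, 128, "0200000000000001")
stagebox = Announce(128, 248, 0xFE, 0xFFFF, 128, "0200000000000002")
gps_clock = Announce(128, 6, 0x21, 0x4E5D, 128, "0200000000000099")

print(elect_leader(stagebox, []) is stagebox)                    # heard nothing: leads itself
print(elect_leader(stagebox, [console]) is console)              # tie broken by identity
print(elect_leader(stagebox, [console, gps_clock]) is gps_clock) # better clock class wins
```

Note that priority1 comes first, which is why setting it by hand is the usual way to force a specific device to become leader regardless of its clock quality.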

Dante, AVB, AES67, RAVENNA, and Milan

Much like the format wars between Blu-ray and HD DVD, there has been a struggle over the years to create a manufacturer-independent standard for audio-over-IP and other networking protocols used in the audio world. The two major players that have come out on top in terms of widespread use in the audio industry are AVB and Dante. AES67 and RAVENNA are popular as well, with RAVENNA dominating the world of broadcast.

Dante, created by the company Audinate, began in 2003 under the key principle that still makes the protocol appealing today: the ability to use pre-existing IT infrastructure to distribute audio over a network [14]. Its other major draw, particularly for live production, is support for redundancy. In a Dante network you can set up a primary and a secondary network, the secondary being an identical “copy” of the primary; redundant devices send the same audio over both, so if the primary network fails, audio carries on seamlessly over the secondary. Dante works at the Network Layer (Layer 3) of the OSI Model, riding on top of the IP addressing schemes already in place in a standard IT network. It’s understandable financially why a major corporate office would want to use this protocol: it saves the cost of overhauling an entire office building’s infrastructure with new switches, upgraded topologies, and so on.

An example of a basic Dante Network with redundant primary (blue) and secondary (red) networks

Because Dante is a Layer 3 protocol, it can run over most Gigabit switches, and in some cases even over 100Mbps switches (but only if the network is 100Mbps throughout) [15]. That being said, there are some caveats. When configuring managed switches (switches whose ports and other features can be configured, usually through a software GUI) for Dante, it is strongly recommended, and on 100Mbps networks mandatory, to use specific Quality of Service (QoS) settings. This includes giving high priority to the specific DSCP values that mark important Dante traffic, including our friend PTP. Other network traffic can exist alongside Dante traffic as long as the subnets are configured correctly (for more on subnets, see Part 1 of this blog series).

I personally prefer configuring separate VLANs for control traffic and for Dante to keep the waters clear between the two. To be fair, QoS should keep control traffic from being prioritized over Dante traffic, and Dante was designed to share a network, so as long as your subnets are configured correctly, a shared network should be fine. The issue is that because Dante uses PTPv1, clock precision can still get choked when bandwidth is tight, even with proper QoS settings. The Luminex article mentioned earlier discusses this: “Clock precision can still be affected by the volume of traffic and how much contention there is for priority. Thus; PTP clock messages can get stuck and delayed in the backbone; in the switches between your devices” [16].
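To picture what those QoS settings do, here is a toy Python model of one switch port with strict-priority queues. It uses the DSCP values Audinate publishes for Dante (CS7/56 for time-critical PTP, EF/46 for audio and other PTP, CS1/8 reserved, everything else best effort); check Audinate’s current switch guidance before configuring real hardware. The class and helper names are hypothetical, and real switches do this in hardware with their own queue counts and scheduling options.

```python
from collections import deque

# DSCP markings Audinate recommends prioritizing for Dante traffic.
DANTE_DSCP_QUEUES = {
    56: "high",    # CS7: time-critical PTP event messages
    46: "medium",  # EF: audio and other PTP messages
    8: "low",      # CS1: reserved
}
QUEUE_ORDER = ["high", "medium", "low", "best_effort"]  # everything else is best effort


def dscp_of(ipv4_header: bytes) -> int:
    # DSCP is the top 6 bits of the second header byte (the old Type of Service field).
    return ipv4_header[1] >> 2


class StrictPriorityPort:
    """Toy switch egress port that always serves the highest-priority non-empty queue first."""

    def __init__(self) -> None:
        self.queues = {name: deque() for name in QUEUE_ORDER}

    def enqueue(self, ipv4_header: bytes, label: str) -> None:
        queue = DANTE_DSCP_QUEUES.get(dscp_of(ipv4_header), "best_effort")
        self.queues[queue].append(label)

    def drain(self) -> list:
        """Send everything currently waiting, in priority order."""
        sent = []
        for name in QUEUE_ORDER:
            while self.queues[name]:
                sent.append(self.queues[name].popleft())
        return sent


def fake_header(dscp: int) -> bytes:
    # Hypothetical 20-byte IPv4 header where only the DSCP bits matter for this sketch.
    return bytes([0x45, dscp << 2]) + bytes(18)


port = StrictPriorityPort()
port.enqueue(fake_header(0), "web traffic")
port.enqueue(fake_header(46), "Dante audio")
port.enqueue(fake_header(56), "PTP Sync")
order = port.drain()  # ['PTP Sync', 'Dante audio', 'web traffic']
```

Strict priority only reorders what is waiting. A PTP packet that arrives while a large frame is already on the wire still has to wait for it, which is one reason clock precision can suffer on a busy network even with QoS configured.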

Since Dante uses PTPv1, it will find the best device on the network to be the Master (Leader) Clock for the entire network, and if that device drops out, it will elect a new Master (Leader) Clock based on the PTPv1 parameters we discussed. This can also be configured manually if necessary. PTPv2, as defined in the 1588-2008 standard, is not backwards compatible with PTPv1. ANOTHER revision of the standard came out in 2019 (IEEE 1588-2019), but its backwards compatibility is with the 2008 version rather than with PTPv1 [17]. AES67, RAVENNA, and AVB use PTPv2 (although AVB uses its own profile of IEEE 1588-2008, which we will talk about later). In a Shure article on “Dante And AES67 Clocking In Depth,” they point out that PTPv1 and PTPv2 can “coexist on the same network,” but “[i]f there is a higher precision PTPv2 clock on a network, then one Dante device will synchronize to the higher-precision PTPv2 clock and act as a Boundary Clock for PTPv1 devices” [18]. So in practice, it is end devices that speak both versions that bridge PTPv2 and PTPv1, and because these Layer 3 networks rely on standard network infrastructure, it is not as easy to find switches that can properly handle both PTPv1 and PTPv2. On top of that, there is the juggling of which devices are using which clocking system, and as the system scales upward, it becomes a bigger and bigger headache to manage.

AES67 and RAVENNA use PTPv2 as well, and try to address some of these issues without reinventing the wheel. Like Dante, AES67 and RAVENNA operate as Layer 3 protocols on top of standard IP networks, but they were created by different organizations. The Audio Engineering Society first published the AES67 standard in 2013, with revisions thereafter [19]. The goal of AES67 is interoperability between devices, a concept we will see come up again with AVB, though AES67 approaches it differently: it builds on preexisting standards from the IEEE and the IETF (Internet Engineering Task Force) rather than inventing new ones. Because AES67 shares many of the same underlying standards as RAVENNA, RAVENNA supports an AES67 profile [20]. RAVENNA is an audio-over-IP protocol that is particularly popular in the broadcast world. Its place as the standard in broadcasting comes from its flexibility in transporting a multitude of data formats and sampling rates for both audio and video, along with low latency and support for WAN connections [21]. So as technology improves, new protocols keep being made to accommodate the new advances, and one starts to wonder why the standards themselves don’t just get revised instead of the products having to chase an ever-changing industry. AES67 partially addresses this by using the latest IEEE and IETF standards, but maybe the solution runs deeper than that. Well, that’s exactly what happened with the creation of AVB.
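Because AES67 and RAVENNA streams ride on ordinary RTP/UDP/IP, you can estimate what a stream costs on the wire with simple arithmetic. Here is a Python sketch assuming the commonly used AES67 interoperability settings of 48 kHz, 24-bit (L24) samples, and a 1 ms packet time (48 samples per packet), on IPv4 over untagged Ethernet; other formats, packet times, or a VLAN tag change the numbers.

```python
# Assumed AES67-style stream: 48 kHz, 24-bit linear PCM (L24), 1 ms packet time,
# carried in RTP/UDP over IPv4 on Ethernet without a VLAN tag.
RTP_UDP_IPV4_BYTES = 12 + 8 + 20          # RTP + UDP + IPv4 headers
ETHERNET_ON_WIRE_BYTES = 14 + 4 + 8 + 12  # header + FCS + preamble/SFD + inter-frame gap


def stream_bandwidth_mbps(channels: int, sample_rate: int = 48_000,
                          bytes_per_sample: int = 3, packet_time_ms: float = 1.0) -> float:
    samples_per_packet = round(sample_rate * packet_time_ms / 1000)
    payload = channels * samples_per_packet * bytes_per_sample
    bytes_on_wire = payload + RTP_UDP_IPV4_BYTES + ETHERNET_ON_WIRE_BYTES
    packets_per_second = 1000 / packet_time_ms
    return bytes_on_wire * 8 * packets_per_second / 1e6


for channels in (2, 8):
    print(f"{channels} channels: {stream_bandwidth_mbps(channels):.3f} Mbit/s")
```

Under these assumptions an 8-channel stream comes to about 9.84 Mbit/s on the wire, of which the audio itself (8 × 48,000 × 24 bits) is 9.216 Mbit/s; the rest is header and framing overhead, which grows as a share of the total at shorter packet times.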

AVB stands for Audio Video Bridging, and it differs from Dante on a fundamental level because it is a Data Link (Layer 2) protocol, whereas Dante is a Network (Layer 3) protocol. Since the AVB standards operate at Layer 2, the switches themselves must be designed for AVB to be compatible with the standards at that fundamental level. This is where the OSI Model comes in handy again: Dante only needs the network to deliver ordinary IP packets, which almost any switch can do, while AVB depends on features built into the switches. In fact, AVB grew out of the need to standardize networked audio so that devices from different manufacturers could talk to each other. Dante, being owned by a company, requires licensing for devices to be “Dante-enabled,” whereas the IEEE wanted to create AVB standards that would ensure compatibility across all devices on the network regardless of manufacturer. However, AVB-compatible switches have been notoriously more expensive than common, run-of-the-mill switches, and the cost of replacing an existing infrastructure of common (read: cheaper) switches with AVB-compatible (read: more expensive) switches has often been a roadblock to AVB deployments.

When talking about most networking protocols, especially AVB, the discussion dives into layers upon layers of standards and revisions. AVB itself refers to a set of IEEE 802.1 standards, along with others outlined in IEEE 1722 and IEEE 1733 [22]. I know all this talk of IEEE standards gets really confusing, so it helps to remember that there is a hierarchy to it all. In an AES paper by Axel Holzinger and Andreas Hildebrand with the very long title “Realtime Linear Audio Distribution Over Networks: A Comparison Of Layer 2 And Layer 3 Solutions Using The Example Of Ethernet AVB And RAVENNA,” they lay out the four AVB protocols in 802.1 [23]:

IEEE 802.1AS: Timing and Synchronization (gPTP)

IEEE 802.1Qat: Stream Reservation Protocol (SRP)

IEEE 802.1Qav: Forwarding and Queuing for Time-Sensitive Streams

IEEE 802.1BA: Audio Video Bridging Systems

It’s important here to stop and go over some new terminology for devices in an AVB domain, since it is Layer 2, after all. Instead of a network, senders, receivers, and switches, we will talk about a domain, talkers, listeners, and bridges, respectively [24].

An example of a basic AVB network

IEEE 802.1AS is essentially an AVB-specific profile of IEEE 1588 PTPv2. The first edition of this standard, IEEE 802.1AS-2011, defines gPTP (or “generalized PTP”). When used in conjunction with IEEE 1722-2011, gPTP time provides the reference for a presentation time for media data, which indicates “when the rendered media data shall be presented to the viewer or listener” [25]. What I have learned from all this research is that the IEEE loves nesting new standards within other standards like a convoluted Russian nesting doll. The Stream Reservation Protocol (SRP, also known as IEEE 802.1Qat) is what makes AVB stand out from other network protocols, because it allows endpoints to check the path through the network and reserve bandwidth: SRP “checks end-to-end bandwidth availability before an A/V stream starts” [26]. This ensures that a stream won’t be sent until bandwidth has been reserved along its path through the domain. By contrast, in a Dante deployment, each additional daisy-chained switch adds latency, so adding hops can force you to reevaluate your latency settings or your network topology entirely. Dante latency is set per device, depending on the size of the network, but with AVB, thanks to SRP and its QoS improvements, bandwidth reservations are announced through the network and latency is kept low even in large deployments.
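Here is a toy Python model of the admission-control idea behind SRP: a stream is only admitted if every link between talker and listener has room for it, and only then is bandwidth reserved on each hop. It assumes the common default that at most 75% of a link may be reserved for AVB streams and uses made-up stream sizes and link names; real SRP works by exchanging declarations (such as Talker Advertise and Listener Ready) between bridges rather than by calling a function.

```python
from dataclasses import dataclass

MAX_RESERVABLE = 0.75  # assumed default: at most 75% of a link may be reserved for AVB streams


@dataclass
class Link:
    name: str
    capacity_mbps: float
    reserved_mbps: float = 0.0

    def headroom_mbps(self) -> float:
        return self.capacity_mbps * MAX_RESERVABLE - self.reserved_mbps


def reserve_stream(path: list, stream_mbps: float) -> bool:
    """Admit a stream only if every hop can carry it; only then reserve it on every hop."""
    if any(link.headroom_mbps() < stream_mbps for link in path):
        return False  # refused before any audio is sent, so existing streams are unaffected
    for link in path:
        link.reserved_mbps += stream_mbps
    return True


# Talker -> bridge A -> bridge B -> listener, with a slower 100 Mbit/s hop in the middle.
path = [Link("talker-A", 1000), Link("A-B", 100), Link("B-listener", 1000)]
admitted = sum(reserve_stream(path, 10.0) for _ in range(10))
print(admitted)  # 7: the 100 Mbit/s hop can reserve at most 75 Mbit/s
```

The design choice worth noticing is that the check happens before anything is sent: the eighth stream is refused outright instead of squeezing in and degrading the seven already running. For reference, IEEE 802.1BA pairs this with a latency target for Class A streams of 2 ms across up to seven hops.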

The robustness and speed of AVB networks, along with their ability, as the name implies, to carry audio, video, and data on the same network, have made them more common. The problem with all these network protocols is that they follow the logic of Moore’s Law: if you couldn’t tell from all the revisions of IEEE standards I have been listing, these technologies improve and get revised very quickly. Because technology is improving at such a blinding pace, it’s no wonder that gear manufacturers haven’t been able to “settle” on a common standard the way they settled on, say, the XLR cable. This is where the newest addition to the onslaught of protocols comes in: Milan.

The AVB standards kept developing, just like the revisions of IEEE 1588, and have led to the latest development in AVB technology: Milan. Developed with the collaboration of some of the biggest names in the business, Milan is a subset of standards within the overarching AVB protocol. Among other features, Milan includes a primary and secondary redundancy scheme like Dante’s, which was not available in previous AVB networks. The key here is that Milan is an open standard, meaning that manufacturers can develop their own implementations of Milan for their gear as long as they follow the published specifications [27]. This is pretty huge considering how many different networking protocols are used across different pieces of gear in the audio industry. The Avnu Alliance, the organization of collaborating manufacturers that developed Milan, has put together the Milan specifications with the idea that any products released with “Milan-ready” certification, or a badge of that nature, will be able to talk to one another over a Milan network [28].

A Note On OSC And The Future

Before we conclude our journey through the world of networking, I want to take a minute for OSC. Open Sound Control, or OSC, is an open source communications protocol that was originally designed for electronic musical instruments but has expanded to streamline communication for everything from controlling synthesizers, to connecting motion trackers with software programs, to controlling virtual reality [29]. It is not an audio transport protocol; like MIDI, it is used for communication between devices, except that unlike MIDI, it is typically carried over IP networks. I think this is a great place to end because OSC is a great example of the power of open source technology. The versatility of OSC and its open platform has allowed many programs, small and large, to implement this protocol, and it is a testament to how workflows improve when everyone (i.e., open source) has the ability to contribute changes to make things better. We’ve spent this entire blog talking about the many different standards that have been implemented over the years to improve upon previous technology. Yet progress gets gridlocked, mostly because by the time a standard is made and actually adopted, it is already out of date: the technology has already moved past it.
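To show how lightweight OSC is, here is a Python sketch that encodes an OSC 1.0 message by hand: an address pattern, a type tag string, and big-endian arguments, with strings null-terminated and padded to a multiple of four bytes. The address /mixer/ch/1/fader is hypothetical (every OSC-capable device or program defines its own address space), and only int, float, and string arguments are handled here.

```python
import struct


def osc_string(s: str) -> bytes:
    # OSC strings are ASCII, null-terminated, and padded with nulls to a multiple of 4 bytes.
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def osc_message(address: str, *args) -> bytes:
    """Encode an OSC message with int32 ('i'), float32 ('f'), and string ('s') arguments."""
    tags, payload = ",", b""
    for arg in args:
        if isinstance(arg, bool):
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(arg, int):
            tags += "i"
            payload += struct.pack(">i", arg)  # big-endian 32-bit integer
        elif isinstance(arg, float):
            tags += "f"
            payload += struct.pack(">f", arg)  # big-endian 32-bit float
        elif isinstance(arg, str):
            tags += "s"
            payload += osc_string(arg)
        else:
            raise TypeError(f"unsupported OSC argument: {arg!r}")
    return osc_string(address) + osc_string(tags) + payload


msg = osc_message("/mixer/ch/1/fader", 0.75)
print(len(msg), msg)
```

The resulting bytes are usually sent as the payload of a single UDP datagram, for example with Python’s `socket.sendto(msg, (host, port))` to whatever address and port the receiving device documents.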

 

So maybe it’s time for something different.

Maybe the open nature of Milan and OSC is the way of the future. If everyone can put their heads together to develop specifications that are fluid and open to change, rather than restricted by the rigidity of bureaucracy, maybe hardware will finally be able to keep up with the pace of the minds of the people using it.

Endnotes

[1] https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo

[2] https://networkengineering.stackexchange.com/questions/35016/whats-the-difference-between-frame-packet-and-payload

[3] https://www.itprc.com/how-encapsulation-works-within-the-tcpip-model/

[4] https://youtu.be/9glJEQ1lNy0

[5] https://www.learncisco.net/courses/icnd-1/building-a-network/tcpip-transport-layer.html

[6] https://www.iol.unh.edu/sites/default/files/knowledgebase/1588/ptp_overview.pdf

[7] https://www.aes.org/e-lib/browse.cfm?elib=16146 (pages 1-2)

[8] https://www.nist.gov/system/files/documents/el/isd/ieee/tutorial-basic.pdf

[9] https://www.aes.org/e-lib/browse.cfm?elib=16146 (page 5)

[10] https://en.wikipedia.org/wiki/Precision_Time_Protocol

[11] https://community.cambiumnetworks.com/t5/PTP-FAQ/IEEE-1588-What-s-the-difference-between-a-Boundary-Clock-and/td-p/50392

[12] https://www.luminex.be/improve-your-timekeeping-with-ptpv2/

[13] ibid.

[14] https://www.audinate.com/company/about/history

[15] https://www.audinate.com/support/networks-and-switches

[16] https://www.luminex.be/improve-your-timekeeping-with-ptpv2/

[17] https://en.wikipedia.org/wiki/Precision_Time_Protocol

[18] https://service.shure.com/s/article/dante-and-aes-clocking-in-depth?language=en_US

[19] https://www.ravenna-network.com/app/download/13999773923/AES67%20and%20RAVENNA%20in%20a%20nutshell.pdf?t=1559740374

[20] ibid.

[21] https://www.ravenna-network.com/using-ravenna/overview

[22] Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

[23] https://www.aes.org/e-lib/browse.cfm?elib=16147

[24] ibid.

[25] https://www.aes.org/e-lib/browse.cfm?elib=16146 (page 6)

[26] Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

[27] https://avnu.org/wp-content/uploads/2014/05/Milan-Whitepaper_FINAL-1.pdf (page 7)

[28] https://avnu.org/specifications/

[29] http://opensoundcontrol.org/osc-application-areas

Resources

Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 5: IP Encapsulation [Video]. YouTube. https://www.youtube.com/watch?v=9glJEQ1lNy0&list=PLLvRirFt63Gc6FCnGVyZrqQpp73ngToBz&index=5

Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 8: ARP [Video]. YouTube. https://www.youtube.com/watch?v=x4l8Q4JwtXQ

Audinate. (2018, July 5). Dante Certification Program – Level 3 – Module 23: Advanced Clocking [Video]. YouTube. https://www.youtube.com/watch?v=a7Y3IYr5iMs&list=PLLvRirFt63Gc6FCnGVyZrqQpp73ngToBz&index=23

Audinate. (2019, December). The Relationship Between Dante, AES67, and SMPTE ST 2110 [White paper]. Uploaded to Scribd. Retrieved from https://www.scribd.com/document/439524961/Audinate-Dante-Domain-Manager-Broadcast-Aes67-Smpte-2110

Audinate. (n.d.). History. https://www.audinate.com/company/about/history

Audinate. (n.d.). Networks and Switches. https://www.audinate.com/support/networks-and-switches

Avnu Alliance. (n.d.). Avnu Alliance Test Plans and Specifications. https://avnu.org/specifications/

Bakker, R., Cooper, A. & Kitagawa, A. (2014). An introduction to networked audio [White paper]. Yamaha Commercial Audio. Retrieved from https://download.yamaha.com/files/tcm:39-322551

Cambium Networks Community [Mark Thomas]. (2016, February 19). IEEE 1588: What’s the difference between a Boundary Clock and Transparent Clock? [Online forum post]. https://community.cambiumnetworks.com/t5/PTP-FAQ/IEEE-1588-What-s-the-difference-between-a-Boundary-Clock-and/td-p/50392

Cisco. (n.d.). Layer 3 vs Layer 2 Switching. https://documentation.meraki.com/MS/Layer_3_Switching/Layer_3_vs_Layer_2_Switching

Crash Course. (2020, March 19). Computer Science [Video Playlist]. YouTube. https://www.youtube.com/playlist?list=PL8dPuuaLjXtNlUrzyH5r6jN9ulIgZBpdo

Eidson, J. (2005, October 10). IEEE 1588 Standard for a Precision Clock Synchronization Protocol for Networked Measurement and Control Systems [PDF of slides]. Agilent Technologies. Retrieved from https://www.nist.gov/system/files/documents/el/isd/ieee/tutorial-basic.pdf

Garner, G. (2010, May 28). IEEE 802.1AS and IEEE 1588 [Lecture slides]. Presented at Joint ITU-T/IEEE Workshop on The Future of Ethernet Transport, Geneva 28 May 2010. Retrieved from https://www.itu.int/dms_pub/itu-t/oth/06/38/T06380000040002PDFE.pdf

Holzinger, A. & Hildebrand, A. (2011, November). Realtime Linear Audio Distribution Over Networks A Comparison Of Layer 2 And Layer 3 Solutions Using The Example Of Ethernet AVB And RAVENNA [White paper]. Presented at the AES 44th International Conference, San Diego, CA, 2011 November 18-20. Retrieved from https://www.aes.org/e-lib/browse.cfm?elib=16147

Johns, I. (2017, July). Ethernet Audio. Sound On Sound. Retrieved from https://www.soundonsound.com/techniques/ethernet-audio

Kreifeldt, R. (2009, July 30). AVB for Professional A/V Use [White paper]. Avnu Alliance.

Laird, J. (2012, July). PTP Background and Overview. University of New Hampshire InterOperability Laboratory. Retrieved from https://www.iol.unh.edu/sites/default/files/knowledgebase/1588/ptp_overview.pdf

LearnCisco. (n.d.). Understanding The TCP/IP Transport Layer. https://www.learncisco.net/courses/icnd-1/building-a-network/tcpip-transport-layer.html

LearnLinux. (n.d.). ARP and the ARP table. http://www.learnlinux.org.za/courses/build/net-admin/ch03s05.html

Luminex. (2017, June 6). PTPv2 Timing protocol in AV networks. https://www.luminex.be/improve-your-timekeeping-with-ptpv2/

Milan Avnu. (2019, November). Milan: A Networked AV System Architecture [PDF of slides].

Mullins, M. (2001, July 2). Exploring the anatomy of a data packet. TechRepublic. https://www.techrepublic.com/article/exploring-the-anatomy-of-a-data-packet/

Network Engineering [radiantshaw]. (2016, September 18). What’s the difference between Frame, Packet, and Payload? [Online forum post]. Stack Exchange. https://networkengineering.stackexchange.com/questions/35016/whats-the-difference-between-frame-packet-and-payload

Opensoundcontrol.org. (n.d.). OSC Application Areas. Retrieved August 10, 2020 from http://opensoundcontrol.org/osc-application-areas

Perales, V. & Kaltheuner, H. (2018, June 1). Milan Whitepaper [White paper]. Avnu Alliance. https://avnu.org/wp-content/uploads/2014/05/Milan-Whitepaper_FINAL-1.pdf

Precision Time Protocol. (n.d.). In Wikipedia. Retrieved August 10, 2020, from https://en.wikipedia.org/wiki/Precision_Time_Protocol

Presonus. (n.d.). Can Dante enabled devices exist with other AVB devices on my network? https://support.presonus.com/hc/en-us/articles/210048823-Can-Dante-enabled-devices-exist-with-other-AVB-devices-on-my-network-

Quine, A. (2008, January 27). How Encapsulation Works Within the TCP/IP Model. IT Professional’s Resource Center. https://www.itprc.com/how-encapsulation-works-within-the-tcpip-model/

Quine, A. (2008, January 27). How The Transport Layer Works. IT Professional’s Resource Center. https://www.itprc.com/how-transport-layer-works/

RAVENNA. (n.d.). AES67 and RAVENNA In A Nutshell [White paper]. RAVENNA. https://www.ravenna-network.com/app/download/13999773923/AES67%20and%20RAVENNA%20in%20a%20nutshell.pdf?t=1559740374

RAVENNA. (n.d.). What is RAVENNA? https://www.ravenna-network.com/using-ravenna/overview/

Rose, B., Haighton, T. & Liu, D. (n.d.). Open Sound Control. Retrieved August 10, 2020 from https://staas.home.xs4all.nl/t/swtr/documents/wt2015_osc.pdf

Shure. (2020, March 20). Dante And AES67 Clocking In Depth. Retrieved August 10, 2020 from https://service.shure.com/s/article/dante-and-aes-clocking-in-depth?language=en_US

Weibel, H. & Heinzmann, S. (2011, November). Media Clock Synchronization Based On PTP [White paper]. Presented at the AES 44th International Conference, San Diego, CA, 2011 November 18-20. Retrieved from https://www.aes.org/e-lib/browse.cfm?elib=16146

Basic Networking For Live Sound Engineers 

Part One: Defining A Network

The World of Audio Over IP

There is a certain sense of security that comes from physically plugging a copper cable from one device into another. On some level, my engineer brain finds comfort in believing that, “As long as I patch this end to that end correctly and the integrity of the cable itself has not been compromised, the signal will get from Point A to Point B.” I believe one of the most daunting aspects of understanding networked audio, and audio-over-IP in general, stems from a self-induced, psychological uncertainty about one’s ability to “physically” route one thing to another. I mean, after all these years, consoles still have faders, buttons, and knobs because people enjoy the tactile feedback of making a physical move related to their task in audio.

The psychological hurdle to overcome is accepting that a network can work much like a copper multicore snake, sending multiple signals all over the place. The beauty and power of it is that it is far more adaptable than our old copper friend. We can send larger quantities of high-quality signal around the world: a task that would be financially and physically impractical for a single project using physical wires. In this first blog, part one of a three-part series, I will give an overview of what a network is and how we can create and connect to one.

What Is A Network?

A network can refer to any group of things that interconnect to transfer data: think of a “social network” where a group of individuals exchange ideas in person or over the Internet. Cisco Systems (one of the biggest juggernauts of the industrial networking world) defines a network as “two or more connected computers that can share resources such as data, a printer, and Internet connection, applications, or a combination of these resources” (Cisco, 2006 [1]). We commonly see networks created using wired systems, Wi-Fi, or a combination of the two. Wired systems build a network using physical Ethernet connections (Cat5e/Cat6 cabling) or fiber, while Wi-Fi uses radio frequencies to carry signals from device to device. “Wi-Fi” is a marketing term for the technology that the Institute of Electrical and Electronics Engineers (IEEE) defines in its 802.11 family of standards, and we could dedicate an entire blog just to discussing this topic [2].

 

Unicast vs. Multicast

In a network using the TCP/IP protocol suite, which stands for “Transmission Control Protocol/Internet Protocol”, devices exchange packets of data with one another, for example by sending requests and responses. In a unicast message, one device talks directly to another as a point-to-point transmission. In a multicast message, one device sends a single message that a whole group of devices can receive at once; this is different from a broadcast, which goes to every device on the local network. To understand how devices address messages to one another, we must understand how IP and MAC addresses work.
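To make the unicast/multicast/broadcast distinction a little more concrete, here is a minimal Python sketch using the standard library’s ipaddress module. It labels a destination IPv4 address as unicast, multicast, or broadcast, assuming the example subnet 192.168.1.0/24 that this blog uses for its LAN examples. The function name classify_destination and the sample addresses are hypothetical, chosen only for illustration; the sketch relies on the fact that IPv4 multicast lives in the 224.0.0.0/4 block.

```python
import ipaddress

def classify_destination(dest: str, subnet: str) -> str:
    """Label an IPv4 destination as unicast, multicast, or broadcast for a given subnet."""
    addr = ipaddress.IPv4Address(dest)
    net = ipaddress.IPv4Network(subnet)
    if addr.is_multicast:  # 224.0.0.0 through 239.255.255.255 (224.0.0.0/4)
        return "multicast"
    # The subnet's own broadcast address, or the "limited broadcast" 255.255.255.255
    if addr in (net.broadcast_address, ipaddress.IPv4Address("255.255.255.255")):
        return "broadcast"
    return "unicast"

# Example addresses chosen for illustration only.
for dest in ["192.168.1.20", "239.1.1.1", "192.168.1.255"]:
    print(dest, classify_destination(dest, "192.168.1.0/24"))
# 192.168.1.20 unicast
# 239.1.1.1 multicast
# 192.168.1.255 broadcast
```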

I like to think of a data network like a department on a tour: there are the audio, lighting, video, and other departments, and each department has its own members who communicate with each other within their own department. Let’s compare a network to the audio department. Each individual (the monitor engineer, PA techs, systems engineer, FOH engineer, etc.) acts as a discrete host, much like a computer or amplifier talking to other devices on a data network. Every device has a MAC address, short for “Media Access Control” address, which is tied to the device’s hardware and, like the name of each person on a crew, is unique to that device (except that a MAC address is a 48-bit number written in hexadecimal [3]). An IPv4 address is a 32-bit number written as four octets, or groups of 8 bits, and it identifies a device within its network [4]. An IP address relates to a MAC address the way a nickname relates to a given name. There may be several folks nicknamed “Jay” on a crew, maybe Jennifer in Audio and John in Lighting, but as long as “Jay” is talking to people locally in the same department, the other hosts will know which “Jay” is meant.
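If it helps to see the two kinds of addresses side by side, this Python sketch (standard library only) shows that a MAC address is twelve hexadecimal digits, or 48 bits, while an IPv4 address is 32 bits split into four octets. The MAC address below is made up, and the helper names mac_to_int and ipv4_to_binary are hypothetical.

```python
import ipaddress

def mac_to_int(mac: str) -> int:
    """Turn a MAC address such as '00:1A:2B:3C:4D:5E' into a single integer."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:  # 12 hex digits x 4 bits each = 48 bits
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return int(hex_digits, 16)

def ipv4_to_binary(ip: str) -> str:
    """Write an IPv4 address as four dotted 8-bit octets."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if the address is invalid
    return ".".join(f"{octet:08b}" for octet in addr.packed)

mac = "00:1A:2B:3C:4D:5E"  # made-up MAC address for illustration
ip = "192.168.1.20"        # example IPv4 address

print(hex(mac_to_int(mac)))                     # 0x1a2b3c4d5e
print(ipaddress.IPv4Address(ip).max_prefixlen)  # 32 bits in an IPv4 address
print(ipv4_to_binary(ip))                       # 11000000.10101000.00000001.00010100
```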

These two networks (or tour departments) are not local to the same network

MAC addresses are tied to hardware, but IP addresses can be “reused” as long as no two devices within the same local network share the same address. A group of devices in the same IP range is called a LAN, or Local Area Network. LANs range from basic to complex and are seen everywhere from the Wi-Fi network in our homes to a network of in-ear monitor transmitters and wireless microphone receivers connected to a laptop. So how do these devices talk to each other within a LAN?

IP Addresses and Subnet Masks within a LAN

Let’s create a simple LAN of a laptop and a network-capable wireless microphone receiver and dive deep into what makes up an IP address. The computer has an IP address that is associated with its MAC address, and the same goes for the receiver. In Figure A, the two devices are connected directly, from the network adapter of one to the other, with a Cat6 Ethernet cable.

Figure A

The IP address of the laptop is 192.168.1.1 and the IP address of the receiver is 192.168.1.20. Each of the four numbers separated by a period actually translates to an octet (8 bits) of binary. This is important because both devices are on the same subnet, 192.168.1.XXX. A subnet is a way of dividing a network so that devices only look for other devices within their own network, as defined by their subnet mask. A subnet mask of 255.255.255.0 leaves 254 usable host addresses. According to a Microsoft article, “Understanding TCP/IP addressing and subnetting basics”, XXX.XXX.XXX.0 is used to specify a network “without specifying a host” and XXX.XXX.XXX.255 is used to “broadcast a message to every host on the network” [5]. So, in this network example, neither the computer nor the receiver can use the IP addresses 192.168.1.0 or 192.168.1.255, because those addresses are reserved for the network and for broadcast. But how does the computer know to look for the receiver in the 192.168.1.XXX IP address range? Why doesn’t it look at 10.0.0.20? This has to do with the subnet mask of each device.
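Python’s built-in ipaddress module can check this arithmetic for us. The sketch below is an illustration, not something a laptop or receiver literally runs: it builds the Figure A subnet from its address and mask, lists the two reserved addresses, counts the usable hosts, and tests whether 10.0.0.20 belongs to it.

```python
import ipaddress

# Figure A's subnet: 192.168.1.XXX with subnet mask 255.255.255.0
subnet = ipaddress.IPv4Network("192.168.1.0/255.255.255.0")

print(subnet.network_address)    # 192.168.1.0   reserved for the network itself
print(subnet.broadcast_address)  # 192.168.1.255 reserved for broadcast
usable = list(subnet.hosts())    # hosts() skips those two reserved addresses
print(len(usable), usable[0], usable[-1])  # 254 192.168.1.1 192.168.1.254

for ip in ["192.168.1.1", "192.168.1.20", "10.0.0.20"]:
    print(ip, ipaddress.IPv4Address(ip) in subnet)  # True, True, False
```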

Let me give you a little history about these numbers: believe it or not, there is an organization whose main gig is to assign IP addresses on the public Internet. The Internet Assigned Numbers Authority (IANA) coordinates the global pool of IP addresses that ultimately connect you and your Internet Service Provider (ISP) to the Internet. To prevent conflicts among the addresses used on the Internet, IANA follows a set of standards created by the IETF (Internet Engineering Task Force). One of these standards, RFC 1918 [6], reserves specific IP ranges for private networks, like the example 192.168.1.XXX. That means anyone can use them within their own LAN, because these addresses are not routed across the public Internet. To understand more about how our computers connect to the Internet, we have to talk about DNS and gateways, which is beyond the scope of this blog. The key for our laptop and receiver to determine whether another device is local to their LAN lies in the subnet mask. Both devices in Figure A have a subnet mask of 255.255.255.0. Like the IP address, each of its four numbers corresponds to an octet of binary. The difference is that instead of identifying a specific device, the mask marks which bits of the address identify the network and which bits are left available for host addresses in that range. It may look abstract at first, but trust me: once you understand what a subnet mask ACTUALLY refers to in binary, you will better understand how it defines the available IP addresses in the subnet.
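As a sketch, the three RFC 1918 private blocks (10.0.0.0/8, 172.16.0.0/12, and 192.168.0.0/16) can be written out explicitly and checked in Python. The helper is_rfc1918 is hypothetical. Note that the ipaddress module’s own is_private attribute is broader than RFC 1918 (it also covers ranges such as loopback), which is why the blocks are listed by hand here.

```python
import ipaddress

# The three private address blocks reserved by RFC 1918.
RFC1918_BLOCKS = [
    ipaddress.IPv4Network("10.0.0.0/8"),
    ipaddress.IPv4Network("172.16.0.0/12"),   # 172.16.0.0 through 172.31.255.255
    ipaddress.IPv4Network("192.168.0.0/16"),
]

def is_rfc1918(ip: str) -> bool:
    """Return True if the address falls inside one of the RFC 1918 private ranges."""
    addr = ipaddress.IPv4Address(ip)
    return any(addr in block for block in RFC1918_BLOCKS)

for ip in ["192.168.1.20", "10.0.0.20", "172.31.255.1", "172.32.0.1", "8.8.8.8"]:
    print(ip, is_rfc1918(ip))  # True, True, True, False, False
```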

A subnet mask is composed of 4 octets in binary. If we set every bit to 1 in every octet except the last and wrote it out in its true binary form, we would get a subnet mask that looks like this:

255.255.255.0 can also be written as 11111111.11111111.11111111.00000000

Binary is base two: each bit holds either an “on” (1) or “off” (0) value. Each bit position in an octet stands for a power of two, 2^n (2 to the nth power), with n counting from 0 at the rightmost bit up to 7 at the leftmost, the 8th position. A bit that is on adds its 2^n to the total; a bit that is off adds nothing.

The place values of the octet XXXXXXXX (where each X is a bit of either 1 or 0), from left to right, can be written as:

(2^7)+(2^6)+(2^5)+(2^4)+(2^3)+(2^2)+(2^1)+(2^0)

Binary math is simply done by “filling in” each bit position that holds a “true” (1) value and adding up those place values. In other words, a binary octet of 11000000 can be interpreted as

(1×2^7)+(1×2^6)+(0×2^5)+(0×2^4)+(0×2^3)+(0×2^2)+(0×2^1)+(0×2^0)=128+64=192

OK, OK, roll with me here. If we do the same binary math with every bit in the octet set to “true” (1), we get:

11111111=(2^7)+(2^6)+(2^5)+(2^4)+(2^3)+(2^2)+(2^1)+(2^0)=255

So if we refer back to the first subnet mask example, we can discern based on the binary math that:

11111111.11111111.11111111.00000000=255.255.255.0
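The same place-value math can be spelled out in a few lines of Python. This is a teaching sketch with hypothetical helper names; in everyday Python, int(bits, 2) does the conversion in one call.

```python
def octet_value(bits: str) -> int:
    """Add up 2^n for every bit set to 1, with n = 7 at the left down to 0 at the right."""
    if len(bits) != 8 or set(bits) - {"0", "1"}:
        raise ValueError(f"not an 8-bit octet: {bits!r}")
    total = 0
    for position, bit in enumerate(bits):
        n = 7 - position  # the leftmost character is the 2^7 place
        if bit == "1":
            total += 2 ** n
    return total

def mask_from_binary(binary_mask: str) -> str:
    """Convert a dotted-binary mask like '11111111.11111111.11111111.00000000' to decimal."""
    return ".".join(str(octet_value(octet)) for octet in binary_mask.split("."))

print(octet_value("11000000"))  # 192
print(octet_value("11111111"))  # 255
print(mask_from_binary("11111111.11111111.11111111.00000000"))  # 255.255.255.0
```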

When a bit in an octet is “true” or 1, that position has been “filled” and cannot hold anything more. Think of each octet like a highway with 8 lanes. Together the lanes can represent 256 values (0 through 255); when the whole octet is set aside for hosts, 0 and 255 are reserved for the network and broadcast addresses, which leaves room for 254 cars/hosts. A value of 1 means that a lane has been filled by 2^n cars/hosts, where n is the lane’s position on the highway and the lanes are counted starting at 0 (because it is a computer). So to add another car when a lane is already full, the value must carry over to the next lane, or bit position, to the left. For example, 00000011 (3) has its two rightmost lanes filled; adding one more car empties them and fills the next lane to the left, giving 00000100 (4). Keep counting and the lanes fill up again until 00000111 (7), and the next car spills over once more to 00001000 (8).

 

Each bit position is like a lane on a highway (top). When the lowest bit is already “filled” or True (remember, this is an analogy; really it’s either binary On or Off), the ascending value “spills” over to the next bit (bottom).
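Counting upward in Python and printing each value as an 8-bit octet shows the carry, or “spill over”, happening. This is a minimal sketch; count_in_binary is a hypothetical helper.

```python
def count_in_binary(start: int, stop: int) -> list:
    """Return the 8-bit binary form of every value from start to stop, inclusive."""
    return [f"{value:08b}" for value in range(start, stop + 1)]

for value, bits in zip(range(3, 9), count_in_binary(3, 8)):
    print(value, bits)
# 3 00000011
# 4 00000100  <- the two right-hand lanes were full, so the 1 carries left
# 5 00000101
# 6 00000110
# 7 00000111
# 8 00001000  <- carry again, into the 2^3 lane
```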

So why do we care about this? Well, if a device has a subnet mask of 255.255.255.0 or 11111111.11111111.11111111.00000000, that means all the binary values of the first 3 octets must match those of another device for the two to be considered “local” to the same network. The only values or lanes “available” for hosts are in the last octet (hence the zeroes). So, going back to Figure A, our computer and wireless receiver both have a subnet mask of 255.255.255.0, which indicates that the first 3 octets of their IP addresses MUST be the same for them to talk to each other, AND that there are only 254 available IP addresses for hosts on the network (192.168.1.1 through 192.168.1.254). Indeed, the laptop and receiver are local to each other because they are both on the 192.168.1.XXX subnet, and the subnet mask 255.255.255.0 only “allows” them to talk directly to devices within that local network.
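Under the hood, the “do the first three octets match?” test is a bitwise AND: each address is ANDed with the subnet mask, which keeps the network bits (the 1s in the mask) and zeroes out the host bits (the 0s). If the results are equal, the two addresses are on the same subnet. Here is a minimal Python sketch, assuming both devices use the same mask; the function name same_subnet is hypothetical.

```python
import ipaddress

def same_subnet(ip_a: str, ip_b: str, mask: str) -> bool:
    """Return True if two IPv4 addresses share a network under the given mask.

    Each address is ANDed bit by bit with the mask: the 1s in the mask keep
    the network bits and the 0s wipe out the host bits. If what is left
    matches, the two devices are local to each other.
    """
    m = int(ipaddress.IPv4Address(mask))
    net_a = int(ipaddress.IPv4Address(ip_a)) & m
    net_b = int(ipaddress.IPv4Address(ip_b)) & m
    return net_a == net_b

print(same_subnet("192.168.1.1", "192.168.1.20", "255.255.255.0"))  # True
print(same_subnet("192.168.1.1", "10.0.0.20", "255.255.255.0"))     # False
print(same_subnet("192.168.1.1", "192.168.2.20", "255.255.255.0"))  # False
```

Changing the mask changes the answer: with 255.255.0.0, only the first two octets have to match, so 192.168.1.1 and 192.168.2.20 would count as local.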

In this example, we talked about devices given static IP addresses as opposed to addresses assigned using DHCP. With a static IP address, the user or network administrator defines the IP address for the device, whereas a device set to DHCP, or Dynamic Host Configuration Protocol, requests an address from a DHCP server on the network, which assigns it a currently available address on a lease basis [7]. In the world of audio, the type of network addressing you choose for your system may vary from application to application, but static IP addressing is commonly preferred because it lets the operator specify the exact range the devices operate in instead of leaving it up to the network to decide. Returning to our earlier analogy of the audio department on a tour, each host needs a way to communicate with the others and also with other departments. What if the PA tech needs to talk to someone in the outside network of the lighting department? This is where routers and switches come into play.
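As an illustration of planning a static range, here is a hypothetical Python helper that hands out fixed addresses in order from a subnet’s usable hosts. The device names and the function plan_static_ips are made up; in a real system you would still enter each address in the device’s own configuration software.

```python
import ipaddress

def plan_static_ips(subnet: str, devices: list, first_host: int = 1) -> dict:
    """Assign each device, in order, a fixed address from a subnet's usable hosts.

    first_host picks which usable host to start from (for a /24 such as
    192.168.1.0/24, that is simply the number in the last octet).
    """
    if first_host < 1:
        raise ValueError("first_host must be 1 or higher")
    hosts = list(ipaddress.IPv4Network(subnet).hosts())  # excludes network and broadcast
    available = hosts[first_host - 1:]
    if len(devices) > len(available):
        raise ValueError("not enough host addresses left in this subnet")
    return {name: str(ip) for name, ip in zip(devices, available)}

# Hypothetical device names for the audio department's LAN.
plan = plan_static_ips("192.168.1.0/24", ["FOH laptop", "Mic receiver 1", "IEM transmitter 1"])
for name, ip in plan.items():
    print(f"{name:<18} {ip}")
# FOH laptop         192.168.1.1
# Mic receiver 1     192.168.1.2
# IEM transmitter 1  192.168.1.3
```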

A switch and a router often get referred to interchangeably when in fact they perform two different functions. A switch is a device that allows data packets to be sent between devices on the same network. A switch keeps a table of the MAC addresses of devices on the local network, along with the port each one is reached through, and references that table when forwarding data between devices. A router “directs traffic” between separate networks by looking at the destination IP address of each packet. Routers do this by keeping a “routing table” of networks (IP address ranges) and where to send traffic for each one; when a device sends a message to a device on another network, the router references its table to decide where to forward that message [8]. Routers are kind of like department crew chiefs: you can give them a message to be delivered to another department.

 

Routers can connect separate networks to allow them to talk to one another
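A routing table can be sketched as a lookup from networks to where their traffic should go. Real routers pick the most specific matching route (the longest prefix), and this simplified Python model does the same. The table entries and port names are hypothetical, and the 0.0.0.0/0 entry acts as a catch-all default route.

```python
import ipaddress

# A made-up routing table: which network lives behind which interface.
ROUTES = {
    ipaddress.IPv4Network("192.168.1.0/24"): "audio port",
    ipaddress.IPv4Network("192.168.2.0/24"): "lighting port",
    ipaddress.IPv4Network("0.0.0.0/0"): "uplink (default route)",
}

def next_hop(destination: str) -> str:
    """Pick the most specific route (longest prefix) that contains the destination."""
    addr = ipaddress.IPv4Address(destination)
    matches = [net for net in ROUTES if addr in net]  # 0.0.0.0/0 always matches
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]

print(next_hop("192.168.2.50"))  # lighting port
print(next_hop("192.168.1.20"))  # audio port
print(next_hop("8.8.8.8"))       # uplink (default route)
```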

Routers often get confused with their close relative, the access point. Though many routers can also function as an access point, an access point on its own cannot act as a router. Routers and access points come up often in wireless applications as a way to remotely get into a network. The difference is that access points let you get into a specific local network or expand the current network. Unlike a router, an access point does not have the capability to send messages to another network outside the LAN.

So now let’s say we want to add another device to our network in Figure A without needing to cross into another network: for example, an in-ear monitor transmitter. One method we can use is to add a switch to connect all the devices.

Network from Figure A with an IEM transmitter added, all talking via a switch

The switch connects the three devices, all on the same local network of 192.168.1.XXX. You can tell that they are all local to this network because they share the first three octets, 192.168.1, and all have the subnet mask 255.255.255.0; each device therefore treats only addresses in 192.168.1.XXX as local, since only the values in the last octet are available for host IP addresses. Voilà! We have created our first LAN!
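To picture what the switch in this example keeps track of, here is a toy Python model of a learning switch’s MAC address table. It assumes the common behavior of learning which port each sender is on and flooding frames whose destination it has not learned yet; the class name, port numbers, and MAC addresses are all hypothetical.

```python
class LearningSwitch:
    """Toy model of a switch's MAC address table (MAC address -> port number)."""

    def __init__(self, num_ports: int):
        self.ports = range(1, num_ports + 1)
        self.mac_table = {}

    def receive(self, in_port: int, src_mac: str, dst_mac: str) -> list:
        """Handle one incoming frame and return the port(s) it is sent out of."""
        self.mac_table[src_mac] = in_port               # learn which port the sender is on
        if dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]            # known destination: one port only
        return [p for p in self.ports if p != in_port]  # unknown: flood every other port

# Made-up MAC addresses for the three devices in this example.
LAPTOP, RECEIVER, IEM = "00:00:00:00:00:01", "00:00:00:00:00:02", "00:00:00:00:00:03"

switch = LearningSwitch(num_ports=4)
print(switch.receive(1, LAPTOP, RECEIVER))  # [2, 3, 4]  receiver not learned yet, so flood
print(switch.receive(2, RECEIVER, LAPTOP))  # [1]  laptop was learned on port 1
print(switch.receive(3, IEM, LAPTOP))       # [1]
print(switch.receive(1, LAPTOP, IEM))       # [3]  IEM transmitter was learned on port 3
```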

It may seem daunting at first, but understanding the binary behind the numbering in IP addresses and subnet masks is the key to understanding how devices know which other hosts are on their local network, or LAN. With the help of switches and access points, we can expand this local network, and with the addition of routers, we can connect to other networks. These same devices also let us divide our network further into different topologies. In the next blog, this concept will be expanded further in Basic Networking For Live Sound Part 2: Dividing A Network. Stay tuned!

If you want to learn more about networking, there are some GREAT resources available to you online! Check out training from companies such as:

https://www.audinate.com/learning/training-certification

https://www.cisco.com/c/en/us/training-events/training-certifications.html

https://avnu.org/training/

And more!


Endnotes

[1] https://www.cisco.com/c/dam/global/fi_fi/assets/docs/SMB_University_120307_Networking_Fundamentals.pdf

[2] https://www.cisco.com/c/en_ca/products/wireless/what-is-wifi.html

[3] https://www.audio-technica.com/cms/resource_library/files/89301711029b9788/networking_fundamentals_for_dante.pdf

[4] Ibid.

[5] https://support.microsoft.com/en-ca/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics

[6] https://tools.ietf.org/html/rfc1918

[7] https://eu.dlink.com/uk/en/support/faq/firewall/what-is-dhcp-and-what-does-it-do

[8] https://www.cisco.com/c/en/us/solutions/small-business/resource-center/networking/how-does-a-router-work.html#~what-does-a-router-do


Resources:

Audinate. (n.d.). Dante Certification Program. https://www.audinate.com/learning/training-certification/dante-certification-program

Audio Technica U.S., Inc. (2014, November 5). Networking Fundamentals for Dante. https://www.audio-technica.com/cms/resource_library/files/89301711029b9788/networking_fundamentals_for_dante.pdf

Cisco. (n.d.). How Does a Router Work? https://www.cisco.com/c/en/us/solutions/small-business/resource-center/networking/how-does-a-router-work.html

Cisco. (2006). Networking Fundamentals. In SMB University: Selling Cisco SMB Foundation Solutions. Retrieved from https://www.cisco.com/c/dam/global/fi_fi/assets/docs/SMB_University_120307_Networking_Fundamentals.pdf

Cisco. (n.d.). What Is Wi-Fi? https://www.cisco.com/c/en_ca/products/wireless/what-is-wifi.html

D-Link. (2012-2018). What is DHCP and what does it do? https://eu.dlink.com/uk/en/support/faq/firewall/what-is-dhcp-and-what-does-it-do

Encyclopedia Britannica. (n.d.). TCP/IP Internet Protocols. In Encyclopedia Britannica. Retrieved April 26, 2020, from https://www.britannica.com/technology/domain-name

Generate Random MAC Addresses. (2020). Browserling. https://www.browserling.com/tools/random-mac

Internet Assigned Numbers Authority. (2020, April 21). In Wikipedia. https://en.wikipedia.org/wiki/Internet_Assigned_Numbers_Authority

Internet Engineering Task Force. (1996). Address Allocation for Private Internets (RFC 1918). Retrieved from https://tools.ietf.org/html/rfc1918

Microsoft Support. (2019, December 19). Understanding TCP/IP addressing and subnetting basics. https://support.microsoft.com/en-ca/help/164015/understanding-tcp-ip-addressing-and-subnetting-basics

Thomas, J. (n.d.). What are Routing and Switching | Difference between Routing and Switching. OmniSecu.com. https://www.omnisecu.com/cisco-certified-network-associate-ccna/what-are-routing-and-switching.php
