Thursday, September 12, 2013

Just Because it's Sound Doesn't Mean it has to be Mixed

Mixing is like driving—everybody does it, it gets you from here to there, and it seems like it’s been part of the culture forever.

For recording or broadcast requirements with a limited channel count, a stereo or mono mix will usually fit the bill, but for live events, perhaps we can do better.

As a case in point, consider a talker at a lectern in a large meeting room. Conventional practice would dictate routing the talker’s microphone to two loudspeakers at the front of the room via the left and right masters, and then feeding the signal with appropriate delays to additional loudspeakers throughout the audience area. A mono mix with the lectern midway between the loudspeakers will allow people sitting on or near the center line of the room to localize the talker more or less correctly by creating a phantom center image, but for everyone else, the talker will be localized incorrectly toward the front-of-house loudspeaker nearest them.

In contrast to a left-right loudspeaker system, natural sound in space does not take two paths to each of our ears. Discounting early reflections, which are not perceived as discrete sound sources, direct sound naturally takes only a single path to each ear. A bird singing in a tree, a speaking voice, a car driving past—all these sounds emanate from single sources. It is the localization of these single sources amid innumerable other individually localized sounds, each taking a single path to each of our two ears, that makes up the three-dimensional sound field in which we live. All the sounds we hear naturally, a complex series of pressure waves, are essentially “mixed” in the air acoustically with their individual localization cues intact.

Our binaural hearing mechanism employs inter-aural differences in the time-of-arrival and intensity of different sounds to localize them in three-dimensional space—left-right, front-back, up-down. This is something we’ve been doing automatically since birth, and it leaves no confusion about who is speaking or singing; the eyes easily follow the ears. By presenting us with direct sound from two points in space via two paths to each ear, however, conventional L-R sound reinforcement techniques subvert these differential inter-aural localization cues.
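
To put rough numbers on these cues, the snippet below uses the common Woodworth spherical-head approximation, ITD = (r/c)(θ + sin θ); the head radius, speed of sound and function are my own illustrative assumptions rather than anything stated in this article.

```python
# Back-of-envelope inter-aural time difference (ITD) using the Woodworth
# spherical-head approximation: ITD(theta) = (r / c) * (theta + sin(theta)),
# where theta is the source azimuth measured from straight ahead.
import math

def itd_seconds(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate ITD for a source at the given azimuth (illustrative model)."""
    theta = math.radians(azimuth_deg)
    return (head_radius_m / speed_of_sound) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> ITD ~ {itd_seconds(az) * 1000:.2f} ms")
# A source directly to one side (90 deg) produces only ~0.66 ms of ITD.
# These sub-millisecond cues are what conventional L-R reinforcement scrambles
# for listeners away from the centre line.
```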

On this basis, we could take an alternative approach in our meeting room and feed the talker’s mic signal to a single nearby loudspeaker, perhaps one built into the front of the lectern, thus permitting pinpoint localization of the source. A number of loudspeakers with fairly narrow horizontal dispersion, hung over the audience area and in line with the direct sound so that each covers a small portion of the audience, will subtly reinforce the direct sound, as long as each loudspeaker is individually delayed so that its output is indistinguishable from early reflections in the target seats.

Such a system can achieve up to 8 dB of gain throughout the audience without the delay loudspeakers being perceived as discrete sources of sound, thanks to the well-known Haas, or precedence, effect. A talker or singer with strong vocal projection may not even need an “anchor” loudspeaker at the front at all.
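
As a rough sketch of the arithmetic involved, the snippet below derives a per-loudspeaker delay from simple plan-view distances; the positions, the 12 ms precedence offset and the function names are illustrative assumptions, not values prescribed by any particular system.

```python
# Minimal sketch: delay each distributed loudspeaker so that its output arrives
# at the seats it covers a few milliseconds AFTER the natural direct sound,
# keeping it inside the Haas/precedence window.
import math

SPEED_OF_SOUND = 343.0  # m/s at roughly 20 degrees C

def zone_delay_ms(source_pos, speaker_pos, seat_pos, precedence_offset_ms=12.0):
    """Delay (ms) to apply to a loudspeaker feed for one coverage zone."""
    t_direct = math.dist(source_pos, seat_pos) / SPEED_OF_SOUND    # natural path
    t_speaker = math.dist(speaker_pos, seat_pos) / SPEED_OF_SOUND  # reinforced path
    # Never let the loudspeaker beat the direct sound; trail it slightly so it
    # reads as an early reflection rather than a separate source.
    return max(0.0, (t_direct - t_speaker) * 1000.0 + precedence_offset_ms)

lectern = (0.0, 0.0)           # talker position, metres, plan view
delay_speaker = (0.0, 15.0)    # loudspeaker hung over the audience
seat = (0.0, 17.0)             # a seat within that loudspeaker's coverage
print(f"Zone delay: {zone_delay_ms(lectern, delay_speaker, seat):.1f} ms")
```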

Beyond achieving intelligibility at a more natural level, the audience will tend to be unaware that a sound system is in operation at all, an important step toward the elusive system design goal of transparency: people simply hear the talker clearly, at a more or less normal level. This approach, which has been dubbed “source-oriented reinforcement,” prevents the sound system from acting as a barrier between performer and audience, because it merely replicates what happens naturally and does not disembody the voice by stripping away its localization cues.

Traditional amplitude-based panning, which, as noted above, works only for those seated in the sweet spot along the centre axis of the venue, is replaced in this approach by time-based localization, which has been shown to work for better than 90 per cent of the audience, no matter where they are seated. Free from constraints related to phasing and comb-filtering that are imposed by a requirement for mono-compatibility or potential down-mixing—and that are largely irrelevant to live sound reinforcement—operators are empowered to manipulate delays to achieve pin-point localization of each performer for virtually every seat in the house.

Source-oriented reinforcement has been used successfully by a growing number of theatre sound designers, event producers and even DJs over the past 15 years or so, and this is where a large matrix comes into its own. Happily, many of today’s live sound boards are suitably equipped, with delay and EQ on the matrix outputs.

The situation becomes more complex when there is more than one talker, a wandering preacher, or a stage full of actors, but fortunately, such cases can be readily addressed as long as correct delays are established from each source zone to each and every loudspeaker on a one-to-one basis.

This requires more than a console’s level matrix with output delays alone, or even variable input delays assigned to individual mics; it calls for a true delay matrix allowing an independent time-alignment between each individual source zone and every loudspeaker in the distributed system.
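
To make that distinction concrete, here is a minimal sketch of such a crosspoint structure; the class and field names are invented for illustration and do not reflect any particular console’s or TiMax’s internals.

```python
# A true delay matrix: an independent level AND delay at every crosspoint
# between a source zone (input) and a loudspeaker (output), rather than a
# single delay per output or per input.
from dataclasses import dataclass

@dataclass
class Crosspoint:
    level_db: float   # gain from this source zone to this output
    delay_ms: float   # delay from this source zone to this output

class DelayMatrix:
    def __init__(self, num_zones, num_outputs):
        # One independent crosspoint for every (source zone, output) pair.
        self.points = [[Crosspoint(0.0, 0.0) for _ in range(num_outputs)]
                       for _ in range(num_zones)]

    def set_crosspoint(self, zone, output, level_db, delay_ms):
        self.points[zone][output] = Crosspoint(level_db, delay_ms)

    def image_definition(self, zone):
        """The complete set of level/delay pairs that localizes one source zone."""
        return self.points[zone]

# Two stage zones, each with its own time-alignment to the same two loudspeakers:
matrix = DelayMatrix(num_zones=2, num_outputs=2)
matrix.set_crosspoint(zone=0, output=0, level_db=-3.0, delay_ms=18.0)
matrix.set_crosspoint(zone=0, output=1, level_db=-6.0, delay_ms=42.0)
matrix.set_crosspoint(zone=1, output=0, level_db=-6.0, delay_ms=42.0)
matrix.set_crosspoint(zone=1, output=1, level_db=-3.0, delay_ms=18.0)
```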

One such delay matrix that I have used successfully is the TiMax2 Soundhub, which offers control of both level and delay at each crosspoint in matrixes ranging from 16 x 16 up to 64 x 64 to create unique image definitions anywhere on the stage or field of play.

The Soundhub is easily added to a house system via analog, AES digital, and any of the various audio networks currently available, with the matrix typically being fed by input-channel direct outputs, or by a combination of console sends and/or output groups, as is the practice of the Royal Shakespeare Company, among others.

A familiar-looking software interface allows for easy programming as well as real-time level control and 8-band parametric EQ on the outputs. A PanSpace graphical object-based pan programming screen lets the operator drag input icons around a set of image definitions superimposed onto a jpg of the stage, a novel and intuitive way of localizing performers or manually panning sound effects.


 
The TiMax PanSpace graphical object-based pan programming screen


For complex productions involving up to 24 performers, designers can add the TiMax Tracker, a radar-based performer-tracking system that interpolates softly between image definitions as performers move around the stage, thus affording a degree of automation that is otherwise unattainable.
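
Purely as an illustration of the idea, and not TiMax’s actual algorithm, a soft interpolation between two stored image definitions might look something like the sketch below, with all values invented for the example.

```python
# As a tracked performer moves from zone A toward zone B, crossfade each
# output's level and delay between the two stored image definitions.
def interpolate_image(def_a, def_b, position):
    """def_a, def_b: lists of (level_db, delay_ms) per output.
    position: 0.0 = fully in zone A, 1.0 = fully in zone B."""
    blended = []
    for (lvl_a, dly_a), (lvl_b, dly_b) in zip(def_a, def_b):
        blended.append((lvl_a + (lvl_b - lvl_a) * position,
                        dly_a + (dly_b - dly_a) * position))
    return blended

zone_a = [(-3.0, 18.0), (-6.0, 42.0)]   # image definition, downstage left
zone_b = [(-6.0, 42.0), (-3.0, 18.0)]   # image definition, downstage right
print(interpolate_image(zone_a, zone_b, position=0.5))  # performer at centre stage
```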

Where very high SPLs are not required, reinforcement of live events may best be achieved not by mixing voices and other sounds together, but by distributing them throughout the house with the location cues that maintain their separateness. That separateness is, after all, a fundamental contributor to intelligibility, as anyone familiar with the “cocktail party” effect will attest.

As veteran West End sound designer Gareth Fry says, “I’m quite sure that in the coming years, source-oriented reinforcement will be the most common way to do vocal reinforcement in drama.”

While mixing a large number of individual audio signals together into a few channels may be a very real requirement for radio, television, cinema, and other channel-restricted media such as consumer audio playback systems, this is certainly not the case for corporate events, houses of worship, theatre and similar staged entertainment.

It may sound like heresy, but just because it’s sound doesn’t mean it has to be mixed. With the proliferation of matrix consoles, adequate DSP, and sound design devices such as the TiMax2 Soundhub and TiMax Tracker available to the sound system designer, mixing is no longer the only way to work with live sound—let alone the best way for every occasion.