How to Write Immersive Audio Drama That Your Audience Can Decode Easily

THE COUNTER-INTUITIVE WAY WE ACHIEVE IMMERSION

I’m unlikely to make any friends with this week’s comments… and they probably belong more properly to a discussion of production than of scripting – though I think writers need to understand some of these production issues in order to write effectively for the medium.

To give them some context, I’m deaf in one ear so there are features of audio that just don’t “read” for me. This means that, if you don’t like what I have to say here, you can easily write it off as the ramblings of someone disqualified to speak because, well, I can’t hear properly anyway.

So, bearing in mind that I am fully aware I am speaking from my own (and very atypical) experience, you should probably treat this as an opinion piece… and yes, everything I post is my opinion, but I usually feel I am standing on more solid ground than I am on this particular occasion.

Okay, disclaimers dealt with. I want to talk about immersion in audio drama and how I think it is achieved versus how it is subverted by our attempts to achieve it.

In the past, when I’ve talked to people about immersion in audio drama, I’ve found the conversation moves very quickly to how to create dense, sound-rich soundscapes. After much thinking and listening, I’ve come to the (possibly controversial) conclusion that the density of the soundscape is only a minor factor in immersion and can, in fact, work against its intended outcome.

SPOTLIGHT ON PSYCHOLOGY

There are some very important concepts from psychology that have a lot to say about how immersion is achieved. These include closure, cognitive load, and cognitive equilibrium. The idea of cognitive equilibrium suggests that when we receive sensory input that fails to match our expectations (for example, where the input is so ambiguous that the environment does not seem realistic to us), our brains are thrown into a state of disequilibrium. We adjust either by “waking up” out of the illusion altogether (as in a movie or book where immersion is broken because a scene is too unrealistic to sustain our willing suspension of disbelief) or by becoming confused (losing our sense of place). Likewise, cognitive load theory suggests that an unrealistic overloading of sound cues (where too many sounds are layered together or presented in an unrealistic manner) has a similar effect.

Audio has its own form of spotlight. That spotlight is volume. It tells the audience what is important in the audio story… and what’s important is whatever is most clearly audible. That being the case, one of the most important techniques for managing immersion is the fade. The human brain naturally uses the Reticular Activating System (RAS) to shut down our conscious recognition of background noise. In our day-to-day world, our brains filter noise all the time. We don’t notice the background hum of traffic, wind, etc. until we stop and consciously attend to these sounds; they have been turned into silence by our minds’ unconscious capacity to decide what is and isn’t important. In audio drama, our brains don’t filter out the background in the same way, so we need to control it consciously on the listener’s behalf. Greater realism (and thereby a greater illusion of immersion) is created by mimicking the behavior of the RAS through relative control of volume (fading out background sounds once they are established, etc.). Failing to use volume effectively to focus listener attention also produces the dissonance that leads to disequilibrium.

This lesson has been brought home to me very forcefully by my youngest son’s disability. The RAS in his brain doesn’t work properly: his brain simply doesn’t filter the stimuli he encounters. This means that for his entire life every part of his environment has been shouting at him at full volume. It makes it very difficult for him to learn in a traditional classroom (despite his IQ being pretty much off the chart). The world is a very confusing place for him to navigate, and he constantly has to scan everything he is receiving as input, sort it, and try to focus on what is important… consciously… all the time… every day. Audio dramas that, in the name of immersion, layer in too many stimuli and fail to use volume to separate figure (that which is important) from ground (that which is not), or that fail to control focus and remove distraction, create exactly this experience for their listeners. Too much sound is merely noise. Because our brains don’t process artificial soundscapes in the same way they process sound in three dimensions (the real world), we have to be especially careful to control the soundscapes we construct, fading the background and de-emphasising it relative to the important foreground.

I mentioned earlier that I am deaf in one ear. This isn’t a problem most of the time. I usually need the speaker in my field of vision to know I am being spoken to, but generally it’s fine. The one place where it is a problem is a restaurant. In a restaurant, if the background noise reaches a certain level, all the stimuli blend together and I get the auditory equivalent of snow on a television screen. It leaves me feeling very isolated despite being surrounded by a large crowd of people. Friends who are not hearing impaired tell me this experience is not unique to me, but that for them it sets in at higher levels of overload, say when a jackhammer is going on a busy roadway outside a music store with speakers blaring into the street. At a certain point, the stimuli we are trying to separate and make sense of exceed our brain’s capacity to do so, and it all becomes noise.

TECHNIQUES FOR CREATING IMMERSION WHILE AVOIDING CONFUSION

Managing Ambiguous Sounds

As a general rule, sound needs to be “readable” or it results in listener confusion. Many sounds are ambiguous (crumpling cellophane can read as rain, fire, or waves on the beach, depending on how it is used). The audience must be given cues to make sense of these sounds; otherwise they will not understand them, and we risk breaking the illusion of immersion. To keep sounds from being ambiguous, we need to identify them before we introduce them. There is a right way and a wrong way to do this.

Here’s an example of the wrong way (an ambiguous sound is introduced without cueing).

SOUND: FLAMES – ESTABLISH AND UNDER.
JACK: Do you hear that? I think something is on fire.

Here’s an example of the right way (the dialog explicitly cues the sound of fire).

JACK: Do you hear that? I think something is on fire.
SOUND: (FADE IN) FLAMES – ESTABLISH AND UNDER.

Here’s another example of the wrong way (the ambiguous sound of rain is introduced without cueing).

SOUND: RAIN – ESTABLISH AND UNDER.
JACK: Damn but it’s started raining hard.

Here’s another example of the right way (the readily identifiable sound of thunder cues the rain).

SOUND: THUNDER – LET IT FINISH.
SOUND: RAIN – ESTABLISH AND UNDER.
JACK: Damn but it’s started raining hard.

It may seem counter-intuitive, but for ambiguous sounds to “read” properly, they must be cued before they are heard, either by dialog or by self-identifying sounds (like the thunder) that provide the auditory clues needed to decode what is happening in the drama. If the cues come after the sound is introduced, we run the risk of losing our audience to confusion… and confusion will instantly break the immersion we are trying to achieve.

Setting Scenes

Another place where sound is often misused in audio drama is in establishing scenes. At the beginning of a scene we want to establish a sense of time and place. As often as not, we do this with sound, bringing the scene to life by giving it proper ambiance. But it is common for amateur audio drama to overdo this, and footsteps are where it most often stands out. Footsteps are very hard to get right (usually because they are very distracting). The fact is, most of us don’t hear our own footsteps as we go about our day. When we enter a new environment we notice them for a moment (because a transition has taken place and our brains need to sort out whether the change in sound is important or not), but then they fade from consciousness once more. Too many audio dramas maintain their background sounds and footsteps for far too long. We need to establish and then fade our sounds to create a sense of reality. Footsteps should only register consciously for a few moments. As a general rule, background should be faded unobtrusively as soon as the dialog begins. Our brains register the continuing, noticeable sound of footsteps under dialog as something that is not part of our natural experience, and it becomes distracting. The same is true of all our establishing sounds.

I know lots of folks hate narration as a tool in audio drama. I’m aware of all the reasons (we’ve talked about them elsewhere) and I won’t go over them all again here. I maintain that, for all the criticisms that can be leveled at narration, it is still a helpful tool in any audio dramatist’s toolkit. I’m just going to focus on one use of narration today that is directly relevant to the discussion so far.

There is a psychological principle called closure (again discussed elsewhere, and with many thanks to Jack Ward for giving me the name to call it by) that is of huge benefit to the audio dramatist. Closure is the audience’s tendency to build entire worlds around the cues (sound, music, and dialog) we provide. We mention a lawyer’s office and the audience supplies all the details from their imaginations. We don’t have to suggest a desk, a chair, etc.; the audience can be relied on to construct an entire detailed set on our behalf. BUT… the more complex a scene, the more likely it is that the soundscape establishing it will “read” ambiguously to the listener. A few words of context can make all the difference.

Compare

SOUND: (WALLA) DISTANT CARS LOW IN BACKGROUND – ESTABLISH AND UNDER.
JAKE: (YAWNS).
SOUND: TELEPHONE RINGS – CONTINUE UNTIL PICKED UP.
SOUND: CHAIR CREAKS – LET IT FINISH.
SOUND: TELEPHONE IS PICKED UP – LET IT FINISH.
JAKE: Stephano Detective Agency, Jake Stephano speaking.

With

NARRATOR: In the office of Jake Stephano, private detective…
SOUND: (WALLA) DISTANT CARS LOW IN BACKGROUND – ESTABLISH AND UNDER.
JAKE: (YAWNS).
SOUND: TELEPHONE RINGS – CONTINUE UNTIL PICKED UP.
SOUND: CHAIR CREAKS – LET IT FINISH.
SOUND: TELEPHONE IS PICKED UP – LET IT FINISH.
JAKE: Stephano Detective Agency, Jake Stephano speaking.

The above isn’t a great example, and I probably wouldn’t bother using narration here, but there is no denying that the scene decodes faster for the listener with the narration than without it. Being forced to do a little decoding never hurt anybody (we do that with sound in the real world all the time), but where listeners are forced to work hard to decode the soundscape in order to work out where and when the action is taking place, we are actively working against their sense of immersion. You simply cease to be embedded in the story when you have to stop and consciously decode the soundscape to understand the scene.

Pacing to Provide Greater Focus

Pacing is also important for decoding sound. Fights cannot run long without creating confusion. It is essential that they be fairly short (or broken into chunks with explanatory dialog); otherwise they confuse the listener and fail to decode. That isn’t to say that confusion (for example, the background noise of an out-of-control brawl) has no place in an audio drama – but the foreground soundscape MUST decode easily for the listener. A masterful example of this can be found in one of Decoder Ring Theatre’s Black Jack Justice episodes. The scene is set in a casino in the middle of a shootout. The background soundscape is one of ongoing shouts, shots, and mayhem, but the foreground is a masterclass in short, sharp, controlled action and dialog involving the protagonists. Despite all the confusion in the background, there is nothing ambiguous about the foreground action, and the scene “reads” perfectly.

Below is a simple example of a quick fight interaction that decodes easily for the listener.

JAKE: I have been hunting you for some time, Fletch. There is no way for you to escape.
FLETCH: Oh, yeah? We’ll see about that.
JAKE: Don’t try to run. I don’t want to hurt you, Fletch, but if you make me… Dammit.
SOUND: SMACK… SMACK, SMACK – LET IT FINISH.
FLETCH: (GROANS) Ugh.
SOUND: BODY DROP – LET IT FINISH.
JAKE: (SIGHS) Well, you can’t say I didn’t warn you.

MINIMALISM VERSUS DENSITY OF SOUND DESIGN

If it isn’t already obvious, I’m something of a fan of a certain amount of minimalism when it comes to sound design. But, as I said earlier, this may be because of my own deafness – I admittedly have a lower than typical tolerance for crowded soundscapes. And minimalism isn’t the only approach that works. The OTR (old-time radio) show Gunsmoke contained the richest, most layered use of sound that I’ve heard, and I absolutely love it… but that’s because it used dense soundscapes so well and mastered the art of decoding them for its listeners.

Probably the best minimalist use of sound that I am aware of is Decoder Ring Theatre’s Red Panda Adventures – Gregg Taylor is a master of clear and accessible action. At the other end of the spectrum with regard to sound design is Gunsmoke with its dense use of sound. In both cases, however, clarity is the key.

Anecdotally, there appears to be a “sweet spot” for audio production that allows maximum immersion (with some variation due to individual differences between listeners). This sweet spot seems to be a range along a continuum rather than a single point: minimalist sound design and heavily layered sound can contribute equally to an increased sense of immersion, so long as their use effectively mimics the real-world experience of hearing and matches the listener’s expectations for the environment. That is, silence is actually fine so long as the listener expects the environment to be silent, while a large amount of noise is also fine if the environment is expected to be noisy. But noise will become distracting if it doesn’t fade realistically, and silence will fail to convince if sounds don’t intrude sufficiently on the empty background when they occur. Some recent research suggests that we can add the spatial (or reverb) characteristics of a sound to our list of realistic audio cues. In the context of what we already know about creating the illusion of immersion, it makes perfect sense that the spatial characteristics of sounds also need to match listener expectations in order to be convincing.

Anyway, I’d love to hear your thoughts on sound design, scripting, creating immersion, and how to help your audience decode what is taking place in your dramas. Add your comments below.
