recordinghacks


The World Beyond 20kHz

Using a study of the human hearing mechanism as his foundation, Earthworks’ founder David E Blackmer presents his arguments for, and his vision of, high-definition audio.

THERE IS MUCH controversy about how we might move forward towards higher quality reproduction of sound. The compact-disc standard assumes that there is no useful information beyond 20kHz and therefore includes a brick-wall filter just above 20kHz. Many listeners hear a great difference when 20kHz band-limited audio signals are compared with wide band signals. A number of digital systems have been proposed which sample audio signals at 96kHz and above, and with up to 24 bits of quantisation.

Many engineers have been trained to believe that human hearing receives no meaningful input from frequency components above 20kHz. I have read many irate letters from such engineers insisting that information above 20kHz is clearly useless, that any attempt to include such information in audio signals is deceptive, wasteful and foolish, and that any right-minded audio engineer should realize that this 20kHz limitation has been known to be an absolute limitation for many decades. Those of us who are convinced that there is critically important audio information to at least 40kHz are viewed as misguided.

We must look at the mechanisms involved in hearing, and attempt to understand them. Through that understanding we can develop a model of the capabilities of the transduction and analysis systems in human audition and work toward new and better standards for audio system design.

What got me started in my quest to understand the capabilities of human hearing beyond 20kHz was an incident in the late eighties. I had just acquired a MLSSA system and was comparing the sound and response of a group of high quality dome tweeters. The best of these had virtually identical frequency response to 20kHz, yet they sounded very different.

When I looked closely at their response beyond 20kHz they were visibly quite different. The metal-dome tweeters had an irregular picket fence of peaks and valleys in their amplitude response above 20kHz. The silk-dome tweeters exhibited a smooth fall off above 20kHz. The metal dome sounded harsh compared to the silk dome. How could this be? I cannot hear tones even to 20kHz, and yet the difference was audible and really quite drastic. Rather than denying what I clearly heard, I started looking for other explanations.

WHEN VIEWED FROM an evolutionary standpoint, human hearing has become what it is because it is a survival tool. The human auditory sense is very effective at extracting every possible detail from the world around us so that we and our ancestors might avoid danger, find food, communicate, enjoy the sounds of nature, and appreciate the beauty of what we call music. Human hearing is generally, I believe, misunderstood to be primarily a frequency analysis system. The prevalent model of human hearing presumes that auditory perception is based on the brain’s interpretation of the outputs of a frequency analysis system which is essentially a wide dynamic range comb filter, wherein the intensity of each frequency component is transmitted to the brain. This comb filter is certainly an important part of our sound analysis system, and what an amazing filter it is. Each frequency zone is tuned sharply with a negative mechanical resistance system. Furthermore, the tuning Q of each filter element is adjusted in accordance with commands sent back to the cochlea by a series of pre-analysis centers (the cochlear nuclei) near the brain stem. A number of very fast transmission-rate nerve fibers connect the output of each hair cell to these cochlear nuclei. The human ability to interpret frequency information is amazing. Clearly, however, something is going on that cannot be explained entirely in terms of our ability to hear tones.

The inner ear is a complex device with incredible details in its construction. Acoustical pressure waves are converted into nerve pulses in the inner ear, specifically in the cochlea, which is a liquid filled spiral tube. The acoustic signal is received by the tympanic membrane where it is converted to mechanical forces which are transmitted to the oval window then into the cochlea where the pressure waves pass along the basilar membrane. This basilar membrane is an acoustically active transmission device. Along the basilar membrane are rows of two different types of hair cells, usually referred to as inner and outer.

The inner hair cells clearly relate to the frequency analysis system described above. Only about 3,000 of the 15,000 hair cells on the basilar membrane are involved in transducing frequency information using the outputs of this travelling wave filter. The outer hair cells clearly do something else, but what?

There are about 12,000 ‘outer’ hair cells arranged in three or four rows. There are four times as many outer hair cells as inner hair cells(!) However, only about 20% of the total available nerve paths connect them to the brain. The outer hair cells are interconnected by nerve fibers in a distributed network. This array seems to act as a waveform analyzer, a low-frequency transducer, and as a command center for the super fast muscle fibers (actin) which amplify and sharpen the travelling waves which pass along the basilar membrane thereby producing the comb filter. It also has the ability to extract information and transmit it to the analysis centers in the olivary complex, and then on to the cortex of the brain where conscious awareness of sonic patterns takes place. The information from the outer hair cells, which seems to be more related to waveform than frequency, is certainly correlated with the frequency domain and other information in the brain to produce the auditory sense.

Our auditory analysis system is extraordinarily sensitive to boundaries (any significant initial or final event or point of change). One result of this boundary detection process is the much greater awareness of the initial sound in a complex series of sounds such as a reverberant sound field. This initial sound component is responsible for most of our sense of content, meaning, and frequency balance in a complex signal. The human auditory system is evidently sensitive to impulse information embedded in the tones. My suspicion is that this sense is behind what is commonly referred to as ‘air’ in the high-end literature. It probably also relates to what we think of as ‘texture’ and ‘timbre’ – that which gives each sound its distinctive individual character. Whatever we call it, I suggest that impulse information is an important part of how humans hear.

All the output signals from the cochlea are transmitted on nerve fibers as pulse rate and pulse position modulated signals. These signals are used to transduce information about frequency, intensity, waveform, rate of change and time. The lower frequencies are transduced to nerve impulses in the auditory system in a surprising way. Hair cell output for the lower frequencies is transmitted primarily as groups of pulses which correspond strongly to the positive half of the acoustic pressure wave, with few if any pulses being transmitted during the negative half of the pressure wave. Effectively, these nerve fibers transmit on the positive half wave only. This situation exists up to somewhat above 1kHz, with discernible half wave peaks riding on top of the auditory nerve signal being clearly visible to at least 5kHz. There is a sharp boundary at the beginning and end of each positive pressure pulse group, approximately at the central axis of the pressure wave. This pulse group transduction with sharp boundaries at the axis is one of the important mechanisms which accounts for the time resolution of the human ear. In 1929 von Békésy published a measurement of human sound position acuity which translates to a time resolution of better than 10µs between the ears. Nordmark, in a 1976 article, concluded that the interaural resolution is better than 2µs; interaural time resolution at 250Hz is said to be about 10µs, which translates to better than 1° of phase at this frequency.
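The conversion from a time difference to a phase angle quoted above is simple arithmetic, and a short sketch makes it easy to check. The function name below is an illustrative assumption, not something taken from the cited studies; it simply expresses a delay as a fraction of one cycle at a given frequency.

```python
def time_to_phase_degrees(delay_s: float, frequency_hz: float) -> float:
    """Phase shift, in degrees, that a time delay represents at one frequency."""
    # One full cycle (360 degrees) lasts 1 / frequency seconds.
    return 360.0 * frequency_hz * delay_s


# 10 microseconds at 250 Hz, the figure quoted in the text
print(f"{time_to_phase_degrees(10e-6, 250.0):.2f} degrees")

# The same 10 microseconds is a far larger fraction of a cycle at 20 kHz
print(f"{time_to_phase_degrees(10e-6, 20000.0):.2f} degrees")
```

At 250Hz the result is 0.9°, which agrees with the "better than 1°" figure. The second line only illustrates how the same delay scales with frequency; it is not a claim about hearing acuity at 20kHz.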

The human hearing system uses waveform as well as frequency to analyze signals. It is important to maintain accurate waveform up to the highest frequency region with accurate reproduction of details down to 5µs to 10µs. The accuracy of low frequency details is equally important. We find many low frequency sounds such as drums take on a remarkable strength and emotional impact when waveform is exactly reproduced. Please notice the exceptional drum sounds on the Dead Can Dance CD Into the Labyrinth. The drum sound seems to have a very low fundamental, maybe about 20Hz. We sampled the bitstream from this sound and found that the first positive waveform had twice the period of the subsequent 40Hz waveform. Apparently one half cycle of 20Hz was enough to cause the entire sound to seem to have a 20Hz fundamental.

The human auditory system, both inner and outer hair cells, can analyze hundreds of nearly simultaneous sound components, identifying the source location, frequency, time, intensity, and transient events in each of these many sounds simultaneously and develop a detailed spatial map of all these sounds with awareness of each sound source, its position, character, timbre, loudness, and all other identification labels which we can attach to sonic sources and events. I believe that this sound quality information includes waveform, embedded transient identification, and high frequency component identification to at least 40kHz (even if you can’t ‘hear’ these frequencies in isolated form).

TO FULLY MEET the requirements of human auditory perception I believe that a sound system must cover the frequency range of about 15Hz to at least 40kHz (some say 80kHz or more) with over 120dB dynamic range to properly handle transient peaks and with a transient time accuracy of a few microseconds at high frequencies and 1°-2° phase accuracy down to 30Hz. This standard is beyond the capabilities of present day systems but it is most important that we understand the degradation of perceived sound quality that results from the compromises being made in the sound delivery systems now in use. The transducers are the most obvious problem areas, but the storage systems and all the electronics and interconnections are important as well.
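To put the 120dB figure in context, it can be compared with the theoretical dynamic range of a digital word length. The sketch below uses the textbook relation for an ideal quantiser driven by a full-scale sine wave (roughly 6.02dB per bit plus 1.76dB). That idealisation is an assumption made here for illustration; real converters fall short of it, and the function names are hypothetical.

```python
import math


def ideal_dynamic_range_db(bits: int) -> float:
    """Signal-to-quantisation-noise ratio of an ideal quantiser, full-scale sine."""
    # 20*log10(2**bits) is about 6.02 dB per bit; 10*log10(1.5) is about 1.76 dB.
    return 20.0 * math.log10(2.0 ** bits) + 10.0 * math.log10(1.5)


def min_bits_for(dynamic_range_db: float) -> int:
    """Smallest word length whose ideal dynamic range meets the target."""
    bits = 1
    while ideal_dynamic_range_db(bits) < dynamic_range_db:
        bits += 1
    return bits


for bits in (16, 20, 24):
    print(f"{bits} bits: {ideal_dynamic_range_db(bits):.1f} dB")
print("bits needed for 120 dB:", min_bits_for(120.0))
```

Under these assumptions a 16-bit word gives about 98dB, so a 120dB target needs at least 20 bits, which is one way to read the interest in 24-bit formats.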

Our goal at Earthworks is to produce audio tools which are far more accurate than the older equipment we grew up on. We are certainly pushing the envelope. For example, we specify our LAB102 preamp from 2Hz to 100kHz ±0.1dB. Some might believe this wide-range performance to be unimportant, but listen to the sound of the LAB102: it is true-to-life accurate. In fact the 1dB down points of the LAB preamp are 0.4Hz and 1.3MHz, but that is not the key to its accuracy. Its square wave rise time is one quarter of a microsecond. Its impulse response is practically perfect.
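Rise time and bandwidth are two views of the same behaviour, and a common rule of thumb links them. The sketch below assumes a first-order (single-pole) low-pass and a 10% to 90% rise time, for which the -3dB bandwidth is ln(9)/(2π) divided by the rise time, or about 0.35 divided by the rise time. Nothing in the article says the LAB102 is first order or how its rise time was measured, so this is only an order-of-magnitude check.

```python
import math


def bandwidth_from_rise_time(rise_time_s: float) -> float:
    """-3 dB bandwidth (Hz) of a first-order low-pass with this 10-90% rise time."""
    # For a single pole, t_rise = ln(9) * RC and f_3dB = 1 / (2 * pi * RC).
    return math.log(9.0) / (2.0 * math.pi * rise_time_s)


# One quarter of a microsecond, as quoted for the preamp
print(f"{bandwidth_from_rise_time(0.25e-6) / 1e6:.2f} MHz")
```

The result is about 1.4MHz, the same order of magnitude as the quoted 1.3MHz upper limit. The two figures are not directly comparable, since one is a -1dB point and the other a -3dB point of an assumed first-order response.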

Microphones are the first link in the audio chain, translating the pressure waves in the air into electrical signals. Most of today’s microphones are not very accurate. Very few have good frequency response over the entire 15Hz–40kHz range which I believe to be necessary for accurate sound. In most microphones the active acoustic device is a diaphragm that receives the acoustical waves, and like a drum head it will ring when struck. To make matters worse, the pickup capsule is usually housed in a cage with many internal resonances and reflections which further color the sound. Directional microphones, because they achieve directionality by sampling the sound at multiple points, are by nature less accurate than omnis. The ringing, reflections and multiple paths to the diaphragm add up to excess phase. These microphones smear the signal in the time domain.

We have learned after many measurements and careful listening that the true impulse response of microphones is a better indicator of sound quality than is frequency amplitude response. Microphones with long and non-symmetrical impulse performance will be more colored than those with short impulse tails. To illustrate this point we have carefully recorded a variety of sources using two different omni models (Earthworks QTC1 and another well-known model), both of which have flat frequency response to 40kHz within -1dB (Fig.1: QTC1 vs 4007). [Blackmer’s illustrations have sadly been lost. –Ed.] When played back on high-quality speakers the sound of these two microphones is quite different. When played back on speakers with near-perfect impulse and step response, which we have in our lab, the difference is even more apparent. The only significant difference we have been able to identify between these two microphones is their impulse response.

We have developed a system for deriving a microphone’s frequency response from its impulse response. After numerous comparisons between the results of our impulse conversion and the results of the more common substitution method we are convinced of the validity of this as a primary standard. You will see several examples of this in Fig.2.

Viewing the waveform as impulse response is better for interpreting higher frequency information. Lower frequency information is more easily understood from inspecting the step-function response which is the mathematical integral of impulse response. Both curves contain all information about frequency and time response within the limits imposed by the time window, the sampling processes and noise.
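The relationships described above can be sketched in a few lines: the step response is the running sum (discrete integral) of the impulse response, and the frequency response is its Fourier transform. This is not the Earthworks measurement system, whose details the article does not give. The impulse response used here is a hypothetical one-pole low-pass chosen only because its answers are easy to verify, and the 96kHz sample rate is likewise an assumption.

```python
import cmath
import math


def step_response(impulse):
    """Running sum of the impulse response (discrete-time integral)."""
    total, out = 0.0, []
    for sample in impulse:
        total += sample
        out.append(total)
    return out


def magnitude_response_db(impulse, sample_rate_hz):
    """DFT magnitude of the impulse response as (frequency_hz, dB) pairs up to Nyquist."""
    n = len(impulse)
    result = []
    for k in range(n // 2 + 1):
        bin_value = sum(
            x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(impulse)
        )
        # Floor the magnitude to avoid log10(0) on an exact null.
        level_db = 20.0 * math.log10(max(abs(bin_value), 1e-12))
        result.append((k * sample_rate_hz / n, level_db))
    return result


# Hypothetical test data: a one-pole low-pass, not a measured microphone.
a = 0.5
impulse = [(1 - a) * a ** i for i in range(64)]

step = step_response(impulse)
response = magnitude_response_db(impulse, 96000.0)

print(f"final step value: {step[-1]:.6f}")
print(f"level at {response[0][0]:.0f} Hz: {response[0][1]:.2f} dB")
print(f"level at {response[-1][0]:.0f} Hz: {response[-1][1]:.2f} dB")
```

The spacing between frequency points is the sample rate divided by the number of samples, which is the "time window" limit mentioned above: a 64-sample window at 96kHz resolves nothing finer than 1.5kHz, so low-frequency detail needs a longer capture.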

The electronics in very high quality sound systems must also be exceptional. Distortion and transient intermodulation should be held to a few parts per million in each amplification stage, especially in systems with many amplifiers in each chain. In the internal circuit design of audio amplifiers it is especially important to separate the signal reference point in each stage from the power supply return currents which are usually terribly nonlinear. Difference input circuits on each stage should extract the true signal from the previous stage in the amplifier. Any overall feedback must reference from the output terminals and compare directly to the input terminals to prevent admixture of ground grunge and cross-talk with the signal. Failure to observe these rules results in a harsh ‘transistor sound’. However, transistors can be used in a manner that results in an arbitrarily low distortion, intermodulation, power supply noise coupling, and whatever other errors we can name, and can therefore deliver perceptual perfection in audio signal amplification. (I use ‘perceptual perfection’ to mean a system or component so excellent that it has no error that could possibly be perceived by human hearing at its best.) My current design objective on amplifiers is to have all harmonic distortion including 19kHz and 20kHz twin-tone intermodulation products below 1 part per million and to have A-weighted noise at least 130dB below maximum sine wave output. I assume that a signal can go through many such amplifiers in a system with no detectable degradation in signal quality.
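Two of the figures in this paragraph are easy to make concrete. The sketch below converts a parts-per-million ratio to decibels, assuming the ratio is an amplitude (voltage) ratio, which the article does not state explicitly. It also lists where the low-order intermodulation products of a 19kHz and 20kHz twin-tone test fall. The function names are illustrative.

```python
import math


def ratio_to_db(amplitude_ratio: float) -> float:
    """Express an amplitude ratio in decibels."""
    return 20.0 * math.log10(amplitude_ratio)


def intermod_products(f1_hz, f2_hz, max_order=3):
    """Frequencies |m*f1 + n*f2| with m and n nonzero, grouped by order |m| + |n|."""
    products = {}
    for m in range(-max_order, max_order + 1):
        for n in range(-max_order, max_order + 1):
            order = abs(m) + abs(n)
            if m == 0 or n == 0 or order > max_order:
                continue
            products.setdefault(order, set()).add(abs(m * f1_hz + n * f2_hz))
    return {order: sorted(freqs) for order, freqs in sorted(products.items())}


print(f"1 ppm = {ratio_to_db(1e-6):.0f} dB")
for order, freqs in intermod_products(19000, 20000).items():
    print(f"order {order}: {freqs} Hz")
```

The second-order difference product lands at 1kHz and the third-order products at 18kHz and 21kHz, so a twin-tone test near the top of the band places distortion products where they can be measured and, in principle, heard.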

Many audio signal sources have extremely high transient peaks, often as high as 20dB above the level read on a volume indicator. It is important to have some adequate measurement tool in an audio amplification system to measure peaks and to determine that they are being handled appropriately. Many of the available peak reading meters do not read true instantaneous peak levels, but respond to something closer to a 300µs to 1ms averaged peak approximation. All system components including power amplifiers and speakers should be designed to reproduce the original peaks accurately. Recording systems truncate peaks which are beyond their capability. Analogue tape recorders often have a smooth compression of peaks which is often regarded as less damaging to the sound.
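The gap between a true instantaneous peak and an averaged reading can be illustrated with a toy signal. The sketch below models an averaging meter as a rectangular moving average of the rectified signal, which is a crude stand-in for real meter ballistics, and feeds it a hypothetical 50µs full-scale burst sampled at 200kHz. All of these values are assumptions chosen to keep the arithmetic easy to follow.

```python
import math


def true_peak(samples):
    """Largest instantaneous absolute sample value."""
    return max(abs(s) for s in samples)


def averaged_peak(samples, sample_rate_hz, window_s):
    """Largest moving average of the rectified signal over the given window."""
    window = max(1, round(window_s * sample_rate_hz))
    rectified = [abs(s) for s in samples]
    running = sum(rectified[:window])
    best = running
    for i in range(window, len(rectified)):
        # Slide the window one sample: add the newest, drop the oldest.
        running += rectified[i] - rectified[i - window]
        best = max(best, running)
    return best / window


# Hypothetical test signal: 10 ms of silence with one 50 microsecond burst.
sample_rate = 200000
signal = [0.0] * 2000
for i in range(1000, 1010):
    signal[i] = 1.0

peak = true_peak(signal)
for window_s in (300e-6, 1e-3):
    reading = averaged_peak(signal, sample_rate, window_s)
    error_db = 20.0 * math.log10(reading / peak)
    print(f"{window_s * 1e6:.0f} us window reads {error_db:.1f} dB relative to true peak")
```

With these assumptions the 300µs window under-reads the burst by roughly 16dB and the 1ms window by roughly 26dB, which shows how a short transient can pass an averaging meter while still exceeding what the system can handle.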

MANY RECORDISTS even like this peak clipping and use it intentionally. Most digital recorders have a brick-wall effect in which any excess peaks are squared off, with disastrous effects on tweeters and listeners’ ears. Compressors and limiters are often used to smoothly reduce peaks which would otherwise be beyond the capability of the system. Such units with RMS level detectors usually sound better than those with average or quasi-peak detectors. Also, be careful to select signal processors for low distortion. If they are well designed, distortion will be very low when no gain change is required. Distortion during compression will be almost entirely third harmonic distortion which is not easily detected by the ear and which is usually acceptable when it can be heard.

A look at the specifications of some of the highly rated super-high end, ‘no feedback’, vacuum tube, power amplifiers reveals how much distortion is acceptable, or even preferable, to some excessively well-heeled audiophiles.

All connections between different parts of the electrical system must be designed to eliminate noise and signal errors due to power line ground currents, AC magnetic fields, RF pickup, crosstalk, and dielectric absorption effects in wire insulation. This is critical.

Loudspeakers are the other end of the audio system. They convert electrical signals into pressure waves in the air. Loudspeakers are usually even less accurate than microphones. Making a loudspeaker that meets the standard mentioned above is problematical. The ideal speaker is a point source. As yet no single driver exists that can accurately reproduce the entire 15Hz-40kHz range. All multidriver speaker systems involve tradeoffs and compromises.

We have built several experimental speaker systems which apply the same time-domain principles used in our Earthworks microphones. The results have been very promising. As we approach perfect impulse and step-function response something magical happens. The sound quality becomes lifelike. In a live jazz sound-reinforcement situation using some of our experimental speakers and our SR71 mics the sound quality did not change with amplification. From the audience it sounded as if it was not being amplified at all even though we were acutely aware that the sound was louder. Even with quite a bit of gain it did not sound like it was going through loudspeakers.

Listening to some Bach choral music that we recorded with QTC1 microphones into a 96kHz sampling recorder, and played back through our engineering model speakers, is a startling experience. The detail and imaging are stunning. You can hear left to right, front to back and top to bottom as if you are there in the room with the performers. It is exciting to find that we are making such good progress toward our goal.

I have heard that the Victor Talking Machine Company ran ads in the 1920s in which Enrico Caruso was quoted as saying that the Victrola was so good that its sound was indistinguishable from his own voice live. In the seventies Acoustic Research ran similar ads, with considerably more justification, about live vs recorded string quartets. We have come a long way since then, but can we achieve perceptual perfection? I suspect that truly excellent sound, perhaps even perceptual perfection, is achievable. As a point of reference you should assemble a test system with both microphones and speakers having excellent impulse and step response, hence nearly perfect frequency response, together with low distortion amplifiers. Test it as a sound reinforcement system and/or studio monitoring system with both voice and music sources. You, the performers, and the audience will be amazed at the result. You don’t have such a system? Isn’t that impossible, you say? It is not! We have done it! If you want more information, here are several books which I believe anyone who is intensely involved in audio should own and read and then reread many times.

An Introduction to the Physiology of Hearing, Second Edition
James O. Pickles, Academic Press 1988
ISBN 0-12-554753-6 or ISBN 0-12-554754-4 pbk.

Spatial Hearing – Revised Edition: The Psychophysics of Human Sound Localization
Jens Blauert, MIT Press 1997
ISBN 0-262-02413-6

Experiments in Hearing
Georg von Békésy, Acoustical Society of America
ISBN 0-88318-630-6

Hearing: Physiological Acoustics, Neural Coding, and Psychoacoustics
W. Lawrence Gulick, George A. Gescheider, Robert D. Frisina; Oxford University Press 1989
ISBN 0-19-50307-3

10 Responses to “The World Beyond 20kHz”

  1. ishan

    October 30th, 2012 at 6:18 am

    If our ear hears sound over 20KHz it damages our ear. But is this sound really audible, or should I say can we sense it, or is it just negligible?

  2. James

    December 10th, 2012 at 1:40 pm

    My own experience with high frequency “ultrasound” actually comes as a result of a job I have. I test various pest control devices which emit tones around or above 25kHz. Various workers in the warehouse where I do this testing verify that they can hear tones ranging up to 24kHz clearly (I cannot). Some are more sensitive to these tones and others less. I have long since decided 20kHz was an arbitrary cut off point for human hearing.

    I read this article in an attempt to discover how I can measure the decibel output of ultrasonic devices accurately above 25kHz, and though I can see the tools you used, I am wondering if you could shed light on how I would go about accurately measuring these high frequency tones, thanks.

  3. Dan Wahl

    December 25th, 2012 at 8:56 am

    Hi David,

    I have skimmed your page on HD audio. I am an engineer, EE, with interest in audio. Sorting thru the myths and finding facts in audio land is sometimes challenging. I suggest that it is possible the objections to the 20KHz top end debate could be understood by the following:

    1) Human hearing for a pure tone up to 20KHz may be limited as the nasty engineers suggest. However.

    2) Given the complexity of human hearing and physiology, there certainly must be non-linearity in many of the processes that convert sound pressure into perception.

    3) Introducing frequencies above 20KHz along with what is considered the normal range of audibility, certainly could change the way we process the information. Part of the process is mechanical. While we may not be able to hear pure tones above 20KHz in isolation, it would not seem much of a stretch to imagine that the mechanics of the ear are disturbed and the response to 20-20KHz changed by the presence of “super audible” energy. This could explain your two tweeters example.

    The above is a theory; I have no basis other than reason to suggest this explanation. But if I am close, it is possible that the engineers are correct and so are the audiophiles,…

    Dan

  4. Mik Brenton

    April 9th, 2013 at 9:11 am

    I think the human ear, which is essentially a transducer, introduces a certain
    degree of non-linearity and hence some intermodulation distortion.
    So for example, reproducing two frequencies at say 25Khz and 30Khz
    will cause sum and difference frequency components in the ear.
    Thus a very small but finite component at 5Khz will be present but would
    be totally absent if the frequency range of the audio system were
    restricted to 20Khz.

  5. LASpain123@yahoo.com

    August 8th, 2013 at 9:01 am

    Hey! What about the dog and the cat? They can hear higher frequencies than we can. Why shouldn’t they be able to hear the full tonal range of the music played on our stereo?

  6. jeff

    October 2nd, 2013 at 5:07 am

    Could it be that we are hearing or perceiving harmonics of the original sounds above 20 or so kHz which give a flavour to the sound that would be cut off with conventional equipment but are still needed to complete the true sound. Just a thought.

  7. Serge

    December 19th, 2013 at 8:54 am

    As an EE and audio enthusiast I see an immediate and rather simple explanation that ties together much of what I have known so far:
    Our ears’ detection of continuous audio waves seems to be limited to about 18 – 25 kHz. However, short bursts of waves, though not perceived as a ‘sound’, are still processed by the brain as an important ‘additive’ to the pulse (or step) audio signal. Why? Okay, let’s remember that the Fourier representation of a step signal is a multitude of waveforms of different frequencies and envelopes, phase-aligned so that the rising parts of all of them coincide with the moment of the step. The step is only as ‘sharp’ as the highest-frequency component we are able to provide. And here is the kicker: those higher-frequency waves have very short envelopes and thus (back to our ears), though not detected in a continuous-wave test, they still contribute to the shape of the pulse and are processed by our brains to give us the impression of a natural sound, provided of course that we supply a full path from the source through amplification and speakers.

  8. Darren

    June 26th, 2014 at 8:49 pm

    Just for the record, it is possible to hear above 20 khz. I have a dead spot at 19.5 khz, but in 2010 I experimented with the range up to the tone generator limit of 24 khz. The 17.5 khz tone used to prevent loitering teenagers piqued my interest (most adults cannot hear the tone) …I am well over 30 years old and the teen tone was like nails on a chalkboard to me.

    The pest alarm was a surprise to me one day too, the neighbor’s deterrent for raccoons was so intensely penetrating I actually had to stop mowing a lawn and go unplug it. I work commercial landscaping and get this…I was also wearing the squishy Orange Rite Aid earplugs!!

    Fluorescent lights in school, tubes in cathode ray TV sets…20 khz is not the limit of all of us and all headphones reading 20 khz starts to get irritating…it’s like not having the color violet in the rainbow…some things lose their “dazzle”…IMHO.

    I feel oppressed now…;) Just Joking…sorta

  9. john

    September 8th, 2014 at 5:09 pm

    It’s a shame. I’m working with audio systems all over the day.

    Most of them just can’t output anything realistic above 19kHz.
    By realistic, I mean “corresponding to the initial signal”. So first let’s be clear: you can just add up high frequency noise instead of the real signal and you get the same thing.

    What’s worse ? If you try to play 20kHz+ signal on a 20Hz-20kHz hardware (99% of it), you WILL hear something different : that’s called frequency aliasing and it’s related to the playing hardware, not to the original sound.

    Most of the time, what people hear above 20kHz is aliasing. The ” deaf spot” problem is symptomatic of aliasing in all its splendor.

    I had the opportunity to work with 38kHz capable hardware in a quiet room. We could test our mikes with that (which can handle such a range). Result ?

    We could hear a difference : the 96kHz-sampled signal produced a noticeable difference in the 38kHz capable loudspeakers compared to the 20kHz capable one. But when we filtered out high frequencies and downsampled to 44100Hz, we could not hear the difference anymore.

    What did the mikes tell us ? Aliasing was the culprit.

  10. Eugene

    October 24th, 2014 at 5:51 am

    We can easily check if the information beyond 20 kHz is useful or not. Produce a pure sin wave sound with frequency under the upper limit of your hearing range, for example, 18 kHz and then a distorted sound of the same frequency (which means the presence of higher harmonics). Listen to both of these sounds. If you can hear the difference between them then this information is certainly needed.
