Imagine yourself at a cocktail party. Complex sounds arrive from many different sources at once, yet your task is to attend to and understand what one person is saying while the other people are talking at the same time. This is the "cocktail party effect," first described by E. C. Cherry (1953). Since that observation, research has been conducted to explore this highly specialized and complicated process, and in 1990 Bregman documented the concepts of Auditory Scene Analysis in his book.
What is Auditory Scene Analysis?
Auditory Scene Analysis (or ASA) is "the task of grouping and segregating the neural representation of sounds, to make sense of the what and where of the auditory environment, and is carried out by the brain" (Bregman, 1990; Feng & Ratnam, 2000, p. 699). According to Okuno & Rosenthal (1998), ASA recovers, from the acoustic pattern at the person's ear, a separate description for each separate sound-producing event. These sounds come from the various events occurring in the listener's environment, and they overlap in both time and frequency. Yet even though the sounds differ from one another, "there is only one acoustic pattern in the ear of the listener" (Okuno & Rosenthal, 1998, p. 1).
This complicated process is the instinctive ability of human beings to recognize the distance, loudness, pitch, direction, and tone of several individual sounds simultaneously. The ears, body, and brain function together to build an auditory picture of a scene or situation by decoding dynamic and static localization cues. The dynamic cues include vision, early echo response, reverberation, and head motion; the static cues include shoulder echo, head shadow, interaural time difference, and the filtering of certain frequencies depending on the identified source of the sound.
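One of the static cues above, the interaural time difference, can be roughly approximated with Woodworth's spherical-head formula. The sketch below is illustrative only; the head radius and speed of sound are typical textbook values, not figures taken from the sources cited here:

```python
import math

def woodworth_itd(azimuth_rad, head_radius_m=0.0875, speed_of_sound=343.0):
    """Approximate interaural time difference (ITD) in seconds for a
    source at the given azimuth (0 = straight ahead), using Woodworth's
    spherical-head model: ITD = (r / c) * (theta + sin(theta))."""
    return (head_radius_m / speed_of_sound) * (azimuth_rad + math.sin(azimuth_rad))

# A source straight ahead produces no ITD; a source directly to one
# side produces roughly 0.66 ms, near the maximum a human head allows.
print(woodworth_itd(0.0))          # 0.0
print(woodworth_itd(math.pi / 2))  # ~6.6e-4 s
```

The brain exploits such microsecond-scale arrival differences between the two ears as one cue for computing where a sound came from.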
A number of elements are common to auditory scene analysis and visual scene analysis (Bregman, 1990; Julesz & Hirsh, 1972). Both involve a "grouping and segregation of elemental features to extract distinct perceptual objects" (Feng & Ratnam, 2000, p. 699). In vision, space is represented directly on the retina; in audition, there is no such fixed spatial representation of objects. The brain must therefore perform scene analysis by relying entirely on the time evolution of the sound waveform, and it appears to do so through sequential and spectral integration. Furthermore, sound source position is computed centrally in audition (Feng & Ratnam, 2000).
Why is it important?
The sense of hearing, or audition, is the ability to detect sound: an involuntary process set off by sound waves striking the eardrum. It is one of the long-established five senses, along with sight, smell, taste, and touch (Hearing, 2005). Auditory Scene Analysis (ASA) is important because it helps us identify our surroundings and the events taking place within them.
ASA is significant to every human being because it turns the raw sense of hearing into an understanding of the properties of sound. A listener can identify a guitar being strummed, a bird chirping on a hot summer day, or a car approaching. In a natural listening environment, the acoustic energy reaching the listener's ear is a mixture produced by many individual events, some occurring in sequence and others simultaneously.
People need to understand the nature of sound. The elementary task of the auditory system is believed to be organizing this mixture of incoming acoustic events into meaningful groups that correspond to distinct real-world activities. Adequate auditory ability is a sign of well-being and balance, so it is important to understand the more complicated processes behind the perception of sound rather than its physiology alone.
How do listeners do it?
According to auditory psychophysicists, there are two ways of examining the phenomenon of sound: sound source determination (Yost, 1992) and source segregation (Bregman, 1990).
Sound source determination – Listening requires the ears to locate sources in space. In the cocktail party example above, the threshold for comprehending speech improves when the sound sources are widely separated in space. Experiments suggest a close link between sound identification and the ability to separate sound sources in space (Feng & Ratnam, 2000). In humans, detection thresholds are highest when the noise and the signal originate from the same location; thresholds drop by as much as 10-18 dB as the angular separation between the two sources is increased (Gatehouse, 1987; Saberi et al., 1991). Presumably, separating the two sounds in space helps decompose the acoustic scene into its elemental sound sources, which in turn lowers detection thresholds.
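To put the 10-18 dB range in perspective, a change in decibels maps to a power ratio of 10^(ΔdB/10). This small conversion sketch is illustrative arithmetic only, not an analysis from the cited studies:

```python
def db_to_power_ratio(delta_db):
    """Convert a change in decibels to the corresponding power ratio:
    a ratio of 10^(delta_db / 10)."""
    return 10 ** (delta_db / 10)

# A 10 dB threshold drop means the signal is detectable at one tenth
# of the power; an 18 dB drop corresponds to roughly a 63-fold reduction.
print(db_to_power_ratio(10))  # 10.0
print(db_to_power_ratio(18))  # ~63.1
```

In other words, spatially separating signal and noise can let a listener detect a signal ten to sixty times weaker than would otherwise be needed.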
Source segregation – Feng and Ratnam (2000, p. 699) noted that "the segregation of concurrent sounds into individual auditory streams is also facilitated if the sounds (a) are spectrally separated, (b) have uncorrelated waveform envelopes, (c) start and stop asynchronously, and (d) have harmonics with different fundamentals". These conditions make it easier to concentrate on individual sound sources and to categorize the contents coming from each.
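Condition (d), harmonics with different fundamentals, can be illustrated with a short NumPy sketch. This is an illustration only; the 200 Hz and 310 Hz fundamentals are arbitrary choices, not values from the cited work. Two concurrent harmonic complexes remain spectrally distinguishable because their partials occupy different frequency bins:

```python
import numpy as np

def harmonic_complex(f0, n_harmonics=3, fs=8000, dur=1.0):
    """A tone complex whose partials sit at integer multiples of f0,
    with amplitudes falling off as 1/k so the fundamental dominates."""
    t = np.arange(int(fs * dur)) / fs
    return sum(np.sin(2 * np.pi * k * f0 * t) / k
               for k in range(1, n_harmonics + 1))

# Two concurrent sources with different (hypothetical) fundamentals.
mixture = harmonic_complex(200) + harmonic_complex(310)

# With a 1-second window the FFT bins are 1 Hz apart, so each source's
# partials land on exact bins and stand out against the background.
spectrum = np.abs(np.fft.rfft(mixture))
print(spectrum[200] > 100 * spectrum[250])  # True: 200 Hz fundamental present
print(spectrum[310] > 100 * spectrum[250])  # True: 310 Hz fundamental present
```

Because each source's harmonics are multiples of a different fundamental, a listener (or an algorithm) can assign each family of partials to its own stream.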
From such observations, Cherry (1953) showed evidence that the "cocktail party effect" involves attentional mechanisms, since a person can single out one talker without turning the head or body. He used a selective-attention paradigm to study the issue. He found that when the listener was asked to attend to one message delivered to one ear while a different message was delivered to the other ear, the unattended message could not be recalled. However, when the messages were split between the ears, or mixed and presented monaurally, very little of either message could be followed.
Bregman (1990) contends that the mixture of sounds reaching the ears undergoes a two-stage auditory scene analysis. In the first stage, the acoustic signal is decomposed into a number of sensory elements. In the second, elements likely to have arisen from the same environmental source are grouped into perceptual structures that are interpreted by higher-level processes.
The number of messages a person can attend to at the same time, and which aspects of unattended messages can be processed, are major issues yet to be resolved. Almost all theories of selective attention are drawn from visual processing and applied to dichotic listening conditions; they are not easily applicable to auditory information processing in the real world, where competing sounds reach both ears.
Conclusion
Auditory Scene Analysis is a complex process and a remarkable human ability. Today, newer technologies and other advances of the digital age are being used to provide a clearer description of this multifaceted phenomenon. The auditory system is ever-changing, adaptive, and synthetic in nature: it learns and improves its performance through experience. Above all, the human being remains one of the most complex and fascinating creations, possessing the ability to receive and understand sound.
References
Bregman, A. S. (1990). Auditory Scene Analysis: The Perceptual Organization of Sound. Cambridge, MA: The MIT Press.
Cherry, E. C. (1953). Some experiments on the recognition of speech, with one and two ears. Journal of the Acoustical Society of America, 25, 975-979.
Feng, A. S., & Ratnam, R. (2000). Neural basis of hearing in real-world situations. Annual Review of Psychology, 51, 699-725.
Gatehouse, R. W. (1987). Further research on free-field masking. Journal of the Acoustical Society of America, 82, S108 (Suppl.).
Hearing (sense). (2005, November 25). Wikipedia, The Free Encyclopedia. Retrieved November 24, 2010, from http://en.wikipedia.org/w/index.php?title=Hearing_%28sense%29&oldid=29186328
Julesz, B., & Hirsh, I. J. (1972). Visual and auditory perception: An essay in comparison. In E. E. David Jr. & P. B. Denes (Eds.), Human Communication: A Unified View (p. 458). New York: McGraw-Hill.
Okuno, H. G., & Rosenthal, D. F. (Eds.). (1998). Computational Auditory Scene Analysis. Mahwah, NJ: Lawrence Erlbaum Associates.
Saberi, K., Dostal, L., Sadralodabai, T., Bull, V., & Perrott, D. R. (1991). Free-field release from masking. Journal of the Acoustical Society of America, 90, 1355-1370.
Yost, W. A. (1992). Auditory image perception and analysis. Hearing Research, 56, 8-19.