Publications of Dr. Martin Rothenberg:
A Multichannel Electroglottograph
Published in the Journal of Voice, Vol. 6, No. 1, pp. 36-43, © 1992 Raven Press, Ltd., New York
Summary: It is shown that a practical multichannel electroglottograph can be implemented by using synchronization of the RF signal sources of the various channels to limit interchannel interference. A multichannel electroglottograph can be used to determine the extent to which a given trace is generated primarily by the degree of vocal fold contact and not by noise or artifactual voice-synchronous distortion components. A vertical array with as few as two electrode-pairs can be used to simplify correct electrode placement, as well as to generate a signal that tracks vertical movements of the larynx. Various types of noise and artifactual variation possible in an electroglottograph trace are described and illustrated.
Key Words: Glottography, Larynx, Vocal function, Voice testing.
THE SINGLE-CHANNEL ELECTROGLOTTOGRAPH
The electroglottograph (EGG) is a device for the noninvasive measurement of the time variation of the degree of contact between the vibrating vocal folds during voice production. Though it is difficult to verify the assumption precisely, the aspect of contact being measured by a typical EGG unit is nominally considered to be the vocal fold contact area (VFCA). This article describes a new type of EGG developed at the Speech Research Laboratory at Syracuse University that, by the use of multiple channels, can make the technique easier to use and more reliable and, in addition, can provide a quantitative measure of vertical movements of the larynx during voice production.
To measure VFCA, an EGG records variations in the transverse electrical impedance of the larynx and nearby tissues by means of a small electrical current applied by electrodes on the surface of the neck. This impedance will vary slightly with the area of contact between the moist vocal folds during that part of the glottal vibratory cycle in which the folds are in contact. However, it should be emphasized that because the percentage variation in the neck impedance caused by vocal fold contact can be extremely small and varies considerably between subjects, no absolute measure of contact area is obtained, only the pattern of variation for a given subject.
By using electrical currents in the megahertz region, presumably to capacitively bypass the relatively nonconductive outer layer of the skin and the myelin insulation of muscle fibers between the electrodes and the glottis, and low-noise electronics, a fairly noise-free waveform can be obtained for most subjects that is relatively independent of the unit selected and strongly related to VFCA (1). The low frequency phase distortion caused by the high-pass filtering needed during continuous speech (1) can be avoided by the use of linear-phase filters (2). The previously available commercial EGG units are compared quite thoroughly by Baken (3).
Comparisons of EGG waveforms from modern commercial units with pictures of the glottis (superior view) and inverse-filtered air flow, and at least one test using an excised canine larynx (4-6), verify that, when properly designed and properly used, an EGG can yield a waveform for most subjects that reflects at least fairly well the variation in vocal fold contact area during the vibratory cycle. With experience in its use, many vibratory features can be recognized, such as the open versus closed segments of the vibratory cycle (2), features indicating the vertical or horizontal phasing of vocal fold motion (vocal folds rolling together or apart rather than having a parallel approach or separation) (1), or occasional waveform perturbations indicating a contact anomaly such as the irregular breaking of a mucous bridge during the opening phase.
At present, EGGs are being used at many research laboratories, but except for rudimentary applications such as the measurement of vocal period, the technique has not been accepted for general clinical use. There are basically three reasons that electroglottography is not used more commonly. First, there are many subjects for whom the previously available commercial units either yield no output or one that is very noisy and/or very different from vocal fold contact area. More significantly, in cases of noisy or distorted waveforms, the user often has no clear indication that the waveform is not valid. (Extremely noisy waveforms, that is, waveforms that are relatively small in amplitude and show a fine-grained irregularity that differs from cycle to cycle, can generally be considered inaccurate as a representation of the VFCA. Conversely, very strong, noise-free waveforms from a normal voice can usually be trusted. An extreme amount of subcutaneous fatty tissue about the neck is also a good indication that the EGG signal may not be a good indicator of the VFCA.)
Second, to obtain a waveform that represents primarily the VFCA, previous units require accurate placement of the electrodes with respect to the vocal folds. The practice of using extra guard-ring or reference electrodes for reducing noise makes accurate placement more important, since if the glottis is mistakenly placed in the electrical field going to the guard or reference electrode, the closing of the vocal folds can actually act to draw current away from the primary electrode and cause a partial signal inversion, or at least a distortion of the waveform. This can be easily tested experimentally by purposely shifting the contactor locations during a held vowel and looking for changes in the waveform.
Third, electroglottography is not used more commonly because the various waveform features of interest to the clinician have not yet been clearly charted. This is undoubtedly due in part to the first two problems, since it would be a waste of effort to document in detail the characteristics of a device that cannot be trusted.
The version of the EGG described here circumvents the first two problems, that is, it makes proper placement easy to accomplish, or alternatively, it can be operated so as to not require accurate placement of the contactors. It also provides a means for verifying that the signal can be trusted as an indicator of the VFCA. By providing a more reliable and easily used device, this new EGG should also facilitate the research required for waveform interpretation, the third problem area.
To see why previous EGG units are not sufficiently reliable as an indicator of the VFCA in many cases, it is important to understand and differentiate between the various types of noise or signal distortion in the EGG signal when it is considered an indicator of the VFCA. Some of the more significant noise types will be described here with reference to Figure 1. At the left in Fig. 1 is a schematic representation of a basic two-electrode (single-channel) EGG. In the configuration shown, the EGG has a transmitter that is a source of RF electrical current having a source impedance that is high with respect to the impedance of the neck between the electrodes, and a receiver that measures the amplitude of the RF interelectrode voltage caused by the transmitter current. The configuration shown in Fig. 1 is used in the EGG units described later. (In an alternate configuration, the transmitter and receiver can be connected in series with the neck impedance and the resulting changes in electrical current can be measured.)
The amplitude of this interelectrode voltage will tend to vary with the transverse neck impedance that, in turn, varies (inversely) with the VFCA. We have found that the variation in the amplitude of the receiver input voltage, that is, the percent modulation of the RF carrier, which is caused by and represents the VFCA, can range from roughly 0.02 to 2% of the average RF voltage magnitude for typical electrode dimensions, and is typically between 0.1 and 0.5% for non-obese adult male subjects and about half these values for non-obese adult female subjects.
As shown at the right in Figure 1, the EGG output signal can be considered to consist of a desired VFCA waveform (shown inverted, with increasing contact negative-going, as is the convention in our laboratory to facilitate correlations with glottal airflow and glottal area patterns), to which is added a number of noise or distortion components. Figure 1 illustrates the three most significant types of noise, namely, low-frequency artifact, random noise, and voice-synchronous noise.
Low-frequency artifact
A low-frequency artifact, illustrated at the lower right, can result from such factors as electrode movement or the muscularly controlled (nonvibratory) movements of the larynx and the articulators during continuous speech. Since these movements vary little during each glottal cycle, their effects on the EGG waveform are theoretically removable by means of a high-pass filter with a cutoff frequency slightly below the voice fundamental frequency. If the filter is of the linear phase shift or constant delay variety (these descriptions are mathematically equivalent), little distortion of the VFCA waveform will be introduced by the filter, aside from a small known, fixed delay. Since low-frequency artifacts can be removed by such filtering, this component has not been included in the illustrative EGG signal waveforms in the figure, and in this article the term EGG waveform is restricted, as is customary, to the components of the translaryngeal electrical impedance at or above the voice fundamental frequency (what some researchers refer to as the Lx signal, 7).
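The linear-phase filtering described above can be illustrated with a short Python sketch. It is not the filter used in any EGG unit discussed here. The windowed-sinc design, the Hamming window, the tap count, and the function names `highpass_taps` and `fir_filter` are all assumptions chosen for illustration. The point it demonstrates is that a symmetric set of FIR taps rejects the slowly varying component while delaying every remaining component by the same fixed number of samples.

```python
import math

def highpass_taps(cutoff_hz, sample_rate_hz, num_taps=201):
    """Symmetric (linear-phase) high-pass FIR taps from a Hamming-windowed sinc."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd for this high-pass design")
    fc = cutoff_hz / sample_rate_hz            # cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    lowpass = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        lowpass.append(ideal * window)
    gain = sum(lowpass)
    lowpass = [t / gain for t in lowpass]      # unity gain at DC
    # Spectral inversion: unit impulse at the center minus the low-pass taps.
    taps = [-t for t in lowpass]
    taps[mid] += 1.0
    return taps

def fir_filter(signal, taps):
    """Direct FIR convolution; the output lags the input by (len(taps) - 1) / 2 samples."""
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, t in enumerate(taps):
            if i - j >= 0:
                acc += t * signal[i - j]
        out.append(acc)
    return out
```

With 201 taps the fixed delay is 100 samples, which can be removed by shifting the output. In practice the cutoff would be set slightly below the lowest expected voice fundamental frequency.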
It should be noted, however, that there are some elements of these lower-frequency components of the translaryngeal electrical impedance that are of potential interest, most notably the variation in the average value of the waveform during vocal fold adduction or abduction gestures. Thus, some manufacturers make available an output containing these lower-frequency components (the Gx output in some Laryngograph models or the Extended Low-Frequency output in the Glottal Enterprises models). The user, though, should keep in mind that these low-frequency outputs will always contain, to some degree, artifacts from other movements in or near the larynx, artifacts that are inherently not separable from the desired components.
Random noise
A small amount of broad-band random noise, analogous to the hiss in a weak AM broadcast transmission or the snow in a weak television signal, is always introduced by the electronics in the transmitter and receiver circuitry and by RF energy from the environment that is picked up by the receiver circuit. These sources are characterized by the letter R in Figure 1. Random noise can be difficult to identify in an EGG signal from a very hoarse or aperiodic voice, since the noise causes cycle-to-cycle variations in the signal that may be similar in some respects to aperiodicities caused by irregular vocal fold movements. However, in most cases random noise is easy to identify in the EGG waveform by its variability between glottal cycles. In addition, if the EGG unit employs no automatic gain or level control circuitry, the level of random noise in an EGG waveform is easy to measure by merely stopping the voice, as by holding the vocal folds closed against a positive lung pressure, and measuring the resulting broad-band noise, since the random noise components tend not to depend on the presence or absence of vocal fold vibration.
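The noise-floor measurement described above can be sketched in a few lines of Python. The sketch assumes that both segments have already been high-pass filtered and that the unit applies no automatic gain control. The function names and the choice of RMS as the level measure are illustrative assumptions, not part of the original procedure.

```python
import math

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def signal_to_noise_db(voiced_segment, stopped_segment):
    """Voiced EGG level relative to the level recorded with the voice stopped, in dB."""
    noise = rms(stopped_segment)
    signal = rms(voiced_segment)
    if noise == 0:
        return math.inf        # no measurable noise floor
    if signal == 0:
        return -math.inf       # no measurable signal
    return 20 * math.log10(signal / noise)
```

A low ratio would suggest that cycle-to-cycle irregularities in the trace may reflect electronic noise rather than irregular vocal fold movement.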
Voice-synchronous noise
The most inherently troublesome noise sources are those that are caused by the voice itself and therefore tend to produce EGG components that are synchronous with the desired VFCA signal, that is, that are the same or similar in every glottal cycle. Such components, represented as S in Figure 1, tend to appear as a distortion of the waveform. Voice-synchronous noise can come from any voice-generated physiological vibration that can affect the electrical impedance between the EGG electrodes. Examples can include tissue vibration at the skin-electrode interface, vibration of the pharyngeal walls or tongue, or vibratory movements of the false vocal folds or adjacent structures. Because of the mass of the tissues involved, the tissue vibration causing the synchronous noise will tend to be smoothly varying at the voice fundamental frequency, and, as a result, the voice-synchronous noise components will tend to be much more smoothly varying (have changes in the waveform that are less abrupt and much weaker high-frequency harmonics) than the VFCA waveform. However, voice-synchronous artifacts at a frequency as high as the frequency of the first formant are also possible. For a given voice production, the amplitude of some of the voice-synchronous noise components can be estimated very roughly by moving the electrodes away from the vocal folds to reduce the VFCA component; however, any noise components generated close to the vocal folds cannot be so measured.
The sketches of the EGG output in Figure 1, represented as A + R + S, illustrate that when the true VFCA signal (A in the figure) is small in amplitude, the EGG output can be dominated by random and/or voice-synchronous noise. In my experience, the VFCA component may be too small in amplitude for some applications when the modulation of the RF transmitter current caused by the variations in vocal fold contact falls much below about 0.1%, though the precise boundaries for various voices and applications are not well determined at this time. On the other hand, with a well-designed EGG unit, properly placed electrodes, and good electrode-skin contact, modulation percentages greater than about 0.2% generally produce an EGG output in which the VFCA component A tends clearly to dominate, as illustrated in the lowermost A + R + S trace in Fig. 1.
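The rough guideline above can be written as a simple check. The 0.1% and 0.2% boundaries come from the preceding paragraph and are stated there as approximate and not yet well determined. The function name and the three labels are hypothetical.

```python
# Approximate boundaries taken from the text; they are not established limits.
WEAK_BELOW_PERCENT = 0.1
STRONG_ABOVE_PERCENT = 0.2

def classify_modulation(percent_modulation):
    """Label a percent-modulation reading using the rough guideline in the text."""
    if percent_modulation < 0:
        raise ValueError("percent modulation cannot be negative")
    if percent_modulation < WEAK_BELOW_PERCENT:
        return "possibly too weak"
    if percent_modulation > STRONG_ABOVE_PERCENT:
        return "VFCA likely dominant"
    return "uncertain"
```

For example, a reading of 0.05% would be labeled "possibly too weak" and a reading of 0.5% would be labeled "VFCA likely dominant".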
It should be mentioned that there are other possible distortion factors, such as power line interference (easily identified by its synchronism to the power line frequency and generally removable by better electrical shielding and grounding or by moving to another test location) or a nonuniform electric field over the area of the vocal folds. However, the three types illustrated in Figure 1 appear to me to characterize well the major distortion components inherent in the signal.
Thus, I have shown by illustration in Fig. 1 that a VFCA signal that is too weak can result in an EGG waveform that is dominated by either low-frequency artifacts, random noise, or voice-synchronous noise, with voice-synchronous noise the most difficult to separate from the true VFCA waveform. Moreover, with some neck physiologies, a weak VFCA component can be present even when the electrodes are placed optimally, at the level of and lateral to the vocal folds. However, the noise and artifact problems become worse when the VFCA signal component is further reduced by nonoptimal electrode placement, as when an optimal position is difficult to locate or when the larynx or neck moves during the test procedure. Thus, to get a maximally clear recording of the VFCA and be confident of the result, one must have an EGG system that makes it easy for the user to answer the following two questions: What is the best position for the electrodes? Is the resulting EGG signal sufficiently strong to trust as an adequate representation of the VFCA?
What follows will be a description of a new type of EGG that, for most subjects, allows the user to answer these questions. In addition, this new EGG has a capability for actually tracking the larynx as it moves vertically during speech and, when non-negligible voice-synchronous noise is present, for indicating to the user which aspects or segments of the EGG waveform are not to be trusted to represent the VFCA.
A MULTICHANNEL EGG
The EGG system proposed here uses multielectrode arrays on each side of the neck to provide simultaneous EGG measurements at a number of neck locations. Each electrode pair, consisting of corresponding opposed electrodes, is connected to its respective transmitter and receiver, as in Figure 1, to constitute a channel, in our terminology. The electrodes in each array can be configured horizontally, vertically, or in a two-dimensional pattern (assuming more than two channels). For example, a horizontal array has been used by our laboratory on an experimental basis to identify the presence of phase differences in the vibratory pattern along the vocal folds. However, this article focuses on the properties of a vertical array. Since a multichannel system employing a vertical array can be used to track the position of the larynx as it moves vertically during speech or singing (as discussed later), we refer to the system illustrated in Figure 2 as a tracking multichannel EGG, or TMEGG. The following is a description of a version of the two-channel TMEGG developed in our laboratory. The features and method of application of this unit are discussed with reference to the sketch in Figure 2.
A major problem in implementing a multichannel EGG is the noise and distortion that can be generated by interference between the RF electrical currents in the various channels. Though there are a number of methods that can be used to reduce such interference, I presently prefer the technique of time-synchronizing the RF signal sources. In the two-channel vertical array prototype constructed using this principle, careful electrical design has resulted in a noise level in each channel that is no more than that of any preexisting commercial design, even though somewhat smaller electrodes are used than is commonly the practice.
Thus, good performance is attained with electrodes small enough to be used in an array. This high level of performance has also been attained without the use of field-forming or reference electrode techniques that would distort the output from electrode pairs not at the level of the glottis. In addition, since the design provides separate electric fields for each electrode pair, more electrodes could be added without signal degradation. The frequency of the electrical current used, 2 MHz, and the maximum voltage and current to which the subject is exposed, about 1 V and 10 mA, respectively, are similar to those in other commercial units.
An important feature of the electrical design is that it does not employ the feedback or automatic level-adjusting techniques of some previous designs, so that the DC component of the demodulated receiver voltage can be calibrated in terms of the transverse impedance of the neck, and the ratio of the amplitude of the AC component of the TMEGG output in each channel to the DC output for that channel can be readily calibrated in terms of percent modulation of the electrode voltage. Thus, the percent modulation for each channel could be displayed for the operator as a measure of the efficiency of operation and signal reliability. To simplify the display, it should be sufficient to show only the percent modulation of the strongest channel (that is, the channel for which the percent modulation is greatest), as illustrated in Figure 2. This indication of percent modulation could be compared with a range of percent modulation sufficient for proper operation, when such a range is developed by future research.
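The percent-modulation display described above can be sketched as follows. The sketch assumes that each channel supplies its demodulated receiver voltage, DC component included, as a list of samples. It takes the AC amplitude to be the peak-to-peak variation, which is one convenient choice and not necessarily the measure used in the prototype. The function names are hypothetical.

```python
def percent_modulation(demodulated):
    """Peak-to-peak variation of the demodulated voltage as a percent of its mean (DC) level."""
    dc = sum(demodulated) / len(demodulated)
    if dc == 0:
        raise ValueError("DC level is zero; percent modulation is undefined")
    return 100.0 * (max(demodulated) - min(demodulated)) / dc

def strongest_channel(channels):
    """Return (index, percent modulation) for the channel with the greatest percent modulation."""
    values = [percent_modulation(ch) for ch in channels]
    best = max(range(len(values)), key=values.__getitem__)
    return best, values[best]
```

The calculation is meaningful only on a signal taken before any high-pass filtering or gain normalization, since both would alter the ratio of the AC component to the DC component.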
In Figure 2, the outputs of the two channels are shown displayed separately on an oscilloscope for comparison; however, it is also possible to automatically either combine the channel outputs or select between them, so as to produce one optimized signal for display or recording. If desired, amplitude normalization of this final output signal could be added, using some form of automatic gain control circuit. Naturally, the percent modulation measurement would be made using a signal that preceded any such normalization.
When a two-electrode-pair TMEGG is used with a multichannel display device such as the oscilloscope in Figure 2, the user would normally position the electrode array for approximately equal amplitudes. Positioning for equal waveform amplitudes would be expected to place the electrode pairs at least roughly equidistant from the level of the glottis, unless there were gross differences in the contact pattern of the vocal folds along their vertical dimension and, in addition, the electrical field intensity from an electrode pair was significantly non-uniform over the vertical dimension of vocal fold contact. Equal waveform amplitudes would also not indicate a centered glottal position if the physiology of the neck caused grossly different field intensities for each electrode pair at the plane equidistant from each electrode pair. However, we have not found evidence that either of these factors is significant in subjects tested to date.
In an alternate positioning procedure, a relatively simple electronic circuit can be used to compare the output amplitudes and provide the user with a meter or bar graph indication of correct position. Such a meter, labeled larynx height, is shown on the EGG electronics unit in Figure 2. A center-zero position on the meter would indicate that the traces A and B were of equal amplitude, and therefore that the vocal folds were approximately centered vertically between the electrode pairs. If both traces are electronically recorded during a given segment of speech, it is possible to compensate for subsequent vertical movements of the larynx by selecting the strongest trace during any specific short period to be analyzed.
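Selection of the strongest trace over short periods can be sketched as below. The sketch assumes two equal-length recorded traces, a fixed analysis window measured in samples, and peak-to-peak amplitude as the measure of strength. These choices and the function name are illustrative assumptions.

```python
def select_strongest(trace_a, trace_b, window):
    """For each window, keep the samples of the trace with the larger peak-to-peak amplitude."""
    if len(trace_a) != len(trace_b):
        raise ValueError("traces must have the same length")
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    combined, choices = [], []
    for start in range(0, len(trace_a), window):
        seg_a = trace_a[start:start + window]
        seg_b = trace_b[start:start + window]
        if max(seg_a) - min(seg_a) >= max(seg_b) - min(seg_b):
            combined.extend(seg_a)
            choices.append("A")
        else:
            combined.extend(seg_b)
            choices.append("B")
    return combined, choices
```

A window spanning several glottal cycles avoids switching channels within a cycle. The combined output can still show a small step wherever the selected channel changes.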
The electrical voltage applied to the larynx height meter in Figure 2 could also be output as a tracking signal that would trace vertical movements of the larynx during voice production, as illustrated in Fig. 2. Since these vertical movements are much slower than the vocal fold vibrations, they can be recorded directly on a chart recorder having a frequency response flat to only 5 or 10 Hz. An approximate calibration of the tracking signal, as in terms of volts per millimeter of larynx movement, is possible by means of a reciprocal technique in which the larynx is held still during a constant vowel while the electrodes are moved vertically by some convenient increment, say 5 mm, and the resulting variation in the tracking voltage is recorded. Though we have not yet explored the limitations in this regard, we expect that the efficacy of this tracking function will vary significantly with the strength of the VFCA component of the EGG waveforms obtained for the individual being tested.
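The reciprocal calibration described above amounts to a two-point estimate of sensitivity. The sketch below assumes that the tracking signal is approximately linear over the calibrated range, which may not hold for every subject. The function and parameter names are hypothetical.

```python
def calibrate_sensitivity(tracking_before, tracking_after, electrode_shift_mm):
    """Tracking-signal change per millimeter, from one known vertical electrode shift."""
    if electrode_shift_mm == 0:
        raise ValueError("electrode shift must be nonzero")
    return (tracking_after - tracking_before) / electrode_shift_mm

def to_displacement_mm(tracking_value, zero_value, sensitivity):
    """Convert a tracking-signal reading to relative larynx displacement in millimeters."""
    if sensitivity == 0:
        raise ValueError("sensitivity must be nonzero")
    return (tracking_value - zero_value) / sensitivity
```

For instance, if a 5-mm electrode shift changes the tracking voltage by 0.5 V, the sensitivity is 0.1 V/mm, and a later reading 0.2 V from the zero position corresponds to roughly 2 mm of relative movement.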
SOME ILLUSTRATIVE RESULTS
Typical output waveforms for the two-channel prototype are shown in Figure 3 for a typical adult male subject. For these tests, the electrodes in each array were circular, 18 mm in diameter, and separated by an 8-mm vertical gap. In the sequence of photographs shown in the figure, the electrode arrays were both shifted vertically by increments of about 5 mm during similar repetitions of the vowel /a/. The waveforms indicate that, with this subject and the electrodes used, movements of the larynx of as little as about 2 mm should be resolvable from changes in the relative amplitudes of the upper and lower EGG waveforms. Additional resolution may be attainable with yet smaller electrodes, if a corresponding increase in noise in the EGG waveform could be tolerated.
In the center photograph in Figure 3, the similar amplitudes in the upper and lower traces imply that the glottis was at a level approximately midway between the upper and lower electrode pairs. (The two channels had been calibrated for equal sensitivity to changes in electrical impedance.) Moreover, the similarity in the shapes of the two waveforms also indicates that for this subject the electrode configuration used for an electrode pair was not highly sensitive to correct placement for a reliable waveform.
To illustrate the significance of voice-synchronous noise in the interpretation of EGG waveforms, Figure 4 shows the waveforms from another adult male subject having a strong EGG signal (a high percent modulation of the RF carrier). The magnification of the scales in Fig. 4 is greater than in Figure 3, to better show the waveform details. As with the subject in Fig. 3, the percent modulation of the electrical current was high enough so that little random noise is present. (The slight step-effect graininess of the traces was caused by the digital sampling used to capture the traces and not by noise in the signal.) Also, since a constant vowel articulation (/a/) was used, there is no appreciable low-frequency artifact. The vertical position of the two-electrode-pair array (the same array as used for Fig. 3) was carefully adjusted for equal trace amplitudes from the upper and lower channels, so that the glottis was at least approximately centered between the electrode pairs.
When the traces from the upper and lower channels in Figure 4 are superimposed at the bottom of the figure, it can be seen that they agree quite well, except for a segment occurring just after the indication of glottal closure (the sharp downward movement of the inverted VFCA trace). Thus, my interpretation of the EGG waveforms in Fig. 4 would be that the waveforms represented the VFCA through most of the glottal cycle, since the traces agree so well. However, the disparity just after the glottal closure indicates that there is some ambiguity in their interpretation as the VFCA. During this segment, the upper trace is slightly offset from the lower and has oscillations at a frequency similar to the frequency of the subject's first formant.
My interpretation of the oscillations is that they are a voice-synchronous noise component generated by strong acoustic pressure variations in the pharynx just after the closing of the vocal folds and picked up by the electrode pair closest to the pharynx. That the oscillations are formant-related is supported by the fact that the oscillations would change in frequency or disappear when the vowel articulation was changed or when a weaker voice was used.
Also, it is known from measurements of acoustic pressure and inverse-filtered air flow that this subject had a very strong voice, with the complete glottal closure and small open quotient (ratio of the period of the open phase of the vocal fold vibratory cycle to the entire vibratory period) that are characteristic of strong male voices. The small open quotient is also verified by the shape of the EGG waveform (2). I would hypothesize that the formant energy is causing some oscillation of a structure or structures close to the pharynx (tongue, pharyngeal wall, false vocal folds, etc.) that affect the electric field of the upper electrode pair much more than that of the lower pair.
An alternate explanation for the formant-related oscillations in the upper trace in Figure 4 is that the formant energy in the pharynx is causing oscillatory variations in the VFCA along the upper margins of the vocal folds, and that these variations are recorded only by the upper electrodes. However, from the gross dimensions of the electrodes and of the neck at the level of the larynx, we would assume that the electrical fields of the respective electrode pairs were not so nonuniform over the vertical dimension of vocal fold contact that a formant-induced VFCA component would register in only the upper trace.
Thus, because the waveform differences are convincingly explainable in terms of formant energy in the pharynx causing voice-synchronous noise in the upper trace, we might assume that the lower trace better represents the VFCA during this segment. However, in most cases in which the traces disagree there will be no such determination of validity possible, and differences between the traces will be interpretable only as indicating that neither trace can be trusted in regions in which the waveforms disagree significantly.
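The comparison of traces described above can be sketched as a sample-by-sample check. The sketch assumes two time-aligned, equal-length traces covering the same glottal cycles. The normalization by peak-to-peak amplitude and the default tolerance of 10% of that amplitude are arbitrary illustrative choices, not values drawn from the measurements reported here.

```python
def normalize(trace):
    """Remove the mean and scale the trace to a peak-to-peak amplitude of 1."""
    mean = sum(trace) / len(trace)
    centered = [x - mean for x in trace]
    peak_to_peak = max(centered) - min(centered)
    if peak_to_peak == 0:
        raise ValueError("trace is constant and cannot be normalized")
    return [x / peak_to_peak for x in centered]

def disagreement_mask(upper, lower, tolerance=0.1):
    """Flag samples where the normalized traces differ by more than the tolerance."""
    if len(upper) != len(lower):
        raise ValueError("traces must have the same length")
    norm_upper, norm_lower = normalize(upper), normalize(lower)
    return [abs(u - l) > tolerance for u, l in zip(norm_upper, norm_lower)]
```

Flagged samples mark segments in which neither trace should be trusted as the VFCA without further evidence. Unflagged samples indicate only that the two channels agree.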
The larynx vertical tracking function can be implemented in different ways from the amplitudes of the various channel outputs of a vertical array, depending in part on the number of channels. For a two-channel array, I have found the function T = log (A1/A2) to yield a reasonable indication of the position of the array with respect to the vocal folds, where A1 and A2 are some convenient measure of the amplitudes of the EGG waveforms in the two respective channels, as averaged over a few glottal cycles. For this application, the most important properties of the function T are that it will be zero when A1 = A2 and be positive or negative respectively, depending on whether A1 is larger or smaller than A2. In addition, assuming only that the amplitudes A1 and A2 increase or decrease as the respective electrode pair moves closer to or farther from the level of the glottis, the function T will vary monotonically with relative larynx height.
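The tracking function can be illustrated with a minimal Python sketch. It uses the natural logarithm and takes peak-to-peak amplitude over a short analysis window as the amplitude measure. Both are assumptions, since the text specifies only "some convenient measure" and does not state the base of the logarithm. The function names are hypothetical.

```python
import math

def peak_to_peak(trace):
    """Peak-to-peak amplitude of a high-pass filtered trace over the analysis window."""
    return max(trace) - min(trace)

def tracking_function(a1, a2):
    """T = log(A1/A2): zero for equal amplitudes, positive when A1 > A2, negative when A1 < A2."""
    if a1 <= 0 or a2 <= 0:
        raise ValueError("amplitudes must be positive")
    return math.log(a1 / a2)
```

Because T depends only on the ratio of the two amplitudes, a change in overall voice strength that scales both channels equally leaves T unchanged.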
To test the function T = log (A1/A2) empirically, we have informally used a process of reciprocal calibration in which the electrode arrays are moved vertically over the neck (generally carrying the loose neck skin with them) with the larynx height fixed, during continuous phonation, to simulate vertical movements of the larynx with fixed electrode position. We have found that the function T tends to be fairly linearly related to relative larynx height for displacements of at least 0.5 cm from the T = 0 position, when tested with a few speakers having a strong EGG signal. However, this relationship of T to larynx height would be expected to vary considerably with the speaker and with the electrode configuration, and further research is needed to determine the range of variation in the measure that T provides.
To demonstrate that a useful tracking function may be implementable for at least some subjects, the function T is shown in Figure 5 used as an indicator of relative larynx height for a male subject having a strong EGG signal. The subject was a trained nonprofessional singer. The amplitudes A1 and A2 used in deriving T were taken to be peak values of the respective high-pass filtered channel outputs. The speaker was the same as that for Figure 4, and the same electrode array was used. After the arrays were positioned so that the tracking signal T was approximately zero during normal voice production, the subject sang an ascending then descending scale in a comfortable range, starting at A3, first in a style in which the larynx height varied maximally with pitch and then in a style in which the larynx height was held relatively constant. No precise calibration was made of the tracking function in terms of larynx height for this demonstration experiment; however, a maximum movement of the laryngeal prominence of between 1 and 1½ cm was noted in similar vocalizations performed with maximum movement.
In summary, I have demonstrated that multichannel techniques can be used to produce an EGG that can verify the fidelity of its own output waveform as an indicator of the time patterning of vocal fold contact and can yield a signal that helps the user properly position the electrodes and/or track vertical movements of the larynx during voiced speech or singing. The use of improved EGG units incorporating such techniques should make possible a higher level of confidence in the results of research into the use of electroglottography in the study of voice production, in voice analysis, and in voice training and vocal pedagogy.
Acknowledgment: The work reported here was supported by two grants from the National Institutes of Health: Research Grant NS-08919 to Syracuse University in the initial stages and a Small Business Innovative Research (SBIR) Grant to Glottal Enterprises, Inc. during later development. Sandra Rothenberg provided measurements of percent modulation as part of an undergraduate research project.
REFERENCES
1. Childers DG, Krishnamurthy AK. A critical review of electroglottography. In: Reviews in biomedical engineering 12, No. 2. Boca Raton, Florida: CRC Press, 1977: 131-61.
3. Baken RJ. Clinical measurement of speech and voice. Boston: College Hill Press, 1987.
4. Lecluse FLE, Brocaar MP, Verschuure J. The electroglottography and its relation to glottal activity. Fol Phoniatr 1975; 27: 215-24.
6. Scherer RC, Druker DG, Titze IR. Electroglottography and direct measurement of vocal fold contact area. In: Fujimura O, ed. Vocal physiology: voice production, mechanisms and functions. New York: Raven Press, 1988: 279-91.
7. Fourcin AJ, Abberton E. First applications of a new laryngograph. Med Biol Illus 1971; 21: 172-82.