# Natural statistics of binaural sounds
Wiktor Młynarski ∗1 and Jürgen Jost 1,2
1 Max-Planck Institute for Mathematics in the Sciences, Leipzig, Germany
2 Santa Fe Institute, Santa Fe, New Mexico, USA
April 19, 2022
## Abstract
Binaural sound localization is usually considered a discrimination task, where interaural time (ITD) and level (ILD) disparities at pure frequency channels are utilized to identify the position of a sound source. In natural conditions, however, binaural circuits are exposed to stimulation by sound waves originating from multiple, often moving and overlapping sources. Statistics of binaural cues therefore depend on the acoustic properties and the spatial configuration of the environment. In order to process binaural sounds efficiently, the auditory system should be adapted to naturally encountered cue distributions. Statistics of cues encountered naturally and their dependence on the physical properties of an auditory scene have not been studied before. Here, we performed binaural recordings of three auditory scenes with varying spatial properties. We analyzed empirical cue distributions from each scene by fitting them with parametric probability density functions, which allowed for an easy comparison of different scenes. Higher-order statistics of binaural waveforms were analyzed by performing Independent Component Analysis (ICA) and studying properties of the learned basis functions. The obtained results can be related to known neuronal mechanisms and suggest how binaural hearing can be understood in terms of adaptation to natural signal statistics.
## Introduction
The idea that sensory systems reflect the statistical structure of stimuli encountered by organisms in their ecological niches [4, 3, 44] has driven numerous theoretical and experimental studies. The obtained results suggest that tuning properties of sensory neurons match regularities present in natural stimuli [46]. In light of this theory, neural representations, coding mechanisms and anatomical structures could be understood by studying characteristics of the sensory environment.
To date, natural scene statistics research has focused mostly on visual stimuli [27]. Nevertheless, a number of interesting results relating natural sound statistics to the auditory system have also been delivered. For instance, Rieke et al. demonstrated that auditory neurons in the frog increase information transmission when the spectrum of a white-noise stimulus is shaped to match the spectrum of a frog call [43]. In a more recent experiment, Hsu and colleagues [26] have shown similar facilitation effects in the zebra finch auditory system using stimuli with the power and phase modulation spectrum of a conspecific song. In a statistical study it has been shown that modulation spectra of natural sounds display a characteristic statistical
∗ Corresponding author. Email: mlynar@mis.mpg.de
signature [47], which allowed quantitative predictions to be formed about neural representations and coding of sounds. Other statistical models of natural auditory scenes have also led to interesting observations. Low-order, marginal statistics of amplitude envelopes, for instance, seem to be preserved across frequency channels, as shown by Attias and Schreiner [2]. This means that all locations along the cochlea may be exposed to (on average) similar stimulation patterns in the natural environment. Strong evidence of adaptation of the early auditory system to natural sounds was provided by two complementary studies by Lewicki [32] and Smith and Lewicki [48]. The authors modeled higher-order statistics of natural stimuli by learning sparse representations of short sound chunks. In this way, they reproduced filter shapes of the cat's cochlear nerve. These results were recently extended by Carlson et al. [13], who obtained features resembling spectro-temporal receptive fields in the cat's Inferior Colliculus by learning sparse codes of speech spectrograms. Human perceptual capabilities have also been related to natural sound statistics in a recent study by McDermott and Simoncelli [37]. In a series of psychophysical experiments the authors have shown that perceived realism and recognizability of sound 'textures' by human subjects depend on how well the time-averaged statistics of stimulus modulation correspond to those of natural sounds. The acquired body of evidence strongly suggests that neural representations of acoustic stimuli reflect structures present in the natural auditory environment.
The above-mentioned studies investigated statistical properties of single-channel, monaural sounds, relating them to the functioning of the nervous system. However, in natural hearing conditions the sensory input is determined by many additional factors, not only the properties of the sound source. Air pressure waveforms reaching the cochlea are affected by positions and motion patterns of sound sources as well as head movements of the listening subject. These spatial aspects generate differences between the stimuli present in each ear, which are traditionally divided into two classes: interaural level and phase differences [21]. The sound wavefront reaches the ipsilateral ear first and, after a very short time delay, the contralateral one. This generates the interaural time difference (ITD). After cochlear filtering, in pure frequency channels, ITDs correspond to interaural phase differences (IPDs). Additionally, sound received by the contralateral ear is attenuated by the head, which generates the interaural level difference (ILD). According to the widely acknowledged duplex theory [42, 21], in mammals IPDs are used to localize low frequency sounds. The theory predicts that in higher frequency regimes IPDs become ambiguous, and therefore sounds of frequency above a certain threshold (around 1.5 kHz in humans) are localized based on ILDs, which become more pronounced due to the low-pass filtering properties of the head. Binaural cues are of a relative nature, and positions of auditory objects are not represented on the sensory epithelium - the cochlear membrane - in a direct way. They are reflected in binaural cue values, which themselves vary with the changing spatial configuration of the environment and depend on sound sources' spectra.
Binaural hearing mechanisms have also been studied in terms of adaptation to natural stimulus statistics. Harper and McAlpine [23] have shown that tuning properties of IPD-sensitive neurons in a number of species can be predicted from distributions of this cue naturally encountered by the organism. This was done by forming a model neuronal representation of maximal sensitivity to stimulus change, as quantified by the Fisher information. Two recent experimental studies revealed rapid adaptation of binaural neurons and perceptual mechanisms to changing cue statistics. Dahmen and colleagues [14] stimulated human and animal subjects with non-stationary ILD sequences. They collected electrophysiological and psychophysical evidence in favor of adaptation to the stimulus distribution. Maier et al. [33], in turn, have shown that neural tuning curves in the guinea pig and human performance in a localization task can be adapted to varying ITD distributions. Both the neural representation and human performance were, however, constrained to represent midline locations with the highest accuracy. One has to note that Maier et al. take issue with the interpretation of the results obtained by Dahmen et al., suggesting that they may be explained by adaptation to the sound level and not to ILDs per se.
Adaptation of the binaural auditory system to changes in the cue distribution occurring on different timescales seems to be experimentally confirmed. Despite this fact, the statistical structure of binaural sounds encountered in the natural environment and its dependence on the auditory scene have not yet been studied. In this paper we address this gap. We performed binaural recordings of three real-world auditory scenes characterized by different acoustic properties and spatial dynamics. In the next step we extracted binaural cues (IPDs and ILDs) and studied their marginal distributions by fitting parametric probability density functions. Parameters of the fitted distributions allowed for an easy comparison of different scenes, and revealed which aspects change and which seem to remain invariant across auditory environments. To analyze higher-order statistics of binaural waveforms we performed Independent Component Analysis (ICA) of the signal and studied properties of the learned features. The obtained results suggest how mechanisms of binaural hearing can be understood in terms of adaptation to natural stimulus statistics. They also allow for experimental predictions regarding neural computation and representation of auditory space.
## Results
## Binaural spectra
In the first step of the analysis, monaural Fourier spectra were compared with each other. Frequency spectra of the recorded sounds are displayed in figure 2. Strong differences across the recorded auditory scenes were present. In two of them - the forest walk scene and the city center scene - the frequency spectrum had a power-law shape, which is a characteristic signature of natural sounds [53]. Since the nocturnal nature scene was dominated by grasshopper sounds, its spectrum had two dominant peaks, around 7 and 10 kHz. In all three cases, sounds in both ears contained a similar amount of energy in lower frequencies (below 4 kHz), which is reflected by a good overlap of the monaural spectra on the plots. In higher frequencies, though, the spectral power was not always equally distributed across both ears. This difference is most strongly visible in the spectrum of the nocturnal nature scene. There, due to the persistent presence of a sound source (a grasshopper) closer to the right ear, corresponding frequencies were amplified with respect to the contralateral ear. Since the spatial configuration of the scene was static, this effect was not averaged out in time. Monaural spectra of the forest walk scene overlapped to a much higher degree. A small notch in the left ear spectrum is visible around 6 kHz. This is most probably due to the fact that the recording subject stood next to a stream flowing at his right side for a period of time. The city center scene has almost identical monaural spectra. This is a reflection of its rapidly changing spatial configuration - sound sources of similar quality (mostly human speakers) were present at all positions during the time of the recording.
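A comparison of monaural spectra such as the one described above can be sketched with a standard Welch periodogram per ear. The sampling rate, window length and the toy 7 kHz right-lateralized source below are illustrative assumptions, not the recording parameters used in the study:

```python
import numpy as np
from scipy.signal import welch

def monaural_spectra(binaural, fs, nperseg=2048):
    """Welch power spectra of the left and right ear channels.

    binaural: (n_samples, 2) array; column 0 = left ear, column 1 = right ear.
    Returns (frequencies, psd_left, psd_right).
    """
    f, psd_l = welch(binaural[:, 0], fs=fs, nperseg=nperseg)
    _, psd_r = welch(binaural[:, 1], fs=fs, nperseg=nperseg)
    return f, psd_l, psd_r

# Toy scene: a 7 kHz tone louder in the right ear, mimicking the
# grasshopper source of the nocturnal nature scene.
fs = 44100
t = np.arange(fs) / fs
left = 0.2 * np.sin(2 * np.pi * 7000 * t)
right = 1.0 * np.sin(2 * np.pi * 7000 * t)
f, pl, pr = monaural_spectra(np.column_stack([left, right]), fs)
peak = f[np.argmax(pr)]  # right-ear spectral peak, near 7 kHz
```

On such input the right-ear spectrum dominates at the source frequency, reproducing the interaural spectral asymmetry of a static lateralized source.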
## Interaural level difference statistics
An example joint amplitude distribution in the left and the right ear is depicted in figure 3 A. It is not easily described by any parametric probability density function (pdf); however, the monaural amplitudes reveal a strong linear correlation. The correlation coefficient can therefore be used as a simple measure of interaural redundancy, indicating how similar the amplitude signal in both ears is at a particular frequency channel. High correlation values imply that both ears receive similar information, while low correlations indicate that the signal at the two sides of the head is generated by different sources. Interaural amplitude correlations for all recorded scenes are plotted as a function of frequency in figure 3 B. A general trend across the scenes is that correlations among low frequency channels (below 1 kHz) are strong (larger than 0.5) and decay as frequency increases. Such a trend is expected due to the filtering properties of the head, which attenuates low frequencies much less than higher ones. The spatial structure of the scene is also reflected in the binaural correlation - for instance, a peak is visible in the nocturnal nature scene at 7 kHz. This is due to the presence of a spatially fixed source generating sound at this frequency (see figure 2). The most dynamic scene - city center - reveals, as expected, the lowest correlations across most of the spectrum.
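A per-channel interaural amplitude correlation of this kind can be sketched from STFT magnitudes of the two ear signals; the window length and the synthetic test input below are illustrative assumptions:

```python
import numpy as np
from scipy.signal import stft

def interaural_amplitude_correlation(binaural, fs, nperseg=1024):
    """Pearson correlation of left/right STFT magnitudes, one value
    per frequency channel (computed across time frames)."""
    f, _, zl = stft(binaural[:, 0], fs=fs, nperseg=nperseg)
    _, _, zr = stft(binaural[:, 1], fs=fs, nperseg=nperseg)
    al, ar = np.abs(zl), np.abs(zr)
    al = al - al.mean(axis=1, keepdims=True)  # center each channel
    ar = ar - ar.mean(axis=1, keepdims=True)
    num = (al * ar).sum(axis=1)
    den = np.sqrt((al ** 2).sum(axis=1) * (ar ** 2).sum(axis=1))
    return f, num / den

# Sanity check: identical ear signals give correlation 1 in every channel
x = np.random.default_rng(0).normal(size=44100) * np.hanning(44100)
f, c = interaural_amplitude_correlation(np.column_stack([x, x]), fs=44100)
```

With fully decorrelated noise in the two columns instead, the per-channel values scatter around zero, mimicking the high-frequency regime of the city center scene.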
Interaural level differences (ILDs) were computed separately in each frequency channel. Figure 3 C displays an example ILD distribution (black line) together with the best-fitting Gaussian (blue dotted line) and logistic distribution (red dashed line). Logistic distributions provided the best fit to ILD distributions for all frequencies and recorded scenes, as confirmed by the KS-test (results not shown). The ILD distribution at frequency ω was therefore defined as
$$\rho(ILD_\omega \mid \mu_\omega, \sigma_\omega) = \frac{\exp\left(-\frac{ILD_\omega - \mu_\omega}{\sigma_\omega}\right)}{\sigma_\omega \left(1 + \exp\left(-\frac{ILD_\omega - \mu_\omega}{\sigma_\omega}\right)\right)^2} \quad (1)$$
where µ ω and σ ω are frequency specific mean and scale parameters of the logistic pdf respectively. The variance of the logistic distribution is fully determined by the scale parameter.
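Fitting the logistic pdf to ILD samples from one frequency channel, and comparing it with a Gaussian fit via the KS statistic, can be sketched as follows; the ILD samples here are synthetic, drawn from a logistic distribution with assumed parameters rather than taken from the recordings:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical ILD samples (dB) for a single frequency channel
ild = rng.logistic(loc=0.3, scale=2.0, size=20000)

# Maximum-likelihood fit of the location (mu) and scale (sigma)
mu, sigma = stats.logistic.fit(ild)

# KS statistics of the logistic fit vs. the best Gaussian fit
ks_logistic = stats.kstest(ild, 'logistic', args=(mu, sigma)).statistic
ks_normal = stats.kstest(ild, 'norm', args=stats.norm.fit(ild)).statistic
```

For logistic-distributed data the logistic fit yields the smaller KS statistic, which is the comparison underlying the model selection reported in the text.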
Empirical ILD distributions are plotted in figure 4 A. As can be immediately observed, they preserve a similar shape in all frequency channels and auditory scenes, regardless of scene type. The mean ( µ ω ) and scale ( σ ω ) parameters of the fitted distributions are plotted as a function of frequency in figures 4 B and C respectively. The mean of all distributions is very close to 0 dB in most cases. In the two non-static scenes, i.e., forest walk and city center, deviations from 0 are very small. Marginal ILD distributions of the spatially constant scene - nocturnal nature - were slightly shifted away from zero for frequencies generated by a sound source at a fixed position. The scale parameter behaved differently than the mean. In all auditory scenes it grew monotonically with increasing frequency. The increase was quite rapid for frequencies below 1 kHz - from 1.5 to 2. For higher frequencies the change was much smaller, and in the 1-11 kHz interval σ did not exceed the value of 2.5. What may be a surprising observation is the relatively small change in the ILD distribution when comparing high and low frequencies. It is known that level differences become much more pronounced in high frequency channels [30], and one could expect a strong difference with increasing frequency. These results can be partially explained by observing a close relationship between the Fourier spectra of binaural sounds and the means of ILD distributions. In a typical natural setting, sound sources on the left side of the head are qualitatively (spectrally) similar to the ones on the other side; therefore the spectral power in the same frequency bands remains similar in both ears. Average ILDs deviate from 0 if a sound source was present at a fixed position during the averaged time period. The increase in ILD variance (defined by the scale parameter σ ) with increasing frequency can be explained by the filtering properties of the head.
While for lower frequencies the range of possible ILDs is narrow, since large spatial displacements generate weak ILD differences, in higher frequency regimes ILDs become more sensitive to the sound source position and hence their variability grows. On the other hand, objects on both sides of the head reveal similar motion patterns and in this way reduce the ILD variability, which may account for the small rate of change. Despite the observed differences, ILD distributions revealed a strong invariance to frequency and were homogeneous across different auditory scenes.
## Interaural phase difference statistics
Marginal distributions of univariate, monaural phases over a long time period are all uniform, since the phase cyclically visits all values on the unit circle. An interesting structure appears in the joint distribution of left and right ear phase values from the same frequency channel (an example is plotted in figure 5). Monaural phases reveal a dependence in their difference. This means that their joint probability is determined by the probability of their difference:
$$\rho(\phi_{L,\omega}, \phi_{R,\omega}) = \rho(\phi_{L,\omega} - \phi_{R,\omega}) \quad (2)$$
where φ L,ω and φ R,ω are instantaneous phase values in the left and the right ear respectively. Well-known physical mechanisms explain this effect. The sound wavefront reaches first the ear ipsilateral to the sound source and then, after a short delay, the contralateral one. The temporal difference generates a phase offset, which is reflected in the joint distribution of monaural phases. This simple observation implies, however, that IPDs constitute an intrinsic statistical structure of the natural binaural signal.
IPD histograms were well approximated by the von Mises distribution (additional structure was present in IPDs from the forest walk scene - see the subsection on separation of speech with single channel IPDs). A distribution of two monaural phase variables revealing a dependence in their difference can then be written as a von Mises distribution of their differences:
$$\rho(\phi_{L,\omega}, \phi_{R,\omega}) = \rho(IPD_\omega \mid \kappa_\omega, \mu_\omega) = \frac{1}{2 \pi I_0(\kappa_\omega)} e^{\kappa_\omega \cos(IPD_\omega - \mu_\omega)} \quad (3)$$
where IPD ω = φ L,ω - φ R,ω is the IPD at frequency ω , µ ω and κ ω are frequency-specific mean and concentration parameters, and I 0 is the modified Bessel function of order 0. In such a case, the concentration parameter κ ω controls the mutual dependence of the monaural phases [12]. For large κ ω values, φ L,ω and φ R,ω are strongly dependent, and the dependence vanishes for κ ω = 0.
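The mean and concentration parameters of such a von Mises model can be estimated from circular IPD samples via the mean resultant length; the sketch below uses Fisher's standard piecewise approximation for κ and synthetic samples with assumed parameters:

```python
import numpy as np

def fit_vonmises(theta):
    """Estimate mean direction mu and concentration kappa of a von
    Mises distribution from circular samples (radians)."""
    C, S = np.cos(theta).mean(), np.sin(theta).mean()
    mu = np.arctan2(S, C)
    R = np.hypot(C, S)  # mean resultant length, in [0, 1]
    # Fisher's piecewise approximation of the ML concentration
    if R < 0.53:
        kappa = 2 * R + R ** 3 + 5 * R ** 5 / 6
    elif R < 0.85:
        kappa = -0.4 + 1.39 * R + 0.43 / (1 - R)
    else:
        kappa = 1 / (R ** 3 - 4 * R ** 2 + 3 * R)
    return mu, kappa

# Synthetic IPD samples with assumed parameters mu = 0.4, kappa = 1.5
rng = np.random.default_rng(1)
ipd = rng.vonmises(0.4, 1.5, size=20000)
mu_hat, kappa_hat = fit_vonmises(ipd)
```

As κ̂ approaches 0 the same estimator reports a near-uniform distribution, the regime in which monaural phases become mutually independent.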
## IPD distributions
Figure 6 A depicts IPD histograms in all scenes depending on the frequency channel. Thick black lines mark IPD ω,max - the 'maximal IPD' value, i.e., the phase displacement corresponding to the time interval required for a sound to travel the entire interaural distance. IPD ω,max can be computed in the following way. Assuming a spherical head shape, the time period required by the sound wave to travel the distance between the ears is equal to:
$$ITD = \frac{R_{head}}{v_{snd}} (\theta + \sin(\theta)) \quad (4)$$
where R head is the head radius, v snd the speed of sound and θ the angular position of the sound source measured in radians from the midline. The ITD is maximized for sounds located directly opposite one of the ears, deviating from the midline by π/2 ( θ = π/2 ). ITD max then becomes
$$ITD_{max} = \frac{R_{head} \left( \frac{\pi}{2} + 1 \right)}{v_{snd}} \quad (5)$$
The maximal IPD is then computed separately in each frequency channel ω :
$$IPD_{\omega,max} = 2 \pi \omega \, ITD_{max} \quad (6)$$
The above calculations assume a spherical head shape, which is a major simplification. It is, however, sufficient for the sake of the current analysis.
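Equations 4 - 6 collapse into a few lines of code; the head radius and speed of sound below are illustrative assumptions (a human-like 8.75 cm radius and 343 m/s), not the constants used to produce the figures:

```python
import numpy as np

def ipd_max(freq_hz, r_head=0.0875, v_snd=343.0):
    """Maximal IPD (radians) under the spherical-head model.

    ITD_max = R_head * (pi/2 + 1) / v_snd, reached at theta = pi/2;
    the maximal IPD at frequency f is then 2 * pi * f * ITD_max.
    """
    itd_max = r_head * (np.pi / 2 + 1) / v_snd
    return 2 * np.pi * freq_hz * itd_max

# Frequency at which the maximal IPD reaches pi (phase ambiguity onset)
f_pi = np.pi / ipd_max(1.0)
```

With these assumed constants the ambiguity frequency f_pi lands in the 700-800 Hz range, consistent with the ~734 Hz crossing reported for the recordings.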
At low frequencies most IPD values do not exceed the 'forbidden' line, and the resulting plot has a triangular shape. This is a common tendency in IPD distributions, visible across all auditory scenes. Additionally, due to phase wrapping, for frequencies where π ≤ | IPD max | ≤ 2 π the probability mass is shifted away from the center of the unit circle towards the -π and π values, which is visible as blue, circular regions in the middle of the plot. This trend is not present in the forest walk scene, where a clear peak at 0 radians is visible for almost all frequencies. This figure can be compared with figure 3 in [23] and figure 14 in [19]. The two panels below, i.e., figures 6 B and C, display plots of the κ and µ parameters of the von Mises distributions as a function of frequency. The concentration parameter κ decreases in all three scenes from a value close to 1.5 (strong concentration) to below 0.5 in the 200 Hz to 500 Hz interval, which seems to be a robust property in all environments. Afterwards, small rebounds are visible. For auditory scenes recorded by a static subject, i.e., nocturnal nature and city center, rebounds occur at frequencies where IPD max corresponds to multiples of π (this is again an effect of phase wrapping). The κ value is higher for the more static scene - nocturnal nature - reflecting a lower IPD variance. For frequencies above 2 kHz, the concentration converges to 0 in all three scenes. This means that IPD distributions become uniform and monaural phases mutually independent. The frequency dependence of the position parameter µ is visible in figure 6 C. For the forest walk scene, IPD distributions were centered at 0, with an exception at 700 Hz. For the two scenes recorded by a static subject, distribution peaks were roughly aligned along IPD max as long as it did not exceed the -π or π value.
At higher frequencies they varied much more strongly, although one has to note that for distributions close to uniform ( κ → 0) the position of the peak becomes an ill-defined and arbitrary parameter.
Equations 4 - 6 allow computation of the 'maximal' IPD value ( IPD max ), constrained by the size of the organism's head. A single, point sound source in an anechoic environment would never generate an IPD exceeding IPD max . In natural hearing conditions, however, such IPDs occur due to the presence of sound sources at both sides of the head or due to acoustic reflections [21]. Their presence is visible in figure 6 as probability mass lying outside of the black lines marking maximal IPD values at particular frequencies. Figure 7 displays the proportion of IPDs larger than the one defined by the head size, plotted against frequency. The lines corresponding to the three recorded auditory environments lie parallel to each other, displaying almost the same trend up to a vertical shift. The highest proportion of IPDs exceeding the 'maximal' value was present in the nocturnal nature scene. This was most probably caused by a large number of very similar sound sources (grasshoppers) at each side of the head. They generated non-synchronized and strongly overlapping waveforms. Phase information in each ear therefore resulted from an acoustic summation of multiple sources, hence the instantaneous IPD was not directly related to a single source position and often exceeded the IPD max value. Surprisingly, IPDs in the most spatially dynamic scene - city center - did not exceed the IPD max limit as often. This may be due to a smaller number of sound sources present, and may indicate that the proportion of 'forbidden' IPDs is a signature of the number of sound sources present in the scene. For the nocturnal nature and city center scenes the proportion peaked at 400 Hz, reaching values of 0.45 and 0.35 respectively. For the forest walk scene the proportion peaked at 200 Hz and did not exceed the value of 0.31. All proportion curves converged to 0 at the 734 Hz frequency, where IPD max = π .
## Separation of speech with single channel IPDs
As already mentioned, IPD distributions at most frequency channels in the forest walk scene revealed an additional property, namely a clear, sharp peak at 0 radians. This feature was not present in the two other, statically recorded scenes. As an example, the IPD distribution at 561 Hz is depicted in figure 8 A. The histogram structure reflects the elevated presence of sounds with IPDs close to 0, hence equal monaural phase values. Zero IPDs can be generated either by sources located at the midline (directly in front or directly behind) or by self-produced sounds such as speech, breathing or loud footsteps.
As visible in figure 8, two components contributed to the structure of the marginal IPD distribution - the sharp 'peak' component (dashed blue line) and the broad 'background' (dashed red line). Due to this property, IPD histograms were well suited to be modelled by a mixture model. This means that their pdf could be represented as a linear combination of two von Mises distributions in the following way:
$$p(IPD_\omega \mid \kappa_\omega, \mu_\omega) = \sum_{i=1}^{2} p(C_i) \, p(IPD_\omega \mid \kappa_{\omega,i}, \mu_{\omega,i}) \quad (7)$$
where κ ω ∈ R 2 and µ ω ∈ R 2 are parameter vectors, C i ∈ { 1 , 2 } are class labels, p ( C i ) are prior probabilities of class membership and p ( IPD ω | κ ω,i , µ ω,i ) are von Mises distributions defined by equation 3. A fitted mixture of von Mises distributions is also visible in figure 8 A, where dashed lines are mixture components and the continuous black line is the marginal distribution. It is clearly visible that a two-component mixture fits the data much better than a plain von Mises distribution. There is also an additional advantage of fitting such a mixture model, namely that it allows us to perform classification and assign each IPD sample (and therefore each associated sound sample) to one of the two classes defined by the mixture components. Since the prior over class labels is assumed to be uniform, this procedure is equivalent to finding a maximum-likelihood estimate ˆ C of C :
$$\hat{C} = \arg \max_{C} \, p(IPD_\omega \mid C) \quad (8)$$
In this way, if no sound source is present at the midline, a separation of self-generated sounds from the background can easily be performed using information from a single frequency channel. Results of a self-generated speech separation task are displayed in figure 8 B. A two-second binaural sound chunk included two self-spoken words over a background consisting of a flowing stream. Each sample was classified based on the associated IPD value at 561 Hz. Samples belonging to the second, sharp component are coloured blue and background samples red. It can be observed that the algorithm successfully separated the spoken words from the environmental noise. Audio samples are available in the supplementary material.
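The maximum-likelihood classification above reduces to evaluating the two fitted von Mises densities at each IPD sample and taking the larger one. A minimal sketch, with hypothetical component parameters standing in for the values fitted at 561 Hz:

```python
import numpy as np
from scipy.special import i0  # modified Bessel function of order 0

def vonmises_pdf(theta, mu, kappa):
    """Von Mises density with mean mu and concentration kappa."""
    return np.exp(kappa * np.cos(theta - mu)) / (2 * np.pi * i0(kappa))

# Hypothetical mixture components: a sharp 'peak' at 0 rad
# (self-generated sounds) and a broad 'background' component.
components = [(0.0, 20.0),   # (mu, kappa) of the peak component
              (0.0, 0.5)]    # (mu, kappa) of the background

def classify(ipd):
    """Maximum-likelihood class under a uniform prior over classes."""
    lik = np.stack([vonmises_pdf(ipd, m, k) for m, k in components])
    return lik.argmax(axis=0)  # 0 = peak/self-generated, 1 = background

labels = classify(np.array([0.0, 0.1, 2.5, -3.0]))
```

IPDs near 0 fall to the sharp component and large-magnitude IPDs to the background, which is exactly the per-sample assignment behind the speech separation in figure 8 B.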
## Independent components of binaural waveforms
In this section, instead of studying predetermined features of the stimulus (binaural cues), we use binaural waveforms to train Independent Component Analysis (ICA) - a statistical model which optimizes a general-purpose objective: coding efficiency [8]. In the ICA model, short (8.7 ms) epochs of binaural sounds are assumed to be a linear superposition of basis functions multiplied by linear coefficients s (see figure 9 A). Linear coefficients are assumed to be independent and sparse , i.e., close to 0 for most data samples in the training dataset. Basis functions learned by ICA can be interpreted as patterns of correlated variability present in the dataset.
Figure 9 B depicts exemplary basis functions learned from each recording. Each feature consists of two parts, representing the signal in the left and the right ear (black and red colours respectively). Features trained on different recordings vary in their shape. Those differences are explicitly visible in the spectrotemporal representations of basis functions depicted in figure 10. Each shape corresponds to an equiprobability contour of the Wigner distribution associated with a single basis function. Wigner distributions localize the energy of a temporal signal in the time-frequency plane. Left and right ear parts belonging to the same feature are plotted with the same color. The obtained time-frequency tilings reveal a strong dependence on the auditory scene. First, basis function shapes differ - from temporally extended and frequency-localized in the city center scene to temporally brief, instantaneous features in the forest walk scene. Despite these shape differences, in each case the basis functions tile the time-frequency plane uniformly. Their shapes constitute an interesting aspect of the auditory scene and can be compared with results obtained by [1, 32]. This is, however, not the focus of the current work.
Sounds of the most spatially static scene - nocturnal nature - were modelled mostly by features with the same spectrotemporal properties in each ear (with an anomaly which occurred around 3.5 kHz). This is visible in figure 10 - blobs of the same color lie mostly in the same region in the left and the right ear plots. In more dynamic scenes, independent components (ICs) captured different, non-trivial dependencies. Pure frequency features learned from the city center recording had similar monaural parts below 3.5 kHz. Above this threshold, a cross-frequency interaural coupling appeared - in the right ear panel, blue colored features lie in the high frequency regime, while in the left ear they occupy a low frequency region. This means that to represent the natural binaural signal efficiently, monaural information from different frequencies should be processed simultaneously. Interaural dependencies represented by ICs of the forest walk scene were even more complex. Since most of the basis functions were much more localized in time than in frequency, temporal dependencies were also captured in addition to the spectral ones. High frequency events in the right ear were coupled with more temporally extended, low-frequency features of the left ear. Interestingly, the tiling of the time-frequency plane associated with the right ear was not as uniform as for the left one.
The majority of the learned basis functions were highly localized in frequency, which agrees with results obtained by [1, 32, 48]. However, some basis functions did not have well localized spectra. They were excluded from the analysis, which is why the number of basis functions varies across the analyzed auditory scenes. See materials and methods for a detailed discussion. To understand how spectral power was distributed in the monaural parts of ICs, we computed a peak power ratio (PPR):
$$PPR = 10 \log_{10} \left( \frac{A_{max,L}}{A_{max,R}} \right) \quad (9)$$
where A max,L and A max,R are the maximal spectrum values of the left and right ear parts of each IC respectively. Each circle in figure 11 represents a single IC. Its vertical and horizontal coordinates are the monaural peak frequencies, and colors encode the PPR value. Features which lie along the diagonal can be considered a representation of 'classical' ILDs, since they encode features of the same frequency in each ear and differ only in level. ICs lying away from the diagonal with high absolute PPR values represent more monaural information, and those with low absolute PPR other aspects of the stimulus, such as interaural cross-frequency couplings. Figure 12 depicts the proportion of features with the same monaural frequencies (on the diagonal) and of those which bind different frequency channels (off the diagonal). A pronounced difference among auditory scenes is visible in figure 11. The majority of basis functions learned from the nocturnal nature scene (161) cluster close to the diagonal. The basis function set trained on the most dynamic scene (city center) separates into three clear subpopulations. Two of them, including 140 features, were monaural. Monaural basis functions were dominated mostly by the spectrum of a single ear part, and the part representing the contralateral ear was of a very low frequency, close to a DC component. The binaural subpopulation contained 111 basis functions perfectly aligned with the diagonal. Such a separation suggests that waveforms in both ears were highly independent and should be modelled using a large set of separate, monaural events. ICA trained on the forest walk scene yielded a set of basis functions which was a compromise between the nocturnal nature and city center scenes. Even though the highest number of features - 165 - lay off the diagonal, the separation was not as sharp as for the city center scene. What clearly appeared was a division into two subpopulations, members of which were dominated by the spectrum of one of the ears. ICs mostly coupled low frequencies ( < 2 kHz) from one ear with a broad range of frequencies in the other. These properties may imply that in the case of this scene both features modelling binaural dependencies and features capturing purely monaural events were required to model the data. To allow further comparison of the learned ICs with known coding mechanisms in the binaural auditory system, we computed ILD and IPD cue values. This was done only for features encoding the same frequency information in both ears, since phase differences are ill defined otherwise, and the auditory brainstem extracts cues mostly from the same frequency channels [21]. Results are visible in figure 13. IPDs represented by independent components separated into two channels in the city center and forest walk scenes. The range of IPDs was higher for the more spatially varying scene, which is visible as a strong scatter of points.
For the nocturnal nature scene no such separation is visible. This is perhaps due to the fact that object positions were mostly fixed, generating slowly varying IPDs captured by the learned ICs; the model therefore did not have to generalize over a broader range of IPDs. ILDs, in turn, were separated into two distinct channels in all scenes. The separation strength correlated with the scene's spatial variability: it was highest for the city center scene and lowest for the nocturnal nature scene. Interestingly, in the latter, ILD features were present also at high frequencies, which was not the case in the other two scenes. Here too, the separation of features seems to reflect the spatial structure and dynamics of the auditory scene.
## Discussion
Binaural cues are usually studied in relation to the angular position of the generating stimulus [18, 17, 24]. In probabilistic terms this corresponds to modelling the conditional probability distribution p ( cue | θ ), where θ is the angular stimulus location. According to Bayes' theorem, position inference given the cue can then be performed by: (a) computing the posterior distribution p ( θ | cue ) and (b) identifying θ, for instance, as the maximum of the posterior distribution. Formally this process can be described by the following equations:
$$p ( \theta | cue ) = \frac { p ( cue | \theta ) \, p ( \theta ) } { p ( cue ) }$$
$$\hat { \theta } = \arg \max _ { \theta } p ( \theta | cue )$$
where θ̂ is the estimated position. The Bayesian approach to sound localization has been successfully applied before, for instance to predict behavior and neural representations of binaural cues in the barn owl [18].
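The two-step scheme above can be sketched on a discrete angle grid. The Gaussian likelihood, the linear cue-to-angle mapping, and all parameter values below are illustrative assumptions, not quantities estimated from the recordings:

```python
import numpy as np

def map_angle(cue, angles, mean_cue, prior, sigma=2.0):
    """Return the argmax of p(theta | cue) over a discrete angle grid."""
    likelihood = np.exp(-0.5 * ((cue - mean_cue) / sigma) ** 2)  # p(cue | theta)
    posterior = likelihood * prior        # proportional to p(theta | cue)
    posterior /= posterior.sum()          # normalization plays the role of p(cue)
    return angles[np.argmax(posterior)]

angles = np.linspace(-90, 90, 181)           # azimuth grid, degrees
mean_cue = 0.1 * angles                      # assumed mean ILD (dB) per angle
prior = np.ones_like(angles) / angles.size   # flat prior p(theta)

theta_hat = map_angle(3.0, angles, mean_cue, prior)  # likelihood peaks at 30 deg
```

Because only the argmax of the posterior is needed, dividing by the evidence is optional here; it is kept to make the correspondence with the equations explicit.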
In the present study, we focused on marginal distributions of cues and binaural waveforms. This approach allows us to understand aspects of binaural hearing in the natural environment which are not directly related to the sound localization task. Marginal distributions p ( cue ) describe global properties of the stimulus to which the nervous system is exposed under natural conditions. Knowledge of the typical stimulus structure allows one to predict properties of sensory neurons [46, 5] and helps in understanding the complexity of tasks such as binaural auditory scene analysis when performed under ecological conditions.
## Binaural cues in complex auditory environments
Binaural scenes recorded and studied in this paper were selected to represent broad groups of possible auditory environments of different acoustic and spatial properties. In all three cases,
waveforms in each ear were, for most of the time, an acoustic summation of multiple sound sources. Additional factors influencing the monaural stimuli were the motion trajectories of objects and of the listener, as well as sound reflections. Instantaneous binaural cue values were therefore not generated by a single point source, but were a function of a complex auditory scene. Inferring a sound position from a cue value becomes, in such a setting, an ill-posed inverse problem, since multiple scene configurations can give rise to the same cue value (for instance, an ILD equal to 0 can be generated by a single source located at the midline, or by two identical sources symmetrically located on both sides of the head, see section ). In such scenarios, the sound localization task cannot be performed as a simple inversion of a cue value to a sound position (the simplest case, described by equations 10 and 11). It rather becomes equivalent to the cocktail party problem [36]. Localization of a sound source in complex listening situations has been a subject of substantial psychophysical [9] and electrophysiological [51, 29, 15, 6] research. An interesting theoretical model has been suggested by Faller and Merimaa [16]. The authors proposed that, to localize one sound source out of many, the auditory system could use instantaneous binaural cues only in time intervals when the left and right ear waveforms are highly coherent (i.e. their cross-correlation peak exceeds a certain threshold). In such brief moments, ILD and IPD values correspond to only a single source. This mechanism is able to explain numerous psychophysical findings. Meffin and Grothe [38] hypothesized that the auditory brainstem may perform low-pass filtering of localization cues to reject rapidly fluctuating 'spurious' cue values, which may originate from multiple sources.
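The Faller and Merimaa selection rule can be sketched as follows. For simplicity, the sketch uses zero-lag normalized correlation as the coherence measure rather than the full cross-correlation peak, and the window length and threshold are arbitrary illustrative choices:

```python
import numpy as np

def coherent_windows(left, right, win=256, threshold=0.9):
    """Indices of windows whose interaural coherence exceeds the threshold."""
    selected = []
    for i in range(0, len(left) - win + 1, win):
        l, r = left[i:i + win], right[i:i + win]
        denom = np.sqrt(np.sum(l * l) * np.sum(r * r))
        if denom > 0 and abs(np.sum(l * r)) / denom > threshold:
            selected.append(i)
    return selected

t = np.arange(2048) / 22050.0
tone = np.sin(2 * np.pi * 500 * t)
noise = np.random.default_rng(0).standard_normal(2048)
coherent = coherent_windows(tone, tone)    # identical ears: every window passes
diffuse = coherent_windows(tone, noise)    # unrelated signals: almost none pass
```

Cues would then be read out only at the selected window positions, where they are likely to reflect a single dominant source.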
The aforementioned mechanisms, however, involve a rejection of a large amount of information by discarding 'ambiguous' cues, which may still contain information useful in auditory scene parsing. In very general terms, a useful strategy for the auditory system would be to use higher-dimensional stimulus features (such as temporal cue sequences or cross-frequency cue dependencies) to separate a source (or sources) of interest from the background and infer its spatial configuration. It has been demonstrated, for instance, that neurons in the Inferior Colliculus of the rat show a stronger response to dynamic, 'ecologically valid' IPD sequences than to constant IPDs [49, 50]. In the auditory cortex of macaque monkeys, neurons become sensitive to even more complex IPD sequences [34]. Such properties may be examples of tuning to high-dimensional, binaural stimulus aspects. In this view, instantaneous binaural cues, as extracted by the early brainstem nuclei LSO and MSO [21], provide information useful in the auditory scene analysis task. In natural conditions, however, their mere identification is not necessarily equivalent to the localization of the sound position. Binaural cues may rather serve as inputs to further computations (not necessarily limited to sound localization per se) performed at the higher stages of the binaural auditory pathway.
## Implications for neural processing and representation of binaural sounds
As predicted by the physics of sound propagation, monaural phases in natural environments are statistically dependent, which is revealed by the distribution of their difference. The strength of the dependence is measured by κ, the concentration parameter of the von Mises IPD distribution [12]. Interestingly, humans stop using IPDs to localize sounds above 1 . 5 kHz [54], i.e. in the frequency regime where monaural phases become marginally independent (as reflected by the decay of the κ parameter).
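A minimal sketch of how κ can be estimated from an IPD sample, via the mean resultant length and the standard piecewise approximation to the inverse of the Bessel-function ratio; this textbook estimator is an assumption here, not necessarily the fitting procedure used in the paper:

```python
import numpy as np

def estimate_kappa(ipds):
    """Von Mises concentration from the mean resultant length R (standard
    piecewise approximation to the inverse of I1/I0)."""
    R = abs(np.mean(np.exp(1j * np.asarray(ipds))))
    if R < 0.53:
        return 2 * R + R ** 3 + 5 * R ** 5 / 6
    if R < 0.85:
        return -0.4 + 1.39 * R + 0.43 / (1 - R)
    return 1 / (R ** 3 - 4 * R ** 2 + 3 * R)

rng = np.random.default_rng(1)
concentrated = rng.vonmises(mu=0.0, kappa=5.0, size=20000)  # coupled phases
independent = rng.uniform(-np.pi, np.pi, size=20000)        # independent phases
k_hi = estimate_kappa(concentrated)   # close to the true kappa = 5
k_lo = estimate_kappa(independent)    # close to 0: no phase coupling
```

A κ near zero thus corresponds exactly to the marginally independent high-frequency regime described above.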
In anechoic environments, point sources of sound generate ITD values which are constrained by the head size of the listener. It has been observed, however, that in many species IPD-sensitive neurons have the peaks of their tuning curves located outside of this 'physiological' range [21]. This representational strategy has been explained by suggesting that in mammals IPDs are encoded by the activity of two separate, broadly tuned neural channels. Notably, such a representation emerges as a consequence of maximizing Fisher information about naturally occurring IPDs [23].
Here, we demonstrate that in natural hearing conditions a substantial proportion of IPDs (up to 45%) lies outside of the physiological range. Those IPD values may be a result of a reflection [20] or of the presence of multiple, spatially separate, desynchronized sound sources [21]. Sound reflections generate reproducible cues and carry information about the spatial properties of the scene [20]. If a large IPD did not arise from a reflection, then at least two sound sources contributed to the stimulus at the same frequency. Especially in the latter case, IPDs provide not only spatial information useful to identify the position of a sound, but also a strong source separation cue. The proportion of IPDs exceeding the physiological range decreased with growing frequency (since the maximal IPD limit increases). This observation agrees with experimental data showing that in many species, neurons with low best frequencies are tuned to large IPDs which often exceed the physiological range [35, 10, 22, 31]. Taken together, IPDs larger than predicted by the head size occur frequently in natural hearing conditions and carry important information. This can be an additional factor explaining why in mammals the peaks of IPD tuning curves lie outside of the physiological range. As demonstrated in section , interaural phase differences can also be used for hearing tasks that are not explicitly spatial, such as the extraction of self-generated speech (and potentially of other sounds, such as steps). If there is no sound source present at the midline location (corresponding to 0 IPD), a simple classification procedure suffices to identify and separate one's own vocalizations from the background sound using information from a single frequency channel. Differentiation between self-generated sounds and sounds of the environment is a behaviorally relevant task which has to be routinely performed by animals.
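The fraction of IPDs outside the physiological range can be sketched as follows. The maximal ITD uses the Woodworth spherical-head approximation with a 9.5 cm head radius, and the uniform IPD sample is synthetic; both are illustrative assumptions:

```python
import numpy as np

C_SOUND = 343.0   # speed of sound, m/s
R_HEAD = 0.095    # spherical-head radius, m (9.5 cm, as for the recording subject)

def fraction_outside(ipds, freq_hz):
    """Fraction of IPD samples exceeding the maximal physiological IPD."""
    itd_max = R_HEAD * (np.pi / 2 + 1) / C_SOUND         # Woodworth approximation
    ipd_max = min(2 * np.pi * freq_hz * itd_max, np.pi)  # IPD is wrapped to pi
    return float(np.mean(np.abs(np.asarray(ipds)) > ipd_max))

rng = np.random.default_rng(2)
ipds = rng.uniform(-np.pi, np.pi, 10000)   # worst case: fully desynchronized phases
frac_500 = fraction_outside(ipds, 500.0)   # low channel: a sizable fraction
frac_3000 = fraction_outside(ipds, 3000.0) # limit capped at pi: none exceed it
```

The cap at π reproduces the effect noted in the text: as frequency grows, the maximal physiological IPD approaches the wrapping limit and the out-of-range proportion shrinks.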
According to the Duplex Theory, ILDs contribute mostly to the localization of high-frequency sounds, since the head attenuates higher frequencies much more strongly than lower ones [9]. Analysis of human Head Related Transfer Functions (HRTFs) shows that ILDs are almost constant across spatial positions for low frequencies and become more variable (hence more informative) as frequency increases above 4 kHz [30]. For single sound sources in anechoic environments, ILDs can reach values as large as 40 dB [30]. Based on those observations, one could expect natural ILD distributions to be strongly frequency dependent. Somewhat surprisingly, the natural distributions reveal a quite homogeneous structure across different frequency channels, which is well captured by the logistic distribution. Overall, the averages are equal (or very close) to 0 dB in the different auditory environments, and for all studied frequencies. The variance slightly increases with increasing frequency. The homogeneity of the distribution shapes and averages can be explained in the following way. Typical natural auditory scenes consist of similar sound sources on both sides of the head (human speakers, grasshoppers, wind, etc.). Each sound source has a similar spectrum, hence they all contribute to the waveforms in the left and right ears, mutually cancelling each other on average. For this reason, the ILD averages are close to 0 and the distributions have a similar shape in different environments. The variance increase can be explained by the properties of head-related filtering, since small movements of high-frequency sources give rise to a large ILD variability. As mentioned before, interaural level differences are mostly believed to contribute to the localization of high-frequency sounds [9], since only then are they large enough to be easily detectable. It has been demonstrated, however, that sound sources proximal to the listener can generate pronounced ILDs also at low frequencies (below 1 . 5 kHz) [11, 45].
Our results show that in natural environments the auditory system is exposed to a similar ILD distribution across all frequencies, including the low ones. The distribution also includes relatively large values (above 10 dB). Close sound sources and other environmental factors, such as wind perceived in only one ear, generate large low-frequency ILDs. One could therefore speculate that neurons with low best frequencies should also form an ILD representation. Indeed, such neurons have been found in the Lateral Superior Olive (LSO) of the cat [52].
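A moment-based fit of the logistic density to an ILD sample can be sketched using the identity var = (π²/3)·scale²; the sample below is synthetic, not taken from the recordings:

```python
import numpy as np

def fit_logistic(ilds):
    """Location = sample median; scale from var = (pi**2 / 3) * scale**2."""
    ilds = np.asarray(ilds)
    return np.median(ilds), np.sqrt(3 * np.var(ilds)) / np.pi

rng = np.random.default_rng(3)
sample = rng.logistic(loc=0.0, scale=2.0, size=50000)   # synthetic ILDs, dB
loc, scale = fit_logistic(sample)                       # recovers ~ (0.0, 2.0)
```

Fitting each frequency channel separately in this way makes the cross-channel comparison of locations and scales discussed above straightforward.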
To go beyond studying one-dimensional features of the binaural signal (ILDs and IPDs), the probability distribution of short binaural waveforms was modelled by performing Independent
Component Analysis. A similar analysis in the visual domain was performed by Hoyer and Hyvärinen [25] for binocular image pairs. The ICA algorithm identified complex patterns of dependency, different for each studied auditory scene. Interestingly, the spectrotemporal shape of the monaural parts of the basis functions varied strongly across the recorded auditory scenes. The obtained results can be compared with other studies which applied Independent Component Analysis to natural sounds [32, 1, 7]. The linear codes learned by the ICA model show that different representations should be adopted depending on the properties of the acoustic environment. The static scene (nocturnal nature) generated monaural waveforms which were highly redundant, since the signal in each ear originated mostly from the same sources. For this reason, the majority of basis functions represented amplitude fluctuations in both ears within the same frequency channels. When sound sources moved rapidly and independently of each other on both sides of the head, the waveforms in each ear were much less redundant. That is why the representation of the dynamic binaural scene (city center) consisted of three clearly separate populations of basis functions: two representing the monaural signals, and one binaural. Interestingly, the binaural functions coupled monaural channels of the same frequency. The moderately dynamic scene (forest walk) was best represented by basis functions which were mostly monaural and modelled a broad range of binaural cross-frequency dependencies. A variety of dependency forms was captured, including temporal, spectral and spectrotemporal ones. This implies that the information present in the binaural signal goes beyond instantaneous binaural cue values. This notion is in line with studies which have found and characterized spectrotemporal binaural neurons at the higher stages of the auditory pathway [41, 39].
Binaural hearing in the natural environment may thus also rely on a comparison of spectrotemporal information from both sides of the head.
## Conclusions
In the present study, we analyzed marginal statistics of binaural cues and waveforms. Thereby, we provided a general statistical characterization of the stimulus processed by the binaural auditory system in natural listening conditions. We have also made the natural binaural recordings available for use by other researchers in the field. In a broad perspective, this study contributes to the lines of research that attempt to explain properties of the auditory system by analyzing natural stimulus structures. Further understanding of binaural hearing mechanisms will require a more systematic analysis of higher-order stimulus statistics. This is the subject of future research.
## Materials and Methods
## Recorded scenes
The main goal of the study was to analyze cue distributions in different auditory environments. To this end, three auditory scenes of different spatial dynamics and acoustic properties were recorded. Each of the recordings lasted 12 minutes.
1. Nocturnal nature - the recording subject sat in a randomly selected position in a garden during a summer evening. During the recording the subject kept his head still, looking ahead, with his chin parallel to the ground. The dominant background sounds are grasshopper calls; other acoustic events included sounds of a distant storm and a few cars passing by on a nearby road. The spatial configuration of this scene did not change much over time - it was almost static.
2. City center - the recording subject sat in a tourist area of the old town, keeping the head fixed as in the previous case. During the recording, many moving and static human speakers were present. In contrast to the previous example, the spatial configuration of the scene varied continuously.
3. Forest walk - this recording was performed by a subject moving freely in a wooded area. A second speaker was present, engaged in a free conversation with the recording subject. In addition to speech, this scene included environmental sounds such as flowing water, cracking sticks, crunching leaves, wind, etc. The binaural signal was affected not only by the spatial configuration of the scene, but also by the head and body motion patterns of the recording subject.
Two of the analyzed auditory scenes (nocturnal nature and city center) were recorded by a non-moving subject; their sound statistics were therefore unaffected by the listener's motion patterns and self-generated sounds. In the third scene (forest walk), the subject was moving freely and speaking occasionally. The scene recordings are available in the supplementary material.
## Binaural recordings
Recordings were performed using Soundman OKM-II binaural microphones which were placed in the left and the right ear channels of the recording subject. A Soundman DR2 recorder was used to simultaneously record sound in both channels in an uncompressed wave format at 44100 Hz sampling rate. The head circumference of the recording subject was equal to 60 cm. Assuming a spherical head model this corresponds to a 9 . 5 cm head radius.
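As a quick arithmetic check of the numbers above (the Woodworth approximation for the maximal ITD is an additional assumption, not stated in the text):

```python
import numpy as np

# Head circumference of 60 cm -> spherical-head radius of roughly 9.5 cm, and
# the corresponding maximal ITD under the Woodworth approximation (source at
# 90 degrees azimuth), with the speed of sound taken as 343 m/s.
circumference = 0.60                        # m
radius = circumference / (2 * np.pi)        # ~0.0955 m, i.e. ~9.5 cm
itd_max = radius * (np.pi / 2 + 1) / 343.0  # ~0.7 ms
```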
## Frequency filtering and cue extraction
Prior to the analysis, the raw recordings were downsampled to a 22050 Hz sampling rate. The filtering and cue extraction pipeline is schematically depicted in figure 1.
To emulate the spectral decomposition of the signal performed by the cochlea, sound waveforms from each ear were transformed using a filterbank of 64 linear gammatone filters. Filter center frequencies were linearly spaced between 200 and 3000 Hz for the IPD analysis, and between 200 and 10000 Hz for the ILD analysis.
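A single gammatone impulse response can be sketched as follows; the ERB-based bandwidth is a common convention and an assumption here, since the text specifies only the linear spacing of center frequencies:

```python
import numpy as np

def gammatone_ir(fc, fs=22050, order=4, dur=0.05):
    """Impulse response of a single gammatone filter at center frequency fc."""
    t = np.arange(int(dur * fs)) / fs
    erb = 24.7 * (4.37 * fc / 1000 + 1)   # equivalent rectangular bandwidth, Hz
    b = 1.019 * erb                       # common bandwidth scaling factor
    ir = t ** (order - 1) * np.exp(-2 * np.pi * b * t) * np.cos(2 * np.pi * fc * t)
    return ir / np.max(np.abs(ir))        # normalize the peak to 1

ir = gammatone_ir(1000.0)                 # one of 64 channels, e.g. 1 kHz
```

Convolving each ear's waveform with 64 such impulse responses yields the channelized signals that enter the Hilbert transform stage.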
A Hilbert transform was then applied in each frequency channel. As a result, the instantaneous phase φ_{L,R}(ω, t) and amplitude A_{L,R}(ω, t) were extracted, separating level and phase information. Instantaneous binaural cue values were computed in corresponding frequency channels ω of both ears according to the following equations:
$$ILD ( \omega , t ) = 10 \times \log _ { 10 } \frac { A _ { R } ( \omega , t ) } { A _ { L } ( \omega , t ) }$$
$$IPD ( \omega , t ) = \phi _ { L } ( \omega , t ) - \phi _ { R } ( \omega , t )$$
IPDs with an absolute value exceeding π were wrapped to the [ -π, π ] interval. The time series of IPD and ILD cues obtained in this way in each frequency channel were subjected to further analysis.
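The extraction of instantaneous ILD and IPD in one frequency channel can be sketched with an FFT-based Hilbert transform (numpy only; the test signals below are synthetic, not channel outputs from the recordings):

```python
import numpy as np

def analytic(x):
    """Analytic signal of a real, even-length vector (FFT-based Hilbert)."""
    n = len(x)
    spectrum = np.fft.fft(x)
    h = np.zeros(n)
    h[0] = h[n // 2] = 1.0   # keep DC and Nyquist
    h[1:n // 2] = 2.0        # double the positive frequencies
    return np.fft.ifft(spectrum * h)

def cues(left, right):
    al, ar = analytic(left), analytic(right)
    ild = 10 * np.log10(np.abs(ar) / np.abs(al))     # level difference, dB
    ipd = np.angle(al) - np.angle(ar)                # phase difference, rad
    ipd = np.mod(ipd + np.pi, 2 * np.pi) - np.pi     # wrap to [-pi, pi]
    return ild, ipd

fs, f = 22050, 500.0
t = np.arange(1024) / fs
left = np.sin(2 * np.pi * f * t)
right = 0.5 * np.sin(2 * np.pi * f * t - 0.3)        # quieter, phase-lagged ear
ild, ipd = cues(left, right)
ild_mid = float(np.median(ild[200:800]))             # ~ -3 dB away from edges
ipd_mid = float(np.median(ipd[200:800]))             # ~ 0.3 rad
```

Away from the window edges, the extracted cues match the constructed level ratio (10·log10(0.5) ≈ -3 dB) and phase offset (0.3 rad).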
## Independent Component Analysis
Independent Component Analysis (ICA) is a family of algorithms which attempt to find a linear transformation of the data that minimizes redundancy [28]. Given the data matrix X ∈ R n × m
(where n is the number of data dimensions and m the number of samples), ICA finds a filter matrix W ∈ R n × n with
$$W X = S$$
where the columns of X are data vectors x ∈ R n , the rows of W are linear filters w ∈ R n and S ∈ R n × m is a matrix of latent coefficients, which according to the assumptions are marginally independent. Equivalently the model can be defined using a basis function matrix A = W -1 , such that:
$$X = A S$$
The columns a ∈ R n of the matrix A are called basis functions. In the modelling of neural systems they are usually interpreted as linear receptive fields forming an efficient code of the training data ensemble [28]. Each data vector can be represented as a linear combination of the basis functions a , multiplied by the linear coefficients s , according to equation 16.
$$x ( t ) = \sum _ { i } s _ { i } a _ { i } ( t )$$
where t indexes the data dimensions. The set of basis functions a is called a dictionary. ICA attempts to learn a linear, maximally non-redundant code, hence the latent coefficients s are assumed to be statistically independent i.e.
$$p ( s ) = \prod _ { i = 1 } ^ { n } p ( s _ { i } )$$
The marginal probability distributions p ( s i ) are typically assumed to be sparse (i.e. of high kurtosis), since natural sounds and images have an intrinsically sparse structure [40] and can be represented as a combination of a small number of primitives. In the current work we assumed a logistic distribution of the form:
$$p ( s _ { i } | \mu , \xi ) = \frac { e ^ { - ( s _ { i } - \mu ) / \xi } } { \xi \left ( 1 + e ^ { - ( s _ { i } - \mu ) / \xi } \right ) ^ { 2 } }$$
with position µ = 0 and the scale parameter ξ = 1. Basis functions were learned by maximizing the log-likelihood of the model via gradient ascent [28].
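The learning step can be sketched as natural-gradient maximum likelihood with the logistic prior, for which the score function d/ds log p(s) equals -tanh(s/2). The synthetic Laplacian sources, the mixing matrix and all hyperparameters below are illustrative assumptions:

```python
import numpy as np

def ica_logistic(X, lr=0.05, iters=500, seed=0):
    """Natural-gradient ML ICA with a logistic source prior."""
    n, m = X.shape
    W = np.eye(n) + 0.01 * np.random.default_rng(seed).standard_normal((n, n))
    for _ in range(iters):
        S = W @ X
        # W <- W + lr * (I + score(S) S^T / m) W, with score(s) = -tanh(s/2)
        W += lr * (np.eye(n) - np.tanh(S / 2) @ S.T / m) @ W
    return W

# Two super-Gaussian (Laplacian) sources, linearly mixed, centered and whitened.
rng = np.random.default_rng(4)
S_true = rng.laplace(size=(2, 20000))
X = np.array([[1.0, 0.6], [0.4, 1.0]]) @ S_true
X -= X.mean(axis=1, keepdims=True)
evals, evecs = np.linalg.eigh(np.cov(X))
X_white = evecs @ np.diag(evals ** -0.5) @ evecs.T @ X

W = ica_logistic(X_white)
S_est = W @ X_white
# Each recovered source should match one true source up to permutation and sign.
C = np.abs(np.corrcoef(np.vstack([S_est, S_true]))[:2, 2:])
match = float(C.max(axis=1).min())
```

The same update, applied to whitened 128-sample binaural windows instead of the toy mixtures, yields the basis function dictionaries analyzed in the Results.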
Prior to ICA learning, the recordings were downsampled to a 14700 Hz sampling rate (to allow an easy comparison with the results of [32]). A training dataset was created by randomly drawing 100000 intervals, each 128 samples (8 . 7 ms) long.
## Acknowledgments
This work was funded by the DFG graduate college InterNeuro.
## References
- [1] Samer A Abdallah and Mark D Plumbley. If the independent components of natural images are edges, what are the independent components of natural sounds? In Proceedings of the International Conference on Independent Component Analysis and Signal Separation (ICA2001) , pages 534-539, 2001.
- [2] H Attias and CE Schreiner. Temporal low-order statistics of natural sounds. Advances in neural information processing systems , pages 27-33, 1997.
- [3] Fred Attneave. Some informational aspects of visual perception. Psychological review , 61(3):183, 1954.
- [4] Horace B Barlow. Possible principles underlying the transformation of sensory messages. Sensory communication , pages 217-234, 1961.
- [5] Horace B Barlow. Unsupervised learning. Neural computation , 1(3):295-311, 1989.
- [6] Caitlin S Baxter, Brian S Nelson, and Terry T Takahashi. The role of envelope shape in the localization of multiple sound sources and echoes in the barn owl. Journal of neurophysiology , 109(4):924-931, 2013.
- [7] Anthony J Bell and Terrence J Sejnowski. Learning the higher-order structure of a natural sound. Network: Computation in Neural Systems , 7(2):261-266, 1996.
- [8] Anthony J Bell and Terrence J Sejnowski. The independent components of natural scenes are edge filters. Vision research , 37(23):3327-3338, 1997.
- [9] Jens Blauert. Spatial hearing: the psychophysics of human sound localization . MIT press, 1997.
- [10] Antje Brand, Oliver Behrend, Torsten Marquardt, David McAlpine, and Benedikt Grothe. Precise inhibition is essential for microsecond interaural time difference coding. Nature , 417(6888):543-547, 2002.
- [11] Douglas S Brungart and William M Rabinowitz. Auditory localization of nearby sources. Head-related transfer functions. The Journal of the Acoustical Society of America , 106:1465, 1999.
- [12] Charles F Cadieu and Kilian Koepsell. Phase coupling estimation from multivariate phase statistics. Neural computation , 22(12):3107-3126, 2010.
- [13] Nicole L Carlson, Vivienne L Ming, and Michael Robert DeWeese. Sparse codes for speech predict spectrotemporal receptive fields in the inferior colliculus. PLoS computational biology , 8(7):e1002594, 2012.
- [14] Johannes C Dahmen, Peter Keating, Fernando R Nodal, Andreas L Schulz, and Andrew J King. Adaptation to stimulus statistics in the perception and neural representation of auditory space. Neuron , 66(6):937-948, 2010.
- [15] Mitchell L Day, Kanthaiah Koka, and Bertrand Delgutte. Neural encoding of sound source location in the presence of a concurrent, spatially separated source. Journal of Neurophysiology , 108(9):2612-2628, 2012.
- [16] Christof Faller and Juha Merimaa. Source localization in complex listening situations: Selection of binaural cues based on interaural coherence. The Journal of the Acoustical Society of America , 116:3075, 2004.
- [17] Brian J Fischer. Optimal models of sound localization by barn owls. In Advances in Neural Information Processing Systems , pages 449-456, 2007.
- [18] Brian J Fischer and José Luis Peña. Owl's behavior and neural representation predicted by Bayesian inference. Nature neuroscience , 14(8):1061-1066, 2011.
- [19] Dan FM Goodman and Romain Brette. Spike-timing-based computation in sound localization. PLoS computational biology , 6(11):e1000993, 2010.
- [20] Boris Gourévitch and Romain Brette. The impact of early reflections on binaural cues. The Journal of the Acoustical Society of America , 132:9, 2012.
- [21] Benedikt Grothe, Michael Pecka, and David McAlpine. Mechanisms of sound localization in mammals. Physiological Reviews , 90(3):983-1012, 2010.
- [22] Kenneth E Hancock and Bertrand Delgutte. A physiologically based model of interaural time difference discrimination. The Journal of neuroscience , 24(32):7110-7117, 2004.
- [23] Nicol S Harper and David McAlpine. Optimal neural population coding of an auditory spatial cue. Nature , 430(7000):682-686, 2004.
- [24] Paul M Hofman and A John Van Opstal. Bayesian reconstruction of sound localization cues from responses to random spectra. Biological cybernetics , 86(4):305-316, 2002.
- [25] Patrik O Hoyer and Aapo Hyvärinen. Independent component analysis applied to feature extraction from colour and stereo images. Network: Computation in Neural Systems , 11(3):191-210, 2000.
- [26] Anne Hsu, Sarah MN Woolley, Thane E Fremouw, and Frédéric E Theunissen. Modulation power and phase spectrum of natural sounds enhance neural encoding performed by single auditory neurons. The Journal of neuroscience , 24(41):9201-9211, 2004.
- [27] Aapo Hyvärinen, Jarmo Hurri, and Patrik O Hoyer. Natural Image Statistics , volume 39. Springer, 2009.
- [28] Aapo Hyvärinen, Jarmo Hurri, and Patrik O Hoyer. Natural Image Statistics , volume 39. Springer, 2009.
- [29] Clifford H Keller and Terry T Takahashi. Localization and identification of concurrent sounds in the owl's auditory space map. The Journal of neuroscience , 25(45):10446-10461, 2005.
- [30] Andrew J King, Jan WH Schnupp, and Timothy P Doubell. The shape of ears to come: dynamic coding of auditory space. Trends in cognitive sciences , 5(6):261-270, 2001.
- [31] Shigeyuki Kuwada and Tom CT Yin. Binaural interaction in low-frequency neurons in inferior colliculus of the cat. I. Effects of long interaural delays, intensity, and repetition rate on interaural delay function. Journal of Neurophysiology , 50(4):981-999, 1983.
- [32] Michael S Lewicki. Efficient coding of natural sounds. Nature neuroscience , 5(4):356-363, 2002.
- [33] Julia K Maier, Phillipp Hehrmann, Nicol S Harper, Georg M Klump, Daniel Pressnitzer, and David McAlpine. Adaptive coding is constrained to midline locations in a spatial listening task. Journal of Neurophysiology , 108(7):1856-1868, 2012.
- [34] Brian J Malone, Brian H Scott, and Malcolm N Semple. Context-dependent adaptive coding of interaural phase disparity in the auditory cortex of awake macaques. The Journal of neuroscience , 22(11):4625-4638, 2002.
- [35] David McAlpine, Dan Jiang, and Alan R Palmer. A neural code for low-frequency sound localization in mammals. Nature neuroscience , 4(4):396-401, 2001.
- [36] Josh H McDermott. The cocktail party problem. Current Biology , 19(22):R1024-R1027, 2009.
- [37] Josh H McDermott and Eero P Simoncelli. Sound texture perception via statistics of the auditory periphery: evidence from sound synthesis. Neuron , 71(5):926-940, 2011.
- [38] Hamish Meffin and Benedikt Grothe. Selective filtering to spurious localization cues in the mammalian auditory brainstem. The Journal of the Acoustical Society of America , 126:2437, 2009.
- [39] Lee M Miller, Monty A Escabí, Heather L Read, and Christoph E Schreiner. Spectrotemporal receptive fields in the lemniscal auditory thalamus and cortex. Journal of Neurophysiology , 87(1):516-527, 2002.
- [40] Bruno A Olshausen and David J Field. Sparse coding with an overcomplete basis set: A strategy employed by V1? Vision research , 37(23):3311-3325, 1997.
- [41] Anqi Qiu, Christoph E Schreiner, and Monty A Escabí. Gabor analysis of auditory midbrain receptive fields: spectro-temporal and binaural composition. Journal of Neurophysiology , 90(1):456-476, 2003.
- [42] Lord Rayleigh. On our perception of the direction of a source of sound. Proceedings of the Musical Association , 2:75-84, 1875.
- [43] F Rieke, DA Bodnar, and W Bialek. Naturalistic stimuli increase the rate and efficiency of information transmission by primary auditory afferents. Proceedings of the Royal Society of London. Series B: Biological Sciences , 262(1365):259-265, 1995.
- [44] Fred Rieke, David Warland, Rob de Ruyter van Steveninck, and William Bialek. Spikes: exploring the neural code. MIT Press, 1999.
- [45] Barbara G Shinn-Cunningham, Scott Santarelli, and Norbert Kopco. Tori of confusion: Binaural localization cues for sources within reach of a listener. The Journal of the Acoustical Society of America , 107:1627, 2000.
- [46] Eero P Simoncelli and Bruno A Olshausen. Natural image statistics and neural representation. Annual review of neuroscience , 24(1):1193-1216, 2001.
- [47] Nandini C Singh and Frédéric E Theunissen. Modulation spectra of natural sounds and ethological theories of auditory processing. The Journal of the Acoustical Society of America , 114:3394, 2003.
- [48] Evan C Smith and Michael S Lewicki. Efficient auditory coding. Nature , 439(7079):978-982, 2006.
- [49] Matthew W Spitzer and Malcolm N Semple. Interaural phase coding in auditory midbrain: influence of dynamic stimulus features. Science , 254(5032):721-724, 1991.
- [50] Matthew W Spitzer and Malcolm N Semple. Transformation of binaural response properties in the ascending auditory pathway: influence of time-varying interaural phase disparity. Journal of neurophysiology , 80(6):3062-3076, 1998.
- [51] Terry T Takahashi and Clifford H Keller. Representation of multiple sound sources in the owl's auditory space map. The Journal of neuroscience , 14(8):4780-4793, 1994.
- [52] Daniel J Tollin and Tom CT Yin. Interaural phase and level difference sensitivity in low-frequency neurons in the lateral superior olive. The Journal of neuroscience , 25(46):10648-10657, 2005.
- [53] Richard F Voss and John Clarke. '1/f noise' in music and speech. Nature , 258:317-318, 1975.
- [54] Frederic L Wightman and Doris J Kistler. Sound localization. In Human psychophysics , pages 155-192. Springer, 1993.
## Figures
Figure 1: Preprocessing and cue extraction pipeline
<details>
<summary>Image 1 Details</summary>

### Visual Description
**Panel A:** Processing pipeline: the left and right ear sounds each pass through a γ-tone filterbank and a Hilbert transform, which splits every frequency channel into amplitude and phase; the amplitudes feed the ILD statistics and the phases feed the IPD statistics.
**Panels B and C:** Gammatone filter responses (filter response [dB] vs. frequency [kHz]) for the ILD filterbank (center frequencies up to 10 kHz) and the IPD filterbank (center frequencies up to 3 kHz).
The ILD chart shows how the system filters different frequencies to extract level differences, which are particularly important for high-frequency sounds. The IPD chart shows how the system filters different frequencies to extract phase differences, which are particularly important for low-frequency sounds.
The diversity of filter responses suggests that the auditory system is capable of representing a wide range of sound localization cues. The frequency-dependent behavior of the filter responses indicates that the system prioritizes different cues at different frequencies. The attenuation of signals at higher frequencies may be related to the limitations of the auditory system or to the characteristics of natural sounds.
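The processing flow in Part A can be sketched in a few lines. This is a minimal illustration, not the authors' code: a Butterworth band-pass stands in for a single gammatone channel, and the sampling rate, band edges, and tone parameters are assumed values chosen for the example.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def band_ild_ipd(left, right, fs, f_lo, f_hi):
    """Per-sample ILD [dB] and wrapped IPD [rad] for one frequency band."""
    sos = butter(4, [f_lo, f_hi], btype="bandpass", fs=fs, output="sos")
    al = hilbert(sosfiltfilt(sos, left))    # analytic signal, left ear
    ar = hilbert(sosfiltfilt(sos, right))   # analytic signal, right ear
    amp_l, amp_r = np.abs(al), np.abs(ar)   # Hilbert envelopes
    ild = 20.0 * np.log10((amp_l + 1e-12) / (amp_r + 1e-12))
    ipd = np.angle(al * np.conj(ar))        # phase difference, wrapped to (-pi, pi]
    return ild, ipd

# toy check: a 500 Hz tone delayed by 0.5 ms in the right ear
fs = 16000
t = np.arange(fs) / fs
delay = 5e-4
left = np.sin(2 * np.pi * 500.0 * t)
right = np.sin(2 * np.pi * 500.0 * (t - delay))
ild, ipd = band_ild_ipd(left, right, fs, 400.0, 600.0)
# the median IPD should sit near 2*pi*500*delay = pi/2
```

Because the same zero-phase filter is applied to both ears, any interaural delay survives filtering and shows up directly in the wrapped phase difference.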
</details>
Figure 2: Frequency spectra of binaural recordings
<details>
<summary>Image 2 Details</summary>

### Visual Description
## Chart: Sound Power Spectral Density in Different Environments
### Overview
The image presents three line charts, each representing the normalized power spectral density of sound as a function of frequency, measured in the left and right ears. The three environments are "Nocturnal nature", "Forrest walk", and "City center". Each chart displays two lines: one for the left ear (black) and one for the right ear (grey).
### Components/Axes
* **X-axis:** Frequency [kHz], ranging from 0 to 10 kHz.
* **Y-axis:** Normalized power, ranging from 0 to 1.
* **Legend:** Located in the top-left corner of each chart.
* Black line: "left ear"
* Grey line: "right ear"
* **Titles:** Each chart has a title indicating the environment: "Nocturnal nature", "Forrest walk", and "City center", positioned above the respective chart.
### Detailed Analysis or Content Details
**1. Nocturnal Nature:**
* **Trend:** Both lines (left and right ear) show a generally increasing trend from 0 kHz to approximately 8 kHz, followed by a decrease. There's a prominent peak around 8-9 kHz.
* **Data Points (approximate):**
* Left Ear:
* 0 kHz: ~0.1
* 2 kHz: ~0.25
* 4 kHz: ~0.4
* 6 kHz: ~0.6
* 8 kHz: ~0.8
* 10 kHz: ~0.6
* Right Ear:
* 0 kHz: ~0.1
* 2 kHz: ~0.2
* 4 kHz: ~0.35
* 6 kHz: ~0.55
* 8 kHz: ~0.75
* 10 kHz: ~0.55
**2. Forrest Walk:**
* **Trend:** Both lines show a decreasing trend across the entire frequency range. The decrease is more pronounced at lower frequencies.
* **Data Points (approximate):**
* Left Ear:
* 0 kHz: ~0.4
* 2 kHz: ~0.3
* 4 kHz: ~0.2
* 6 kHz: ~0.15
* 8 kHz: ~0.1
* 10 kHz: ~0.05
* Right Ear:
* 0 kHz: ~0.35
* 2 kHz: ~0.25
* 4 kHz: ~0.18
* 6 kHz: ~0.12
* 8 kHz: ~0.08
* 10 kHz: ~0.04
**3. City Center:**
* **Trend:** Both lines show a rapid decrease in normalized power from 0 kHz to approximately 2 kHz, then leveling off at a very low value.
* **Data Points (approximate):**
* Left Ear:
* 0 kHz: ~0.3
* 2 kHz: ~0.1
* 4 kHz: ~0.05
* 6 kHz: ~0.03
* 8 kHz: ~0.02
* 10 kHz: ~0.02
* Right Ear:
* 0 kHz: ~0.25
* 2 kHz: ~0.08
* 4 kHz: ~0.04
* 6 kHz: ~0.02
* 8 kHz: ~0.01
* 10 kHz: ~0.01
### Key Observations
* The "Nocturnal nature" environment exhibits the highest overall sound power, particularly in the higher frequency range (8-10 kHz).
* The "Forrest walk" environment has moderate sound power, decreasing steadily with increasing frequency.
* The "City center" environment has the lowest sound power, with a sharp drop-off at lower frequencies.
* In all three environments, the left and right ear measurements are relatively close, suggesting a symmetrical sound field.
### Interpretation
The charts demonstrate how the soundscape varies significantly depending on the environment. The "Nocturnal nature" environment, likely containing sounds from insects, animals, and wind, has a broader frequency spectrum and higher power, especially at higher frequencies. The "Forrest walk" environment, with sounds like rustling leaves and bird calls, has a more subdued sound profile. The "City center" environment, dominated by traffic and human activity, is characterized by low-frequency sounds that are quickly attenuated, resulting in a very low overall sound power at higher frequencies.
The similarity between the left and right ear measurements suggests that the sound sources are relatively diffuse in all three environments, meaning the sound is coming from multiple directions rather than a single, localized source. This data could be used to assess the acoustic characteristics of different environments and their potential impact on human hearing and well-being. The differences in spectral density could also be used to classify environments based on their acoustic signatures.
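The per-ear spectra in these charts can be reproduced with a standard Welch estimate, normalized to each channel's maximum. A minimal sketch, assuming a placeholder stereo signal and sampling rate rather than the recordings used in the paper:

```python
import numpy as np
from scipy.signal import welch

def normalized_psd(stereo, fs, nperseg=2048):
    """stereo: (n_samples, 2) array -> frequencies plus per-ear PSDs,
    each normalized to its own maximum as in the figure."""
    f, p_left = welch(stereo[:, 0], fs=fs, nperseg=nperseg)
    _, p_right = welch(stereo[:, 1], fs=fs, nperseg=nperseg)
    return f, p_left / p_left.max(), p_right / p_right.max()

# placeholder input: 2 s of white stereo noise at 44.1 kHz
rng = np.random.default_rng(0)
stereo = rng.standard_normal((2 * 44100, 2))
f, pl, pr = normalized_psd(stereo, 44100)
```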
</details>
Figure 3: Binaural amplitude statistics. A) An exemplary plot of the joint amplitude distribution in both ears B) ILD distribution for a fixed channel together with a Gaussian and a logistic fit C) Interaural correlations of amplitudes across frequency channels
<details>
<summary>Image 3 Details</summary>

### Visual Description
## Charts: Auditory Spatialization & Correlation Analysis
### Overview
The image presents three charts (A, B, and C) related to auditory spatialization and correlation. Chart A is a 2D heatmap showing the probability density of log-amplitude differences between the left and right ears. Chart B displays probability density distributions of interaural level differences, fitted with different curves. Chart C shows the left-right ear amplitude correlation as a function of frequency for different environments.
### Components/Axes
**Chart A: Log-Amplitude Heatmap**
* **X-axis:** Log-amplitude left ear [dB], ranging from approximately -14 to 10 dB.
* **Y-axis:** Log-amplitude right ear [dB], ranging from approximately -10 to 10 dB.
* **Color Scale:** Represents probability density, ranging from 0 (dark blue) to 0.16 (red).
* **No explicit legend labels beyond the color scale.**
**Chart B: Interaural Level Difference Distribution**
* **X-axis:** Interaural Level Difference [dB], ranging from approximately -20 to 20 dB.
* **Y-axis:** Probability density, ranging from 0 to 0.12.
* **Legend:**
* Red solid line: Raw data
* Blue dashed line: Logistic fit
* Black dotted line: Normal fit
**Chart C: Left-Right Ear Amplitude Correlation**
* **X-axis:** Frequency [kHz], ranging from 0.2 to 10 kHz (logarithmic scale).
* **Y-axis:** Left-right ear amplitude correlation, ranging from 0 to 0.8.
* **Legend:**
* Black dashed line: Nocturnal nature
* Gray solid line: Forrest walk
* Black solid line: City center
### Detailed Analysis or Content Details
**Chart A: Log-Amplitude Heatmap**
The heatmap shows a concentration of probability density around the center (approximately 0 dB for both left and right ear log-amplitude). The density decreases as you move away from the center in either direction. There's a slight elongation along the diagonal, suggesting a positive correlation between left and right ear amplitudes. The highest density appears to be around (0 dB, 0 dB).
**Chart B: Interaural Level Difference Distribution**
The raw data (red line) is a unimodal distribution, peaking around 0 dB. The Logistic fit (blue dashed line) closely follows the raw data. The Normal fit (black dotted line) is also similar, but slightly broader and less peaked. The peak probability density is approximately 0.11 for all three curves.
**Chart C: Left-Right Ear Amplitude Correlation**
* **Nocturnal nature (dashed line):** Starts at approximately 0.65 at 0.2 kHz, decreases gradually to around 0.3 at 10 kHz, with some fluctuations.
* **Forrest walk (solid gray line):** Starts at approximately 0.55 at 0.2 kHz, decreases to around 0.25 at 10 kHz, with more pronounced fluctuations than the nocturnal nature.
* **City center (solid black line):** Starts at approximately 0.4 at 0.2 kHz, decreases to around 0.15 at 10 kHz, exhibiting the most significant fluctuations.
* All three lines show a general downward trend, indicating a decrease in correlation with increasing frequency. The city center consistently has the lowest correlation across all frequencies.
### Key Observations
* Chart A suggests that equal log-amplitudes in both ears are the most probable scenario.
* Chart B shows that the interaural level difference is approximately normally distributed around 0 dB.
* Chart C demonstrates that the correlation between left and right ear amplitudes decreases with increasing frequency, and this decrease is more pronounced in noisy environments (city center).
* The city center environment exhibits the lowest left-right ear amplitude correlation across all frequencies, indicating a more diffuse sound field.
### Interpretation
The data suggests an analysis of how sound is perceived spatially. Chart A provides insight into the distribution of sound intensity differences between the ears, which is a key cue for sound localization. Chart B quantifies the distribution of interaural level differences, a crucial parameter in auditory spatial perception. Chart C reveals how environmental noise impacts the coherence of sound signals reaching each ear.
The decreasing correlation with frequency in Chart C is likely due to the increased wavelength of lower frequencies, which allows them to diffract around obstacles more easily, resulting in greater similarity between the signals reaching each ear. The lower correlation in the city center is expected, as urban environments are characterized by numerous sound reflections and a more diffuse sound field.
The logistic and normal fits in Chart B suggest that the interaural level difference can be modeled using standard statistical distributions. The close match between the raw data and the fitted curves indicates that these models are reasonably accurate representations of the underlying data. The data collectively demonstrates the complex interplay between sound intensity, frequency, and environmental context in shaping our auditory experience.
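The two fits in panel B can be reproduced with maximum-likelihood estimation. A hedged sketch: the ILD sample below is synthetic (drawn from a logistic with illustrative parameters), standing in for the recorded ILDs, so the heavier-tailed logistic should beat the normal on log-likelihood:

```python
import numpy as np
from scipy import stats

# stand-in ILD sample [dB]; the paper's values come from the recordings
rng = np.random.default_rng(1)
ild = rng.logistic(loc=0.0, scale=2.0, size=20000)

loc_l, scale_l = stats.logistic.fit(ild)   # ML logistic fit
mu_n, sigma_n = stats.norm.fit(ild)        # ML normal fit

# compare the two fits by total log-likelihood
ll_logistic = stats.logistic.logpdf(ild, loc_l, scale_l).sum()
ll_normal = stats.norm.logpdf(ild, mu_n, sigma_n).sum()
```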
</details>
Figure 4: Interaural level difference distributions. A) Histograms plotted as a function of frequency B) Scale parameter σ as a function of frequency C) Location parameter µ as a function of frequency
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Heatmaps and Line Graphs: Spatial Hearing Analysis in Different Environments
### Overview
The image presents a spatial hearing analysis comparing three environments: Nocturnal nature, Forrest walk, and City center. It consists of three heatmaps (A) showing Interaural Level Difference (ILD) probability density as a function of frequency, and two line graphs (B & C) depicting Scale-σ and Location-μ respectively, also as a function of frequency. The line graphs compare the three environments.
### Components/Axes
* **Heatmaps (A):**
* **X-axis:** ILD [dB] (ranging approximately from -10 dB to 10 dB)
* **Y-axis:** Frequency [kHz] (ranging approximately from 0 kHz to 11 kHz)
* **Color Scale:** Prob density (ranging approximately from 0 to 0.16)
* **Environments:** Nocturnal nature, Forrest walk, City center. Each heatmap represents one environment.
* **Dashed White Lines:** Vertical lines are present in each heatmap, positioned at approximately 0 dB ILD.
* **Line Graph B:**
* **X-axis:** Frequency [kHz] (ranging approximately from 0.1 kHz to 10 kHz)
* **Y-axis:** Scale - σ (ranging approximately from 1.5 to 3)
* **Lines:** Nocturnal nature (dashed black), Forrest walk (solid gray), City center (solid black).
* **Shaded Areas:** Light gray shaded areas around each line represent uncertainty.
* **Line Graph C:**
* **X-axis:** Frequency [kHz] (ranging approximately from 0.1 kHz to 10 kHz)
* **Y-axis:** Location - μ (ranging approximately from -4 to 2)
* **Lines:** Nocturnal nature (dashed black), Forrest walk (solid gray), City center (solid black).
* **Shaded Areas:** Light gray shaded areas around each line represent uncertainty.
### Detailed Analysis or Content Details
**Heatmaps (A):**
* **Nocturnal Nature:** The heatmap shows a concentration of probability density around 0 dB ILD and frequencies between 2 kHz and 8 kHz. There is a diagonal band of higher probability density extending from the bottom-left to the top-right.
* **Forrest Walk:** Similar to Nocturnal Nature, the heatmap shows a concentration around 0 dB ILD and frequencies between 2 kHz and 8 kHz. The diagonal band is less pronounced than in Nocturnal Nature.
* **City Center:** The heatmap shows a more diffuse distribution of probability density, with a slight concentration around 0 dB ILD and frequencies between 2 kHz and 6 kHz. The diagonal band is barely visible.
**Line Graph B (Scale - σ):**
* **Nocturnal Nature (dashed black):** Starts at approximately 1.7 at 0.2 kHz, increases to approximately 2.6 at 2 kHz, then decreases slightly to approximately 2.5 at 10 kHz.
* **Forrest Walk (solid gray):** Starts at approximately 1.8 at 0.2 kHz, increases to approximately 2.7 at 2 kHz, then decreases to approximately 2.4 at 10 kHz.
* **City Center (solid black):** Starts at approximately 1.7 at 0.2 kHz, increases to approximately 2.6 at 2 kHz, then remains relatively constant at approximately 2.5 to 2.6 at 10 kHz.
**Line Graph C (Location - μ):**
* **Nocturnal Nature (dashed black):** Starts at approximately 1.2 at 0.2 kHz, decreases to approximately -1.5 at 2 kHz, then increases to approximately -0.5 at 10 kHz.
* **Forrest Walk (solid gray):** Starts at approximately 0.8 at 0.2 kHz, decreases to approximately -2.0 at 2 kHz, then increases to approximately 0.2 at 10 kHz.
* **City Center (solid black):** Starts at approximately 0.4 at 0.2 kHz, decreases to approximately -2.5 at 2 kHz, then increases to approximately 1.0 at 10 kHz.
### Key Observations
* The heatmaps suggest that in Nocturnal Nature and Forrest Walk, sounds are more likely to have an ILD close to 0 dB, indicating a more centered sound source. The City Center shows a more diffuse ILD distribution.
* The Scale-σ values are similar across all three environments, with a peak around 2 kHz.
* The Location-μ values show a clear trend of decreasing towards negative values at 2 kHz in all environments, then increasing again at higher frequencies. The City Center exhibits the most negative Location-μ value at 2 kHz.
### Interpretation
The data suggests that the spatial hearing characteristics differ significantly between the three environments. Nocturnal nature and forest walks provide a more natural and centered soundscape, as indicated by the concentration of ILD around 0 dB in the heatmaps. The city center, with its more diffuse ILD distribution, likely presents a more complex and less predictable soundscape.
The line graphs reveal that the scale and location parameters of the spatial hearing model are influenced by the environment. The similar Scale-σ values suggest that the overall precision of spatial localization is comparable across the environments. However, the differences in Location-μ values indicate that the perceived location of sound sources is shifted differently in each environment, particularly at 2 kHz. The more negative Location-μ in the city center might reflect the impact of urban noise and reverberation on spatial perception.
The dashed lines in the heatmaps, consistently positioned at 0 dB ILD, likely represent a reference point or a common auditory processing mechanism. The uncertainty represented by the shaded areas in the line graphs highlights the variability in spatial hearing perception. The overall pattern suggests that the auditory system adapts to the acoustic characteristics of the environment to optimize spatial localization.
</details>
Figure 5: Binaural phase statistics A) Exemplary joint probability distribution of monaural phases B) An IPD histogram (black line) and a fitted von Mises distribution
<details>
<summary>Image 5 Details</summary>

### Visual Description
## Heatmap & Line Graph: Interaural Phase Difference Analysis
### Overview
The image presents two visualizations related to interaural phase difference. Panel A is a heatmap showing the probability density of phase combinations between the left and right ears. Panel B is a line graph comparing the probability density of raw data versus a Von Mises distribution fit to the data, both plotted against the Interaural Phase Difference.
### Components/Axes
**Panel A (Heatmap):**
* **X-axis:** Phase left ear [rad], ranging from -π to π.
* **Y-axis:** Phase right ear [rad], ranging from -π to π.
* **Color Scale:** Represents "prob density" (probability density), ranging from 0 (blue) to 0.07 (red).
* **Title:** A
**Panel B (Line Graph):**
* **X-axis:** Interaural Phase Difference [rad], ranging from approximately -π to π.
* **Y-axis:** Probability density, ranging from 0 to 0.5.
* **Legend:**
* Black solid line: "Raw data"
* Blue dashed line: "Von Mises"
* **Title:** B
### Detailed Analysis or Content Details
**Panel A (Heatmap):**
The heatmap displays a diagonal pattern. The highest probability density (red) is concentrated along the diagonal where the phase of the left and right ears are approximately equal. As the phase difference between the left and right ears increases (moving away from the diagonal), the probability density decreases (shifting towards blue). There are alternating red and blue bands along the diagonal, suggesting a periodic pattern in the probability density.
**Panel B (Line Graph):**
The black "Raw data" line initially rises from approximately 0 at -π, reaches a peak around 0 on the Interaural Phase Difference axis, and then declines to approximately 0 at π. The line exhibits some fluctuations. The blue dashed "Von Mises" line closely follows the "Raw data" line, but is smoother. The Von Mises line also peaks around 0, and declines to approximately 0 at π.
* **Raw Data (Black Line):**
* At -π: Approximately 0.02
* At -2.5 rad: Approximately 0.08
* At -1.5 rad: Approximately 0.15
* At -0.5 rad: Approximately 0.25
* At 0 rad: Approximately 0.32
* At 0.5 rad: Approximately 0.28
* At 1.5 rad: Approximately 0.15
* At 2.5 rad: Approximately 0.08
* At π: Approximately 0.02
* **Von Mises (Blue Line):**
* At -π: Approximately 0.02
* At -2.5 rad: Approximately 0.09
* At -1.5 rad: Approximately 0.17
* At -0.5 rad: Approximately 0.27
* At 0 rad: Approximately 0.34
* At 0.5 rad: Approximately 0.28
* At 1.5 rad: Approximately 0.17
* At 2.5 rad: Approximately 0.09
* At π: Approximately 0.02
### Key Observations
* The heatmap (Panel A) shows a strong correlation between the phase of the left and right ears.
* The probability density is highest when the phases are aligned.
* The Von Mises distribution (Panel B) provides a good fit to the raw data, suggesting that the Interaural Phase Difference follows this distribution.
* The raw data exhibits some noise or variability, which is smoothed out by the Von Mises distribution.
### Interpretation
The data suggests that the auditory system is most sensitive to sounds where the phase difference between the left and right ears is minimal. This is consistent with the mechanism of sound localization, where the brain uses interaural phase differences to determine the direction of a sound source. The Von Mises distribution provides a mathematical model for describing the distribution of Interaural Phase Differences, and its close fit to the raw data indicates that this model is a good representation of the underlying neural processes. The heatmap visually demonstrates the probability of different phase combinations, while the line graph quantifies the probability density of the Interaural Phase Difference. The fact that the Von Mises distribution closely matches the raw data suggests that the Interaural Phase Difference is not uniformly distributed, but rather is concentrated around a preferred value (in this case, 0). This concentration likely reflects the brain's preference for sounds originating from the midline.
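The panel-B fit can be sketched with the standard moment estimators for circular data: the circular mean for µ, and an approximate inversion of the mean resultant length for κ (the piecewise formula from Fisher's *Statistical Analysis of Circular Data*). The sample parameters below are illustrative, not the paper's:

```python
import numpy as np
from scipy import stats

def fit_vonmises(ipd):
    """Moment-style von Mises fit for wrapped angles: circular mean for mu,
    and a standard piecewise approximation inverting the mean resultant
    length for kappa."""
    z = np.exp(1j * ipd).mean()
    mu = np.angle(z)
    r = np.abs(z)
    if r < 0.53:
        kappa = 2 * r + r**3 + 5 * r**5 / 6
    elif r < 0.85:
        kappa = -0.4 + 1.39 * r + 0.43 / (1 - r)
    else:
        kappa = 1.0 / (r**3 - 4 * r**2 + 3 * r)
    return mu, kappa

# synthetic wrapped IPD sample; mu=0.3, kappa=1.5 are made-up values
rng = np.random.default_rng(2)
ipd = stats.vonmises.rvs(1.5, loc=0.3, size=20000, random_state=rng)
ipd = np.angle(np.exp(1j * ipd))          # wrap to (-pi, pi]
mu_hat, kappa_hat = fit_vonmises(ipd)
```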
</details>
Figure 6: IPD distributions. A) Histograms B) Concentration parameter κ_ω as a function of frequency C) Position parameter µ_ω as a function of frequency
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Heatmaps and Line Graphs: Spatial Audio Analysis in Different Environments
### Overview
The image presents a comparative analysis of spatial audio characteristics across three environments: Nocturnal Nature, Forrest Walk, and City Center. It consists of three subplots: A) Heatmaps showing the distribution of Interaural Phase Difference (IPD) versus Frequency, B) Line graphs depicting Concentration (K) versus Frequency, and C) Line graphs showing Position (µ) versus Frequency. The data appears to be related to sound localization cues.
### Components/Axes
* **A) Heatmaps:**
* X-axis: Interaural Phase Difference (IPD) [rad], ranging from -π to π.
* Y-axis: Frequency [kHz], ranging from 0.2 to 3.
* Color Scale: Log probability density, ranging from -3 (blue) to 3 (red).
* Environments: Nocturnal Nature, Forrest Walk, City Center (displayed as separate heatmaps).
* **B) Concentration (K) vs. Frequency:**
* X-axis: Frequency [kHz], ranging from 0 to 3.
* Y-axis: Concentration - K, ranging from 0 to 2.
* Line Styles/Colors:
* Nocturnal Nature: Solid black line.
* Forrest Walk: Solid gray line.
* City Center: Dashed black line.
* **C) Position (µ) vs. Frequency:**
* X-axis: Frequency [kHz], ranging from 0 to 3.
* Y-axis: Position - µ, ranging from -π to π.
* Line Styles/Colors:
* Nocturnal Nature: Solid black line.
* Forrest Walk: Solid gray line.
* City Center: Dashed black line.
* Vertical dashed lines are present at approximately 0.2, 0.5, 1.5 and 2.5 kHz.
### Detailed Analysis or Content Details
**A) Heatmaps:**
* **Nocturnal Nature:** The heatmap shows a concentration of probability density around IPD = 0 for frequencies below 1.5 kHz. Above 1.5 kHz, the density is more dispersed, with some concentration at positive IPD values. The highest density appears to be around 0.5 kHz and IPD = 0, with a log prob density of approximately 2.5.
* **Forrest Walk:** Similar to Nocturnal Nature, the heatmap shows a concentration around IPD = 0 for lower frequencies (below 1.5 kHz). The density is more spread out at higher frequencies, with a slight bias towards positive IPD values. The highest density appears to be around 0.5 kHz and IPD = 0, with a log prob density of approximately 2.5.
* **City Center:** The heatmap is more diffuse than the other two. There is a weak concentration around IPD = 0 for lower frequencies, but the density is generally lower across the entire range. The highest density appears to be around 0.5 kHz and IPD = 0, with a log prob density of approximately 1.5.
**B) Concentration (K) vs. Frequency:**
* **Nocturnal Nature:** The line starts at approximately K = 1.8 at 0 kHz, decreases to approximately K = 0.5 at 1 kHz, and then remains relatively stable around K = 0.3 until 3 kHz.
* **Forrest Walk:** The line starts at approximately K = 1.5 at 0 kHz, decreases to approximately K = 0.4 at 1 kHz, and then remains relatively stable around K = 0.2 until 3 kHz.
* **City Center:** The line starts at approximately K = 0.8 at 0 kHz, decreases to approximately K = 0.2 at 1 kHz, and then remains relatively stable around K = 0.1 until 3 kHz.
**C) Position (µ) vs. Frequency:**
* **Nocturnal Nature:** The line oscillates around µ = 0. It has a peak at approximately µ = 0.8 at 0.5 kHz, a trough at approximately µ = -0.8 at 1 kHz, a peak at approximately µ = 0.6 at 1.5 kHz, and a trough at approximately µ = -0.4 at 2.5 kHz.
* **Forrest Walk:** The line oscillates around µ = 0, but with smaller amplitudes than Nocturnal Nature. It has a peak at approximately µ = 0.4 at 0.5 kHz, a trough at approximately µ = -0.4 at 1 kHz, a peak at approximately µ = 0.3 at 1.5 kHz, and a trough at approximately µ = -0.2 at 2.5 kHz.
* **City Center:** The line oscillates around µ = 0, but with a different phase and amplitude compared to the other two environments. It has a trough at approximately µ = -0.6 at 0.5 kHz, a peak at approximately µ = 0.6 at 1 kHz, a trough at approximately µ = -0.4 at 1.5 kHz, and a peak at approximately µ = 0.4 at 2.5 kHz.
### Key Observations
* The Nocturnal Nature and Forrest Walk environments exhibit similar patterns in both the heatmaps and line graphs, suggesting similar acoustic characteristics.
* The City Center environment shows a more diffuse IPD distribution and lower concentration values, indicating a more complex and less coherent soundscape.
* The Position (µ) curves show distinct oscillatory patterns for each environment, suggesting different spatial cues for sound localization.
* The vertical dashed lines in subplot C highlight specific frequencies where the Position (µ) curves exhibit notable changes.
### Interpretation
The data suggests that the spatial audio characteristics differ significantly between the three environments. Nocturnal Nature and Forrest Walk provide more coherent spatial cues (as indicated by the concentrated IPD distributions and higher concentration values), while the City Center environment presents a more diffuse and complex soundscape. The oscillatory patterns in the Position (µ) curves likely represent the variations in perceived sound source location as a function of frequency, and the differences between the environments suggest that the brain relies on different cues for sound localization in each setting. The lower concentration and more diffuse IPD in the City Center could be due to reflections, reverberation, and the presence of multiple sound sources. The data could be used to develop more realistic spatial audio rendering algorithms for virtual reality or augmented reality applications, tailored to specific environments. The anomalies in the City Center data suggest a more complex acoustic environment that requires more sophisticated modeling.
</details>
Figure 7: Proportion of IPDs exceeding the 'maximal IPD' threshold in each frequency channel
<details>
<summary>Image 7 Details</summary>

### Visual Description
## Line Chart: Proportion of IPD > IPD_max vs. Frequency
### Overview
This image presents a line chart illustrating the proportion of Interaural Phase Difference (IPD) exceeding the maximum IPD (IPD_max) across different frequency ranges for three distinct environments: Nocturnal nature, Forrest walk, and City center. The chart aims to compare how sound localization cues vary in these environments.
### Components/Axes
* **X-axis:** Frequency [kHz], ranging from approximately 0.2 kHz to 1.0 kHz. The axis is labeled "Frequency [kHz]".
* **Y-axis:** Proportion of IPD > IPD_max, ranging from 0.0 to 0.45. The axis is labeled "Proportion of IPD > IPD_max".
* **Legend:** Located in the top-right corner of the chart.
* Nocturnal nature: Represented by a dashed black line.
* Forrest walk: Represented by a solid black line.
* City center: Represented by a light gray shaded area.
### Detailed Analysis
The chart displays three curves representing the proportion of IPD exceeding IPD_max for each environment as a function of frequency.
* **Nocturnal nature (dashed black line):** This line initially starts at approximately 0.36 at 0.2 kHz, rises to a peak of approximately 0.43 at around 0.38 kHz, and then declines, crossing 0.15 at approximately 0.65 kHz and approaching 0 at 0.8 kHz.
* **Forrest walk (solid black line):** This line begins at approximately 0.28 at 0.2 kHz, gradually decreases to approximately 0.18 at 0.4 kHz, and then rapidly declines, reaching approximately 0.05 at 0.6 kHz and approaching 0 at 0.7 kHz.
* **City center (light gray shaded area):** This area starts at approximately 0.30 at 0.2 kHz, remains relatively stable until approximately 0.4 kHz, and then gradually decreases, reaching approximately 0.10 at 0.6 kHz and approaching 0 at 0.8 kHz. The shaded area represents a range of values, indicating variability within the City center environment.
### Key Observations
* The Nocturnal nature environment exhibits the highest proportion of IPD > IPD_max, particularly in the lower frequency range (0.2 - 0.4 kHz).
* The Forrest walk environment consistently shows the lowest proportion of IPD > IPD_max across all frequencies.
* The City center environment falls between the Nocturnal nature and Forrest walk environments, with a relatively stable proportion in the lower frequencies and a gradual decline as frequency increases.
* All three environments show a general trend of decreasing proportion of IPD > IPD_max as frequency increases.
### Interpretation
The data suggests that sound localization cues, as indicated by the proportion of IPD exceeding IPD_max, are most prominent in the Nocturnal nature environment and least prominent in the Forrest walk environment. This could be due to differences in sound reflection and absorption characteristics of these environments. The City center environment, with its complex soundscape, exhibits intermediate characteristics.
The decline in the proportion of IPDs exceeding IPD_max with increasing frequency, seen in all environments, is consistent with the physics of sound propagation: the maximal IPD a single source can produce grows linearly with frequency (IPD_max = 2πf · ITD_max, where ITD_max is set by head size), so at higher frequencies fewer observed IPDs can exceed the threshold.
The variability represented by the shaded area for the City center environment likely reflects the diverse sound sources and acoustic conditions present in urban settings. This data could be used to inform the design of virtual auditory environments or to understand how humans perceive sound in different real-world settings. The peak in Nocturnal nature suggests a specific frequency range where sound localization is most effective in that environment.
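The threshold idea behind this figure can be made concrete. A minimal sketch, assuming an ITD_max of 0.7 ms (a typical human value, not taken from the paper): any wrapped IPD larger in magnitude than 2πf · ITD_max cannot arise from a single plane-wave source at frequency f.

```python
import numpy as np

ITD_MAX = 0.7e-3   # [s]; assumed maximal interaural time difference (human head)

def proportion_exceeding(ipd, freq_hz, itd_max=ITD_MAX):
    """Fraction of wrapped IPD samples whose magnitude exceeds the largest
    IPD a single plane-wave source can produce at this frequency."""
    ipd_max = min(2 * np.pi * freq_hz * itd_max, np.pi)
    return float(np.mean(np.abs(ipd) > ipd_max))

# fully diffuse (uniform) IPDs at 300 Hz: ipd_max ~= 1.32 rad, so roughly
# 1 - 1.32/pi ~= 0.58 of the samples should exceed the threshold
rng = np.random.default_rng(3)
ipd = rng.uniform(-np.pi, np.pi, 100000)
p_300 = proportion_exceeding(ipd, 300.0)
p_1000 = proportion_exceeding(ipd, 1000.0)   # threshold saturates at pi -> 0
```

Above roughly 700 Hz (for this assumed ITD_max) the threshold reaches π and no wrapped IPD can exceed it, matching the curves' approach to zero.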
</details>
Figure 8: Self speech separation using single channel IPDs. A) An exemplary IPD distribution in the forrest walk scene B) Classification results
<details>
<summary>Image 8 Details</summary>

### Visual Description
## Chart/Diagram Type: Auditory Signal Analysis - IPD Distribution & Waveforms
### Overview
The image presents two panels (A and B) related to auditory signal processing. Panel A displays a probability density function (PDF) of the Interaural Phase Difference (IPD) for a 561 Hz tone, showing the distribution for a mixture signal and its two components. Panel B shows the waveforms of the signal in the right and left ears over time.
### Components/Axes
**Panel A:**
* **Title:** "561 [Hz]" - indicating the frequency of the analyzed tone.
* **X-axis:** "IPD[rad]" - Interaural Phase Difference in radians, ranging from approximately -π to π.
* **Y-axis:** "probability density" - ranging from 0 to 0.4.
* **Legend:**
* "mixture" - represented by a solid black line and filled grey area.
* "comp 1" - represented by a dashed red line.
* "comp 2" - represented by a dashed blue line.
**Panel B:**
* **X-axis:** "Time [s]" - Time in seconds, ranging from 0 to 2.
* **Y-axis:** No explicit label, but represents the amplitude of the signal.
* **Labels:**
* "Right ear" - above the top waveform plot.
* "Left ear" - above the bottom waveform plot.
* **Waveforms:**
* Grey waveforms representing the auditory signal.
* Red lines overlaid on the waveforms, likely representing an envelope or another processed signal.
### Detailed Analysis or Content Details
**Panel A: IPD Distribution**
* **Mixture (Black):** The IPD distribution for the mixture signal is approximately Gaussian-shaped, peaking near 0 rad. The maximum probability density is approximately 0.33. The distribution extends from approximately -π to π, with a slight asymmetry.
* **Comp 1 (Red):** The IPD distribution for component 1 is a single peak centered around approximately -0.5 rad. The maximum probability density is approximately 0.18.
* **Comp 2 (Blue):** The IPD distribution for component 2 is a single peak centered around approximately 0.5 rad. The maximum probability density is approximately 0.11.
* The combined distributions of Comp 1 and Comp 2 create the mixture distribution.
**Panel B: Waveforms**
* **Right Ear:** The waveform shows a series of peaks and troughs, indicating a periodic signal. The red line appears to follow the general envelope of the waveform, with some deviations. The waveform appears to be a complex signal with multiple components.
* **Left Ear:** The waveform is similar to the right ear, but with a phase shift. The red line again follows the envelope, but with similar deviations. The phase shift is visually apparent by the offset of the peaks and troughs compared to the right ear waveform.
### Key Observations
* The IPD distribution of the mixture signal is centered around 0 rad, suggesting that the signal is largely in-phase between the two ears.
* The two components (Comp 1 and Comp 2) have IPDs shifted to the left and right, respectively. This suggests that these components are spatially separated.
* The waveforms in the right and left ears are similar but phase-shifted, consistent with the IPD information.
* The red lines in Panel B do not appear to be a simple envelope, as they deviate from the peak amplitudes in several places.
### Interpretation
The data suggests an analysis of a sound source composed of two distinct components. The IPD distributions indicate that these components originate from different spatial locations. The mixture signal's IPD distribution reflects the combination of these two components. The waveforms in Panel B confirm the phase difference between the ears, which is the basis for spatial hearing. The red lines in Panel B could represent a processed signal, such as a smoothed version of the waveform or a different frequency component. The fact that the red line doesn't perfectly track the waveform suggests a more complex relationship than a simple envelope. This could be related to the decomposition of the signal into its components, as suggested by the IPD analysis. The analysis is likely related to sound localization or binaural hearing research.
</details>
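The classification idea can be sketched as follows (a minimal reconstruction under assumptions, not the paper's pipeline): extract the IPD in a single frequency channel, here the 561 Hz channel from panel A, and assign each time frame to one of the two components by thresholding at 0 rad, the midpoint between the two component peaks.

```python
import numpy as np
from scipy.signal import stft

def classify_by_ipd(left, right, fs, f_channel=561.0, nperseg=1024):
    """Assign each STFT frame to one of two sources by the sign of the
    IPD in a single frequency channel (threshold at 0 rad)."""
    f, _, L = stft(left, fs=fs, nperseg=nperseg)
    _, _, R = stft(right, fs=fs, nperseg=nperseg)
    k = np.argmin(np.abs(f - f_channel))          # nearest frequency bin
    ipd = np.angle(L[k] * np.conj(R[k]))
    return ipd, (ipd > 0).astype(int)             # 1: left leads, 0: right leads

# toy mixture: first second from a source leading in the left ear,
# second second from a source leading in the right ear
fs, dur = 16000, 2.0
t = np.arange(int(fs * dur)) / fs
tau = 2e-4                                        # 0.2 ms interaural delay
s = np.sin(2 * np.pi * 561 * t)
left = np.where(t < 1.0, s, np.sin(2 * np.pi * 561 * (t - tau)))
right = np.where(t < 1.0, np.sin(2 * np.pi * 561 * (t - tau)), s)
ipd, labels = classify_by_ipd(left, right, fs)
```

Frames from the first half carry an IPD near +2π·561·τ ≈ +0.71 rad and frames from the second half near −0.71 rad, so the threshold recovers the temporal segmentation.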
Figure 9: Independent components of natural binaural sounds. A) Explanation of the ICA model. Coefficients s_i are assumed to be sparse and independent. B) Exemplary ICA basis functions from each recorded scene.
<details>
<summary>Image 9 Details</summary>

### Visual Description
## Diagram: Soundscape Decomposition
### Overview
The image presents a diagram illustrating the decomposition of a complex soundscape into its constituent sound sources. Part A shows a mathematical representation of this decomposition, while Part B displays example waveforms for different environments: nocturnal nature, a forest walk, and a city center. Waveforms are shown for both the left and right ears.
### Components/Axes
* **Part A:** A mathematical expression representing soundscape decomposition. `s1` through `sn` represent the coefficients of the individual components; the 'X' symbol appears to denote either the recorded waveform or the combination (mixing) of these components.
* **Part B:** Three columns representing different environments: "Nocturnal nature", "Forrest walk", and "City center". Each column contains four rows of waveforms, presumably representing different sound sources or time segments.
* **Time Scale:** A horizontal bar labeled "8.7 ms" indicates the time scale for the waveforms.
* **Ear Labels:** Labels "Left ear" and "Right ear" are positioned below the time scale, indicating which waveforms correspond to each ear.
* **Waveform Color:** Both red and black waveforms are used throughout the diagram.
### Detailed Analysis or Content Details
**Part A:**
The equation represents a complex binaural waveform as a linear combination of components. Consistent with the figure caption, this is the ICA generative model: the waveform equals a sum of basis functions weighted by coefficients assumed to be sparse and independent, i.e. `waveform = s1*A1 + s2*A2 + ... + sn*An`.
**Part B:**
* **Nocturnal Nature:**
* Top Row (Red): A relatively slow, sinusoidal waveform with a consistent amplitude.
* Second Row (Black): A more complex waveform with higher frequency components and varying amplitude.
* Third Row (Red): A series of short, sharp pulses.
* Fourth Row (Black): A complex waveform with a mix of frequencies and amplitudes.
* **Forrest Walk:**
* Top Row (Red): A slow, sinusoidal waveform with a smaller amplitude than the nocturnal nature example.
* Second Row (Black): A waveform with a few distinct peaks and troughs.
* Third Row (Red): A series of short, sharp pulses, similar to the nocturnal nature example but less frequent.
* Fourth Row (Black): A complex waveform with a mix of frequencies and amplitudes.
* **City Center:**
* Top Row (Red): A fast, sinusoidal waveform with a relatively high frequency.
* Second Row (Black): A highly complex waveform with a very high frequency and amplitude.
* Third Row (Red): A series of short, sharp pulses, more frequent than in the other environments.
* Fourth Row (Black): A very complex waveform with a high frequency and amplitude.
The waveforms in each environment appear to be paired, with red and black rows alternating. This suggests a distinction between different types of sound sources or processing stages.
### Key Observations
* The complexity of the waveforms increases from nocturnal nature to forest walk to city center. This suggests that the soundscapes become more diverse and chaotic in more urban environments.
* The presence of sharp pulses (red waveforms in the third row) is consistent across all environments, but their frequency varies.
* The waveforms for the left and right ears appear to be similar within each environment, suggesting that the sound sources are relatively symmetrical.
* The time scale (8.7 ms) provides a reference for the duration of the waveforms.
### Interpretation
This diagram demonstrates a method for decomposing complex soundscapes into their constituent sound sources. The mathematical equation in Part A provides a conceptual framework for this decomposition, while Part B illustrates how this framework can be applied to real-world environments. The different waveforms observed in each environment reflect the unique acoustic characteristics of that environment.
The increasing complexity of the waveforms from nocturnal nature to city center suggests that urban environments are characterized by a greater diversity of sound sources and more rapid changes in sound intensity. The presence of sharp pulses in all environments may represent transient sounds such as impacts or clicks.
The diagram highlights the importance of considering both the frequency and amplitude of sound waves when analyzing soundscapes. The different waveforms observed in each environment provide valuable information about the acoustic properties of that environment and can be used to inform soundscape design and management. The use of red and black waveforms may indicate different processing stages or types of sound sources, but further information would be needed to confirm this interpretation. The diagram is a conceptual illustration and does not provide quantitative data. It is a qualitative representation of soundscape decomposition.
</details>
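As a rough sketch of how such components can be learned (using scikit-learn's FastICA on synthetic stereo data, not the authors' recordings or algorithm), one can draw short binaural windows, roughly 8.7 ms as in the figure, stack the left- and right-ear samples into one vector per window, and fit ICA; the columns of the mixing matrix then split into left- and right-ear parts of each basis function.

```python
import numpy as np
from sklearn.decomposition import FastICA

rng = np.random.default_rng(0)

# synthetic stereo recording: two sinusoidal "sources" with small
# interaural delays, plus a little noise
fs, n = 16000, 16000 * 4
t = np.arange(n) / fs
left = (np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 1500 * (t - 2e-4))
        + 0.05 * rng.standard_normal(n))
right = (np.sin(2 * np.pi * 500 * (t - 3e-4)) + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + 0.05 * rng.standard_normal(n))

# draw random binaural windows (139 samples ~ 8.7 ms at 16 kHz) and
# concatenate the left and right halves into one vector per window
win = 139
starts = rng.integers(0, n - win, size=2000)
X = np.stack([np.concatenate([left[s:s + win], right[s:s + win]]) for s in starts])

ica = FastICA(n_components=20, whiten="unit-variance", random_state=0, max_iter=500)
ica.fit(X)
A = ica.mixing_                     # columns are binaural basis functions
left_parts, right_parts = A[:win], A[win:]
```

Each learned column can then be plotted as a pair of left/right waveforms, analogous to panel B of the figure.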
Figure 10: Independent components plotted on a time frequency plane. Rows correspond to auditory scenes. Columns to ears. Shapes of the same color form a single independent component.
<details>
<summary>Image 10 Details</summary>

### Visual Description
## Heatmap: Soundscape Frequency Analysis by Ear and Environment
### Overview
This image presents a 3x2 grid of heatmaps visualizing soundscape frequency analysis. Each row corresponds to a different environment (Nocturnal nature, Forest walk, City center), and the two columns correspond to the "Left ear" and "Right ear". The heatmaps display frequency (in kHz) on the y-axis and time (in ms) on the x-axis, with color representing the intensity of sound at that frequency and time.
### Components/Axes
* **X-axis:** Time [ms], ranging from 0 to 8.7 ms.
* **Y-axis:** Frequency [kHz], ranging from 0 to 7.1 kHz.
* **Color Scale:** A diverging color scale is used, with red representing higher sound intensity, blue representing lower sound intensity, and shades of orange and cyan representing intermediate intensities.
* **Labels:**
* "Left ear" (top-left of the left column)
* "Right ear" (top-left of the right column)
* "Nocturnal nature" (top-right of the left column)
* "Forest walk" (center-right of the left column)
* "City center" (bottom-right of the left column)
* **Grid Structure:** 3 rows (environments) x 2 columns (ears).
### Detailed Analysis or Content Details
**1. Nocturnal Nature:**
* **Left Ear:** A relatively sparse distribution of sound. Higher frequencies (above 3.5 kHz) show some scattered red areas, indicating occasional bursts of higher-intensity sound. The lower frequencies (0-2 kHz) are predominantly blue, indicating low intensity. There's a slight upward trend in intensity towards 8.7ms.
* **Right Ear:** Similar to the left ear, but with a more pronounced concentration of red areas in the 2-4 kHz range around 4.3ms. Lower frequencies are also predominantly blue.
**2. Forest Walk:**
* **Left Ear:** More consistent sound intensity across frequencies compared to Nocturnal Nature. A significant band of orange/red color is visible between 2-5 kHz, particularly around 4.3ms. Lower frequencies remain predominantly blue.
* **Right Ear:** Similar pattern to the left ear, but with a more defined band of orange/red between 2-5 kHz. The intensity appears slightly higher overall compared to the left ear.
**3. City Center:**
* **Left Ear:** High sound intensity across almost all frequencies and time points. Predominantly red and orange, indicating a consistently loud environment. There's a slight increase in intensity towards 8.7ms.
* **Right Ear:** Similar to the left ear, with high sound intensity across all frequencies. The distribution appears slightly more uniform, with less distinct banding than in the Forest Walk environment.
### Key Observations
* **Environmental Differences:** The City Center exhibits the highest overall sound intensity, followed by the Forest Walk, and then Nocturnal Nature.
* **Frequency Distribution:** The Forest Walk shows a concentration of sound in the 2-5 kHz range, potentially indicating bird song or rustling leaves.
* **Ear Differences:** There are subtle differences in sound intensity between the left and right ears, suggesting directional sound sources. The right ear often shows slightly higher intensity, but this varies by environment.
* **Temporal Trends:** In some environments (Nocturnal Nature, City Center), there's a slight increase in sound intensity towards the end of the time window (8.7ms).
### Interpretation
This data suggests a clear relationship between the environment and the soundscape characteristics. The Nocturnal Nature environment is the quietest, with sound primarily concentrated in lower frequencies. The Forest Walk environment is characterized by a more balanced frequency distribution, with a notable presence of mid-range frequencies. The City Center environment is the loudest, with sound present across the entire frequency spectrum.
The subtle differences between the left and right ears indicate that the soundscape is not uniform and that sound sources are likely directional. The slight increase in sound intensity towards the end of the time window in some environments could indicate a change in the soundscape over time, such as an approaching sound source or a gradual increase in ambient noise.
The heatmaps provide a visual representation of the complexity of soundscapes and how they vary depending on the environment. This type of analysis could be used to assess noise pollution, study animal communication, or design more effective noise cancellation systems. The diverging color scheme effectively highlights the relative intensity of sound at different frequencies and time points, making it easy to identify patterns and trends.
</details>
Figure 11: Peak frequencies of IC monaural parts plotted against each other. Colors encode the Peak Power Ratio
<details>
<summary>Image 11 Details</summary>

### Visual Description
## Scatter Plots: Peak Frequency Ratio by Environment
### Overview
The image presents three scatter plots, each representing a different sound environment: "Nocturnal nature", "Forrest walk", and "City center". Each plot visualizes the relationship between peak frequency detected in the left ear versus the peak frequency detected in the right ear, with data points colored according to the "peak ratio [dB]". A dashed black line representing the equality line (left ear frequency = right ear frequency) is overlaid on each plot.
### Components/Axes
Each plot shares the following components:
* **X-axis:** "Peak freq - left ear [kHz]", ranging from approximately 0.2 to 7.5 kHz.
* **Y-axis:** "Peak freq - right ear [kHz]", ranging from approximately 0.2 to 7.5 kHz.
* **Color Scale/Legend:** Located at the bottom-center of the image, the legend represents "peak ratio [dB]", with a gradient from blue (-15 dB) to red (15 dB).
* **Title:** Each plot has a title indicating the environment: "Nocturnal nature", "Forrest walk", and "City center", positioned at the top-center.
* **Equality Line:** A dashed black line is present in each plot, running diagonally from the bottom-left to the top-right, representing where the peak frequency in the left ear equals the peak frequency in the right ear.
### Detailed Analysis or Content Details
**Plot 1: Nocturnal Nature**
* **Trend:** The data points generally cluster around the equality line, but with a noticeable spread. There's a slight tendency for points to fall *below* the line at lower frequencies (left ear > right ear).
* **Data Points:**
* Numerous blue points (peak ratio ~ -15 dB) are concentrated around (0.2 kHz, 0.2 kHz) to (2 kHz, 2 kHz).
* Points transition through lighter blues, then to neutral colors around the equality line.
* Red points (peak ratio ~ 15 dB) are scattered, primarily between (2 kHz, 3 kHz) and (6 kHz, 7 kHz).
* Approximate data points: (0.5, 0.5) - blue, (2, 1.5) - neutral, (4, 5) - neutral, (6, 6.5) - red.
**Plot 2: Forrest Walk**
* **Trend:** Similar to "Nocturnal nature", the data points are clustered around the equality line, but with a wider spread. There's a more pronounced tendency for points to fall *below* the line at lower frequencies.
* **Data Points:**
* Blue points are concentrated around (0.2 kHz, 0.2 kHz) to (3 kHz, 3 kHz).
* A larger number of points are scattered *below* the equality line compared to "Nocturnal nature".
* Red points are scattered, primarily between (3 kHz, 4 kHz) and (7 kHz, 7.5 kHz).
* Approximate data points: (0.5, 0.5) - blue, (2, 1.5) - neutral, (4, 5) - neutral, (6, 6.5) - red.
**Plot 3: City Center**
* **Trend:** The data points are more dispersed and show a stronger tendency to fall *below* the equality line, especially at lower frequencies. The spread is significantly wider than in the other two environments.
* **Data Points:**
* Blue points are concentrated around (0.2 kHz, 0.2 kHz) to (2 kHz, 2 kHz).
* A substantial number of points are scattered *below* the equality line.
* Red points are scattered, primarily between (2 kHz, 3 kHz) and (6 kHz, 6.5 kHz).
* Approximate data points: (0.5, 0.5) - blue, (2, 1.5) - neutral, (4, 5) - neutral, (6, 6.5) - red.
### Key Observations
* The "City center" environment exhibits the greatest deviation from the equality line, suggesting a larger difference in peak frequencies detected between the left and right ears.
* The "Nocturnal nature" environment shows the least deviation, indicating a more balanced frequency distribution between the ears.
* The color distribution (peak ratio) appears to be correlated with the position relative to the equality line. Points below the line tend to be blue (negative peak ratio), while points above the line tend to be red (positive peak ratio).
* The spread of data points increases from "Nocturnal nature" to "Forrest walk" to "City center", indicating greater variability in peak frequencies in more complex sound environments.
### Interpretation
These scatter plots likely represent the spatial hearing characteristics in different acoustic environments. The peak frequency ratio indicates the difference in the dominant frequencies perceived by each ear.
* **Nocturnal nature:** The close clustering around the equality line suggests a relatively symmetrical sound field, with sounds arriving at both ears with similar frequencies. This could be due to the absence of strong directional sound sources or reflections.
* **Forrest walk:** The wider spread and slight downward trend suggest that sounds are arriving at the ears with slightly different frequencies, potentially due to the complex soundscape of a forest (e.g., rustling leaves, bird calls from different directions).
* **City center:** The significant deviation from the equality line and the large spread indicate a highly asymmetrical sound field. This is likely due to the presence of numerous sound sources (traffic, construction, people) and strong reflections from buildings, creating significant interaural differences in frequency.
The plots demonstrate how the acoustic environment influences the perceived frequency distribution between the ears. The "peak ratio" metric provides a quantitative measure of this asymmetry, which could be relevant for understanding sound localization, spatial awareness, and the impact of noise pollution on auditory perception. The dashed line serves as a baseline for comparison, highlighting the degree to which the sound environment alters the symmetry of frequency perception.
</details>
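A sketch of how the quantities in this figure could be computed for one learned basis function (a hypothetical helper, not the authors' code): take the power spectrum of each monaural part, locate its peak, and express the ratio of the two peak powers in dB.

```python
import numpy as np

def peak_freq_and_ratio(left_part, right_part, fs):
    """Peak frequency of each monaural part of a basis function and the
    ratio of spectral power at those peaks, in dB."""
    n = len(left_part)
    f = np.fft.rfftfreq(n, d=1 / fs)
    pl = np.abs(np.fft.rfft(left_part)) ** 2
    pr = np.abs(np.fft.rfft(right_part)) ** 2
    il, ir = pl.argmax(), pr.argmax()
    ratio_db = 10 * np.log10(pl[il] / pr[ir])
    return f[il], f[ir], ratio_db

# toy basis function: same 1 kHz carrier in both ears, right ear at half amplitude
fs, n = 16000, 139
t = np.arange(n) / fs
w = np.hanning(n)
fl, fr, r = peak_freq_and_ratio(w * np.sin(2 * np.pi * 1000 * t),
                                0.5 * w * np.sin(2 * np.pi * 1000 * t), fs)
```

Plotting `fl` against `fr` for all basis functions, colored by `r`, reproduces the layout of these scatter plots; points on the dashed equality line have identical peak frequencies in both ears.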
Figure 12: Proportion of Independent Components with the same frequency peak in each ear
<details>
<summary>Image 12 Details</summary>

### Visual Description
## Bar Chart: Number of Basis Functions by Environment
### Overview
This is a bar chart comparing the number of basis functions, categorized as "on diagonal" and "off diagonal", across three different environments: "Nocturnal nature", "Forrest walk", and "City center". The y-axis represents the "Number of basis functions", ranging from 0 to 300, while the x-axis represents the environment.
### Components/Axes
* **X-axis:** Environment (Nocturnal nature, Forrest walk, City center)
* **Y-axis:** Number of basis functions (Scale: 0 to 300, increments of 50)
* **Legend:**
* Black: "on diagonal"
* Gray: "off diagonal"
* **Chart Title:** Not explicitly present, but the chart represents a comparison of basis functions.
### Detailed Analysis
The chart consists of three groups of stacked bars, one for each environment. Each bar is divided into two sections representing "on diagonal" and "off diagonal" basis functions.
* **Nocturnal nature:**
* "on diagonal": Approximately 160 basis functions.
* "off diagonal": Approximately 110 basis functions.
* Total: Approximately 270 basis functions.
* **Forrest walk:**
* "on diagonal": Approximately 60 basis functions.
* "off diagonal": Approximately 180 basis functions.
* Total: Approximately 240 basis functions.
* **City center:**
* "on diagonal": Approximately 100 basis functions.
* "off diagonal": Approximately 170 basis functions.
* Total: Approximately 270 basis functions.
### Key Observations
* "Nocturnal nature" and "City center" have similar total numbers of basis functions (around 270).
* "Forrest walk" has a significantly lower total number of basis functions (around 240).
* The proportion of "off diagonal" basis functions is highest in "Forrest walk", comprising roughly 75% of the total.
* The proportion of "on diagonal" basis functions is highest in "Nocturnal nature", comprising roughly 59% of the total.
### Interpretation
The data suggests that the distribution of basis functions differs significantly across the three environments. The higher number of "on diagonal" basis functions in "Nocturnal nature" might indicate a more structured or predictable signal in that environment. Conversely, the higher number of "off diagonal" basis functions in "Forrest walk" could suggest a more complex or less predictable signal. The "City center" environment shows a balance between "on" and "off" diagonal basis functions, potentially reflecting a mix of structured and unstructured signals.
The concept of "on diagonal" and "off diagonal" basis functions is not explicitly defined in the image, but given the figure caption it refers to whether the two monaural parts of a basis function peak at the same frequency in both ears ("on diagonal", i.e. lying on the equality line of the peak-frequency scatter plots) or at different frequencies ("off diagonal"). The environments could be influencing the characteristics of the signals being analyzed, leading to the observed differences in basis function distribution.
</details>
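Counting the two bar categories then reduces to comparing the peak frequencies of the two monaural parts (a trivial sketch; the tolerance parameter is an assumption, since the exact matching criterion is not given here):

```python
import numpy as np

def count_diagonal(peaks_left, peaks_right, tol_hz=0.0):
    """Split basis functions into 'on diagonal' (same peak frequency in
    both ears, within tol_hz) and 'off diagonal' (different peaks)."""
    same = np.abs(np.asarray(peaks_left, float) -
                  np.asarray(peaks_right, float)) <= tol_hz
    return int(same.sum()), int((~same).sum())

on, off = count_diagonal([500.0, 500.0, 900.0], [500.0, 600.0, 900.0])
```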
Figure 13: Binaural cues represented by ICs capturing the same frequency in each ear. A) IPD as a function of frequency B) ILD as a function of frequency.
<details>
<summary>Image 13 Details</summary>

### Visual Description
## Scatter Plots: Interaural Phase Difference (IPD) and Interaural Level Difference (ILD) vs. Peak Frequency
### Overview
The image presents six scatter plots arranged in a 2x3 grid. The plots visualize the relationship between peak frequency (on the x-axis) and either Interaural Phase Difference (IPD) or Interaural Level Difference (ILD) (on the y-axis) for three different acoustic environments: Nocturnal nature, Forrest walk, and City center. Each environment is represented by two plots – one for IPD and one for ILD. A dashed grey line is overlaid on each of the top row plots.
### Components/Axes
* **X-axis (all plots):** Peak freq - left [kHz]. Scale ranges from 0 to 3 kHz for the top row plots (IPD) and 0 to 7.5 kHz for the bottom row plots (ILD).
* **Y-axis (top row plots):** IPD [rad]. Scale ranges from approximately -π to π.
* **Y-axis (bottom row plots):** ILD [dB]. Scale ranges from approximately -1.2 to 12 dB.
* **Titles (top row):** Nocturnal nature, Forrest walk, City center.
* **Labels (left side):** A) and B) to distinguish the IPD and ILD plots.
* **Data Points:** Black dots representing individual data points.
* **Overlaid Line:** Dashed grey line present in all three IPD plots.
### Detailed Analysis or Content Details
**A) IPD vs. Peak Frequency**
* **Nocturnal Nature:** The data points generally cluster between 0.2 and 2.5 kHz. The trend shows a positive correlation between peak frequency and IPD up to approximately 1.5 kHz, after which the IPD values become more scattered and decrease. Approximate data points: (0.3 kHz, 0.3 rad), (1.0 kHz, 1.5 rad), (2.0 kHz, 0.1 rad).
* **Forrest Walk:** Similar to Nocturnal Nature, the data points are concentrated between 0.2 and 2.5 kHz. The trend also shows a positive correlation between peak frequency and IPD up to approximately 1.5 kHz, followed by more scattered data. Approximate data points: (0.4 kHz, 0.4 rad), (1.2 kHz, 1.7 rad), (2.2 kHz, 0.2 rad).
* **City Center:** Data points are distributed between 0.2 and 3.0 kHz. The positive correlation between peak frequency and IPD is observed up to approximately 1.5 kHz, but the scatter is more pronounced than in the other two environments. Approximate data points: (0.5 kHz, 0.5 rad), (1.4 kHz, 1.8 rad), (2.8 kHz, -0.1 rad). The dashed grey line appears to approximate the trend of the data.
**B) ILD vs. Peak Frequency**
* **Nocturnal Nature:** Data points are spread between 0.5 and 6.5 kHz. There's a slight positive correlation between peak frequency and ILD up to approximately 2 kHz, after which the ILD values become more variable. Approximate data points: (1.0 kHz, 1.0 dB), (3.0 kHz, 2.0 dB), (5.0 kHz, 0.5 dB).
* **Forrest Walk:** Data points are distributed between 0.5 and 7.0 kHz. A similar trend to Nocturnal Nature is observed, with a slight positive correlation up to approximately 2 kHz. Approximate data points: (0.8 kHz, 0.8 dB), (2.5 kHz, 2.5 dB), (6.0 kHz, 0.2 dB).
* **City Center:** Data points are spread between 0.5 and 7.5 kHz. A positive correlation between peak frequency and ILD is visible up to approximately 3 kHz, followed by a decrease in ILD. Approximate data points: (1.2 kHz, 2.0 dB), (3.5 kHz, 4.0 dB), (6.5 kHz, -0.5 dB).
### Key Observations
* The IPD plots all exhibit a similar trend of increasing IPD with increasing peak frequency up to a certain point, followed by a decrease or increased scatter.
* The ILD plots show a more subtle positive correlation between peak frequency and ILD, with more variability in the data.
* The City Center environment appears to have the most scattered data points in both IPD and ILD plots, suggesting a more complex acoustic environment.
* The dashed grey line in the IPD plots seems to represent a general trend of the data, but doesn't perfectly fit all data points.
### Interpretation
The data suggest that the acoustic environments influence the relationship between peak frequency and interaural cues (IPD and ILD). The consistent trend in the IPD plots, with IPD growing with frequency up to roughly 1.5 kHz, is expected from the physics of sound propagation: for a fixed interaural time difference the phase difference grows linearly with frequency (IPD = 2π·f·ITD), and beyond this range it wraps around ±π and becomes ambiguous, producing the increased scatter. The differences between the environments suggest that the complexity of the soundscape affects the precision of these cues. The City Center, with its more scattered data, likely has more reflections and reverberations, making it harder to determine the precise location of sound sources. The ILD plots, while showing a less pronounced trend, suggest that higher frequencies are associated with larger interaural level differences, as expected from the head-shadow effect. The overlaid dashed line in the IPD plots could represent a model or expectation of the IPD-frequency relationship, against which the observed data are compared; deviations from it could reveal insights into the specific acoustic characteristics of each environment.
</details>
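For the on-diagonal components of this figure, the two cues could be read off a basis function as sketched below (an illustrative reconstruction: IPD as the cross-spectrum phase at the dominant shared frequency bin, ILD as the log ratio of total power between the two monaural parts).

```python
import numpy as np

def binaural_cues(left_part, right_part, fs):
    """IPD (cross-spectrum phase at the dominant shared frequency bin)
    and ILD (log power ratio) of one binaural basis function."""
    L = np.fft.rfft(left_part)
    R = np.fft.rfft(right_part)
    k = np.argmax(np.abs(L) ** 2 + np.abs(R) ** 2)      # dominant shared bin
    ipd = np.angle(L[k] * np.conj(R[k]))
    ild = 10 * np.log10(np.sum(left_part ** 2) / np.sum(right_part ** 2))
    return np.fft.rfftfreq(len(left_part), 1 / fs)[k], ipd, ild

# toy check: 800 Hz carrier, right ear delayed by 0.3 ms and attenuated
fs, n = 16000, 139
t = np.arange(n) / fs
w = np.hanning(n)
f0, ipd, ild = binaural_cues(w * np.sin(2 * np.pi * 800 * t),
                             0.7 * w * np.sin(2 * np.pi * 800 * (t - 3e-4)), fs)
```

Plotting `ipd` and `ild` against the shared peak frequency of each on-diagonal component yields scatter plots of the kind shown in panels A and B.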