# Binaural LCMV Beamforming with Partial Noise Estimation
**Authors**: Nico Gößling, Elior Hadad, Sharon Gannot, Simon Doclo
Nico Gößling, Student Member, IEEE, Elior Hadad, Member, IEEE, Sharon Gannot, Senior Member, IEEE, and Simon Doclo, Senior Member, IEEE
Abstract: Besides reducing undesired sources, i.e., interfering sources and background noise, another important objective of a binaural beamforming algorithm is to preserve the spatial impression of the acoustic scene, which can be achieved by preserving the binaural cues of all sound sources. While the binaural minimum variance distortionless response (BMVDR) beamformer provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources and distorts the binaural cues of the interfering sources and the background noise. Hence, several extensions have been proposed. First, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional constraints, which make it possible to control the reduction of the interfering sources while preserving their binaural cues. Second, the BMVDR with partial noise estimation (BMVDR-N) mixes the output signals of the BMVDR with the noisy reference microphone signals, which makes it possible to control the binaural cues of the background noise. Aiming at merging the advantages of both extensions, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provide a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. Experimental results using recorded signals as well as the results of a perceptual listening test show that the BLCMV-N is able to preserve the binaural cues of an interfering source (like the BLCMV), while allowing a trade-off between noise reduction performance and binaural cue preservation of the background noise (like the BMVDR-N).
Index Terms: Binaural cues, binaural noise reduction, MVDR beamformer, LCMV beamformer, hearing devices
## I. INTRODUCTION
Beamforming algorithms for head-mounted assistive hearing devices (e.g., hearing aids, earbuds and hearables) are crucial to improve speech quality and speech intelligibility in noisy acoustic environments. Assuming a binaural configuration where both devices exchange their microphone signals, the information captured by all microphones on both sides of the head can be exploited [1]-[3]. Besides reducing interfering sources (e.g., competing speakers) and background noise (e.g., diffuse babble noise), another important objective of a binaural beamforming algorithm is the preservation of the listener's spatial impression of the acoustic scene. This can be achieved by preserving the binaural cues of all sound sources, i.e., the interaural level difference (ILD) and the interaural time difference (ITD) for coherent sources (desired source and interfering sources) and the interaural coherence (IC) for incoherent sound fields (background noise) [4]. Binaural cues play a major role for spatial perception, i.e., to localize sound sources and to determine the spatial width or diffuseness of a sound field [5], and are very important for speech intelligibility due to so-called binaural unmasking [6], [7].
This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - Project ID 352015383 (SFB 1330 B2) and Project ID 390895286 (EXC 2177/1) and the Israeli Ministry of Science and Technology, #88962, 2019.
E. Hadad and S. Gannot are with the Faculty of Engineering, Bar-Ilan University, Ramat-Gan, 5290002, Israel (e-mail: elior.hadad@biu.ac.il; sharon.gannot@biu.ac.il).
N. Gößling and S. Doclo are with the Department of Medical Physics and Acoustics and the Cluster of Excellence Hearing4all, University of Oldenburg, 26111 Oldenburg, Germany (e-mail: nico.goessling@uol.de; simon.doclo@uol.de).
Unlike monaural beamforming algorithms, binaural beamforming algorithms need to generate two output signals (i.e., one for each ear), hence typically processing all available microphone signals from both devices by two different spatial filters [8]-[19]. A frequently used binaural beamforming algorithm is the binaural minimum variance distortionless response (BMVDR) beamformer, which aims at minimizing the power spectral density (PSD) of the noise component in the output signals while preserving the desired source component in the reference microphone signals on the left and the right device [2], [3], [11]. While the BMVDR provides good noise reduction performance and preserves the binaural cues of the desired source, it does not allow controlling the reduction of the interfering sources and distorts the binaural cues of the undesired sources (interfering sources and background noise). More specifically, after applying the BMVDR the binaural cues of the undesired sources are equal to the binaural cues of the desired source, such that all sources are perceived as coming from the same direction, which is obviously undesired. Hence, several extensions of the BMVDR have been proposed. On the one hand, the binaural linearly constrained minimum variance (BLCMV) beamformer uses additional interference reduction constraints, which make it possible to control the reduction of the interfering sources by means of interference scaling parameters while preserving the binaural cues of the interfering sources in addition to those of the desired source [12], [14], [17], [20]. However, due to the additional constraints there are fewer degrees of freedom available for noise reduction, such that the noise reduction performance of the BLCMV is lower than that of the BMVDR. Furthermore, it is not possible to explicitly trade off between noise reduction performance and binaural cue preservation of the background noise. On the other hand, the BMVDR with partial noise estimation (BMVDR-N) aims for the noise component in the output signals to be equal to a scaled version of the noise component in the reference microphone signals while preserving the desired source component in the reference microphone signals [3], [10], [11], [16]. It has been shown that the output signals of the BMVDR-N can be interpreted as a mixture between the output signals of the BMVDR and the noisy reference microphone signals, i.e., the BMVDR-N provides a trade-off between noise reduction performance and binaural cue preservation of the background noise. While for (incoherent) background noise the BMVDR-N showed promising results [16], [21], the effect of partial noise estimation on a (coherent) interfering source strongly depends on the position of the interfering source relative to the desired source and is harder to control [11].
Aiming at merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering sources and controlling the reduction of the interfering sources as well as the binaural cues of the background noise, in this paper we propose the BLCMV with partial noise estimation (BLCMV-N). First, we derive two decompositions for the BLCMV-N which reveal differences and similarities between the BLCMV-N and the BLCMV. We show that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We then analytically derive the performance of the BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. We show that the output signal-to-noise ratio (SNR) of the BLCMV-N is smaller than or equal to the output SNR of the BLCMV and derive the optimal interference scaling parameter maximizing the output SNR of the BLCMV-N. The derived analytical expressions are first validated using measured anechoic acoustic transfer functions (ATFs). In addition, more realistic experiments are performed using recorded signals for a binaural hearing device in a reverberant cafeteria with one interfering source and multitalker babble noise. Both the objective performance measures and the results of a perceptual listening test with 13 normal-hearing participants show that the proposed BLCMV-N is able to preserve the binaural cues and hence the spatial impression of the interfering source (like the BLCMV), while trading off between noise reduction performance and binaural cue preservation of the background noise (like the BMVDR-N).
The remainder of this paper is organized as follows. In Section II we introduce the considered binaural hearing device configuration and the used objective performance measures. In Section III we briefly review several binaural beamforming algorithms, namely the BMVDR, the BLCMV and the BMVDR-N. In Section IV we present the BLCMV-N and derive two decompositions. In Section V we provide a detailed theoretical analysis of the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. In Section VI we first validate the analytical expressions using anechoic ATFs, followed by simulations and a perceptual listening test using realistic recordings in a reverberant room.
Fig. 1. Binaural hearing device configuration with $M_L$ microphones on the left side and $M_R$ microphones on the right side.
[Figure: block diagram in which the microphone signals $y_1, \dots, y_{M_L}$ (left side) and $y_{M_L+1}, \dots, y_{M_L+M_R}$ (right side) are fed to both filters $\mathbf{w}_L$ and $\mathbf{w}_R$, which produce the output signals $z_L$ and $z_R$.]
## II. HEARING DEVICE CONFIGURATION
In Section II-A the considered binaural hearing device configuration and the signal model are introduced. In Sections II-B and II-C the objective performance measures and the binaural cues are defined.
## A. Signal Model
Consider the binaural hearing device configuration depicted in Figure 1 with $M_L$ microphones on the left side and $M_R$ microphones on the right side, i.e., $M = M_L + M_R$ microphones in total. In this paper we consider an acoustic scenario with one desired source (target speaker) and one interfering source (competing speaker) in a noisy and reverberant environment, where the background noise is assumed to be incoherent (e.g., diffuse babble noise, sensor noise).
In the frequency domain, the $m$-th microphone signal $y_m(\omega)$ can be decomposed as

$$y_m(\omega) = x_m(\omega) + \underbrace{u_m(\omega) + n_m(\omega)}_{v_m(\omega)} \tag{1}$$

with $\omega$ the normalized (radian) frequency, $x_m(\omega)$ the desired source component, $u_m(\omega)$ the interfering source component and $n_m(\omega)$ the noise component in the $m$-th microphone signal. The undesired component $v_m(\omega)$ is defined as the sum of the interfering source component $u_m(\omega)$ and the noise component $n_m(\omega)$. For the sake of conciseness, we omit the variable $\omega$ in the remainder of the paper wherever possible. The $M$-dimensional noisy input vector containing all microphone signals is defined as

$$\mathbf{y} = [y_1, \; y_2, \; \dots, \; y_M]^T \tag{2}$$

where $(\cdot)^T$ denotes the transpose. Using (1), this vector can be written as

$$\mathbf{y} = \mathbf{x} + \mathbf{u} + \mathbf{n} = \mathbf{x} + \mathbf{v} \tag{3}$$

where $\mathbf{x}$, $\mathbf{u}$, $\mathbf{n}$ and $\mathbf{v}$ are defined similarly as $\mathbf{y}$ in (2).
For the considered acoustic scenario, the desired source component and the interfering source component can be written as

$$\mathbf{x} = \mathbf{a}\, s_x, \qquad \mathbf{u} = \mathbf{b}\, s_u \tag{4}$$

where $s_x$ and $s_u$ denote the desired source signal and the interfering source signal, respectively, and $\mathbf{a}$ and $\mathbf{b}$ denote $M$-dimensional ATF vectors, containing the ATFs between the microphones and the desired source and the interfering source, respectively. It should be noted that the ATFs include reverberation, microphone characteristics and the head-shadow effect.
Without loss of generality, the first microphone on each side is defined as the so-called reference microphone. To simplify the notation, the reference microphone signals $y_1$ and $y_{M_L+1}$ are denoted as $y_L$ and $y_R$, i.e.,

$$y_L = \mathbf{e}_L^T \mathbf{y}, \qquad y_R = \mathbf{e}_R^T \mathbf{y} \tag{5}$$

where $\mathbf{e}_L$ and $\mathbf{e}_R$ denote $M$-dimensional selection vectors with all elements equal to 0 except one element equal to 1, i.e., $\mathbf{e}_L(1) = 1$ and $\mathbf{e}_R(M_L+1) = 1$. Using (3), (4) and (5), the reference microphone signals can be written as

$$y_L = a_L s_x + b_L s_u + n_L, \qquad y_R = a_R s_x + b_R s_u + n_R \tag{6}$$
The noisy input covariance matrix $\mathbf{R}_y$, the desired source covariance matrix $\mathbf{R}_x$, the interfering source covariance matrix $\mathbf{R}_u$ and the noise covariance matrix $\mathbf{R}_n$ are defined as
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
with $\mathcal{E}\{\cdot\}$ the expected value operator and $(\cdot)^H$ the conjugate transpose. Assuming statistical independence between all signal components, $\mathbf{R}_y$ can be written as

$$\mathbf{R}_y = \mathbf{R}_x + \underbrace{\mathbf{R}_u + \mathbf{R}_n}_{\mathbf{R}_v} \tag{10}$$

with $\mathbf{R}_v$ the undesired covariance matrix. Using (4), (8) and (9), the desired source covariance matrix and the interfering source covariance matrix can be written as rank-1 matrices, i.e.,

$$\mathbf{R}_x = p_x\, \mathbf{a}\mathbf{a}^H, \qquad \mathbf{R}_u = p_u\, \mathbf{b}\mathbf{b}^H \tag{11}$$

with $p_x = \mathcal{E}\{|s_x|^2\}$ the PSD of the desired source and $p_u = \mathcal{E}\{|s_u|^2\}$ the PSD of the interfering source. The noise covariance matrix $\mathbf{R}_n$ is assumed to be full-rank, i.e., invertible and positive definite.
The left and the right output signals $z_L$ and $z_R$ are obtained by filtering and summing all microphone signals using the $M$-dimensional filter vectors $\mathbf{w}_L$ and $\mathbf{w}_R$ (cf. Figure 1), i.e.,

$$z_L = \mathbf{w}_L^H \mathbf{y}, \qquad z_R = \mathbf{w}_R^H \mathbf{y} \tag{12}$$
## B. Objective Performance Measures
The PSD and the cross power spectral density (CPSD) of the desired source component in the left and the right reference microphone signal are given by
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Similarly, the output PSD of the desired source component in the left and the right output signal is given by
<!-- formula-not-decoded -->
The same definitions can be applied for the noisy input signal, the interfering source component and the noise component by substituting R x with R y , R u or R n .
The narrowband input SNR in the left and the right reference microphone signal is defined as the ratio of the input PSD of the desired source and noise components, i.e.,
<!-- formula-not-decoded -->
Similarly, the narrowband output SNR in the left and the right output signal is defined as the ratio of the output PSD of the desired source and noise components, i.e.,
<!-- formula-not-decoded -->
The SNR improvement (in dB) is defined as $\Delta\mathrm{SNR}_{L/R} = 10 \log_{10} \mathrm{SNR}^{\mathrm{out}}_{L/R} - 10 \log_{10} \mathrm{SNR}^{\mathrm{in}}_{L/R}$.
The narrowband input signal-to-interference ratio (SIR) in the left and the right reference microphone signal is defined as the ratio of the input PSD of the desired source and interfering source components, i.e.,
<!-- formula-not-decoded -->
Similarly, the narrowband output SIR in the left and the right output signal is defined as the ratio of the output PSD of the desired source and interfering source components, i.e.,
<!-- formula-not-decoded -->
The SIR improvement (in dB) is defined as $\Delta\mathrm{SIR}_{L/R} = 10 \log_{10} \mathrm{SIR}^{\mathrm{out}}_{L/R} - 10 \log_{10} \mathrm{SIR}^{\mathrm{in}}_{L/R}$.
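The measures above are ratios of quadratic forms in the covariance matrices, which the following NumPy sketch illustrates. The array size, the random ATF vector `a`, the identity noise covariance and the helper names `psd` and `ratio_improvement_db` are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def psd(w, R):
    """PSD w^H R w of a component with covariance R after filtering with w."""
    return float(np.real(np.vdot(w, R @ w)))

def ratio_improvement_db(w, e_ref, R_sig, R_dist):
    """Improvement (in dB) of the ratio PSD(signal) / PSD(distortion).

    With R_dist = R_n this is the SNR improvement, with R_dist = R_u the
    SIR improvement. The input ratio is evaluated with the selection
    vector e_ref of the reference microphone.
    """
    ratio_in = psd(e_ref, R_sig) / psd(e_ref, R_dist)
    ratio_out = psd(w, R_sig) / psd(w, R_dist)
    return 10 * np.log10(ratio_out) - 10 * np.log10(ratio_in)

# Toy scenario (assumed values): M = 4 microphones, left reference at index 0.
rng = np.random.default_rng(0)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # desired-source ATFs
p_x = 1.0
R_x = p_x * np.outer(a, a.conj())   # rank-1 desired-source covariance
R_n = np.eye(M)                     # spatially white noise (assumption)
e_L = np.zeros(M)
e_L[0] = 1.0

# Matched filter, normalised so that the desired component at the
# reference microphone is preserved (w^H a = a_L).
w_L = a * np.conj(a[0]) / np.real(np.vdot(a, a))
delta_snr = ratio_improvement_db(w_L, e_L, R_x, R_n)
print(f"SNR improvement: {delta_snr:.2f} dB")
```

Passing the selection vector itself as the filter gives an improvement of 0 dB, which is a convenient sanity check for any implementation of these measures.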
## C. Binaural Cues
For coherent sources (desired source and interfering source) the main binaural cues used by the auditory system are the ILD and the ITD [4], which can be computed from the so-called interaural transfer function (ITF). Using (11), the input ITFs of the desired source and the interfering source are given by [11]
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Similarly, the output ITFs of the desired source and the interfering source are given by
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
The ILD and the ITD can be calculated from the ITF as [11]
<!-- formula-not-decoded -->
with ∠( · ) denoting the unwrapped phase.
For an incoherent sound field (background noise), ILD and ITD cues are not very descriptive, but the IC is known to play a major role for spatial perception (e.g., spatial width or diffuseness) [4]. The input IC of the noise component is defined as
<!-- formula-not-decoded -->
while the output IC of the noise component is defined as
<!-- formula-not-decoded -->
Because the IC is typically complex-valued, the magnitude-squared coherence (MSC) is often used. The input and the output MSC of the noise component are defined as
<!-- formula-not-decoded -->
An MSC of 1 corresponds to a coherent source perceived as a distinct point source, while smaller MSC values correspond to a broader or even diffuse sound field impression [4].
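As a rough illustration of how these cues can be evaluated numerically, the sketch below assumes the common definitions ILD = $|\mathrm{ITF}|^2$ (reported in dB here), ITD = $\angle(\mathrm{ITF})/\omega$ and MSC = $|\mathrm{IC}|^2$; the exact expressions of the paper are not decoded in this text, so these forms, the helper names and the toy ATF values are assumptions. Only a single frequency is evaluated, so no phase unwrapping is performed.

```python
import numpy as np

def itf(w_L, w_R, h):
    """ITF of a coherent source with ATF vector h: (w_L^H h) / (w_R^H h)."""
    return np.vdot(w_L, h) / np.vdot(w_R, h)

def ild_db(itf_value):
    """Interaural level difference in dB (assumed definition)."""
    return 10 * np.log10(abs(itf_value) ** 2)

def itd(itf_value, omega):
    """Interaural time difference from the ITF phase (assumed definition)."""
    return np.angle(itf_value) / omega

def msc(w_L, w_R, R):
    """Magnitude-squared coherence between the two filtered components."""
    cross = np.vdot(w_L, R @ w_R)
    p_L = np.real(np.vdot(w_L, R @ w_L))
    p_R = np.real(np.vdot(w_R, R @ w_R))
    return float(abs(cross) ** 2 / (p_L * p_R))

# Toy configuration (assumed): M = 4, references at indices 0 and M_L = 2.
M, M_L = 4, 2
e_L = np.zeros(M)
e_L[0] = 1.0
e_R = np.zeros(M)
e_R[M_L] = 1.0
a = np.array([2.0 * np.exp(0.3j), 1.0, 1.0, 0.5])

# Input cues use the selection vectors as "filters": ITF = a_L / a_R.
itf_in = itf(e_L, e_R, a)
msc_coherent = msc(e_L, e_R, np.outer(a, a.conj()))  # single point source
msc_white = msc(e_L, e_R, np.eye(M))                 # uncorrelated noise
```

In this toy case the rank-1 covariance of a single source gives an MSC of 1, while spatially white noise gives an MSC of 0, matching the two extremes described above.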
## III. BINAURAL BEAMFORMING ALGORITHMS
In this section we briefly review three state-of-the-art binaural beamforming algorithms, namely the BMVDR beamformer, the BLCMV beamformer and the BMVDR-N beamformer. We discuss the performance of these beamforming algorithms in terms of noise and interference reduction performance and binaural cue preservation. For the sake of conciseness, we only show expressions for the left hearing device, denoted by the subscript L . It should be noted that all expressions can also be formulated for the right hearing device by changing the subscript to R .
## A. BMVDR Beamformer
The BMVDR aims at minimizing the output PSD of the noise component while preserving the desired source component in the reference microphone signals [2], [3], [11]. The constrained optimization problem for the left filter vector is given by
<!-- formula-not-decoded -->
Using (4), (6) and (9), the solution of (28) is equal to [2], [22], [23]
<!-- formula-not-decoded -->
with
<!-- formula-not-decoded -->
It should be noted that the BMVDR can also be defined using the undesired covariance matrix R v instead of the noise covariance matrix R n . However, since R v is considerably more difficult to estimate or model in practice than R n , in this paper we only consider the BMVDR using R n in (29).
By substituting (29) in (18) and (20), it has been shown in [3], [11] that the output SNR and the output SIR of the BMVDR are equal to
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
with γ a defined in (30) and
<!-- formula-not-decoded -->
Although the BMVDR yields the largest output SNR among all distortionless binaural beamforming algorithms, the output SIR depends on the relative position of the interfering source to the desired source, cf. (33).
As shown in [3], [11], [13], the BMVDR preserves the binaural cues of the desired source, i.e.,
<!-- formula-not-decoded -->
but distorts the binaural cues of the undesired sources, i.e., for the interfering source
<!-- formula-not-decoded -->
and for the background noise
<!-- formula-not-decoded -->
Hence, at the output of the BMVDR the interfering source and the (incoherent) background noise are perceived as coming from the direction of the desired source, which is obviously undesired in terms of spatial awareness.
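The cue distortion described above can be checked numerically. The sketch below assumes the standard MVDR closed form $\mathbf{w}_L = \mathbf{R}_n^{-1}\mathbf{a}\, a_L^* / (\mathbf{a}^H \mathbf{R}_n^{-1} \mathbf{a})$ (the decoded text does not show (29) and (30), so this form is an assumption) together with randomly generated ATF vectors and a random positive definite noise covariance matrix, none of which come from the paper.

```python
import numpy as np

def bmvdr(R_n, a, ref):
    """Assumed MVDR closed form with distortionless response at index ref."""
    Rinv_a = np.linalg.solve(R_n, a)             # R_n^{-1} a
    gamma_a = np.real(np.vdot(a, Rinv_a))        # a^H R_n^{-1} a
    return Rinv_a * np.conj(a[ref]) / gamma_a

# Toy scenario (assumed values).
rng = np.random.default_rng(1)
M, M_L = 4, 2
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # desired source
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)  # interfering source
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)                       # positive definite

w_L = bmvdr(R_n, a, 0)
w_R = bmvdr(R_n, a, M_L)

# Desired source: distortionless at both reference microphones.
resp_x_L = np.vdot(w_L, a)          # should equal a_L
# Interfering source: output ITF collapses onto the desired-source ITF.
itf_x_in = a[0] / a[M_L]
itf_u_out = np.vdot(w_L, b) / np.vdot(w_R, b)
# Background noise: output MSC becomes 1 (perceived as a point source).
cross = np.vdot(w_L, R_n @ w_R)
p_L = np.real(np.vdot(w_L, R_n @ w_L))
p_R = np.real(np.vdot(w_R, R_n @ w_R))
msc_n_out = abs(cross) ** 2 / (p_L * p_R)
```

Because the left and right filters differ only by the scalar factors $a_L^*$ and $a_R^*$, every component at the output has the interaural relation of the desired source, which is exactly the behaviour the text describes.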
## B. BLCMV Beamformer
In addition to preserving the desired source component in the reference microphone signals, the BLCMV preserves a scaled version of the interfering source component in the reference microphone signals while minimizing the output PSD of the noise component [12], [14]. The constrained optimization problem for the left filter vector is given by [14]
<!-- formula-not-decoded -->
with 0 < δ ≤ 1 the (real-valued) interference scaling parameter. Using (4), (6) and (9), the solution of (37) is equal to [14]
<!-- formula-not-decoded -->
with the constraint matrix C and the left response vector g L defined as
<!-- formula-not-decoded -->
By substituting (38) in (18), it has been shown in [14] that the output SNR of the BLCMV is equal to
<!-- formula-not-decoded -->
with
<!-- formula-not-decoded -->
where $\Re\{\cdot\}$ denotes the real part of a complex number. The output SNR of the BLCMV in (40) is smaller than or equal to the output SNR of the BMVDR in (31), since fewer degrees of freedom are available for noise reduction. In addition, the output SIR of the BLCMV is equal to [14]
<!-- formula-not-decoded -->
which can hence be directly controlled by the interference scaling parameter δ .
As shown in [14], the BLCMV preserves the binaural cues of both the desired source and the interfering source, i.e.,
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
and the output MSC of the noise component is equal to
<!-- formula-not-decoded -->
Because R xu, 1 in (41) is a rank-2 matrix, it has been shown in [14] that the output MSC of the noise component is smaller than 1 but is not equal to the input MSC of the noise component. Furthermore, it should be noted that the output MSC of the noise component depends on the relative position of the interfering source to the desired source, cf. (41) and (42), such that it is not straightforward to control the binaural cues of the background noise.
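A minimal sketch of the BLCMV follows, assuming the usual LCMV closed form $\mathbf{w}_L = \mathbf{R}_n^{-1}\mathbf{C}(\mathbf{C}^H\mathbf{R}_n^{-1}\mathbf{C})^{-1}\mathbf{g}_L$ with $\mathbf{C} = [\mathbf{a}, \mathbf{b}]$ and $\mathbf{g}_L = [a_L^*, \; \delta b_L^*]^T$. The random test data, the value of `delta` and the MVDR form used for comparison are assumptions made for illustration only.

```python
import numpy as np

def blcmv(R_n, a, b, ref, delta):
    """Assumed LCMV closed form with constraints w^H a = a_ref, w^H b = delta * b_ref."""
    C = np.column_stack([a, b])
    g = np.array([np.conj(a[ref]), delta * np.conj(b[ref])])
    Rinv_C = np.linalg.solve(R_n, C)                    # R_n^{-1} C
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

def bmvdr(R_n, a, ref):
    """Assumed MVDR closed form, used here only as a reference point."""
    Rinv_a = np.linalg.solve(R_n, a)
    return Rinv_a * np.conj(a[ref]) / np.real(np.vdot(a, Rinv_a))

# Toy scenario (assumed values).
rng = np.random.default_rng(2)
M, M_L = 4, 2
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)
delta = 0.3

w_L = blcmv(R_n, a, b, 0, delta)
w_R = blcmv(R_n, a, b, M_L, delta)

# Both constraints hold, so the interferer keeps its own ITF at the output.
itf_u_in = b[0] / b[M_L]
itf_u_out = np.vdot(w_L, b) / np.vdot(w_R, b)

# The extra constraint costs noise reduction compared with the BMVDR.
w0 = bmvdr(R_n, a, 0)
noise_blcmv = np.real(np.vdot(w_L, R_n @ w_L))
noise_bmvdr = np.real(np.vdot(w0, R_n @ w0))
```

Since the interfering source component is scaled by exactly `delta` at both outputs, the output SIR exceeds the input SIR by the factor $1/\delta^2$ regardless of the source positions.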
## C. BMVDR-N Beamformer
In addition to preserving the desired source component in the reference microphone signals, the BMVDR with partial noise estimation (BMVDR-N) aims at preserving a scaled version of the noise component in the reference microphone signals [3], [10], [11]. The constrained optimization problem for the left filter vector is given by
<!-- formula-not-decoded -->
with 0 ≤ η ≤ 1 the (real-valued) mixing parameter. It has been shown in [11] that the solution of (47) is equal to
<!-- formula-not-decoded -->
with $\mathbf{w}_{\mathrm{BMVDR},L}$ defined in (29). Hence, the output signals of the BMVDR-N can be interpreted as a mixture between the noisy reference microphone signals (scaled with $\eta$) and the output signals of the BMVDR (scaled with $1 - \eta$). For $\eta = 0$, the BMVDR-N is equal to the BMVDR, whereas for $\eta = 1$, no beamforming is applied.
Since the output signals of the BMVDR are mixed with the noisy reference microphone signals, the output SNR of the BMVDR-N is always smaller than or equal to the output SNR of the BMVDR [11], i.e.,
<!-- formula-not-decoded -->
and decreases with increasing η . By substituting (48) in (20), it can be shown that the output SIR of the BMVDR-N is equal to with
<!-- formula-not-decoded -->
As shown in [11], [16], the BMVDR-N preserves the binaural cues of the desired source, i.e.,
<!-- formula-not-decoded -->
By substituting (48) in (24) and (26), it has been shown in [16] and [20] that the output ITF of the interfering source is equal to
<!-- formula-not-decoded -->
and the output MSC of the noise component is equal to
<!-- formula-not-decoded -->
It can be seen from (52) and (53) that only for $\eta = 1$ the binaural cues of the undesired sources (interfering source and background noise) are preserved, whereas for $\eta = 0$ the binaural cues of the undesired sources are equal to the binaural cues of the desired source (as for the BMVDR). The mixing parameter $\eta$ hence allows trading off between noise reduction performance and binaural cue preservation of the background noise, or in other words, controlling the binaural cues of the background noise. Furthermore, it should be noted that the interference reduction performance in (50) and the output ITF of the interfering source in (52) depend not only on the mixing parameter $\eta$ but also on the relative position of the interfering source to the desired source, such that it is not straightforward to control both.
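The trade-off controlled by $\eta$ can be illustrated with a short sketch of the mixture $\mathbf{w}_L = \eta\,\mathbf{e}_L + (1-\eta)\,\mathbf{w}_{\mathrm{BMVDR},L}$ described above. The MVDR closed form, the random test data and the chosen values of `eta` are assumptions for illustration and are not taken from the paper.

```python
import numpy as np

def bmvdr(R_n, a, ref):
    """Assumed MVDR closed form with distortionless response at index ref."""
    Rinv_a = np.linalg.solve(R_n, a)
    return Rinv_a * np.conj(a[ref]) / np.real(np.vdot(a, Rinv_a))

def bmvdr_n(R_n, a, ref, eta):
    """Mixture of the reference microphone (eta) and the BMVDR (1 - eta)."""
    e = np.zeros(len(a))
    e[ref] = 1.0
    return eta * e + (1 - eta) * bmvdr(R_n, a, ref)

def msc(w_L, w_R, R):
    """Magnitude-squared coherence of a component with covariance R."""
    cross = np.vdot(w_L, R @ w_R)
    p_L = np.real(np.vdot(w_L, R @ w_L))
    p_R = np.real(np.vdot(w_R, R @ w_R))
    return float(abs(cross) ** 2 / (p_L * p_R))

# Toy scenario (assumed values).
rng = np.random.default_rng(3)
M, M_L = 4, 2
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)
e_L = np.zeros(M)
e_L[0] = 1.0
e_R = np.zeros(M)
e_R[M_L] = 1.0

etas = [0.0, 0.2, 0.5, 1.0]
noise_psd = []   # output noise PSD on the left side for each eta
noise_msc = []   # output MSC of the noise component for each eta
for eta in etas:
    w_L = bmvdr_n(R_n, a, 0, eta)
    w_R = bmvdr_n(R_n, a, M_L, eta)
    noise_psd.append(float(np.real(np.vdot(w_L, R_n @ w_L))))
    noise_msc.append(msc(w_L, w_R, R_n))

msc_n_in = msc(e_L, e_R, R_n)
```

The desired source component stays undistorted for every `eta`, so the growing output noise PSD translates directly into a decreasing output SNR, while the output MSC moves from 1 (for `eta = 0`) to the input MSC (for `eta = 1`).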
## IV. BLCMV WITH PARTIAL NOISE ESTIMATION
Aiming at merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering source and controlling the binaural cues of the background noise, in Section IV-A we present the BLCMV beamformer with partial noise estimation (BLCMV-N). Similarly as for the BLCMV in [14], in Sections IV-B and IV-C we derive two decompositions for the BLCMV-N which reveal differences and similarities between the BLCMV-N and the BLCMV.
<!-- formula-not-decoded -->
## A. BLCMV-N Beamformer
Compared to the BMVDR in (28), the BLCMV-N uses an additional constraint to preserve a scaled version of the interfering source component in the reference microphone signals, like the BLCMV in (37), and aims at preserving a scaled version of the noise component in the reference microphone signals, like the BMVDR-N in (47). The constrained optimization problem for the left filter vector is given by
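$$\min_{\mathbf{w}_L} \; \mathcal{E}\left\{ \left| \mathbf{w}_L^H \mathbf{n} - \eta\, n_L \right|^2 \right\} \quad \text{subject to} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \quad \mathbf{w}_L^H \mathbf{b} = \delta\, b_L \tag{54}$$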
The solution of (54) is equal to (see Appendix A)
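$$\mathbf{w}_{\text{BLCMV-N},L} = \eta\, \mathbf{e}_L + (1 - \eta)\, \mathbf{R}_n^{-1} \mathbf{C} \left( \mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C} \right)^{-1} \begin{bmatrix} a_L^* \\ \bar{\delta}\, b_L^* \end{bmatrix} \tag{55}$$

with $\mathbf{C}$ defined in (39) and the adjusted interference scaling parameter $\bar{\delta}$ equal to

$$\bar{\delta} = \frac{\delta - \eta}{1 - \eta} \tag{56}$$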
<!-- formula-not-decoded -->
Hence, the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals (scaled with $\eta$) and the output signals of a BLCMV (scaled with $1 - \eta$) using the adjusted interference scaling parameter $\bar{\delta}$ in (56) instead of the interference scaling parameter $\delta$. For $\eta = 0$, the BLCMV-N is equal to the BLCMV in (38) with $\bar{\delta} = \delta$, whereas for $\eta = 1$ no beamforming is applied only if $\delta = 1$. Since mixing with the reference microphone signals not only affects the noise component but also the interfering source component, the adjusted interference scaling parameter $\bar{\delta}$ depends on both the interference scaling parameter $\delta$ and the mixing parameter $\eta$ due to the interference reduction constraint in (54). Figure 2 depicts $\bar{\delta}$ as a function of $\eta$ for different values of $\delta$. It can be seen that $\bar{\delta} \le \delta$ for all $0 \le \eta < 1$ and that $\bar{\delta}$ becomes negative for $\eta > \delta$, which follows directly from $\bar{\delta} = (\delta - \eta)/(1 - \eta)$.
As will be shown in more detail in the following sections, using the parameters $\delta$ and $\eta$ it is possible to control the noise reduction performance, the interference reduction performance and the binaural cues of the background noise for the BLCMV-N.
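The mixture interpretation can be cross-checked numerically. The sketch below builds the BLCMV-N once as a mixture of the reference microphone and a BLCMV with the adjusted parameter $\bar{\delta} = (\delta - \eta)/(1 - \eta)$, and once by directly minimising $(\mathbf{w} - \eta\mathbf{e}_L)^H \mathbf{R}_n (\mathbf{w} - \eta\mathbf{e}_L)$ under the two linear constraints via the standard Lagrangian solution. The function names, the random test data and the parameter values are assumptions for illustration.

```python
import numpy as np

def blcmv(R_n, a, b, ref, delta):
    """Assumed LCMV closed form with constraints w^H a = a_ref, w^H b = delta * b_ref."""
    C = np.column_stack([a, b])
    g = np.array([np.conj(a[ref]), delta * np.conj(b[ref])])
    Rinv_C = np.linalg.solve(R_n, C)
    return Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, g)

def blcmv_n_mixture(R_n, a, b, ref, delta, eta):
    """Mixture form: eta * e + (1 - eta) * BLCMV(delta_bar), valid for eta < 1."""
    e = np.zeros(len(a))
    e[ref] = 1.0
    delta_bar = (delta - eta) / (1 - eta)
    return eta * e + (1 - eta) * blcmv(R_n, a, b, ref, delta_bar)

def blcmv_n_direct(R_n, a, b, ref, delta, eta):
    """Direct solution of the constrained problem (also valid for eta = 1)."""
    e = np.zeros(len(a))
    e[ref] = 1.0
    C = np.column_stack([a, b])
    g = np.array([np.conj(a[ref]), delta * np.conj(b[ref])])
    Rinv_C = np.linalg.solve(R_n, C)
    # Shift the optimisation variable by eta * e and solve the residual LCMV.
    residual = g - eta * (C.conj().T @ e)
    return eta * e + Rinv_C @ np.linalg.solve(C.conj().T @ Rinv_C, residual)

# Toy scenario (assumed values).
rng = np.random.default_rng(4)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
A = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_n = A @ A.conj().T + M * np.eye(M)
delta, eta = 0.3, 0.6        # eta > delta, so delta_bar is negative here

w_mix = blcmv_n_mixture(R_n, a, b, 0, delta, eta)
w_dir = blcmv_n_direct(R_n, a, b, 0, delta, eta)
```

Both constructions give the same filter in this toy case, and the interfering source is scaled by `delta` (not by the adjusted parameter) at the output, because the contribution `eta * b_L` of the reference microphone is compensated by the adjusted scaling.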
## B. Decomposition into two BLCMVs
In [14] it has been shown that the BLCMV in (38) can be decomposed as the sum of two sub-BLCMVs, i.e.,
<!-- formula-not-decoded -->
with
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Fig. 2. Adjusted interference scaling parameter $\bar{\delta}$ as a function of $\eta$ for different values of $\delta$.
[Figure: line plot of $\bar{\delta}$ (vertical axis) versus $\eta$ (horizontal axis, from 0 to 1), with one curve per value of $\delta$.]
## A. BLCMV-N Beamformer

Compared [...] an additional [...] interfering source component in the reference microphone signals, like the BLCMV beamformer in (38), and aims at preserving a scaled version of the noise component in the reference microphone signals, like the BMVDR-N in (48). The constrained optimization problem for the left filter vector is given by

<!-- formula-not-decoded -->

The solution of (56) is equal to (see Appendix for derivation)

$$\mathbf{w}_{\text{BLCMV-N},L} = \eta\,\mathbf{e}_L + (1-\eta)\,\mathbf{R}_n^{-1}\mathbf{C}\left(\mathbf{C}^H\mathbf{R}_n^{-1}\mathbf{C}\right)^{-1}\begin{bmatrix} a_L^* \\ \bar{\delta}\,b_L^* \end{bmatrix} \tag{57}$$

with $\mathbf{C}$ defined in (40) and the adjusted interference scaling parameter $\bar{\delta}$ equal to

$$\bar{\delta} = \frac{\delta-\eta}{1-\eta}. \tag{58}$$

The output signals of the BLCMV-N can hence be interpreted as a mixture between the noisy reference microphone signals (scaled with η) and the output signals of a BLCMV (scaled with 1-η) using the adjusted interference scaling parameter $\bar{\delta}$ in (58) instead of the interference scaling parameter δ. For η = 0, the BLCMV-N is equal to the BLCMV in (39) with $\bar{\delta} = \delta$, whereas for η = 1, no beamforming is applied. Since mixing with the reference microphone signals not only affects the noise component but also the interfering source component [...] to be satisfied, the adjusted interference scaling parameter $\bar{\delta}$ depends on both the interference scaling parameter δ as well as the mixing parameter η, i.e.,

$$\bar{\delta}(\eta,\delta) \begin{cases} > 0, & \text{for } \delta > \eta \\ < 0, & \text{for } \delta < \eta \\ = 0, & \text{for } \delta = \eta. \end{cases} \tag{59}$$

## B. Decomposition

[...] The sub-BLCMV $\mathbf{w}_{x,L}$ in (59) preserves the desired source component in the reference microphone signals and steers a null towards the interfering source, whereas the sub-BLCMV $\mathbf{w}_{u,L}$ in (60) preserves the interfering source component in the reference microphone signals and steers a null towards the desired source. Using (55), it can be easily seen that the proposed BLCMV-N can be decomposed as

<!-- formula-not-decoded -->

Hence, the BLCMV-N can be interpreted as a mixture of the reference microphone signals (scaled with η), a BLCMV that preserves the desired source and rejects the interfering source (scaled with 1-η) and a BLCMV that preserves the interfering source and rejects the desired source (scaled with δ-η). Since the scaling of the sub-BLCMV $\mathbf{w}_{x,L}$ controls the desired source component without affecting the interfering source component and the scaling of the sub-BLCMV $\mathbf{w}_{u,L}$ controls the interfering source component without affecting the desired source component [14], it can be directly observed from the scaling factors in (62) that the desired source component is not distorted and the interfering source component is scaled with δ.

## C. Decomposition using Binauralization Postfilters

In [14] it has also been shown that the sub-BLCMV $\mathbf{w}_{x,L}$ in (59) for the left hearing device and the sub-BLCMV $\mathbf{w}_{x,R}$ for the right hearing device (defined similarly as $\mathbf{w}_{x,L}$) can be written using a common spatial filter and two binauralization postfilters as

<!-- formula-not-decoded -->

with the common desired BLCMV (D-BLCMV) given by

<!-- formula-not-decoded -->

and the ATFs $a_L$ and $a_R$ between the desired source and the reference microphones used as binauralization postfilters. Similarly, the sub-BLCMV $\mathbf{w}_{u,L}$ in (60) and the sub-BLCMV $\mathbf{w}_{u,R}$ (defined similarly as $\mathbf{w}_{u,L}$) can be written as

$$\mathbf{w}_{u,L} = \mathbf{w}_u\, b_L^*, \qquad \mathbf{w}_{u,R} = \mathbf{w}_u\, b_R^*, \tag{65}$$
<details>
<summary>Image 3 Details</summary>

### Visual Description
\n
## Diagram: System Block Diagram
### Overview
The image depicts a system block diagram, likely representing a signal processing or neural network architecture. The diagram shows a central input 'y' being processed through several weighted components and added to other signals to produce two outputs, 'zL' and 'zR'. The diagram is composed of rectangular blocks representing operations or components, connected by arrows indicating signal flow.
### Components/Axes
The diagram contains the following components:
* **Input:** 'y'
* **Weighted Components:** 'Wx' and 'Wu'
* **Selection Vectors:** 'eL' and 'eR' (they select the left and right reference microphone signals, cf. the term η e_L in (57))
* **Parameters:** 'a\*L', 'b\*L', 'a\*R', 'b\*R' (The asterisk suggests these are complex conjugates or related to a transformation)
* **Scaling Factors:** η (eta), 1-η, δ-η
* **Summation Symbols:** '+'
* **Outputs:** 'zL' and 'zR'
There are no axes in this diagram, as it is a block diagram representing a system's structure rather than a graphical representation of data.
### Detailed Analysis or Content Details
The diagram can be broken down into two parallel processing paths, one for 'zL' and one for 'zR'.
**Left Path (zL):**
1. Input 'y' is fed into the common spatial filters 'Wx' and 'Wu' and into the selection vector 'eL'.
2. The output of 'Wx' is multiplied by the postfilter 'a\*L' and scaled with (1-η).
3. The output of 'Wu' is multiplied by the postfilter 'b\*L' and scaled with (δ-η).
4. The reference microphone signal selected by 'eL' is scaled with η and added to the two scaled postfilter outputs.
5. The result is output as 'zL'.
**Right Path (zR):**
1. Input 'y' is fed into the common spatial filters 'Wx' and 'Wu' and into the selection vector 'eR'.
2. The output of 'Wx' is multiplied by the postfilter 'a\*R' and scaled with (1-η).
3. The output of 'Wu' is multiplied by the postfilter 'b\*R' and scaled with (δ-η).
4. The reference microphone signal selected by 'eR' is scaled with η and added to the two scaled postfilter outputs.
5. The result is output as 'zR'.
The scaling factors η, 1-η, and δ-η are used to weight the signals before summation. The values of δ are not specified.
### Key Observations
* The diagram exhibits symmetry between the left and right paths, suggesting a similar processing operation applied to different signals or features.
* The use of 'a\*' and 'b\*' suggests a transformation or filtering operation.
* The branches through 'eL' and 'eR' pass the reference microphone signals directly to the outputs (scaled with η); the structure is feed-forward and contains no feedback loop.
* The scaling factors suggest a control mechanism for adjusting the contribution of different signals.
### Interpretation
This diagram represents a feed-forward structure that processes the input 'y' into two outputs 'zL' and 'zR'. According to the caption of Fig. 3, it shows the decomposition of the BLCMV-N into a mixture of the reference microphone signals and two BLCMVs with binauralization postfilters. The left and right paths share the spatial filters 'Wx' and 'Wu' and differ only in the selection vector ('eL' or 'eR') and the postfilters ('a\*L', 'b\*L' or 'a\*R', 'b\*R').
The scaling factors η, 1-η, and δ-η set the contribution of the reference microphone signals and of the two binauralized filter outputs. The asterisk on the postfilters denotes complex conjugation. No numerical values for η or δ are given in the diagram.
</details>
Fig. 3. Decomposition of the BLCMV-N into a mixture of the reference microphone signals and two BLCMVs with binauralization postfilters.
with the common interference BLCMV (I-BLCMV) given by

<!-- formula-not-decoded -->

and the ATFs $b_L$ and $b_R$ between the interfering source and the reference microphones used as binauralization postfilters.
Using (63) and (65) in (62), the BLCMV-N can be decomposed as
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Figure 3 depicts this decomposition of the BLCMV-N using common spatial filters and binauralization postfilters. The output signals of the BLCMV-N can hence be interpreted as a mixture between the reference microphone signals (scaled with η ), the binauralized output signals of the D-BLCMV (scaled with 1 -η ) and the binauralized output signals of the I-BLCMV (scaled with δ -η ).
Due to the constraints in (54), the BLCMV-N perfectly preserves the desired source component and scales the interfering source component with δ. Using (67) and (68), the noise components in the output signals of the BLCMV-N are equal to
<!-- formula-not-decoded -->
with $n_x = \mathbf{w}_x^H \mathbf{n}$ and $n_u = \mathbf{w}_u^H \mathbf{n}$ the noise component in the output signal of the D-BLCMV and the I-BLCMV, respectively. The noise component in the output signals of the BLCMV-N can hence be interpreted as a mixture between the noise component in the reference microphone signals (scaled with η), a coherent residual noise source ($n_x$) coming from the direction of the desired source (scaled with 1-η) and a coherent residual noise source ($n_u$) coming from the direction of the interfering source (scaled with δ-η).
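The closed-form solution (57) with the adjusted parameter (58) and the mixture interpretation described above can be checked numerically. The following is a minimal sketch, not the authors' implementation: the ATF vectors and the noise covariance matrix are random stand-ins, and the names `ref`, `G`, `w_x` and `w_u` are hypothetical. The sub-BLCMVs are assumed to be obtained from the BLCMV expression by setting one of the two constraint values to zero, which matches their description in the text.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 4        # number of microphones (as in Section VI-A)
ref = 0      # hypothetical index of the left reference microphone

# Random stand-ins for the ATF vectors a (desired source) and b (interfering source).
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)

# Random Hermitian positive-definite stand-in for the noise covariance matrix R_n.
X = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
Rn = X @ X.conj().T / (2 * M) + 0.1 * np.eye(M)

eta, delta = 0.3, 0.5
e_L = np.zeros(M)
e_L[ref] = 1.0                      # selection vector for the reference microphone
C = np.column_stack([a, b])         # constraint matrix C = [a, b]

# G = R_n^{-1} C (C^H R_n^{-1} C)^{-1}
RinvC = np.linalg.solve(Rn, C)
G = RinvC @ np.linalg.inv(C.conj().T @ RinvC)

# Closed form (57) using the adjusted interference scaling parameter (58).
delta_bar = (delta - eta) / (1 - eta)
g = np.array([np.conj(a[ref]), delta_bar * np.conj(b[ref])])
w_closed = eta * e_L + (1 - eta) * (G @ g)

# Assumed sub-BLCMVs: one preserves the desired source and nulls the interferer,
# the other preserves the interferer and nulls the desired source.
w_x = G @ np.array([np.conj(a[ref]), 0.0])
w_u = G @ np.array([0.0, np.conj(b[ref])])
w_decomp = eta * e_L + (1 - eta) * w_x + (delta - eta) * w_u

desired_response = w_closed.conj() @ a      # w^H a, expected a_L
interferer_response = w_closed.conj() @ b   # w^H b, expected delta * b_L
```

Under these assumptions both filter expressions coincide, the desired source response equals the reference ATF and the interfering source response equals the reference ATF scaled with δ, for any η < 1.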
## V. PERFORMANCE OF THE BLCMV-N
In this section we provide a performance analysis of the proposed BLCMV-N. In Section V-A we derive the output PSDs of the signal components. In Sections V-B and V-C we analyze the noise and interference reduction performance and the binaural cue preservation performance. Finally, in Section V-D we discuss the setting of the mixing parameter η and the interference scaling parameter δ.
## A. Output Power Spectral Densities
Due to the constraints in (54), the output PSDs of the desired and interfering source components in the left output signal of the BLCMV-N are equal to, cf. (13),
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Furthermore, the output PSD of the noise component in the left output signal of the BLCMV-N is equal to (see Appendix B)
with
<!-- formula-not-decoded -->
with $\gamma_a$ defined in (30), $\gamma_{ab}$ defined in (33), and $\gamma_b$ and $\Psi$ defined in (42). It can be seen that the output PSD of the noise component for the BLCMV-N is a quadratic function in both the mixing parameter η and the interference scaling parameter δ. By comparing (74) to (41), it can be observed that
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
where $\mathbf{R}_{xu,1}^{\delta=1}$ denotes the expression for the BLCMV in (41) with δ = 1, corresponding to no suppression of the interfering source. Please note that for η = 0, $\mathbf{R}_{xu,3} = \mathbf{R}_{xu,1}$, and for η = 1 and δ = 1, $\mathbf{R}_{xu,3} = \mathbf{0}_M$. By using (75) in (73), it follows that
<!-- formula-not-decoded -->
## B. Noise and Interference Reduction Performance
By substituting (71) and (73) in (18), the left output SNR of the BLCMV-N is equal to
<!-- formula-not-decoded -->
which depends on both the mixing parameter η and the interference scaling parameter δ . Using (76) and realizing that the output PSD of the noise component in the left output signal of the BLCMV (for any value of δ ) is smaller than or equal to the PSD of the noise component in the left reference microphone signal, the output SNR of the BLCMV-N in (77) is smaller than or equal to the output SNR of the BLCMV in (40), i.e.,
<!-- formula-not-decoded -->
By substituting (71) and (72) in (20), the left output SIR of the BLCMV-N is equal to
<!-- formula-not-decoded -->
which is equal to the left output SIR of the BLCMV in (43) and solely controlled by the interference scaling parameter δ . For η = 0 , the left output SNR of the BLCMV-N is equal to the left output SNR of the BLCMV in (40), while for η = 1 and δ = 1 , the left output SNR of the BLCMV-N is equal to the left input SNR because no beamforming is applied.
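That the SIR improvement depends only on δ can be illustrated with a small numerical sketch. It assumes the mixture form of the BLCMV-N filter described in Section IV (reference microphone signal scaled with η, sub-BLCMVs scaled with 1-η and δ-η), random stand-ins for the ATFs and the noise covariance matrix, and unit source PSDs; all variable names are hypothetical. Since the desired source is preserved and the interfering source amplitude is scaled with δ, the improvement is expected to equal -20 log10(δ) dB for every η.

```python
import numpy as np

rng = np.random.default_rng(2)
M, ref = 4, 0   # number of microphones, hypothetical reference index

# Random stand-ins for the ATF vectors and the noise covariance matrix.
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
Rn = X @ X.conj().T / (2 * M) + 0.1 * np.eye(M)

e_L = np.zeros(M)
e_L[ref] = 1.0
C = np.column_stack([a, b])
RinvC = np.linalg.solve(Rn, C)
G = RinvC @ np.linalg.inv(C.conj().T @ RinvC)
w_x = G[:, 0] * np.conj(a[ref])   # assumed sub-BLCMV for the desired source
w_u = G[:, 1] * np.conj(b[ref])   # assumed sub-BLCMV for the interfering source


def sir_improvement_db(eta, delta, p_x=1.0, p_u=1.0):
    """Left output SIR divided by left input SIR, in dB."""
    w = eta * e_L + (1 - eta) * w_x + (delta - eta) * w_u
    sir_in = p_x * abs(a[ref]) ** 2 / (p_u * abs(b[ref]) ** 2)
    sir_out = p_x * abs(w.conj() @ a) ** 2 / (p_u * abs(w.conj() @ b) ** 2)
    return 10 * np.log10(sir_out / sir_in)


delta = 0.25
improvements = [sir_improvement_db(eta, delta) for eta in (0.0, 0.3, 0.7, 1.0)]
```

For δ = 0.25 this gives roughly 12 dB for all tested values of η.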
## C. Binaural Cue Preservation
Similarly as for the BLCMV, due to the constraints in (54) the BLCMV-N preserves the binaural cues of both the desired source and the interfering source, i.e.,
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
Using (26), the output IC of the noise component for the BLCMV-N is equal to (see Appendix B for derivation of components)
<!-- formula-not-decoded -->
with R xu, 3 defined in (74). Since R xu, 3 depends on both the mixing parameter η and the interference scaling parameter δ , also the output IC of the noise component in (82) depends on both parameters. Using (27), the output MSC of the noise component for the BLCMV-N is equal to
<!-- formula-not-decoded -->
Since for η = 0 the BLCMV-N is equal to the BLCMV, the output MSC of the noise component is smaller than 1, see Section III-B. It should however be realized that in contrast to the BMVDR-N discussed in Section III-C, for η = 1 the BLCMV-N does not always preserve the MSC of the noise component. Only for η = 1 and δ = 1 the binaural cues of all signal components are preserved because no beamforming is applied.
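The statement about η = 1 can be explored with a short sketch that evaluates the MSC of the noise component before and after filtering. It assumes the usual definition of the MSC as the squared magnitude of the normalized cross-PSD between the left and right noise components, the mixture form of the BLCMV-N filters from Section IV, and random stand-ins for all acoustic quantities; the reference microphone indices `ref_L` and `ref_R` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
M, ref_L, ref_R = 4, 0, 2   # hypothetical reference microphone indices

# Random stand-ins for the ATF vectors and the noise covariance matrix.
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
Rn = X @ X.conj().T / (2 * M) + 0.1 * np.eye(M)

C = np.column_stack([a, b])
RinvC = np.linalg.solve(Rn, C)
G = RinvC @ np.linalg.inv(C.conj().T @ RinvC)


def blcmv_n(ref, eta, delta):
    """Mixture form: eta * e + (1 - eta) * w_x + (delta - eta) * w_u."""
    e = np.zeros(M)
    e[ref] = 1.0
    w_x = G[:, 0] * np.conj(a[ref])
    w_u = G[:, 1] * np.conj(b[ref])
    return eta * e + (1 - eta) * w_x + (delta - eta) * w_u


def noise_msc(w_L, w_R):
    """MSC of the noise components w_L^H n and w_R^H n."""
    cross = w_L.conj() @ Rn @ w_R
    p_L = np.real(w_L.conj() @ Rn @ w_L)
    p_R = np.real(w_R.conj() @ Rn @ w_R)
    return abs(cross) ** 2 / (p_L * p_R)


def output_msc(eta, delta):
    return noise_msc(blcmv_n(ref_L, eta, delta), blcmv_n(ref_R, eta, delta))


msc_in = noise_msc(blcmv_n(ref_L, 1.0, 1.0), blcmv_n(ref_R, 1.0, 1.0))  # filters reduce to e_L, e_R
msc_eta1_delta1 = output_msc(1.0, 1.0)
msc_eta1_delta_half = output_msc(1.0, 0.5)
msc_blcmv = output_msc(0.0, 0.5)
```

In this sketch the filters for η = 1 and δ = 1 reduce to the selection vectors, so the input MSC is reproduced exactly, whereas for η = 1 and δ = 0.5 the term scaled with δ-η remains and the output MSC generally differs from the input MSC.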
## D. Parameter Settings
Maximizing the left output SNR in (77) corresponds to minimizing the denominator, i.e., using (75),

<!-- formula-not-decoded -->

Setting the derivative of (84) with respect to the mixing parameter η equal to zero, yields

<!-- formula-not-decoded -->
as the optimal mixing parameter η in terms of left (and right) output SNR. The derivative of (84) with respect to the interference scaling parameter δ is equal to, using (41),
<!-- formula-not-decoded -->
Setting (86) to zero and solving for δ yields the optimal interference scaling parameter in terms of left output SNR, i.e.,

<!-- formula-not-decoded -->

with

<!-- formula-not-decoded -->

As can be seen from (79), the output SIR is not affected by the mixing parameter η but is solely determined by the interference scaling parameter δ.
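As a numerical counterpart to this analysis, the sketch below sweeps δ for a fixed η, fits a second-order polynomial to the output noise PSD and reads off its stationary point. It is not the closed-form expression of the paper: it only relies on the observation above that the output noise PSD is quadratic in δ and that maximizing the output SNR amounts to minimizing this PSD. The acoustic quantities are random stand-ins and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
M, ref = 4, 0   # number of microphones, hypothetical reference index

# Random stand-ins for the ATF vectors and the noise covariance matrix.
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)
X = rng.standard_normal((M, 2 * M)) + 1j * rng.standard_normal((M, 2 * M))
Rn = X @ X.conj().T / (2 * M) + 0.1 * np.eye(M)

e_L = np.zeros(M)
e_L[ref] = 1.0
C = np.column_stack([a, b])
RinvC = np.linalg.solve(Rn, C)
G = RinvC @ np.linalg.inv(C.conj().T @ RinvC)
w_x = G[:, 0] * np.conj(a[ref])
w_u = G[:, 1] * np.conj(b[ref])


def noise_psd(eta, delta):
    """Output noise PSD w^H R_n w for the mixture form of the BLCMV-N filter."""
    w = eta * e_L + (1 - eta) * w_x + (delta - eta) * w_u
    return np.real(w.conj() @ Rn @ w)


eta = 0.2
deltas = np.linspace(0.0, 1.0, 101)
psd = np.array([noise_psd(eta, d) for d in deltas])

# The filter is affine in delta, so the PSD is exactly quadratic in delta.
c2, c1, c0 = np.polyfit(deltas, psd, 2)
fit_residual = np.max(np.abs(np.polyval([c2, c1, c0], deltas) - psd))

# Stationary point of the fitted parabola: numerical SNR-optimal delta for this eta.
delta_star = -c1 / (2 * c2)
```

The stationary point `delta_star` is unconstrained, so it may fall outside the interval [0, 1] for some scenarios; in that case the best value on the interval lies at one of its end points.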
## VI. SIMULATIONS
In Section VI-A we first validate the expressions derived in the previous sections using measured anechoic ATFs. In Section VI-B we then experimentally compare the performance of the proposed BLCMV-N with the BMVDR, BLCMV and BMVDR-N using recorded signals in a reverberant environment with a competing speaker and multi-talker babble noise. Finally, in Section VI-C we compare the spatial impression of the considered binaural beamforming algorithms using a perceptual listening test.
## A. Validation Using Measured Anechoic ATFs
To validate the derived expressions for the considered algorithms we used measured anechoic ATFs of two behind-the-ear hearing aids mounted on a head-and-torso-simulator (HATS) [24]. Each hearing aid has two microphones (M = 4) with an inter-microphone distance of about 14 mm. We chose the front microphone on each hearing aid as reference microphone. The ATFs were calculated from anechoic impulse responses using a 512-point FFT at a sampling rate of 16 kHz.
The desired source was placed at 0° (in front) and the interfering source was placed at -35° (to the left), both at a distance of 3 m from the HATS. The desired source covariance matrix $\mathbf{R}_x$ and the interfering source covariance matrix $\mathbf{R}_u$ were constructed using the ATF vector of the desired source $\mathbf{a}$ and the ATF vector of the interfering source $\mathbf{b}$ according to (11), where the PSD of the desired source $p_x$ and the PSD of the interfering source $p_u$ were both set to 1. As background noise we considered a combination of spatially white and cylindrically isotropic noise, i.e., the noise covariance matrix $\mathbf{R}_n$ was constructed as
<!-- formula-not-decoded -->
with $p_{n,\mathrm{w}}$ the PSD of the spatially white noise, $\mathbf{I}_M$ the $M \times M$-dimensional identity matrix, $p_{n,\mathrm{cyl}}$ the PSD of the cylindrically isotropic noise and $\boldsymbol{\Gamma}$ its spatial coherence matrix. The $(i,j)$-th element of the spatial coherence matrix $\boldsymbol{\Gamma}$ was calculated using all available anechoic ATFs as

<!-- formula-not-decoded -->

with $\mathbf{h}(\theta_k)$ the anechoic ATF at angle $\theta_k$ and $K$ the total number of angles in the database ($K = 72$ for [24]). The PSD of the spatially white noise $p_{n,\mathrm{w}}$ was set to -55 dB, while the PSD of the cylindrically isotropic noise $p_{n,\mathrm{cyl}}$ was set to 1.
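The construction of the noise covariance matrix can be sketched as follows. Two points are assumptions, since the corresponding formulas are not legible here: the two noise types are combined additively as a scaled identity matrix plus a scaled coherence matrix, and each coherence entry is the sum over all angles of the product of one ATF with the conjugate of the other, normalized by the square root of the two summed ATF powers. The ATFs are random stand-ins for the measured database, and the array name `H` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
M, K = 4, 72   # microphones and number of angles in the database

# Stand-in for the anechoic ATFs at one frequency: H[i, k] = i-th element of h(theta_k).
H = rng.standard_normal((M, K)) + 1j * rng.standard_normal((M, K))

# Assumed coherence estimate:
# Gamma[i, j] = sum_k H[i, k] * conj(H[j, k]) / sqrt(sum_k |H[i, k]|^2 * sum_k |H[j, k]|^2)
cross = H @ H.conj().T
power = np.real(np.diag(cross))
Gamma = cross / np.sqrt(np.outer(power, power))

p_n_w = 10 ** (-55 / 10)   # spatially white noise PSD of -55 dB on a linear scale
p_n_cyl = 1.0              # cylindrically isotropic noise PSD

# Assumed additive combination of both noise types.
Rn = p_n_w * np.eye(M) + p_n_cyl * Gamma
```

With this normalization the diagonal of the coherence matrix is one, and the small white noise term keeps the covariance matrix well conditioned enough to be inverted in the beamformer expressions.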
Fig. 4. SNR improvement for the BLCMV-N and the BMVDR-N at 500 Hz.
<details>
<summary>Image 4 Details</summary>

### Visual Description
\n
## Line Chart: Delta ASNR vs. Eta for Different Delta Values
### Overview
The image presents a line chart illustrating the relationship between Delta ASNR (in dB) and Eta, for several different values of Delta. A fifth line represents BMVDR-N. The chart appears to be evaluating the performance of a system under varying conditions, likely related to signal processing or communication systems.
### Components/Axes
* **X-axis:** Labeled as "η" (Eta), ranging from 0 to 1. The scale is linear with markings at 0, 0.25, 0.5, 0.75, and 1.
* **Y-axis:** Labeled as "ΔASNR<sub>L</sub> [dB]" (Delta ASNR L in decibels), ranging from 0 to 10. The scale is linear with markings at 0, 2.5, 5, 7.5, and 10.
* **Legend:** Located in the top-right corner of the chart. It contains the following entries:
* δ = 0.01 (Blue line)
* δ = 0.25 (Orange line)
* δ = 0.477 (Yellow line)
* δ = 0.75 (Purple line)
* BMVDR-N (Black dashed line)
* **Grid:** A light gray grid is present, aiding in reading values from the chart.
### Detailed Analysis
The chart displays five lines, each representing a different Delta value or the BMVDR-N method.
* **δ = 0.01 (Blue Line):** This line starts at approximately 3.2 dB at η = 0 and slopes downward, reaching approximately 0.8 dB at η = 1. The trend is consistently downward.
* **δ = 0.25 (Orange Line):** This line begins at approximately 5.2 dB at η = 0 and decreases to approximately 2.2 dB at η = 1. The slope is steeper than the blue line.
* **δ = 0.477 (Yellow Line):** Starting at around 6.0 dB at η = 0, this line declines to approximately 3.0 dB at η = 1. It has a moderate downward slope.
* **δ = 0.75 (Purple Line):** This line starts at approximately 5.5 dB at η = 0 and falls to around 2.5 dB at η = 1. The slope is similar to the orange line.
* **BMVDR-N (Black Dashed Line):** This line begins at approximately 7.8 dB at η = 0 and decreases to approximately 1.2 dB at η = 1. It has the steepest downward slope among all the lines.
### Key Observations
* All lines exhibit a negative correlation between Eta and Delta ASNR. As Eta increases, Delta ASNR decreases.
* The BMVDR-N curve is the highest at η = 0 (about 7.8 dB) but has the steepest slope, so that at η = 1 (about 1.2 dB) it lies below several of the fixed-δ curves.
* The Delta value clearly affects the Delta ASNR. Among the plotted values, δ = 0.477 gives the highest curve (about 6.0 dB at η = 0), above both δ = 0.25 and δ = 0.75.
* The spread between the curves narrows as Eta approaches 1.
### Interpretation
According to the caption and the surrounding text, the chart shows the left SNR improvement of the BLCMV-N and the BMVDR-N at 500 Hz. Eta (η) is the mixing parameter and Delta (δ) is the interference scaling parameter of the BLCMV-N.
Increasing η mixes a larger portion of the noisy reference microphone signals into the output, which is consistent with the decreasing trend of all curves. The BMVDR-N gives the largest improvement at η = 0, but for large η the fixed-δ curves of the BLCMV-N lie above it.
The choice of δ affects the SNR improvement, with δ = 0.477 giving the largest improvement among the plotted values.
</details>
Fig. 5. SIR improvement for the BLCMV-N and the BMVDR-N at 500 Hz.
<details>
<summary>Image 5 Details</summary>

### Visual Description
\n
## Chart: Delta ASRL vs. Eta for Different Delta Values and BMVDR-N
### Overview
The image presents a line graph illustrating the relationship between ΔASRL (in dB) and η (eta) for three different values of δ (delta) – 0.25, 0.477, and 0.75 – alongside a comparison with BMVDR-N. The graph appears to demonstrate the impact of η on ΔASRL under varying δ conditions.
### Components/Axes
* **X-axis:** η (eta), ranging from 0 to 1, with tick marks at 0, 0.25, 0.5, 0.75, and 1.
* **Y-axis:** ΔASRL [dB], ranging from 0 to 15, with tick marks at 0, 5, 10, and 15.
* **Legend:** Located in the top-right corner, containing the following entries:
* δ = 0.25 (Blue solid line)
* δ = 0.477 (Orange solid line)
* δ = 0.75 (Yellow solid line)
* BMVDR-N (Black dashed line)
### Detailed Analysis
* **δ = 0.25 (Blue Line):** This line is approximately horizontal, maintaining a ΔASRL value of around 12 dB across the entire range of η. There is a very slight downward trend, but it is minimal.
* **δ = 0.477 (Orange Line):** This line is also approximately horizontal, with a ΔASRL value of approximately 7 dB across the range of η. Similar to the blue line, there is a very slight downward trend.
* **δ = 0.75 (Yellow Line):** This line is also approximately horizontal, with a ΔASRL value of approximately 3 dB across the range of η. Again, a very slight downward trend is observed.
* **BMVDR-N (Black Dashed Line):** This line exhibits a clear downward slope. At η = 0, ΔASRL is approximately 6 dB. At η = 1, ΔASRL is approximately 2 dB. The slope is relatively consistent.
### Key Observations
* The ΔASRL values for the three δ values (0.25, 0.477, and 0.75) are relatively constant across the range of η, indicating that η has minimal impact on ΔASRL when δ is fixed.
* The BMVDR-N line shows a negative correlation between η and ΔASRL, meaning that as η increases, ΔASRL decreases.
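* The BMVDR-N line lies below the δ = 0.25 and δ = 0.477 lines over the whole range of η. Relative to the δ = 0.75 line it starts higher (about 6 dB versus 3 dB at η = 0) and ends lower (about 2 dB at η = 1).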
### Interpretation
For the curves with fixed δ, which belong to the BLCMV-N according to the caption, the mixing parameter η has essentially no influence on the plotted improvement, whereas for the BMVDR-N the improvement decreases as η increases.
The approximately constant values for fixed δ indicate that the improvement of the BLCMV-N is set by the interference scaling parameter δ, with smaller values of δ giving a larger improvement.
</details>
1) Noise and Interference Reduction Performance: Using (17) and (18), Figure 4 depicts the left SNR improvement at 500 Hz for the BLCMV-N for different values of the mixing parameter η and the interference scaling parameter δ and the BMVDR-N for different values of the mixing parameter η. As expected, the BMVDR (i.e., BMVDR-N for η = 0) yields the largest SNR improvement (cf. (78)). Since the BMVDR-N mixes the output signals of the BMVDR with the noisy reference microphone signals, it can be observed that increasing the mixing parameter η reduces the SNR improvement of the BMVDR-N compared to the BMVDR (η = 0). For the BLCMV-N, both η and δ affect the SNR improvement, which is in line with (77). Similarly to the BMVDR-N, the BLCMV-N mixes the output signals of a BLCMV with the noisy reference microphone signals. Hence, it can be observed that for any value of the interference scaling parameter δ, increasing the mixing parameter η reduces the SNR improvement of the BLCMV-N compared to the BLCMV (η = 0), which is in line with (78). Since fewer degrees of freedom are available for noise reduction, the BLCMV (η = 0) yields a smaller SNR improvement compared to the BMVDR (η = 0), as discussed in Section III-B. Using (87), the interference scaling parameter δ maximizing the output SNR was equal to $\delta_{\mathrm{opt},L} = 0.477$ for the considered acoustic scenario. As expected, it can be observed that using $\delta_{\mathrm{opt},L}$ leads to the largest SNR improvement of all considered values of δ. For large values of the mixing parameter η, the BLCMV-N yields a larger SNR improvement than the BMVDR-N. It should be noted that the exact behaviour depends on the interference scaling parameter δ and the relative position of the interfering source to the desired source.
Using (19) and (20), Figure 5 depicts the left SIR improvement at 500 Hz for the BLCMV-N for different values of the mixing parameter η and the interference scaling parameter δ
Fig. 6. The MSC of the noise component in the reference microphone signals (Input), in the output signals of the BLCMV for different values of the interference scaling parameter δ, the BMVDR-N for different values of the mixing parameter η and the BLCMV-N for different values of the mixing parameter η and the interference scaling parameter δ.
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Charts: MSC vs. kHz for Various Algorithms
### Overview
The image presents six charts displaying magnitude squared coherence (MSC) values against frequency in kHz. Each chart represents a different algorithm or parameter setting, with lines representing different values of η (eta) or δ (delta). The top-left chart shows the input signal's MSC. The remaining charts show the MSC for different algorithms (BLCMV-N, BLCMV, BMVDR-N, BLCMV-N) and parameter settings.
### Components/Axes
All charts share the following components:
* **X-axis:** Frequency in kHz, ranging from 0 to 4 kHz. The axis is labeled "kHz".
* **Y-axis:** Magnitude squared coherence (MSC), ranging from 0 to 1. The axis is labeled "MSC".
* **Legends:** Each chart has a legend indicating the different lines and their corresponding η or δ values.
* **Titles:** Each chart has a title indicating the algorithm and parameter settings.
Specific chart details:
1. **Top-Left: Input** - Displays the MSC of the input signal.
2. **Top-Right: BLCMV-N, δ = 0.01** - Shows MSC vs. kHz for BLCMV-N with δ fixed at 0.01 and varying η (0, 0.5, 1).
3. **Middle-Left: BLCMV, δ = 0.01 & δ = 0.5** - Shows MSC vs. kHz for BLCMV with δ values of 0.01 and 0.5, and a single η value.
4. **Middle-Right: BLCMV-N, δ = 0.477** - Shows MSC vs. kHz for BLCMV-N with δ fixed at 0.477 and varying η (0, 0.5, 1).
5. **Bottom-Left: BMVDR-N** - Shows MSC vs. kHz for BMVDR-N with varying η (0, 0.5, 1).
6. **Bottom-Right: BLCMV-N, δ = 1** - Shows MSC vs. kHz for BLCMV-N with δ fixed at 1 and varying η (0, 0.5, 1).
### Detailed Analysis or Content Details
**1. Input:**
* The line (blue) starts at approximately 0.8 at 0 kHz, rapidly decreases to approximately 0.1 at 1 kHz, and remains relatively flat around 0.1 until 4 kHz.
**2. BLCMV-N, δ = 0.01:**
* η = 0 (red): Starts at approximately 0.8, drops sharply to around 0.5 at 0.5 kHz, then fluctuates between 0.5 and 0.7, peaking around 3 kHz at approximately 0.75.
* η = 0.5 (orange): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then gradually increases to approximately 0.65 at 4 kHz.
* η = 1 (blue): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then fluctuates between 0.5 and 0.6, peaking around 3 kHz at approximately 0.6.
**3. BLCMV, δ = 0.01 & δ = 0.5:**
* δ = 0.01 (red): Starts at approximately 0.8, drops to around 0.4 at 0.5 kHz, and then fluctuates between 0.4 and 0.6, peaking around 3 kHz at approximately 0.6.
* δ = 0.5 (orange): Starts at approximately 0.8, drops to around 0.3 at 0.5 kHz, and then fluctuates between 0.3 and 0.5, peaking around 3 kHz at approximately 0.5.
**4. BLCMV-N, δ = 0.477:**
* η = 0 (red): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then fluctuates between 0.5 and 0.7, peaking around 3 kHz at approximately 0.7.
* η = 0.5 (orange): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then gradually increases to approximately 0.65 at 4 kHz.
* η = 1 (blue): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then fluctuates between 0.5 and 0.6, peaking around 3 kHz at approximately 0.6.
**5. BMVDR-N:**
* η = 0 (red): Starts at approximately 0.8, drops to around 0.4 at 0.5 kHz, and then remains relatively flat around 0.4 until 4 kHz.
* η = 0.5 (orange): Starts at approximately 0.8, drops to around 0.4 at 0.5 kHz, and then remains relatively flat around 0.4 until 4 kHz.
* η = 1 (blue): Starts at approximately 0.8, drops to around 0.4 at 0.5 kHz, and then remains relatively flat around 0.4 until 4 kHz.
**6. BLCMV-N, δ = 1:**
* η = 0 (red): Starts at approximately 0.8, drops sharply to around 0.5 at 0.5 kHz, and then fluctuates between 0.5 and 0.7, peaking around 3 kHz at approximately 0.7.
* η = 0.5 (orange): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then gradually increases to approximately 0.65 at 4 kHz.
* η = 1 (blue): Starts at approximately 0.8, drops to around 0.5 at 0.5 kHz, and then fluctuates between 0.5 and 0.6, peaking around 3 kHz at approximately 0.6.
### Key Observations
* The input signal has a significantly higher MSC value at low frequencies compared to the other algorithms.
* All algorithms show a general decrease in MSC at lower frequencies (below 1 kHz).
* The parameter η seems to have a more pronounced effect on the MSC values for BLCMV-N and BLCMV algorithms than for BMVDR-N.
* The δ parameter significantly impacts the MSC values for BLCMV and BLCMV-N algorithms.
* BMVDR-N shows minimal variation in MSC across different η values.
### Interpretation
The charts show the magnitude squared coherence (MSC) of the noise component across frequency, at the input and at the output of the different algorithms. The input signal's MSC profile serves as a baseline for comparison. The variations in MSC values observed for different algorithms and parameter settings (η and δ) suggest that these parameters play a crucial role in shaping the output MSC of the noise component.
The fact that BMVDR-N exhibits minimal sensitivity to η indicates that its performance is relatively stable regardless of this parameter. Conversely, the noticeable variations in MSC values for BLCMV-N and BLCMV algorithms with different η values suggest that this parameter can be tuned to optimize performance.
The impact of the δ parameter on BLCMV and BLCMV-N algorithms highlights its importance in controlling the algorithm's behavior. The different δ values lead to distinct MSC profiles, indicating that selecting an appropriate δ value is critical for achieving desired performance.
The overall trend of decreasing MSC at lower frequencies across all algorithms suggests that these algorithms may be less effective at capturing signal correlations at lower frequencies. The fluctuations in MSC values at higher frequencies may indicate the presence of noise or other artifacts in the signal.
</details>
and the BMVDR-N for different values of the mixing parameter η . As expected from (43) and (79), both the BLCMV-N and the BLCMV ( η = 0 ) yield the same SIR improvement, which is solely controlled by the interference scaling parameter δ . Hence, increasing the interference scaling parameter δ reduces the SIR improvement for both the BLCMV-N and the BLCMV. For the BMVDR-N it can be observed that increasing the mixing parameter η reduces the SIR improvement. It should be noted that the exact behaviour depends on the relative position of the interfering source to the desired source, as can be seen from (50) and (51).
2) Binaural Cue Preservation of Background Noise: For different frequencies, Figure 6 depicts the input MSC in (27) of the noise component (Input) and the output MSC in (27) of the noise component for the BLCMV in (46) for different values of the interference scaling parameter δ, the BMVDR-N in (53) for different values of the mixing parameter η and the BLCMV-N for different values of the mixing parameter η and the interference scaling parameter δ. Although the BLCMV is not designed to preserve the MSC of the noise component, it can be observed that an output MSC smaller than 1 is obtained, especially for large values of δ [14]. However, since the output MSC of the noise component depends on the relative position of the interfering source to the desired source, it cannot be easily controlled. Since the BMVDR-N mixes the output signals of the BMVDR with the noisy reference microphone signals, it can be observed that the output MSC of the noise component is smaller than 1, and for η = 1 the MSC is perfectly preserved (but no beamforming is applied). For the BLCMV-N, it can be observed that both η and δ influence the output MSC of the noise component, as discussed in Section V-C. For η = 0, the output MSC of the noise component for the BLCMV-N is equal to the output MSC of the noise component for the BLCMV. For a fixed value of δ, it can be observed that the output MSC of the noise component approaches the input MSC of the noise component for increasing η, although perfect preservation of the MSC of the noise component is only possible for δ = 1 (cf. Section V-C).
Fig. 7. Frequency-averaged MSC error of the noise component for the BLCMV-N and the BMVDR-N.
<details>
<summary>Image 7 Details</summary>

### Visual Description
## Chart: ΔMSC vs η for Different δ Values
### Overview
The image presents a line chart illustrating the relationship between ΔMSC (Delta MSC) on the y-axis and η (eta) on the x-axis, for four different values of δ (delta): 0.25, 0.477, 0.75, and 1. A fifth line represents BMVDR-N. All lines show a decreasing trend as η increases.
### Components/Axes
* **X-axis:** η (eta), ranging from 0 to 1, with tick marks at 0, 0.25, 0.5, 0.75, and 1.
* **Y-axis:** ΔMSC (Delta MSC), ranging from 0 to 1, with tick marks at 0, 0.25, 0.5, 0.75, and 1.
* **Legend:** Located in the top-right corner, it identifies the lines by their corresponding δ values and BMVDR-N.
* δ = 0.25 (Blue line)
* δ = 0.477 (Orange line)
* δ = 0.75 (Yellow line)
* δ = 1 (Purple line)
* BMVDR-N (Black dashed line)
### Detailed Analysis
* **δ = 0.25 (Blue Line):** This line starts at approximately ΔMSC = 0.85 when η = 0. It decreases steadily, reaching approximately ΔMSC = 0.2 when η = 1.
* **δ = 0.477 (Orange Line):** This line begins at approximately ΔMSC = 0.6 when η = 0. It exhibits a decreasing trend, reaching approximately ΔMSC = 0.25 when η = 1.
* **δ = 0.75 (Yellow Line):** This line starts at approximately ΔMSC = 0.15 when η = 0. It shows a relatively flat decreasing trend, remaining around ΔMSC = 0.1 when η = 1.
* **δ = 1 (Purple Line):** This line begins at approximately ΔMSC = 0.1 when η = 0. It exhibits a slight decreasing trend, remaining around ΔMSC = 0.1 when η = 1.
* **BMVDR-N (Black Dashed Line):** This line starts at approximately ΔMSC = 0.9 when η = 0. It decreases rapidly, reaching approximately ΔMSC = 0.05 when η = 1.
### Key Observations
* All lines demonstrate a decreasing trend as η increases.
* The line for δ = 0.25 has the highest initial ΔMSC value and the most significant decrease.
* The lines for δ = 0.75 and δ = 1 are relatively flat and close to each other, indicating minimal change in ΔMSC with increasing η.
* The BMVDR-N line exhibits the steepest decrease in ΔMSC.
* The lines are all smooth curves, suggesting a continuous relationship between the variables.
### Interpretation
The chart suggests that as η increases, the ΔMSC decreases for all values of δ and for the BMVDR-N. The rate of decrease depends on the value of δ. Lower values of δ (e.g., 0.25) result in a more substantial decrease in ΔMSC as η increases, while higher values of δ (e.g., 0.75 and 1) show a more moderate decrease. The BMVDR-N shows the largest reduction in ΔMSC with increasing η.
Here η is the mixing parameter, δ is the interference scaling parameter of the BLCMV-N, and ΔMSC is the frequency-averaged MSC error of the noise component, so lower values indicate better preservation of the binaural cues of the background noise. The relatively flat lines for δ = 0.75 and δ = 1 indicate that for large δ the MSC error is already small for all values of η.
</details>
For several values of the mixing parameter η, Figure 7 depicts the MSC error of the noise component for the BLCMV-N and the BMVDR-N, averaged over all frequencies, i.e.,
$$\Delta\mathrm{MSC} = \frac{1}{F} \sum_{f=1}^{F} \left| \mathrm{MSC}_n^{\mathrm{in}}(f) - \mathrm{MSC}_n^{\mathrm{out}}(f) \right| \tag{91}$$
with f the frequency bin index and F the total number of frequency bins. As expected, the BMVDR (η = 0) yields the largest MSC error of the noise component, and increasing the mixing parameter η reduces the frequency-averaged MSC error of the noise component for the BMVDR-N [16]. For the considered acoustic scenario, it can be observed for the BLCMV-N that for any value of the interference scaling parameter δ, increasing the mixing parameter η reduces the frequency-averaged MSC error of the noise component compared to the BLCMV (η = 0). Further, it can be observed that for small values of δ the effect of the mixing parameter η is larger than for large values of δ, for which the frequency-averaged MSC error is relatively small for all values of η. These results clearly show that the mixing parameter η in the BLCMV-N makes it possible to control the binaural cues of the background noise.
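The frequency-averaged MSC error can be evaluated directly from the noise covariance matrices and the left and right filter vectors. The following is a minimal sketch, assuming the usual definition of the MSC as the squared magnitude of the cross-PSD normalised by both auto-PSDs, and assuming the absolute difference between input and output MSC as the per-bin error. The function names, the array layout and the reference microphone indices are illustrative choices and are not taken from the paper.

```python
import numpy as np


def msc(R, w_left, w_right):
    """Magnitude squared coherence between the outputs w_left^H x and
    w_right^H x for a signal component with covariance matrix R."""
    cross_psd = w_left.conj() @ R @ w_right
    psd_left = np.real(w_left.conj() @ R @ w_left)
    psd_right = np.real(w_right.conj() @ R @ w_right)
    return np.abs(cross_psd) ** 2 / (psd_left * psd_right)


def frequency_averaged_msc_error(R_n, W_left, W_right, ref_left, ref_right):
    """Average |input MSC - output MSC| of the noise component over bins.

    R_n:      (F, M, M) noise covariance matrix per frequency bin
    W_left:   (F, M) left filter per bin, W_right: (F, M) right filter
    ref_left, ref_right: indices of the reference microphones (assumed)
    """
    num_bins, num_mics, _ = R_n.shape
    # Selection vectors pick the reference microphones (input MSC).
    e_left = np.zeros(num_mics, dtype=complex)
    e_left[ref_left] = 1.0
    e_right = np.zeros(num_mics, dtype=complex)
    e_right[ref_right] = 1.0
    errors = [
        abs(msc(R_n[f], e_left, e_right) - msc(R_n[f], W_left[f], W_right[f]))
        for f in range(num_bins)
    ]
    return float(np.mean(errors))
```

Passing the selection vectors themselves as filters (no processing, corresponding to η = 1) gives an error of zero, whereas identical filters on both sides produce a fully coherent output and hence a large error for diffuse-like noise.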
## B. Experimental Results Using Reverberant Recordings
For a more realistic evaluation, we compare the performance of the considered binaural beamforming algorithms using reverberant recordings. Similar to Section VI-A, the experimental setup consists of two hearing aids, each with two microphones, mounted on a HATS in a cafeteria with a reverberation time of approximately 1.25 s [24]. The desired source was again placed at 0° (at a distance of about 102 cm), while the interfering source was again placed at -35° (at a distance of about 118 cm), see [24] for more details. The desired and interfering source components were generated by convolving clean speech signals with the measured reverberant room impulse responses corresponding to the desired source and interfering source positions. The desired source was a male German speaker, speaking eight sentences with a pause of 1 s between the sentences. The interfering source was a male Dutch speaker, speaking seven sentences with a pause of 0.25 s between the sentences. As background noise we used realistic recordings [24], consisting of multi-talker babble noise, clacking plates and temporally dominant competing speakers. The used background noise hence clearly differed from the perfectly diffuse noise in Section VI-A. The entire signal had a length of about 28 s. The desired source and the background noise were active the entire time, whereas the interfering source only became active after about 14 s. The desired source component, the interfering source component and the noise component were mixed at an input SNR of 10 dB and input SIR of 5 dB in the right reference microphone. Again, we chose the front microphone on each hearing aid as reference microphone.
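Mixing the components at a prescribed input SNR and SIR in one reference microphone amounts to scaling the interfering source and noise components relative to the desired source component. The sketch below assumes broadband power ratios computed over the whole signal at the reference channel; the paper does not state over which segments the powers were measured, so this choice, like the function names, is an assumption made for illustration.

```python
import numpy as np


def gain_for_target_ratio(desired_ref, other_ref, target_db):
    """Gain for `other_ref` such that
    10*log10(P_desired / P_other_scaled) equals target_db."""
    power_desired = np.mean(np.abs(desired_ref) ** 2)
    power_other = np.mean(np.abs(other_ref) ** 2)
    return np.sqrt(power_desired / (power_other * 10.0 ** (target_db / 10.0)))


def mix_components(desired, interferer, noise, ref, snr_db, sir_db):
    """Mix multichannel components of shape (num_samples, num_mics).

    The same gain is applied to all channels of a component, so its
    spatial characteristics are unchanged; only its level is set in
    the reference channel `ref`.
    """
    g_noise = gain_for_target_ratio(desired[:, ref], noise[:, ref], snr_db)
    g_interf = gain_for_target_ratio(desired[:, ref], interferer[:, ref], sir_db)
    interferer_scaled = g_interf * interferer
    noise_scaled = g_noise * noise
    mixture = desired + interferer_scaled + noise_scaled
    return mixture, interferer_scaled, noise_scaled


def power_ratio_db(x, y):
    """Broadband power ratio of two single-channel signals in dB."""
    return 10.0 * np.log10(np.mean(np.abs(x) ** 2) / np.mean(np.abs(y) ** 2))
```

For the setup described above, one would call `mix_components(..., snr_db=10, sir_db=5)` with `ref` set to the index of the right reference microphone.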
As objective performance measures for noise and interference reduction performance, we used the left and the right SNR improvement ( ∆SNR L , ∆SNR R ) and the left and the right SIR improvement ( ∆SIR L , ∆SIR R ). As objective performance measure for binaural cue preservation of the background noise we used the frequency-averaged MSC error of the noise component ( ∆MSC ) as defined in (91). All objective performance measures were computed using the reference microphone signals and the output signals of all considered algorithms. Table I presents the objective performance measures for all considered algorithms.
The processing was performed at a sampling rate of 16 kHz in the STFT domain with a frame length of 8192 samples and a square-root Hann window with 50 % overlap. We used an oracle voice activity detector (i.e., using the desired source and interfering source signals) to estimate the noise covariance matrix R n, the undesired covariance matrix R v (interfering source plus background noise) and R xn = R x + R n (desired source plus background noise) over the entire signal. All binaural beamforming algorithms were implemented using relative transfer function (RTF) vectors [25], relating the ATF vectors in (4) to the reference microphones. Using the covariance whitening method (see [14], [26] for further details), the RTF vectors of the desired source and the interfering source were estimated based on the generalised eigenvalue decomposition of R xn and R n or R v and R n, respectively. The mixing parameter was set to η = 0.3 and the interference scaling parameter was set to δ = 0.3.
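The covariance whitening step can be illustrated for a single frequency bin. The following is a minimal sketch of one common formulation (whiten with a Cholesky factor of the noise covariance matrix, take the principal eigenvector, de-whiten, normalise to the reference microphone); it is not necessarily the exact implementation used by the authors, and the function and argument names are illustrative.

```python
import numpy as np


def rtf_covariance_whitening(R_mix, R_noise, ref):
    """Estimate the RTF vector of a single source for one frequency bin.

    R_mix:   covariance of source plus noise (e.g. R_xn, or R_v for
             the interfering source), Hermitian, shape (M, M)
    R_noise: noise covariance R_n, Hermitian positive definite, (M, M)
    ref:     index of the reference microphone
    """
    # Whitening: R_noise = L L^H, so L^{-1} R_noise L^{-H} = I.
    L = np.linalg.cholesky(R_noise)
    L_inv = np.linalg.inv(L)
    R_white = L_inv @ R_mix @ L_inv.conj().T
    R_white = 0.5 * (R_white + R_white.conj().T)  # enforce Hermitian symmetry

    # Principal eigenvector of the whitened matrix (eigh sorts ascending).
    _, eigvecs = np.linalg.eigh(R_white)
    principal = eigvecs[:, -1]

    # De-whiten and normalise to the reference microphone.
    atf_estimate = L @ principal
    return atf_estimate / atf_estimate[ref]
```

With exact covariance matrices of a single coherent source in noise, this returns the true RTF vector; with estimated covariance matrices, estimation errors carry over to the RTF estimate.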
1) Noise and Interference Reduction Performance: In terms of noise reduction performance, it can be observed that, as expected, the BMVDR yields the highest SNR improvement (13.0 dB for the left and 12.9 dB for the right side). All other algorithms yield a lower SNR improvement, for the BLCMV due to the additional constraint for the interfering source, for
Fig. 8. Boxplot of the MUSHRA scores for all three evaluations. The plot depicts the median score (red line), the mean score (red dot), the first and third quartiles (blue boxes) and the interquartile ranges (whiskers). Outliers are indicated by red + markers.
<details>
<summary>Image 8 Details</summary>

### Visual Description
## Box Plots: MUSHRA Scores for the Three Evaluations
### Overview
The image presents three box plot panels, titled "Interfering source", "Background noise" and "Complete acoustic scene", one for each evaluation of the perceptual listening test.
### Components/Axes
* **X-axis:** Test conditions, i.e., the hidden reference ("Reference"), the anchor ("Anchor") and the considered binaural beamforming algorithms (BMVDR, BMVDR-N, BLCMV, BLCMV-N).
* **Y-axis:** "Score", ranging from 0 to 100, with tick marks at 0, 25, 50, 75, and 100.
* **Data Representation:** As stated in the caption, each box plot shows the median score (red line), the mean score (red dot), the first and third quartiles (blue box) and whiskers; outliers are drawn as red + markers.
### Interpretation
According to the rating scale defined in the text, a score of 0 corresponds to a large perceived spatial difference with respect to the reference signal and a score of 100 to no perceived spatial difference. "Reference" and "Anchor" are test conditions, not algorithms. The statistical comparison of the algorithms for each panel is reported in the accompanying text.
</details>
TABLE I OBJECTIVE PERFORMANCE MEASURES FOR ALL CONSIDERED ALGORITHMS IN THE REVERBERANT ENVIRONMENT.
| Measure     | BMVDR   | BLCMV   | BMVDR-N   | BLCMV-N   |
|-------------|---------|---------|-----------|-----------|
| ∆SNR L [dB] | 13.0    | 10.1    | 8.6       | 7.6       |
| ∆SNR R [dB] | 12.9    | 9.2     | 8.6       | 7.0       |
| ∆SIR L [dB] | -0.1    | 9.7     | 0.82      | 9.8       |
| ∆SIR R [dB] | -4.3    | 8.7     | -2.4      | 8.9       |
| ∆MSC        | 0.86    | 0.64    | 0.10      | 0.19      |
the BMVDR-N due to the mixing with the noisy reference microphone signals, and for the BLCMV-N due to both effects. The partial noise estimation for the BLCMV-N seems to result in a smaller drop in noise reduction performance compared to the BLCMV (2.5 dB for the left side, 2.2 dB for the right side) than for the BMVDR-N compared to the BMVDR (4.4 dB for the left side, 4.3 dB for the right side). Please note that both for the BMVDR-N and for the BLCMV-N this drop in noise reduction performance depends on the relative position of the interfering source to the desired source.
In terms of interference reduction performance, it can be observed that both the BLCMV and the BLCMV-N lead to approximately the same SIR improvement (for the left and the right side), which is in line with the theoretical SIR improvement in (43) and (79), i.e., $10 \log_{10} \frac{1}{\delta^2} \approx 10.5$ dB. The fact that this theoretical SIR improvement is not reached and the fact that the SIR improvements for the BLCMV and BLCMV-N are not exactly the same is due to estimation errors in the covariance matrices, which was also noted in [14], [17]. In addition, it can be observed that the BMVDR and BMVDR-N lead to very low (even negative) SIR improvements, which is presumably due to the fact that the interfering source is relatively close to the desired source.
2) Binaural Cue Preservation of Background Noise: As expected, the BMVDR yields the largest MSC error of the noise component ∆MSC. As discussed in Section III-B, the output MSC of the noise component for the BLCMV is typically smaller than 1, hence leading to a smaller MSC error compared to the BMVDR. Due to the mixing with the noisy reference microphone signals, both the BMVDR-N and the BLCMV-N yield a much smaller MSC error of the noise component than the BMVDR and the BLCMV, where the MSC error is slightly smaller for the BMVDR-N than for the BLCMV-N.
In conclusion, the objective performance measures show that the BLCMV-N leads to a very similar interference reduction as the BLCMV, while providing a trade-off between noise reduction performance (slightly worse than the BLCMV) and binaural cue preservation of the background noise (much better than the BLCMV).
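As a quick numerical check of the figures discussed in this subsection, the theoretical SIR improvement for δ = 0.3 can be compared with the measured values for the BLCMV and the BLCMV-N from Table I. The snippet below only restates values given in the text and the table; the function and variable names are illustrative.

```python
import math


def theoretical_sir_improvement_db(delta):
    """Theoretical SIR improvement 10*log10(1/delta^2) in dB."""
    return 10.0 * math.log10(1.0 / delta ** 2)


# Measured SIR improvements (left, right) in dB, taken from Table I.
measured_sir_improvement_db = {
    "BLCMV": (9.7, 8.7),
    "BLCMV-N": (9.8, 8.9),
}

delta = 0.3  # interference scaling parameter used in the experiments
theory_db = theoretical_sir_improvement_db(delta)

for name, (left_db, right_db) in measured_sir_improvement_db.items():
    # Gap between theory and measurement, attributed in the text to
    # estimation errors in the covariance matrices.
    print(
        f"{name}: theory {theory_db:.1f} dB, "
        f"left gap {theory_db - left_db:.1f} dB, "
        f"right gap {theory_db - right_db:.1f} dB"
    )
```

For δ = 1 the function returns 0 dB, i.e., no SIR improvement.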
## C. Perceptual Listening Test
To further investigate the spatial impression of the different output signal components for the four considered algorithms, we conducted a perceptual listening test similar to [21]. The desired source was now placed at -35° and the interfering source was placed at 90°, in order to enhance the perceived spatial differences between both sources. The desired source component, the interfering source component and the noise component were mixed at an input SNR of 0 dB and input SIR of 0 dB in the right reference microphone. Thirteen self-reported normal-hearing subjects participated in the perceptual listening test; none of the authors participated. All subjects can be considered expert listeners, i.e., they were familiar with similar perceptual listening tests, and gave informed consent. The listening test was conducted in a soundproof listening booth using an RME Fireface UCX sound card with Sennheiser HD 580 headphones.
Using a procedure similar to the MUlti-Stimulus Test with Hidden Reference and Anchor (MUSHRA) [27], the task was to rate the perceived spatial difference with respect to a reference signal. For a coherent source (e.g., interfering source), this corresponds to rating differences in perceived source location, whereas for a diffuse noise field this corresponds to rating differences in perceived diffuseness. A score of 0 is associated with a large perceived spatial difference, whereas a score of 100 is associated with no perceived spatial difference. As reference signal we used the (unprocessed) reference microphone signals, while as anchor signal we used the left reference microphone signal, played back to both ears. The anchor signal was hence a monaural signal with no binaural cues, which is perceived in the center of the head.
We conducted three evaluations, where only some components were active in the output signals, the reference signal and the anchor signal. In the first evaluation, only the desired source component and the interfering source component (i.e., no noise component) were active and the task was to rate the spatial difference for the interfering source. In the second evaluation, only the desired source component and the noise component (i.e., no interfering source component) were active and the task was to rate the spatial difference for the background noise. In the third evaluation, all signal components were active and the task was to rate the spatial difference for the interfering source and the background noise simultaneously. To familiarize the subjects with the tasks and the sound material, a training round was performed. Audio samples for all binaural beamforming algorithms and the unprocessed input signals are available online (see https://uol.de/en/sigproc/research/audiodemos/binaural-noise-reduction/blcmv-n-beamformer).
The MUSHRA scores for the three evaluations are shown in Figure 8. A one-way repeated-measures ANOVA was performed. The analysis revealed a significant within-subjects effect for all three evaluations. Hence, post-hoc comparison t-tests with Bonferroni correction were performed [28].
a) Interfering source: The within-subjects effect was significant [F(2.098, 25.176) = 219.2, p < .001, Greenhouse-Geisser correction]. As expected, the BLCMV and the BLCMV-N preserved the spatial impression of the interfering source significantly better than the BMVDR and the BMVDR-N (p < .001). The BMVDR-N performed significantly better than the BMVDR (p < .001), which is not unexpected since the interfering source component is also mixed with the mixing parameter η. No significant difference was found between the BLCMV and the BLCMV-N (p = 1).
b) Background noise: The within-subjects effect was significant [F(3.072, 36.869) = 332.066, p < .001, Greenhouse-Geisser correction]. As expected, the BMVDR-N and the BLCMV-N, both using partial noise estimation, preserved the spatial impression of the background noise significantly better than the BMVDR and the BLCMV (p < .001). No significant difference was found between the BMVDR-N and the BLCMV-N (p = 1) and between the BMVDR and BLCMV (p = .614).
c) Complete acoustic scene: The within-subjects effect was significant [F(2.905, 34.858) = 171.783, p < .001, Greenhouse-Geisser correction]. In terms of preservation of the spatial impression of the complete acoustic scene, the BMVDR-N scored significantly higher than the BMVDR (p < .001), the BLCMV scored significantly higher than the BMVDR-N (p = .014), and the proposed BLCMV-N scored significantly higher than the BLCMV (p = .025).
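The fractional degrees of freedom and the p-values equal to 1 reported above follow from the two corrections that were applied. The sketch below shows a textbook form of both: the Greenhouse-Geisser estimate of the sphericity parameter ε, which scales both degrees of freedom of the repeated-measures F-test, and the Bonferroni correction, which multiplies each raw p-value by the number of comparisons and caps the result at 1. The implementation details and names are illustrative assumptions; the statistics software used by the authors is not stated in this section.

```python
import numpy as np


def greenhouse_geisser_epsilon(scores):
    """Greenhouse-Geisser epsilon for scores of shape (subjects, conditions)."""
    scores = np.asarray(scores, dtype=float)
    _, k = scores.shape
    S = np.cov(scores, rowvar=False)          # (k, k) covariance of conditions
    P = np.eye(k) - np.ones((k, k)) / k       # centering matrix
    S_centered = P @ S @ P                    # double-centered covariance
    return np.trace(S_centered) ** 2 / ((k - 1) * np.trace(S_centered @ S_centered))


def corrected_degrees_of_freedom(scores):
    """Degrees of freedom of the F-test after Greenhouse-Geisser correction."""
    n, k = np.asarray(scores).shape
    eps = greenhouse_geisser_epsilon(scores)
    return eps * (k - 1), eps * (k - 1) * (n - 1)


def bonferroni(p_values):
    """Bonferroni-corrected p-values, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

Since both degrees of freedom are scaled by the same ε, their ratio equals the number of subjects minus one. The reported values are consistent with this: for example, 25.176 / 2.098 is approximately 12 for the thirteen subjects.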
In summary, the results of the listening test showed that the BLCMV-N is capable of preserving the spatial impression of an interfering source and background noise in a realistic acoustic scenario, outperforming all other considered binaural beamforming algorithms in terms of spatial impression.
## VII. CONCLUSIONS
In this paper we proposed the BLCMV-N, merging the advantages of the BLCMV and the BMVDR-N, i.e., preserving the binaural cues of the interfering source and controlling the reduction of the interfering source as well as the binaural cues of the background noise. We showed that the output signals of the BLCMV-N can be interpreted as a mixture between the noisy reference microphone signals and the output signals of a BLCMV using an adjusted interference scaling parameter. We provided a theoretical comparison between the BMVDR, the BLCMV, the BMVDR-N and the proposed BLCMV-N in terms of noise and interference reduction performance and binaural cue preservation. The obtained analytical expressions were first validated using measured anechoic acoustic transfer functions. Experimental results using recorded signals in a realistic reverberant environment showed that the BLCMV-N leads to a very similar interference reduction as the BLCMV, while providing a trade-off between noise reduction performance (slightly worse than the BLCMV) and binaural cue preservation of the background noise (much better than the BLCMV). In addition, the results of a perceptual listening test with 13 normal-hearing participants showed that the proposed BLCMV-N is capable of preserving the spatial impression of an interfering source and background noise in a realistic acoustic scenario, outperforming all other considered binaural beamforming algorithms in terms of spatial impression.
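The mixture interpretation can be checked numerically for a single frequency bin. The sketch below is written under stated assumptions that are not spelled out in this part of the paper: the left BLCMV-N filter minimises the noise error E{|w^H n − η n_L|²} subject to C^H w = g, with C = [a, b] containing the ATF vectors of the desired and interfering sources and g = [a_L^*, δ b_L^*]^T. Under these assumptions, the adjusted interference scaling parameter works out to (δ − η)/(1 − η). All function names are hypothetical.

```python
import numpy as np


def lcmv(R, C, g):
    """Minimise w^H R w subject to C^H w = g."""
    R_inv_C = np.linalg.solve(R, C)
    return R_inv_C @ np.linalg.solve(C.conj().T @ R_inv_C, g)


def blcmv(R_n, a, b, ref, delta):
    """Left BLCMV filter (assumed form): distortionless response for the
    desired source, interfering source scaled by delta."""
    C = np.column_stack([a, b])
    g = np.array([np.conj(a[ref]), delta * np.conj(b[ref])])
    return lcmv(R_n, C, g)


def blcmv_n(R_n, a, b, ref, eta, delta):
    """Left BLCMV-N filter (assumed closed form):
    w = eta*e + R^-1 C (C^H R^-1 C)^-1 (g - eta * C^H e)."""
    e = np.zeros(len(a), dtype=complex)
    e[ref] = 1.0                               # selects the reference microphone
    C = np.column_stack([a, b])
    g = np.array([np.conj(a[ref]), delta * np.conj(b[ref])])
    return eta * e + lcmv(R_n, C, g - eta * (C.conj().T @ e))


def blcmv_n_as_mixture(R_n, a, b, ref, eta, delta):
    """Same filter written as a mixture of the reference microphone and a
    BLCMV with adjusted interference scaling (requires eta < 1)."""
    e = np.zeros(len(a), dtype=complex)
    e[ref] = 1.0
    delta_adjusted = (delta - eta) / (1.0 - eta)
    return eta * e + (1.0 - eta) * blcmv(R_n, a, b, ref, delta_adjusted)
```

Both functions return the same filter up to numerical precision because the LCMV solution is linear in the constraint vector g. For η = 0 the BLCMV-N reduces to the BLCMV.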
## APPENDIX A DERIVATION OF THE BLCMV-N
Using (4), (6) and (39), the constrained optimization problem in (54) can be reformulated as
<!-- formula-not-decoded -->
This constrained optimization problem can be solved using the method of Lagrange multipliers, where the Lagrangian function is given by
<!-- formula-not-decoded -->
with λ L denoting the 2-dimensional vector of Lagrangian multipliers. Setting the gradient with respect to w L
<!-- formula-not-decoded -->
equal to 0 yields
<!-- formula-not-decoded -->
Substituting (95) into the constraint C H w L = g L and solving for the Lagrangian multiplier λ L yields
<!-- formula-not-decoded -->
Substituting (96) into (95), the solution to (54) is given by
<!-- formula-not-decoded -->
where, using (39),
<!-- formula-not-decoded -->
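Since the display equations of this appendix were lost in extraction, the steps described above can be sketched as follows. This is a reconstruction under assumptions, not the authors' typeset equations: it assumes that the cost function in (54) is the noise error $E\{|\mathbf{w}_L^H \mathbf{n} - \eta\, n_L|^2\}$, that $\mathbf{e}_L$ denotes the selection vector of the left reference microphone, that $\mathbf{C} = [\mathbf{a}, \mathbf{b}]$ and that $\mathbf{g}_L = [a_L^*, \delta\, b_L^*]^T$. The scaling of the gradient depends on the convention used for complex derivatives and does not affect the solution.

```latex
\begin{align*}
&\min_{\mathbf{w}_L}\;
  \mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L
  - \eta\, \mathbf{w}_L^H \mathbf{R}_n \mathbf{e}_L
  - \eta\, \mathbf{e}_L^H \mathbf{R}_n \mathbf{w}_L
  + \eta^2\, \mathbf{e}_L^H \mathbf{R}_n \mathbf{e}_L
  \quad \text{s.t.} \quad \mathbf{C}^H \mathbf{w}_L = \mathbf{g}_L \\[4pt]
&\mathcal{L}(\mathbf{w}_L, \boldsymbol{\lambda}_L)
  = \mathbf{w}_L^H \mathbf{R}_n \mathbf{w}_L
  - \eta\, \mathbf{w}_L^H \mathbf{R}_n \mathbf{e}_L
  - \eta\, \mathbf{e}_L^H \mathbf{R}_n \mathbf{w}_L
  + \eta^2\, \mathbf{e}_L^H \mathbf{R}_n \mathbf{e}_L \\
&\qquad\qquad\quad
  + \boldsymbol{\lambda}_L^H \left( \mathbf{C}^H \mathbf{w}_L - \mathbf{g}_L \right)
  + \left( \mathbf{C}^H \mathbf{w}_L - \mathbf{g}_L \right)^H \boldsymbol{\lambda}_L \\[4pt]
&\nabla_{\mathbf{w}_L} \mathcal{L}
  = 2\, \mathbf{R}_n \mathbf{w}_L - 2\eta\, \mathbf{R}_n \mathbf{e}_L
  + 2\, \mathbf{C} \boldsymbol{\lambda}_L \\[4pt]
&\mathbf{w}_L = \eta\, \mathbf{e}_L - \mathbf{R}_n^{-1} \mathbf{C} \boldsymbol{\lambda}_L \\[4pt]
&\boldsymbol{\lambda}_L
  = \left( \mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C} \right)^{-1}
    \left( \eta\, \mathbf{C}^H \mathbf{e}_L - \mathbf{g}_L \right) \\[4pt]
&\mathbf{w}_L
  = \eta\, \mathbf{e}_L
  + \mathbf{R}_n^{-1} \mathbf{C}
    \left( \mathbf{C}^H \mathbf{R}_n^{-1} \mathbf{C} \right)^{-1}
    \left( \mathbf{g}_L - \eta\, \mathbf{C}^H \mathbf{e}_L \right) \\[4pt]
&\mathbf{g}_L - \eta\, \mathbf{C}^H \mathbf{e}_L
  = \begin{bmatrix} (1 - \eta)\, a_L^* \\ (\delta - \eta)\, b_L^* \end{bmatrix}
\end{align*}
```

Under these assumptions, the last line shows why the solution can be read as the reference microphone signal scaled by η plus (1 − η) times a BLCMV whose interference scaling parameter is adjusted to (δ − η)/(1 − η).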
## APPENDIX B OUTPUT NOISE PSD FOR THE BLCMV-N
Using (67) in (16) with R n instead of R x , the output PSD of the noise component for the BLCMV-N is given by
<!-- formula-not-decoded -->
Using (64) and (66), the components in (99) are given by [14]
<!-- formula-not-decoded -->
Substituting (100) in (99) yields
<!-- formula-not-decoded -->
with R xu, 3 defined in (74). Similarly, it can be shown that
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
## REFERENCES
- [1] V. Hamacher, U. Kornagel, T. Lotter, and H. Puder, 'Binaural signal processing in hearing aids: Technologies and algorithms,' in Advances in Digital Speech Transmission . New York, NY, USA: Wiley, 2008, pp. 401-429.
- [2] S. Doclo, W. Kellermann, S. Makino, and S. E. Nordholm, 'Multichannel signal enhancement algorithms for assisted listening devices: Exploiting spatial diversity using multiple microphones,' IEEE Signal Processing Magazine , vol. 32, no. 2, pp. 18-30, Mar. 2015.
- [3] S. Doclo, S. Gannot, D. Marquardt, and E. Hadad, 'Binaural speech processing with application to hearing devices,' in Audio Source Separation and Speech Enhancement . Wiley, 2018, ch. 18, pp. 413-442.
- [4] J. Blauert, Spatial hearing: the psychophysics of human sound localization . Cambridge, Mass. MIT Press, 1997.
- [5] K. Kurozumi and K. Ohgushi, 'The relationship between the cross-correlation coefficient of two-channel acoustic signals and sound image quality,' The Journal of the Acoustical Society of America , vol. 74, no. 6, pp. 1726-1733, Dec. 1983.
- [6] A. W. Bronkhorst and R. Plomp, 'The effect of head-induced interaural time and level differences on speech intelligibility in noise,' The Journal of the Acoustical Society of America , vol. 83, no. 4, pp. 1508-1516, Apr. 1988.
- [7] M. L. Hawley, R. Y. Litovsky, and J. F. Culling, 'The benefit of binaural hearing in a cocktail party: Effect of location and type of interferer,' The Journal of the Acoustical Society of America , vol. 115, no. 2, pp. 833-843, Feb. 2004.
- [8] D. P. Welker, J. E. Greenberg, J. G. Desloge, and P. M. Zurek, 'Microphone-array hearing aids with binaural output. II. A two-microphone adaptive system,' IEEE Transactions on Speech and Audio Processing , vol. 5, no. 6, pp. 543-551, 1997.
- [9] R. Aichner, H. Buchner, M. Zourub, and W. Kellermann, 'Multichannel source separation preserving spatial information,' in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Honolulu HI, USA, Apr. 2007, pp. 5-8.
- [10] T. Klasen, T. van den Bogaert, M. Moonen, and J. Wouters, 'Binaural noise reduction algorithms for hearing aids that preserve interaural time delay cues,' IEEE Transactions on Signal Processing , vol. 55, no. 4, pp. 1579-1585, Apr. 2007.
- [11] B. Cornelis, S. Doclo, T. van den Bogaert, J. Wouters, and M. Moonen, 'Theoretical analysis of binaural multi-microphone noise reduction techniques,' IEEE Transactions on Audio, Speech, and Language Processing , vol. 18, no. 2, pp. 342-355, Feb. 2010.
- [12] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, 'Theoretical analysis of binaural transfer function MVDR beamformers with interference cue preservation constraints,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 23, no. 12, pp. 2449-2464, Dec. 2015.
- [13] D. Marquardt, V. Hohmann, and S. Doclo, 'Interaural coherence preservation in multi-channel Wiener filtering based noise reduction for binaural hearing aids,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 23, no. 12, pp. 2162-2176, Dec. 2015.
- [14] E. Hadad, S. Doclo, and S. Gannot, 'The binaural LCMV beamformer and its performance analysis,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 24, no. 3, pp. 543-558, Mar. 2016.
- [15] A. I. Koutrouvelis, R. C. Hendriks, R. Heusdens, and J. Jensen, 'Relaxed binaural LCMV beamforming,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 25, no. 1, pp. 137-152, Jan. 2017.
- [16] D. Marquardt and S. Doclo, 'Interaural coherence preservation for binaural noise reduction using partial noise estimation and spectral postfiltering,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 26, no. 7, pp. 1257-1270, Jan. 2018.
- [17] N. Gößling, D. Marquardt, I. Merks, T. Zhang, and S. Doclo, 'Optimal binaural LCMV beamforming in complex acoustic scenarios: Theoretical and practical insights,' in Proc. International Workshop on Acoustic Signal Enhancement (IWAENC) , Tokyo, Japan, Sep. 2018, pp. 381-385.
- [18] H. As'ad, M. Bouchard, and H. Kamkar-Parsi, 'A robust target linearly constrained minimum variance beamformer with spatial cues preservation for binaural hearing aids,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 27, no. 10, pp. 1549-1563, Oct. 2019.
- [19] R. M. Corey and A. C. Singer, 'Binaural audio source remixing with microphone array listening devices,' in Proc. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) , Barcelona, Spain, May 2020, pp. 561-565.
- [20] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, 'Comparison of binaural multichannel Wiener filters with binaural cue preservation of the interferer,' in IEEE International Conference on the Science of Electrical Engineering (ICSEE) , Eilat, Israel, Nov. 2016, pp. 1-5.
- [21] N. Gößling, D. Marquardt, and S. Doclo, 'Perceptual evaluation of binaural MVDR-based algorithms to preserve the interaural coherence of diffuse noise fields,' Trends in Hearing , vol. 24, pp. 1-18, Apr. 2020.
- [22] B. D. Van Veen and K. M. Buckley, 'Beamforming: A versatile approach to spatial filtering,' IEEE ASSP Magazine , vol. 5, no. 2, pp. 4-24, Apr. 1988.
- [23] S. Gannot, E. Vincent, S. Markovich-Golan, and A. Ozerov, 'A consolidated perspective on multimicrophone speech enhancement and source separation,' IEEE/ACM Transactions on Audio, Speech, and Language Processing , vol. 25, no. 4, pp. 692-730, Apr. 2017.
- [24] H. Kayser, S. D. Ewert, J. Anemüller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, 'Database of multichannel In-Ear and Behind-The-Ear head-related and binaural room impulse responses,' Eurasip Journal on Advances in Signal Processing , vol. 2009, p. 10 pages, Jan. 2009.
- [25] S. Gannot, D. Burshtein, and E. Weinstein, 'Signal enhancement using beamforming and non-stationarity with applications to speech,' IEEE Transactions on Signal Processing , vol. 49, no. 8, pp. 1614-1626, Aug. 2001.
- [26] S. Markovich, S. Gannot, and I. Cohen, 'Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,' IEEE Transactions on Audio, Speech, and Language Processing , vol. 17, no. 6, pp. 1071-1086, Aug. 2009.
- [27] ITU-R BS.1534-1, Method for the subjective assessment of intermediate quality level of coding systems, International Telecommunications Union (ITU-T) Recommendation, Jan. 2003.
- [28] B. R. Kirkwood and J. A. C. Sterne, Essential medical statistics . John Wiley & Sons, 2010.