# Relaxed Binaural LCMV Beamforming
**Authors**: Andreas I. Koutrouvelis, Richard C. Hendriks, Richard Heusdens, Jesper Jensen
**Abstract**: In this paper we propose a new binaural beamforming technique which can be seen as a relaxation of the linearly constrained minimum variance (LCMV) framework. The proposed method can achieve simultaneous noise reduction and exact binaural cue preservation of the target source, similar to the binaural minimum variance distortionless response (BMVDR) method. However, unlike the BMVDR, the proposed method is also able to preserve the binaural cues of multiple interferers to a certain predefined accuracy. Specifically, it is able to control the trade-off between noise reduction and binaural cue preservation of the interferers by using a separate trade-off parameter per interferer. Moreover, we provide a robust way of selecting these trade-off parameters such that the preservation accuracy for the binaural cues of the interferers is always better than that of the BMVDR. The relaxation of the constraints in the proposed method achieves approximate binaural cue preservation of more interferers than other previously presented LCMV-based binaural beamforming methods that use strict equality constraints.
**Index Terms**: Binaural cue preservation, beamforming, hearing aids, LCMV, multi-microphone noise reduction, MVDR.
## I. INTRODUCTION
Compared to normal-hearing people, hearing-impaired people generally have more difficulty understanding a target talker in complex acoustic environments with multiple interfering sources. To improve speech quality and intelligibility, single-microphone (see, e.g., [1] for an overview) or multi-microphone noise reduction algorithms (see, e.g., [2] for an overview) can be used. While the former are mostly effective in improving speech quality and reducing listening effort, the latter are also effective in improving speech intelligibility [3]. Examples of multi-microphone noise reduction algorithms include the multi-channel Wiener filter (MWF) [4], [5], the minimum variance distortionless response (MVDR) beamformer [6], [7], or, its generalization, the linearly constrained minimum variance (LCMV) beamformer [7], [8].
The LCMV algorithm [7], [8] minimizes the output power of the noise under multiple linear equality constraints. One of these equality constraints is typically used to guarantee that the target source remains undistorted with respect to a certain reference location or microphone. The remaining constraints can be used as additional degrees of freedom in designing the final filter. For example, they can be used to steer nulls in the directions of the interferers [7], [12], or to broaden the beam towards the target source in order to avoid pointing error problems, also known as steering vector mismatches [13], [14]. A special case of the LCMV method is the MVDR beamformer, which only uses the distortionless constraint of the target source [6], [7].
An alternative multi-microphone noise reduction method is the MWF, which leads to the minimum mean square error (MMSE) estimate of the target source if the estimator is constrained to be linear, or if the target source and the noise are assumed to be jointly Gaussian distributed [15]. However, in [16]-[18], it was demonstrated that speech signals in the time and frequency domains tend to be super-Gaussian distributed rather than Gaussian distributed. Thus, the MWF is generally not MMSE optimal. The MWF does not include a distortionless constraint for the target source and, thus, it generally introduces speech distortion in the output [4]. Several generalizations of the MWF have been proposed, among which the speech distortion weighted MWF (SDW-MWF) [5], which introduces a parameter in the minimization procedure to control the trade-off between speech distortion and noise reduction. A well-known property of the MWF is the fact that it can be decomposed into an MVDR beamformer and a single-channel Wiener filter as a post-processor [19]. Notice that this holds in general, also if the filter is not constrained to be linear and the target source is not assumed to be Gaussian distributed [15].
Traditionally, hearing aids have been fitted bilaterally, i.e., the user wears a hearing aid on each ear, and the hearing aids operate essentially independently of each other. As such, the noise reduction algorithm in each hearing aid estimates the signal of interest using only the recordings of the microphones from that specific hearing aid [9]. Such a setup with an independent multi-microphone algorithm per ear may severely distort the binaural cues, since phase and magnitude relations of the sources reaching the two ears are modified [10]. This is harmful for the naturalness of the total sound field as received by the hearing-aid user. Ideally, all sound sources (including the undesired ones) that are present after processing should still sound as if originating from the original direction. This not only leads to a more natural perception of the acoustic environment, but can also lead to an improved intelligibility of a target speaker in certain cases; more specifically, in spatial unmasking experiments [11] it has been shown that a target speaker in a noisy background is significantly easier to understand when the noise sources are separated in space from the speaker, as compared to the situation where speaker and noise sources are co-located.
Binaural hearing aids are able to wirelessly exchange microphone signals between hearing aids. This facilitates the use of multi-microphone noise reduction algorithms which combine all microphone recordings from both hearing aids, hence allowing the usage of more microphone recordings than with bilateral noise reduction. As such, the increased number of microphone recordings can potentially lead to better noise suppression and, thus, to a higher speech intelligibility. Moreover, by introducing proper constraints on the beamformer coefficients, binaural cue preservation of the sources can be achieved.
There are several binaural multi-microphone noise reduction methods known from the literature. These can be divided into two main categories [20]: a) methods based on the LCMV framework and b) methods based on the MWF.
The binaural version of the SDW-MWF (BSDW-MWF) [21], [22] preserves the binaural cues of the target. However, it was theoretically proven that the binaural cues of the interferers collapse on the binaural cues of the target source [23] (i.e., after processing, the binaural cues of the interferers become identical to the binaural cues of the target source). In [22], a variation of the BSDW-MWF (called BSDW-MWF-N) was proposed which tries to partially preserve the binaural cues of the interferers. This method adds a portion of the unprocessed noisy signal at the reference microphones to the corresponding BSDW-MWF enhanced signals. The larger the portion of the unprocessed noisy signals, the lower the noise reduction, but the better the preservation of binaural cues of the interferers, and vice versa. As such, this solution exhibits a trade-off between the preservation of binaural cues and the amount of noise reduction. In [24], a subjective evaluation of BSDW-MWF and BSDW-MWF-N shows that for a moderate input SNR the subjects indeed localized the processed interferer correctly with BSDW-MWF-N and incorrectly with BSDW-MWF. However, for a small input SNR the processed interferer was also localized correctly for BSDW-MWF. This is mainly due to the inaccurate estimates of the cross power spectral density (CPSD) matrix of the target, and due to masking effects when the processed target and processed interferer are presented to the subjects simultaneously [24]. In [25], two other variations of the BSDW-MWF were proposed. The first one is capable of preserving the binaural cues of the target and completely cancelling one interferer. The second one is capable of accurately preserving the binaural cues of only one interferer, while distorting the binaural cues of the target.
Similarly to the SDW-MWF, the BSDW-MWF can be decomposed into the binaural MVDR (BMVDR) beamformer and a single-channel Wiener filter [25]. The BMVDR can preserve the binaural cues of the target source, but the binaural cues of the interferers collapse to the binaural cues of the target source. In [26], [27], the binaural linearly constrained minimum variance (BLCMV) algorithm was proposed, which achieves simultaneous noise reduction and binaural cue preservation of the target source and multiple interferers. Unlike the BMVDR, the BLCMV uses two additional linear constraints per interferer to preserve its binaural cues. A fixed interference rejection parameter is used in combination with these constraints to control the amount of noise reduction. The BLCMV is thus capable of controlling the amount of noise reduction using two constraints per interferer. However, in hearing-aid systems with a rather limited number of microphones, the degrees of freedom for noise reduction are exhausted quickly when increasing the number of interferers. This makes the BLCMV less suitable for this application.
In [28], a method similar to the BLCMV, called optimal BLCMV (OBLCMV), was proposed which is able to achieve simultaneous noise reduction and binaural cue preservation of the target source and only one interferer. Unlike the BLCMV, the OBLCMV uses an interference rejection parameter that is optimal with respect to the binaural output SNR. In [29], [30], two independent works proposed the same LCMV-based method (which we call joint BLCMV (JBLCMV)) as an alternative to the BLCMV, which preserves the binaural cues of the target source and more than twice the number of interferers compared to the BLCMV [29]. Unlike the BLCMV, the JBLCMV requires only one linear constraint per interferer and, as a result, it has more degrees of freedom left for noise reduction. The linear constraints for the preservation of the binaural cues of the interferers have the same form as the linear constraint used in [25]. However, unlike the method in [25], the JBLCMV can preserve the binaural cues of a limited number of interferers and does not distort the binaural cues of the target source.
In this paper, we present an iterative, relaxed binaural LCMV beamforming method. Similar to the other binaural LCMV-based approaches, the proposed method strictly preserves the binaural cues of the target source. However, the proposed method can flexibly control the accuracy of binaural cue preservation of the interferers and, therefore, trade it off against additional noise reduction. This is achieved by using inequality constraints instead of the commonly used equality constraints. The task of each inequality constraint is the (approximate) preservation of the binaural cues of a single interferer in a controlled way. The proposed method allows a different value of the trade-off parameter to be selected for each interferer according to its importance. The BMVDR and the JBLCMV can be seen as two extreme cases of the proposed method. On the one hand, the BMVDR can achieve the best possible overall noise suppression compared to all the other aforementioned binaural LCMV-based methods, but causes full collapse of the binaural cues of the interferers towards the binaural cues of the target source. On the other hand, the JBLCMV can achieve the preservation of the maximum possible number of interferers compared to the other aforementioned binaural LCMV-based methods, but at the expense of less noise suppression. Unlike the JBLCMV and the BMVDR, the proposed method can flexibly control the amount of noise suppression and binaural cue preservation according to the needs of the user. The relaxations used in the proposed method allow the usage of a substantially larger number of constraints for the approximate preservation of more interferers compared to all the other binaural LCMV-based methods, including the JBLCMV.
The remainder of this paper is organized as follows. In Section II, the signal model and the notation are presented. In Section III, the key idea of binaural beamforming is explained and several existing binaural LCMV-based algorithms are summarized. In Sections IV and V, a novel non-convex binaural beamforming problem and its iterative convex approximation are presented, respectively. In Section VI, the evaluation of the proposed algorithm is provided. Finally, in Section VII, we draw some conclusions.
This work was supported by the Oticon Foundation and the Dutch Technology Foundation STW.
## II. SIGNAL MODEL AND NOTATION
Assume for convenience that each of the two hearing aids consists of $M/2$ microphones, where $M$ is an even number. Therefore, the microphone array consists of $M$ microphones in total. The multi-microphone noise reduction methods considered in this paper operate in the frequency domain on a frame-by-frame basis. Let $l$ denote the frame index and $k$ the frequency-bin index. Assume that there is only one target source and that there are $r$ interferers. The $k$-th frequency coefficient of the $l$-th frame of the $j$-th microphone noisy signal, $y_j(k,l)$, $j = 1, \ldots, M$, is given by

$$
y_j(k,l) = \underbrace{a_j(k,l)\, s(k,l)}_{x_j(k,l)} + \sum_{i=1}^{r} \underbrace{b_{ij}(k,l)\, u_i(k,l)}_{n_{ij}(k,l)} + v_j(k,l), \tag{1}
$$
where
- $s(k,l)$ denotes the target signal at the source location.
- $a_j(k,l)$ is the acoustic transfer function (ATF) of the target signal with respect to the $j$-th microphone.
- $u_i(k,l)$ is the $i$-th interfering signal at the source location.
- $b_{ij}(k,l)$ is the ATF of the $i$-th interfering signal with respect to the $j$-th microphone.
- $n_{ij}(k,l)$ is the $i$-th received interfering signal at the $j$-th microphone.
- $x_j(k,l)$ is the received target signal at the $j$-th microphone.
- $v_j(k,l)$ is additive noise at the $j$-th microphone.

Here we use the ATFs in the signal model for notational convenience. However, note that the ATFs can be replaced with relative acoustic transfer functions (RATFs), which can often be identified more easily than the ATFs [12], [20].
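In the remainder of the paper, the frequency and frame indices are omitted to simplify the notation. Using vector notation, Eq. (1) can be written as

$$
\mathbf{y} = \mathbf{x} + \sum_{i=1}^{r} \mathbf{n}_i + \mathbf{v}, \tag{2}
$$

where $\mathbf{y} \in \mathbb{C}^{M \times 1}$, $\mathbf{x} \in \mathbb{C}^{M \times 1}$, $\mathbf{n}_i \in \mathbb{C}^{M \times 1}$ and $\mathbf{v} \in \mathbb{C}^{M \times 1}$ are the stacked vectors of the $y_j$, $x_j$, $n_{ij}$, $v_j$ (for $j = 1, \ldots, M$) components, respectively. Moreover, $\mathbf{x} = \mathbf{a} s$ and $\mathbf{n}_i = \mathbf{b}_i u_i$, where $\mathbf{a} \in \mathbb{C}^{M \times 1}$ and $\mathbf{b}_i \in \mathbb{C}^{M \times 1}$ are the stacked vectors of the $a_j$ and $b_{ij}$ (for $j = 1, \ldots, M$) components, respectively.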
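Assuming that all sources and the additive noise are mutually uncorrelated, the CPSD matrix $\mathbf{P}_y = E[\mathbf{y}\mathbf{y}^H]$ of $\mathbf{y}$ is given by

$$
\mathbf{P}_y = \mathbf{P}_x + \sum_{i=1}^{r} \mathbf{P}_{n_i} + \mathbf{P}_v, \tag{3}
$$

where

- $\mathbf{P}_x = E[\mathbf{x}\mathbf{x}^H] = p_s \mathbf{a}\mathbf{a}^H \in \mathbb{C}^{M \times M}$ is the CPSD matrix of $\mathbf{x}$, with $p_s = E[|s|^2]$ the power spectral density (PSD) of $s$.
- $\mathbf{P}_{n_i} = E[\mathbf{n}_i \mathbf{n}_i^H] = p_{u_i} \mathbf{b}_i \mathbf{b}_i^H \in \mathbb{C}^{M \times M}$ is the CPSD matrix of $\mathbf{n}_i$, with $p_{u_i} = E[|u_i|^2]$ the PSD of $u_i$.
- $\mathbf{P}_v = E[\mathbf{v}\mathbf{v}^H] \in \mathbb{C}^{M \times M}$ is the CPSD matrix of $\mathbf{v}$.
- $\mathbf{P} = \sum_{i=1}^{r} \mathbf{P}_{n_i} + \mathbf{P}_v$ is the total CPSD matrix of all disturbances.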
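As an illustration of the rank-one structure above, the disturbance and noisy CPSD matrices can be assembled numerically. This is only a sketch: the function names, the random ATF vectors and the white-noise $\mathbf{P}_v$ are assumptions made for the example, not quantities from the paper.

```python
import numpy as np

def disturbance_cpsd(B, p_u, P_v):
    """Return P = sum_i p_u[i] * b_i b_i^H + P_v.

    B   : (M, r) complex array whose i-th column is the interferer ATF vector b_i
    p_u : (r,) real array with the interferer PSDs
    P_v : (M, M) Hermitian CPSD matrix of the additive noise
    """
    # (B * p_u) scales column i by p_u[i]; the product with B^H sums the rank-one terms.
    return (B * p_u) @ B.conj().T + P_v

def noisy_cpsd(a, p_s, P):
    """Return P_y = p_s * a a^H + P for a target ATF vector a of shape (M,)."""
    return p_s * np.outer(a, a.conj()) + P

# Hypothetical scene: M = 4 microphones, r = 2 interferers, white sensor noise.
rng = np.random.default_rng(0)
M, r = 4, 2
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, r)) + 1j * rng.standard_normal((M, r))
p_u = np.array([0.5, 2.0])
P_v = 0.1 * np.eye(M)

P = disturbance_cpsd(B, p_u, P_v)
P_y = noisy_cpsd(a, 1.0, P)
```

With a positive definite $\mathbf{P}_v$, the resulting $\mathbf{P}$ is Hermitian and positive definite, which is what the closed-form beamformers in Section III rely on.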
## III. BINAURAL BEAMFORMING
Binaural multi-microphone noise reduction methods aim at simultaneous noise reduction and binaural cue preservation of the sources. In order to preserve the binaural cues, two different spatial filters, $\hat{\mathbf{w}}_L \in \mathbb{C}^{M \times 1}$ and $\hat{\mathbf{w}}_R \in \mathbb{C}^{M \times 1}$, are applied to the left and right hearing aid, respectively, where constraints can be used to guarantee that certain phase and magnitude relations between the left and right hearing aid outputs are preserved. Note that both spatial filters use all microphone recordings from both hearing aids.
Assume for convenience and without loss of generality that the reference microphones for the left and right hearing aid are indexed as $j = 1$ and $j = M$, respectively. In the sequel of the paper, for ease of notation, the reference terms of Eq. (1) use the subscripts $L$ and $R$ instead of $j = 1$ and $j = M$, respectively. The two enhanced output signals at the left and right hearing aids are then given by

$$
\hat{x}_L = \hat{\mathbf{w}}_L^H \mathbf{y}, \qquad \hat{x}_R = \hat{\mathbf{w}}_R^H \mathbf{y}. \tag{4}
$$

In Section III-A, objective measures for the preservation of binaural cues are presented. In Sections III-C to III-F, the binaural MVDR (BMVDR), the binaural LCMV (BLCMV), the optimal BLCMV (OBLCMV), and the JBLCMV are reviewed, respectively. All reviewed methods are special cases of the general binaural LCMV (GBLCMV) framework, presented in Section III-B. Finally, the basic properties of all reviewed methods are summarized in Section III-G.
## A. Binaural Cues
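The extent to which the binaural cues of a specific source are preserved can be expressed using the input and output interaural transfer function (ITF) [31]. Often the ITF is decomposed into its magnitude, describing the interaural level differences (ILDs), and its phase, describing the interaural phase differences (IPDs). The input and output ITFs of the $i$-th interferer are defined as [31]

$$
\mathrm{ITF}^{\mathrm{in}}_{n_i} = \frac{n_{iL}}{n_{iR}} = \frac{b_{iL}}{b_{iR}}, \qquad \mathrm{ITF}^{\mathrm{out}}_{n_i} = \frac{\mathbf{w}_L^H \mathbf{n}_i}{\mathbf{w}_R^H \mathbf{n}_i} = \frac{\mathbf{w}_L^H \mathbf{b}_i}{\mathbf{w}_R^H \mathbf{b}_i}. \tag{5}
$$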
The input and output ILDs are defined as [31]
<!-- formula-not-decoded -->
The input and output IPDs are given by [31]
<!-- formula-not-decoded -->
Note that the IPDs are frequently converted to and measured as time delays [32], i.e., interaural time differences (ITDs). The IPDs and ILDs are the dominant cues for binaural localization at low and high frequencies, respectively [33]. Typically, the IPDs become more important for frequencies below 1 kHz, while the ILDs become more important for frequencies above 3 kHz [33]. In [34] it was experimentally shown that, for broadband signals, the IPDs are perceptually much more important than the ILDs for localizing a source. More specifically, it was shown that the low-frequency IPDs play the most important role perceptually for correct localization. Based on this observation, several proposed multi-microphone noise reduction techniques [32], [35] leave the low-frequency content of the noisy measurements unprocessed, and process only the higher-frequency content. Unfortunately, if a large portion of the power of the noise is concentrated at low frequencies, the noise reduction capabilities are reduced significantly. Therefore, in this paper we aim at the simultaneous preservation of binaural cues of all sources and noise reduction at all frequencies.
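A binaural spatial filter, $\mathbf{w} = [\mathbf{w}_L^T \ \mathbf{w}_R^T]^T$, exactly preserves the binaural cues of the $i$-th interferer if $\mathrm{ITF}^{\mathrm{in}}_{n_i} = \mathrm{ITF}^{\mathrm{out}}_{n_i}$ [31]. Exact preservation of ITFs also implies preservation of ILDs and IPDs [31], i.e., $\mathrm{ILD}^{\mathrm{in}}_{n_i} = \mathrm{ILD}^{\mathrm{out}}_{n_i}$ and $\mathrm{IPD}^{\mathrm{in}}_{n_i} = \mathrm{IPD}^{\mathrm{out}}_{n_i}$. Non-exact preservation of binaural cues implies that there is some positive ITF error given by

$$
E_{n_i} = \left| \mathrm{ITF}^{\mathrm{out}}_{n_i} - \mathrm{ITF}^{\mathrm{in}}_{n_i} \right| = \left| \frac{\mathbf{w}_L^H \mathbf{b}_i}{\mathbf{w}_R^H \mathbf{b}_i} - \frac{b_{iL}}{b_{iR}} \right|. \tag{8}
$$

Moreover, non-exact preservation of binaural cues implies that there are some ILD and/or IPD errors, given by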
<!-- formula-not-decoded -->
where $0 \le T_{n_i} \le 1$ [31]. Eqs. (5), (6), (7), (8) and (9) also apply to the target source $\mathbf{x}$. As will become clear in the sequel, for all methods discussed in this paper, the errors in Eqs. (8) and (9) with respect to the target source are always zero.
As explained before, the IPD error is a perceptually more important measure for binaural localization than the ILD error for broadband signals (such as speech signals contaminated by broadband noise signals), because the IPDs are perceptually more important than the ILDs for this category of signals. Moreover, the IPD error is perceptually more informative at low frequencies, while the ILD error is perceptually more informative at high frequencies.
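The error measures of this subsection can be sketched for a single source and frequency bin as follows. The ITF error follows the definition of Section III-A directly. The ILD and IPD error forms used here (ILD taken as the squared ITF magnitude, IPD error taken as the wrapped phase difference normalized by $\pi$ so that it lies in $[0, 1]$) are assumptions made for this example, and the function names are hypothetical.

```python
import cmath
import math

def itf(left, right):
    """Interaural transfer function: left reference component over right one."""
    return left / right

def cue_errors(in_left, in_right, out_left, out_right):
    """Return (ITF error, ILD error, IPD error) for one source in one bin.

    Assumed definitions for this sketch:
      ITF error = |ITF_out - ITF_in|
      ILD error = | |ITF_out|^2 - |ITF_in|^2 |
      IPD error = |wrapped phase of ITF_out / ITF_in| / pi, which lies in [0, 1]
    """
    itf_in = itf(in_left, in_right)
    itf_out = itf(out_left, out_right)
    e_itf = abs(itf_out - itf_in)
    e_ild = abs(abs(itf_out) ** 2 - abs(itf_in) ** 2)
    e_ipd = abs(cmath.phase(itf_out / itf_in)) / math.pi
    return e_itf, e_ild, e_ipd

# A common scaling of both ears leaves every cue intact.
preserved = cue_errors(1 + 0j, 1j, 0.5 + 0j, 0.5j)
# Identical outputs at both ears move the source to ITF = 1.
collapsed = cue_errors(1 + 0j, 1j, 1j, 1j)
```

In the second call the ITF magnitude is unchanged, so only the ITF and IPD errors are non-zero, which illustrates why the ITF error alone does not say which cue was distorted.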
## B. General Binaural LCMV Framework
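All binaural LCMV-based methods discussed in this section are based on a general binaural LCMV (GBLCMV) framework, which is the binaural version of the classical LCMV framework [7], [8]. (We use the word "general" to distinguish it from the BLCMV method [26], [27].) The GBLCMV minimizes the sum of the left and right output noise powers under multiple linear equality constraints. That is,

$$
\hat{\mathbf{w}}_{\mathrm{GBLCMV}} = \arg\min_{\mathbf{w}} \ \mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \boldsymbol{\Lambda} = \mathbf{f}^H, \tag{10}
$$

where $\hat{\mathbf{w}}_{\mathrm{GBLCMV}} = [\hat{\mathbf{w}}_{\mathrm{GBLCMV},L}^T \ \hat{\mathbf{w}}_{\mathrm{GBLCMV},R}^T]^T \in \mathbb{C}^{2M \times 1}$, $\boldsymbol{\Lambda} \in \mathbb{C}^{2M \times d}$ is assumed to be a full column rank matrix (i.e., $\mathrm{rank}(\boldsymbol{\Lambda}) = d$), $\mathbf{f} \in \mathbb{C}^{d \times 1}$, $d$ is the number of linear equality constraints, and

$$
\tilde{\mathbf{P}} = \begin{bmatrix} \mathbf{P} & \mathbf{0} \\ \mathbf{0} & \mathbf{P} \end{bmatrix} \in \mathbb{C}^{2M \times 2M}. \tag{11}
$$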
Similarly to the classical LCMV framework [7], [8], if $d \le 2M$ and $\boldsymbol{\Lambda}$ is full column rank, the GBLCMV has a closed-form solution given by

$$
\hat{\mathbf{w}}_{\mathrm{GBLCMV}} = \tilde{\mathbf{P}}^{-1} \boldsymbol{\Lambda} \left( \boldsymbol{\Lambda}^H \tilde{\mathbf{P}}^{-1} \boldsymbol{\Lambda} \right)^{-1} \mathbf{f}. \tag{12}
$$

In the GBLCMV, the total number of degrees of freedom devoted to noise reduction is $\mathrm{DOF}_{\mathrm{GBLCMV}} = 2M - d$. Note that in the special case where $d = 2M$, there are no degrees of freedom left for controlled noise reduction, i.e., $\hat{\mathbf{w}}_{\mathrm{GBLCMV}}$ cannot reduce the objective function of the GBLCMV problem in a controlled way. Finally, if $d > 2M$, the feasible set $\{\mathbf{w} : \mathbf{w}^H \boldsymbol{\Lambda} = \mathbf{f}^H\}$ is empty and the GBLCMV problem has no solution. In conclusion, the matrix $\boldsymbol{\Lambda}$ has to be "tall" (i.e., $d < 2M$) to be able to simultaneously achieve controlled noise reduction and satisfy the constraints of the GBLCMV problem. The maximum number of constraints that the GBLCMV framework can handle, while achieving controlled noise reduction, is $d_{\max} = 2M - 1$, i.e., at least one degree of freedom should always be left for noise reduction. Generally, the more degrees of freedom (i.e., the larger $\mathrm{DOF}_{\mathrm{GBLCMV}}$), the more controlled noise reduction can be achieved.
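The set of linear constraints of the GBLCMV framework in Eq. (10) can be divided into two parts,

$$
\mathbf{w}^H \underbrace{[\boldsymbol{\Lambda}_1 \ \boldsymbol{\Lambda}_2]}_{\boldsymbol{\Lambda}} = \underbrace{[\mathbf{f}_1^H \ \mathbf{f}_2^H]}_{\mathbf{f}^H}. \tag{13}
$$

The first part consists of the two distortionless constraints $\mathbf{w}_L^H \mathbf{a} = a_L$ and $\mathbf{w}_R^H \mathbf{a} = a_R$, which preserve the target source at the two reference microphones. This can be written compactly as

$$
\mathbf{w}^H \boldsymbol{\Lambda}_1 = \mathbf{f}_1^H, \tag{14}
$$

where

$$
\boldsymbol{\Lambda}_1 = \begin{bmatrix} \mathbf{a} & \mathbf{0} \\ \mathbf{0} & \mathbf{a} \end{bmatrix} \in \mathbb{C}^{2M \times 2}, \qquad \mathbf{f}_1^H = [a_L \ a_R].
$$

All binaural methods discussed in this section are special cases of the GBLCMV framework and they share the constraints in Eq. (14), while the constraints $\mathbf{w}^H \boldsymbol{\Lambda}_2 = \mathbf{f}_2^H$ are different.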
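The closed-form GBLCMV solution can be sketched with NumPy as shown below, using only the two shared target constraints as an example. The function name `gblcmv`, the randomly drawn ATF vector and the random CPSD matrix are assumptions made for illustration. The reference microphones are taken as the first and last entries of each stacked vector, following the indexing convention of Section III.

```python
import numpy as np

def gblcmv(P, Lam, f):
    """Closed-form GBLCMV filter, stacked as [w_L; w_R].

    P   : (M, M) Hermitian positive-definite disturbance CPSD matrix
    Lam : (2M, d) full-column-rank constraint matrix with d <= 2M
    f   : (d,) complex vector such that the constraints read w^H Lam = f^H
    """
    P_tilde = np.kron(np.eye(2), P)              # block-diagonal diag(P, P)
    PinvLam = np.linalg.solve(P_tilde, Lam)      # P~^{-1} Lam without explicit inverse
    return PinvLam @ np.linalg.solve(Lam.conj().T @ PinvLam, f)

rng = np.random.default_rng(0)
M = 3
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)    # hypothetical target ATFs
G = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
P = G @ G.conj().T + 0.1 * np.eye(M)                        # Hermitian positive definite

zeros = np.zeros(M, dtype=complex)
Lam1 = np.stack([np.concatenate([a, zeros]), np.concatenate([zeros, a])], axis=1)
f1 = np.conj(np.array([a[0], a[-1]]))                       # so that f_1^H = [a_L, a_R]

w = gblcmv(P, Lam1, f1)
w_L, w_R = w[:M], w[M:]
```

Solving linear systems instead of forming $\tilde{\mathbf{P}}^{-1}$ explicitly is a numerical design choice of this sketch. With $d = 2$ constraints, $2M - 2$ degrees of freedom remain for noise reduction.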
In the sequel of the paper we use the term $m$ ($m_{\max}$) to indicate the number (maximum number) of interferers whose binaural cues a special case of the GBLCMV framework can preserve, while at the same time achieving controlled noise reduction. Recall that controlled noise reduction means that there is at least one degree of freedom left for noise reduction. Moreover, $m_{\max} \le r$, which means that some methods may be unable to preserve the binaural cues of all simultaneously present interferers of the acoustic scene, because there are not enough available degrees of freedom.
## C. BMVDR
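The BMVDR beamformer [30] can be formulated using the combination of the following two beamformers

$$
\hat{\mathbf{w}}_{\mathrm{BMVDR},L} = \arg\min_{\mathbf{w}_L} \ \mathbf{w}_L^H \mathbf{P} \mathbf{w}_L \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \tag{15}
$$

$$
\hat{\mathbf{w}}_{\mathrm{BMVDR},R} = \arg\min_{\mathbf{w}_R} \ \mathbf{w}_R^H \mathbf{P} \mathbf{w}_R \quad \text{s.t.} \quad \mathbf{w}_R^H \mathbf{a} = a_R, \tag{16}
$$

with closed-form solutions

$$
\hat{\mathbf{w}}_{\mathrm{BMVDR},L} = \frac{\mathbf{P}^{-1} \mathbf{a}\, a_L^*}{\mathbf{a}^H \mathbf{P}^{-1} \mathbf{a}}, \qquad \hat{\mathbf{w}}_{\mathrm{BMVDR},R} = \frac{\mathbf{P}^{-1} \mathbf{a}\, a_R^*}{\mathbf{a}^H \mathbf{P}^{-1} \mathbf{a}}. \tag{17}
$$

The BMVDR is the simplest special case of the GBLCMV framework in the sense that it has the minimum number of constraints ($d = 2$), given by Eq. (14). Specifically, the two optimization problems in Eqs. (15) and (16) can be reformulated as the following joint optimization problem,

$$
\hat{\mathbf{w}}_{\mathrm{BMVDR}} = \arg\min_{\mathbf{w}} \ \mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \boldsymbol{\Lambda}_1 = \mathbf{f}_1^H, \tag{18}
$$

where $\hat{\mathbf{w}}_{\mathrm{BMVDR}} = [\hat{\mathbf{w}}_{\mathrm{BMVDR},L}^T \ \hat{\mathbf{w}}_{\mathrm{BMVDR},R}^T]^T \in \mathbb{C}^{2M \times 1}$. Since the BMVDR algorithm has the minimum possible number of constraints, the total number of degrees of freedom which can be devoted to noise reduction is $\mathrm{DOF}_{\mathrm{BMVDR}} = 2M - 2$.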
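The BMVDR closed-form solutions can be evaluated numerically as in the sketch below, which also computes the input and output ITFs (Section III-A) of one interferer. The scene (a random target ATF vector, a single strong interferer and white sensor noise) and all names are assumptions made for this example.

```python
import numpy as np

def bmvdr(P, a):
    """Left and right BMVDR filters; references are microphones 0 and M-1."""
    Pinv_a = np.linalg.solve(P, a)                 # P^{-1} a
    denom = np.vdot(a, Pinv_a)                     # a^H P^{-1} a
    w_L = Pinv_a * np.conj(a[0]) / denom
    w_R = Pinv_a * np.conj(a[-1]) / denom
    return w_L, w_R

rng = np.random.default_rng(1)
M = 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # hypothetical target ATFs
b = rng.standard_normal(M) + 1j * rng.standard_normal(M)   # hypothetical interferer ATFs
P = 2.0 * np.outer(b, b.conj()) + 0.1 * np.eye(M)          # one interferer plus white noise

w_L, w_R = bmvdr(P, a)
itf_target_in = a[0] / a[-1]
itf_interf_in = b[0] / b[-1]
itf_interf_out = np.vdot(w_L, b) / np.vdot(w_R, b)         # (w_L^H b) / (w_R^H b)
```

Because both filters are scaled copies of $\mathbf{P}^{-1}\mathbf{a}$, the output ITF of any source equals $a_L / a_R$ in this sketch, regardless of the interferer's input ITF.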
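The BMVDR beamformer preserves the binaural cues of the target source, but distorts the binaural cues of all the interferers [30], i.e., $m_{\max} = 0$. More specifically, after beamforming, the binaural cues of the interferers collapse on the binaural cues of the target source. It can be easily shown [30] that the binaural cues of the target source are preserved due to the satisfaction of the two distortionless constraints of the problems in Eqs. (15) and (16). That is,

$$
\mathrm{ITF}^{\mathrm{in}}_{x} = \frac{a_L}{a_R} = \frac{\hat{\mathbf{w}}_{\mathrm{BMVDR},L}^H \mathbf{a}}{\hat{\mathbf{w}}_{\mathrm{BMVDR},R}^H \mathbf{a}} = \mathrm{ITF}^{\mathrm{out}}_{x}. \tag{19}
$$

Therefore, the ITF error is $E_{x,\mathrm{BMVDR}} = 0$. Furthermore, it can be easily shown that the binaural cues of the interferers collapse to the binaural cues of the target source [30]. More specifically, $\mathrm{ITF}^{\mathrm{in}}_{n_i}$ is given by

$$
\mathrm{ITF}^{\mathrm{in}}_{n_i} = \frac{b_{iL}}{b_{iR}}, \tag{20}
$$

while $\mathrm{ITF}^{\mathrm{out}}_{n_i}$ is given by

$$
\mathrm{ITF}^{\mathrm{out}}_{n_i} = \frac{\hat{\mathbf{w}}_{\mathrm{BMVDR},L}^H \mathbf{b}_i}{\hat{\mathbf{w}}_{\mathrm{BMVDR},R}^H \mathbf{b}_i} = \frac{a_L\, \mathbf{a}^H \mathbf{P}^{-1} \mathbf{b}_i}{a_R\, \mathbf{a}^H \mathbf{P}^{-1} \mathbf{b}_i} = \frac{a_L}{a_R}. \tag{21}
$$

Thus, after beamforming, the interferers will have the same ITF as the target source and their ITF error is given by

$$
E_{n_i,\mathrm{BMVDR}} = \left| \frac{a_L}{a_R} - \frac{b_{iL}}{b_{iR}} \right|. \tag{22}
$$

## D. BLCMV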
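Another special case of the GBLCMV framework is the binaural linearly constrained minimum variance (BLCMV) beamformer [26], [27] which, unlike the BMVDR, uses additional constraints for the preservation of the binaural cues of $m$ interferers. The left and right spatial filters of the BLCMV are given by [26], [27]

$$
\hat{\mathbf{w}}_{\mathrm{BLCMV},L} = \arg\min_{\mathbf{w}_L} \ \mathbf{w}_L^H \mathbf{P} \mathbf{w}_L \quad \text{s.t.} \quad \mathbf{w}_L^H \mathbf{a} = a_L, \quad \mathbf{w}_L^H \mathbf{b}_i = \eta_L b_{iL}, \ i = 1, \ldots, m, \tag{23}
$$

and

$$
\hat{\mathbf{w}}_{\mathrm{BLCMV},R} = \arg\min_{\mathbf{w}_R} \ \mathbf{w}_R^H \mathbf{P} \mathbf{w}_R \quad \text{s.t.} \quad \mathbf{w}_R^H \mathbf{a} = a_R, \quad \mathbf{w}_R^H \mathbf{b}_i = \eta_R b_{iR}, \ i = 1, \ldots, m, \tag{24}
$$

where the constraints $\mathbf{w}_L^H \mathbf{a} = a_L$ and $\mathbf{w}_R^H \mathbf{a} = a_R$ are the two common distortionless constraints used in all special cases of the GBLCMV framework, while the constraints $\mathbf{w}_L^H \mathbf{b}_i = \eta_L b_{iL}$ and $\mathbf{w}_R^H \mathbf{b}_i = \eta_R b_{iR}$, for $i = 1, \ldots, m$, aim at a) preserving the binaural cues and b) suppressing the $m$ interferers. The amount of suppression is controlled via the interference rejection parameters $\eta_L$ and $\eta_R$, which are predefined real-valued scalars ($0 \le \eta_L, \eta_R < 1$). Binaural cue preservation is achieved only if $\eta = \eta_L = \eta_R$ [26], [28]. The two optimization problems in Eqs. (23) and (24) can be compactly formulated as a joint optimization problem. That is,

$$
\hat{\mathbf{w}}_{\mathrm{BLCMV}} = \arg\min_{\mathbf{w}} \ \mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \boldsymbol{\Lambda} = \mathbf{f}^H, \tag{25}
$$

where

$$
\boldsymbol{\Lambda} = \begin{bmatrix} \mathbf{a} & \mathbf{0} & \mathbf{b}_1 & \mathbf{0} & \cdots & \mathbf{b}_m & \mathbf{0} \\ \mathbf{0} & \mathbf{a} & \mathbf{0} & \mathbf{b}_1 & \cdots & \mathbf{0} & \mathbf{b}_m \end{bmatrix} \in \mathbb{C}^{2M \times (2m+2)}
$$

and

$$
\mathbf{f}^H = [a_L \ \ a_R \ \ \eta_L b_{1L} \ \ \eta_R b_{1R} \ \ \cdots \ \ \eta_L b_{mL} \ \ \eta_R b_{mR}].
$$

The available degrees of freedom for noise reduction are $\mathrm{DOF}_{\mathrm{BLCMV}} = 2M - d = 2M - 2m - 2$. Since $d_{\max} = 2M - 1$ (see Section III-B) and $d = 2m + 2$, the BLCMV can simultaneously achieve controlled noise suppression and binaural cue preservation of at most $m_{\max} = M - 2$ interferers, which is the largest integer $m$ satisfying $2m + 2 \le 2M - 1$.
The ITF errors of the target source and of the $m$ interferers that are included in the constraints are zero, i.e., $E_{x,\mathrm{BLCMV}} = 0$ and $E_{n_i,\mathrm{BLCMV}} = 0$, for $i = 1, \ldots, m \le r$. However, if some interferers are not included in the constraints, their ITF error will be non-zero, i.e., $E_{n_i,\mathrm{BLCMV}} > 0$, for $i = m+1, \ldots, r$.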
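A minimal numerical sketch of the joint BLCMV problem is given below, assuming $\eta_L = \eta_R = \eta$ and one particular column ordering of the constraint matrix (any consistent ordering gives the same filter). The helper names, the random ATF vectors and the value $\eta = 0.1$ are assumptions for the example only.

```python
import numpy as np

def lcmv(P_tilde, Lam, f):
    """Generic closed-form LCMV solution for the constraints w^H Lam = f^H."""
    PinvLam = np.linalg.solve(P_tilde, Lam)
    return PinvLam @ np.linalg.solve(Lam.conj().T @ PinvLam, f)

def blcmv_constraints(a, B, eta):
    """Build (Lam, f) with two constraints per interferer and eta_L = eta_R = eta."""
    M, m = B.shape
    z = np.zeros(M, dtype=complex)
    cols = [np.concatenate([a, z]), np.concatenate([z, a])]
    f = [np.conj(a[0]), np.conj(a[-1])]
    for i in range(m):
        b = B[:, i]
        cols += [np.concatenate([b, z]), np.concatenate([z, b])]
        f += [eta * np.conj(b[0]), eta * np.conj(b[-1])]   # eta is real-valued
    return np.stack(cols, axis=1), np.array(f)

rng = np.random.default_rng(2)
M, m, eta = 4, 2, 0.1                       # m = M - 2 is the largest admissible value
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, m)) + 1j * rng.standard_normal((M, m))
P = B @ B.conj().T + 0.1 * np.eye(M)

Lam, f = blcmv_constraints(a, B, eta)
w = lcmv(np.kron(np.eye(2), P), Lam, f)
w_L, w_R = w[:M], w[M:]
itf_in = B[0] / B[-1]                               # input ITFs of the interferers
itf_out = (w_L.conj() @ B) / (w_R.conj() @ B)       # output ITFs after beamforming
```

Here $d = 2m + 2 = 6$ out of $2M = 8$ dimensions are spent on constraints, leaving two degrees of freedom for noise reduction.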
## E. OBLCMV
The OBLCMV [28] can be seen as a special case of the BLCMV (and, hence, the GBLCMV) since it solves the same optimization problem. However, it preserves the binaural cues of only one interferer (e.g., the $k$-th interferer) using a complex-valued interference rejection parameter $\hat{\eta} = \hat{\eta}_L = \hat{\eta}_R$ that is optimal with respect to the binaural output SNR (defined in Sec. VI-B2). More specifically, the OBLCMV solves the optimization problem in Eq. (25), where $\boldsymbol{\Lambda}$ and $\mathbf{f}^H$ are given by [28]

$$
\boldsymbol{\Lambda} = \begin{bmatrix} \mathbf{a} & \mathbf{0} & \mathbf{b}_k & \mathbf{0} \\ \mathbf{0} & \mathbf{a} & \mathbf{0} & \mathbf{b}_k \end{bmatrix}, \qquad \mathbf{f}^H = [a_L \ \ a_R \ \ \hat{\eta}\, b_{kL} \ \ \hat{\eta}\, b_{kR}],
$$

where $1 \le k \le r$. The available degrees of freedom for noise reduction are $\mathrm{DOF}_{\mathrm{OBLCMV}} = 2M - 4$.
The ITF errors of the target source and of the $k$-th interferer that are included in the constraints are zero, i.e., $E_{x,\mathrm{OBLCMV}} = 0$ and $E_{n_k,\mathrm{OBLCMV}} = 0$. However, the binaural cues of all the other $r - 1$ interferers will be distorted, i.e., $E_{n_i,\mathrm{OBLCMV}} > 0$, for $i \in \{1, \ldots, r\} \setminus \{k\}$.
## F. JBLCMV
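Recall from Section III-A that preserving the binaural cues of the $i$-th interferer implies that the following constraint has to be satisfied

$$
\frac{\mathbf{w}_L^H \mathbf{b}_i}{\mathbf{w}_R^H \mathbf{b}_i} = \frac{b_{iL}}{b_{iR}},
$$

which can be reformulated as:

$$
\mathbf{w}_L^H \mathbf{b}_i b_{iR} - \mathbf{w}_R^H \mathbf{b}_i b_{iL} = 0. \tag{28}
$$

Compared to the (O)BLCMV, this unified constraint reduces the number of constraints used for binaural cue preservation by a factor of 2. As a result, for a given number of interferers, more degrees of freedom can be devoted to noise reduction. The JBLCMV [29], [30] uses this type of equality constraint for the preservation of the binaural cues of $m$ interferers. More specifically, the JBLCMV problem is given by

$$
\hat{\mathbf{w}}_{\mathrm{JBLCMV}} = \arg\min_{\mathbf{w}} \ \mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \boldsymbol{\Lambda} = \mathbf{f}^H, \tag{29}
$$

where

$$
\boldsymbol{\Lambda} = [\boldsymbol{\Lambda}_1 \ \boldsymbol{\Lambda}_2], \qquad \boldsymbol{\Lambda}_2 = \begin{bmatrix} \mathbf{b}_1 b_{1R} & \cdots & \mathbf{b}_m b_{mR} \\ -\mathbf{b}_1 b_{1L} & \cdots & -\mathbf{b}_m b_{mL} \end{bmatrix} \in \mathbb{C}^{2M \times m}, \tag{30}
$$

and $\mathbf{w}_{\mathrm{JBLCMV}} = [\mathbf{w}_{\mathrm{JBLCMV},L}^T \ \mathbf{w}_{\mathrm{JBLCMV},R}^T]^T$. Moreover,

$$
\mathbf{f}^H = [\mathbf{f}_1^H \ \mathbf{f}_2^H] = [a_L \ \ a_R \ \ 0 \ \ \cdots \ \ 0].
$$

Similarly to all other special cases of the GBLCMV framework, $\mathbf{w}^H \boldsymbol{\Lambda}_1 = \mathbf{f}_1^H$ is used for the exact binaural cue preservation of the target source, while $\mathbf{w}^H \boldsymbol{\Lambda}_2 = \mathbf{f}_2^H$ is used for the preservation of the binaural cues of $m$ interferers.
The JBLCMV can simultaneously achieve controlled noise reduction and binaural cue preservation of up to $m_{\max} = 2M - 3$ interferers [29]. Moreover, the number of degrees of freedom devoted to noise reduction is $\mathrm{DOF}_{\mathrm{JBLCMV}} = 2M - m - 2$.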
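The JBLCMV can be sketched in the same way, with a single constraint $\mathbf{w}_L^H \mathbf{b}_i b_{iR} - \mathbf{w}_R^H \mathbf{b}_i b_{iL} = 0$ per interferer. The function name and the random scene are assumptions made for the example; $m = 4$ interferers are used with $M = 4$ microphones, which exceeds the BLCMV limit of $M - 2 = 2$ but stays below $2M - 3 = 5$.

```python
import numpy as np

def jblcmv(P, a, B):
    """Closed-form JBLCMV filters (w_L, w_R) for target ATFs a and interferer ATFs B."""
    M, m = B.shape
    z = np.zeros(M, dtype=complex)
    Lam1 = np.stack([np.concatenate([a, z]), np.concatenate([z, a])], axis=1)
    # Column i of Lam2 is [b_i * b_iR; -b_i * b_iL], so that
    # w^H Lam2[:, i] = w_L^H b_i b_iR - w_R^H b_i b_iL.
    Lam2 = np.concatenate([B * B[-1, :], -B * B[0, :]], axis=0)
    Lam = np.concatenate([Lam1, Lam2], axis=1)
    f = np.concatenate([[np.conj(a[0]), np.conj(a[-1])], np.zeros(m)])
    P_tilde = np.kron(np.eye(2), P)
    PinvLam = np.linalg.solve(P_tilde, Lam)
    w = PinvLam @ np.linalg.solve(Lam.conj().T @ PinvLam, f)
    return w[:M], w[M:]

rng = np.random.default_rng(3)
M, m = 4, 4
a = rng.standard_normal(M) + 1j * rng.standard_normal(M)
B = rng.standard_normal((M, m)) + 1j * rng.standard_normal((M, m))
P = B @ B.conj().T + 0.1 * np.eye(M)

w_L, w_R = jblcmv(P, a, B)
residual = (w_L.conj() @ B) * B[-1] - (w_R.conj() @ B) * B[0]   # left side of the constraint
itf_in = B[0] / B[-1]
itf_out = (w_L.conj() @ B) / (w_R.conj() @ B)
```

In this configuration $d = m + 2 = 6$, so two degrees of freedom remain for noise reduction even though four interferers are constrained.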
## G. Summary of GBLCMV methods
We summarize some of the properties of the methods discussed in Section III. Table I gives an overview of two important factors: a) the maximum number of interferers whose binaural cues can be preserved while achieving controlled noise reduction, $m_{\max}$, and b) the degrees of freedom (DOF) available for noise reduction. The following conclusions can be drawn from this table:
- The BMVDR has the maximum DOF, which means that it can achieve the best possible noise reduction. It preserves the binaural cues of the target source, but not the binaural cues of the interferers.
- Unlike the (O)BLCMV, which uses two constraints per interferer, the JBLCMV uses only one constraint per interferer. Therefore, the JBLCMV can preserve the binaural cues of more interferers or, equivalently, given the same number of interferers, it has more available degrees of freedom devoted to noise reduction.

**Table I.** Summary of a) the maximum number of interferers whose binaural cues can be preserved while achieving controlled noise reduction ($m_{\max}$), and b) the number of available degrees of freedom for noise reduction (DOF). All methods are special cases of the GBLCMV framework. $M$ is the total number of microphones, and $m$ is the number of constrained interferers.

| Method            | $m_{\max}$ | DOF           |
|-------------------|------------|---------------|
| BMVDR [30]        | $0$        | $2M - 2$      |
| BLCMV [27]        | $M - 2$    | $2M - 2m - 2$ |
| OBLCMV [28]       | $1$        | $2M - 4$      |
| JBLCMV [29], [30] | $2M - 3$   | $2M - m - 2$  |

In this paper, if the number of simultaneously present interferers is $r > m_{\max}$, the $r - m_{\max}$ extra interferers are not included in the constraints of the GBLCMV methods, in order to always have at least one degree of freedom left for controlled noise reduction.
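The entries of Table I are simple functions of $M$ and $m$ and can be tabulated for a given array size as in the following sketch (the function name and dictionary layout are hypothetical):

```python
def gblcmv_summary(M, m=0):
    """Return {method: (m_max, DOF)} following Table I.

    M : total number of microphones (both hearing aids)
    m : number of constrained interferers (ignored by BMVDR and OBLCMV)
    """
    return {
        "BMVDR": (0, 2 * M - 2),
        "BLCMV": (M - 2, 2 * M - 2 * m - 2),
        "OBLCMV": (1, 2 * M - 4),
        "JBLCMV": (2 * M - 3, 2 * M - m - 2),
    }

# Example: M = 4 microphones in total and m = 2 constrained interferers.
summary = gblcmv_summary(M=4, m=2)
```

For $M = 4$ and $m = 2$, the BLCMV is already at its limit $m_{\max} = 2$ with two degrees of freedom left, whereas the JBLCMV still has four degrees of freedom and could constrain up to five interferers.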
## IV. PROPOSED NON-CONVEX PROBLEM
In this section, we present a general optimization problem of which the BMVDR and the JBLCMV are special cases. More specifically, we relax the constraints on the binaural cues of the interferers, while keeping the strict equality constraints on the target source (i.e., $\mathbf{w}^H \boldsymbol{\Lambda}_1 = \mathbf{f}_1^H$). The relaxation allows trading off the amount of noise reduction against binaural cue preservation per interferer in a controlled way. The proposed optimization problem is defined as

$$
\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} \ \mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w} \quad \text{s.t.} \quad \mathbf{w}^H \boldsymbol{\Lambda}_1 = \mathbf{f}_1^H, \quad \left| \frac{\mathbf{w}_L^H \mathbf{b}_i}{\mathbf{w}_R^H \mathbf{b}_i} - \frac{b_{iL}}{b_{iR}} \right| \le e_i, \ i = 1, \ldots, m. \tag{32}
$$

The inequality constraints bound the ITF error (see Eq. (8)) of the interferers $i = 1, \ldots, m$ by a positive trade-off parameter $e_i$, $i = 1, \ldots, m$. These inequality constraints will be transformed, in the sequel of this section (see Eqs. (34), (35)), in such a way that they can be viewed as relaxations of the strict equality constraints in Eq. (28) used in the JBLCMV method. Note that the proposed method allows a different $e_i$ to be chosen for every interferer according to its importance. For instance, the binaural cues of sources at certain locations may be more important to preserve than others and, therefore, a smaller $e_i$ must be used for them. The trade-off parameter $e_i$ is selected as

$$
e_i = c_i E_{n_i,\mathrm{BMVDR}} = c_i \left| \frac{a_L}{a_R} - \frac{b_{iL}}{b_{iR}} \right|, \tag{33}
$$

where $0 \le c_i \le 1$ controls the amount of binaural cue collapse towards the target source, and the amount of noise reduction of the $i$-th interferer. If $c_i = 1, \forall i$, is used in the optimization problem in Eq. (32), then $\hat{\mathbf{w}} = \hat{\mathbf{w}}_{\mathrm{BMVDR}}$, which is seen as the worst case with respect to binaural cue preservation, because there is total collapse of the binaural cues of the interferers towards the binaural cues of the target source. If $c_i = 0, \forall i$, we have perfect preservation of the binaural cues of the $m$ interferers, and $\hat{\mathbf{w}} = \hat{\mathbf{w}}_{\mathrm{JBLCMV}}$. For notational convenience, we assume that the binaural cues of all interferers are of equal importance and, therefore, $c_i = c, \forall i$. Moreover, we keep $c$ fixed over all frequency bins. It is worth noting that other strategies for choosing $c$ may exist, which might lead to a better trade-off between maximum possible noise reduction and perceptual binaural cue preservation. As explained in Section III-A, low-frequency content is perceptually more important for binaural cue preservation than high-frequency content. Thus, smaller $c$ values for low frequencies and larger $c$ values for higher frequencies may give a better perceptual trade-off.
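As a small worked example of the trade-off parameter, the sketch below computes $e_i$ as $c_i$ times the BMVDR ITF error of the interferer (Eq. (22)). The function name and the numerical reference-microphone ATFs are hypothetical.

```python
def tradeoff_parameter(c, a_L, a_R, b_L, b_R):
    """Return e = c * |a_L / a_R - b_L / b_R|, i.e. c times the BMVDR ITF error."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    return c * abs(a_L / a_R - b_L / b_R)

# Hypothetical reference-microphone ATFs: target ITF = 1, interferer ITF = 2 / 1j = -2j.
e_full_collapse = tradeoff_parameter(1.0, 1, 1, 2, 1j)   # BMVDR-like extreme, |1 + 2j|
e_half = tradeoff_parameter(0.5, 1, 1, 2, 1j)            # half of the BMVDR ITF error
e_exact = tradeoff_parameter(0.0, 1, 1, 2, 1j)           # JBLCMV-like extreme, exact preservation
```

Choosing $c$ closer to 0 tightens the bound towards exact preservation, while $c$ closer to 1 leaves more freedom for noise reduction.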
The problem in Eq. (32) is non-convex and hard to solve. In Section V we propose a method that approximately solves it iteratively, by solving a convex problem at each iteration.
## V. PROPOSED ITERATIVE CONVEX PROBLEM
By doing some simple algebraic manipulations, the optimization problem in Eq. (32) can equivalently be written as
<!-- formula-not-decoded -->
Furthermore, the optimization problem in Eq. (34) can be rewritten as
<!-- formula-not-decoded -->
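The algebraic manipulation can be made explicit for a single interferer. The following is a sketch under stated assumptions, not a reproduction of Eqs. (34) and (35): it assumes that the ITF error of Eq. (8) has the form $|\mathrm{ITF}^{\mathrm{out}} - \mathrm{ITF}^{\mathrm{in}}|$, that $\mathbf{b}_i$ is the ATF vector of the $i$-th interferer with entries $b_{iL}$ and $b_{iR}$ at the reference microphones, and that $\mathbf{w}_R^H \mathbf{b}_i \ne 0$ and $b_{iR} \ne 0$. The exact scaling and sign convention of $\boldsymbol{\Lambda}_2$ in Eq. (30) may differ.

```latex
\begin{align*}
\left| \frac{\mathbf{w}_L^H \mathbf{b}_i}{\mathbf{w}_R^H \mathbf{b}_i}
     - \frac{b_{iL}}{b_{iR}} \right| \le e_i
&\iff
\left| \mathbf{w}_L^H \mathbf{b}_i\, b_{iR} - \mathbf{w}_R^H \mathbf{b}_i\, b_{iL} \right|
\le e_i \left| \mathbf{w}_R^H \mathbf{b}_i\, b_{iR} \right| \\
&\iff
\left| \mathbf{w}^H \boldsymbol{\Lambda}_{2,i} \right| \le f_{2,i},
\qquad
\boldsymbol{\Lambda}_{2,i} =
\begin{bmatrix} \mathbf{b}_i\, b_{iR} \\ -\mathbf{b}_i\, b_{iL} \end{bmatrix},
\quad
f_{2,i} = e_i \left| \mathbf{w}_R^H \mathbf{b}_i\, b_{iR} \right| .
\end{align*}
```

In this form the left-hand side is the modulus of a linear function of $\mathbf{w}$, while the right-hand side still depends on $\mathbf{w}_R$, which is what makes the problem non-convex. Once $\mathbf{w}_R$ on the right-hand side is fixed, each constraint bounds the modulus of a linear function by a constant and is therefore convex.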
We approximately solve the non-convex problem in Eq. (35) iteratively, using $\mathbf{w}_R^H$ of the previous iteration in $f_{2,i}$, $i = 1, \cdots, m$. The new iterative problem is convex at each iteration and is given by
<!-- formula-not-decoded -->
where $\boldsymbol{\Lambda}_{2,i}$ is the $i$-th column of $\boldsymbol{\Lambda}_2$ in Eq. (30), and $\hat{\mathbf{w}}_{(k)} = [\hat{\mathbf{w}}_{L,(k)}^T \; \hat{\mathbf{w}}_{R,(k)}^T]^T$ is the estimated binaural spatial filter of the $k$-th iteration, which is initialized as $\hat{\mathbf{w}}_{(0)} = \hat{\mathbf{w}}_{\mathrm{BMVDR}}$.
Similarly to other existing minimum variance beamformers with inequality constraints [36], [37], the convex optimization problem in Eq. (36) can be equivalently written as a second order cone programming (SOCP) problem with equality and inequality constraints (see Appendix) and it can be solved efficiently with interior point methods [38].
The ITF error of the $i$-th interferer at the $k$-th iteration is given by
<!-- formula-not-decoded -->
This iterative method is stopped when all the constraints of the original problem in Eq. (32) are satisfied. The stopping criterion that we use for the proposed iterative method is given by
<!-- formula-not-decoded -->
where $e_i(c)$ is given in Eq. (33). Recall that $\mathbf{f}_2 = \mathbf{0}$ (i.e., $f_{2,i} = 0, \forall i$) is used in JBLCMV. Unlike JBLCMV, the proposed method uses $f_{2,i,(k)} \ge 0, \forall i$ and, therefore, the constraints dedicated to the preservation of the binaural cues of the interferers can be seen as relaxations of the strict equality constraints of the JBLCMV method. These relaxations enlarge the feasible set of the problem, allowing more constraints to be used compared to JBLCMV. The JBLCMV can be seen as a special case of the proposed method for $c = 0$, $f_{2,i,(1)} = 0$, $i = 1, \cdots, m$. In this case, the relaxed constraints in the proposed method become identical to the strict constraints of the JBLCMV. Hence, the JBLCMV needs to run only one iteration of the problem in Eq. (36). If $c = 0$, the proposed method follows the same strategy for handling $r > m_{\max}$ simultaneously present interferers as in Section III-G. However, if $c > 0$, then $m_{\max}$ is typically large and difficult to predict (see footnote 2), due to the inequality constraints and, therefore, the proposed method uses $m = r, \forall r$ constraints for the preservation of the binaural cues of all simultaneously present interferers. Finally, if $c = 1$, the proposed method does not iterate and stops immediately, giving as output the initialization $\hat{\mathbf{w}}_{(0)} = \hat{\mathbf{w}}_{\mathrm{BMVDR}}$.
The termination of the proposed iterative method may need a large number of iterations because of the fixed $c$ in Eq. (36). The reason for this is explained in detail in Section V-A. To control the speed of termination, we replace in Section V-B the fixed $c$ in Eq. (36) with a decreasing parameter $\tau_{(k)}$ (initialized with $\tau_{(0)} = c$). In Section V-C we show under which conditions the proposed algorithm is guaranteed to find a feasible solution satisfying the stopping criterion in Eq. (38) in a finite number of iterations. An overview of the proposed method using the adaptive $\tau_{(k)}$ is given in Algorithm 1.
## A. Speed of Termination
The proposed iterative method may have slow termination due to the fixed choice of c . In this section we explain the reason and in Section V-B we explain how to control the speed of termination.
Footnote 2: The feasible set of the proposed method typically shrinks when more inequality constraints are added. However, it is difficult to predict after how many constraints, $m$, it becomes empty, i.e., what the value of $m_{\max}$ is.
Let $\Phi_{(k)}$ denote the convex feasible set in the $k$-th iteration of the iterative optimization problem in Eq. (36), given by
<!-- formula-not-decoded -->
and $\Psi(c)$ the non-convex feasible set of the original non-convex problem of Eqs. (32), (33), given by
<!-- formula-not-decoded -->
where $\hat{\mathbf{w}}_{\mathrm{JBLCMV}} \in \Psi(0)$, and $\Psi(0) \subseteq \Psi(c)$, $0 \le c \le 1$ and, therefore, $\hat{\mathbf{w}}_{\mathrm{JBLCMV}} \in \Psi(c)$, $0 \le c \le 1$. In words, $\hat{\mathbf{w}}_{\mathrm{JBLCMV}}$ is the feasible solution that belongs to $\Psi(0)$ and gives the minimum output noise power.
Note that the proposed iterative method will typically try to find a solution on the boundary of $\Phi_{(k)}$. Some parts of the boundary of $\Phi_{(k)}$ will be inside or on the boundary of $\Psi(c)$, while other parts can be outside the set $\Psi(c)$. Therefore, it is possible that the estimated $\hat{\mathbf{w}}_{(k)}$ will be outside of $\Psi(c)$ (see Fig. 1(a) for instance). In this case, the stopping criterion is not satisfied and, therefore, the method proceeds to the next iteration. In the next iteration, $\Phi_{(k+1)}$ changes and a new $\hat{\mathbf{w}}_{(k+1)}$ is estimated, which can again be outside of $\Psi(c)$ (see Fig. 1(a) for instance). This repetition can happen many times, leading to a very slow termination, because the new estimate $\hat{\mathbf{w}}_{(k+1)}$ is not selected according to a binaural-cue error descent direction. To avoid this undesirable situation, we propose in Section V-B to replace the fixed $c$ in Eq. (36) with an adaptive reduction parameter $\tau_{(k)}$, in order to make sure that solutions that are on the boundary of $\Phi_{(k)}$ and outside $\Psi(c)$ will progressively provide a reduced binaural-cue error, i.e., move towards the interior of $\Psi(c)$ (see Fig. 1(b) for instance).
Note that $\Phi_{(k)}$ changes from one iteration to the next, while $\Psi(c)$ is constant over iterations. We can think of $\Phi_{(k)}$ as a convex approximation of $\Psi(c)$ at iteration $k$ (see a simplistic example of the two sets in Fig. 1(a)).
## B. Avoiding Slow Termination
The termination of the proposed iterative method may need a large number of iterations because of the fixed $c$ in Eq. (36), as explained in Section V-A. Therefore, replacing $c$ with an adaptive reduction parameter $\tau_{(k)}$, only in Eq. (36), is useful for guaranteeing termination within a pre-selected finite maximum number of iterations, $k_{\max}$. More specifically, the new adaptive reduction parameter that we use in Eq. (36) instead of $c$ is given by
<!-- formula-not-decoded -->
where $\tau_{(0)} = c$ is selected according to the initial desired amount of collapse of binaural cues in the original non-convex problem in Eqs. (32), (33). The step $\alpha(k_{\max})$ controls the speed of termination (i.e., how fast the stopping criterion will be satisfied), and is a function of the maximum allowed number of iterations for termination specified by the user. That is,
<!-- formula-not-decoded -->
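One schedule that is consistent with the two conditions stated in this section, $\tau_{(0)} = c$ and $\tau_{(k_{\max})} = 0$, is a linear decrease with a constant step. The sketch below assumes $\tau_{(k)} = \tau_{(k-1)} - \alpha$ with $\alpha = c / k_{\max}$; this specific form is an assumption made for illustration, and the function name `tau_schedule` is hypothetical.

```python
def tau_schedule(c, k_max):
    """Return [tau(0), ..., tau(k_max)] for a linear decrease from c to 0.

    Assumption: tau(k) = tau(k-1) - alpha with a constant step alpha = c / k_max,
    which is one schedule satisfying tau(0) = c and tau(k_max) = 0.
    """
    if not 0.0 <= c <= 1.0:
        raise ValueError("c must lie in [0, 1]")
    if k_max < 1:
        raise ValueError("k_max must be a positive integer")
    alpha = c / k_max
    # Computing c - k * alpha directly avoids accumulating rounding error;
    # max(..., 0.0) keeps the last value from dipping below zero.
    return [max(c - k * alpha, 0.0) for k in range(k_max + 1)]


taus_fast = tau_schedule(0.6, 10)   # larger step, faster termination
taus_slow = tau_schedule(0.6, 50)   # smaller step, finer approach to the boundary
```

A smaller `k_max` gives a larger step, so the relaxed constraints tighten faster at the risk of overshooting the desired amount of collapse.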
Note that we replace $c$ with $\tau_{(k)}$ only in Eq. (36) and not in the stopping criterion in Eq. (38). This is because the stopping criterion is based on the fixed feasible set $\Psi(c)$ of the non-convex problem in Eq. (32), which should remain constant over iterations (see an example of two consecutive iterations in Fig. 1). Moreover, $\tau_{(k)}$ is always non-negative, because it decreases to $\tau_{(k_{\max})} = 0$. A small $k_{\max}$ speeds up the reduction of $\tau_{(k)}$ and, thus, also speeds up the termination of the proposed method. Of course, a very small $k_{\max}$ can lead to a feasible solution, $\hat{\mathbf{w}}_{(k)}$, for which $\sum_i E_{n_i,(k)} \ll \sum_i e_i(c)$, i.e., one that is far away from the boundary of $\Psi(c)$. This means that $\hat{\mathbf{w}}_{(k)}$ provides better binaural cue preservation than the desired amount, $e_i(c)$. As a result, there will be less noise suppression. Ideally, we would like to arrive as close as possible to the controlled trade-off between noise reduction and binaural cue preservation given by our initial specifications (i.e., amount of collapse). Therefore, a careful choice of $k_{\max}$ is needed in order to find a feasible solution $\hat{\mathbf{w}}_{(k)}$ that:
- is reached in as few iterations as possible.
- achieves a binaural-cue error $\sum_i E_{n_i,(k)} \approx \sum_i e_i(c)$, i.e., is as close as possible to the boundary of $\Psi(c)$.
Of course there is a trade-off between the two goals.
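The overall iteration can be sketched as a short loop. This is an illustrative sketch, not the paper's Algorithm 1: `solve_step` stands for a hypothetical solver of the convex problem in Eq. (36) with $\tau_{(k)}$ in place of $c$ (for example an SOCP solver), `itf_errors` is a hypothetical function returning the ITF error per interferer, and the constant step `c / k_max` is an assumed schedule. The thresholds `e` stay fixed at $e_i(c)$, as required by the stopping criterion.

```python
def relaxed_blcmv(solve_step, itf_errors, e, c, k_max, w_init):
    """Iterate the convex sub-problem until all ITF errors meet e_i(c).

    solve_step(tau, w_prev) -> w : hypothetical solver for the convex problem
        of Eq. (36) with tau in place of c.
    itf_errors(w) -> list of ITF errors, one per interferer.
    e : list of fixed thresholds e_i(c) used by the stopping criterion.
    Returns the final filter and the number of iterations that were run.
    """
    w, k = w_init, 0
    alpha = c / k_max  # assumed constant step of the tau schedule
    while k < k_max and any(err > lim for err, lim in zip(itf_errors(w), e)):
        k += 1
        tau = max(c - k * alpha, 0.0)  # tau shrinks, thresholds e stay fixed
        w = solve_step(tau, w)
    return w, k


# Toy stand-ins: the "filter" is just a number equal to its own ITF error,
# and the toy solver overshoots the relaxed bound by 20%.
toy_solver = lambda tau, w_prev: 1.2 * tau
toy_errors = lambda w: [w]
w_final, n_iter = relaxed_blcmv(toy_solver, toy_errors, e=[0.5], c=0.5,
                                k_max=10, w_init=1.0)
```

In the toy run the first iterate still violates the threshold, and the second one satisfies it because $\tau_{(k)}$ has decreased, which mirrors the behaviour sketched in Fig. 1(b). For $c = 1$ the initial filter already satisfies the criterion and the loop does not iterate.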
## C. Guaranteed Termination
In this section, we prove that the proposed iterative method using the adaptive reduction parameter in Eq. (41) guarantees termination, simultaneous controlled approximate binaural cue preservation, and controlled noise reduction, in at most $k_{\max}$ iterations, for a limited number of interferers $m \le 2M-3$.
Fig. 1. Simplistic visualization of two successive iterations ($k$ and $k+1$) of the proposed method with (a) a fixed $c$, (b) a reducing $\tau_{(k)}$. In iteration $k+1$ the stopping criterion is satisfied in (b). On the contrary, in (a) the stopping criterion is not satisfied, because $\hat{\mathbf{w}}_{(k+1)} \notin \Psi(c)$.
[Figure 1: two panels, (a) and (b). Legend: $\Psi(c)$ dashed blue line; $\Phi_{(k)}$ solid red line; $\Phi_{(k+1)}$ dash-dotted black line; $\hat{\mathbf{w}}_{(k)}$ circle; $\hat{\mathbf{w}}_{(k+1)}$ star.]
Nevertheless, our simulation experiments (see Section VI-C) show that our algorithm a) is capable of simultaneously achieving controlled approximate binaural cue preservation and, in most cases, controlled noise reduction of more than $2M-3$ interferers for $c > 0$, and b) finds a feasible solution in far fewer iterations, on average, than $k_{\max}$, for $k_{\max} = 10, 50$.
The adaptive decrease of $\tau_{(k)}$ (see Eq. (41)) results in an adaptive shrinking of $\Phi_{(k)}$. Therefore, when the estimated $\hat{\mathbf{w}}_{(k)}$ is outside of $\Psi(c)$, the stopping criterion is not satisfied and the algorithm continues with the next iteration. In the next iteration, $\Phi_{(k)}$ typically shrinks due to the decreased value of $\tau_{(k)}$ according to Eq. (41). The algorithm continues until there is a solution $\hat{\mathbf{w}}_{(k)} \in \Psi(c)$. Note that this does not necessarily mean that the algorithm will stop if and only if $\Phi_{(k)} \subseteq \Psi(c)$ (see e.g., Fig. 1(b), where the algorithm stops before $\Phi_{(k)} \subseteq \Psi(c)$). Only in the worst-case scenario is a solution found when $\Phi_{(k)} \subseteq \Psi(c)$. We show below that, for $m \le 2M-3$, the proposed method guarantees termination within a pre-defined finite maximum number of iterations, $k_{\max}$, while achieving controlled binaural cue preservation accuracy and controlled noise reduction. This is written more formally in the following theorem.
Theorem 1: If $m \le 2M-3$, the proposed method a) will always find a solution in a finite number of iterations $k \le k_{\max}$ satisfying the stopping criterion of Eq. (38), and b) will always have a bounded ITF error, i.e.,
<!-- formula-not-decoded -->
and a bounded noise output power
<!-- formula-not-decoded -->
Proof: Note that for $m \le 2M-3$, after $k_{\max}$ iterations $\tau_{(k)} = 0$ (see Eq. (41)) and, therefore, $\hat{\mathbf{w}}_{(k_{\max})} = \hat{\mathbf{w}}_{\mathrm{JBLCMV}}$, because the relaxations of the proposed method in Eq. (36) become $\mathbf{w}_{(k)}^H \boldsymbol{\Lambda}_2 = \mathbf{0}$, which is the same as in JBLCMV, as explained in Section V. Therefore, for $m \le 2M-3$, the algorithm will, in the worst-case scenario, terminate after $k_{\max}$ iterations (specified by the user), giving the solution $\hat{\mathbf{w}}_{\mathrm{JBLCMV}}$, which always satisfies the stopping criterion, i.e., $\hat{\mathbf{w}}_{\mathrm{JBLCMV}} \in \Psi(c)$ for $0 \le c \le 1$ (see Section V-A). This means that, in the worst-case scenario (after $k_{\max}$ iterations), the algorithm will have the noise output power $\hat{\mathbf{w}}_{\mathrm{JBLCMV}}^H \tilde{\mathbf{P}} \hat{\mathbf{w}}_{\mathrm{JBLCMV}}$ and an ITF error equal to 0. Moreover, the noise output power cannot be less than $\hat{\mathbf{w}}_{\mathrm{BMVDR}}^H \tilde{\mathbf{P}} \hat{\mathbf{w}}_{\mathrm{BMVDR}}$ (because $\hat{\mathbf{w}}_{\mathrm{BMVDR}}$ has the largest feasible set and therefore achieves the best noise reduction of all the aforementioned methods), and the ITF error will be $E_{n_i,(k)} \le e_i(c), \forall i$, because the stopping criterion is satisfied. Therefore, Eqs. (43) and (44) are proved.
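The noise-power part of the argument can be summarized as a chain of set inclusions. The following is a sketch under assumptions: $\Phi_{\mathrm{JBLCMV}}$ and $\Phi_{\mathrm{BMVDR}}$ are notation introduced here for the feasible sets of the two reference problems, $\Phi_{(k)}$ is assumed to consist of the target equality constraint plus the relaxed constraints $|\mathbf{w}^H \boldsymbol{\Lambda}_{2,i}| \le f_{2,i,(k)}$ with $f_{2,i,(k)} \ge 0$, and $m \le 2M-3$ so that $\Phi_{\mathrm{JBLCMV}}$ is non-empty.

```latex
\begin{align*}
\Phi_{\mathrm{JBLCMV}} \subseteq \Phi_{(k)} \subseteq \Phi_{\mathrm{BMVDR}}
\;\Longrightarrow\;
\hat{\mathbf{w}}_{\mathrm{BMVDR}}^H \tilde{\mathbf{P}}\, \hat{\mathbf{w}}_{\mathrm{BMVDR}}
\;\le\;
\hat{\mathbf{w}}_{(k)}^H \tilde{\mathbf{P}}\, \hat{\mathbf{w}}_{(k)}
\;\le\;
\hat{\mathbf{w}}_{\mathrm{JBLCMV}}^H \tilde{\mathbf{P}}\, \hat{\mathbf{w}}_{\mathrm{JBLCMV}} .
\end{align*}
```

The implication holds because all three filters minimize the same objective $\mathbf{w}^H \tilde{\mathbf{P}} \mathbf{w}$, and minimizing over a larger set can only lower the minimum.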
Note that, for $k = k_{\max}$ and $m > 2M-2$, $\Phi_{(k_{\max})} = \emptyset$ (see footnote 3). However, for $k < k_{\max}$ and $m > 2M-2$, $\Phi_{(k)}$ may not be empty. As we will show in our experiments, it is indeed usually not empty and, therefore, we may achieve simultaneous controlled approximate binaural cue preservation and, in most cases, controlled noise reduction of $m > 2M-2$ interferers. This can be observed experimentally from our results in Sections VI-C2 and VI-C3.
## VI. EXPERIMENTAL RESULTS
In this section, the proposed algorithm is experimentally evaluated. In Section VI-A, the setup of our experiments is described. In Section VI-B, the performance measures are presented. In Section VI-C, the proposed method is compared to other LCMV-based methods with regard to binaural cue preservation and noise reduction. Moreover, we provide results with regard to the speed of the proposed method in terms of number of iterations.
## A. Experiment Setup
Fig. 2 shows the experimental setup that we used for our experiments. Two behind-the-ear (BTE) hearing aids, with two microphones each, are used for the experiments. Therefore, the total number of microphones is M = 4 . The publicly available database with the BTE impulse responses (IRs) in [39] is used to simulate the head IRs (we used the front and middle microphone for each hearing aid). The front microphones are selected as reference microphones.
We placed all sources on a circle of radius $h = 80$ cm centered at the origin $(0, 0)$ (center of the head) with an elevation of 0°. The index of each interferer (denoted by an 'x' marker) is indicated in Fig. 2. The interferers 1, 2, 3, 4, 5, 6 and 7 are speech shaped noise realizations with the same power and are placed at 15°, 45°, 75°, 105°, 165°, 240° and 300°, respectively. The target source (denoted by an 'o' marker) is a speech signal in the look direction, i.e., 90°.
The duration of all sources is 60 s. The microphone self noise at each microphone is simulated as white Gaussian noise (WGN) with $\mathbf{P}_V = \sigma^2 \mathbf{I}$, where $\sigma = 3.8 \times 10^{-5}$, which corresponds to an SNR of 50 dB with respect to the target signal at the left reference microphone.
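The relation between the noise standard deviation and the quoted SNR can be checked with a short calculation. The sketch below assumes the usual definition, SNR $= 10 \log_{10}(P_{\mathrm{target}} / \sigma^2)$, with $P_{\mathrm{target}}$ the target power at the left reference microphone; the function names are hypothetical and the target power itself is not stated in the text.

```python
def implied_target_power(sigma, snr_db):
    """Target power at the reference microphone implied by a noise std and an SNR.

    Assumes SNR = 10 * log10(target_power / sigma**2).
    """
    return sigma ** 2 * 10 ** (snr_db / 10)


def noise_std_for_snr(target_power, snr_db):
    """Inverse relation: noise standard deviation that yields the given SNR."""
    return (target_power / 10 ** (snr_db / 10)) ** 0.5


# Values quoted in the text: sigma = 3.8e-5 and an SNR of 50 dB.
p_target = implied_target_power(3.8e-5, 50)
sigma_back = noise_std_for_snr(p_target, 50)
```

Under this definition, the quoted values imply a target power of roughly $1.4 \times 10^{-4}$ at the left reference microphone.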
Footnote 3: Recall that for $m = 2M-2$ (i.e., $d = 2M$), there is a feasible solution which does not provide controlled noise reduction (see Section III-B).
Fig. 2. Experimental setup: □ hearing aids, 'o' target source, 'x' speech shaped interferers. Each source has the same distance, $h$, from the center of the head.
[Figure 2: top view of the setup in the x-y plane, showing the head with the hearing aids at the origin, the target source, and the interferers numbered 1 to 7 at distance $h$.]
The noise CPSD matrices, $\mathbf{P}$, are calculated (as in Eq. (3)) using the ATFs of the truncated true BTE IRs from the database and the estimated PSDs of the sources, using all available data without voice activity detection (VAD) errors. Also, the constraints of all the aforementioned methods use the ATFs of the truncated true BTE IRs. The truncated BTE IRs have a length of 12.5 ms. The sampling frequency is $f_s = 16$ kHz. We use a simple overlap-and-add analysis/synthesis method [40] with a frame length of 10 ms, an overlap of 50% and an FFT size of 256. The analysis/synthesis window is a square-root-Hann window. The ATFs are also computed with an FFT size of 256. Finally, the microphone signals are computed by convolving the truncated BTE IRs with the source signals at the original locations.
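The analysis/synthesis parameters translate into sample counts as follows. The sketch below is illustrative only: it assumes a periodic Hann window and that each 10 ms frame is zero-padded to the FFT size, neither of which is stated explicitly in the text. It also checks that a square-root-Hann window used for both analysis and synthesis satisfies the constant overlap-add condition at 50% overlap.

```python
import math

FS = 16_000       # sampling frequency in Hz
FRAME_MS = 10.0   # frame length in ms
OVERLAP = 0.5     # 50% overlap
N_FFT = 256       # FFT size (frames assumed zero-padded up to this length)
IR_MS = 12.5      # truncated BTE impulse-response length in ms

frame_len = round(FS * FRAME_MS / 1000)   # samples per frame
hop = round(frame_len * (1 - OVERLAP))    # frame advance in samples
ir_len = round(FS * IR_MS / 1000)         # samples per truncated IR


def sqrt_hann(n):
    """Square root of a periodic Hann window of length n."""
    return [math.sqrt(0.5 * (1 - math.cos(2 * math.pi * i / n))) for i in range(n)]


window = sqrt_hann(frame_len)
# Analysis window times synthesis window is a Hann window; at 50% overlap the
# two overlapping copies should add up to one at every sample.
cola = [window[i] ** 2 + window[i + hop] ** 2 for i in range(hop)]
```

With these values a frame spans 160 samples, the hop is 80 samples, and the truncated impulse responses span 200 samples.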
## B. Performance Evaluation
In this section we define the performance evaluation measures that we use to evaluate the results.
1) ITFs, IPDs & ILDs: In this section three average performance measures for binaural cue preservation are defined: the average ILD error, the average IPD error, and the average ITF error. Note, as explained in Section III-A, that the IPD errors are perceptually important only for frequencies below 1 kHz, and the ILD errors are perceptually important only for frequencies above 3 kHz. Therefore, the evaluation of IPDs and ILDs will be done only for these frequency regions. We evaluate the average ILD and IPD error for all interferers as follows. Let $L_{n_i}(k, l)$ and $T_{n_i}(k, l)$ denote the ILD and IPD errors (for the $k$-th frequency bin and $l$-th frame), respectively, defined in Eq. (9). Then the average ILD and IPD errors are defined as
<!-- formula-not-decoded -->
and
<!-- formula-not-decoded -->
where $N$ and $T$ are the number of frequency bins and the number of frames, respectively, and $k_{\mathrm{ILD}}$ and $k_{\mathrm{IPD}}$ are the first and last frequency-bin indices in the frequency regions 3-8 kHz and 0-1 kHz, respectively. Note that since the maximum possible value of $T_{n_i}(k, l)$ is 1, the maximum value of TotER$_{\mathrm{IPD}}$ is $r$. Moreover, we evaluate the average ITF error given by
<!-- formula-not-decoded -->
where $E_{n_i}$ is the ITF error defined in Eq. (8). Finally, we evaluate the average ITF error ratio given by
<!-- formula-not-decoded -->
which measures the average amount of binaural cue collapse by comparing the ITF error of the proposed method with the ITF error of the BMVDR. Since the proposed method will always satisfy the condition $E_{n_i,(k)}(k, l) \le c\, E_{n_i,\mathrm{BMVDR}}(k, l)$ for $r \le 2M-3$ (see Theorem 1), it follows that AvER$_{\mathrm{ITF}}(c) \le c$ for $r \le 2M-3$. Note that ideally the proposed method will provide a solution as close as possible to the boundary of $\Psi(c)$, i.e., AvER$_{\mathrm{ITF}}(c) \approx c$ (see Section V-B). Moreover, for the proposed method AvER$_{\mathrm{ITF}}(0) = 0$ and AvER$_{\mathrm{ITF}}(1) = 1$, because for $c = 0$, $E_{n_i}(k, l) = 0$ (for $r \le 2M-3$), and for $c = 1$, $E_{n_i}(k, l) = E_{n_i,\mathrm{BMVDR}}(k, l)$.
Please note that in these experiments we use the true ATFs in the constraints of the optimization problems of all competing methods. Therefore, we do not measure the corresponding error measures for the binaural cues of the target source since they are always zero, because in all compared methods the constraints perfectly preserve the binaural cues of the target source.
2) SNR measures: We define the binaural global segmental signal-to-noise-ratio (gsSNR) gain as
<!-- formula-not-decoded -->
where the gsSNR input and output are defined as
<!-- formula-not-decoded -->
respectively, where for the $l$-th frame, the binaural input signal-to-noise-ratio (SNR) is defined as
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
where $\mathbf{e}^T = [\mathbf{e}_L^T \; \mathbf{e}_R^T]$, $\mathbf{e}_L^T = [1, 0, \cdots, 0]$ and $\mathbf{e}_R^T = [0, \cdots, 0, 1]$, $\tilde{\mathbf{P}}$ is defined in Eq. (11) and $\tilde{\mathbf{P}}_x$ is similarly defined but uses the $\mathbf{P}_x$ matrix as diagonal block matrices. The binaural output SNR for the $l$-th frame is defined as
<!-- formula-not-decoded -->
where $\mathbf{w} = [\mathbf{w}_L^T(k, l) \; \mathbf{w}_R^T(k, l)]^T$. Note that gsSNR$^{\mathrm{out}}$ and gsSNR$^{\mathrm{in}}$ can be seen as average measures of the binaural SNR measures defined in [30].
Fig. 3. Anechoic environment: Performance curves for the competing methods in terms of (a) noise reduction, (b) average ITF error, (c) average ILD error, (d) average IPD error.
[Figure 3: four panels (a)-(d), each plotted against the number of interferers $r$ (1 to 7). Vertical axes: (a) gsSNR gain (dB), (b) TotER ITF, (c) TotER ILD, (d) TotER IPD. Curves: JBLCMV, BMVDR, BLCMV ($\eta = 0.1$ and $\eta = 0.001$), OBLCMV, and the proposed method (Prop.) for $c = 0.3, 0.6$ and $k_{\max} = 10, 50$.]
## C. Results
In the following experiments we evaluate the performance of the proposed and reference methods (i.e., BLCMV [27] with two different values of $\eta$, OBLCMV [28], BMVDR [30] and JBLCMV [29], [30]) as a function of the number of simultaneously present interferers, $1 \le r \le 7$. For instance, for $r = 1$, only the interferer with index 1 is enabled while all the others are silent. For $r = 2$, only the interferers with indices 1, 2 are enabled, while the others are silent, and so on. The binaural gsSNR$^{\mathrm{in}}$ values for $r = 1, 2, 3, 4, 5, 6$ and $7$ are $0.46$, $-1.45$, $-2.29$, $-2.92$, $-3.76$, $-4.15$, and $-4.53$ dB, respectively. Recall that each method has a different $m_{\max}$, except for the proposed method for $c > 0$, where $m_{\max}$ is difficult to estimate, as explained in Section V, and, therefore, $m$ is always set to $m = r$. For each of the reference methods, and for the proposed method in the case of $c = 0$, if $r > m_{\max}$, we use in the constraints only the first $m_{\max}$ interferers, and the binaural cues of the last $r - m_{\max}$ interferers are not preserved. In Sections VI-C1 and VI-C2 the simulations are carried out without taking into account room acoustics. In Section VI-C3 the simulations are carried out by taking into account room acoustics.
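The protocol for enabling and constraining interferers can be sketched as a small helper. The function name and its return format are hypothetical; the example call assumes $m_{\max} = 2M - 3 = 5$ for $M = 4$, which is the bound used in Theorem 1 and is taken here as an illustrative value rather than the value used for any particular reference method.

```python
INTERFERER_ANGLES_DEG = [15, 45, 75, 105, 165, 240, 300]  # interferers 1..7


def active_and_constrained(r, m_max=None):
    """Return (active, constrained) interferer indices for r active interferers.

    m_max=None models the proposed method with c > 0, where m = r.
    Otherwise only the first m_max active interferers get a constraint.
    """
    if not 1 <= r <= len(INTERFERER_ANGLES_DEG):
        raise ValueError("r must be between 1 and 7")
    active = list(range(1, r + 1))
    m = r if m_max is None else min(r, m_max)
    return active, active[:m]


# Seven active interferers, method limited to five constraints (assumed m_max).
active, constrained = active_and_constrained(7, m_max=5)
unconstrained_angles = [INTERFERER_ANGLES_DEG[i - 1]
                        for i in active if i not in constrained]
```

In this example the interferers at 240° and 300° are active but unconstrained, whereas the proposed method with $c > 0$ would place a relaxed constraint on all seven.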
Figs. 3 and 4 show the comparison of the proposed method (denoted by Prop.-c = value, k max = value) with the aforementioned reference methods in terms of binaural cue preservation and noise reduction. Note that BMVDR and the JBLCMV are the two extreme special cases of our method, which can be denoted as Prop.-c = 1 and Prop.-c = 0,
1) SNR & Binaural Cue Preservation: For simplicity, we used the same value c = c j, for j = 1, ..., m, for all interferers in the proposed method. In other words, we assumed that the binaural cues of all interferers are equally important. Moreover, for the adaptive change of τ(k) we selected the step parameter α(k max) with k max ∈ {10, 50}.
Fig. 4. Anechoic environment: Combination of performance curves from Fig. 3 for the competing methods in terms of (a) noise reduction, (b) average ITF error for different number of simultaneously present interferers r . The counting of r starts at the top left part of each curve.
<details>
<summary>Image 4 Details</summary>

### Visual Description
## Chart: gsSNR Gain vs. TotERITF
### Overview
The image presents a line chart illustrating the relationship between gsSNR gain (in dB) and TotERITF. The chart compares the performance of several different algorithms or methods, indicated by different colored lines. The general trend shows a rapid decrease in gsSNR gain as TotERITF increases, with varying rates of decline for each method.
### Components/Axes
* **X-axis:** Labeled "TotERITF", ranging from approximately 0 to 25.
* **Y-axis:** Labeled "gsSNR gain (dB)", ranging from approximately 0 to 35.
* **Legend:** Located in the top-right corner of the chart. It identifies the following methods/algorithms with corresponding colors and markers:
* JBLCMV (Red Triangles)
* Prop.-c = 0.3, kmax = 10 (Blue Triangles)
* Prop.-c = 0.3, kmax = 50 (Magenta Crosses)
* Prop.-c = 0.6, kmax = 10 (Green Stars)
* Prop.-c = 0.6, kmax = 50 (Yellow Circles)
* BMVDR (Black Diamonds)
* BLCMV-η = 0.1 (Brown Squares)
* BLCMV-η = 0.001 (Orange Plus Signs)
* OBLCMV (Dark Green Hexagons)
### Detailed Analysis
Here's a breakdown of each line's trend and approximate data points, verified against the legend colors:
* **JBLCMV (Red Triangles):** Starts at approximately 32 dB at TotERITF = 0, rapidly decreasing to around 2 dB at TotERITF = 5, and then leveling off to approximately 1 dB at TotERITF = 25.
* **Prop.-c = 0.3, kmax = 10 (Blue Triangles):** Starts at approximately 31 dB at TotERITF = 0, decreases to around 3 dB at TotERITF = 5, and then stabilizes around 2 dB at TotERITF = 25.
* **Prop.-c = 0.3, kmax = 50 (Magenta Crosses):** Starts at approximately 30 dB at TotERITF = 0, decreases to around 4 dB at TotERITF = 5, and then stabilizes around 2 dB at TotERITF = 25.
* **Prop.-c = 0.6, kmax = 10 (Green Stars):** Starts at approximately 28 dB at TotERITF = 0, decreases to around 6 dB at TotERITF = 5, and then stabilizes around 4 dB at TotERITF = 25.
* **Prop.-c = 0.6, kmax = 50 (Yellow Circles):** Starts at approximately 26 dB at TotERITF = 0, decreases to around 5 dB at TotERITF = 5, and then stabilizes around 3 dB at TotERITF = 25.
* **BMVDR (Black Diamonds):** Starts at approximately 24 dB at TotERITF = 0, decreases to around 10 dB at TotERITF = 5, and then stabilizes around 5 dB at TotERITF = 25.
* **BLCMV-η = 0.1 (Brown Squares):** Starts at approximately 16 dB at TotERITF = 0, decreases to around 4 dB at TotERITF = 5, and then stabilizes around 1 dB at TotERITF = 25.
* **BLCMV-η = 0.001 (Orange Plus Signs):** Starts at approximately 14 dB at TotERITF = 0, decreases to around 0 dB at TotERITF = 5, and remains near 0 dB for the rest of the range.
* **OBLCMV (Dark Green Hexagons):** Starts at approximately 22 dB at TotERITF = 0, decreases to around 8 dB at TotERITF = 5, and then stabilizes around 4 dB at TotERITF = 25.
### Key Observations
* All methods exhibit a significant drop in gsSNR gain as TotERITF increases.
* JBLCMV, Prop.-c = 0.3, kmax = 10, and Prop.-c = 0.3, kmax = 50 initially provide the highest gsSNR gains.
* BLCMV-η = 0.001 performs the worst, with a very low gsSNR gain across all TotERITF values.
* The rate of decline in gsSNR gain appears to slow down as TotERITF increases beyond 5.
* The methods with higher 'kmax' values (kmax = 50) generally show slightly better performance than those with lower 'kmax' values (kmax = 10) for the Prop.-c methods.
### Interpretation
This chart combines noise reduction and binaural cue preservation in a single plot: the y-axis gives the gsSNR gain achieved by each method and the x-axis gives its ITF error (TotERITF), so each curve shows the trade-off between the two for one method.
According to the figure caption, each curve is traced as the number of simultaneously present interferers r increases, starting from the top-left part of the curve. Moving along a curve therefore corresponds to adding interferers, which lowers the gsSNR gain and increases the ITF error.
The parameters c and kmax belong to the proposed method (Prop.), and η belongs to BLCMV; the separate markers allow the different parametrizations to be compared with the reference methods.
</details>
respectively. However, in these figures we used the original names for clarity. The performance curves are shown for different numbers of simultaneously present interferers r. As expected, the performance curves of the proposed method always lie between those of the BMVDR and the JBLCMV. Fig. 4 combines the curves of Figs. 3(a,b) into a single figure. Notice that the number of interferers r in this combined figure increases from r = 1 up to r = 7 along the curves from top-left to bottom-right. As expected, the proposed method for k max = 50 achieves slightly better noise reduction and worse binaural cue preservation than for k max = 10. This is because, for a larger k max, the proposed algorithm provides a feasible solution closer to the boundary of Ψ(c), as explained
Fig. 5. Anechoic environment: Average ITF error ratio as a function of c for 1 ≤ r ≤ 7 for (a) k max = 10 and (b) k max = 50 . The solid line is the c values.
<details>
<summary>Image 5 Details</summary>

### Visual Description
## Chart: Average ITF Error Ratio (AvERITF) vs. Amount of Collapse
### Overview
The image presents two line graphs (labeled (a) and (b)) illustrating the relationship between the amount of collapse (c) and the average ITF error ratio (AvERITF(c)). Each graph displays multiple lines, each representing a different number of simultaneously present interferers 'r', plus a reference line for the c values. The x-axis represents the 'amount of collapse (c)' ranging from 0 to 1, and the y-axis represents 'AvERITF(c)' ranging from 0 to 1.
### Components/Axes
* **X-axis (both graphs):** "amount of collapse (c)" - Scale from 0 to 1, with markers at 0, 0.2, 0.4, 0.6, 0.8, and 1.
* **Y-axis (both graphs):** "AvERITF(c)" - Scale from 0 to 1, with markers at 0, 0.2, 0.4, 0.6, 0.8, and 1.
* **Legend (top-center):**
* c = 0: 0.1: 1 (Red solid line)
* r = 1 (Blue dashed line)
* r = 2 (Yellow solid line)
* r = 3 (Black dashed-dotted line)
* r = 4 (Green dotted line)
* r = 5 (Cyan asterisk-dashed line)
* r = 6 (Magenta asterisk-dotted line)
* r = 7 (Brown asterisk-dashed line)
### Detailed Analysis or Content Details
**Graph (a):**
* **c = 0: 0.1: 1 (Red):** Starts at approximately (0, 0), increases linearly to approximately (0.8, 0.8), then sharply increases to approximately (1, 0.95).
* **r = 1 (Blue):** Starts at approximately (0, 0), increases linearly to approximately (0.9, 0.8), then sharply increases to approximately (1, 0.95).
* **r = 2 (Yellow):** Starts at approximately (0, 0), increases linearly to approximately (0.7, 0.6), then sharply increases to approximately (1, 0.9).
* **r = 3 (Black):** Starts at approximately (0, 0), increases linearly to approximately (0.6, 0.5), then sharply increases to approximately (1, 0.85).
* **r = 4 (Green):** Starts at approximately (0, 0), increases linearly to approximately (0.5, 0.4), then sharply increases to approximately (1, 0.8).
* **r = 5 (Cyan):** Starts at approximately (0, 0), increases linearly to approximately (0.4, 0.3), then sharply increases to approximately (1, 0.75).
* **r = 6 (Magenta):** Starts at approximately (0, 0), increases linearly to approximately (0.3, 0.25), then sharply increases to approximately (1, 0.7).
* **r = 7 (Brown):** Starts at approximately (0, 0), increases linearly to approximately (0.2, 0.2), then sharply increases to approximately (1, 0.65).
**Graph (b):**
* **c = 0: 0.1: 1 (Red):** Starts at approximately (0, 0), increases linearly to approximately (0.8, 0.8), then sharply increases to approximately (1, 0.95).
* **r = 1 (Blue):** Starts at approximately (0, 0), increases linearly to approximately (0.9, 0.8), then sharply increases to approximately (1, 0.95).
* **r = 2 (Yellow):** Starts at approximately (0, 0), increases linearly to approximately (0.7, 0.6), then sharply increases to approximately (1, 0.9).
* **r = 3 (Black):** Starts at approximately (0, 0), increases linearly to approximately (0.6, 0.5), then sharply increases to approximately (1, 0.85).
* **r = 4 (Green):** Starts at approximately (0, 0), increases linearly to approximately (0.5, 0.4), then sharply increases to approximately (1, 0.8).
* **r = 5 (Cyan):** Starts at approximately (0, 0), increases linearly to approximately (0.4, 0.3), then sharply increases to approximately (1, 0.75).
* **r = 6 (Magenta):** Starts at approximately (0, 0), increases linearly to approximately (0.3, 0.25), then sharply increases to approximately (1, 0.7).
* **r = 7 (Brown):** Starts at approximately (0, 0), increases linearly to approximately (0.2, 0.2), then sharply increases to approximately (1, 0.65).
### Key Observations
* Both graphs exhibit similar trends.
* As the 'amount of collapse (c)' increases, the 'AvERITF(c)' generally increases.
* The lines representing different values of 'r' diverge as 'c' increases, with lower values of 'r' resulting in higher 'AvERITF(c)' values.
* The increase in 'AvERITF(c)' becomes steeper as 'c' approaches 1 for all values of 'r'.
* The lines for c = 0:0.1:1 and r = 1 are nearly identical.
### Interpretation
The charts show the average ITF error ratio of the proposed method, AvERITF(c), as a function of the parameter c (labeled 'amount of collapse') for r = 1 to 7 simultaneously present interferers. The red solid line marks the c values themselves and serves as the boundary against which the other curves are compared.
Curves that lie on or below the red line satisfy AvERITF(c) ≤ c. Per the caption, panel (a) corresponds to k max = 10 and panel (b) to k max = 50.
The curves for larger r lie further below the red line, i.e., the more interferers are present, the further the achieved error ratio stays from the boundary c.
</details>
Fig. 6. Anechoic environment: Average number of iterations as a function of simultaneously present interferers, r .
<details>
<summary>Image 6 Details</summary>

### Visual Description
## Chart: Average k vs. Number of Interferers
### Overview
The image presents a line chart showing the average number of iterations 'k' as a function of the number of simultaneously present interferers ('r'). The chart compares four configurations of the proposed method ('Prop.'), defined by the parameter 'c' and the maximum number of iterations 'kmax'.
### Components/Axes
* **X-axis:** "number of interferers (r)", ranging from 1 to 7, with tick marks at each integer value.
* **Y-axis:** "average k", ranging from 1 to 7, with tick marks at each integer value.
* **Legend:** Located at the top-center of the chart, defining the four data series:
* Dotted Blue Line: "Prop-c = 0.3, kmax = 10"
* Purple Cross Line: "Prop-c = 0.3, kmax = 50"
* Green Diamond Line: "Prop-c = 0.6, kmax = 10"
* Yellow Circle Line: "Prop-c = 0.6, kmax = 50"
### Detailed Analysis
Here's a breakdown of each data series, with approximate values read from the chart:
1. **Prop-c = 0.3, kmax = 10 (Dotted Blue Line):** This line shows a generally increasing trend, but with diminishing returns.
* r = 1: k ≈ 1.8
* r = 2: k ≈ 2.1
* r = 3: k ≈ 2.4
* r = 4: k ≈ 2.7
* r = 5: k ≈ 3.0
* r = 6: k ≈ 3.2
* r = 7: k ≈ 3.3
2. **Prop-c = 0.3, kmax = 50 (Purple Cross Line):** This line exhibits a more pronounced increase initially, then plateaus.
* r = 1: k ≈ 1.9
* r = 2: k ≈ 2.2
* r = 3: k ≈ 2.7
* r = 4: k ≈ 3.4
* r = 5: k ≈ 3.9
* r = 6: k ≈ 4.0
* r = 7: k ≈ 4.0
3. **Prop-c = 0.6, kmax = 10 (Green Diamond Line):** This line shows a moderate increase, similar to the first series, but starting from a slightly higher initial value.
* r = 1: k ≈ 2.0
* r = 2: k ≈ 2.2
* r = 3: k ≈ 2.5
* r = 4: k ≈ 2.9
* r = 5: k ≈ 3.4
* r = 6: k ≈ 3.7
* r = 7: k ≈ 3.8
4. **Prop-c = 0.6, kmax = 50 (Yellow Circle Line):** This line demonstrates the most significant increase, with a steep slope initially, followed by a leveling off.
* r = 1: k ≈ 2.0
* r = 2: k ≈ 2.4
* r = 3: k ≈ 3.0
* r = 4: k ≈ 3.7
* r = 5: k ≈ 5.0
* r = 6: k ≈ 5.5
* r = 7: k ≈ 5.6
### Key Observations
* Increasing 'kmax' from 10 to 50 generally leads to higher average 'k' values, especially for larger numbers of interferers.
* Increasing 'Prop-c' from 0.3 to 0.6 also tends to increase average 'k', but the effect is less pronounced than changing 'kmax'.
* The rate of increase in average 'k' diminishes as the number of interferers increases for all configurations.
* The "Prop-c = 0.6, kmax = 50" configuration consistently yields the highest average 'k' values across all numbers of interferers.
### Interpretation
The chart shows how many iterations the proposed method needs, on average, before it terminates. In the plotted data the average 'k' grows with the number of interferers 'r', and it is higher for kmax = 50 than for kmax = 10 at the same 'c'.
For a fixed 'kmax', the configuration with c = 0.6 requires slightly more iterations than the one with c = 0.3.
In all four configurations the average 'k' stays well below the corresponding 'kmax'.
</details>
in Section V-B.
Fig. 5 serves to better visualize the trade-off between fast termination and closeness to the boundary of Ψ(c) (see Section V-B for details). More specifically, Fig. 5 shows the average ITF error ratio of the proposed method, for k max = 10, 50, as a function of c for different numbers of simultaneously present interferers r. As expected (see Section VI-B1), AvER ITF (c) ≤ c for 1 ≤ r ≤ 5. This is also the case for the curves for r = 6, 7, except for c = 0; this exception is expected because, for c = 0, the proposed method becomes identical to the JBLCMV, which can preserve the binaural cues of up to m max = 2M - 3 = 5 interferers while achieving controlled noise reduction. As expected, for k max = 50 all performance curves are closer to the boundary. In general, the larger the r, the less close the AvER ITF (c) of
From Figs. 3(a,b) and Fig. 4 it is clear that the proposed method indeed achieves controlled noise reduction and controlled approximate binaural cue preservation. The BMVDR achieves the best noise reduction performance, but it does not preserve the binaural cues of the interferers. The JBLCMV accurately preserves the binaural cues of the highest number of simultaneously present interferers, but it has worse noise reduction performance than all parametrizations of the proposed method. Note that m max = 5 for JBLCMV; therefore, the last two interferers cannot be included in the constraints, which is why the binaural cue preservation is not perfect for r = 6, 7. The OBLCMV comes second in terms of SNR performance, but it preserves the binaural cues of only one interferer.
the proposed method is to c. In other words, the more constraints we use in the proposed method, the harder it is to reach the boundary of Ψ(c). Note that for the two extreme values c = 0 and c = 1, the proposed method becomes identical to the JBLCMV and the BMVDR, respectively. As expected, for c = 0 and r ≤ 5, AvER ITF (0) = 0. The JBLCMV has m max = 2M - 3 = 5 and, therefore, for c = 0 and r = 6, 7, AvER ITF (0) > 0. Finally, for c = 1 and all values of r, AvER ITF (1) = 1, as expected.
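The bound AvER ITF (c) ≤ c discussed above is easy to check programmatically once error ratios have been measured. The sketch below is a hypothetical helper, not part of the paper: `jblcmv_m_max` simply evaluates the expression 2M - 3 quoted in the text, `bound_violations` lists the c values at which a measured ratio exceeds c, and the numbers in `illustrative` are placeholders rather than results from the paper.

```python
def jblcmv_m_max(num_mics):
    """m_max = 2M - 3, the expression quoted in the text for JBLCMV."""
    return 2 * num_mics - 3


def bound_violations(aver_by_c, tol=1e-9):
    """Return the c values whose measured error ratio exceeds c.

    aver_by_c -- dict mapping c to a measured average ITF error ratio
    tol       -- small tolerance to absorb floating-point noise
    """
    return [c for c, aver in sorted(aver_by_c.items()) if aver > c + tol]


# Placeholder values for illustration only; these are NOT taken from the paper.
illustrative = {0.0: 0.0, 0.3: 0.24, 0.6: 0.55, 1.0: 1.0}

print(jblcmv_m_max(4))                 # M = 4 gives the m_max = 5 used in the text
print(bound_violations(illustrative))  # empty list when every ratio respects the bound
```

For r = 6, 7 and c = 0 the text reports AvER ITF (0) > 0, so a check of this kind would flag c = 0 in those two cases.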
2) Speed of Termination: In Fig. 6 we evaluate the number of iterations required for the proposed method to satisfy the stopping criterion (i.e., to terminate). Fig. 6 shows the average number of iterations as a function of the number of simultaneously present interferers, r, for the four configurations of the proposed method that are tested in Figs. 3 and 4. It is clear that the proposed method terminates after 3-4 iterations on average, even for r = 6, 7 > 2M - 3. Note that, for both tested values of k max, the proposed method terminated before reaching k max for all frames and frequency bins.
Fig. 7 shows a 3D histogram which depicts the statistical termination behaviour of the proposed method. Specifically, the proposed method is evaluated with different c values from 0.1 to 0.9 with a step size of 0.1. For each c value it is evaluated for all numbers of simultaneously present interferers, i.e., for r = 1, ..., 7, as in Fig. 6. Hence, this histogram represents all gathered pairs (c, k) of all frequency bins for all r = 1, ..., 7. A pair (c, k) expresses the number of iterations (per frequency bin), k, that the proposed method needs in order to terminate for a certain initial c. The z-axis, which is depicted with different colors, is the number of frequency bins that are associated with a certain pair (c, k) in the x-y plane. Again we see that, on average, the algorithm terminates after 3-4 iterations for c = 0.1 : 0.1 : 0.9.
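A histogram of this kind can be accumulated from per-bin termination records as sketched below. This is a minimal illustration under assumed inputs: the record format (one `(c, k)` tuple per frequency bin) and the function names are hypothetical, and the sample records are placeholders, not data from the paper.

```python
from collections import Counter


def termination_histogram(records):
    """Count how many frequency bins share each (c, k) pair.

    records -- iterable of (c, k) tuples, one per frequency bin, where c is
               the initial trade-off parameter and k the iteration at which
               the method terminated (assumed record format).
    """
    # Round c to one decimal so that 0.1-spaced values compare reliably.
    return Counter((round(c, 1), int(k)) for c, k in records)


def mean_iterations(hist):
    """Average k over all counted frequency bins."""
    total = sum(hist.values())
    if total == 0:
        raise ValueError("empty histogram")
    return sum(k * n for (_, k), n in hist.items()) / total


# Placeholder records for illustration only; NOT values from the paper.
records = [(0.1, 3), (0.1, 4), (0.5, 3), (0.9, 2), (0.9, 3)]
hist = termination_histogram(records)
print(hist[(0.1, 3)], mean_iterations(hist))
```

Each key of `hist` corresponds to one cell in the top view of Fig. 7, and the count plays the role of the color-coded z-axis.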
3) Reverberation: Figs. 8, 9 and 10 show the same experiments as in Figs. 4, 5 and 6, respectively, but this time in a reverberant office environment. The same signals for the interferers and the target are used here. The BTE IRs containing reverberation are also taken from the database in [39]. Note that the aforementioned database does not contain the reverberant head IRs (for the office environment) corresponding to 240° and 300° [39]. Therefore, we
Fig. 7. Anechoic environment: Top view of 3D histogram of number of frequency bins that have pairs ( k, c ) for the proposed method for (a) k max = 10 and (b) k max = 50 .
<details>
<summary>Image 7 Details</summary>

### Visual Description
## Heatmap: Iteration Count vs. Collapse Amount
### Overview
The image presents two heatmaps, labeled (a) and (b), visualizing the relationship between the "amount of collapse" (c) and the "number of iterations" (k). Both heatmaps share the same axes scales. The color intensity represents a numerical value, indicated by a colorbar on the right, ranging from approximately 0 to 1000.
### Components/Axes
* **X-axis:** "amount of collapse (c)", ranging from 0.1 to 0.9, with increments of approximately 0.1.
* **Y-axis:** "number of iterations (k)", ranging from 0 to 50, with increments of 10.
* **Colorbar:** Located on the right side of the image, representing values from 0 (dark blue) to 1000 (bright yellow).
* **Labels:** (a) and (b) are labels for the two heatmaps.
### Detailed Analysis or Content Details
**Heatmap (a):**
* The heatmap shows a predominantly dark blue color across the entire range of "amount of collapse" and "number of iterations".
* A thin band of color, transitioning from green to yellow, appears along the bottom edge of the heatmap, corresponding to low values of "number of iterations" (k) between 0 and approximately 10.
* The values within this band are approximately between 0 and 200.
* The color intensity remains consistently low (dark blue) for higher values of "number of iterations" (k).
**Heatmap (b):**
* Similar to heatmap (a), this heatmap also displays a predominantly dark blue color.
* However, the band of color along the bottom edge is more pronounced and extends to higher values.
* The color transitions from dark blue to green, then to yellow, indicating increasing values.
* The maximum value in this band reaches approximately 1000 at the rightmost edge (amount of collapse = 0.9) and low iteration counts (k < 10).
* As the "number of iterations" (k) increases, the color intensity decreases, transitioning back to dark blue.
* The values within this band are approximately between 0 and 1000.
### Key Observations
* Both heatmaps exhibit a strong dependence on the "number of iterations" (k). Higher iteration counts generally correspond to lower values.
* Heatmap (b) shows a more significant variation in values, particularly at lower iteration counts, compared to heatmap (a).
* The "amount of collapse" (c) appears to have a more pronounced effect on the values in heatmap (b) than in heatmap (a).
* The color bands are horizontal, indicating that the value is primarily determined by the number of iterations.
### Interpretation
The heatmaps are top views of a 3D histogram: the color at each position gives the number of frequency bins for which the proposed method, run with a given 'c', terminated after 'k' iterations. Per the caption, panel (a) corresponds to k max = 10 and panel (b) to k max = 50.
In both panels the counts are concentrated at low iteration numbers, which indicates that the method terminates after only a few iterations for most frequency bins over the whole tested range of 'c'.
</details>
used the available angles, 125° and 145°, for the 6-th and 7-th interferers, respectively. Moreover, the sources are now placed on a circle of radius h = 100 cm centered at the origin (0, 0) (center of the head) with an elevation of 0° (because only this distance is available for the office environment in [39]). The binaural gsSNR<sup>in</sup> values for r = 1, 2, 3, 4, 5, 6 and 7 are now -0.03, -2.1, -3.03, -3.75, -4.44, -4.95 and -5.42 dB, respectively.
As shown in Fig. 8, the performance of the proposed method is again bounded between the performance of the BMVDR and the JBLCMV, except for the case with parameters c = 0.3 and k max = 10 for r = 7. The reason is that for r > 2M - 3 there are no guarantees for bounded performance (see Theorem 1). However, in all other cases of the proposed method with r > 2M - 3 the performance is bounded. In Fig. 9 it is clear that the proposed method behaves very similarly to Fig. 5, i.e., by increasing k max the proposed method approaches closer to the boundary. Finally, Fig. 10 shows that the speed of termination is not affected significantly by reverberation.
## VII. CONCLUSION
In this paper we proposed a new multi-microphone iterative binaural noise reduction method. The proposed method is capable of controlling the amount of noise reduction and the accuracy of binaural cue preservation per interferer using a robust methodology. Specifically, the inequality constraints introduced for the binaural cue preservation of the interferers are selected in such a way that a) the binaural-cue error is always less than or equal to a fraction of the corresponding binaural-cue error of the BMVDR method, and b) the achieved amount of noise suppression is larger than or equal to the one achieved via JBLCMV. Therefore, the proposed method gives users the flexibility to parametrize it according to their needs. Moreover, the proposed method always strictly preserves the binaural cues of the target source. Although the proposed method guarantees controlled approximate binaural cue preservation and controlled noise
Fig. 8. Reverberant environment (office): Combination of performance curves from Fig. 3 for the competing methods in terms of (a) noise reduction, (b) average ITF error for different number of simultaneously present interferers r . The counting of r starts at the top left part of each curve.
<details>
<summary>Image 8 Details</summary>

### Visual Description
## Chart: gsSNR Gain vs. TotERITF
### Overview
The image presents a line chart illustrating the relationship between gsSNR gain (in dB) and TotERITF. The chart compares the performance of several methods, indicated by different colored lines and markers. The x-axis represents TotERITF, and the y-axis represents gsSNR gain.
### Components/Axes
* **X-axis:** TotERITF (ranging from 0 to approximately 90).
* **Y-axis:** gsSNR gain (dB) (ranging from 0 to approximately 35).
* **Legend (top-right):** Contains labels for each data series, with corresponding colors and markers:
* JBLCMV (Red, Triangle)
* Prop.-c = 0.3, kmax = 10 (Magenta, X)
* Prop.-c = 0.3, kmax = 50 (Cyan, Star)
* Prop.-c = 0.6, kmax = 10 (Yellow, Circle)
* Prop.-c = 0.6, kmax = 50 (Black, Diamond)
* BMVDR (Brown, Square)
* BLCMV-η = 0.1 (Orange, Square)
* BLCMV-η = 0.001 (Dark Red, Square)
* OBLCMV (Green, Asterisk)
### Detailed Analysis
The chart displays several lines, each representing a different algorithm's performance.
* **OBLCMV (Green Asterisk):** Starts at approximately 31 dB at TotERITF = 0, rapidly decreasing to approximately 5 dB at TotERITF = 10, and then leveling off around 4-5 dB for the remainder of the range.
* **BMVDR (Brown Square):** Starts at approximately 21 dB at TotERITF = 0, decreases sharply to approximately 7 dB at TotERITF = 10, and then remains relatively stable around 6-8 dB.
* **JBLCMV (Red Triangle):** Starts at approximately 3 dB at TotERITF = 0, increases slightly to around 4 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
* **Prop.-c = 0.3, kmax = 10 (Magenta X):** Starts at approximately 5 dB at TotERITF = 0, decreases to around 3 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
* **Prop.-c = 0.3, kmax = 50 (Cyan Star):** Starts at approximately 5 dB at TotERITF = 0, decreases to around 3 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
* **Prop.-c = 0.6, kmax = 10 (Yellow Circle):** Starts at approximately 4 dB at TotERITF = 0, decreases to around 3 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
* **Prop.-c = 0.6, kmax = 50 (Black Diamond):** Starts at approximately 4 dB at TotERITF = 0, decreases to around 3 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
* **BLCMV-η = 0.1 (Orange Square):** Starts at approximately 6 dB at TotERITF = 0, decreases to around 4 dB at TotERITF = 10, and then remains relatively stable around 4-5 dB.
* **BLCMV-η = 0.001 (Dark Red Square):** Starts at approximately 6 dB at TotERITF = 0, decreases to around 4 dB at TotERITF = 10, and then remains relatively stable around 3-4 dB.
### Key Observations
* OBLCMV exhibits the highest initial gsSNR gain but experiences the most significant drop as TotERITF increases.
* JBLCMV, Prop.-c = 0.3, kmax = 10, Prop.-c = 0.3, kmax = 50, Prop.-c = 0.6, kmax = 10, Prop.-c = 0.6, kmax = 50, BLCMV-η = 0.001 show very similar performance, remaining relatively stable at low gsSNR gain values (around 3-4 dB) after an initial decrease.
* BMVDR and BLCMV-η = 0.1 show intermediate performance, with a more moderate decrease in gsSNR gain as TotERITF increases.
### Interpretation
The chart shows, for the reverberant office environment, the trade-off between noise reduction (gsSNR gain) and binaural cue distortion of the interferers (ITF error, TotERITF) for each method.
According to the figure caption, each curve is traced as the number of simultaneously present interferers r increases, starting from the top-left part of the curve.
The parameters c and kmax belong to the proposed method (Prop.), and η belongs to BLCMV.
</details>
reduction only for m ≤ 2M - 3 interferers, it is experimentally demonstrated that it is also capable of doing the same (in most cases) for more interferers and of terminating in just a few iterations.
## APPENDIX
In this section, we show how the optimization problem in Eq. (36) can be equivalently written as a second-order cone programming (SOCP) problem. For convenience, we reformulate the optimization problem in Eq. (36) using RATFs instead of ATFs. The left and right RATFs of the i-th interferer are $\bar{b}_{i,L} = (1/b_{iL})\, b_i$ and $\bar{b}_{i,R} = (1/b_{iR})\, b_i$, respectively, while the left and right RATFs of the target are $\bar{a}_L = (1/a_L)\, a$ and $\bar{a}_R = (1/a_R)\, a$, respectively. It is easy to show that the constraints of the optimization problem in Eq. (36) can be
Fig. 9. Reverberant environment (office): Average ITF error ratio as a function of c for 1 ≤ r ≤ 7 for (a) k max = 10 and (b) k max = 50 . The solid line is the c values.
[Figure 9: two line plots, panels (a) and (b). x-axis: amount of collapse (c), from 0 to 1; y-axis: AVERITF(c), from 0 to 1. One curve per r = 1, ..., 7, plus a solid reference line for c = 0:0.1:1.]
Fig. 10. Reverberant environment (office): Average number of iterations as a function of simultaneously present interferers, r .
[Figure 10: line plot. x-axis: number of interferers (r), from 1 to 7; y-axis: average k (number of iterations). Four curves: Prop.-c = 0.3 with kmax = 10, Prop.-c = 0.3 with kmax = 50, Prop.-c = 0.6 with kmax = 10, and Prop.-c = 0.6 with kmax = 50.]
equivalently written as
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
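where $\zeta = |\bar{a}^{*}_{R,1}\,\bar{b}^{*}_{i,L,M} - 1|$ (with $\bar{a}^{*}_{R,1}$ the first element of $\bar{\mathbf{a}}^{H}_{R}$ and $\bar{b}^{*}_{i,L,M}$ the last element of $\bar{\mathbf{b}}^{H}_{i,L}$) and $\boldsymbol{\Phi}_{2,i}$ is the $i$-th column of the matrix $\boldsymbol{\Phi}_{2}$ given by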
Similar to [36], [37], we convert the complex vectors and matrices to real-valued ones, i.e.,
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
<!-- formula-not-decoded -->
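As an illustration of this kind of complex-to-real conversion, the sketch below assumes the common convention of stacking real and imaginary parts, i.e., a vector in C^n becomes [Re; Im] in R^(2n) and a matrix P becomes the block matrix [[Re P, -Im P], [Im P, Re P]]. The helper names and the toy values are hypothetical and are not taken from the paper; the exact definitions used here are the ones in the equations above.

```python
# Hypothetical helper names; a sketch of the usual complex-to-real embedding.

def to_real_vector(x):
    """Map x in C^n to [Re(x); Im(x)] in R^(2n)."""
    return [z.real for z in x] + [z.imag for z in x]


def to_real_matrix(P):
    """Map P in C^(n x n) to the real block matrix [[Re P, -Im P], [Im P, Re P]]."""
    top = [[z.real for z in row] + [-z.imag for z in row] for row in P]
    bottom = [[z.imag for z in row] + [z.real for z in row] for row in P]
    return top + bottom


def hermitian_inner(w, a):
    """Complex inner product w^H a."""
    return sum(wk.conjugate() * ak for wk, ak in zip(w, a))


def quad_form_complex(w, P):
    """Complex quadratic form w^H P w (real-valued when P is Hermitian)."""
    Pw = [sum(pjk * wk for pjk, wk in zip(row, w)) for row in P]
    return hermitian_inner(w, Pw)


def quad_form_real(x, A):
    """Real quadratic form x^T A x."""
    Ax = [sum(ajk * xk for ajk, xk in zip(row, x)) for row in A]
    return sum(xj * yj for xj, yj in zip(x, Ax))


# Toy data (illustrative values only, not taken from the paper).
P = [[2 + 0j, 1 - 1j],
     [1 + 1j, 3 + 0j]]           # Hermitian, positive definite
w = [1 + 2j, -0.5 + 1j]          # a complex filter
a = [0.3 - 0.4j, 1 + 0j]         # a complex transfer-function-like vector

w_real = to_real_vector(w)
P_real = to_real_matrix(P)

# The quadratic form is preserved by the embedding.
q_complex = quad_form_complex(w, P)
q_real = quad_form_real(w_real, P_real)

# w^H a splits into two real inner products with the stacked filter.
a_re = to_real_vector(a)                               # [Re a;  Im a]
a_im = [z.imag for z in a] + [-z.real for z in a]      # [Im a; -Re a]
inner = hermitian_inner(w, a)
inner_re = sum(p * q for p, q in zip(w_real, a_re))
inner_im = sum(p * q for p, q in zip(w_real, a_im))

print(q_complex.real, q_real)
print(inner, complex(inner_re, inner_im))
```

Because the real and imaginary parts of w^H a are both linear in the stacked filter, a modulus such as |w^H a| becomes the Euclidean norm of a two-element real vector, which is the form that second-order cone constraints require.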
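Note that $\mathbf{w}^{T}\tilde{\breve{\mathbf{P}}}\mathbf{w} = \|\tilde{\breve{\mathbf{P}}}^{1/2}\mathbf{w}\|_2^2$, where $\tilde{\breve{\mathbf{P}}}^{1/2}$ is the principal square root of $\tilde{\breve{\mathbf{P}}}$. The convex optimization problem in Eq. (36) can be equivalently written as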
<!-- formula-not-decoded -->
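where $\breve{\mathbf{q}}_1^{T} = [\,1\;\;1\;\;0\;\;0\,]$, $\breve{\boldsymbol{\Phi}}_{2,i}$ is the $i$-th column of $\breve{\boldsymbol{\Phi}}_{2}$, and $\check{\boldsymbol{\Phi}}_{2,i}$ is the $i$-th column of $\check{\boldsymbol{\Phi}}_{2}$. Note that the problem in Eq. (65) is a standard-form SOCP problem [38].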
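For reference, the generic standard form of an SOCP, as given in convex optimization textbooks such as [38], can be written as below. The symbols $\mathbf{x}$, $\mathbf{f}$, $\mathbf{G}_i$, $\mathbf{h}_i$, $\mathbf{c}_i$, $d_i$, $\mathbf{F}$, $\mathbf{g}$, $\mathbf{P}$ and $t$ are generic placeholders and are not the quantities of Eq. (65). The second relation sketches the usual epigraph step that turns a positive semidefinite quadratic objective into a linear objective with one additional norm constraint, which is where the principal square root noted above enters.

```latex
% Standard-form SOCP (generic textbook notation, not the paper's symbols)
\begin{aligned}
\min_{\mathbf{x}} \quad & \mathbf{f}^{T}\mathbf{x} \\
\text{s.t.} \quad & \|\mathbf{G}_i \mathbf{x} + \mathbf{h}_i\|_2 \le \mathbf{c}_i^{T}\mathbf{x} + d_i,
  \quad i = 1, \dots, m, \\
& \mathbf{F}\mathbf{x} = \mathbf{g}.
\end{aligned}

% Epigraph step for a quadratic objective with P positive semidefinite
\min_{\mathbf{w}} \ \mathbf{w}^{T}\mathbf{P}\mathbf{w}
\quad \Longleftrightarrow \quad
\min_{\mathbf{w},\,t} \ t
\quad \text{s.t.} \quad \|\mathbf{P}^{1/2}\mathbf{w}\|_2 \le t .
```

Both problems in the second relation share the same minimizer $\mathbf{w}$, since squaring is monotone on nonnegative values; only the optimal values differ, by a square.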
## ACKNOWLEDGMENT
The authors would like to thank Dr. Meng Guo for his helpful comments and suggestions.
## REFERENCES
- [1] R. C. Hendriks, T. Gerkmann, and J. Jensen, DFT-Domain Based Single-Microphone Noise Reduction for Speech Enhancement: A Survey of the State of the Art. Morgan & Claypool, 2013.
- [2] M. Brandstein and D. Ward (Eds.), Microphone arrays: signal processing techniques and applications. Springer, 2001.
- [3] K. Eneman et al., 'Evaluation of signal enhancement algorithms for hearing instruments,' in EURASIP Europ. Signal Process. Conf. (EUSIPCO), Aug. 2008.
- [4] S. Doclo and M. Moonen, 'GSVD-based optimal filtering for single and multimicrophone speech enhancement,' IEEE Trans. Signal Process., vol. 50, no. 9, pp. 2230-2244, Sep. 2002.
- [5] A. Spriet, M. Moonen, and J. Wouters, 'Spatially pre-processed speech distortion weighted multi-channel Wiener filtering for noise reduction,' Signal Process., vol. 84, no. 12, pp. 2367-2387, Dec. 2004.
- [6] J. Capon, 'High-resolution frequency-wavenumber spectrum analysis,' Proc. IEEE, vol. 57, no. 8, pp. 1408-1418, Aug. 1969.
- [7] B. D. Van Veen and K. M. Buckley, 'Beamforming: A versatile approach to spatial filtering,' IEEE ASSP Mag., vol. 5, no. 5, pp. 4-24, Apr. 1988.
- [8] O. L. Frost III, 'An algorithm for linearly constrained adaptive array processing,' Proc. IEEE, vol. 60, no. 8, pp. 926-935, Aug. 1972.
- [9] J. M. Kates, Digital hearing aids. Plural publishing, 2008.
- [10] T. Van den Bogaert, T. J. Klasen, L. Van Deun, J. Wouters, and M. Moonen, 'Horizontal localization with bilateral hearing aids: without is better than with,' J. Acoust. Soc. Amer., vol. 119, no. 1, pp. 515-526, Jan. 2006.
- [11] A. W. Bronkhorst, 'The cocktail party phenomenon: A review of research on speech intelligibility in multiple-talker conditions,' Acta Acoustica, vol. 86, no. 1, pp. 117-128, 2000.
- [12] S. Markovich, S. Gannot, and I. Cohen, 'Multichannel eigenspace beamforming in a reverberant noisy environment with multiple interfering speech signals,' IEEE Trans. Audio, Speech, Language Process., vol. 17, no. 6, pp. 1071-1086, Aug. 2009.
- [13] H. Schmidt, A. B. Baggeroer, W. A. Kuperman, and E. K. Scheer, 'Environmentally tolerant beamforming for high-resolution matched field processing: deterministic mismatch,' J. Acoust. Soc. Amer., vol. 88, no. 4, Oct. 1990.
- [14] S. A. Vorobyov, 'Principles of minimum variance robust adaptive beamforming design,' Elsevier Signal Process., vol. 93, no. 12, pp. 3264-3277, Dec. 2013.
- [15] R. C. Hendriks, R. Heusdens, U. Kjems, and J. Jensen, 'On optimal multichannel mean-squared error estimators for speech enhancement,' IEEE Signal Process. Lett., vol. 16, no. 10, pp. 885-888, Oct. 2009.
- [16] S. Gazor and W. Zhang, 'Speech probability distribution,' IEEE Signal Process. Lett., vol. 10, no. 7, pp. 204-207, Jul. 2003.
- [17] R. Martin, 'Speech enhancement based on minimum mean-square error estimation and supergaussian priors,' IEEE Trans. Speech Audio Process., vol. 13, no. 5, pp. 845-856, Sep. 2005.
- [18] J. S. Erkelens, R. C. Hendriks, R. Heusdens, and J. Jensen, 'Minimum mean-square error estimation of discrete Fourier coefficients with generalized gamma priors,' IEEE Trans. Audio, Speech, Language Process., vol. 15, no. 6, pp. 1741-1752, Aug. 2007.
- [19] P. Vary and R. Martin, Digital speech transmission: Enhancement, coding and error concealment. John Wiley & Sons, 2006.
- [20] S. Doclo, W. Kellermann, S. Makino, and S. Nordholm, 'Multichannel signal enhancement algorithms for assisted listening devices,' IEEE Signal Process. Mag., vol. 32, no. 2, pp. 18-30, Mar. 2015.
- [21] T. J. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, 'Preservation of interaural time delay for binaural hearing aids through multichannel Wiener filtering based noise reduction,' in IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2005, pp. 29-32.
- [22] T. Klasen, T. Van den Bogaert, M. Moonen, and J. Wouters, 'Binaural noise reduction algorithms for hearing aids that preserve interaural time delay cues,' IEEE Trans. Signal Process., vol. 55, no. 4, pp. 1579-1585, Apr. 2007.
- [23] S. Doclo, T. J. Klasen, T. Van den Bogaert, J. Wouters, and M. Moonen, 'Theoretical analysis of binaural cue preservation using multi-channel Wiener filtering and interaural transfer functions,' in Int. Workshop Acoustic Echo, Noise Control (IWAENC), Sep. 2006.
- [24] T. Van den Bogaert, S. Doclo, J. Wouters, and M. Moonen, 'The effect of multimicrophone noise reduction systems on sound source localization by users of binaural hearing aids,' J. Acoust. Soc. Amer., vol. 124, no. 1, pp. 484-497, Jul. 2008.
- [25] D. Marquardt, E. Hadad, S. Gannot, and S. Doclo, 'Theoretical analysis of linearly constrained multi-channel Wiener filtering algorithms for combined noise reduction and binaural cue preservation in binaural hearing aids,' IEEE Trans. Audio, Speech, Language Process., vol. 23, no. 12, Sep. 2015.
- [26] E. Hadad, S. Gannot, and S. Doclo, 'Binaural linearly constrained minimum variance beamformer for hearing aid applications,' in Int. Workshop Acoustic Signal Enhancement (IWAENC), Sep. 2012, pp. 1-4.
- [27] E. Hadad, S. Doclo, and S. Gannot, 'The binaural LCMV beamformer and its performance analysis,' IEEE Trans. Audio, Speech, Language Process., vol. 24, no. 3, pp. 543-558, Mar. 2016.
- [28] D. Marquardt, E. Hadad, S. Gannot, and S. Doclo, 'Optimal binaural LCMV beamformers for combined noise reduction and binaural cue preservation,' in Int. Workshop Acoustic Signal Enhancement (IWAENC), Sep. 2014, pp. 288-292.
- [29] A. I. Koutrouvelis, R. C. Hendriks, J. Jensen, and R. Heusdens, 'Improved multi-microphone noise reduction preserving binaural cues,' in IEEE Int. Conf. Acoust., Speech, Signal Process. (ICASSP), Mar. 2016.
- [30] E. Hadad, D. Marquardt, S. Doclo, and S. Gannot, 'Theoretical analysis of binaural transfer function MVDR beamformers with interference cue preservation constraints,' IEEE Trans. Audio, Speech, Language Process., vol. 23, no. 12, pp. 2449-2464, Dec. 2015.
- [31] B. Cornelis, S. Doclo, T. Van den Bogaert, M. Moonen, and J. Wouters, 'Theoretical analysis of binaural multimicrophone noise reduction techniques,' IEEE Trans. Audio, Speech, Language Process., vol. 18, no. 2, pp. 342-355, Feb. 2010.
- [32] J. G. Desloge, W. M. Rabinowitz, and P. M. Zurek, 'Microphone-array hearing aids with binaural output. I. Fixed-processing systems,' IEEE Trans. Speech Audio Process., vol. 5, no. 6, pp. 529-542, Nov. 1997.
- [33] W. M. Hartmann, 'How we localize sound,' Physics Today, vol. 52, no. 11, pp. 24-29, Nov. 1999.
- [34] F. L. Wightman and D. J. Kistler, 'The dominant role of low-frequency interaural time differences in sound localization,' J. Acoust. Soc. Amer., vol. 91, no. 3, pp. 1648-1661, Mar. 1992.
- [35] D. P. Welker, J. E. Greenberg, J. G. Desloge, and P. M. Zurek, 'Microphone-array hearing aids with binaural output. II. A two-microphone adaptive system,' IEEE Trans. Speech Audio Process., vol. 5, no. 6, pp. 543-551, Nov. 1997.
- [36] S. A. Vorobyov, A. B. Gershman, and Z. Q. Luo, 'Robust adaptive beamforming using worst-case performance optimization: A solution to the signal mismatch problem,' IEEE Trans. Signal Process., vol. 51, no. 2, pp. 313-323, Feb. 2003.
- [37] R. G. Lorenz and S. P. Boyd, 'Robust minimum variance beamforming,' IEEE Trans. Signal Process., vol. 53, no. 5, pp. 1684-1696, May 2005.
- [38] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
- [39] H. Kayser, S. Ewert, J. Annemuller, T. Rohdenburg, V. Hohmann, and B. Kollmeier, 'Database of multichannel in-ear and behind-the-ear head-related and binaural room impulse responses,' EURASIP J. Advances Signal Process., vol. 2009, pp. 1-10, Dec. 2009.
- [40] J. Allen, 'Short-term spectral analysis, and modification by discrete Fourier transform,' IEEE Trans. Acoust., Speech, Signal Process., vol. 25, no. 3, pp. 235-238, Jun. 1977.